Saturday, January 31, 2009

String Manipulation in COBOL

A string is a sequence of characters. String manipulation operations include finding a particular character or sub-string in a string, replacing a particular character or sub-string, concatenating strings and splitting strings. In COBOL all of these are handled by three verbs: INSPECT, STRING and UNSTRING. EXAMINE is the obsolete predecessor of INSPECT, found only in older (pre-COBOL 85) compilers.

INSPECT
  • INSPECT ... TALLYING - It is used to count the occurrences of a single character or a group of characters in a data field.
    Syntax:
    INSPECT identifier-1 TALLYING identifier-2 FOR ALL|LEADING literal-1|identifier-3 [BEFORE|AFTER INITIAL identifier-4|literal-2]
    INSPECT identifier-1 TALLYING identifier-2 FOR CHARACTERS [BEFORE|AFTER INITIAL identifier-4|literal-2]
    The BEFORE|AFTER phrase in square brackets is optional.
    identifier-1 is the string being inspected and the count is accumulated in identifier-2. INSPECT adds to identifier-2 and does not reset it, so set it to zero before the statement. literal-1 or identifier-3 is the character or group of characters you are looking for in the main string. The BEFORE and AFTER phrases restrict the search to the part of the string before or after the first occurrence of identifier-4 or literal-2.
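    Example:
    WS-NAME contains 'HELLO HOW ARE YOU DOING?' and WS-COUNT contains 0.
    INSPECT WS-NAME TALLYING WS-COUNT
    FOR ALL 'O' BEFORE INITIAL 'HOW'
    ALL 'O' AFTER INITIAL 'YOU'
    Result:
    WS-COUNT contains 2 (the 'O' in HELLO and the 'O' in DOING). Note that the keyword FOR is required, each region needs its own ALL phrase, and INSPECT has no END-INSPECT scope terminator.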
  • INSPECT ... REPLACING - It is used to replace the occurrences of a single character or a group of characters in a data field. The replacement must be the same length as the text it replaces.
    Syntax:
    INSPECT identifier-1 REPLACING ALL|LEADING|FIRST literal-1|identifier-2 BY identifier-3|literal-2 [BEFORE|AFTER INITIAL identifier-4|literal-3]
    INSPECT identifier-1 REPLACING CHARACTERS BY identifier-2|literal-1 [BEFORE|AFTER INITIAL identifier-3|literal-2]
  • INSPECT ... TALLYING ... REPLACING - It is a combination of the two forms above. The tallying is completed first and the replacing is done afterwards.
    INSPECT identifier-1 TALLYING (tallying part) REPLACING (replacing part)
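
The INSPECT forms above can be tried out with a small program such as the following. It is only a sketch: the program name INSPDEMO and the paragraph name MAIN-PARA are made up for illustration, and the source is laid out in fixed format (code starting in column 8), which is an assumption about your compiler settings.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. INSPDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-NAME   PIC X(24) VALUE 'HELLO HOW ARE YOU DOING?'.
       01  WS-COUNT  PIC 9(2)  VALUE ZERO.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * INSPECT adds to the counter, so reset it before each use.
           MOVE ZERO TO WS-COUNT
      * Count every 'O' in the field.
           INSPECT WS-NAME TALLYING WS-COUNT FOR ALL 'O'
           DISPLAY 'ALL O      : ' WS-COUNT
      * Count 'O' before the first 'HOW' or after the first 'YOU'.
           MOVE ZERO TO WS-COUNT
           INSPECT WS-NAME TALLYING WS-COUNT
               FOR ALL 'O' BEFORE INITIAL 'HOW'
                   ALL 'O' AFTER INITIAL 'YOU'
           DISPLAY 'RESTRICTED : ' WS-COUNT
      * Replace every space with a hyphen (same length on both sides).
           INSPECT WS-NAME REPLACING ALL ' ' BY '-'
           DISPLAY WS-NAME
           STOP RUN.
```

The first count should come out as 4 (HELLO, HOW, YOU, DOING) and the restricted count as 2, after which WS-NAME should read HELLO-HOW-ARE-YOU-DOING?.
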
STRING

The STRING statement is used to concatenate two or more strings, or parts of strings, into one receiving field.

Syntax:
STRING identifier-1|literal-1 [identifier-2|literal-2 ...] DELIMITED BY identifier-3|literal-3|SIZE INTO identifier-4 [END-STRING]
01 VAR1 PIC X(10) VALUE 'MUTHU '.
01 VAR2 PIC X(10) VALUE 'SARA '.
01 VAR3 PIC X(20).
To get 'MUTHU,SARA' in VAR3:
STRING VAR1 DELIMITED BY ' ' ',' DELIMITED BY SIZE VAR2 DELIMITED BY ' ' INTO VAR3 END-STRING.
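The receiving field must be an elementary data item with no editing symbols and without the JUSTIFIED RIGHT clause.
STRING overwrites only as many characters of the receiving field as it transfers and leaves the rest untouched, whereas MOVE replaces the whole field. This means STRING can be used to replace just the leading characters of a string:
01 AGE-OUT PIC X(12) VALUE '12 YEARS OLD'.
STRING '18' DELIMITED BY SIZE INTO AGE-OUT.
Result: AGE-OUT contains '18 YEARS OLD'.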
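
The two STRING examples above can be put together in a small program like this one. It is a sketch only: the program name STRDEMO and the paragraph name MAIN-PARA are made up, and the MOVE SPACES before the first STRING is an addition, needed because STRING does not clear the receiving field.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. STRDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  VAR1     PIC X(10) VALUE 'MUTHU'.
       01  VAR2     PIC X(10) VALUE 'SARA'.
       01  VAR3     PIC X(20).
       01  AGE-OUT  PIC X(12) VALUE '12 YEARS OLD'.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * STRING does not blank the target, so clear it first.
           MOVE SPACES TO VAR3
      * Copy each name up to its first space, with a comma between.
           STRING VAR1 DELIMITED BY SPACE
                  ','  DELIMITED BY SIZE
                  VAR2 DELIMITED BY SPACE
               INTO VAR3
           END-STRING
           DISPLAY VAR3
      * Only the first two characters of AGE-OUT are overwritten.
           STRING '18' DELIMITED BY SIZE INTO AGE-OUT
           END-STRING
           DISPLAY AGE-OUT
           STOP RUN.
```

VAR3 should display as MUTHU,SARA followed by trailing spaces, and AGE-OUT as 18 YEARS OLD.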

Reference Modification - equivalent of SUBSTR

'Reference modification' is used to retrieve or overwrite a sub-string of a string. The ':' is known as the reference modification operator.

Syntax:
identifier(Starting-Position:Length)
Starting-Position counts from 1. Length is optional; if it is omitted, the sub-string runs to the end of the field.
MOVE '18' TO AGE-OUT(1:2) does the same as what we did with the STRING statement.
When it is used on array elements, the subscript comes first: Array-element (occurrence) (Starting-Position:Length)
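
A short program makes the three variations (overwrite, extract, and subscript plus reference modification) easier to see. This is a sketch under assumptions: the names REFMOD, MAIN-PARA, WS-UNIT, WS-TABLE and WS-ITEM are invented for the example and do not come from the text above.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. REFMOD.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  AGE-OUT    PIC X(12) VALUE '12 YEARS OLD'.
       01  WS-UNIT    PIC X(5).
       01  WS-TABLE.
           05  WS-ITEM PIC X(6) OCCURS 2 TIMES.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Overwrite positions 1 and 2 only.
           MOVE '18' TO AGE-OUT(1:2)
           DISPLAY AGE-OUT
      * Read 5 characters starting at position 4.
           MOVE AGE-OUT(4:5) TO WS-UNIT
           DISPLAY WS-UNIT
      * Omitting the length takes everything up to the end.
           DISPLAY AGE-OUT(10:)
      * Subscript first, then reference modification.
           MOVE 'ABCDEF' TO WS-ITEM(1)
           MOVE 'UVWXYZ' TO WS-ITEM(2)
           DISPLAY WS-ITEM(2)(3:2)
           STOP RUN.
```

The four DISPLAY statements should show 18 YEARS OLD, YEARS, OLD and WX in that order.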

UNSTRING

The UNSTRING statement is used to split one string into several strings.

Syntax:

UNSTRING identifier-1 [DELIMITED BY [ALL] identifier-2|literal-1 [OR [ALL] identifier-3|literal-2] ...] INTO identifier-4 [DELIMITER IN identifier-5] [COUNT IN identifier-6] [identifier-7 [DELIMITER IN identifier-8] [COUNT IN identifier-9]] ... [END-UNSTRING]
01 WS-DATA PIC X(12) VALUE '10/200/300/1'.
UNSTRING WS-DATA DELIMITED BY '/'
INTO WS-FLD1 DELIMITER IN WS-D1 COUNT IN WS-C1
WS-FLD2 DELIMITER IN WS-D2 COUNT IN WS-C2
WS-FLD3 DELIMITER IN WS-D3 COUNT IN WS-C3
END-UNSTRING.
Result:
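WS-FLD1 = '10' WS-C1 = 2 WS-D1 = '/'
WS-FLD2 = '200' WS-C2 = 3 WS-D2 = '/'
WS-FLD3 = '300' WS-C3 = 3 WS-D3 = '/'
ON OVERFLOW can be coded with both STRING and UNSTRING. With STRING, the imperative statements that follow ON OVERFLOW are executed when the receiving field is too small to hold all the data, so the result is truncated. With UNSTRING, they are executed when all the receiving fields have been filled but part of the source string is still unprocessed; in the example above the trailing '1' is never transferred, so the overflow condition would be raised.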
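
The UNSTRING example and the overflow condition can be combined into one small program. This is a sketch only: the program name UNSTDEMO, the paragraph name MAIN-PARA, the field sizes and the message texts are assumptions, and the DELIMITER IN fields are left out to keep it short.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UNSTDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-DATA  PIC X(12) VALUE '10/200/300/1'.
       01  WS-FLD1  PIC X(4).
       01  WS-FLD2  PIC X(4).
       01  WS-FLD3  PIC X(4).
       01  WS-C1    PIC 9(2).
       01  WS-C2    PIC 9(2).
       01  WS-C3    PIC 9(2).
       PROCEDURE DIVISION.
       MAIN-PARA.
           UNSTRING WS-DATA DELIMITED BY '/'
               INTO WS-FLD1 COUNT IN WS-C1
                    WS-FLD2 COUNT IN WS-C2
                    WS-FLD3 COUNT IN WS-C3
               ON OVERFLOW
      * All three targets are filled but '1' is still unread.
                   DISPLAY 'OVERFLOW: UNREAD DATA REMAINS'
               NOT ON OVERFLOW
                   DISPLAY 'ALL DATA WAS SPLIT'
           END-UNSTRING
      * Each field is shown with the number of characters moved.
           DISPLAY WS-FLD1 ' ' WS-C1
           DISPLAY WS-FLD2 ' ' WS-C2
           DISPLAY WS-FLD3 ' ' WS-C3
           STOP RUN.
```

With this data the ON OVERFLOW branch should be taken. Adding a fourth receiving field to the INTO list would let the trailing '1' be transferred and the NOT ON OVERFLOW branch run instead.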


Tuesday, January 6, 2009

Cobol numeric & computational data types

What are the different types of COMP fields in COBOL? What is COMP? What is COMP-1? What is COMP-3?

These are some of the most frequently asked, and least comfortably answered, questions in COBOL. Let's discuss them today. I will try my best to keep things simple and to explain with examples.

  • WS-NUM PIC 9(5) VALUE 12345 (DISPLAY usage, the default):

This is a numeric variable in COBOL. This type of variable is stored on the mainframe in display (character) format, one byte per digit. If we view the variable above in hex, it is stored as X'F1F2F3F4F5'. It occupies 5 bytes of storage.


Note: These variables are not the most efficient choice for numerical calculations. For any arithmetic they first have to be converted to binary or COMP-3 format. Binary is the machine's native integer format, so binary arithmetic is typically several times faster, and binary fields occupy less space.

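  • COMP or BINARY (the two are synonymous):
COMP is binary integer data. As a reminder of how binary storage scales, 8 bits (1 byte) can store unsigned values from 0 to 255 or signed values from -128 to +127. The compiler chooses the storage size from the number of digits in the PICTURE:
S9(1) - S9(4) COMP is a 2-byte integer (half word, -32768 to +32767)
S9(5) - S9(9) COMP is a 4-byte integer (full word)
S9(10) - S9(18) COMP is an 8-byte integer (double word)
i.e. 9(1) through 9(4) all occupy the same storage, as do 9(5) through 9(9) and 9(10) through 9(18).
The PICTURE can still limit the usable values: on IBM compilers with the TRUNC(STD) option, for example, S9(4) COMP holds only -9999 to +9999 even though the half word could hold more.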

  • COMPUTATIONAL-1 or COMP-1 : WS-NUM COMP-1
This is used for internal floating-point items (single precision). No PICTURE clause is coded. COMP-1 items are 4 bytes long. On an IBM mainframe the sign is held in the first bit of the leftmost byte and the exponent in the remaining 7 bits of that byte. The last 3 bytes contain the mantissa.

  • COMPUTATIONAL-2 or COMP-2 : WS-NUM COMP-2
This is used for internal floating-point items (double precision). No PICTURE clause is coded. COMP-2 items are 8 bytes long. On an IBM mainframe the sign is held in the first bit of the leftmost byte and the exponent in the remaining 7 bits of that byte. The last 7 bytes contain the mantissa.

  • COMPUTATIONAL-3 or COMP-3 (internal decimal)
This is equivalent to PACKED-DECIMAL. COMP-3 stores data in BCD ("binary coded decimal") format, with the sign after the least significant digit. This format is more storage-efficient and CPU-efficient than display format for numerical calculations.
Packed decimal stores one decimal digit in each "nibble" (half byte), so one byte holds two digits. Each nibble corresponds to one hexadecimal digit. For example, the value 12 is stored in two nibbles, using the hexadecimal digits 1 and 2. The sign representation depends on your operating environment. On an IBM mainframe the sign is held in the last nibble: C indicates a positive value, D indicates a negative value and F indicates an unsigned value.

The mainframe can perform arithmetic directly on packed-decimal fields without converting the format. Storing numeric values in packed-decimal format may also save a significant amount of storage space. For example, on the mainframe the value 12,345 in display format is five (5) bytes long (i.e. X'F1F2F3F4F5'). If the same value is stored as packed decimal (i.e. USAGE IS COMP-3), the field is three (3) bytes long (i.e. X'12345C').

Examples of COMP-3 size calculations. The byte size is (number of digits + 1) / 2, rounded up; the implied decimal point V takes no storage:
PIC S9(7) COMP-3. Byte size = (7 + 1) / 2 = 4
PIC S9(5)V99 COMP-3. Byte size = (5 + 2 + 1) / 2 = 4
PIC S9(6) COMP-3. Byte size = (6 + 1) / 2 = 3.5, rounded up to 4
COMP-3 fields reserve a nibble for the sign even for "unsigned" values, so the following fields are still 4 bytes:
PIC 9(7) COMP-3. Byte size = (7 + 1) / 2 = 4
PIC 9(6) COMP-3. Byte size = (6 + 1) / 2 = 3.5, rounded up to 4
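
One way to check the sizes discussed in this post is to let the compiler report them with the LENGTH OF special register. The program below is a sketch, not a definitive test: all data names, the program name COMPSIZE and the paragraph name MAIN-PARA are invented for illustration, and the way the lengths are formatted on output differs between compilers.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. COMPSIZE.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-DISP    PIC 9(5)          VALUE 12345.
       01  WS-BIN4    PIC S9(4)  COMP   VALUE ZERO.
       01  WS-BIN9    PIC S9(9)  COMP   VALUE ZERO.
       01  WS-BIN18   PIC S9(18) COMP   VALUE ZERO.
      * Floating-point items take no PICTURE clause.
       01  WS-FLT1    COMP-1            VALUE ZERO.
       01  WS-FLT2    COMP-2            VALUE ZERO.
       01  WS-PACK5   PIC S9(5)  COMP-3 VALUE +12345.
      * Character view of the same 3 bytes, to inspect the packing.
       01  WS-PACK5-X REDEFINES WS-PACK5 PIC X(3).
       01  WS-PACK6   PIC S9(6)  COMP-3 VALUE ZERO.
       01  WS-PACK7   PIC S9(7)  COMP-3 VALUE ZERO.
       PROCEDURE DIVISION.
       MAIN-PARA.
           DISPLAY '9(5) DISPLAY : ' LENGTH OF WS-DISP
           DISPLAY 'S9(4) COMP   : ' LENGTH OF WS-BIN4
           DISPLAY 'S9(9) COMP   : ' LENGTH OF WS-BIN9
           DISPLAY 'S9(18) COMP  : ' LENGTH OF WS-BIN18
           DISPLAY 'COMP-1       : ' LENGTH OF WS-FLT1
           DISPLAY 'COMP-2       : ' LENGTH OF WS-FLT2
           DISPLAY 'S9(5) COMP-3 : ' LENGTH OF WS-PACK5
           DISPLAY 'S9(6) COMP-3 : ' LENGTH OF WS-PACK6
           DISPLAY 'S9(7) COMP-3 : ' LENGTH OF WS-PACK7
      * +12345 packs into digits 1 2 3 4 5 plus sign nibble C.
           IF WS-PACK5-X = X'12345C'
               DISPLAY 'PACKED BYTES MATCH 12345C'
           END-IF
           STOP RUN.
```

On an IBM mainframe compiler the reported lengths should be 5, 2, 4, 8, 4, 8, 3, 4 and 4 bytes, in line with the rules above, and the final message should appear because +12345 is packed as X'12345C'.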
