TAW11 - 1 - Unicode

Software systems use fixed-length bit sequences to represent characters internally.
The length of the sequence determines how many characters can be represented in total, and a
character set table (code page) defines which bit sequence is assigned to which
character. For example, the ASCII character set is 7 bits long and consists of
128 characters, while an 8-bit code page such as ISO 8859-1 consists of 256 characters.
If you were using the ASCII character set and needed to process characters that it
does not contain, you would have to load a different character set table.
Extra work is therefore involved when users work with different character sets and
their texts have to be processed in parallel. Exchanging data between systems that use
different character sets is also difficult.
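To make the role of the character set table concrete, here is a minimal sketch in Python
(used only for illustration; it is not ABAP). It assumes three common 8-bit code pages,
which the course text does not name, and decodes the same byte value with each of them.

```python
import unicodedata

# One and the same byte value ...
raw = bytes([0xE4])

# ... looked up in three different 8-bit character set tables (assumed examples).
decoded = {page: raw.decode(page) for page in ("latin-1", "iso8859_7", "cp1251")}

for page, char in decoded.items():
    # Print the official character name so the output stays plain ASCII.
    print(f"{page:>10}: {unicodedata.name(char)}")
```

The byte is identical in all three cases; only the table used to interpret it changes,
which is why data exchange between such systems needs extra care.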
The Unicode character set was defined to solve this problem, and it is large
enough to contain the characters of all current character sets. In the 16-bit encoding
(UTF-16), a single code unit has 65,536 possible values; characters beyond that range
are encoded as two code units. SAP has supported Unicode since SAP
Web Application Server 6.10.
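The figures quoted for the different bit lengths follow from a simple rule: a sequence
of n bits can take 2 to the power of n values. The short Python sketch below checks this;
the helper name code_count is hypothetical and chosen only for this illustration.

```python
def code_count(bit_length: int) -> int:
    """Number of distinct codes that fit into a fixed-length bit sequence."""
    return 2 ** bit_length


for bits in (7, 8, 16):
    print(f"{bits:>2} bits -> {code_count(bits):,} codes")
```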
In Unicode programs, the predefined data types C, N, D, T, and STRING are
character-type. Flat structures that contain only character-type components are
also treated as character-type.
In non-Unicode systems, a character of this type is one byte long.
In Unicode systems, it is as long as a character on the respective platform.
Variables of type X and XSTRING are described as byte-type. Previously, everything was
treated as character-type, but as of SAP Web Application Server 6.10, a
distinction is made between character-type and byte-type arguments.
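The difference in character width can be illustrated with a small Python sketch. It
assumes ISO 8859-1 as a stand-in for a non-Unicode code page and UTF-16 as a stand-in
for the internal representation of a Unicode system; both choices are assumptions made
for the example.

```python
text = "ABAP"

single_byte = text.encode("latin-1")   # assumed non-Unicode code page: 1 byte per character
utf16 = text.encode("utf-16-le")       # assumed Unicode representation: 2 bytes per character

print(len(text))         # number of characters
print(len(single_byte))  # bytes needed in the single-byte code page
print(len(utf16))        # bytes needed in UTF-16
```

The number of characters is the same in both cases; only the number of bytes differs,
which is why character-type and byte-type data have to be kept apart.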
For compatibility, character string statements in their standard form always
expect character-type arguments, which the system then processes
character by character. The corresponding variants of these statements
for byte sequence processing are recognizable by the IN BYTE MODE addition.
With this addition, the statements expect byte-type arguments, which are
processed byte by byte.
The STRLEN function always expects character-type variables and returns their
length in characters. With type C variables, only the occupied length is relevant
and trailing blanks are not counted.
The XSTRLEN function returns the length of byte sequences. It always expects
byte-type variables and returns the current length for type XSTRING and the
defined length in bytes for type X.
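As a rough analogue of the two functions, consider the following Python sketch. The
helper names strlen_c and xstrlen_bytes are hypothetical; a fixed-length type C field is
modeled as a string padded with blanks, and type X and XSTRING values are modeled as
bytes objects.

```python
def strlen_c(field: str) -> int:
    """Occupied length of a fixed-length character field: trailing blanks are not counted."""
    return len(field.rstrip(" "))


def xstrlen_bytes(value: bytes) -> int:
    """Length of a byte sequence in bytes."""
    return len(value)


name = "ABC".ljust(10)                         # models a type C field of length 10
x_field = bytes.fromhex("AABB") + bytes(2)     # models a type X field of defined length 4
x_string = bytes.fromhex("AABB")               # models an XSTRING with current length 2

print(strlen_c(name), xstrlen_bytes(x_field), xstrlen_bytes(x_string))
```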
The image on your screen shows sample code for both of these functions.
Apart from the comparison operators shown on the left of the image, you will
notice six new operators that have been defined for byte-type operands and that are
identified by the prefix BYTE. Their usage is also shown in the sample code on the screen.
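The text does not list the six operators. The Python sketch below assumes they are
BYTE-CO, BYTE-CN, BYTE-CA, BYTE-NA, BYTE-CS, and BYTE-NS, as in the ABAP keyword
documentation, and models their meaning for non-empty operands only. The function names
are hypothetical and the code is an illustration of the semantics, not ABAP.

```python
def byte_co(a: bytes, b: bytes) -> bool:
    """BYTE-CO: a contains only bytes that occur in b."""
    return all(byte in b for byte in a)


def byte_ca(a: bytes, b: bytes) -> bool:
    """BYTE-CA: a contains at least one byte that occurs in b."""
    return any(byte in b for byte in a)


def byte_cs(a: bytes, b: bytes) -> bool:
    """BYTE-CS: a contains the byte sequence b."""
    return b in a


# The three negated operators are the logical opposites.
def byte_cn(a: bytes, b: bytes) -> bool:
    return not byte_co(a, b)


def byte_na(a: bytes, b: bytes) -> bool:
    return not byte_ca(a, b)


def byte_ns(a: bytes, b: bytes) -> bool:
    return not byte_cs(a, b)


data = bytes.fromhex("0A0B0A")
print(byte_co(data, bytes.fromhex("0A0B")))  # only 0A and 0B occur in data
print(byte_cs(data, bytes.fromhex("0B0A")))  # the sequence 0B 0A occurs in data
```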
Depending on the platform, some data types have specific alignment requirements.
For example, a field may have to begin at a memory address that is divisible
by a particular number, such as 4 or 8. Within a structure, the system therefore inserts
alignment bytes at runtime before or after components that have alignment requirements.
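How alignment bytes come about can be sketched with a small layout calculation in
Python. The component names, sizes, and alignment values below are assumptions chosen
for the example; the real values depend on the platform.

```python
def align_up(offset: int, alignment: int) -> int:
    """Smallest multiple of alignment that is greater than or equal to offset."""
    return (offset + alignment - 1) // alignment * alignment


def layout(components):
    """components: list of (name, size_in_bytes, alignment).

    Returns a list of (name, start_offset, gap_before) and the total size."""
    offset = 0
    placed = []
    for name, size, alignment in components:
        start = align_up(offset, alignment)
        placed.append((name, start, start - offset))  # gap = inserted alignment bytes
        offset = start + size
    return placed, offset


# Hypothetical structure: a 1-byte flag, a 4-byte integer, an 8-byte float.
placed, total = layout([("flag", 1, 1), ("count", 4, 4), ("ratio", 8, 8)])
for name, start, gap in placed:
    print(f"{name}: offset {start}, {gap} alignment byte(s) before it")
print("total size:", total)
```

In this example three alignment bytes are inserted between the flag and the integer so
that the integer starts at an address divisible by 4.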
When one structure is converted to another, the system first creates a Unicode
fragment view to check whether the conversion is possible. The view groups together
adjacent components and alignment gaps.
This view can be seen in the classic debugger. A sample is shown on your
screen for reference.
If the fragments of the source and target structures match in type and length
over the length of the shorter structure, the conversion is allowed. Otherwise, an error
occurs in the Unicode check.
If the target structure is longer than the source structure, the character-type
components of the remainder are filled with space characters. All other
components in the remainder are filled with their type-specific initial values.
Alignment gaps are filled with null bytes. Components of other types, such as P, F,
STRING, and XSTRING, are not grouped with neighboring components but are treated individually.
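The idea behind the fragment view and the comparison rule can be sketched in Python.
This is a deliberately simplified model, not the algorithm used by the system: it
assumes that adjacent character-type components (C, N, D, T) and adjacent byte-type
components (X) are each merged into one fragment, it ignores alignment gaps, and it
compares whole fragments only. All names are hypothetical.

```python
CHAR_TYPES = {"C", "N", "D", "T"}   # character-type components
BYTE_TYPES = {"X"}                  # byte-type components


def fragment_view(components):
    """components: list of (abap_type, length). Returns a list of (kind, length)."""
    fragments = []
    for abap_type, length in components:
        if abap_type in CHAR_TYPES:
            kind = "char"
        elif abap_type in BYTE_TYPES:
            kind = "byte"
        else:
            kind = abap_type          # e.g. I, P, F: kept as individual fragments
        mergeable = kind in ("char", "byte")
        if mergeable and fragments and fragments[-1][0] == kind:
            fragments[-1] = (kind, fragments[-1][1] + length)   # extend the last fragment
        else:
            fragments.append((kind, length))
    return fragments


def conversion_allowed(source, target) -> bool:
    """True if the fragments match in type and length over the shorter structure."""
    views = (fragment_view(source), fragment_view(target))
    shorter, longer = sorted(views, key=lambda view: sum(n for _, n in view))
    return longer[:len(shorter)] == shorter


source = [("C", 3), ("N", 5), ("I", 4)]
target = [("C", 8), ("I", 4), ("C", 10)]
print(fragment_view(source))
print(conversion_allowed(source, target))
```

Here the C and N components of the source form one character-type fragment of length 8,
which matches the first fragment of the target, so the simplified check succeeds.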
Continuing with the rules for conversion, we will now look at some rules for
conversion between structures and elementary data objects (single fields).
- If a structure contains only character-type data, it is treated like a type C data
object during conversion.
- If the structure is not completely character-type, the single field must be of type
C and the structure must begin with a character-type fragment that is at least
as long as the single field.
If the target field is a structure, the remaining character-type fragments are filled
with space characters and all other components with their type-specific initial
value.
From our earlier learning in this course, you will remember that for character-type
variables, offset and length are interpreted character by character, whereas for
types X and XSTRING, the values for offset and length are interpreted byte by
byte.
For structures, offset and length accesses are only permitted in Unicode
programs if the structure is flat and the range defined by the offset and length
contains only character-type fields, starting from the beginning of the structure.
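The difference between the two interpretations can be shown with a short Python sketch.
The helper name substring is hypothetical and only loosely models an ABAP
offset/length access of the form field+offset(length); UTF-16 is assumed as the
internal representation of the Unicode text.

```python
def substring(value, offset: int, length: int):
    """Take `length` units starting at `offset`: characters for str, bytes for bytes."""
    return value[offset:offset + length]


text = "UNICODE"
print(substring(text, 2, 3))             # character by character

raw = bytes.fromhex("0A1B2C3D4E")
print(substring(raw, 1, 2).hex())        # byte by byte

# The same three characters addressed at byte level in UTF-16 (assumed encoding):
# every offset and length has to be doubled.
utf16 = text.encode("utf-16-le")
print(substring(utf16, 2 * 2, 3 * 2).decode("utf-16-le"))
```

A program that hard-codes byte offsets for character data would break as soon as the
width of a character changes, which is the reason for the restrictions described above.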
ABAP has supported Unicode since SAP Web Application Server 6.10.
However, you must make sure that your programs do not depend on assumptions about
the internal length of a character.
The ABAP Workbench supports you when you work with existing code, but you may
still have to make certain adjustments. The syntax check has been extended to cover
Unicode compatibility as well.
To execute the relevant syntax checks, you must set the indicator Unicode Checks
Active in the program (or class) attributes. This is the standard setting in Unicode
systems.
If the Unicode indicator is set for a program (or a class), the syntax check and program
are executed in accordance with the rules described in the Unicode online help. (This is
irrespective of whether the system is a Unicode or a non-Unicode system).
If the Unicode indicator is not set, the program can only be executed in a non-Unicode
system. For such programs, Unicode-specific changes of syntax and semantics do not
apply. However, you can use all the language enhancements introduced in connection
with the conversion to Unicode.
The abap/unicode_check parameter controls the execution of Unicode checks during
the syntax check and at ABAP program runtime in a non-Unicode system.
The parameter can take on the following values:
- On: The Unicode checks are performed for each ABAP program. The system
behaves as though the program attribute Unicode Checks Active were set for all
ABAP programs. This option is normally used for preparing a conversion to Unicode.
- Off: Unicode checks are only performed in those programs for which the program
attribute Unicode Checks Active has been set.
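The interplay between the parameter and the program attribute described above amounts
to a simple decision rule, sketched here in Python. The function name and its arguments
are hypothetical; only the values On and Off and the attribute Unicode Checks Active
come from the text.

```python
def unicode_checks_performed(parameter_value: str, attribute_set: bool) -> bool:
    """Decide whether Unicode checks run for a program in a non-Unicode system.

    parameter_value: value of abap/unicode_check ("on" or "off").
    attribute_set:   whether Unicode Checks Active is set for the program.
    """
    if parameter_value.lower() == "on":
        return True            # checks run for every program
    return attribute_set       # otherwise only for programs with the attribute set


for value in ("on", "off"):
    for attribute in (True, False):
        print(value, attribute, "->", unicode_checks_performed(value, attribute))
```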
As of SAP Web Application Server 6.10, you can use the transaction UCCHECK to
check several Repository objects for Unicode compatibility at the same time. The
transaction always checks the active program version. You can also use it to apply the
Unicode Checks Active attribute to several programs (which must be original programs
in the system). However, you should only do this with programs that are actually
Unicode-enabled, otherwise the program terminates when it is executed.
