If 16 bit UCS2 characters as defined in ISO/IEC 10646 [17] are used in an alpha field, the coding can take one of three forms. If the terminal supports UCS2 coding of alpha fields in the UICC, the terminal shall support all three coding schemes for character sets containing 128 characters or less; for character sets containing more than 128 characters, the terminal shall at least support the first coding scheme. If the alpha field record contains GSM default alphabet characters only, then none of these schemes shall be used in that record. Within a record, only one coding scheme, either the GSM default alphabet (see TS 23.038), or one of the three described below, shall be used.
1) If the first byte in the alpha string is '80', then the remaining bytes are 16 bit UCS2 characters, with the More Significant Byte (MSB) of the UCS2 character coded in the lower numbered byte of the alpha field, and the Less Significant Byte (LSB) of the UCS2 character coded in the higher numbered alpha field byte, i.e. Byte 2 of the alpha field contains the More Significant Byte (MSB) of the first UCS2 character, and Byte 3 of the alpha field contains the Less Significant Byte (LSB) of the first UCS2 character (as shown below). Unused bytes shall be set to 'FF', and if the alpha field is an even number of bytes in length, then the last (unusable) byte shall be set to 'FF'.
EXAMPLE 1:
Byte 1 | Byte 2 | Byte 3 | Byte 4 | Byte 5 | Byte 6 | Byte 7 | Byte 8 | Byte 9 |
'80' | Ch1MSB | Ch1LSB | Ch2MSB | Ch2LSB | Ch3MSB | Ch3LSB | 'FF' | 'FF' |
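As an illustration only, a decoder for the '80' form could be sketched in Python as follows. The function name `decode_alpha_80` is hypothetical, and treating the pair 'FF' 'FF' as the start of the unused bytes is an assumption made for this sketch; the text above only states that unused bytes are set to 'FF'.

```python
def decode_alpha_80(field: bytes) -> str:
    """Decode an alpha field coded with the '80' scheme (illustrative sketch)."""
    if not field or field[0] != 0x80:
        raise ValueError("not an '80'-coded alpha field")
    body = field[1:]
    chars = []
    # Walk the body in 2-byte steps: MSB in the lower numbered byte, LSB in the next.
    # A trailing single byte (even-length field) is never read.
    for i in range(0, len(body) - 1, 2):
        code = (body[i] << 8) | body[i + 1]
        if code == 0xFFFF:  # assumption: 'FF' 'FF' marks the start of unused bytes
            break
        chars.append(chr(code))
    return "".join(chars)


# Nine-byte field laid out as in EXAMPLE 1, with three arbitrary characters.
field = bytes.fromhex("80099509960041FFFF")
print(decode_alpha_80(field) == "\u0995\u0996A")  # True
```

In an even-length field the final byte cannot hold a complete character, so the loop above stops before it regardless of its value.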
2) If the first byte of the alpha string is set to '81', then the second byte contains a value indicating the number of characters in the string, and the third byte contains an 8 bit number which defines bits 15 to 8 of a 16 bit base pointer, where bit 16 is set to zero, and bits 7 to 1 are also set to zero. These sixteen bits constitute a base pointer to a "half-page" in the UCS2 code space, to be used with some or all of the remaining bytes in the string. The fourth and subsequent bytes in the string contain codings as follows: if bit 8 of the byte is set to zero, the remaining 7 bits of the byte contain a GSM Default Alphabet character, whereas if bit 8 of the byte is set to one, then the remaining seven bits are an offset value added to the 16 bit base pointer defined earlier, and the resultant 16 bit value is a UCS2 code point, and completely defines a UCS2 character.
EXAMPLE 2:
Byte 1 | Byte 2 | Byte 3 | Byte 4 | Byte 5 | Byte 6 | Byte 7 | Byte 8 | Byte 9 |
'81' | '05' | '13' | '53' | '95' | 'A6' | 'XX' | 'FF' | 'FF' |
In the above example:
- Byte 2 indicates that there are 5 characters in the string.
- Byte 3 indicates bits 15 to 8 of the base pointer, and indicates a bit pattern of 0hhh hhhh h000 0000 as the 16 bit base pointer number. Bengali characters for example start at code position 0980 (0000 1001 1000 0000), which is indicated by the coding '13' in Byte 3 (bits 15 to 8 of the pointer, i.e. 0001 0011).
- Byte 4 indicates GSM Default Alphabet character '53', i.e. 'S'.
- Byte 5 indicates a UCS2 character offset to the base pointer of '15', expressed in binary as follows 001 0101, which, when added to the base pointer value results in a sixteen bit value of 0000 1001 1001 0101, i.e. '0995', which is the Bengali letter KA.
- Byte 8 contains the value 'FF', but as the string length is 5, this is a valid character in the string, where the bit pattern 111 1111 is added to the base pointer, yielding a sixteen bit value of 0000 1001 1111 1111 for the UCS2 character (i.e. '09FF').
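The base pointer arithmetic of the '81' form can be checked with a short Python sketch. The function name `decode_alpha_81_codes` and the (kind, value) output are hypothetical choices for illustration; the GSM Default Alphabet lookup from TS 23.038 is deliberately left out, so GSM characters are returned as raw 7 bit codes. The value '28' used for Byte 7 is an arbitrary stand-in for the unspecified 'XX'.

```python
def decode_alpha_81_codes(field: bytes):
    """Return (kind, value) pairs for an '81'-coded alpha field (illustrative sketch).

    kind is "gsm" for a 7 bit GSM Default Alphabet code, or "ucs2" for a
    UCS2 code point obtained by adding the 7 bit offset to the base pointer.
    """
    if len(field) < 3 or field[0] != 0x81:
        raise ValueError("not an '81'-coded alpha field")
    count = field[1]
    # Byte 3 supplies bits 15 to 8; bit 16 and bits 7 to 1 are zero.
    base = field[2] << 7
    payload = field[3:3 + count]
    if len(payload) < count:
        raise ValueError("field is shorter than its declared character count")
    codes = []
    for b in payload:
        if b & 0x80:
            codes.append(("ucs2", base + (b & 0x7F)))
        else:
            codes.append(("gsm", b))
    return codes


# EXAMPLE 2 with the unspecified Byte 7 ('XX') replaced by '28' (assumption).
field = bytes.fromhex("8105135395A628FFFF")
for kind, value in decode_alpha_81_codes(field):
    print(kind, format(value, "04X"))
```

Because the character count in Byte 2 is 5, the 'FF' in Byte 8 is decoded as offset '7F' from the base pointer, while the 'FF' in Byte 9 lies outside the string and is ignored.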
3) If the first byte of the alpha string is set to '82', then the second byte contains a value indicating the number of characters in the string, and the third and fourth bytes contain a 16 bit number which defines the complete 16 bit base pointer to a "half-page" in the UCS2 code space, for use with some or all of the remaining bytes in the string. The fifth and subsequent bytes in the string contain codings as follows: if bit 8 of the byte is set to zero, the remaining 7 bits of the byte contain a GSM Default Alphabet character, whereas if bit 8 of the byte is set to one, the remaining seven bits are an offset value added to the base pointer defined in bytes three and four, and the resultant 16 bit value is a UCS2 code point, and defines a UCS2 character.
EXAMPLE 3:
Byte 1 | Byte 2 | Byte 3 | Byte 4 | Byte 5 | Byte 6 | Byte 7 | Byte 8 | Byte 9 |
'82' | '05' | '05' | '30' | '2D' | '82' | 'D3' | '2D' | '31' |
In the above example:
- Byte 2 indicates that there are 5 characters in the string.
- Bytes 3 and 4 contain a sixteen bit base pointer number of '0530', pointing to the first character of the Armenian character set.
- Byte 5 contains a GSM Default Alphabet character of '2D', which is a dash "-".
- Byte 6 contains a value '82', which indicates it is an offset of '02' added to the base pointer, resulting in a UCS2 character code of '0532', which represents Armenian character Capital Ben.
- Byte 7 contains a value 'D3', an offset of '53', which when added to the base pointer results in a UCS2 code point of '0583', representing Armenian Character small Piwr.
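For illustration, the '82' form can be decoded to text with a Python sketch along these lines. The names `GSM7` and `decode_alpha_82` are hypothetical. The table is a transcription of the basic GSM Default Alphabet and should be checked against TS 23.038 before any real use; extension-table handling is omitted, so code '1B' is passed through as a plain escape character (an assumption of this sketch).

```python
# Basic GSM Default Alphabet, indexed by 7 bit code (transcribed for this sketch).
GSM7 = (
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNO"
    "PQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmno"
    "pqrstuvwxyzäöñüà"
)


def decode_alpha_82(field: bytes) -> str:
    """Decode an alpha field coded with the '82' scheme (illustrative sketch)."""
    if len(field) < 4 or field[0] != 0x82:
        raise ValueError("not an '82'-coded alpha field")
    count = field[1]
    # Bytes 3 and 4 hold the complete 16 bit base pointer, MSB first.
    base = (field[2] << 8) | field[3]
    payload = field[4:4 + count]
    if len(payload) < count:
        raise ValueError("field is shorter than its declared character count")
    chars = []
    for b in payload:
        if b & 0x80:
            chars.append(chr(base + (b & 0x7F)))  # offset into the half-page
        else:
            chars.append(GSM7[b])                 # GSM Default Alphabet character
    return "".join(chars)


# The nine bytes of EXAMPLE 3.
field = bytes.fromhex("820505302D82D32D31")
print(decode_alpha_82(field) == "-\u0532\u0583-1")  # True
```

Unlike the '81' form, the base pointer here is taken from Bytes 3 and 4 without shifting, so it need not be aligned to a 128 character boundary.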