[Page 5, Table 1 and 2, Hex Code (col F and row C), Dec. Code (col 240
and row 12)] - Insert ‘₹’.
(Page 6, Table 3, row ‘FC 252’, col ‘Name’) - Substitute ‘Indian Rupee
Symbol’ for ‘This position shall not be used’.
The Indian Rupee Symbol shall be placed on AltGr+4 key for Inscript keyboard
layout as well as for keyboards of QWERTY form.
(Page 12, Annex A) - Insert ‘₹’ below the last row of the table under
each column except for column ‘RMN’.
[Page 13, Annex B, Table Hex Code (col F and row C), Dec. Code (col
240 and row 12)] - Insert ‘₹’.
(LITD 20)
Indian Standard
INDIAN SCRIPT CODE FOR INFORMATION
INTERCHANGE - ISCII
(First Reprint JANUARY 1993)
UDC 681.3
© BIS 1991
1. SCOPE
2. TERMINOLOGY
4.2 Anuswar
4.4 Visarg
4.7 Conjuncts
4.9 Punctuation
4.11 Numerals
6.3 Halant
6.8 Numerals
7.1 Phonetic Sequence
7.5 Transliteration
9. REFERENCES
ANNEXES
B PC-ISCII CODE
D INSCRIPT KEYBOARD
E ATTRIBUTE CODES
H ISCII IN TELEX/TELEPRINTERS
IS 13194:1991
Computer Media Sectional Committee, LTD 37
FOREWORD
This Indian Standard was adopted by the Bureau of Indian Standards after the draft
finalized by the Computer Media Sectional Committee had been approved by the Electronics
and Telecommunication Division Council.
This standard conforms to IS 10401:1982, “8-bit coded character set for information
interchange” (equivalent to ISO 4873). It is intended for use in all computer and
communication media which allow usage of 7 or 8-bit characters, as per IS 12326:1987 (ISO
2022:1982) “7-bit and 8-bit coded character set - code extension techniques”.
In an 8-bit environment, the lower 128 characters are the same as defined in IS 10315:1982
(ISO 646 IRV) “7-bit coded character set for information interchange” also known as
ASCII character set. The top 128 characters cater to all the 10 Indian scripts based on the
ancient Brahmi script.
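The split of the 8-bit code into a lower ASCII half and an upper Indian script half can be illustrated with a minimal Python sketch. The function names `classify_byte` and `split_runs` are hypothetical and are not part of the standard; the only fact taken from the text is that positions 0 to 127 are ASCII and positions 128 to 255 belong to the Indian script set.

```python
def classify_byte(code: int) -> str:
    """Return 'ascii' for the lower 128 positions, 'indian' for the upper 128."""
    if not 0 <= code <= 0xFF:
        raise ValueError("an 8-bit code must lie in the range 0..255")
    return "ascii" if code < 0x80 else "indian"


def split_runs(data: bytes) -> list[tuple[str, bytes]]:
    """Group consecutive codes that belong to the same half of the code table."""
    runs: list[tuple[str, bytes]] = []
    for code in data:
        kind = classify_byte(code)
        if runs and runs[-1][0] == kind:
            # Extend the current run with one more code of the same kind.
            runs[-1] = (kind, runs[-1][1] + bytes([code]))
        else:
            runs.append((kind, bytes([code])))
    return runs
```

A text mixing English and an Indian script would come back from `split_runs` as alternating `ascii` and `indian` runs, which is the property that lets both alphabets be used simultaneously.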
In a 7-bit environment the control code SI can be used for invocation of the ISCII code set,
and control code SO can be used for reselection of the ASCII code set.
There are 15 officially recognized languages in India: Hindi, Marathi, Sanskrit, Punjabi,
Gujarati, Oriya, Bengali, Assamese, Telugu, Kannada, Malayalam, Tamil, Urdu, Sindhi
and Kashmiri.
Out of these, Urdu, Sindhi and Kashmiri are primarily written in Perso-Arabic scripts, but
get written in Devanagari too (Sindhi is also written in the Gujarati script). Apart from
Perso-Arabic scripts, all the other 10 scripts used for Indian languages have evolved from
the ancient Brahmi script and have a common phonetic structure, making a common char-
acter set possible. The Northern scripts are Devanagari, Punjabi, Gujarati, Oriya, Bengali
and Assamese, while the Southern scripts are Telugu, Kannada, Malayalam and Tamil.
The official language of India, Hindi, is written in the Devanagari script. Devanagari is also
used for writing Marathi and Sanskrit. It is also the official script of Nepal.
As Perso-Arabic scripts have a different alphabet, a different standard is envisaged for
them.
An Attribute mechanism has been provided for selection of different Indian script font and
display attributes. An Extension mechanism allows use of more characters along with the
ISCII code. These are only meant for the environment where no other alternative selection
mechanism is available.
The ISCII code table is a super-set of all the characters required in the ten Brahmi-based
Indian scripts. For convenience, the alphabet of the official script Devanagari (with
diacritic marks for non-Devanagari alphabets) has been used in the standard. For notational
simplicity, elsewhere, the term Indian scripts implies Brahmi-based Indian scripts.
Annex-A provides information on the shapes of the corresponding alphabet of the 10
Indian scripts. Annexes B and C provide information on the adaptation of the ISCII code
for an IBM-PC and “English-Alphabet only” environment. Annex-D defines a suitable
keyboard overlay which is common for all the Indian scripts. Annex-E defines the Attribute
codes used for selection of different scripts and display attributes. Annex-F defines the
Roman script transliteration scheme for all the Indian scripts. Annex-G defines the Vedic
character set available through the Extension code. Annex-H defines the conversion
mechanism between the ISCII code and the earlier ISSCII-83 code used in bilingual telex
machines.
History
Since the 70s, different committees of the Department of Official Languages and the
Department of Electronics (DOE) have been evolving different codes and keyboards
which could cater to all the Indian scripts due to their common phonetic structure. Earlier
efforts could not keep the ASCII code intact.
In July 1983, DOE announced the ISSCII-83 code which complied with the ISO 8-bit
code recommendations (“Report of the sub-committee on Standardization of Indian
Scripts and their codes for Information Processing”, DOE, July 1983 ). While retaining
the ASCII character set in the lower half, it provided the Indian script character set in the
upper 96 characters. This also had the recommendation on a Phonographic based
keyboard layout for all the Indian scripts.
A keyboard standard for Indian scripts was brought out by DOE in 1986 (Report of the
committee for “Standardization of Keyboard Layout for Indian Script Based Comput-
ers” in Electronics-Information & Planning, Vol. 14, No. 1, Oct. 1986 ). The report also
contained the recommendation for the corresponding 8-bit ISCII code.
There was a revision of the ISCII code by DOE in 1988 for making it more compact, in
order to evolve its corresponding IBM-PC counterpart: PC-ISCII (Report of the
sub-committee on “Standardization of Indian Script codes for Information Interchange”,
DOE, August 1988).
Indian Standard
INDIAN SCRIPT CODE FOR INFORMATION
INTERCHANGE - ISCII
1. SCOPE
The ISCII code standard specifies a 7-bit code table which can be used in 7 or 8-bit ISO
compatible environment. It allows English and Indian script alphabets to be used
simultaneously.
It shall not be used in incompatible environments like that of IBM-PC, and with computers
which do not allow 8-bit characters, or which do not follow ISO code extension techniques.
It cannot be used in the 5-bit Baudot code used for telecommunications. However transcoding
to Baudot is possible as given in Annex-H.
2. TERMINOLOGY
2.1.1 Letter: A character representing one or more of the simple or compound sounds used in
speech. It can be any of the alphabetic symbols.
2.1.2 Conjunct (Ligature): A letter which is a combination of two or more basic letters. The
shape of the conjunct may, or may not, give clue to the constituting letters. Example: the
joint form (digraph) of “es”.
2.1.3 Diacritic mark: A mark added to a letter which distinguishes it from the same letter
without a mark, usually having a different phonetic value or stress.
2.1.4 International numerals: The conventional 0 to 9 digits used in English for denoting
numbers. These are also known as Indo-Arabic numerals (to differentiate them from the Roman
numerals like IX for 9).
2.1.5 Script numerals: The 0 to 9 digits in a script, which have shapes distinct from their
international counterparts.
2.1.6 Vowel: A letter representing a speech sound made with the vibration of the vocal
cords, but without audible obstruction. English examples: a, e, i, o, u.
2.1.7 Vowel sign: A graphic character associated with a letter, to indicate a vowel to be
associated with that character (Matra in Hindi).
2.1.8 Diphthong: A compound vowel character, in which the articulation begins as for one
vowel and moves onto another. Example: as in “coin”, “loud” and “side”.
2.1.9 Consonant: A letter representing a speech sound in which the breath is at least partly
obstructed, and which has to be combined with a vowel to form a syllable.
2.1.10 Pure consonant: A consonant which does not have any vowel implicitly associated with
it. Example: all the English consonants.
2.1.11 Nasal consonant: A consonant pronounced with the breath passing through the nose.
Example: m, n, ng.
2.1.12 Nasalized vowel: A vowel pronounced with the breath passing both through the nose and
the mouth. Example: French “bon voyage”. In Indian scripts this is denoted by a Chandrabindu
diacritic mark.
2.1.14 Syllable: A unit of pronunciation uttered without interruption, forming whole or part
of a word, and usually having one vowel or diphthong sound optionally surrounded by one or
more consonants. Example: there are two syllables in “water” and three in “inferno”.
2.1.15 Alphabet: A set of letters used in writing a language. Example: the English alphabet
consists of upper and lower-case letters A to Z.
2.1.16 Basic alphabet: The minimal set of letters which can be used for uniquely encoding
every word of a language. Example: the basic alphabet for English consists of only the
upper-case letters A to Z.
2.1.17 Phonetic alphabet: An alphabet which has direct correspondence between letters and
sounds. Example: the Indian scripts.
2.1.18 Latin alphabet: The alphabet used for writing the language of ancient Rome. Also known
as the Roman alphabet. Used today for writing English and some other European languages.
2.1.19 Script: A distinctive and complete set of characters used for the written form of one
or more languages.
2.1.20 Roman script: The script based on the ancient Roman alphabet, with the letters A-Z and
additional diacritic marks. Used for writing a language which is not usually written in the
Roman alphabet.
2.1.21 Romanization: Representation of words of a script using the Roman alphabet, possibly
through additions of diacritic marks. Example: Romaji is the romanized form of the Japanese
script.
2.1.22 Transliteration: Representation of words with the closest corresponding letters in an
alphabet of a different language.
2.2 Font/Display Terminology
2.2.1 Font: A set of symbols used for display or printing of a script in a particular style.
2.2.2 Display rendition: The process by which a string of characters is displayed (or
printed). In this process several consecutive characters may combine with each other on the
screen. The sequence of display of the characters may become different.
2.2.3 Display composing: The process of organizing the basic shapes available in a font in
order to display (or print) a word.
2.3 Character/Coding Terminology
2.3.1 Bit: Binary digit. It can have only two values: 0 and 1.
2.3.2 Byte: A bit string that is operated upon as a unit. It usually represents a character
and usually consists of eight bits.
2.3.3 Hex digit: Hexadecimal digit, where each digit has 16 values. The values above 9 are
denoted by the letters A to F as shown: A(10), B(11), C(12), D(13), E(14), F(15). Four bits
are needed to encode a hex digit.
2.3.4 Character: A symbol which can represent a letter, a numeral, a punctuation mark, a
special symbol or even a control function.
2.3.5 Control character (control code): A character which normally has no visual form, but
affects the recording, processing, transmission or interpretation of data.
2.3.6 Graphic character: A character, other than a control character, that has a visual
representation. Normally handwritten, printed or displayed.
2.3.7 5-bit characters (5-bit codes): Characters, whose code has 5 bits, allowing
representation of 32 characters.
2.3.8 7-bit characters (7-bit codes): Characters, whose code has 7 bits, allowing
representation of 128 characters.
2.3.9 8-bit characters (8-bit codes): Characters, whose code has 8 bits, allowing
representation of 256 characters.
2.3.10 Character set: A set of characters grouped together for a purpose, like that of
representing a script.
2.3.11 Code table: A table showing the positions allotted to individual characters from a
character set.
2.3.12 Character code: Position in the code table of the character.
2.3.13 Code extension: The techniques for encoding of characters that are not included in the
character set of a given code.
2.3.14 Extended character set: Characters which are not present in the main character set,
but are available through some code extension techniques.
2.3.15 ASCII code: American Standard Code for Information Interchange. A 7-bit code which
specifies 32 control characters and 96 graphic characters, for English language.
2.3.16 Transcoding: A set of tables and rules by which a code-table can be transformed to
another code-table, such that the characters get mapped to their equivalent forms.
2.4 Other Terminology
2.4.1 Direct sorting: Sorting of words done through direct comparison of the corresponding
character codes. No special heuristics or rules are used.
2.4.2 Dictionary sorting order: Order in which the letters should be organized within an
alphabet, such that words can get ordered according to the language dictionaries. Special
rules may have to be applied in addition to direct sorting to achieve this. Example: in
English, upper and lower cases have to be transformed to a single case before direct sorting
is applied.
2.4.3 Default: A value or state which is assumed when no value or state is explicitly stated.
2.4.4 Keyboard overlay: Defines the characters for each key position (unshifted, shifted
etc.), which are meant to replace the standard English characters on a QWERTY keyboard.
3. ISCII CODE PHILOSOPHY
A code for all the Indian scripts is made possible by their common origin from the Brahmi
script. An optimal keyboard overlay for all the Indian scripts, is made possible by the
phonetic nature of the alphabet.
There are manifold advantages in having a common code and keyboard for all the Indian
scripts. Any software which allows ISCII codes to be used, can be used in any Indian script,
enhancing its commercial viability. Furthermore, immediate transliteration between different
Indian scripts becomes possible, just by changing the display modes. Simultaneous
availability of multiple Indian languages in the computer medium will accelerate their
development and facilitate national integration.
The 8-bit ISCII code retains the standard ASCII code, while the Indian script keyboard
overlay is designed for the standard English QWERTY overlay. This ensures that English can
co-exist with the Indian scripts. This approach also makes it feasible to use Indian scripts
along with existing English computers and software, so long as 8-bit character codes are
allowed.
[Table: consonants arranged in Varg 1 to Varg 5 and non-Varg rows; glyphs not recoverable from this copy. The surviving transliteration row is Varg 5: p ph b bh m.]
Note that the consonants श (ś) and ष (ṣ) are pronounced identically today.
4.3 Nasalization Sign: Chandrabindu ँ
The ँ denotes nasalization of the preceding vowel (can be implicit अ vowel within a
consonant). [Example glyphs not recoverable.]
In Devanagari script it often gets substituted with Anuswar, as the latter is more convenient
for writing. In some words, however, Anuswar and Chandrabindu can give different meanings.
Hindi example: हँस (Laugh), हंस (Swan).
The original pronunciation of the vowel ऋ (ṛ) is now lost; it gets pronounced mostly as “ri”
or “ru”.
The vowels ऎ and ऒ are used in Southern scripts for denoting vowels shorter than ए and ओ
respectively.
The vowels ऐ (ai) and औ (au) are actually diphthongs, although in Hindi they also get
pronounced as longer vowel forms of ए and ओ respectively.
Vowels ऍ and ऑ are used in modern Devanagari for representing the English vowel sounds as in
“bat” and “ball” respectively.
Sanskrit infrequently uses three other vowels, which are obsolete today in other Indian
scripts. These are:
Vowels: ॠ ऌ ॡ
Matras: ॄ ॢ ॣ
4.6 Vowel Omission Sign: Halant ्
In Indian scripts consonants are assumed to have an implicit vowel अ “a” within them unless
an explicit Matra (vowel-sign) is attached. Thus a special sign Halant (्) is needed for
indicating that the consonant does not have the implicit अ vowel in it.
In Northern languages, the Halant at the end of a word generally gets dropped, though the
ending still gets pronounced without a vowel. Example: Ashok = अशोक् => अशोक.
This doesn’t happen in Southern languages and Sanskrit, where a Halant is always used to
indicate a vowel-less ending. Example: param = परम् (Sanskrit word).
4.7 Conjuncts
Indian scripts contain numerous conjuncts, which essentially are clusters of upto four
consonants without the intervening implicit vowels. The shape of these conjuncts can differ
from those of the constituting consonants. These conjuncts are formed in the ISCII code by
putting the Halant (्) character, between the constituent consonants.
[Example glyphs not recoverable.]
4.8 Diacritic Mark: Nukta ़
The Nukta is used for ड़ and ढ़ characters, in some Northern scripts. It is also used for
deriving 5 other consonants in the Devanagari and Punjabi scripts, required for Urdu.
4.9 Punctuation
All punctuation marks used in Indian scripts are borrowed from English, except for the
full-stop, instead of which a Viram (।) is used in the Northern scripts. The Viram is,
however, being increasingly substituted by a full-stop. A double Viram (॥) is also used in
Sanskrit texts for indicating a verse ending.
4.10 Other Signs
4.10.1 Avagrah ऽ is primarily used in Sanskrit texts. It creates an extra stress on the
preceding vowel. Two Avagrahs can be used for creating further extra stress. Avagrah is not
used in modern Indian scripts.
4.10.2 Om ॐ is a Hindu religious symbol.
4.11 Numerals
Many Indian scripts today use only the international numerals. Even in others, the usage of
international numerals instead of the original forms is increasing. Although the Devanagari
script has its own numerals, the official numeral system is the international one.
5. LAYOUT OF ISCII CODE TABLE
The 8-bit Code for Latin and Indian script alphabets is given in Table-1. It consists of 256
positions, arranged in 16 rows and 16 columns. The rows are numbered in decimal as 0 to 15,
and in hex as 0 to F. The columns are numbered in decimal as 0 to 240 in increments of 16,
and in hex as 0 to F. The lower 128 characters of this table contain the ASCII character set.
The 7-bit Code for Indian script alphabets is given in Table-2. It is meant for an ISO
compatible 7/8 bit environment. It consists of 94 positions, arranged in 8 columns and 16
rows.
A position in the Code table is identified in decimal as well as hex notation. A character
located at decimal column x and row y will have its decimal position as x+y. A character
located at hex column x and row y, will have its hex position as xy.
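The position arithmetic for the code table (decimal position x+y, hex position xy) can be checked with a short Python sketch. The function name `position` is hypothetical; the column and row numbering follows the layout description of the 16 by 16 table, with decimal columns running from 0 to 240 in steps of 16.

```python
def position(column: int, row: int) -> tuple[int, str]:
    """Return (decimal position, hex position) for a decimal column and row.

    column: decimal column label, one of 0, 16, 32, ..., 240
    row:    decimal row label, 0..15
    """
    if column % 16 != 0 or not 0 <= column <= 240:
        raise ValueError("column must be a multiple of 16 between 0 and 240")
    if not 0 <= row <= 15:
        raise ValueError("row must lie between 0 and 15")
    decimal = column + row                       # decimal position is x + y
    hex_pos = f"{column // 16:X}{row:X}"         # hex position is the digit pair xy
    return decimal, hex_pos
```

For instance, decimal column 240 and row 12 give decimal position 252 and hex position FC, the same pair that the amendment quotes as row ‘FC 252’.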
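Table-1 (lower half, hex columns 0 to 7: the ASCII character set)
Hex  Dec.  0    1    2    3    4    5    6    7
0    0     NUL  DLE  SP   0    @    P    `    p
1    1     SOH  DC1  !    1    A    Q    a    q
2    2     STX  DC2  "    2    B    R    b    r
3    3     ETX  DC3  #    3    C    S    c    s
4    4     EOT  DC4  $    4    D    T    d    t
5    5     ENQ  NAK  %    5    E    U    e    u
6    6     ACK  SYN  &    6    F    V    f    v
7    7     BEL  ETB  '    7    G    W    g    w
8    8     BS   CAN  (    8    H    X    h    x
9    9     HT   EM   )    9    I    Y    i    y
A    10    LF   SUB  *    :    J    Z    j    z
B    11    VT   ESC  +    ;    K    [    k    {
C    12    FF   FS   ,    <    L    \    l    |
D    13    CR   GS   -    =    M    ]    m    }
E    14    SO   RS   .    >    N    ^    n    ~
F    15    SI   US   /    ?    O    _    o    DEL
[The Indian script columns of the table are not recoverable from this copy, apart from hex column F (decimal 240) below.]
Hex  Dec.  Column F (Dec. 240)
0    0     EXT
1    1     ०
2    2     १
3    3     २
4    4     ३
5    5     ४
6    6     ५
7    7     ६
8    8     ७
9    9     ८
A    10    ९
B    11
C    12
D    13
E    14
F    15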
Table 3 (columns: Position Hex, Position Dec., Char, Name) [rows not recoverable from this copy]
Note: 1. The positions EB-EE and FB-FE are reserved for future expansion of the code.
2. Scripts corresponding to other Indian languages are given in Annex-A.
6.3 Halant ्
The implicit vowel in a consonant can be removed by addition of a Halant sign (्). In the
ISCII code conjuncts are formed by typing a Halant character between consonants. A conjunct
may consist of upto 4 consonants joined by Halants. [Example glyphs not recoverable.]
In practice, a Halant sign is shown only if the consonants do not change their shape by
joining up. Tamil script has no conjuncts, and thus an explicit Halant sign always gets used.
Here are some Devanagari examples where Halant does not disappear:
[Example glyphs not recoverable.]
6.3.1 Explicit Halant
A Halant is used between consonants to form conjuncts. But many times in Sanskrit and Vedic
texts, one may wish to show an Explicit Halant which would be shown on the previous
consonant, and which would prevent the consonant from joining with the next one. Two
consecutive Halants form an Explicit Halant. [Example glyphs not recoverable.]
6.5 The Nukta Character ़
The Nukta consonants (क़ ख़ ग़ ज़ ड़ ढ़ फ़) get formed by adding a Nukta (़) character
immediately after the appropriate consonant.
Table 4: ISCII characters derived by appending a Nukta
Char  Nukta Char  Name
क     क़          Consonant QA (Urdu)
ख     ख़          Consonant KHHA (Urdu)
ग     ग़          Consonant GHHA (Urdu)
ज     ज़          Consonant ZA (Urdu)
ड     ड़          Consonant Flapped DA
ढ     ढ़          Consonant Flapped DHA
फ     फ़          Consonant FA (Urdu)
ऋ     ॠ          Vowel RII (Sanskrit)
ृ      ॄ           Vowel Sign RII (Sanskrit)
इ     ऌ          Vowel LI (Sanskrit)
ि      ॢ           Vowel Sign LI (Sanskrit)
ई     ॡ          Vowel LII (Sanskrit)
ी      ॣ           Vowel Sign LII (Sanskrit)
ँ      ॐ          Sign OM
।     ऽ          Vowel Stress Sign AVAGRAH (Sanskrit)
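The Halant and Nukta rules above operate on plain code sequences, which a minimal Python sketch can make concrete. The constants `HALANT = 0xE8` and `NUKTA = 0xE9` are assumptions taken from the published ISCII code table (the table rows are not legible in this copy), and the function names are hypothetical. Consonant codes are treated as opaque integers.

```python
HALANT = 0xE8  # assumed ISCII position of the Halant character
NUKTA = 0xE9   # assumed ISCII position of the Nukta character


def make_conjunct(consonants: list[int]) -> bytes:
    """Join 2 to 4 consonant codes with a Halant between each pair."""
    if not 2 <= len(consonants) <= 4:
        raise ValueError("a conjunct consists of 2 to 4 consonants")
    out = bytearray()
    for index, code in enumerate(consonants):
        if index:
            out.append(HALANT)  # Halant goes between constituent consonants
        out.append(code)
    return bytes(out)


def explicit_halant_positions(data: bytes) -> list[int]:
    """Return the index of the first Halant of every consecutive Halant pair."""
    return [i for i in range(len(data) - 1)
            if data[i] == HALANT and data[i + 1] == HALANT]


def nukta_bases(data: bytes) -> list[int]:
    """Return the code of every character immediately followed by a Nukta."""
    return [data[i - 1] for i in range(1, len(data)) if data[i] == NUKTA]
```

A renderer would look up each code returned by `nukta_bases` in Table 4 to pick the derived shape, and would keep the Halant visible wherever `explicit_halant_positions` reports a pair.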
6.6 Attribute Code (ATR)
The Attribute code, followed by a displayable ASCII character, defines a font attribute
applicable for the following characters. This mechanism is meant for use in that medium where
alternative font selection mechanism is not available. The details are given in the Annex-E.
7.2 Direct Sorting
Since there are variations in ordering of a few consonants between different Indian scripts,
it is not possible to achieve perfect sorting in all Indian scripts. Special routines would
be required when some characters like “Nukta” need to be ignored for the purpose of sorting.
For most purposes, however, the direct sorting achieved through the ISCII code should be
sufficient.
Vowel combinations and consonant combinations would get ordered as shown below.
[Table of orderings not recoverable from this copy.]
The spelling of a word contains all the information necessary for display composition, which
can be automatically done through display algorithms. It becomes possible to type in a text,
without even looking at the display. When the tedium of composing goes away, on-line
authoring becomes possible, where an author can think out new text while he is typing it.
Unique spellings are essential for making spelling checkers and dictionaries. They are also
essential to facilitate finding of words in a word-processor, or for information retrieval
from a data-base.
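Direct sorting, and the kind of special routine mentioned for ignoring the Nukta, can be sketched in Python as follows. The value `NUKTA = 0xE9` is an assumption from the published ISCII code table, and both function names are hypothetical. Python compares `bytes` objects code by code, which matches the definition of direct sorting.

```python
NUKTA = 0xE9  # assumed ISCII position of the Nukta character


def direct_sort(words: list[bytes]) -> list[bytes]:
    """Sort words by direct comparison of their character codes."""
    return sorted(words)


def sort_ignoring_nukta(words: list[bytes]) -> list[bytes]:
    """Sort words after dropping every Nukta code from the comparison key."""
    nukta = bytes([NUKTA])
    return sorted(words, key=lambda word: word.replace(nukta, b""))
```

The two orders differ only when a Nukta falls at a position that decides the comparison; the stored words themselves are never modified, only the comparison key.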
In ISCII code some logically related sub-sets can be identified through simple range
comparisons. Using these it is possible to predict a syllable boundary for an Indian script
word. This may be necessary for composing fonts for display purposes, or for hyphenation at
a syllable boundary.
Vowel modifiers (D) ँ ं ः
Halant (H) ्
Nukta (N) ़
• Nukta (N) can come after only the consonants with which it can combine.
• The above syntax ignores the vowels derived through Nukta (ॠ, ऌ and ॡ) and the Avagrah
sign ऽ.
9. REFERENCES
IS 10315, 7-bit coded character set for information interchange, which is equivalent to
ISO 646.
IS 12326 (1987), 7-bit and 8-bit coded character sets - Code extension techniques, which is
equivalent to ISO 2022.
IS 10401 (1982), 8-bit code for information interchange - Structure and rules for
implementation, which is equivalent to ISO 4873.
ISO 2375, Procedure for registration of escape sequences.
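The range comparisons mentioned for identifying sub-sets and predicting syllable boundaries might look like the following Python sketch. All code ranges here are assumptions taken from the published ISCII code table, since the table rows are not legible in this copy. The class letters D, H and N follow the text; V, C and M are hypothetical labels for vowels, consonants and vowel signs. The boundary rule is a deliberate simplification, not the standard's full syllable syntax.

```python
HALANT = 0xE8  # assumed position of the Halant (class H)
NUKTA = 0xE9   # assumed position of the Nukta (class N)


def iscii_class(code: int) -> str:
    """Classify one 8-bit code through simple range comparisons."""
    if code < 0x80:
        return "ascii"
    if 0xA1 <= code <= 0xA3:
        return "D"      # vowel modifiers
    if 0xA4 <= code <= 0xB2:
        return "V"      # vowels (assumed range)
    if 0xB3 <= code <= 0xD8:
        return "C"      # consonants (assumed range)
    if 0xDA <= code <= 0xE7:
        return "M"      # vowel signs / Matras (assumed range)
    if code == HALANT:
        return "H"
    if code == NUKTA:
        return "N"
    if 0xF1 <= code <= 0xFA:
        return "digit"  # script numerals
    return "other"


def syllable_starts(data: bytes) -> list[int]:
    """Simplified rule: a vowel, or a consonant not joined by a preceding Halant."""
    starts = []
    for i, code in enumerate(data):
        kind = iscii_class(code)
        joined = i > 0 and data[i - 1] == HALANT
        if kind == "V" or (kind == "C" and not joined):
            starts.append(i)
    return starts
```

A hyphenation routine could break a word only at the indices returned by `syllable_starts`; handling of the Explicit Halant and of the Nukta-derived vowels would need the full syntax.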
ANNEX - A
INDIAN SCRIPT ALPHABET CORRESPONDENCE
Following mnemonics are used for Indian scripts:
DEV: Devanagari  PNJ: Punjabi  GJR: Gujarati
ORI: Oriya  BNG: Bengali  ASM: Assamese
TLG: Telugu  KND: Kannada  MLM: Malayalam
TML: Tamil  RMN: Roman
Roman script transliteration scheme is explained in Annex F.
[Table: alphabet correspondence across the DEV, PNJ, GJR, ORI, BNG, ASM, TLG, KND, MLM, TML and RMN columns; glyph shapes and Roman diacritics not recoverable from this copy.]
ANNEX-B
PC-ISCII CODE
[Table: PC-ISCII code positions; character entries not recoverable from this copy.]
The PC-ISCII code is the version of ISCII code defined by Centre for Development of Advanced
Computing (CDAC), Pune, for compatibility with IBM-PC 8-bit character set. IBM-PC does not
follow the ISO 8-bit code recommendation. It uses line-drawing character set located between
B0 hex and DF hex. Since these line-drawing characters have to co-exist along with ASCII and
Indian scripts, the PC-ISCII code is designed to avoid clash with them. This has been
possible through a bifurcation of the ISCII character set into two halves.
The Indian script numerals defined at the end of ISCII code table are not included in the
PC-ISCII code set. With PC-ISCII only the ASCII numerals should be used. These numerals
themselves can be rendered in shapes of numerals in a particular script through an
appropriate Attribute (ATR) character.
Although the characters are at different locations in the PC-ISCII code, their sequence
remains identical to that in the ISCII code. This allows the PC-ISCII code to be functionally
identical to the ISCII code, enabling the same sorting sequence.
The positions occupied by the ATR and EXT codes were left undefined in the beginning, as some
IBM-PC compatibles did not allow the corresponding characters to be typed in through the
keyboard. When this problem was overcome the PC-ISCII code was already in wide use, and could
not be changed. These positions could not be allotted to some new characters, as the sorting
order would have got affected. ATR and EXT codes (on which sorting is not defined) were
therefore suitable to fill in these two positions.
The five empty character positions towards the end of the code, are reserved in ISCII, but
are needed in other script codes (like Perso-Arabic code).
ANNEX-C
ENGLISH-ALPHABET ISCII CODE: EA-ISCII
[Table: EA-ISCII code positions; character entries not recoverable from this copy.]
ANNEX-D
INSCRIPT KEYBOARD
The Inscript overlay has the characters required for all the Indian scripts, as defined by
the ISCII character set. The Indian script alphabet has a logical structure, derived from the
phonetic properties. The Inscript overlay mirrors this logical structure. The overlay has
also been optimized from phonetic/frequency considerations. It is divided into two parts: the
vowel pad on the left hand side, and the consonant pad on the right hand side.
[Figures: Inscript keyboard overlay layouts; key legends not recoverable from this copy.]
When Nukta is typed after a character, the character shown to its left on the key is obtained.
ANNEX-E
ATTRIBUTE CODES
An ASCII character which follows the ATR character indicates a new font Attribute which is
applicable for the subsequent characters till the end of the row, or till another attribute
code is encountered.
The ASCII character, following the ATR character, can indicate 94 different attributes. Out
of these the first 31 attributes are reserved for display attributes, while the rest of 63
attributes indicate selection of a font for a new script.
[Figure: layout of the attribute range, showing the ATR codes followed by the FONT codes (Normal, then Reverse).]
Basic Attributes are:
Highlight, Bold, Outline, Shadow, Italics, Underline, Expanded
These can combine together to give different effects:
Highlight+Bold = ExtraBold
[Further sample renderings not recoverable from this copy.]
Expanded characters are of Double width.
The DBL, Double size row attribute makes the whole row double-height and double-width. This
can be used along with TOP and LOW attributes to get quadruple size characters.
E-2 Font Attributes (40h to 7Eh)
At the beginning of a row the default display script is assumed to be active. The font
attributes cause selection of a new script till the end of a row, or till another font
attribute is encountered.
E-2.1 DEF (Default Font)
The Default font attribute causes re-selection of the default display script.
The RMN font attribute selects Roman script corresponding to the currently active script. The
numerals after the RMN attribute will be shown as international numerals.
E-2.3 Indian Script Fonts (42h to 4Bh)
This selects a Brahmi based Indian script. The subsequent numerals will be shown in the forms
corresponding to the script, if they exist, otherwise they will be shown in their
international form.
E-2.4 Perso-Arabic Fonts (71h to 76h)
These scripts are written from right to left. In general codes from 71h to 7Eh are reserved
for scripts written in the reverse direction. The Perso-Arabic family contains Arabic (ARB),
Persian (PM), Urdu (URD), Sindhi (SND), Kashmiri (KM) and Pushto (PST). Amongst these, Urdu,
Sindhi and Kashmiri belonging to the Indian subcontinent have considerable similarity.
The ASCII numerals will be shown in the Perso-Arabic form, after a Perso-Arabic font
attribute.
ANNEX-F
ROMAN SCRIPT TRANSLITERATION
The National Library at Calcutta standardized the diacritic marks to be used for romanization
of Indian scripts, in 1988 (“The National Library Newsletter”, June 1988).
As Northern scripts do not have short ऎ and ऒ, the long ए and ओ can also be rendered without
diacritic mark as ‘e’ and ‘o’ respectively.
[Tables: VOWELS, VOWEL MODIFIERS, Non-Vargs and Nukta Consonants, with notes on Tamil, Marathi, Kannada and Malayalam letters; entries not recoverable from this copy.]
ANNEX-G
EXTENDED CHARACTER SET FOR VEDIC
The ISCII codes for Devanagari catered to all the characters required for typing Hindi,
Marathi and Sanskrit. However they could not contain the additional characters required for
representing ancient Vedic text. Many of these Vedic characters combine with other Devanagari
characters. The Vedic characters cannot be thus thought of as constituting an independent
script, but have to be catered to as an extension to the ISCII character set.
ISCII code provides an Extension code (EXT) which redefines the following ISCII character as
another character not present in the ISCII code. Through this extension technique it is
possible to represent, apart from Vedic, miscellaneous characters required for other Indian
scripts.
In the Vedas there are three lengths for a vowel. These are short, long and extra-long
(ह्रस्व, दीर्घ, प्लुत). The short and long vowels are denoted by the normal vowel signs used
in Devanagari, while the extra-long vowel is indicated by putting a ३ sign after a short or
long vowel sign. [Example glyphs not recoverable.]
G-1.1 Udatta
The vowel that is perceived as having a high tone is called Udatta, or acutely accented. It
is normally not marked. “..” is used in Śukla Yajurveda texts, at the end of a sentence.
G-1.2 Anudatta ( _ )
The vowel that is perceived as having a low tone is called Anudatta, or gravely accented. In
writing it is marked by a line underneath the vowel. It also denotes Udatta in Śatapatha
Brahmana. In Kathaka text, Anudatta is shown as a vertical line below the character.
G-1.6 Jihvamuliya or Upadhmaniya ( x )
This is like a half-Visarga sound, and can come only before four consonants. Before क and ख
it is called Jihvamuliya, while before प and फ it is called Upadhmaniya.
Will go to the extreme right of the character, even after the
Visarga. Example: [Devanagari example not recoverable from the source]

G-3.2 Top Svara

These will attach at the top of a character. However if there is
already some other Matra, Anusvara or Chandrabindu sign
present on the top, then the top-Svara will attach to the right of
it. Example: [Devanagari example not recoverable from the source]

G-3.4 Bottom Svara: [glyphs not recoverable from the source]

The Svara symbols which attach above, below or after a
character should be typed at the end of a composite character.
Example: [Devanagari example not recoverable from the source]

Vedic non-Svara characters can take only a Vedic Svara on it.
Example: [Devanagari example not recoverable from the source]

The Extension key should be thought of as another kind of
SHIFT key, which has to be pressed along with a character key.
It is effective only when the Inscript overlay is active (CAPS-LOCK
is on). Each key typed along with the Extension key
emits a character pair, which join up on the display to show the
desired character. The first character is the EXT character,
while the second is a Devanagari character. Thus two backspaces
would be required for deleting both these characters.
Example:
EXT-KEY + H-key = EXT + [Devanagari character] = [Vedic character]
EXT-KEY + T-key = EXT + [Devanagari character] = [Vedic character]

Vedic Keyboard Overlay
[Figure: Vedic keyboard overlay; key legends not recoverable from the source.]

Non-Svara (R)
All the Vedic characters excluding the Svara.

Following is the extension to the ISCII code syntax, required for
Vedic syllables.

Vedic-Syllable ::= Vedic-Cons-Vowel-Syllable |
                   Vedic-Vowel-Syllable | Full-Vedic-Syllable
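
The behaviour of the Extension key described in this annex (each key typed with it emits an EXT character followed by a Devanagari character, so that two backspaces are needed to delete the pair) can be illustrated with a minimal sketch. The code value 0xF0 for EXT and the key code 0xC2 are assumptions made for illustration and should be checked against the ISCII code table; the function names are hypothetical.

```python
EXT = 0xF0  # assumed code of the Extension character; verify against the ISCII table


def press(buffer, key_code, ext_held=False):
    """Append the codes emitted by one key press to the text buffer."""
    if ext_held:
        buffer.append(EXT)      # first character of the pair
    buffer.append(key_code)     # the Devanagari character of the key
    return buffer


def backspace(buffer):
    """Delete exactly one stored character, as a plain backspace would."""
    if buffer:
        buffer.pop()
    return buffer


text = press([], 0xC2, ext_held=True)   # 0xC2 is a hypothetical key code
after_one = backspace(list(text))       # only the Devanagari character is gone
after_two = backspace(list(after_one))  # now the EXT character is gone too
```

After one backspace the buffer still holds the EXT character on its own, which is why the text asks for two backspaces per extended character.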
ANNEX-H
ISCII IN TELEX/TELEPRINTERS
The Department of Telecommunication (DOT) has adopted the
ISSCII-83 (Indian Script Standard Code for Information
Interchange) code, a DOE 1983 standard, for use in Roman/
Devanagari telex/teleprinters. An 8-bit ISSCII-83 character is
transmitted as two 5-bit characters. These machines initially
interact in Roman using 5-bit Baudot code (CCITT Alphabet No.
2). A protocol is defined, by which machines at both ends can
enter and exit the ISSCII-83 mode.

H-1 ISSCII-83 Syntax

The ISSCII-83 characters used in bilingual telex machine can
be classified as:
Hex  Dec    0    1    2    3    4    5    6    7
            0   16   32   48   64   80   96  112
0     0    NUL  DLE  SP   0    @    P    `    p
1     1    SOH  DC1  !    1    A    Q    a    q
2     2    STX  DC2  "    2    B    R    b    r
3     3    ETX  DC3  #    3    C    S    c    s
4     4    EOT  DC4  Rs   4    D    T    d    t
5     5    ENQ  NAK  %    5    E    U    e    u
6     6    ACK  SYN  &    6    F    V    f    v
7     7    BEL  ETB  '    7    G    W    g    w
8     8    BS   CAN  (    8    H    X    h    x
9     9    HT   EM   )    9    I    Y    i    y
A    10    LF   SUB  *    :    J    Z    j    z
B    11    VT   ESC  +    ;    K    [    k    {
C    12    FF   FS   ,    <    L    \    l    |
D    13    CR   GS   -    =    M    ]    m    }
E    14    SO   RS   .    >    N    ^    n    ~
F    15    SI   US   /    ?    O    _    o    DEL
[Glyph column not recoverable from the source; code columns only.]
D0  C9          E0  E9 *1       F0  E8,E8
D1  CA          E1  DA          F1
D2  CB          E2  DB          F2  E8 *2
D3  CC          E3  DC          F3  C4,CF
D4  CD          E4  DD          F4  CF,DD,PE
D5  CF          E5  DE          F5  CF,E8 *3
D6  D0          E6  DF          F6  B3,E8,D6
D7  D1          E7  DF,E9       F7  C2,E8,CF
D8  D2          E8  E3          F8  BA,E8,BC
D9  D3          E9  E0          F9  D5,E8,CF
DA  D4          EA  E1          FA  Ignored
DB  D5          EB  E2          FB  Ignored
DC  D6          EC  E7          FC  Ignored
DD  D7          ED  E4          FD  Ignored
DE  D8          EE  E5          FE  Ignored
DF  D0,E8,DC    EF  E6          FF  Ignored
[Glyph columns not recoverable from the source; code columns only.]
E1  A5          E6  AA          EB  AD
E2  A6          E7  AA,E9       EC  B2
E3  A7          E8  AE          ED  AF
E4  A8          E9  AB          EE  B0
E5  A9          EA  AC          EF  B1
[Glyph column not recoverable from the source; code columns only.]
D0  D5              E0  E9          F0
D1  D7              E1  EA          F1  0 (30)
D2  D8              E2  EB          F2  1 (31)
D3  D8              E3  E8          F3  2 (32)
D4  DA              E4  ED          F4  3 (33)
D5  DB              E5  EE          F5  4 (34)
D6  DC              E6  EF          F6  5 (35)
D7  DD              E7  EC          F7  6 (36)
D8  DE              E8  F2 *1       F8  7 (37)
D9  INV, Ignored    E9  E0 *2       F9  8 (38)
DA  E1              EA  2E          FA  9 (39)
DB  E2              EB              FB
DC  E3              EC              FC
DD  E4              ED              FD
DE  E5              EE              FE
DF  E6              EF              FF
CONJUNCT TABLE
[Glyph column not recoverable from the source; code column only.]
A0
A5
F5 *3
F6
F7
F8
F9

Notes: It is necessary to collect a whole ISCII syllable before
translating it to its corresponding ISSCII-83 syllable.
*1 A double Halant of ISCII gets converted to a single Halant of
ISSCII-83. A single Halant before a non-consonant gets converted to
a single Halant of ISSCII-83, followed by the non-consonant.
*2 If Nukta comes before Halant, ignore the Nukta. If the Nukta
comes before a Matra, then send the Nukta after the syllable.
*3 र् of ISCII detected at the beginning of an ISCII syllable has to
be put after the last consonant of the syllable. But if the Matra is
present, it has to be put immediately after it.
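
The Halant and Nukta rules in notes *1 and *2 can be sketched as a pass over one collected ISCII syllable, made before the codes are translated through the table. This is a minimal illustration, not the standard's own procedure: the function name is hypothetical, the Matra range DA to E7 is an assumption taken from the code columns of the table, and the syllable is assumed to have been isolated already.

```python
HALANT = 0xE8  # ISCII Halant (listed against F2 in the table)
NUKTA = 0xE9   # ISCII Nukta (listed against E0 in the table)


def is_matra(code):
    # Assumption: ISCII vowel signs occupy DA..E7.
    return code is not None and 0xDA <= code <= 0xE7


def normalise_syllable(codes):
    """Apply notes *1 and *2 to one ISCII syllable (codes stay in ISCII)."""
    out = []
    nukta_pending = False
    i = 0
    while i < len(codes):
        cur = codes[i]
        nxt = codes[i + 1] if i + 1 < len(codes) else None
        if cur == HALANT and nxt == HALANT:
            out.append(HALANT)       # *1: double Halant becomes a single Halant
            i += 2
        elif cur == NUKTA and nxt == HALANT:
            i += 1                   # *2: Nukta before Halant is ignored
        elif cur == NUKTA and is_matra(nxt):
            nukta_pending = True     # *2: Nukta is sent after the syllable
            i += 1
        else:
            out.append(cur)
            i += 1
    if nukta_pending:
        out.append(NUKTA)
    return out
```

Keeping the pass separate from the code-for-code table lookup means each rule can be checked on its own before any ISSCII-83 values are produced.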
Formatting 5-bit coded bytes from 8-bit coded characters:

1. 8-bit ISCII or ISSCII-83 code for a character
   b7 b6 b5 b4 b3 b2 b1 b0
2. Splitting the code into two nibbles
   Second-byte, First-byte
   [Figure: bit layout of the two 5-bit bytes not recoverable from the source.]

[Figures: protocol diagrams, of which only the titles and some labels
survive. Titles: Protocol for change over from CCITT to ISSCII-83
(bilingual or multilingual initiator, bilingual or multilingual
responding machine); Protocol for change over from CCITT to ISCII
(multilingual initiating machine, multilingual responding machine);
From ISSCII-83 to CCITT; From ISCII to CCITT. Surviving labels of the
first diagram: Send “ZHHHH” in CCITT; Indicator Flashes; Send “DDDD”
in CCITT; Indicator On; ISSCII-83 Mode; Send “OK”; Acknowledge;
ISSCII-83 Mode.]
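
The splitting of an 8-bit code into two nibbles, each carried in one 5-bit unit, can be sketched as follows. The source figure is not legible, so the order of the two units (high nibble first) and the use of the fifth bit (a caller-supplied flag, 0 by default) are assumptions made purely for illustration; the function names are hypothetical.

```python
def split_to_5bit(code, flag=0):
    """Split one 8-bit code into two 5-bit units, one nibble each.

    Assumptions: the high nibble travels first, and bit 4 of each unit
    carries `flag`. Neither detail is taken from the standard.
    """
    if not 0 <= code <= 0xFF:
        raise ValueError("code must fit in 8 bits")
    high = (code >> 4) & 0x0F
    low = code & 0x0F
    return ((flag << 4) | high, (flag << 4) | low)


def join_from_5bit(first, second):
    """Rebuild the 8-bit code from the low four bits of each 5-bit unit."""
    return ((first & 0x0F) << 4) | (second & 0x0F)
```

For instance, under these assumptions the code D5 is carried as the two units 0D and 05, and joining them gives D5 again.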
H-2 Bilingual to Bilingual/Multilingual Protocol

H-2.1 After the call is established with verification of the called
party identity by WRU exchange, the initiating machine sends
“HHHH” mode-change sequence to the responding-machine
and an indicator flashes.

H-2.2 On receiving “HHHH” sequence, the responding-machine
sends “DDDD” identification sequence in CCITT code
and its mode changes to ISSCII-83. Its indicator starts flashing.

H-2.3 If the initiating-machine receives identification sequence
“DDDD” correctly, it changes its mode to ISSCII-83 and sends
“OK” to the called-machine. Now the indicator on the
initiating-machine lights up continuously.

H-2.4 If “DDDD” is not received by the initiating-machine for 2
seconds, the indicator goes off, the machine reverts back to the
CCITT mode, and step 1 is repeated. This sequence is
repeated twice in case of the automatic mode.

H-2.5 On receiving “OK” in ISSCII-83, the indicator of the
responding-machine becomes continuously on.

H-2.6 If “OK” is not received by the responding-machine for 2
seconds, the mode of the machine changes over to CCITT,
indicator switches off and the sequence 1 is repeated twice in
case of the automatic mode.

H-2.7 To change from ISSCII-83 to CCITT, either of the machines
sends “ऽऽ” (Two Avagrah) sequence. The initiating-machine
switches off its indicator, and reverts to the CCITT mode.

H-2.8 On receipt of the change-over sequence “ऽऽ”, the
receiving-machine changes over to CCITT mode, switches off its
indicator and sends WRU code back.

H-2.9 The receipt of answer-back in CCITT serves as confirmation
of change over to CCITT mode of the other machine.

H-2.10 In case of answer-back failure in manual or auto-mode,
the call clears down.

H-3 Multilingual Machines

A multilingual machine always provides all the 10 Brahmi-based
Indian scripts. These scripts can be typed in a common manner
through the Inscript keyboard overlay. In addition there can be
different overlays for some scripts.

An initiating machine indicates the default script to a responding
machine through a 3 character Script-Mnemonic, as defined in the
Script-Mnemonic Table.

Although each line starts with the default script, it is possible to
select other scripts within a line through the Attribute character
(ATR) defined in the ISCII code. All the script attributes will,
however, terminate at the end of a line, and the next line will start
with the default script.

The ATR character also allows selection of different display
attributes, like bold, italics and underline. These attributes are
always off at the beginning of a line, and then work on a toggle
basis.

H-4 Multilingual to Multilingual Protocol

H-4.1 After the call is established with verification of the called
party identity by WRU exchange, the initiating machine sends
“ZHHHH” mode-change sequence to the responding machine
and an indicator flashes.

H-4.2 On receiving “ZHHHH” sequence, the responding-machine
sends “I I I I” identification sequence in CCITT code and its
mode changes to ISCII. Its indicator starts flashing.

H-4.3 If the initiating-machine receives identification sequence
“I I I I” correctly, it changes its mode to ISCII and sends a default
script mnemonic, as specified in the Script-Mnemonic table.
Now the indicator on the initiating machine lights up continuously.

H-4.4 If “DDDD” is received by the initiating-machine then the
interaction proceeds as defined for two bilingual machines. If
“I I I I” is not received for 2 seconds the indicator goes off, the
machine reverts back to the CCITT mode and the sequence 1
is repeated twice in case of the automatic mode.

H-4.5 On receiving a 3 character script-mnemonic in ISCII, the
indicator of the responding-machine becomes continuously on.
As the characters are repeated thrice within the script-mnemonic,
it is possible to detect the default script if one of the
characters is in error.

H-4.6 If a valid script-mnemonic is not received by the
responding-machine for 2 seconds, the mode of the machine
changes over to CCITT, indicator switches off and the
sequence 1 is repeated twice in case of the automatic mode.

H-4.7 To change from ISCII to CCITT, either of the machines
sends “n3%@l” sequence. The initiating-machine switches off its
indicator, and reverts to the CCITT mode.
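
The initiating side of the bilingual change-over (H-2.1, H-2.3 and H-2.4) can be sketched as a small retry loop. This is an illustration under stated assumptions, not the standard's implementation: each entry of `replies` stands for what arrived within the 2 second window of one attempt (None meaning a timeout), and the limit of three attempts in total is one reading of "repeated twice in case of the automatic mode". The names `Mode` and `initiate` are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    CCITT = "CCITT"
    ISSCII_83 = "ISSCII-83"


def initiate(replies, max_attempts=3):
    """Simulate the initiating machine; return (final mode, sequences sent)."""
    sent = []
    for _, reply in zip(range(max_attempts), replies):
        sent.append("HHHH")        # H-2.1: mode-change sequence, indicator flashes
        if reply == "DDDD":        # H-2.3: identification received correctly
            sent.append("OK")      # indicator now lights up continuously
            return Mode.ISSCII_83, sent
        # H-2.4: nothing valid within 2 seconds, so the indicator goes off,
        # the machine reverts to CCITT and step 1 is tried again.
    return Mode.CCITT, sent
```

A call such as `initiate([None, "DDDD"])` models one timeout followed by a successful identification on the second attempt.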
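
The remark in H-4.5 that the default script can still be detected when one character of the script-mnemonic is in error suggests a majority vote over the three received characters. The sketch below assumes that the mnemonic is a single character sent three times; that reading, the function name and the sample letters are illustrative assumptions, since the Script-Mnemonic Table itself is not reproduced here.

```python
from collections import Counter


def decode_script_mnemonic(received):
    """Return the character seen at least twice in a 3 character mnemonic.

    Returns None when the mnemonic has the wrong length or when no
    character occurs at least twice, i.e. the mnemonic is not valid.
    """
    if len(received) != 3:
        return None
    char, count = Counter(received).most_common(1)[0]
    return char if count >= 2 else None
```

With hypothetical letters, a received "DXD" still decodes to "D", while "DXY" is rejected and would be handled as in H-4.6.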
Bureau of Indian Standards
BIS is a statutory institution established under the Bureau of Indian Standards Act, 1986 to
promote harmonious development of the activities of standardization, marking and quality
certification of goods and attending to connected matters in the country.
Copyright
BIS has the copyright of all its publications. No part of these publications may be reproduced
in any form without the prior permission in writing of BIS. This does not preclude the free use,
in the course of implementing the standard, of necessary details, such as symbols and sizes, type
or grade designations. Enquiries relating to copyright be addressed to the Director
( Publications ), BIS.
Indian Standards are reviewed periodically and revised, when necessary and amendments, if
any, are issued from time to time. Users of Indian Standards should ascertain that they are in
possession of the latest amendments or edition. Comments on this Indian Standard may be
sent to BIS giving the following reference:
Headquarters:
Manak Bhavan, 9 Bahadur Shah Zafar Marg, New Delhi 110002
Telephones : 331 01 31, 331 13 75 Telegrams : Manaksanstha
( Common to all Offices )
Regional Offices :                                                Telephone
Central  : Manak Bhavan, 9 Bahadur Shah Zafar Marg                331 01 31
           NEW DELHI 110002                                       331 13 75
Eastern  : 1/14 C. I. T. Scheme VII M, V. I. P. Road, Maniktola   37 84 99, 37 85 61
           CALCUTTA 700054                                        37 86 26, 37 86 62
Northern : SCO 445-446, Sector 35-C, CHANDIGARH 160036            53 38 43, 53 16 40
                                                                  53 23 84
Southern : C. I. T. Campus, IV Cross Road, MADRAS 600113          235 02 16, 235 04 42
                                                                  235 15 19, 235 23 15
Western  : Manakalaya, E9 MIDC, Marol, Andheri ( East )           632 92 95, 632 78 58
           BOMBAY 400093                                          632 78 91, 632 78 92
Branches : AHMADABAD, BANGALORE, BHOPAL, BHUBANESHWAR, COIMBATORE,
FARIDABAD, GHAZIABAD, GUWAHATI, HYDERABAD, JAIPUR, KANPUR,
LUCKNOW, PATNA, THIRUVANANTHAPURAM.