Myanmar Unicode Myths vs. Truths: Ngwe Tun CEO, Solveware Solution YMCA, 2008-02-12

MYANMAR UNICODE
MYTHS VS. TRUTHS

Ngwe Tun
CEO, Solveware Solution YMCA, 2008-02-12
Etymology: Myanmar Unicode
• Myanmar
> The Myanmar script is used to write Burmese, the majority
language of Myanmar. Variations and extensions of the script are
used to write other languages of the region, such as Shan, Mon,
and Karen, as well as Pali and Sanskrit.
Ref: http://www.unicode.org/versions/Unicode5.0.0/ch11.pdf
> The Myanmar writing system derives from a Brahmi-related
script borrowed from South India in about the eighth century to
write the Mon language.
> The basic consonants, independent vowels, and dependent
vowel signs required for writing the Myanmar language are encoded
at the beginning of the Myanmar range (U+1000..U+109F).

Myanmar Unicode 5.0.0

Unicode 5.1: What's New?

• Unicode
> The Unicode Standard is the universal character encoding
standard for written characters and text. It defines a consistent way
of encoding multilingual text that enables the exchange of text data
internationally and creates the foundation for global software.
> It provides the capacity to encode all characters used for the
written languages of the world: more than 1 million characters can
be encoded.
> The Unicode character encoding treats alphabetic characters,
ideographic characters, and symbols equivalently, which means they
can be used in any mixture and with equal facility.

• Unicode
> The Unicode Standard specifies a numeric value (code point)
and a name for each of its characters.
> The Unicode Standard defines these and other semantic values,
and it includes application data such as case mapping tables (a, A)
and character property tables as part of the Unicode Character
Database (UCD). Character properties define a character's identity
and behavior; they ensure consistency in the processing and
interchange of Unicode data.
> Unicode characters are represented in one of three encoding
forms: a 32-bit form (UTF-32), a 16-bit form (UTF-16), and an 8-bit
form (UTF-8). The 8-bit, byte-oriented form, UTF-8, has been
designed for ease of use with existing ASCII-based systems.

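As a small illustration of the three encoding forms, the Python sketch below encodes one character, MYANMAR LETTER KA (U+1000), in each form. The helper name `encoding_forms` is an assumption made for this example, and big-endian byte order is chosen only to keep the bytes easy to read.

```python
def encoding_forms(ch: str) -> dict:
    """Return the hex bytes of one character in UTF-8, UTF-16 and UTF-32."""
    return {
        "UTF-8": ch.encode("utf-8").hex(" "),
        "UTF-16": ch.encode("utf-16-be").hex(" "),
        "UTF-32": ch.encode("utf-32-be").hex(" "),
    }


if __name__ == "__main__":
    # U+1000 is MYANMAR LETTER KA, the first code point of the Myanmar range.
    for form, data in encoding_forms("\u1000").items():
        print(f"{form:7}: {data}")
```

The same code point needs three bytes in UTF-8, two in UTF-16, and four in UTF-32; an ASCII letter such as `A` would need only one byte in UTF-8.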
What Is Unicode's Design Goal?
• The primary goal of the development effort for the Unicode
Standard was to remedy two serious problems common to most
multilingual computer programs.
• The first problem was the overloading of the font mechanism when
encoding characters. Fonts have often been indiscriminately
mapped to the same set of bytes. For example, the bytes 0x00 to
0xFF are often used for both characters and dingbats.
• The second major problem was the use of multiple, inconsistent
character codes because of conflicting national and industry
character standards. In Western European software environments,
for example, one often finds confusion between the Windows Latin 1
code page 1252 and ISO/IEC 8859-1.

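The confusion between code page 1252 and ISO/IEC 8859-1 can be shown with a short Python sketch. The function name `decode_both` and the sample bytes are assumptions chosen for illustration; the codec names are the ones Python's standard library provides.

```python
def decode_both(raw: bytes) -> tuple:
    """Decode the same bytes as Windows code page 1252 and as ISO/IEC 8859-1."""
    return raw.decode("cp1252"), raw.decode("iso-8859-1")


if __name__ == "__main__":
    # 0xE9 means the same letter in both encodings; 0x80 and 0x93 do not.
    for raw in (b"\xe9", b"\x80", b"\x93"):
        as_1252, as_8859_1 = decode_both(raw)
        print(raw.hex(), repr(as_1252), repr(as_8859_1), as_1252 == as_8859_1)
```

Bytes in the range 0x80 to 0x9F are where the two encodings disagree: code page 1252 places printable characters such as the euro sign there, while ISO/IEC 8859-1 treats them as control codes.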
What Is Unicode?
• Unicode provides a unique number for every character,
no matter what the platform,
no matter what the program,
no matter what the language.
• Adopted by industry
• Required by modern standards
• Official implementation of ISO/IEC 10646
• Standard maintained by the Unicode Consortium:
http://www.unicode.org
• Current version: 5.0
• Version 5.1 is due for release in March 2008.

Why Is Unicode Needed?
• Enables computer systems to support virtually all
the world's written languages.
• De facto requirement for multilingual applications.
[Figure panels: Using Unicode | Using CP 1252]

Character: an Abstraction
• ှ has 4 variant shapes (different glyphs):
မှု, ညှိ, နှာ, လျှာ
• ူ has 2 variant shapes (different glyphs):
ကူ, စက္ကူ
• As an exception, ာ and ါ are each assigned their own
code point (different characters).
• So that ဿ and ္သ can be told apart, each is assigned its own
code point (different characters).

Characters ≠ Glyphs
• No single code point is assigned to ဋ္ဌ, ဏ္ဍ, or other
stacked consonants.
• No single code point is assigned to the shape variants of
consonant clusters such as ကြ and ခြ.
• No single code point is assigned to kinzi, as in သင်္ကေတ.

Unicode Design Principles
1. Universal repertoire
2. Processing efficiency
3. Characters, not glyphs
4. Semantics (properties)
5. Plain text
6. Logical order in memory
7. Unification (within scripts across languages)
8. Dynamic composition
9. Equivalent sequences (compatibility with current standards)
10. Convertibility

Life of a Character
[Figure: the G key is pressed on the keyboard → keyboard driver and layout → code 0047 stored in memory → renderer, using the font file → glyph G shown on the display]

Rendering a complex script
[Figure: memory holds 1000 103B 1031 (က ျ ေ) → glyph selection and positioning, using the font's positional shapes and ligatures → display order ေကျ]

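The memory order on this slide can be inspected with a short Python sketch. It assumes the standard `unicodedata` module bundles Unicode 5.1 or later data (U+103B is not in Unicode 5.0); the helper name `describe` is chosen for illustration.

```python
import unicodedata


def describe(text: str) -> list:
    """List (code point, character name) pairs in stored, logical order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


if __name__ == "__main__":
    syllable = "\u1000\u103B\u1031"  # the sequence 1000 103B 1031 from the slide
    for code_point, name in describe(syllable):
        print(code_point, name)
```

The vowel sign U+1031 is stored last even though it is drawn to the left of the consonant; moving it into display position is the job of the rendering engine, not of the stored text.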
Scripts
[Figure: kinds of scripts covered]
• Syllabic: かきく…
• Alphabetic: ถไณ…
• Ideographic: 漢字…
• Alphabetic, bi-cameral: Α Β Γ… α β γ…
• Alphabetic, right-to-left: ا ب ت…
• Symbols: √ ≠…
• Shared: !,:;…

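Letter → Codes → Glyphs
[Figure: the letter ĝ can be stored either as one coded character, 011D (ĝ), or as a coded character sequence, 0067 0302 (g followed by a combining circumflex); both are displayed with the same ĝ glyphs]
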
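The equivalence between the single coded character 011D and the sequence 0067 0302 can be checked with Python's standard `unicodedata.normalize`. This is a minimal sketch; the constant names and the helper `same_letter` are assumptions made for the example.

```python
import unicodedata

PRECOMPOSED = "\u011D"  # one coded character: g with circumflex
DECOMPOSED = "g\u0302"  # coded character sequence: g + combining circumflex


def same_letter(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print(PRECOMPOSED == DECOMPOSED)              # raw comparison of code points
    print(same_letter(PRECOMPOSED, DECOMPOSED))   # comparison after normalization
```

A raw comparison sees different code points, so software that searches or sorts text usually normalizes first (see UAX #15, Unicode Normalization Forms).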
What Is the Standard?
+ Unicode Standard Annexes: UAX #9, #11, #14, #15, #24, #29, ...
+ Unicode Character Database: data files
+ UMV (minor versions): 4.1*, 4.2*
* when and if they exist

Standard Annexes
• UAX #9 The Bidirectional Algorithm
• UAX #11 East Asian Width
• UAX #14 Line Breaking Properties
• UAX #15 Unicode Normalization Forms
• UAX #24 Script Names
• UAX #29 Text Boundaries

Conclusion…
• Unicode is a universal character set
• Unicode encodes characters, not glyphs
• Unicode includes character properties and
implementation specifications
• Increasing adoption of Unicode enables
multilingual applications

Myth #1
(based on Unicode Std)
• "… is a system built so that Burmese can be used on computers,
based on the Myanmar-related specifications defined by the Unicode
Consortium …"
• Truth:
To follow the Unicode Standard faithfully, it is necessary to comply
not only with the assigned code points but also with the other
specifications of the standard.

Myth #2
(Code Page or Code Block)
• "A code page is assigned to each language. The code page for the
Burmese language is assigned from 1000 to 109F."
• Truth:
Wikipedia defines the term as follows: "Code page is the traditional IBM term used for a specific
character encoding table: a mapping in which a sequence of bits, usually a single
octet representing integer values 0 through 255, is associated with a specific
character. IBM and Microsoft often allocate a code page number to a character
set even if that charset is better known by another name."
What Unicode assigns is a code block for each script, not a code page for each language.

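The point that a block belongs to a script rather than to one language can be illustrated with a short Python sketch. The sample code points U+105A and U+1075 are chosen on the assumption that Python's bundled character database is Unicode 5.1 or later, where they name letters used for Mon and Shan; the helper `in_myanmar_block` is hypothetical.

```python
import unicodedata

MYANMAR_BLOCK = range(0x1000, 0x10A0)  # the Myanmar block, U+1000..U+109F


def in_myanmar_block(ch: str) -> bool:
    """Return True if the character's code point lies in the Myanmar block."""
    return ord(ch) in MYANMAR_BLOCK


if __name__ == "__main__":
    # Characters for several languages share the one block of the script.
    for ch in ("\u1000", "\u105A", "\u1075", "A"):
        name = unicodedata.name(ch, "<unnamed>")
        print(f"U+{ord(ch):04X}", in_myanmar_block(ch), name)
```

A code page, by contrast, is a complete byte-to-character table such as CP 1252; the range 1000 to 109F is only one slice of the single Unicode code space.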
Myth #3
(just drag and drop)
• "On any OS, you can use it just by dragging and dropping the
font file."
• Bravo!
• Truth:
To use Burmese text, a rendering engine must be installed in the
OS concerned.

Myth #4
(word break or syllable break)
• "The input method inserts the necessary word breaks as you type,
so that words are recognized automatically. This is not only for
Burmese; every language needs word breaks."
• Truth:
• UTN #11 states: "From this we can say that a syllable break
may occur before a Myanmar digit, an independent vowel, one of the
various signs or a base consonant so long as the consonant:"
  • is not devowelised with an asat and has no stacked consonant below it and
  • is not a kinzi.
• Example: သူ အလွန်ကြိုးစားသဖြင့် ကန်ထ
ရိုက်တာ ဖြစ်သွားသည်။
("He worked very hard and so became a contractor." The line breaks at a syllable boundary inside the word ကန်ထရိုက်တာ.)

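The rule quoted from UTN #11 can be sketched in Python as follows. This is a simplified illustration, not the full algorithm of the technical note: it assumes the Unicode 5.1 encoding of asat (U+103A), treats U+1000..U+1021 as the base consonants, and adds one check not stated in the truncated quote (a consonant that follows a virama is the lower half of a stack, so no break is placed before it). All function names are hypothetical.

```python
ASAT = "\u103A"    # asat (vowel killer), as encoded from Unicode 5.1 onward
VIRAMA = "\u1039"  # invisible sign that stacks the following consonant


def is_consonant(ch: str) -> bool:
    """Base consonants: an assumed range, U+1000..U+1021."""
    return "\u1000" <= ch <= "\u1021"


def starts_syllable(text: str, i: int) -> bool:
    """Return True if a syllable break may occur before text[i]."""
    ch = text[i]
    if "\u1040" <= ch <= "\u1049":
        # Digit: keep runs of digits together (a simplification).
        return not (i > 0 and "\u1040" <= text[i - 1] <= "\u1049")
    if "\u1023" <= ch <= "\u102A" or "\u104A" <= ch <= "\u104F":
        return True  # independent vowels; punctuation and various signs
    if not is_consonant(ch):
        return False
    following = text[i + 1] if i + 1 < len(text) else ""
    if following in (ASAT, VIRAMA):
        return False  # devowelised final, kinzi, or upper consonant of a stack
    if i > 0 and text[i - 1] == VIRAMA:
        return False  # lower consonant of a stack (assumption, see lead-in)
    return True


def split_syllables(text: str) -> list:
    """Split text at every position where a syllable break may occur."""
    if not text:
        return []
    pieces, start = [], 0
    for i in range(1, len(text)):
        if starts_syllable(text, i):
            pieces.append(text[start:i])
            start = i
    pieces.append(text[start:])
    return pieces


if __name__ == "__main__":
    word = "\u1000\u1014\u103A\u1011\u101B\u102D\u102F\u1000\u103A\u1010\u102C"
    print(" | ".join(split_syllables(word)))
```

Applied to the word from the example above, the sketch yields four pieces, and the line break shown on the slide falls on one of those boundaries. Real line breaking also needs dictionary or word-level knowledge, which this sketch does not attempt.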
Myth #5
(vowel sign in Myanmar)
• "There are no dedicated letters for vowels. So I am very surprised
that ဩ, ဪ, ဥ, and ဦ have been given Unicode values as vowels."
• Truth:
• Saya Maung Khin Min (Danubyu), in his book on the Burmese language
and script, page 121: "Vowel symbols can be divided into two kinds:
pure (independent) vowels and vowels attached to consonants."

Myth #6
(fake, partial, pseudo Unicode)
• Modifying the Unicode specifications.
• Adding new Unicode code points.
• Having to add separate, custom standard forms.
• Truth:
The main purposes of standardizing on Unicode are:
1) to make the exchange of data easy and accurate;
2) to let both parties read the text even when they do not have
identical systems. Only then can the benefits of using Unicode be obtained.

Myth #7
(Windows supports Myanmar 100%)
• "Myanmar Unicode can be used 100 percent on MS Windows."
• "Burmese words can be searched on Windows. They can be sorted alphabetically."
• "The computer can now be used in Burmese."
• Truth:
The additional Unicode specifications for Burmese are to be approved
in March 2008 and are still under discussion.
On Windows, typed text is merely displayed. Compared with the level
of support available for other languages, this is only about 50 percent.

Myth #8
(vowel signs should not be placed after consonants)
• "On the matter of placing thawehtoe (ေ) after the consonant ...
if thawehtoe comes before the consonant, typesetting, sorting, and
breaking will be quite a headache."
• Truth:
In Burmese, the defined order of characters is a very important
point. The order consonant, medial, vowel is defined according to
how words are formed.
Examples: လိမ္မော်, အိန္ဒြေ

Myth #9
(* Unicode Font)
• "The Unicode font was invented by 5 Myanmar experts in 2005."
• Truth:
Merely reassigning the positions of ordinary glyphs, without
complying with the specifications laid down by Unicode, should not
be called an invention.

Myth #10
(the government forces the use of Unicode fonts)
• "The state is forcing the use of standards that do not work."
• Truth:
Laying down standards, and having the relevant authority set a
policy for their trial use, is common practice internationally.

More myths out there? Send them to us.
• ngwetun@solvewaresolution.net
• www.parabaik.info
