#=======================================================================
#   FTP file name:  ARABIC.TXT
#
#   Contents:       Map (external version) from Mac OS Arabic
#                   character set to Unicode 2.1
#
#   Copyright:      (c) 1994-1999 by Apple Computer, Inc., all rights
#                   reserved.
#
#   Contact:        charsets@apple.com
#
#   Changes:
#
#       b02  1999-Sep-22    Update contact e-mail address. Matches
#                           internal utom<b1>, ufrm<b1>, and Text
#                           Encoding Converter version 1.5.
#       n10  1998-Feb-05    Show required Unicode character
#                           directionality in a different way. Matches
#                           internal utom<n4>, ufrm<n21>, and Text
#                           Encoding Converter version 1.3. Update
#                           header comments; include information on
#                           loose mapping of digits.
#       n07  1997-Jul-17    Update to match internal utom<n2>, ufrm<n17>:
#                           Change standard mapping for 0xC0 from U+066D
#                           to U+274A. Add direction overrides to
#                           mappings for 0x25, 0x2C, 0x3B, 0x3F. Add
#                           information on variants.
#       n03  1995-Apr-18    First version (after fixing some typos).
#                           Matches internal ufrm<n11>.
#
# Standard header:
# ----------------
#
# Apple, the Apple logo, and Macintosh are trademarks of Apple
# Computer, Inc., registered in the United States and other countries.
# Unicode is a trademark of Unicode Inc. For the sake of brevity,
# throughout this document, "Macintosh" can be used to refer to
# Macintosh computers and "Unicode" can be used to refer to the
# Unicode standard.
#
# Apple makes no warranty or representation, either express or
# implied, with respect to these tables, their quality, accuracy, or
# fitness for a particular purpose. In no event will Apple be liable
# for direct, indirect, special, incidental, or consequential damages
# resulting from any defect or inaccuracy in this document or the
# accompanying tables.
#
# These mapping tables and character lists are subject to change.
# The latest tables should be available from the following:
#
# <ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
# <ftp://dev.apple.com/devworld/Technical_Documentation/Misc._Standards/>
#
# For general information about Mac OS encodings and these mapping
# tables, see the file "README.TXT".
#
# Format:
# -------
#
#     0xAA -> <RL>+0x00D7 MULTIPLICATION SIGN, right-left
#     0xC0 -> <RL>+0x002A ASTERISK, right-left
#
# The Thuluth variant is used for the Arabic Postscript-only fonts:
# Thuluth and Thuluth bold. It differs from the standard variant in
# the following way:
#     0xAA -> <RL>+0x00D7 MULTIPLICATION SIGN, right-left
#     0xC0 -> 0x066D ARABIC FIVE POINTED STAR
#
# The AlBayan variant is used for the Arabic TrueType font Al Bayan.
# It differs from the standard variant in the following way:
#     0x81 -> no mapping (glyph just has authorship information, etc.)
#     0xA3 -> 0xFDFA ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM
#     0xA4 -> 0xFDF2 ARABIC LIGATURE ALLAH ISOLATED FORM
#     0xAA -> <RL>+0x00D7 MULTIPLICATION SIGN, right-left
#     0xDC -> <RL>+0x25CF BLACK CIRCLE, right-left
#     0xFC -> <RL>+0x25A0 BLACK SQUARE, right-left
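#
# The variant differences listed above can be treated as overrides
# applied on top of the standard table. The following is a minimal
# sketch, not part of the Apple tables: the names VARIANT_OVERRIDES and
# apply_variant, and the (direction tag, code point) tuple layout, are
# assumptions made for illustration. Only the Thuluth and AlBayan
# variants are included.

```python
from typing import Dict, Optional, Tuple

# A mapping value is (direction tag, Unicode code point). The tag is
# "LR", "RL", or None when the entry carries no direction tag.
Mapping = Tuple[Optional[str], int]

# Differences from the standard variant, transcribed from the notes
# above. A value of None means "no mapping".
VARIANT_OVERRIDES: Dict[str, Dict[int, Optional[Mapping]]] = {
    "Thuluth": {
        0xAA: ("RL", 0x00D7),  # MULTIPLICATION SIGN, right-left
        0xC0: (None, 0x066D),  # ARABIC FIVE POINTED STAR
    },
    "AlBayan": {
        0x81: None,            # no mapping
        0xA3: (None, 0xFDFA),  # ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM
        0xA4: (None, 0xFDF2),  # ARABIC LIGATURE ALLAH ISOLATED FORM
        0xAA: ("RL", 0x00D7),  # MULTIPLICATION SIGN, right-left
        0xDC: ("RL", 0x25CF),  # BLACK CIRCLE, right-left
        0xFC: ("RL", 0x25A0),  # BLACK SQUARE, right-left
    },
}


def apply_variant(standard: Dict[int, Mapping], variant: str) -> Dict[int, Mapping]:
    """Return a copy of the standard table with one variant's changes applied."""
    table = dict(standard)  # leave the caller's table untouched
    for code, mapping in VARIANT_OVERRIDES[variant].items():
        if mapping is None:
            table.pop(code, None)  # the variant has no mapping for this code
        else:
            table[code] = mapping
    return table
```

# The standard table is passed in by the caller, so the sketch does not
# depend on any particular way of loading it.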
#
# Unicode mapping issues and notes:
# ---------------------------------
#
# 1. Matching the direction of Mac OS Arabic characters
#
# When Mac OS Arabic encodes a character twice but with different
# direction attributes for the two code points - as in the case of
# plus sign mentioned above - we need a way to map both Mac OS Arabic
# code points to Unicode and back again without loss of information.
# With the plus sign, for example, mapping one of the Mac OS Arabic
# characters to a code in the Unicode corporate use zone is
# undesirable, since both of the plus sign characters are likely to
# be used in text that is interchanged.
#
# The problem is solved with the use of direction override characters
# and direction-dependent mappings. When mapping from Mac OS Arabic
# to Unicode, we use direction overrides as necessary to force the
# direction of the resulting Unicode characters.
#
# The required direction is indicated by a direction tag in the
# mappings. A tag of <LR> means the corresponding Unicode character
# must have a strong left-right context, and a tag of <RL> indicates
# a right-left context.
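#
# As an illustration of the tag syntax just described, a mapping value
# such as <LR>+0x002B can be split into its direction tag and code
# point. This is a sketch only; the function name parse_mapping_value
# is hypothetical, and it assumes four hex digits after "0x", as in the
# values that appear in this file.

```python
import re
from typing import Optional, Tuple

# Optional "<LR>+" or "<RL>+" prefix, then a four-digit hex code point.
_VALUE = re.compile(r"(?:<(LR|RL)>\+)?0x([0-9A-Fa-f]{4})")


def parse_mapping_value(text: str) -> Tuple[Optional[str], int]:
    """Split a mapping value into (direction tag or None, code point)."""
    match = _VALUE.fullmatch(text.strip())
    if match is None:
        raise ValueError("unrecognized mapping value: %r" % text)
    return match.group(1), int(match.group(2), 16)
```

# For instance, parse_mapping_value("<LR>+0x002B") gives ("LR", 0x2B),
# while an untagged value such as "0x0041" gives (None, 0x41).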
#
# For example, the mapping of 0x2B is given as <LR>+0x002B; the
# mapping of 0xAB is given as <RL>+0x002B. If we map an isolated
# instance of 0x2B to Unicode, it should be mapped as follows (LRO
# indicates LEFT-RIGHT OVERRIDE, PDF indicates POP DIRECTION
# FORMATTING):
#
#     0x2B -> 0x202D (LRO) + 0x002B (PLUS SIGN) + 0x202C (PDF)
#
# When mapping several characters in a row that require direction
# forcing, the overrides need only be used at the beginning and end.
# For example:
#
#     0x24 0x20 0x28 0x29 -> 0x202D 0x0024 0x0020 0x0028 0x0029 0x202C
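#
# The grouping rule above (one override at the start of a run, one PDF
# at the end) can be sketched as follows. This is an illustration under
# stated assumptions, not Apple's converter: SAMPLE_TABLE is a small
# excerpt of the main table, the name to_unicode is hypothetical, and
# RLO (0x202E) is used as the right-left counterpart of LRO.

```python
from typing import Dict, List, Optional, Tuple

LRO = 0x202D  # left-right override
RLO = 0x202E  # right-left override
PDF = 0x202C  # ends the current override

# Excerpt only: Mac OS Arabic code -> (direction tag or None, code point).
SAMPLE_TABLE: Dict[int, Tuple[Optional[str], int]] = {
    0x20: ("LR", 0x0020),
    0x24: ("LR", 0x0024),
    0x28: ("LR", 0x0028),
    0x29: ("LR", 0x0029),
    0x2B: ("LR", 0x002B),
    0x41: (None, 0x0041),
}


def to_unicode(data: bytes, table: Dict[int, Tuple[Optional[str], int]]) -> List[int]:
    """Map Mac OS Arabic bytes to code points, wrapping tagged runs."""
    out: List[int] = []
    open_dir: Optional[str] = None  # direction of the override now open
    for byte in data:
        direction, code_point = table[byte]
        if direction != open_dir:
            if open_dir is not None:
                out.append(PDF)  # close the previous run
            if direction is not None:
                out.append(LRO if direction == "LR" else RLO)
            open_dir = direction
        out.append(code_point)
    if open_dir is not None:
        out.append(PDF)  # close a run that reaches the end of the text
    return out
```

# With this excerpt, the four bytes 0x24 0x20 0x28 0x29 yield a single
# LRO ... PDF pair around all four code points, matching the example
# above.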
#
# When mapping from Unicode to Mac OS Arabic, the Unicode
# bidirectional algorithm should be used to determine resolved
# direction of the Unicode characters. The mapping from Unicode to
# Mac OS Arabic can then be disambiguated by the use of the resolved
# direction:
#
#     Unicode 0x002B -> Mac OS Arabic 0x2B (if L) or 0xAB (if R)
#
# However, this also means the direction override characters should
# be discarded when mapping from Unicode to Mac OS Arabic (after
# they have been used to determine resolved direction), since the
# direction override information is carried by the code point itself.
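#
# A sketch of the reverse step, assuming the resolved direction of each
# character has already been computed by an implementation of the
# Unicode bidirectional algorithm (not shown here). The input format, a
# list of (code point, "L" or "R") pairs, and the names from_unicode,
# DIRECTIONAL_SAMPLE and PLAIN_SAMPLE are assumptions for illustration;
# the two sample tables are small excerpts.

```python
from typing import Dict, Iterable, Tuple

# PDF, LRO, RLO: discarded once the resolved direction is known.
DIRECTION_CONTROLS = {0x202C, 0x202D, 0x202E}

# Excerpt: (code point, resolved direction) -> Mac OS Arabic code.
DIRECTIONAL_SAMPLE: Dict[Tuple[int, str], int] = {
    (0x002B, "L"): 0x2B,
    (0x002B, "R"): 0xAB,
}

# Excerpt: code points whose mapping does not depend on direction.
PLAIN_SAMPLE: Dict[int, int] = {0x0041: 0x41}


def from_unicode(
    resolved: Iterable[Tuple[int, str]],
    directional: Dict[Tuple[int, str], int],
    plain: Dict[int, int],
) -> bytes:
    """Map (code point, resolved direction) pairs to Mac OS Arabic bytes."""
    out = bytearray()
    for code_point, direction in resolved:
        if code_point in DIRECTION_CONTROLS:
            continue  # the Mac OS Arabic code itself carries the direction
        key = (code_point, direction)
        if key in directional:
            out.append(directional[key])
        else:
            out.append(plain[code_point])  # KeyError if unmapped
    return bytes(out)
```

# A real converter would also need a policy for unmapped characters;
# this sketch simply raises KeyError.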
#
# Even when direction overrides are not needed for roundtrip
# fidelity, they are sometimes used when mapping Mac OS Arabic
# characters to Unicode in order to achieve similar text layout with
# the resulting Unicode text. For example, the single Mac OS Arabic
# ellipsis character has direction class right-left, and there is no
# left-right version. However, the Unicode HORIZONTAL ELLIPSIS
# character has direction class neutral (which means it may end up
# with a resolved direction of left-right if surrounded by left-right
# characters). When mapping the Mac OS Arabic ellipsis to Unicode, it
# is surrounded with a direction override to help preserve proper
# text layout. The resolved direction is not needed or used when
# mapping the Unicode HORIZONTAL ELLIPSIS back to Mac OS Arabic.
#
# 2. Mapping the Mac OS Arabic digits
#
# The main table below contains mappings that should be used when
# strict round-trip fidelity is required. However, for numeric
# values, the mappings in that table will produce Unicode characters
# that may appear different than the Mac OS Arabic text displayed
# on a Mac OS system with Arabic support. This is because the Mac OS
# uses context-dependent display for the 0x30-0x39 digits.
#
# If roundtrip fidelity is not required, then the following
# alternate mappings should be used when a sequence of 0x30-0x39
# digits - possibly including 0x2C and 0x2E - occurs in an Arabic
# context (that is, when the first "strong" character on either side
# of the digit sequence is Arabic, or there is no strong character):
#
#     0x2C	0x066C	# ARABIC THOUSANDS SEPARATOR
#     0x2E	0x066B	# ARABIC DECIMAL SEPARATOR
#     0x30	0x0660	# ARABIC-INDIC DIGIT ZERO
#     0x31	0x0661	# ARABIC-INDIC DIGIT ONE
#     0x32	0x0662	# ARABIC-INDIC DIGIT TWO
#     0x33	0x0663	# ARABIC-INDIC DIGIT THREE
#     0x34	0x0664	# ARABIC-INDIC DIGIT FOUR
#     0x35	0x0665	# ARABIC-INDIC DIGIT FIVE
#     0x36	0x0666	# ARABIC-INDIC DIGIT SIX
#     0x37	0x0667	# ARABIC-INDIC DIGIT SEVEN
#     0x38	0x0668	# ARABIC-INDIC DIGIT EIGHT
#     0x39	0x0669	# ARABIC-INDIC DIGIT NINE
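#
# The context rule for the alternate digit mappings can be sketched as
# below. Several points are assumptions rather than part of this file:
# the rule is read as "on each side, the first strong character is
# Arabic or absent"; 0x2C and 0x2E are included only between digits;
# and default_strength is a hypothetical, simplified classifier that
# treats 0x41-0x5A and 0x61-0x7A as strong left-right and assumes the
# Arabic letters sit at 0xC1-0xDA and 0xE1-0xEA. A real converter
# should substitute its own classifier.

```python
import re
from typing import Callable, Dict, Iterable, Optional

# Alternate (loose) mappings from the list above.
LOOSE: Dict[int, int] = {0x2C: 0x066C, 0x2E: 0x066B}
LOOSE.update({0x30 + i: 0x0660 + i for i in range(10)})

# A digit run: digits, with "," or "." allowed only between digits.
_RUN = re.compile(rb"[0-9]+(?:[,.][0-9]+)*")


def default_strength(byte: int) -> Optional[str]:
    """HYPOTHETICAL classifier: 'L' (Latin), 'A' (Arabic), or None."""
    if 0x41 <= byte <= 0x5A or 0x61 <= byte <= 0x7A:
        return "L"
    if 0xC1 <= byte <= 0xDA or 0xE1 <= byte <= 0xEA:
        return "A"  # assumed position of the Arabic letters
    return None


def _first_strong(data: bytes, indices: Iterable[int],
                  strength: Callable[[int], Optional[str]]) -> Optional[str]:
    for i in indices:
        kind = strength(data[i])
        if kind is not None:
            return kind
    return None


def loose_digit_overrides(
    data: bytes,
    strength: Callable[[int], Optional[str]] = default_strength,
) -> Dict[int, int]:
    """Return {byte index: code point} for digits in an Arabic context."""
    overrides: Dict[int, int] = {}
    for run in _RUN.finditer(data):
        left = _first_strong(data, range(run.start() - 1, -1, -1), strength)
        right = _first_strong(data, range(run.end(), len(data)), strength)
        if left in ("A", None) and right in ("A", None):
            for i in range(run.start(), run.end()):
                overrides[i] = LOOSE[data[i]]
    return overrides
```

# Positions not present in the returned dictionary keep their mapping
# from the main table.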
#
# Details of mapping changes in each version:
# -------------------------------------------
#
# Changes from version n03 to version n07:
#
# - Change mapping for 0xC0 from U+066D to U+274A.
#
# - Add direction overrides (required directionality) to mappings
#   for 0x25, 0x2C, 0x3B, 0x3F.
#
##################
0x20	<LR>+0x0020	# SPACE, left-right
0x21	<LR>+0x0021	# EXCLAMATION MARK, left-right
0x22	<LR>+0x0022	# QUOTATION MARK, left-right
0x23	<LR>+0x0023	# NUMBER SIGN, left-right
0x24	<LR>+0x0024	# DOLLAR SIGN, left-right
0x25	<LR>+0x0025	# PERCENT SIGN, left-right
0x26	<LR>+0x0026	# AMPERSAND, left-right
0x27	<LR>+0x0027	# APOSTROPHE, left-right
0x28	<LR>+0x0028	# LEFT PARENTHESIS, left-right
0x29	<LR>+0x0029	# RIGHT PARENTHESIS, left-right
0x2A	<LR>+0x002A	# ASTERISK, left-right
0x2B	<LR>+0x002B	# PLUS SIGN, left-right
0x2C	<LR>+0x002C	# COMMA, left-right
0x2D	<LR>+0x002D	# HYPHEN-MINUS, left-right
0x2E	<LR>+0x002E	# FULL STOP, left-right
0x2F	<LR>+0x002F	# SOLIDUS, left-right
0x30	0x0030	# DIGIT ZERO
0x31	0x0031	# DIGIT ONE
0x32	0x0032	# DIGIT TWO
0x33	0x0033	# DIGIT THREE
0x34	0x0034	# DIGIT FOUR
0x35	0x0035	# DIGIT FIVE
0x36	0x0036	# DIGIT SIX
0x37	0x0037	# DIGIT SEVEN
0x38	0x0038	# DIGIT EIGHT
0x39	0x0039	# DIGIT NINE
0x3A	<LR>+0x003A	# COLON, left-right
0x3B	<LR>+0x003B	# SEMICOLON, left-right
0x3C	<LR>+0x003C	# LESS-THAN SIGN, left-right
0x3D	<LR>+0x003D	# EQUALS SIGN, left-right
0x3E	<LR>+0x003E	# GREATER-THAN SIGN, left-right
0x3F	<LR>+0x003F	# QUESTION MARK, left-right
0x40	0x0040	# COMMERCIAL AT
0x41	0x0041	# LATIN CAPITAL LETTER A
0x42	0x0042	# LATIN CAPITAL LETTER B
0x43	0x0043	# LATIN CAPITAL LETTER C
0x44	0x0044	# LATIN CAPITAL LETTER D
0x45	0x0045	# LATIN CAPITAL LETTER E
0x46	0x0046	# LATIN CAPITAL LETTER F
0x47	0x0047	# LATIN CAPITAL LETTER G
0x48	0x0048	# LATIN CAPITAL LETTER H
0x49	0x0049	# LATIN CAPITAL LETTER I
0x4A	0x004A	# LATIN CAPITAL LETTER J
0x4B	0x004B	# LATIN CAPITAL LETTER K
0x4C	0x004C	# LATIN CAPITAL LETTER L
0x4D	0x004D	# LATIN CAPITAL LETTER M
0x4E	0x004E	# LATIN CAPITAL LETTER N
0x4F	0x004F	# LATIN CAPITAL LETTER O
0x50	0x0050	# LATIN CAPITAL LETTER P
0x51	0x0051	# LATIN CAPITAL LETTER Q
0x52	0x0052	# LATIN CAPITAL LETTER R
0x53	0x0053	# LATIN CAPITAL LETTER S
0x54	0x0054	# LATIN CAPITAL LETTER T
0x55	0x0055	# LATIN CAPITAL LETTER U
0x56	0x0056	# LATIN CAPITAL LETTER V
0x57	0x0057	# LATIN CAPITAL LETTER W
0x58	0x0058	# LATIN CAPITAL LETTER X
0x59	0x0059	# LATIN CAPITAL LETTER Y
0x5A	0x005A	# LATIN CAPITAL LETTER Z
0x5B	<LR>+0x005B	# LEFT SQUARE BRACKET, left-right
0x5C	<LR>+0x005C	# REVERSE SOLIDUS, left-right
0x5D	<LR>+0x005D	# RIGHT SQUARE BRACKET, left-right
0x5E	<LR>+0x005E	# CIRCUMFLEX ACCENT, left-right
0x5F	<LR>+0x005F	# LOW LINE, left-right
0x60	0x0060	# GRAVE ACCENT
0x61	0x0061	# LATIN SMALL LETTER A
0x62	0x0062	# LATIN SMALL LETTER B
0x63	0x0063	# LATIN SMALL LETTER C
0x64	0x0064	# LATIN SMALL LETTER D
0x65	0x0065	# LATIN SMALL LETTER E
0x66	0x0066	# LATIN SMALL LETTER F
0x67	0x0067	# LATIN SMALL LETTER G
0x68	0x0068	# LATIN SMALL LETTER H
0x69	0x0069	# LATIN SMALL LETTER I
0x6A	0x006A	# LATIN SMALL LETTER J
0x6B	0x006B	# LATIN SMALL LETTER K
0x6C	0x006C	# LATIN SMALL LETTER L
0x6D	0x006D	# LATIN SMALL LETTER M
0x6E	0x006E	# LATIN SMALL LETTER N
0x6F	0x006F	# LATIN SMALL LETTER O
0x70	0x0070	# LATIN SMALL LETTER P
0x71	0x0071	# LATIN SMALL LETTER Q
0x72	0x0072	# LATIN SMALL LETTER R
0x73	0x0073	# LATIN SMALL LETTER S
0x74	0x0074	# LATIN SMALL LETTER T
0x75	0x0075	# LATIN SMALL LETTER U
0x76	0x0076	# LATIN SMALL LETTER V
0x77	0x0077	# LATIN SMALL LETTER W
0x78	0x0078	# LATIN SMALL LETTER X
0x79	0x0079	# LATIN SMALL LETTER Y
0x7A	0x007A	# LATIN SMALL LETTER Z
0x7B	<LR>+0x007B	# LEFT CURLY BRACKET, left-right
0x7C	<LR>+0x007C	# VERTICAL LINE, left-right
0x7D	<LR>+0x007D	# RIGHT CURLY BRACKET, left-right
0x7E	0x007E	# TILDE
#
0x80
0x81
0x82
0x83
0x84
0x85
0x86
0x87
0x88
0x89
0x8A
0x8B
0x8C
0x8D
0x8E
0x8F
0x90
0x91
0x92
0x93
0x94
0x95
0x96
0x97
0x98
0x99
0x9A
0x9B
0x9C
0x9D
0x9E
0x9F
0xA0
0xA1
0xA2
0xA3
0xA4
0xA5
0xA6
0xA7
0xA8
0xA9
0xAA
0xAB
0xAC
0xAD
0xAE
0xAF
0xB0
0xB1
0xB2
0xB3
0xB4
0xB5
0xB6
0xB7
0xB8
0xB9
0xBA
0xBB
0xBC
0xBD
0xBE
0xBF
0xC0
0xC1
0xC2
0xC3
0xC4
0xC5
0xC6
0xC7
0xC8
0xC9
0xCA
0xCB
0xCC
0xCD
0xCE
0xCF
0xD0
0xD1
0xD2
0xD3
0xD4
0xD5
0xD6
0xD7
0xD8
0xD9
0xDA
0xDB
0xDC
0xDD
0xDE
0xDF
0xE0
0xE1
0xE2
0xE3
0xE4
0xE5
0xE6
0xE7
0xE8
0xE9
0xEA
0xEB
0xEC
0xED
0xEE
0xEF
0xF0
0xF1
0xF2
0xF3
0xF4
0xF5
0xF6
0xF7
0xF8
0xF9
0xFA
0xFB
0xFC
0xFD
0xFE
0xFF