
The XeTeX Companion
TeX meets OpenType and Unicode
Edited by Michel Goossens (CERN)
Work in progress. Version January 11, 2010
Please send your comments to michel.goossens@cern.ch
Michel Goossens (editor) and the various contributors (see below).
The copyright of the contributions extracted from documentation of the various packages (see below for details) remains with their respective authors. The current maintainer of this document is Michel Goossens.
Work history
January 2008: Initial version (from LGC2 supplementary material).
Spring 2008: Adapted material from Jonathan Kew's XeTeX manual and Will Robertson's fontspec manual.
January 2009: Adapted material from François Charette's arabxetex manual and Dian Yin's zhspacing manual.
July 2009: Added material contributed by Vafa Khalighi describing his bidi package.
August 2009: Added material about xeCJK, plus introduced corrections and clarifications suggested by Leo Ferres and Karel Píška.
January 2010: Added lots of corrections and a few suggestions for clarifications by Taylor Venable.
Contents
List of Figures
List of Tables
Preface
1 PostScript fonts and beyond
1.1 Font formats: a brief history
1.1.1 Adobe and its PostScript Type 1
1.1.2 TrueType fonts
1.1.3 Two competing technologies
1.1.4 The best of two worlds: OpenType
1.2 PostScript Type 1 and TrueType: two different approaches
1.2.1 Interoperability
1.3 Unicode: the universal character encoding
1.4 OpenType
1.4.1 OpenType tables
1.4.2 OpenType features
1.4.3 OpenType support today
1.4.4 Interrogating OpenType fonts
2 XeTeX: TeX meets OpenType and Unicode
2.1 XeTeX: a historical introduction and some basics
2.1.1 A brief history
2.1.2 XeTeX: basic principles
2.2 XeTeX: typesetting with glyphs, characters and fonts
2.2.1 Accessing fonts with fontconfig
2.2.2 Specifying character codes
2.2.3 Hyphenation
2.2.4 Font management: the basics
2.2.5 Font mappings using TECkit
2.2.6 Line breaks and justification
2.2.7 Unicode Character/glyph model
2.2.8 Using OpenType via ICU Layout
2.2.9 XeTeX's hyphenation support
2.2.10 Running xetex
2.3 Supplementary commands introduced by XeTeX
2.3.1 Specifying languages and scripts
2.3.2 Specifying optional features
2.3.3 Support for pseudo-features
2.3.4 Commands extracting information from OpenType fonts
2.3.5 Maths fonts
2.3.6 Encodings, linebreaking, etc.
2.3.7 Graphics and pdfTeX-related commands
2.4 fontspec
2.4.1 Usage
2.4.2 Latin Modern defaults
2.4.3 Maths fiddling
2.4.4 A first overview
2.4.5 Font selection
2.4.6 Default font families
2.5 XeTeX and other engines
3 Handling all those scripts
3.1 Writing systems
3.1.1 Basic terminology
3.1.2 History of writing systems
3.1.3 Types of writing systems
3.1.4 Language Resources
3.1.5 Freely available Unicode encoded fonts
3.1.6 Directionality
3.1.7 Writing systems on computers
3.2 Bidirectional typesetting
3.2.1 Using The bidi Package
3.2.2 Basic Direction Switching
3.2.3 Typesetting Short RTL and LTR texts
3.2.4 Multicolumn Typesetting
3.2.5 More peculiarities for RTL typesetting
3.2.6 Tabular material in RTL mode
3.3 Languages using the Arabic alphabet
3.3.1 ArabTeX: Arabic typography with TeX
3.3.2 ArabXeTeX: Arabic typography with XeTeX
3.3.3 Arabic presentation forms
3.4 Typesetting Chinese
3.4.1 The xeCJK Package
3.4.2 The zhspacing package
3.5 Examples of the use of Unicode
3.5.1 Unicode fonts and editors
3.5.2 Examples of Unicode texts
ch-front.tex,v: 2.02 2010/01/10
4 Unicode mathematics
4.1 Unicode for handling math across platforms and applications
4.2 XeTeX handling mathematics fonts
Index of Commands and Concepts
People
List of Figures
1.1 Using OpenType's advanced typographic features in Adobe InDesign
1.2 OpenType Unicode support in OpenOffice
1.3 Microsoft's Fonts Extension
2.1 Complexities when dealing with various languages
2.2 Scripts used in various parts of the world
2.3 Asian scripts
2.4 List of features for the scripts and languages supported by the Microsoft Arial and Adobe Minion fonts
3.1 Writing systems used in the world today
3.2 Examples of six Arabic calligraphic styles
List of Tables
2.1 Mathematics symbol types
3.1 Indic consonant-vowel combinations in various Indic abugidas
3.2 ArabTeX's input conventions for Arabic and Persian
3.3 All arabxetex input conventions
Preface
This free booklet describes XeTeX and its XeLaTeX variant. After an introduction to the OpenType and Unicode technologies, it describes how XeTeX extends the TeX engine to use OpenType fonts directly and optimally, and to handle Unicode-encoded sources.
Various LaTeX packages have been developed recently to take advantage of XeTeX's new functionality, and these are described next.
This compilation of tools has been written in close collaboration with the authors: Jonathan Kew (XeTeX development), Will Robertson (fontspec and unicode-math), François Charette (arabxetex), and Dian Yin (zhspacing). Corrections and feedback have also been received from Adam Buchbinder, Leo Ferres, Rik Kabel, and Karel Píška.
Comments are welcome and can be addressed to michel.goossens@cern.ch.
Michel Goossens
January 2010
CHAPTER 1
PostScript fonts and beyond
1.1 Font formats: a brief history
1.2 PostScript Type 1 and TrueType: two different approaches
1.3 Unicode: the universal character encoding
1.4 OpenType
In this chapter we look at the most basic type of graphical object in documents: the characters that form the words. Character shapes (glyphs) are not a direct part of the TeX system; all TeX wants to know about them is some metric information, such as their width or height. It is the task of the post-processing stage (the backend of pdfTeX or a device driver, such as dvips, which reads the .dvi file output by TeX) to produce the actual graphical representation of the page. For this stage, information about the actual shapes of the characters is needed, and this information is stored in so-called fonts (collections of characters), for which many different storage formats exist. Thus, in principle, any existing font can be used with TeX, provided that the metric information TeX needs is available or can be generated, and that a procedure exists that understands the format in which the font is stored and can insert it into the output file.
Donald Knuth developed a companion program to TeX, MetaFont, for generating fonts to be used with TeX (Chapter 3 of The LaTeX Graphics Companion looked briefly at MetaFont's drawing capabilities). For quite some time only fonts designed with MetaFont were available to TeX users, with the result that TeX or LaTeX documents had an easily identified look and feel, mainly a result of the use of the Computer Modern fonts. Given that the TeX community is very small compared to that of other typesetting systems, very few font designers have produced fonts in MetaFont. Therefore, access for TeX engines to the literally thousands of fonts available commercially in other formats, in particular PostScript, TrueType, and, more recently, OpenType, has become a must.
Although at the beginning it was quite difficult to integrate PostScript fonts into LaTeX packages, the release of LaTeX2ε and its new font selection scheme (NFSS, see Chapter 7 of [5]) made accessing the large set of PostScript fonts more straightforward. Nowadays, documents routinely combine TeX's superior typesetting quality with the professionally designed typefaces available, mainly in PostScript but also in TrueType and OpenType. The current chapter will introduce you to solutions to achieve this in a convenient way.
After a historical overview of modern font technologies, including a brief description of their respective technical capabilities, we take a closer look at the basic issues concerned with typesetting and how TeX and PostScript, working together, address this problem (how metric information is handled, the different types of TeX and PostScript fonts, how they are encoded, i.e., how one can access individual characters of a font, etc.). We then explain how you can use the basic PostScript fonts, as they are defined in the PSNFSS system (a collection of small packages and accompanying files for LaTeX, which makes it easy to use a large number of common PostScript fonts out of the box), and how to easily download and install a few instances of freely available fonts. We extend the discussion to where to download and install the LaTeX support files for commercially available fonts that you might have bought. Since many LaTeX users have de facto access to a lot of TrueType fonts that come with their operating system, we devote the next section to the use of TrueType fonts with pdflatex, in particular how one can use a large Unicode TrueType font for typesetting in many different scripts and languages. We are then ready to discuss a few recent LaTeX packages which take advantage of the enriched possibilities of the OpenType technology. We end the chapter with a discussion of Fontname, also known as the Berry font naming scheme, which is important to uniquely identify and handle all LaTeX support files for the large number of fonts that are available on current operating systems.
1.1 Font formats: a brief history
The current main font formats are PostScript Type 1 (Type 1), TrueType (TT), and OpenType (OT), an integrated superset of the first two. All three are based on font outline technologies, are multi-platform, and have their technical specifications openly available. These formats can be used on any recent computer platform, and their character outlines (glyphs) are described mathematically as functions operating on points, lines and curves. The character representations are resolution independent and can be scaled to any size. These technologies implement hinting by associating additional information with each character to help the rasterization engine optimize its rendering on any given output device.
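Because outlines are stored in abstract font units, rendering them at a given size and resolution is, before hinting, just a scaling step. The Python sketch below converts outline coordinates to device pixels; the 1000 units per em (a value typical of Type 1 and CFF fonts, while many TrueType fonts use 2048) and the sample points are assumptions for illustration, and a real rasterizer additionally applies hinting and scan conversion.

```python
def units_to_pixels(units, size_pt, dpi, units_per_em=1000):
    """Scale a coordinate from font units to device pixels.

    One em equals size_pt points and one point is 1/72 inch, so
    pixels = units / units_per_em * size_pt * dpi / 72.
    """
    return units * size_pt * dpi / (72 * units_per_em)


def scale_outline(points, size_pt, dpi, units_per_em=1000):
    """Scale every (x, y) outline point; only the size changes, not the shape."""
    return [
        (units_to_pixels(x, size_pt, dpi, units_per_em),
         units_to_pixels(y, size_pt, dpi, units_per_em))
        for x, y in points
    ]


# Hypothetical outline points (in font units) of a simple triangular glyph.
triangle = [(0, 0), (500, 700), (1000, 0)]
print(scale_outline(triangle, size_pt=12, dpi=72))   # screen-like resolution
print(scale_outline(triangle, size_pt=12, dpi=600))  # printer-like resolution
```

The same outline serves both devices; only the scale factor differs, which is what "resolution independent" means in practice.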
1.1.1 Adobe and its PostScript Type 1
When Adobe launched PostScript in 1984, it supported two different font formats: Type 1,¹ the more sophisticated one, with support for hinting and data compression, and Type 3, a more general (almost all PostScript graphics operators are allowed) but less optimized variant. At first Adobe did not publish the specification of its PostScript Type 1 format (the Type 3 specification was public), which helped Adobe take a large part of the commercial typography market but upset the other font foundries.
Apple adopted PostScript as the page description language for its LaserWriter printer in 1985. Soon other high-end imagesetting machines also adopted PostScript as their native language. At about the same time, the introduction of affordable desktop publishing software, such as PageMaker and FreeHand, set off a revolution in page layout technology, and PostScript backends appeared for most graphics programs, thus adding to the potential market for professional PostScript Type 1 fonts. Because of its reliability, its wide selection of available fonts, its clever rasterizing engine and its superior hinting mechanism, PostScript has historically been the preferred font format of professional designers, publishers and print shops.
Concurrently, Adobe had developed an interactive version of PostScript, called Display PostScript, that ran (somewhat slowly) on personal computers to allow displaying PostScript data on-screen. Although some computer manufacturers agreed to take out (and pay for) software licences, Apple and Microsoft were quite unwilling to pay the royalties requested by Adobe and, moreover, to hand control over a vital part of their operating systems to Adobe.
In the first part of the 1990s Adobe also developed the PostScript Type 1 multiple master (MM) format as an extension of PostScript Type 1. Essentially, it allows two (or more) design variations to be encoded on a given design axis (such as weight, width, or optical size). Afterwards, any in-between state (instance) may be generated by the user as required.²
¹ See http://partners.adobe.com/public/developer/en/font/T1_SPEC.PDF.
xetex-opentype.tex,v: 2.01 2009/06/15
1.1.2 TrueType fonts
The major system software vendors (Apple, Microsoft, IBM) had been thinking about scalable font technology support at the level of their respective operating systems, since they realized that it would guarantee much better screen display than pre-generated bitmaps, which only look good at their design sizes and are unacceptably jagged at all others. For instance, in the late 1980s Apple had developed an in-house scalable font technology, Royal, later renamed TrueType. The TrueType specification was public,³ and native TrueType support appeared in Apple's Mac System 7 in 1991 and in Microsoft's Windows 3.1 in 1992.
TrueType fonts use a different outline model from PostScript, and the approach to hinting is also different. The font instances contain both screen and printer font data in a single component, which makes the fonts easy to install. Although TrueType fonts support Unicode and can theoretically contain over 65,000 characters, they rarely feature more than some 220 characters. Moreover, TrueType font formats are platform-dependent.
1.1.3 Two competing technologies
Adobe reacted to the advent of TrueType by publishing the PostScript Type 1 font format specification [1] in 1990. Around the same time, it introduced the Adobe Type Manager (ATM) software, which scales PostScript Type 1 fonts for screen display and supports imaging on non-PostScript printers.
Thus, by the end of the 1990s there were two widely used outline font specifications: TrueType, built into the operating systems used by most desktop computers, and PostScript Type 1, the de facto standard for the graphic arts and the publishing industry. Moreover, as time went by, the practical differences began to blur. On the one hand, support for TrueType became standard in PostScript 3, while on the other hand, besides native TrueType support, PostScript Type 1 rasterizing technology was incorporated into Windows 2000, Windows XP, and Mac OS X.
1.1.4 The best of two worlds: OpenType
The OpenType⁴ font format was jointly developed by Adobe and Microsoft to combine the best features of the TrueType and PostScript Type 1 technologies. It was first presented in 1996, and its use and support have been steadily increasing since about 2000.
OpenType fonts contain both the screen and printer font data in a single component. The OpenType format can contain either TrueType or PostScript font data. It supports expanded character sets (up to 65,000 glyphs) and special typographic features. These may include various versions of figures (tabular, old-style, lining), small caps, ligatures, ordinals, and other extras. While OpenType allows type designers to build complex fonts, not many fonts take advantage of these possibilities. Most OpenType fonts available today are simply converted PostScript fonts, limited to 220 characters in a set.
OpenType fonts are platform independent and can thus be used on all operating systems.
² The technology never really took off, and around 2000 Adobe stopped developing multiple master fonts, since most applications cannot handle them and, for the large majority of users, it often makes more economic sense to buy a font set as multiple separate fonts. Adobe now concentrates on releasing OpenType fonts to replace their multiple master equivalents (e.g., the Minion and Myriad typefaces).
³ See, e.g., http://developer.apple.com/fonts/ and http://www.microsoft.com/typography.
⁴ See Adobe's Web pages http://store.adobe.com/type/opentype/main.html and http://blogs.adobe.com/typblography/TT%20PS%20OpenType.pdf, or Microsoft's Web page http://www.microsoft.com/typography/OTSPEC/default.htm.
1.2 PostScript Type 1 and TrueType: two different approaches
TrueType and PostScript Type 1 fonts use different mathematical representations to describe the curves defining the font outlines.⁵ OpenType, being a superset, can have either kind of outline.
TrueType describes its curves by quadratic B-splines, while PostScript Type 1 uses cubic Bézier curves. In practice, this means that the shapes of real-world fonts tend to take more points in TrueType, even though the kind of mathematics used to describe the curves is simpler. Any quadratic spline can be converted to a cubic spline with essentially no loss. A cubic spline can be converted to a quadratic one with arbitrary precision, but there will be a slight loss of accuracy in most cases. Thus it is easy to convert TrueType outlines to PostScript Type 1 outlines (the Type 42 PostScript font format is a PostScript wrapper around a TrueType font for use in PostScript interpreters), but harder to do the reverse.
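The asymmetry between the two conversions can be checked numerically. The Python sketch below elevates a quadratic segment (TrueType style) to an equivalent cubic (Type 1 style), which is exact, and approximates a cubic by a chain of quadratics using one common heuristic: each quadratic's control point is placed at (3(C1+C2) - (C0+C3))/4. The sampled error shrinks quickly as more pieces are used but does not vanish for a general cubic. The heuristic and the sample curves are assumptions for illustration, not the method of any particular font converter.

```python
import math


def lerp(a, b, t):
    """Linear interpolation between points a and b."""
    return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t)


def quad_point(p0, p1, p2, t):
    """Evaluate a quadratic Bezier segment at parameter t."""
    u = 1 - t
    return (u * u * p0[0] + 2 * u * t * p1[0] + t * t * p2[0],
            u * u * p0[1] + 2 * u * t * p1[1] + t * t * p2[1])


def cubic_point(c0, c1, c2, c3, t):
    """Evaluate a cubic Bezier segment at parameter t."""
    u = 1 - t
    a, b, c, d = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
    return (a * c0[0] + b * c1[0] + c * c2[0] + d * c3[0],
            a * c0[1] + b * c1[1] + c * c2[1] + d * c3[1])


def quad_to_cubic(p0, p1, p2):
    """Exact degree elevation: the cubic traces the same curve."""
    return (p0, lerp(p0, p1, 2 / 3), lerp(p2, p1, 2 / 3), p2)


def split_cubic(c0, c1, c2, c3, t):
    """Split a cubic at parameter t (de Casteljau's algorithm)."""
    a, b, c = lerp(c0, c1, t), lerp(c1, c2, t), lerp(c2, c3, t)
    d, e = lerp(a, b, t), lerp(b, c, t)
    m = lerp(d, e, t)
    return (c0, a, d, m), (m, e, c, c3)


def cubic_to_quads(c0, c1, c2, c3, pieces):
    """Approximate a cubic by `pieces` quadratics of equal parameter length."""
    quads, rest = [], (c0, c1, c2, c3)
    for remaining in range(pieces, 0, -1):
        if remaining > 1:
            head, rest = split_cubic(*rest, 1 / remaining)
        else:
            head = rest
        h0, h1, h2, h3 = head
        ctrl = ((3 * (h1[0] + h2[0]) - (h0[0] + h3[0])) / 4,
                (3 * (h1[1] + h2[1]) - (h0[1] + h3[1])) / 4)
        quads.append((h0, ctrl, h3))
    return quads


def max_error(cubic, pieces, samples=50):
    """Largest sampled distance between a cubic and its quadratic chain."""
    worst = 0.0
    for k, quad in enumerate(cubic_to_quads(*cubic, pieces)):
        for j in range(samples + 1):
            s = j / samples
            qx, qy = quad_point(*quad, s)
            cx, cy = cubic_point(*cubic, (k + s) / pieces)
            worst = max(worst, math.hypot(qx - cx, qy - cy))
    return worst


quad = ((0, 0), (50, 100), (100, 0))
cubic = quad_to_cubic(*quad)                 # lossless direction
arch = ((0, 0), (0, 100), (100, 100), (100, 0))  # a general cubic
for n in (1, 2, 8):
    print(n, "pieces, max error:", round(max_error(arch, n), 4))
```

For the arch the error falls roughly with the cube of the number of pieces, which is why converters can reach any desired precision at the cost of extra points.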
The approach to hinting also differs between the two technologies. PostScript Type 1 takes a declarative approach⁶ and lets a smart PostScript interpreter do the work. It tells the rasterizer which features ought to be controlled, and the rasterizer interprets these using its own intelligence to decide how to do it. Therefore, when the PostScript interpreter is upgraded, the rasterization can be improved.
On paper, the hinting potential of TrueType⁷ should be superior to that of PostScript Type 1 fonts, since TrueType hints can do all that PostScript Type 1 hints can, and more. Indeed, TrueType takes an algorithmic or programming approach and uses the very flexible and complete instruction set of the TrueType language. Thus TrueType puts all the hinting information into the font to control exactly how it will appear when rasterized. TrueType interpreters can be quite dumb and limit themselves to simply executing what they have been instructed to do. Thus, although a TrueType font developer can fine-tune what happens when a font is rasterized under different conditions, it requires serious effort, expertise, and high-end tools to actually take advantage of this greater hinting potential. As a result, high-quality TrueType fonts that exploit the true potential of TrueType hinting are quite rare. Moreover, when complex hinting is used, the introduction of a new rasterizer might require major changes to the TrueType code in order to display existing fonts optimally.
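To see what hinting is trying to achieve at small sizes, consider two stems of identical width drawn at different positions. The toy Python sketch below (hypothetical coordinates, an assumed 1000 units per em, rendered at 12 pixels per em) compares naive rounding of each stem edge with a simple grid-fitting rule that rounds the stem width once and then positions it. It only illustrates the goal of hinting; it is neither TrueType bytecode nor the semantics of Type 1 hints.

```python
UNITS_PER_EM = 1000  # assumption for this example
PPEM = 12            # pixels per em, roughly 12 pt on a 72 dpi screen


def to_px(units):
    """Convert font units to (fractional) pixels."""
    return units * PPEM / UNITS_PER_EM


def naive(stem):
    """Round each edge independently: equal stems may get unequal widths."""
    left, right = stem
    return round(to_px(left)), round(to_px(right))


def grid_fit(stem):
    """Round the width once (at least 1 px), then snap the left edge."""
    left, right = stem
    width = max(1, round(to_px(right - left)))
    start = round(to_px(left))
    return start, start + width


stems = [(100, 185), (291, 376)]  # both stems are 85 units wide
for stem in stems:
    print(stem, "naive:", naive(stem), "grid-fitted:", grid_fit(stem))
```

With naive rounding the second stem comes out twice as thick as the first; the grid-fitted version renders both with the same one-pixel width.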
PostScript Type 1 needs two separate files for its font data: one for the character outlines (.pfb), and the other for the metrics data (.afm on Linux, .pfm on Windows), containing character widths, kerning pairs, and a description of how to construct composites. TrueType fonts have all the data in a single file. Nevertheless, this single TrueType font file is often twice as large as the two PostScript Type 1 files combined, due to the extensive hinting instructions present in TrueType fonts.
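The AFM metrics file is plain text, which makes it easy to inspect. The minimal Python sketch below parses the character-metric lines (C ... ; WX ... ; N ... ;) and the KPX kerning pairs of an AFM file and computes the advance width of a glyph sequence. The sample AFM data are made-up values; a real parser would also handle composites, other width keys, and the binary PFM format used on Windows.

```python
SAMPLE_AFM = """\
StartFontMetrics 2.0
FontName Example-Regular
StartCharMetrics 3
C 65 ; WX 722 ; N A ; B 15 0 706 674 ;
C 84 ; WX 611 ; N T ; B 17 0 593 662 ;
C 86 ; WX 722 ; N V ; B 16 -11 697 662 ;
EndCharMetrics
StartKernData
StartKernPairs 2
KPX A V -135
KPX V A -135
EndKernPairs
EndKernData
EndFontMetrics
"""


def parse_afm(text):
    """Return (widths, kerns): glyph name -> WX, (left, right) -> KPX value."""
    widths, kerns = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("C "):
            # A character-metric line is a ';'-separated list of "KEY values".
            fields = {}
            for part in line.split(";"):
                tokens = part.split()
                if tokens:
                    fields[tokens[0]] = tokens[1:]
            widths[fields["N"][0]] = float(fields["WX"][0])
        elif line.startswith("KPX "):
            _, left, right, value = line.split()
            kerns[(left, right)] = float(value)
    return widths, kerns


def advance_width(glyphs, widths, kerns):
    """Sum of glyph widths plus pair kerning, in AFM units (1/1000 em)."""
    total = sum(widths[g] for g in glyphs)
    total += sum(kerns.get(pair, 0.0) for pair in zip(glyphs, glyphs[1:]))
    return total


widths, kerns = parse_afm(SAMPLE_AFM)
w = advance_width(["A", "V", "A"], widths, kerns)
print(w, "units, i.e.", w * 10 / 1000, "pt at 10 pt")
```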
Generally speaking, PostScript Type 1 fonts have some advantages simply from being the longer-established standard, especially for serious graphic arts work. Service bureaus are standardized on, and have large investments in, PostScript Type 1 fonts. Most of the fonts which have expert sets of old-style figures, extra ligatures, true small capitals and the like are in that format.
1.2.1 Interoperability
In principle one can mix TrueType and PostScript Type 1 fonts, with the caveat that the TrueType and PostScript Type 1 instances of a font must not have exactly the same names on a given operating system. Indeed, fonts with identical menu names or PostScript font names confuse the operating system or the application programs, often with unpredictable results.
Also, if using Windows, one may find that metrically similar PostScript Type 1 fonts get substituted for the Windows TrueType system fonts at output time: Times New Roman becomes Times Roman, and Arial becomes Helvetica. Although the basic spacing of the substituted fonts is identical, their kerning pairs are not. This can cause text to reflow (i.e., line endings in a paragraph may differ) when one switches between two almost identical fonts and the typesetting program (e.g., TeX) supports kerning pairs. Thus care must be taken to ensure that you use the correct font throughout the complete production chain.
⁵ See http://www.truetype.demon.co.uk/articles/ttvst1.htm.
⁶ See David Lemon's Basic Type 1 hinting (http://www.pyrus.com/downloads/hinting.pdf).
⁷ See the URL http://www.microsoft.com/typography/hinting/tutorial.htm, Vincent Connare's Basic hinting philosophies and TrueType instructions.
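The effect of differing kerning pairs on line endings can be reproduced in a few lines of Python. The sketch below uses two hypothetical metric sets with identical advance widths but different kerning, and a simple greedy line-filling rule (not TeX's paragraph-based line-breaking algorithm); all widths, kerns, and the line length are made-up values in 1/1000 em.

```python
# Made-up metrics (1/1000 em): identical advance widths, different kerning.
WIDTHS = {"A": 700, "V": 700, "T": 600, "o": 500}
KERN_ORIGINAL = {("A", "V"): -80, ("V", "A"): -80, ("T", "o"): -80}
KERN_SUBSTITUTE = {}  # same widths, but no kerning pairs
SPACE, LIMIT = 250, 3300
WORDS = ["AVA", "To", "AVA"]


def word_width(word, kern):
    """Advance widths of the letters plus kerning between adjacent letters."""
    pairs = zip(word, word[1:])
    return sum(WIDTHS[c] for c in word) + sum(kern.get(p, 0) for p in pairs)


def greedy_lines(words, kern):
    """Fill each line with as many words as fit within LIMIT."""
    lines, current, width = [], [], 0
    for word in words:
        w = word_width(word, kern)
        if current and width + SPACE + w > LIMIT:
            lines.append(" ".join(current))
            current, width = [word], w
        elif current:
            current.append(word)
            width += SPACE + w
        else:
            current, width = [word], w
    if current:
        lines.append(" ".join(current))
    return lines


print(greedy_lines(WORDS, KERN_ORIGINAL))
print(greedy_lines(WORDS, KERN_SUBSTITUTE))
```

The kerned words are slightly narrower, so "To" still fits on the first line; with the substitute metrics it is pushed to the next line and the paragraph gains a line.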
1.3 Unicode: the universal character encoding
Unicode is an international standard⁸ for representing characters, using multi-byte, platform-independent encodings and covering all the world's languages (including some artificial ones, such as mathematical symbols and the International Phonetic Alphabet). Unicode deals with characters rather than glyphs. That is, it only deals with semantic rather than typographic distinctions (with a few exceptions for compatibility with existing standards). Therefore there is no place for glyph variants, such as unusual ligatures, old-style numbers, or small caps, within Unicode itself; the Unicode standard assumes that such distinctions will be made elsewhere. Consequently, font formats which support such distinctions, such as OpenType (see Section 1.4), need to be layered on top of Unicode. Alan Wood maintains a useful website (http://www.alanwood.net/unicode/) which describes numerous resources for Unicode and multilingual support in HTML, fonts, web browsers and other applications.
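The distinction between characters and their encoded form can be explored with Python's standard unicodedata module. The sketch below lists code points, character names, and sizes in two of Unicode's encoding forms (UTF-8 and UTF-16), and shows that the "fi" ligature, one of the compatibility exceptions mentioned above, normalizes back to the plain letters f and i. The sample characters are arbitrary choices.

```python
import unicodedata


def describe(text):
    """Code point, character name, UTF-8 bytes and UTF-16 code units per character."""
    rows = []
    for ch in text:
        rows.append((
            f"U+{ord(ch):04X}",
            unicodedata.name(ch),
            len(ch.encode("utf-8")),
            len(ch.encode("utf-16-be")) // 2,  # number of 16-bit code units
        ))
    return rows


# Latin, accented Latin, Greek, CJK, and a mathematical alphanumeric symbol.
for row in describe("A\u00e9\u03b1\u4e2d\U0001D538"):
    print(row)

# U+FB01 exists only for compatibility; NFKC maps it back to "f" + "i",
# leaving the choice of a ligature glyph to the font.
fi_lig = "\uFB01"
print(unicodedata.name(fi_lig), "->", list(unicodedata.normalize("NFKC", fi_lig)))
```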
Most current operating systems (Linux, Mac OS X and Windows XP) have direct support for Unicode at the basic system level. For instance, apart from switching between different language keyboards, these operating systems offer means of directly accessing any Unicode character in any font (e.g., on Mac OS X via the Character Palette, and on Microsoft Windows XP or Vista via the Character Map utility in System Tools in the Accessories submenu).
1.4 OpenType
The OpenType font format was developed jointly by Microsoft and Adobe as an extension of the TrueType font format. OpenType addresses the following goals:
- support for PostScript Type 1 outlines and hints;
- support for TrueType tables and hints;
- support for advanced typographic features by way of new tables for glyph positioning and substitution;
- support for multiple platforms;
- support for international character sets by using Unicode;
- better protection for font data;
- smaller file sizes to make font distribution more efficient.
Sometimes OpenType fonts are referred to as TrueType Open v.2.0 fonts. PostScript Type 1 data included in OpenType fonts may be directly rasterized or converted to the TrueType outline format for rendering, depending on which rasterizers have been installed in the host operating system. Users do not need to know which outlines are actually present. One can say that OpenType encloses TrueType and PostScript Type 1 in a common wrapper. OpenType tables include the current TrueType tables plus some additional tables for advanced typographic features. The representation of PostScript Type 1 font software in an OpenType font uses Adobe's Compact Font Format (CFF) with Type 2 charstrings, which is a more compact representation of the same information in PostScript Type 1 (a gain of about a factor of two, on average, when no glyphs and features are added).
⁸ The current version is 5.0 [7]; it has been defined by the members of the Unicode Consortium, which includes major computer corporations, software producers, database vendors, research institutions, international agencies, various user groups, and interested individuals; see http://www.unicode.org.
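Although users need not care, the kind of outline an OpenType font carries is recorded in the first four bytes of the file (the sfnt version): the tag OTTO indicates CFF (PostScript) outlines, while 0x00010000 indicates TrueType outlines. The Python sketch below checks this tag; the additional 'true' (legacy Apple TrueType) and 'ttcf' (font collection) cases and the file name in the usage comment are illustrative assumptions.

```python
def outline_flavor(header: bytes) -> str:
    """Classify an sfnt-based font file from its first four bytes."""
    tag = header[:4]
    if tag == b"OTTO":
        return "CFF (PostScript outlines, Type 2 charstrings)"
    if tag in (b"\x00\x01\x00\x00", b"true"):
        return "TrueType (quadratic outlines)"
    if tag == b"ttcf":
        return "font collection (inspect each member font)"
    raise ValueError(f"not an OpenType/TrueType header: {tag!r}")


def font_file_flavor(path: str) -> str:
    """Read only the first four bytes of a font file and classify it."""
    with open(path, "rb") as f:
        return outline_flavor(f.read(4))


# Usage with a real file (hypothetical name):
#   print(font_file_flavor("SomeFont-Regular.otf"))
print(outline_flavor(b"OTTO\x00\x0b"))
```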
The OpenType format supports features equivalent to most of the advanced features of existing TrueType and PostScript formats, such as Adobe's CID technology for Asian fonts, and extended multilingual character sets. However, multiple master fonts are not part of the OpenType specification. OpenType fonts may contain up to 65,535 glyphs, which allows a single font file to contain many nonstandard glyphs, such as old-style figures, true small capitals, fractions, swashes, superiors, inferiors, titling letters, contextual and stylistic alternates, and a full range of ligatures. OpenType fonts thus offer rich linguistic support combined with advanced typographic control. Feature-rich Adobe OpenType fonts are often distinguished by the word "Pro" being part of the font name. OpenType fonts can be installed and used alongside PostScript Type 1 and TrueType fonts.
OpenType, which is based on Unicode, significantly simplifies font management and the publishing process by ensuring that all of the required glyphs for a document are contained in one cross-platform font file throughout the workflow.
The text model of OpenType is that applications store text using the underlying Unicode characters, and apply formatting to get at the specific desired glyphs. In addition to the Unicode mapping of default glyphs, the font has OpenType layout tables which tell the layout engine which glyphs to use when other forms are desired instead, such as small caps or swashes. These tables also specify which glyphs should turn into ligatures, or when a script font needs different glyphs for a letter depending on whether it is at the beginning, middle or end of a word, or is a word by itself.
Keeping the transformations distinct from the underlying text enables table-driven automatic glyph substitution, which does not need to be one-for-one: one glyph can be substituted for several (such as the ﬃ ligature, which remembers that the underlying text contains the characters f-f-i for searching), or multiple glyphs can be substituted for a single one. Glyph substitution can be context sensitive, or it can be activated by explicit user demand. This feature might not appear essential for Latin-based languages, such as Spanish and English, but it becomes mandatory for proper typesetting of languages that use complex scripts, such as Arabic or the Indic languages; in Arabic, for instance, letters taking different forms according to their position in the word is a basic part of how the script works.
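The positional choice described above can be sketched as a small contextual analysis. The Python example below decides, for each letter of an Arabic word, whether the isol, init, medi or fina form (the OpenType feature tags for these forms) is needed. It uses a deliberately reduced joining table: a handful of letters that join only to the preceding letter are listed, and every other character is assumed to be dual-joining. Real shaping engines use the full joining data of the Unicode Character Database and also handle transparent marks and non-joining characters, which this sketch does not.

```python
# Reduced joining table (assumption): these letters join only to the preceding
# letter (Unicode joining type R); everything else is treated as dual-joining.
RIGHT_JOINING = {
    "\u0627",  # ALEF
    "\u062F",  # DAL
    "\u0630",  # THAL
    "\u0631",  # REH
    "\u0632",  # ZAIN
    "\u0648",  # WAW
}


def joins_to_following(ch):
    """Only dual-joining letters connect to the letter after them."""
    return ch not in RIGHT_JOINING


def positional_forms(word):
    """Return the form tag (isol/init/medi/fina) needed for each letter.

    In this reduced model every letter can connect to the letter before it,
    so a join depends only on whether the earlier letter of the pair is
    dual-joining.
    """
    forms = []
    for i, ch in enumerate(word):
        prev_joins = i > 0 and joins_to_following(word[i - 1])
        next_joins = i + 1 < len(word) and joins_to_following(ch)
        if prev_joins and next_joins:
            forms.append("medi")
        elif next_joins:
            forms.append("init")
        elif prev_joins:
            forms.append("fina")
        else:
            forms.append("isol")
    return forms


# bayt (house), bab (door), dar (house)
for word in ("\u0628\u064A\u062A", "\u0628\u0627\u0628", "\u062F\u0627\u0631"):
    print(word, positional_forms(word))
```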
OpenType layout features can be used to position or substitute glyphs. For any character, there is a default glyph and positioning behavior. The application of layout features to one or more characters may change the positioning, or substitute a different glyph.
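Positioning features act on the same glyph runs. The sketch below uses hypothetical advance widths and a hypothetical pair-kerning table, both in font units, to show how a pair adjustment shifts every glyph that follows it. Values are converted to points with the font's units-per-em value, which a real font stores in its head table; everything else here is illustrative.

```python
def position_glyphs(glyphs, advances, kerning, units_per_em, size_pt):
    """Return (x positions in points, total width in points).

    advances: default advance width per glyph, in font units
    kerning:  pair adjustments in font units, e.g. {("A", "V"): -80}
    Both tables are hypothetical stand-ins for real font data.
    """
    x = 0  # running pen position in font units
    positions = []
    for i, g in enumerate(glyphs):
        positions.append(x * size_pt / units_per_em)
        x += advances[g]
        if i + 1 < len(glyphs):
            # A pair adjustment moves every following glyph.
            x += kerning.get((g, glyphs[i + 1]), 0)
    return positions, x * size_pt / units_per_em


advances = {"A": 600, "V": 600}
print(position_glyphs(["A", "V"], advances, {}, 1000, 10))
# ([0.0, 6.0], 12.0)
print(position_glyphs(["A", "V"], advances, {("A", "V"): -80}, 1000, 10))
# ([0.0, 5.2], 11.2)
```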
There are several advantages of using a large OpenType font over currently available expert sets and alternates. First, one only has to deal with one font file, rather than being cluttered with a whole set of supplemental fonts. Second, there can be kerning between glyphs that might otherwise have been in separate fonts. Finally, the user can turn on ligatures, small caps, or old-style figures as easily as applying bold or italic styling, and without switching fonts.
Historically, some of the highest-quality typefaces have included different designs for different print sizes. Rather than using its multiple master technology, many of Adobe's OpenType families now include four optical size variations: caption, regular, subhead and display. Called "Opticals", these variations have been optimized for use at specific point sizes. Although the exact intended sizes vary by family, the general size ranges are: caption (6-8 point), regular (9-13 point), subhead (14-24 point) and display (25-72 point).
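The size ranges above amount to a simple lookup. The Python sketch below picks a variant for a given point size; the boundaries are the general ranges quoted in the text, not those of any specific family, and the handling of fractional and out-of-range sizes is an assumption.

```python
# General ranges quoted above, in points; real families differ.
OPTICAL_RANGES = [
    ("caption", 6, 8),
    ("regular", 9, 13),
    ("subhead", 14, 24),
    ("display", 25, 72),
]


def optical_variant(size_pt):
    """Pick the optical size variant for size_pt.

    Sizes between two ranges (e.g. 8.5pt) go to the next larger range;
    sizes below 6pt or above 72pt are clamped to caption or display.
    """
    if size_pt < OPTICAL_RANGES[0][1]:
        return OPTICAL_RANGES[0][0]
    for name, _low, high in OPTICAL_RANGES:
        if size_pt <= high:
            return name
    return OPTICAL_RANGES[-1][0]


print([optical_variant(s) for s in (7, 10, 18, 36)])
# ['caption', 'regular', 'subhead', 'display']
```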
1.4.1 OpenType tables
OpenType font files contain tables that hold either TrueType or PostScript outline font data, and the data in these tables are used by rendering programs to render the TrueType or PostScript glyphs. Moreover, some of the data is independent of the particular outline format used.
Every OpenType font must contain the following required tables.
The structure of an OpenType font file is described at the URL http://www.microsoft.com/typography/otspec/otff.htm; a short description of the contents of the tables is at the URL http://www.microsoft.com/typography/otspec/recom.htm.
xetex-opentype.tex,v: 2.01 2009/06/15
cmap Character to glyph mapping
head Font header
hhea Horizontal header
hmtx Horizontal metrics
maxp Maximum profile
name Naming table
OS/2 OS/2 and Windows specific metrics
post PostScript information
For OpenType fonts based on TrueType outlines, the following tables are used:
cvt Control Value Table
fpgm Font program
glyf Glyph data
loca Index to location
prep CVT Program
For OpenType fonts based on PostScript outlines, another set of tables containing data specific to PostScript fonts is used instead of the tables listed above:
CFF PostScript font program (compact font format)
VORG Vertical Origin
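Which of these two sets of outline tables a font carries can be read from the start of the file. The sketch below parses the sfnt header (a 4-byte version tag, where 'OTTO' marks CFF outlines and 0x00010000 marks TrueType outlines, followed by the table count) and the 16-byte table records that follow the 12-byte header, using only Python's struct module. The function names are hypothetical; tools such as otfinfo and ttx, described later in this section, report the same information.

```python
import struct
import sys


def read_table_directory(data):
    """Return (sfnt_version, {tag: (checksum, offset, length)})."""
    sfnt_version, num_tables = struct.unpack(">4sH", data[:6])
    tables = {}
    for i in range(num_tables):
        # 12-byte header, then one 16-byte record per table.
        start = 12 + 16 * i
        tag, checksum, offset, length = struct.unpack(
            ">4sIII", data[start:start + 16])
        tables[tag.decode("ascii")] = (checksum, offset, length)
    return sfnt_version, tables


def outline_flavour(data):
    """Classify the outline format from the version tag and table tags."""
    version, tables = read_table_directory(data)
    if version == b"OTTO" and "CFF " in tables:
        return "PostScript (CFF)"
    if "glyf" in tables and "loca" in tables:
        return "TrueType"
    return "unknown"


if __name__ == "__main__":
    # Usage: python flavour.py font1.otf font2.ttf ...
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, outline_flavour(f.read()))
```

Note that the tag of the CFF table is four bytes long and therefore ends in a blank: "CFF ".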
OpenType fonts may contain bitmaps of glyphs, in addition to outlines. Hand-tuned bitmaps are especially useful in OpenType fonts for representing complex glyphs at very small sizes. If a bitmap for a particular size is provided in a font, it will be used by the system instead of the outline when rendering the glyph. For OpenType fonts containing bitmap glyphs three tables are available:
EBDT Embedded bitmap data
EBLC Embedded bitmap location data
EBSC Embedded bitmap scaling data
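Whether a bitmap applies depends on the size in device pixels, usually expressed as pixels per em (ppem). A minimal sketch, assuming that a rasteriser uses an embedded strike only when its ppem matches exactly (real systems may apply other policies) and that sizes are given in PostScript points of 1/72 inch (TeX's point is slightly smaller). The strike set is hypothetical data of the kind a font lists in EBLC.

```python
def pixels_per_em(size_pt, dpi):
    # The em square is size_pt points tall; a point is 1/72 inch.
    return round(size_pt * dpi / 72)


def glyph_source(size_pt, dpi, strikes):
    """Return ("bitmap" or "outline", ppem) for the given rendering size.

    strikes: set of ppem sizes for which the font embeds bitmaps
    (hypothetical data; exact-match policy is an assumption).
    """
    ppem = pixels_per_em(size_pt, dpi)
    return ("bitmap" if ppem in strikes else "outline"), ppem


print(glyph_source(9, 96, {12, 13}))   # ('bitmap', 12)
print(glyph_source(12, 96, {12, 13}))  # ('outline', 16)
```

At 96 dpi, 9pt text is 12 ppem, so a 12 ppem strike would be used, while 12pt text (16 ppem) falls back to the outline.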
Finally, advanced typography, vertical typesetting and other special functions are supported with the following tables:
BASE Baseline data
GDEF Glyph definition data
GPOS Glyph positioning data
GSUB Glyph substitution data
JSTF Justification data
DSIG Digital signature
gasp Grid-fitting/Scan-conversion
hdmx Horizontal device metrics
kern Kerning
LTSH Linear threshold data
PCLT PCL 5 data
VDMX Vertical device metrics
vhea Vertical Metrics header
vmtx Vertical Metrics
Furthermore, OpenType fonts use a set of script, language and feature tags to structure the information in their tables.
Script tags identify the scripts represented in an OpenType font. Each script corresponds to a contiguous character code range in Unicode. Script tags are four-byte character strings composed of up to four letters in the ASCII character range 0x20-0x7E, padded with blanks (0x20) if required. A list of scripts and their tags follows.
DFLT Default
arab Arabic
armn Armenian
beng Bengali
bopo Bopomofo
brai Braille
byzm Byzantine Music
cans Canadian Syllabics
cher Cherokee
cyrl Cyrillic
deva Devanagari
ethi Ethiopic
geor Georgian
grek Greek
gujr Gujarati
guru Gurmukhi
jamo Hangul Jamo
hang Hangul
hani CJK Ideographic
hebr Hebrew
kana Hiragana
See http://www.microsoft.com/typography/otspec/scripttags.htm for an up-to-date list.
knda Kannada
kana Katakana
khmr Khmer
lao Lao
latn Latin
mlym Malayalam
mong Mongolian
mymr Myanmar
ogam Ogham
orya Oriya
runr Runic
sinh Sinhala
syrc Syriac
taml Tamil
telu Telugu
thaa Thaana
thai Thai
tibt Tibetan
yi Yi
When the script list is searched for a script and no entry is found, but an entry for the DFLT script exists, then that entry must be used. Furthermore, the default script can only contain a single, default, language.
Language system tags identify the language systems supported in an OpenType font. What is meant by a language system in this context is a set of typographic conventions for how text in a given script should be presented. Such conventions may be associated with particular languages, with particular genres of usage, with different publications, and other such factors. For example, particular glyph variants for certain characters may be required for particular languages, or for phonetic transcription or mathematical notation.
Note that two or more languages may follow the same conventions, or that more than one set of typographic conventions can apply to a given language. Therefore language system tags do not correspond in a one-to-one manner with languages.
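The fallback rules for scripts and language systems can be summarised in a few lines. In this sketch the script list is modelled as a plain dictionary, a hypothetical stand-in for the ScriptList of a GSUB or GPOS table: a missing script falls back to DFLT, and, as the OpenType specification prescribes, a language tag that a script does not list falls back to that script's default language system.

```python
def select_langsys(script_list, script_tag, lang_tag=None):
    """Find the language system to use, with OpenType's fallbacks.

    script_list: {script_tag: {"default": langsys, "langs": {lang_tag: langsys}}}
    (a hypothetical model of a ScriptList). Returns None if nothing applies.
    """
    script = script_list.get(script_tag)
    if script is None:
        # No entry for the requested script: use DFLT if the font has one.
        script = script_list.get("DFLT")
        if script is None:
            return None
    if lang_tag is not None and lang_tag in script["langs"]:
        return script["langs"][lang_tag]
    # Unlisted (or unspecified) language: the script's default system.
    return script["default"]


fonts_scripts = {
    "latn": {"default": "latn/default", "langs": {"DEU ": "latn/DEU"}},
    "DFLT": {"default": "DFLT/default", "langs": {}},
}
print(select_langsys(fonts_scripts, "latn", "DEU "))  # latn/DEU
print(select_langsys(fonts_scripts, "arab"))          # DFLT/default
```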
Language system tags are four-byte character strings composed of up to four characters in the ASCII character range 0x20-0x7E, padded with blanks (0x20) if required. A list of languages and their language system tags follows.
dflt Default
ABA Abaza
ABK Abkhazian
ADY Adyghe
AFK Afrikaans
AFR Afar
AGW Agaw
ALT Altai
AMH Amharic
APPH Phonetic transcription (Americanist conventions)
ARA Arabic
ARI Aari
ARK Arakanese
ASM Assamese
ATH Athapaskan
AVR Avar
AWA Awadhi
AYM Aymara
AZE Azeri
BAD Badaga
BAG Baghelkhandi
BAL Balkar
BAU Baule
BBR Berber
BCH Bench
BCR Bible Cree
BEL Belarussian
BEM Bemba
BEN Bengali
BGR Bulgarian
BHI Bhili
BHO Bhojpuri
BIK Bikol
BIL Bilen
BKF Blackfoot
BLI Balochi
BLN Balante
BLT Balti
BMB Bambara
BML Bamileke
BRE Breton
BRH Brahui
BRI Braj Bhasha
BRM Burmese
BSH Bashkir
BTI Beti
CAT Catalan
CEB Cebuano
CHE Chechen
CHG Chaha Gurage
CHH Chattisgarhi
CHI Chichewa
CHK Chukchi
CHP Chipewyan
CHR Cherokee
CHU Chuvash
CMR Comorian
COP Coptic
CRE Cree
CRR Carrier
CRT Crimean Tatar
CSL Church Slavonic
CSY Czech
DAN Danish
DAR Dargwa
See http://www.microsoft.com/typography/otspec/languagetags.htm for an up-to-date list of language tags and their correspondence to the ISO 639 codes, which exist for individual languages as well as for certain collections of languages.
DCR Woods Cree
DEU German (Standard)
DGR Dogri
DHV Dhivehi
DJR Djerma
DNG Dangme
DNK Dinka
DUN Dungan
DZN Dzongkha
EBI Ebira
ECR Eastern Cree
EDO Edo
EFI Efik
ELL Greek
ENG English
ERZ Erzya
ESP Spanish
ETI Estonian
EUQ Basque
EVK Evenki
EVN Even
EWE Ewe
FAN French Antillean
FAR Farsi
FIN Finnish
FJI Fijian
FLE Flemish
FNE Forest Nenets
FON Fon
FOS Faroese
FRA French (Standard)
FRI Frisian
FRL Friulian
FTA Futa
FUL Fulani
GAD Ga
GAE Gaelic
GAG Gagauz
GAL Galician
GAR Garshuni
GAW Garhwali
GEZ Ge'ez
GIL Gilyak
GMZ Gumuz
GON Gondi
GRN Greenlandic
GRO Garo
GUA Guarani
GUJ Gujarati
HAI Haitian
HAL Halam
HAR Harauti
HAU Hausa
HAW Hawaiian
HBN Hammer-Banna
HIL Hiligaynon
HIN Hindi
HMA High Mari
HND Hindko
HO Ho
HRI Harari
HRV Croatian
HUN Hungarian
HYE Armenian
IBO Igbo
IJO Ijo
ILO Ilokano
IND Indonesian
ING Ingush
INU Inuktitut
IPPH Phonetic transcription (IPA conventions)
IRI Irish
IRT Irish Traditional
ISL Icelandic
ISM Inari Sami
ITA Italian
IWR Hebrew
JAN Japanese
JAV Javanese
JII Yiddish
JUD Judezmo
JUL Jula
KAB Kabardian
KAC Kachchi
KAL Kalenjin
KAN Kannada
KAR Karachay
KAT Georgian
KAZ Kazakh
KEB Kebena
KGE Khutsuri Georgian
KHA Khakass
KHK Khanty-Kazim
KHM Khmer
KHS Khanty-Shurishkar
KHV Khanty-Vakhi
KHW Khowar
KIK Kikuyu
KIR Kirghiz
KIS Kisii
KKN Kokni
KLM Kalmyk
KMB Kamba
KMN Kumaoni
KMO Komo
KMS Komso
KNR Kanuri
KOD Kodagu
KOK Konkani
KON Kikongo
KOP Komi-Permyak
KOR Korean
KOZ Komi-Zyrian
KPL Kpelle
KRI Krio
KRK Karakalpak
KRL Karelian
KRM Karaim
KRN Karen
KRT Koorete
KSH Kashmiri
KSI Khasi
KSM Kildin Sami
KUI Kui
KUL Kulvi
KUM Kumyk
KUR Kurdish
KUU Kurukh
KUY Kuy
KYK Koryak
LAD Ladin
LAH Lahuli
LAK Lak
LAM Lambani
LAO Lao
LAT Latin
LAZ Laz
LCR L-Cree
LDK Ladakhi
LEZ Lezgi
LIN Lingala
LMA Low Mari
LMB Limbu
LMW Lomwe
LSB Lower Sorbian
LSM Lule Sami
LTH Lithuanian
LUB Luba
LUG Luganda
LUH Luhya
LUO Luo
LVI Latvian
MAJ Majang
MAK Makua
MAL Malayalam Traditional
MAN Mansi
MAR Marathi
MAW Marwari
MBN Mbundu
MCH Manchu
MCR Moose Cree
MDE Mende
MEN Me'en
MIZ Mizo
MKD Macedonian
MLE Male
MLG Malagasy
MLN Malinke
MLR Malayalam Reformed
MLY Malay
MND Mandinka
MNG Mongolian
MNI Manipuri
MNK Maninka
MNX Manx Gaelic
MOK Moksha
MOL Moldavian
MON Mon
MOR Moroccan
MRI Maori
MTH Maithili
MTS Maltese
MUN Mundari
NAG Naga-Assamese
NAN Nanai
NAS Naskapi
NCR N-Cree
NDB Ndebele
NDG Ndonga
NEP Nepali
NEW Newari
NHC Norway House Cree
NIS Nisi
NIU Niuean
NKL Nkole
NLD Dutch
NOG Nogai
NOR Norwegian
NSM Northern Sami
NTA Northern Tai
NTO Esperanto
NYN Nynorsk
OCR Oji-Cree
OJB Ojibway
ORI Oriya
ORO Oromo
OSS Ossetian
PAA Palestinian Aramaic
PAL Pali
PAN Punjabi
PAP Palpa
PAS Pashto
PGR Polytonic Greek
PIL Pilipino
PLG Palaung
PLK Polish
PRO Provencal
PTG Portuguese
QIN Chin
RAJ Rajasthani
RBU Russian Buriat
RCR R-Cree
RIA Riang
RMS Rhaeto-Romanic
ROM Romanian
ROY Romany
RSY Rusyn
RUA Ruanda
RUS Russian
SAD Sadri
SAN Sanskrit
SAT Santali
SAY Sayisi
SEK Sekota
SEL Selkup
SGO Sango
SHN Shan
SIB Sibe
SID Sidamo
SIG Silte Gurage
SKS Skolt Sami
SKY Slovak
SLA Slavey
SLV Slovenian
SML Somali
SMO Samoan
SNA Sena
SND Sindhi
SNH Sinhalese
SNK Soninke
SOG Sodo Gurage
SOT Sotho
SQI Albanian
SRB Serbian
SRK Saraiki
SRR Serer
SSL South Slavey
SSM Southern Sami
SUR Suri
SVA Svan
SVE Swedish
SWA Swadaya Aramaic
SWK Swahili
SWZ Swazi
SXT Sutu
SYR Syriac
TAB Tabasaran
TAJ Tajiki
TAM Tamil
TAT Tatar
TCR TH-Cree
TEL Telugu
TGN Tongan
TGR Tigre
TGY Tigrinya
THA Thai
THT Tahitian
TIB Tibetan
TKM Turkmen
TMN Temne
TNA Tswana
TNE Tundra Nenets
TNG Tonga
TOD Todo
TRK Turkish
TSG Tsonga
TUA Turoyo Aramaic
TUL Tulu
TUV Tuvin
TWI Twi
UDM Udmurt
UKR Ukrainian
URD Urdu
USB Upper Sorbian
UYG Uyghur
UZB Uzbek
VEN Venda
VIT Vietnamese
WAG Wagdi
WA Wa
WCR West-Cree
WEL Welsh
WLF Wolof
XHS Xhosa
YAK Yakut
YBA Yoruba
YCR Y-Cree
YIC Yi Classic
YIM Yi Modern
ZHP Chinese Phonetic
ZHS Chinese Simplified
ZHT Chinese Traditional
ZND Zande
ZUL Zulu
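Because script and language tags are padded to four bytes, code that compares them must pad as well: the three-letter tag ENG is stored as "ENG " with a trailing blank, and the script tag yi as "yi" followed by two blanks. A minimal Python sketch of building and validating a tag under the rules given above; the function name is hypothetical.

```python
def make_tag(name):
    """Return the four-byte OpenType tag for name (1 to 4 characters).

    Short names are padded with blanks (0x20); every character must lie
    in the ASCII range 0x20-0x7E.
    """
    if not 1 <= len(name) <= 4:
        raise ValueError(f"tag {name!r} must have 1 to 4 characters")
    if any(not 0x20 <= ord(c) <= 0x7E for c in name):
        raise ValueError(f"tag {name!r} has characters outside 0x20-0x7E")
    return name.ljust(4).encode("ascii")


print(make_tag("ENG"), make_tag("yi"), make_tag("latn"))
# b'ENG ' b'yi  ' b'latn'
```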
1.4.2 OpenType features
Features provide information about how to use the glyphs in an OpenType or TrueType font to render a script or language. For example, an Arabic font might have a feature for substituting initial glyph forms, and a Kanji font might have a feature for positioning glyphs vertically. All OpenType Layout features define data for glyph substitution, glyph positioning, or both.
Each OpenType Layout feature has a feature tag that identifies its typographic function and effects. By examining a feature's tag, a text-processing client can determine what a feature does and decide whether to implement it. All tags are four-byte character strings composed of a limited set of ASCII characters (range 0x20-0x7E).
A feature definition does not necessarily provide all the information required to properly implement glyph substitution or positioning actions. Often, a text-processing client may need to supply additional data. In all cases, the text-processing client is responsible for applying, combining, and arbitrating among features and rendering the result.
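This division of labour between font and client can be sketched as follows. The feature data below is invented and stands in for single-substitution lookups: the font only supplies the mappings, while the client decides which features are enabled and in what order they run. Real clients also handle multiple, contextual and ligature lookups, which this sketch ignores.

```python
# Hypothetical single-substitution data keyed by feature tag; the glyph
# names are invented for the example.
FEATURES = {
    "smcp": {"a": "a.sc", "b": "b.sc"},
    "onum": {"one": "one.osf", "two": "two.osf"},
}


def apply_features(glyphs, enabled, features=FEATURES):
    """Apply the enabled features' substitutions in the order given.

    Choosing which features are on, and their order, is the client's job;
    tags the font does not define are silently ignored.
    """
    for tag in enabled:
        table = features.get(tag, {})
        glyphs = [table.get(g, g) for g in glyphs]
    return glyphs


print(apply_features(["a", "one", "c"], ["smcp", "onum"]))
# ['a.sc', 'one.osf', 'c']
```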
The list of features registered by Microsoft, together with a short description, follows.
aalt Access All Alternates
abvf Above-base Forms
abvm Above-base Mark Positioning
abvs Above-base Substitutions
afrc Alternative Fractions
akhn Akhands
blwf Below-base Forms
blwm Below-base Mark Positioning
blws Below-base Substitutions
calt Contextual Alternates
case Case-Sensitive Forms
ccmp Glyph Composition and Decomposition
clig Contextual Ligatures
cpsp Capital Spacing
cswh Contextual Swash
curs Cursive Positioning
c2sc Small Capitals From Capitals
c2pc Petite Capitals From Capitals
dist Distances
dlig Discretionary Ligatures
dnom Denominators
expt Expert Forms
falt Final Glyph on Line Alternates
fin2 Terminal Forms #2
fin3 Terminal Forms #3
fina Terminal Forms
frac Fractions
fwid Full Widths
half Half Forms
haln Halant Forms
halt Alternate Half Widths
hist Historical Forms
hkna Horizontal Kana Alternates
hlig Historical Ligatures
hngl Hangul
hojo Hojo Kanji Forms (JIS X 0212-1990 Kanji Forms)
As an example, let us consider the init feature, whose function is to provide initial glyph forms. Nothing in the feature's lookup tables indicates when or where to apply this feature during text processing. Hence, to correctly use this feature in Arabic text, where initial glyph forms appear at the beginning of words, text-processing clients must be able to identify the first glyph position in each word before making the glyph substitution.
More details about each feature are available at the Microsoft OpenType site http://www.microsoft.com/typography/otspec/featuretags.htm, or at Adobe's developer site http://partners.adobe.com/public/developer/opentype/index_tag3.html.
hwid Half Widths
init Initial Forms
isol Isolated Forms
ital Italics
jalt Justification Alternates
jp78 JIS78 Forms
jp83 JIS83 Forms
jp90 JIS90 Forms
jp04 JIS2004 Forms
kern Kerning
lfbd Left Bounds
liga Standard Ligatures
ljmo Leading Jamo Forms
lnum Lining Figures
locl Localized Forms
mark Mark Positioning
med2 Medial Forms #2
medi Medial Forms
mgrk Mathematical Greek
mkmk Mark to Mark Positioning
mset Mark Positioning via Substitution
nalt Alternate Annotation Forms
nlck NLC Kanji Forms
nukt Nukta Forms
numr Numerators
onum Oldstyle Figures
opbd Optical Bounds
ordn Ordinals
ornm Ornaments
palt Proportional Alternate Widths
pcap Petite Capitals
pnum Proportional Figures
pref Pre-Base Forms
pres Pre-base Substitutions
pstf Post-base Forms
psts Post-base Substitutions
pwid Proportional Widths
qwid Quarter Widths
rand Randomize
rlig Required Ligatures
rphf Reph Forms
rtbd Right Bounds
rtla Right-to-left alternates
ruby Ruby Notation Forms
salt Stylistic Alternates
sinf Scientific Inferiors
size Optical size
smcp Small Capitals
smpl Simplified Forms
ss01 Stylistic Set 1
ss02 Stylistic Set 2
ss03 Stylistic Set 3
ss04 Stylistic Set 4
ss05 Stylistic Set 5
ss06 Stylistic Set 6
ss07 Stylistic Set 7
ss08 Stylistic Set 8
ss09 Stylistic Set 9
ss10 Stylistic Set 10
ss11 Stylistic Set 11
ss12 Stylistic Set 12
ss13 Stylistic Set 13
ss14 Stylistic Set 14
ss15 Stylistic Set 15
ss16 Stylistic Set 16
ss17 Stylistic Set 17
ss18 Stylistic Set 18
ss19 Stylistic Set 19
ss20 Stylistic Set 20
subs Subscript
sups Superscript
swsh Swash
titl Titling
tjmo Trailing Jamo Forms
tnam Traditional Name Forms
tnum Tabular Figures
trad Traditional Forms
twid Third Widths
unic Unicase
valt Alternate Vertical Metrics
vatu Vattu Variants
vert Vertical Writing
vhal Alternate Vertical Half Metrics
vjmo Vowel Jamo Forms
vkna Vertical Kana Alternates
vkrn Vertical Kerning
vpal Proportional Alternate Vertical Metrics
vrt2 Vertical Alternates and Rotation
zero Slashed Zero
1.4.3 OpenType support today
As an example of how publishing applications can exploit OpenType's layout features, we can look at OpenType support in Adobe's Illustrator, InDesign and Photoshop programs. These include automatic substitution by alternate glyphs in an OpenType Pro font (ligatures, small capitals, and proportional old-style figures, vertical shift of punctuation in an all-caps setting). Moreover, any alternate glyphs in OpenType fonts may be selected manually via the Insert Character palette (see Figure 1.1 on the facing page). These OpenType Pro fonts offer a full range of accented characters to support all central and eastern European languages, and many of them also contain support for the Cyrillic and Greek alphabets.
Feature support across Microsoft's Office applications exists for those features that are necessary for language support, such as contextual substitutions for Arabic, and only in the languages which require them (e.g., Word 2003 does contextual substitutions for Arabic, but not for English).
See http://www.adobe.com/products/XXX/main.htm, where XXX stands for illustrator, indesign, and photoshop, respectively.
Figure 1.1: Using OpenType's advanced typographic features in Adobe InDesign. Left: selecting automatic substitution of ligatures and old-style figures from a menu. Right: selecting and inserting any alternate glyph with the Insert Character palette.
OpenOffice on all supported platforms has a somewhat similar approach to Microsoft's Office suite, in that it allows one to use the characters present in the font but does not really present an interface to the advanced typographic features (see Figure 1.2 on the next page).
That leaves us with the availability of the fonts themselves. Around the year 2000 there were only a handful of OpenType fonts, and almost all of them were from Adobe. Nowadays, there are thousands available from over two dozen font foundries. For instance, the entire Adobe Type Library of over 2,200 fonts has been translated into the OpenType format, URW has released over 1,000 OpenType fonts, and other large foundries, such as Linotype and Agfa Monotype, as well as most smaller foundries, are also creating OpenType fonts. Most of Microsoft's system fonts, and Apple's Japanese system fonts, are OpenType. Similarly, OpenType is being embraced by major type foundries for non-alphabetic scripts, such as Chinese and Japanese.
However, a font being in OpenType format does not guarantee that it has extended language support or extra typographic features. Therefore, before purchasing, you should examine the features present in a font. To inspect a font that you already have on your Microsoft Windows system, you can install the Font Properties Extension from Microsoft. This add-on allows you to right-click on a font to display a much expanded set of properties, which includes language support and OpenType layout features (see Figure 1.3 on page 15).
1.4.4 Interrogating OpenType fonts
Eddie Kohler's otfinfo program prints information about an OpenType font.
> otfinfo --help
'Otfinfo' reports information about an OpenType font to standard output.
Options specify what information to print.
Usage: otfinfo [-sfzpg] [OTFFILES...]
Query options:
-s, --scripts Report font's supported scripts.
-f, --features Report font's GSUB/GPOS features.
-z, --optical-size Report font's optical size information.
-p, --postscript-name Report font's PostScript name.
-a, --family Report font's family name.
In the case of Adobe, where currently not all fonts released in OpenType format have significant added features or extended language support, you can browse all fonts in the Adobe Type Library from the URL http://store.adobe.com/type/main.html, so that you can inspect the font you are interested in. Other font vendors offer similar possibilities.
Part of his LCDF Typetools, see www.lcdf.org/type/.
-v, --font-version Report font's version information.
-i, --info Report font's names and designer/vendor info.
-g, --glyphs Report font's glyph names.
-t, --tables Report font's OpenType tables.
Other options:
--script=SCRIPT[.LANG] Set script used for --features [latn].
-V, --verbose Print progress information to standard error.
-h, --help Print this message and exit.
-q, --quiet Do not generate any error messages.
--version Print version number and exit.
Figure 1.2: OpenType Unicode support in OpenOffice. The top panel shows text in various alphabets and the bottom panel the characters available in the Greek part of the font layout.
> otfinfo --info texmf-commercial/fonts/opentype/adobe/minionpro-regular.otf
Family: Minion Pro
Subfamily: Regular
Full name: Minion Pro
PostScript name: MinionPro-Regular
Version: Version 2.012;PS 002.000;Core 1.0.38;makeotf.lib1.6.6565
Unique ID: 2.012;ADBE;MinionPro-Regular
Designer: Robert Slimbach
Vendor URL: http://www.adobe.com/type/
Trademark: Minion is either a ...
Copyright: 2000, 2002, 2004 ...
License URL: http://www.adobe.com/type/legal.html
Figure 1.3: Microsoft's Font Properties Extension utility displays OpenType features for MinionPro-Regular and the supported character sets for MyriadPro-Bold when you right-click on the font. (This utility, ttfext, adds several new property tabs to the standard properties dialog box, such as information relating to font origination and copyright, the type sizes to which hinting and smoothing are applied, and the code pages supported by the extended character set. It can be downloaded from http://www.microsoft.com/typography/TrueTypeProperty21.mspx.)
> otfinfo --scripts texmf-commercial/fonts/opentype/adobe/minionpro-regular.otf
cyrl Cyrillic
grek Greek
latn Latin
latn.AZE Latin/Azeri
latn.CRT Latin/Crimean Tatar
latn.DEU Latin/German (Standard)
latn.MOL Latin/Moldavian
latn.ROM Latin/Romanian
latn.SRB Latin/Serbian
latn.TRK Latin/Turkish
> otfinfo --tables texmf-commercial/fonts/opentype/adobe/minionpro-regular.otf
64 BASE
132417 CFF
5228 DSIG
40074 GPOS
13872 GSUB
96 OS/2
4048 cmap
54 head
36 hhea
6652 hmtx
6 maxp
1533 name
32 post
> otfinfo --features texmf-commercial/fonts/opentype/adobe/minionpro-regular.otf
aalt Access All Alternates
c2sc Small Capitals From Capitals
case Case-Sensitive Forms
cpsp Capital Spacing
dlig Discretionary Ligatures
dnom Denominators
fina Terminal Forms
frac Fractions
hist Historical Forms
kern Kerning
liga Standard Ligatures
lnum Lining Figures
numr Numerators
onum Oldstyle Figures
ordn Ordinals
ornm Ornaments
pnum Proportional Figures
salt Stylistic Alternates
sinf Scientific Inferiors
size Optical Size
smcp Small Capitals
ss01 Stylistic Set 1
ss02 Stylistic Set 2
sups Superscript
tnum Tabular Figures
zero Slashed Zero
Just van Rossum's ttx utility can decompile the contents of an OpenType font and output it in XML format. This comes in handy if you want to study the contents of a given font (e.g., its tables) or (slightly) modify it.
> ttx --help
usage: ttx [options] inputfile1 [... inputfileN]
TTX 2.0b1 -- From OpenType To XML And Back
If an input file is a TrueType or OpenType font file, it will be
dumped to an TTX file (an XML-based text format).
If an input file is a TTX file, it will be compiled to a TrueType
or OpenType font file.
Output files are created so they are unique: an existing file is
never overwritten.
General options:
-h Help: print this message
-d <outputfolder> Specify a directory where the output files are
to be created.
-v Verbose: more messages will be written to stdout about what
is being done.
Dump options:
-l List table info: instead of dumping to a TTX file, list some
minimal info about each table.
-t <table> Specify a table to dump. Multiple -t options
are allowed. When no -t option is specified, all tables
will be dumped.
-x <table> Specify a table to exclude from the dump. Multiple
-x options are allowed. -t and -x are mutually exclusive.
-s Split tables: save the TTX data into separate TTX files per
table and write one small TTX file that contains references
to the individual table dumps. This file can be used as
input to ttx, as long as the table files are in the
same directory.
-i Do NOT disassemble TT instructions: when this option is given,
all TrueType programs (glyph programs, the font program and the
pre-program) will be written to the TTX file as hex data
instead of assembly. This saves some time and makes the TTX
file smaller.
Compile options:
-m Merge with TrueType-input-file: specify a TrueType or OpenType
font file to be merged with the TTX file. This option is only
valid when at most one TTX file is specified.
-b Don't recalc glyph bounding boxes: use the values in the TTX
file as-is.
Written in Python and part of the FontTools toolset (sourceforge.net/projects/fonttools).
Thus, to decompile a font myfont.otf just specify:
> ttx myfont.otf
This will write a file myfont.ttx in the directory where the font file resides. If you are only interested in some tables (e.g., GSUB and GPOS), specify them on the command line:
> ttx -t GSUB -t GPOS myfont.otf
Converting an XML file myfont.ttx back into an OpenType or TrueType file is similarly easy:
> ttx myfont.ttx
If you want to introduce modifications (e.g., given in XML format in the file myfontmods.ttx) into an OpenType file, use the -m option, as follows:
> ttx -m myfont.otf myfontmods.ttx
A more explicit example with the font MinionPro follows.
> ttx -l /texlive/2007/texmf-commercial/fonts/opentype/adobe/minionpro-regular.otf
Listing table info for
"/texlive/2007/texmf-commercial/fonts/opentype/adobe/minionpro-regular.otf":
tag checksum length offset
---- ---------- ------- -------
BASE 0x086729a7 64 199052
CFF 0x101232c2 132417 6032
DSIG 0x446dbd94 5228 199116
GPOS 0xx71552700 40074 158976
GSUB 0xx3bf7bcba 13872 145104
OS/2 0x40e57e9f 96 320
cmap 0x0cedc8f1 4048 1952
head 0xx2167aded 54 220
hhea 0x09140bb5 36 276
hmtx 0xx37425493 6652 138452
maxp 0x067f5000 6 312
name 0x3cf7b183 1533 416
post 0x0x47ffce 32 6000
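The checksum column in this listing follows the rule given in the OpenType specification: a table's checksum is the sum of its contents read as big-endian unsigned 32-bit integers, after zero-padding to a multiple of four bytes, taken modulo 2^32. A sketch in Python; note that for the head table the specification additionally treats the checkSumAdjustment field as zero, which this function does not do.

```python
import struct


def table_checksum(table):
    """OpenType table checksum: sum of big-endian uint32 words mod 2**32.

    The table bytes are zero-padded to a multiple of four first.
    (The special treatment of 'head' is not handled here.)
    """
    padded = table + b"\0" * (-len(table) % 4)
    words = struct.unpack(f">{len(padded) // 4}I", padded)
    return sum(words) & 0xFFFFFFFF


print(f"0x{table_checksum(b'abcdefgh'):08x}")  # 0xc6c8cacc
```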
> ttx -d. -t head /texlive/2007/texmf-commercial/fonts/opentype/adobe/minionpro-regular.otf
Dumping "/texlive/2007/texmf-commercial/fonts/opentype/adobe/minionpro-regular.otf"
to "./minionpro-regular.ttx"...
Dumping 'head' table...
> less ./minionpro-regular.ttx
<?xml version="1.0" encoding="ISO-8859-1"?>
<ttFont sfntVersion="OTTO" ttLibVersion="2.0b1">
<head>
<!-- Most of this table will be recalculated by the compiler -->
<tableVersion value="1.0"/>
<fontRevision value="2.0119934082"/>
<checkSumAdjustment value="-0x107d913c"/>
<magicNumber value="0x5f0f3cf5"/>
<flags value="00000000 00000011"/>
<unitsPerEm value="1000"/>
<created value="Tue Jun 29 11:41:10 2004"/>
<modified value="Tue Jun 29 11:41:10 2004"/>
<xMin value="-290"/>
<yMin value="-360"/>
<xMax value="1684"/>
<yMax value="989"/>
<macStyle value="00000000 00000000"/>
<lowestRecPPEM value="3"/>
<fontDirectionHint value="2"/>
<indexToLocFormat value="0"/>
<glyphDataFormat value="0"/>
</head>
</ttFont>
For reasons of efficiency, TrueType and OpenType font instances can be grouped into collections (.ttc), so that different fonts can share common tables to describe glyphs. Some programs are not able to extract the various font components from such a collection. To help with this problem, a small utility, ttc2ttf, exists to extract the font instances from a collection.
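A collection starts with a small header that points to one table directory per font; fonts share tables simply by having their directories point at the same offsets. The sketch below reads that header with Python's struct module (layout: the tag 'ttcf', a 16-bit major and minor version, a 32-bit font count, then one 32-bit offset per font, all big-endian). The function name is hypothetical, and it only locates the fonts: writing out a standalone .ttf, as ttc2ttf does, additionally requires copying the referenced tables and rewriting their offsets.

```python
import struct


def ttc_font_offsets(data):
    """Return the file offset of each font's table directory in a .ttc."""
    tag, _major, _minor, num_fonts = struct.unpack(">4sHHI", data[:12])
    if tag != b"ttcf":
        raise ValueError("not a TrueType/OpenType collection")
    return list(struct.unpack(f">{num_fonts}I", data[12:12 + 4 * num_fonts]))


header = struct.pack(">4sHHI3I", b"ttcf", 1, 0, 3, 24, 100, 200)
print(ttc_font_offsets(header))  # [24, 100, 200]
```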
CHAPTER 2
XeTeX: TeX meets OpenType and Unicode
2.1 XeTeX: a historical introduction and some basics . . . 21
2.2 XeTeX: typesetting with glyphs, characters and fonts . . . 23
2.3 Supplementary commands introduced by XeTeX . . . 33
2.4 fontspec . . . 43
2.5 XeTeX and other engines . . . 47
XeTeX is a typesetting system based on a merger of e-TeX with Unicode and modern font technologies. Jonathan Kew is the main developer behind XeTeX. XeTeX's main aim is to deal with the complexities (notice the colored parts on the characters in Figure 2.1) involved in typesetting texts in the various scripts used in the world (Figure 2.2 on the next page), in particular in Asia (Figure 2.3 on the following page).
Figure 2.1: Complexities when dealing with various languages
Figure 2.2: Scripts used in various parts of the world
Figure 2.3: Asian scripts
xetex-general.tex,v: 2.02 2009/06/15
We start the chapter with an introduction, a short history and an overview of the basic operating principles of XeTeX (Section 2.1). XeTeX's character/glyph model, its typesetting algorithm and the way it handles fonts are the subject of Section 2.2. Section 2.3 presents in detail the supplementary commands introduced by XeTeX, in particular its extension to TeX's \font command to take full advantage of the possibilities of OpenType fonts. A LaTeX interface to XeTeX's font handling is presented in Section 2.4.
2.1 XeTeX: a historical introduction and some basics
XeTeX was developed at SIL by its author Jonathan Kew. One of XeTeX's important aims is to allow the TeX engine to directly use fonts available on the operating system. Technically this is implemented by augmenting TeX's \font command so that it asks the host operating system to locate a given font (using its real name, as known to the operating system, not some cryptic filename, e.g., à la Berry) in whatever font collection is available. This means that all fonts known on a system and available to the user interface become usable for typesetting in XeTeX, with the same names. Hence it is no longer necessary to run any TeX-specific procedures (e.g., fontinst, or apply one of the recipes described earlier in this chapter). When XeTeX is instructed to use a font, it locates the actual font file itself (it can handle all three variants: OpenType, PostScript Type 1, and TrueType), and no longer needs a .tfm file. XeTeX's paragraph-building routine thus obtains metric information about the character glyphs directly from the font file. In addition, it has to take care of the complexities of mapping characters to glyphs, particularly in cursive and non-Latin scripts. Therefore, XeTeX does not build its paragraphs from lists of characters, but from words, each of which consists of a whole run of consecutive characters in a given font. Linguistic and typographic transformations and effects are delegated to the appropriate layout engine (XeTeX has interfaces to ATSUI, ICU, and SIL's Graphite). The result is an array of glyphs and their positions that represent words as laid out using the current font. From this list of words, which are interleaved with glue, penalties, etc., a paragraph is built. Of course, when hyphenation is required, words may have to be taken apart and reassembled afterwards using possible break positions. Nevertheless the basic idea remains: collect runs of characters and hand them down as complete units to a font rendering library, which is capable of handling the layout at the level of the individual glyphs.
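The run-based approach can be sketched in a few lines. The Python function below, with hypothetical names, splits a sequence of characters into maximal runs that share one font; in the model described above, each such run is what would be handed to the layout engine, which returns glyphs and their positions. Line breaking, hyphenation and the interleaved glue and penalties are not modelled.

```python
from itertools import groupby


def font_runs(text, font_at):
    """Split text into maximal (font, run) pairs.

    font_at(i) returns the font in force for character i (a hypothetical
    stand-in for the current font selection in a typesetting engine).
    """
    runs = []
    for font, group in groupby(enumerate(text), key=lambda item: font_at(item[0])):
        runs.append((font, "".join(ch for _, ch in group)))
    return runs


fonts = ["Latin font"] * 6 + ["Arabic font"] * 4
print(font_runs("Hello سلام", fonts.__getitem__))
# [('Latin font', 'Hello '), ('Arabic font', 'سلام')]
```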
XeTeX works with an extended version of the existing dvipdfmx PDF driver, for which the help of Jin-Hwan Cho has to be acknowledged. Akira Kakuto's W32TeX (http://www.fsci.fuk.kindai.ac.jp/kakuto/win32-ptex) has contributed a lot to making XeTeX available on Microsoft Windows. Ross Moore has worked on graphics and color drivers, while Miyata Shigeru has improved the handling of vertical text and CJK support in both XeTeX itself and the driver, and provides support for PSTricks graphics.
'Tis section is based on an inteiview with X
E
T
E
Xs authoi Ionathan Kew. Foi the full text of the inteiview see http://tug.
org/interviews/interview-files/jonathan-kew.html.
SIL (initially known as the Summer Institute of Linguistics, see http://www.sil.org for more information) was created in 1934. It now has about 5,000 collaborators coming from over 60 countries. SIL's main activity is the linguistic investigation of some 1,800 languages spoken by more than a billion people in more than 70 countries. In particular, SIL publishes Ethnologue: Languages of the World (http://www.ethnologue.com/), a book which describes 6912 languages spoken on earth.
Apple Type Services for Unicode Imaging (ATSUI) is the technology behind all text drawing in Mac OS X, and is thus available on that platform only. ATSUI allows fine control over layout features, provides advanced multilingual text-processing services, and supports high-end typography. For details see http://developer.apple.com/documentation/Carbon/Conceptual/ATSUI_Concepts/.
International Components for Unicode. ICU is a widely portable set of C/C++ and Java libraries providing Unicode and globalization support for software applications. ICU ensures that applications give the same results on all platforms and between C/C++ and Java software, see http://www.icu-project.org/.
Graphite is a project to provide rendering capabilities for complex non-Roman writing systems. Graphite runs on various computer platforms and allows the creation of smart fonts which support display in writing systems with various complex behaviors. Details are available at http://scripts.sil.org/RenderingGraphite.
xetex-general.tex,v: 2.02 2009/06/15
2 XeTeX: TeX MEETS OPENTYPE AND UNICODE
LaTeX integration for XeTeX is for a large part the work of Will Robertson. Although XeTeX accepts Unicode input and supports OpenType fonts, XeTeX's interface to OpenType fonts is rather low-level. For instance, font features, such as using lowercase (old-style) figures instead of uppercase (lining) figures, are activated with hard-to-remember strings such as +onum. Will Robertson's fontspec package provides a more readable and easy-to-use interface to such things with keyval-type options, such as "Numbers=Lowercase" for the above example. If you want to use a new OpenType (or TrueType) font, you no longer need to mess around with extra files for font metrics and font definitions. It is sufficient to declare your new font with the command \setmainfont in the preamble to select that font as the main document font.
By default, LaTeX's NFSS mechanism only deals well with macroscopic font variations, such as weight, shape and size. fontspec extends LaTeX's font handling by providing support for font features, which allow the user at any point in the document to vary a broad range of typographical details by using different font instances.
2.1.1 A brief history
April 2004: XeTeX 0.3 was released to the TeX community (on Mac OS X only) and offered:
- integrated Unicode support
- access to all fonts installed on the computer
- AAT (Apple Advanced Typography) for typographic features
- QuickTime for graphics support
February 2005: XeTeX 0.9 was released, with the following new features:
- OpenType support
- compatibility with more of the important LaTeX packages
April 2006 (BachoTeX): XeTeX for Linux was released (first public announcement of the availability of XeTeX)
June 2006: Akira Kakuto announces the availability of XeTeX on MS Windows
February 2007: TeX Live 2007 contains XeTeX 0.996 for all supported binary platforms
September 2007: XeTeX 0.997 available with MiKTeX 2.7 (beta)
Fall 2008: TeX Live 2008 contains XeTeX 0.999 for all supported binary platforms
Summer 2009: TeX Live 2009 contains XeTeX 0.999.5 for all supported binary platforms
2.1.2 XeTeX: basic principles
- based on e-TeX's typesetting engine
- includes TeX--XeT (commands \beginL, \endL, \beginR and \endR, activated with \TeXXeTstate=1) for bidirectional typesetting (Arabic, Hebrew, etc.)
- Unicode encoding (UTF-8 or UTF-16) used by default
- most LaTeX extensions (e.g., graphics, xcolor, geometry, crop, hyperref, pgf) now automatically detect the presence of the XeTeX engine and are compatible with it
- directly uses OpenType, TrueType and PostScript fonts installed on the system without the need to create TeX-specific files (.tfm, .vf, .fd, etc.)
- provides access to OpenType features (ligatures, swashes, glyph alternatives, dynamic attachment of accents, etc.)
- thanks to Unicode, provides access to characters in extended alphabetic (Latin, Cyrillic, Greek, Arabic, Devanagari, etc.) and complex scripts
- allows the concurrent use of multiple scripts in a single document, thus making the processing of multilingual texts much simpler

See http://tug.org/interviews/interview-files/will-robertson.html for an interview with Will Robertson.
XeTeX's direct use of Unicode characters as input and of OpenType Unicode-encoded fonts makes preprocessors or complex macros for handling composite characters or complex scripts mostly unnecessary. As an example, let us consider the way TeX and XeTeX handle some input.
TeX input | XeTeX input | typeset output | notes
\'{a} \`{e} \^{o} | á è ô | á è ô | typical accents
\c{c} \AA | ç Å | ç Å | composed characters
d\v{z}abe {\dj}ak | džabe đak | džabe đak | more composed characters
--- | \char"2014 | — | specific ligature in TeX fonts
$\alpha$ | \char"1D6FC | 𝛼 | mathematical symbol (plane 1)
{\dn acchaa} | अच्छा | अच्छा | TeX needs an ad hoc preprocessor
2.2 XeTeX: typesetting with glyphs, characters and fonts
XeTeX delegates the rendering of Unicode characters to the freetype library and uses the font configuration library fontconfig for accessing font files (other than TeX-specific fonts). The fontconfig library lets you configure, customize and manage fonts for all applications that need to access fonts present on your computing system.
2.2.1 Accessing fonts with fontconfig
The information concerning fonts is stored in XML format, and you, as a user, should specify where your OpenType fonts live in the file $HOME/.fonts.conf, as in the following example of such a file.
<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<!-- $HOME/.fonts.conf: additional font directories for this user -->
<fontconfig>
<dir>/home/goossens/texlive/2007/texmf-update/fonts/opentype</dir>
<dir>/home/goossens/texlive/2007/texmf-commercial/fonts/opentype</dir>
<dir>/home/goossens/texlive/2007/texmf-dist/fonts/opentype</dir>
</fontconfig>
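Because the file is plain XML, it can also be generated by a script, for instance when the list of TeX font directories changes with a new TeX Live release. The following Python sketch (the function name make_fonts_conf is hypothetical and not part of fontconfig) writes a minimal file with one dir element per directory; it does not validate the result against fonts.dtd.

```python
import xml.etree.ElementTree as ET


def make_fonts_conf(dirs):
    """Return the text of a minimal fontconfig file that lists `dirs`."""
    root = ET.Element("fontconfig")
    for d in dirs:
        # one <dir> element per directory that fontconfig should scan
        ET.SubElement(root, "dir").text = d
    body = ET.tostring(root, encoding="unicode")
    return ('<?xml version="1.0"?>\n'
            '<!DOCTYPE fontconfig SYSTEM "fonts.dtd">\n'
            + body + "\n")


if __name__ == "__main__":
    print(make_fonts_conf([
        "/home/goossens/texlive/2007/texmf-dist/fonts/opentype",
    ]))
```

The output can be saved as $HOME/.fonts.conf; as explained in this section, fc-cache should then be run so that the new directories are picked up.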
On Microsoft Windows, when running MiKTeX, the file fonts.conf contains a line to include the file localfonts.conf. Both these files live in the directory
c:\Documents and Settings\All Users\Application Data\MiKTeX\2.7\fontconfig\config
See http://sourceforge.net/projects/freetype/.
See http://fontconfig.org/wiki/. You need at least fontconfig version 2.4 for XeTeX to function correctly.
These files use a syntax defined by a grammar specified as a DTD (/etc/fonts/fonts.dtd). The system-wide configuration file lives in /etc/fonts/fonts.conf.
The file localfonts.conf has the following content.
<?xml version="1.0"?>
<fontconfig>
<dir>C:\WINNT\Fonts</dir>
<dir>C:\Program Files\MiKTeX 2.7\fonts/type1</dir>
<dir>C:\Program Files\MiKTeX 2.7\fonts/opentype</dir>
<dir>c:\TeXlive2007\texmf-dist\fonts\opentype</dir>
<dir>c:\TeXlive2007\texmf-update\fonts\opentype</dir>
<dir>c:\TeXlive2007\texmf-commercial\fonts\opentype</dir>
</fontconfig>
Note that MiKTeX by default includes the Microsoft Windows font directory (\WINNT\Fonts), as well as its own standard font directories. We added three other ones from the TeX Live trees (as in the example above).
The fontconfig library comes with three programs: two for providing information about the font files declared (i.e., findable by fontconfig) on your system (fc-match and fc-list), and one (fc-cache) for (re)generating a font cache of all fonts (an fc-cache command should be issued each time a new font is installed or deleted).
> fc-list --help
usage: fc-list [-vV?] [--verbose] [--version] [--help] [pattern] element ...
List fonts matching [pattern]
-v, --verbose display status information while busy
-V, --version display font config version and exit
-?, --help display this help and exit
> fc-match --help
usage: fc-match [-svV?] [--sort] [--verbose] [--version] [--help] [pattern]
List fonts matching [pattern]
-s, --sort display sorted list of matches
-v, --verbose display entire font pattern
-V, --version display font config version and exit
-?, --help display this help and exit
> fc-cache --help
usage: fc-cache [-frsvV?] [--force|--really-force] [--system-only] [--verbose] [--version] [--help] [dirs]
Build font information caches in [dirs]
(all directories in font configuration by default).
-f, --force scan directories with apparently valid caches
-r, --really-force erase all existing caches, then rescan
-s, --system-only scan system-wide directories only
-v, --verbose display status information while busy
-V, --version display font config version and exit
-?, --help display this help and exit
> fc-list 'Minion Pro'
Minion Pro,Minion Pro Subh:style=Italic Subhead,Italic
Minion Pro:style=Bold Italic
Minion Pro,Minion Pro SmBd Cond Capt:style=Semibold Cond Caption,Regular
Minion Pro,Minion Pro Cond Disp:style=Bold Cond Display,Bold
Minion Pro,Minion Pro Disp:style=Display,Regular
Minion Pro,Minion Pro SmBd Subh:style=Semibold Italic Subhead,Italic
Minion Pro,Minion Pro SmBd Cond Capt:style=Semibold Cond Italic Caption,Italic
Minion Pro,Minion Pro Capt:style=Bold Caption,Bold
Minion Pro,Minion Pro Cond Subh:style=Bold Cond Italic Subhead,Bold Italic
Minion Pro,Minion Pro SmBd:style=Semibold,Regular
Minion Pro,Minion Pro Cond Disp:style=Bold Cond Italic Display,Bold Italic
Minion Pro:style=Regular
...
Many more lines
...
Minion Pro:style=Bold
Minion Pro,Minion Pro Cond:style=Bold Cond,Bold
Minion Pro,Minion Pro Cond:style=Bold Cond Italic,Bold Italic
Minion Pro,Minion Pro SmBd:style=Semibold Italic,Italic
Minion Pro,Minion Pro Disp:style=Italic Display,Italic
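The lines printed by fc-list above follow a simple pattern: comma-separated family names, a colon, then style= followed by comma-separated style names. A small Python sketch can split them, for instance to check which styles of a family are installed. It assumes exactly the output format shown here; other fontconfig releases may print additional fields (such as the file name) on each line, which this sketch does not handle. The function names are hypothetical.

```python
def parse_fc_list_line(line):
    """Split 'Minion Pro,Minion Pro Disp:style=Display,Regular' into
    (['Minion Pro', 'Minion Pro Disp'], ['Display', 'Regular'])."""
    family_part, _, rest = line.strip().partition(":")
    families = [f.strip() for f in family_part.split(",") if f.strip()]
    styles = []
    if rest.startswith("style="):
        styles = [s.strip() for s in rest[len("style="):].split(",") if s.strip()]
    return families, styles


def styles_of(lines, family):
    """Collect every style name reported for `family`."""
    found = set()
    for line in lines:
        families, styles = parse_fc_list_line(line)
        if family in families:
            found.update(styles)
    return found


if __name__ == "__main__":
    sample = [
        "Minion Pro:style=Regular",
        "Minion Pro,Minion Pro Cond:style=Bold Cond,Bold",
    ]
    print(sorted(styles_of(sample, "Minion Pro Cond")))
```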
2.2.2 Specifying character codes
The first step towards Unicode support in TeX is to expand the character set beyond the original 256-character limit. At the lowest level, this means changing internal data structures throughout, wherever characters were stored as 8-bit values. As Unicode scalar values may be up to U+10FFFF, an obvious modification would be to make characters 32 bits wide, and treat Unicode characters as the basic units of text.
However, in XeTeX a pragmatic decision was made to work internally with UTF-16 as the encoding form of Unicode, making characters in the engine 16 bits wide, and handling supplementary-plane characters using UTF-16 surrogate pairs. This choice was made for a number of reasons:
- XeTeX uses operating system application programming interfaces that expect UTF-16 encoded streams, so working with this encoding form avoids the need for conversion at this interface.
- Many of standard TeX's internal tables are implemented as 256-element arrays indexed by character code. In XeTeX these arrays have been enlarged to 65,536 elements each to allow them to be indexed by UTF-16 code values.
These per-character arrays are used to implement character categories, used in parsing input text into tokens, as well as case conversions and space factor (a property used to modify word spacing for punctuation in Roman typography). In practice, it seems unlikely that there will be a great need to customize these character properties for individual supplementary-plane characters. They are unlikely to be wanted as escape characters or other special categories of TeX input; need not have the letter property that allows them to be part of TeX control sequences; and probably do not need to be included in automatic hyphenation patterns.
In view of these factors, XeTeX works with UTF-16 code units, and Unicode characters beyond U+FFFF cannot be given individually customized TeX properties. They can still be included in documents, however, and will render correctly (given appropriate fonts) as the UTF-16 surrogate pairs will be properly passed to the font system.
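The arithmetic behind surrogate pairs is fixed by the UTF-16 definition: subtract 0x10000 from the code point and split the remaining 20 bits into two 10-bit halves, added to 0xD800 and 0xDC00. The Python sketch below (function names are illustrative) computes the code units and spells them in the four-digit ^^^^xxxx notation mentioned in this section, using lowercase hexadecimal digits as TeX's ^^ notation expects. Whether a given XeTeX version accepts a supplementary character written as two ^^^^ surrogate halves is an assumption of this example, not something stated in the text.

```python
def utf16_units(cp):
    """Return the UTF-16 code units that encode the scalar value `cp`."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0xFFFF:
        return [cp]                      # BMP: a single code unit
    v = cp - 0x10000                     # 20 significant bits remain
    return [0xD800 + (v >> 10),          # high surrogate: top 10 bits
            0xDC00 + (v & 0x3FF)]        # low surrogate: bottom 10 bits


def caret_notation(text):
    """Spell `text` as ^^^^xxxx groups, one per UTF-16 code unit."""
    return "".join(f"^^^^{u:04x}" for ch in text for u in utf16_units(ord(ch)))


if __name__ == "__main__":
    print(caret_notation("\u5609\u6167"))   # two BMP (CJK) characters
    print(caret_notation("\U0001D6FC"))     # mathematical italic small alpha
```

For U+1D6FC, for example, 0x1D6FC minus 0x10000 is 0xD6FC; its top ten bits give 0xD835 and its bottom ten bits give 0xDEFC.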
XeTeX uses Unicode's 16-bit UTF-16 encoding
- characters encoded in 16 bits
- exception: a few ancient, differently encoded fonts
Extension of TeX primitives
- \char and \chardef accept numbers up to 65535 ("FFFF)
- four-digit notation using the syntax ^^^^abcd
- \char"5609^^^^6167 typesets the two CJK characters U+5609 and U+6167
Unicode characters in the upper (> 0) planes
- use of surrogates (standard UTF-16)
- all right for typesetting
- does not allow text manipulation in the input stream on the level of the individual character
Increased size of the internal code tables for \catcode, \lccode, \uccode, \sfcode
- XeTeX plain initialises its tables with Unicode code points
\lowercase{DIN} din
\uppercase{Esi eyama kl mae nuvwo a v la}
ESI EYAMA KL MAE NUVWO A V LA
\catcode`\=\active \def{...}

In principle, using full 32-bit wide arrays would be possible, but they would make for extremely large arrays and have a very large memory footprint. Some kind of sparse array implementation would be necessary, but this requires significant additional development and testing, and might impact the performance of key inner-loop parts of the TeX system. Therefore the more pragmatic 16-bit approach has been adopted.
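To make the idea of enlarged, Unicode-initialised code tables concrete, here is a Python sketch of a 65,536-entry table in the spirit of \lccode and a rough \lowercase emulation over it. Using Python's str.lower as the source of case data is an assumption of this sketch, not how XeTeX's plain format builds its tables; characters beyond U+FFFF are left unchanged, mirroring the 16-bit limitation discussed in this section.

```python
TABLE_SIZE = 0x10000  # one slot per UTF-16 code unit


def build_lccode():
    """Letters map to their lowercase form (themselves if already
    lowercase); every other slot stays 0, meaning 'leave unchanged'."""
    table = [0] * TABLE_SIZE
    for code in range(TABLE_SIZE):
        if 0xD800 <= code <= 0xDFFF:
            continue                     # surrogate code units are not letters
        ch = chr(code)
        if ch.isalpha():
            lower = ch.lower()
            # a slot holds a single code, so keep one-to-one mappings only
            table[code] = ord(lower) if len(lower) == 1 else code
    return table


LCCODE = build_lccode()


def tex_lowercase(text):
    """Rough analogue of \\lowercase applied to the characters of `text`."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < TABLE_SIZE and LCCODE[code]:
            out.append(chr(LCCODE[code]))
        else:
            out.append(ch)               # lccode 0, or outside the 16-bit table
    return "".join(out)
```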
XeTeX's default input encoding is Unicode (UTF-8 or UTF-16). XeTeX automatically detects the encoding used in the input file. If a non-Unicode encoding is used, it has to be specified with a \XeTeXinputencoding command (see page 41). Such historical encodings are handled with the ICU conversion routines.
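As a rough illustration of what automatic detection involves, the Python sketch below chooses between UTF-8 and the two UTF-16 byte orders by looking only for a byte-order mark. This is a simplification made for the example: the exact heuristics XeTeX applies are not described in this section, and files in legacy encodings still need \XeTeXinputencoding. The function names are hypothetical.

```python
# Byte-order marks recognised by this sketch; none is a prefix of another.
BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]


def detect_encoding(data: bytes):
    """Return (codec name, BOM length); assume UTF-8 when there is no BOM."""
    for bom, codec in BOMS:
        if data.startswith(bom):
            return codec, len(bom)
    return "utf-8", 0


def decode_source(data: bytes) -> str:
    """Decode an input file, dropping the BOM if present."""
    codec, skip = detect_encoding(data)
    return data[skip:].decode(codec)   # raises UnicodeDecodeError on bad input
```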
2.2.3 Hyphenation
At the moment XeTeX reuses TeX's hyphenation patterns by adding an extra Unicode layer provided by language-specific intermediate files in the xu-hyphen directory. An example of such a file (xu-frhyph.tex, which handles the French patterns) follows.
%%%%%%% xu-frhyph.tex (Wrapper for XeTeX to read frhyph.tex)
\begingroup
\expandafter\ifx\csname XeTeXrevision\endcsname\relax
\else
% frhyph.tex uses ^^xx for T1 characters
% redefine them to access the required Unicode characters
% (only \oe{} actually matters here!)
\input xu-t1.tex
\fi
\input frhyph.tex
\endgroup
Note that, when run under XeTeX, xu-frhyph.tex first loads the generic file xu-t1.tex, which makes the letters in the T1-encoded hyphenation pattern files active to map them onto their Unicode equivalents. Part of the contents of that file follows.
%%%%%%%%%%%%%%%%%%%%%%%% xu-t1.tex %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% make T1 letters \active and map them to Unicode character codes
% (for use when loading hyphenation patterns that use ^^xx notation
% to represent characters in T1 font encoding, or literal 8-bit
% bytes if read using \XeTeXinputencoding "bytes")
\catcode`\"=12 % ensure " isn't active or otherwise "weird"
\catcode`\^=7 % ensure ^ is the proper catcode for hex notation
%
\catcode"B0=\active \def^^b0{^^^^0159} % rcaron
...
\catcode"DF=\active \def^^df{SS} % SS
\catcode"F7=\active \def^^f7{^^^^0153} % oe
\catcode"D7=\active \def^^d7{^^^^0152} % OE
% we don't handle the non-letter codes in the control range
% but we'd better handle dotless-i (for Turkish)
\catcode"19=\active \def^^19{^^^^0131} % dotlessi

With TeX Live, the xu-hyphen directory mentioned above is at texmf-dist/tex/generic/xu-hyphen.
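The effect of these active-character definitions can be imitated outside TeX with a lookup table. The Python sketch below contains only the five T1 slots quoted in the xu-t1.tex excerpt (a deliberately incomplete, hypothetical table) and converts the bytes of a T1-encoded pattern string to Unicode text. Passing the remaining 7-bit bytes through as ASCII is a simplification of this sketch, and any other byte is rejected because the excerpt does not define it.

```python
# Only the T1 slots shown in the xu-t1.tex excerpt.
T1_TO_UNICODE = {
    0xB0: "\u0159",  # rcaron
    0xDF: "SS",      # SS
    0xF7: "\u0153",  # oe
    0xD7: "\u0152",  # OE
    0x19: "\u0131",  # dotless i (for Turkish)
}


def t1_to_unicode(data: bytes) -> str:
    out = []
    for byte in data:
        if byte in T1_TO_UNICODE:
            out.append(T1_TO_UNICODE[byte])
        elif byte < 0x80:
            out.append(chr(byte))        # simplification: treat as ASCII
        else:
            raise ValueError(f"T1 slot {byte:#04x} not in this partial table")
    return "".join(out)
```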
For languages that do not use the Latin alphabet, other similar redefinitions are made in the intermediate files. On top of that, fully UTF-8-encoded files exist for ancient, monotonic and polytonic modern Greek and for Coptic.
To hyphenate words correctly, hyphenation patterns have also been extended to 16 bits. As described previously, an interface between 8-bit pattern files and XeTeX's 16-bit variants exists. Pure Unicode pattern files, by contrast, contain simple Unicode data, without need of commands or active characters, as the following examples show.
% hyphenate before and after independent vowel
11
11
11
% hyphenate following an independent vowel but never before
21|
21|
2.2.4 Font management: the basics
XeTeX can use all modern font formats (PostScript Type 1, TrueType, OpenType) and gives access to all fonts on your computer. Moreover, XeTeX still lets you use TeX-specific font files, such as tfm. The latter are useful for math fonts or for non-Unicode encoded input files.
XeTeX extends TeX's \font command (as explained later). In particular, you can specify the actual name of a font, rather than its somewhat artificial 8-character equivalent in the Fontname scheme. Examples are
\font\rm="Adobe Caslon Pro" at 14pt \rm Bonjour GUT2007 !
Bonjour GUT2007 !
\font\it="Trebuchet MS" at 14pt \it Bonjour GUT2007 !
Bonjour GUT2007 !
\font\ch="Viva Std" at 14pt \ch Bonjour GUT2007 !
Bonjour GUT2007 !
A PDF post-processor (by default xdvipdfmx on Linux) can use the three font formats mentioned above. xdvipdfmx has access to all fonts usable by xetex, i.e., those in font directories declared to fontconfig or in TeX's texmf font trees (this is analogous to dvips). On the other hand, xdvipdfmx has no support for bitmap fonts. By default, xetex produces PDF by passing its output to xdvipdfmx, and only the characters of a font that are actually used are included in the PDF file. With the option -no-pdf, xetex instead writes an intermediate extended DVI file (.xdv). This intermediate format can be useful when xetex encounters an error and does not generate a PDF file. In that case you can use the following two-step process to investigate the problem (note the use of the verbosity switch -vv).
> xelatex -no-pdf mydocument
> xdvipdfmx -vv -E mydocument.xdv
Fontname is maintained by Karl Berry. Its documentation is available as an electronic document on CTAN at info/fontname.
2.2.5 Font mappings using TECkit
TECkit (currently version 2.2, see http://scripts.sil.org/TECkit) is a low-level toolkit intended to be used by other applications that need to perform encoding conversions (e.g., when importing legacy data into a Unicode-based application). The primary component of the TECkit package is therefore a library that performs conversions; this is the TECkit engine. The engine relies on mapping tables in a specific binary format (for which documentation is available); there is a compiler that creates such tables from a human-readable mapping description (a simple text file).
Widely used TeX keyboarding conventions such as \'{e} or \pounds are implemented via TeX macros (and are therefore easily adapted for Unicode-compliant fonts, by modifying the macro definitions). In addition, there are a few established conventions that are implemented as ligature rules associated with standard TeX fonts; these include --- (em dash), ?` (Spanish inverted question mark ¿), and a few more. In principle, smart font technologies such as AAT and OpenType could implement these same ligatures, providing the same behavior as traditional TeX fonts. But as these conventions are peculiar to the TeX world, it is not realistic to expect them to be provided in mainstream, general-purpose fonts.
Although it would usually be possible to simulate these ligatures via macro programming, it is difficult to ensure that reprogramming widely used text characters such as the hyphen, question mark, and quotation marks will not interfere with other levels of markup in the source document. Instead, XeTeX provides a mechanism known as font mappings, whereby a mapping of Unicode characters is associated with a particular font, and applied to all strings of text being measured or rendered in that font. This is implemented using the TECkit mapping engine.
While TECkit was primarily designed to convert between legacy byte encodings and Unicode, it can also be used to perform transformations on a Unicode text stream, using the same mapping language and text conversion library. The following shows the file tex-text.map (in fact its binary equivalent tex-text.tec, which usually lives in the texmf tree in the subdirectory texmf/fonts/misc/xetex/fontmapping/), which provides support for the usual TeX input conventions.
; TECkit mapping for TeX input conventions <-> Unicode characters
; used with XeTeX to emulate Knuthian ligatures
; Copyright 2006 SIL International.
; You may freely use, modify and/or distribute this file.
LHSName "TeX-text"
RHSName "UNICODE"
pass(Unicode)
U+002D U+002D <> U+2013 ; -- -> en dash
U+002D U+002D U+002D <> U+2014 ; --- -> em dash
U+0027 <> U+2019 ; ' -> right single quote
U+0027 U+0027 <> U+201D ; '' -> right double quote
U+0022 > U+201D ; " -> right double quote
U+0060 <> U+2018 ; ` -> left single quote
U+0060 U+0060 <> U+201C ; `` -> left double quote
U+0021 U+0060 <> U+00A1 ; !` -> inverted exclam
U+003F U+0060 <> U+00BF ; ?` -> inverted question
When associated with a standard Unicode-compliant font in XeTeX, this has the effect of implementing the legacy TeX conventions for dashes and quotes, as shown in the next example, without requiring any TeX-specific features in the smart fonts themselves.
Example 2-2-1: the same text typeset with \TestA (no mapping) and with \TestB (mapping=tex-text)
\font\TestA="Times New Roman" at 9pt
\TestA !`Typing "quotes"(1--2)---and
``dashes''---the \TeX\ way!\par
\bigskip
\font\TestB="Times New Roman:mapping=tex-text" at 9pt
\TestB !`Typing "quotes"(1--2)---and
``dashes''---the \TeX\ way!\par
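To see what the mapping does to a string, the rules of tex-text.map can be applied by hand. The Python sketch below uses greedy longest-match replacement, so that --- takes precedence over --; this approximates the behavior of the mapping but is not the TECkit engine itself, which works from the compiled .tec table. The rule list is copied from the map file shown earlier; the function name is hypothetical.

```python
# (input sequence, replacement) pairs taken from tex-text.map
TEX_TEXT_RULES = [
    ("---", "\u2014"),  # em dash
    ("--", "\u2013"),   # en dash
    ("''", "\u201D"),   # right double quote
    ("``", "\u201C"),   # left double quote
    ("!`", "\u00A1"),   # inverted exclamation mark
    ("?`", "\u00BF"),   # inverted question mark
    ("'", "\u2019"),    # right single quote
    ('"', "\u201D"),    # right double quote
    ("`", "\u2018"),    # left single quote
]
# Try longer input sequences first at every position.
RULES_BY_LENGTH = sorted(TEX_TEXT_RULES, key=lambda r: len(r[0]), reverse=True)


def apply_tex_text(text: str) -> str:
    out, i = [], 0
    while i < len(text):
        for src, dst in RULES_BY_LENGTH:
            if text.startswith(src, i):
                out.append(dst)
                i += len(src)
                break
        else:                      # no rule matched: copy the character
            out.append(text[i])
            i += 1
    return "".join(out)
```

Note that, as in the map file, a straight double quote always becomes a right double quote, so "quotes" in the example input receives two closing quotes.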
While this mechanism, associating a mapping defined in terms of Unicode character sequences, was first devised in order to support legacy TeX input conventions, it can also be applied in other ways. The following example shows how to typeset a single fragment of input text in two scripts by giving different font specifications, one of which includes a transliteration mapping (in this case the mapping file cyr-lat-iso9.tec must be findable by XeTeX).
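Example 2-2-2
Unicode
это уникальный код для любого символа,
независимо от платформы,
независимо от программы,
независимо от языка.

Unicode
èto unikal'nyj kod dlâ lûbogo simvola,
nezavisimo ot platformy,
nezavisimo ot programmy,
nezavisimo ot âzyka.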
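\def\SampleText{Unicode \\
это уникальный код для любого символа,\\
независимо от платформы,\\
независимо от программы,\\
независимо от языка.\par}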
\font\gen="Gentium" at 9pt
\centering
\gen\SampleText
\bigskip
\font\gentrans="Gentium:mapping=cyr-lat-iso9"
at 9pt \gentrans
\SampleText
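The cyr-lat-iso9 mapping replaces each Cyrillic letter by its Latin counterpart. A Python sketch with the ISO 9 letter table for Russian shows the idea. It is illustrative only: the table is typed in for this example rather than read from the mapping file, and the soft sign ь is rendered with the modifier letter prime (U+02B9) that ISO 9 prescribes, whereas the output above shows an apostrophe. The function name iso9 is hypothetical.

```python
# ISO 9 transliteration of the 33 lowercase Russian letters (assumed table).
ISO9_LOWER = dict(zip(
    "абвгдеёжзийклмнопрстуфхцчшщъыьэюя",
    ["a", "b", "v", "g", "d", "e", "\u00EB", "\u017E", "z", "i", "j",
     "k", "l", "m", "n", "o", "p", "r", "s", "t", "u", "f", "h", "c",
     "\u010D", "\u0161", "\u015D", "\u02BA", "y", "\u02B9", "\u00E8",
     "\u00FB", "\u00E2"],
))


def iso9(text: str) -> str:
    out = []
    for ch in text:
        low = ch.lower()
        latin = ISO9_LOWER.get(low)
        if latin is None:
            out.append(ch)                             # not Cyrillic: copy as is
        elif ch != low:
            out.append(latin[:1].upper() + latin[1:])  # capital letter
        else:
            out.append(latin)
    return "".join(out)
```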
2.2.6 Line breaks and justification
Some languages do not use spaces between words in the input file, so the line breaks must be generated when typesetting the text.
- TeX normally breaks lines at a point where there is glue associated with an interword space
- Chinese, Japanese, Thai, etc. do not leave spaces between words
[Thai sample text, typeset without line-breaking information]
The line-breaking model implemented in the ICU library is activated with \XeTeXlinebreaklocale "th":
[The same Thai sample text, typeset with \XeTeXlinebreaklocale "th"]
Line justification of a text without spaces, including line breaking, is a non-trivial task. One solution is ragged typesetting, i.e., not aligning the text to the right (or left) margin.
[Thai sample text, set ragged]
Alternatively, one can use the command \XeTeXlinebreakskip, which lets you introduce glue at potential break points.
[Thai sample text, justified using \XeTeXlinebreakskip]
2.2.7 Unicode Character/glyph model


An important aspect of rendering Unicode text is the character/glyph model; it is assumed that the reader is familiar with this concept. Traditionally, TeX does not have a well-developed character/glyph model. Input text is a sequence of 8-bit codes, interpreted as character tokens or other (e.g., control sequence) tokens according to the scanning rules and character categories. These same 8-bit codes are used as access codes for glyphs in fonts. It is possible to remap codes by TeX macro programming, and the font metrics (.tfm) files used by TeX can include simple ligature rules (e.g., replacing f followed by i with the ﬁ ligature), but the model is fairly rudimentary, and not adequate for script behaviors such as Arabic cursive shaping or Indic reordering. To support the full range of complex scripts in Unicode, a more complete character/glyph model is needed.
Rather than designing a text rendering system based on the Unicode character/glyph model from scratch, it seemed desirable to leverage existing implementations, allowing TeX to take advantage of the smart fonts and multilingual text rendering facilities found in modern operating systems and libraries. Currently, XeTeX supports two such rendering systems: ATSUI on Mac OS X, and ICU on other systems.
2.2.8 Using OpenType via ICU Layout
While the initial implementation of XeTeX was based on Apple's ATSUI rendering system, the increasing availability of fonts with OpenType layout features led to a desire to also support this font technology. Therefore, the system was extended by incorporating the OpenType layout engine from ICU4.
Before laying out glyphs, it is necessary to deal with bidirectional layout issues; most chunks XeTeX needs to measure will be unidirectional, but this is not always the case. With mixed-direction text, each direction run is measured separately. The ICU LayoutEngine class is used to perform the actual layout process, and retrieve the list of glyphs and positions. The resulting array of positioned glyphs is stored within the word node in XeTeX's paragraph list.
Internally, ICU-based OpenType rendering is handled in a very different way from ATSUI rendering. With ATSUI, the output of the typesetting process includes the original Unicode strings and the appropriate font descriptors; the PDF-generating back-end then reuses ATSUI layout functions to actually render the text into the PDF destination. In the case of OpenType, however, the typesetting process retrieves the array of positioned glyphs that results from the layout operation, and records this; the back-end then merely has to draw the glyphs as specified, not repeat any of the text layout work.
In addition to the actual layout engine, XeTeX uses ICU's implementation of Unicode's BiDi (bidirectional) algorithm.
When the TeX source calls for a particular font, XeTeX looks for specific layout tables within the font (e.g., GSUB for OpenType) to determine which layout engine to use, and instantiates either an ATSUI style or an ICU LayoutEngine as appropriate (for a font that supports both layout technologies, XeTeX currently chooses the OpenType engine by default, but users can explicitly specify which one to use). The difference in the implementation of the two technologies is, however, entirely hidden from the main TeX program, which simply deals with word nodes, forming them into paragraphs and pages once they have been measured by the appropriate smart-font engine.
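The selection step can be pictured as a lookup on the table tags stored in the font file's directory. The Python sketch below reads those tags from the header of a single TrueType/OpenType (sfnt) file and applies the rule stated above: OpenType layout tables select ICU, AAT tables select ATSUI, and OpenType wins when both are present unless the user asks otherwise. Treating morx/mort as the AAT marker and returning None for fonts with neither kind of table are assumptions of this sketch, not a description of XeTeX's source code; font collections (.ttc) are not handled.

```python
import struct

OPENTYPE_LAYOUT = {"GSUB", "GPOS"}
AAT_LAYOUT = {"morx", "mort"}          # assumption: AAT fonts carry these


def table_tags(font_bytes: bytes) -> set:
    """Read the table tags of a single sfnt font (not a .ttc collection)."""
    # 12-byte header: sfnt version (uint32), numTables and 3 search fields (uint16)
    _version, num_tables = struct.unpack(">IH", font_bytes[:6])
    tags = set()
    for i in range(num_tables):
        record = 12 + 16 * i           # each directory entry is 16 bytes
        tags.add(font_bytes[record:record + 4].decode("latin-1"))
    return tags


def choose_engine(tags: set, prefer=None):
    has_ot = bool(tags & OPENTYPE_LAYOUT)
    has_aat = bool(tags & AAT_LAYOUT)
    if prefer == "ATSUI" and has_aat:
        return "ATSUI"
    if has_ot:
        return "ICU"                   # default when both are present
    if has_aat:
        return "ATSUI"
    return None                        # no smart-font layout tables found
```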
XeTeX optimally exploits the Unicode characteristics present in OpenType fonts. Therefore, XeTeX differs rather drastically from TeX's traditional model, characterized by:
- TeX's fundamental typesetting unit is a code point of a given character in a particular font, where TeX assumes that the dimensions of such a character are known and invariable
- ligatures are handled by a character substitution mechanism
- a paragraph is constructed from a sequence of character nodes, which are placed with great precision, interspersed with nodes of glue.
This is not optimal for Unicode, where a character might not correspond to a single known glyph. Indeed, many scripts require contextual selection of glyphs (e.g., Arabic, Devanagari), so that characters must be measured in context rather than in isolation.
XeTeX's approach is the following:
- the typesetting process collects runs of characters (words) whose widths are obtained through the API of the system libraries (e.g., ICU);
- a XeTeX paragraph is a sequence of word nodes separated by glue.
Thus XeTeX's typesetting engine places words rather than glyphs, the latter being drawn by the font rendering engine. The following scheme illustrates this distinction between the TeX and XeTeX engines.
TeX: nodes in a paragraph
char:T char:h char:e glue:wordspace char:q char:u char:i char:c char:k glue:wordspace char:f char:o char:x glue:wordspace
XeTeX: nodes in a paragraph
word:The glue:wordspace word:quick glue:wordspace word:fox glue:wordspace
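The two node lists can be mimicked with a few lines of Python. This is only a data-structure sketch with hypothetical names: each word is followed by an interword glue node, as in the scheme, and no widths are computed, since in XeTeX those come from the layout engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    kind: str       # "char", "word" or "glue"
    value: str


def tex_nodes(text):
    """TeX-style list: one char node per character, glue after each word."""
    nodes = []
    for word in text.split():
        nodes.extend(Node("char", c) for c in word)
        nodes.append(Node("glue", "wordspace"))
    return nodes


def xetex_nodes(text):
    """XeTeX-style list: one word node per run of characters."""
    nodes = []
    for word in text.split():
        nodes.append(Node("word", word))
        nodes.append(Node("glue", "wordspace"))
    return nodes
```

For "The quick fox", the TeX-style list has 14 nodes while the XeTeX-style list has only 6; the paragraph builder then works on the shorter list of opaque word nodes.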
Depending on the tables present in a given font, XeTeX will use ATSUI (the equivalent of ICU on Mac OS X) or ICU, and it locates the requested font with the fontconfig library. Thus, the typesetting process is completely independent of the underlying font technology; only the low-level layout engine, which needs to determine the dimensions of the characters, has to know about it. Therefore a given source file can refer at the same time to OpenType, AAT, and even TeX fonts.
By default XeTeX uses the xdvipdfmx output driver, which uses the freetype library (www.freetype.org/) for rendering the images of the glyphs with great precision.
2.2.9 XeTeX's hyphenation support
Implementing word nodes as black boxes within the main TeX program made it easy to form paragraphs of such words, without extensive changes to the rest of TeX. A complication arose, however, in that TeX has an automatic hyphenation algorithm that comes into effect if it is unable to find satisfactory line-break positions for a paragraph. The hyphenation routine applies to lists of character nodes representing runs of text within a paragraph to be line-broken. But at this level, the program sees Unicode word nodes as indivisible, rigid chunks.
Explicit discretionary hyphens may be included in TeX input, and these continue to work in XeTeX, as they become discretionary break nodes in the list of items making up the paragraph. The word fragments on either side, then, become separate nodes in the list, and a line break can occur between them.
In order to reinstate hyphenation support, therefore, it was necessary to extend the hyphenation routine so as to be able to extract the text from a word node, use TeX's pattern-based algorithm to find possible hyphenation positions within the word, and then replace the original word node with a sequence of nodes representing the (possibly) hyphenated fragments, with discretionary hyphen nodes in between.
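TeX's pattern-based algorithm (Liang's method) is compact enough to sketch. In the Python version below, the digits of each pattern give the value of the gap between two letters, the maximum over all matching patterns is kept, and an odd value allows a break. The two patterns f1f and r1e are made up for this example; real pattern files such as frhyph.tex are far larger, and the minimum fragment lengths 2 and 3 are assumed here to mirror plain TeX's \lefthyphenmin and \righthyphenmin.

```python
DIGITS = "0123456789"


def parse_pattern(pattern):
    """'f1f' -> ('ff', [0, 1, 0]): the letters and the value of each gap."""
    letters = "".join(ch for ch in pattern if ch not in DIGITS)
    values = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch in DIGITS:
            values[pos] = int(ch)
        else:
            pos += 1
    return letters, values


def hyphenate(word, patterns, left_min=2, right_min=3):
    table = dict(parse_pattern(p) for p in patterns)
    w = "." + word.lower() + "."          # dots mark the word boundaries
    score = [0] * (len(w) + 1)            # score[k]: gap before w[k]
    for i in range(len(w)):
        for j in range(i + 1, len(w) + 1):
            values = table.get(w[i:j])
            if values:
                for k, v in enumerate(values):
                    score[i + k] = max(score[i + k], v)
    pieces, start = [], 0
    # a break before word[m] corresponds to the gap before w[m + 1]
    for m in range(left_min, len(word) - right_min + 1):
        if score[m + 1] % 2 == 1:
            pieces.append(word[start:m])
            start = m
    pieces.append(word[start:])
    return pieces
```

With these toy patterns, hyphenate("different", ["f1f", "r1e"]) yields the fragments dif, fer and ent, which would become separate word nodes with discretionary hyphens between them.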
A final refinement proved necessary here: once the line breaks have been chosen, and the lines of text are being packaged for justification to the desired width, any unused hyphenation points are removed and the adjacent word (fragment) nodes re-merged. This is required in order to allow rendering behavior such as character reordering and ligatures, implemented at the smart-font level, to occur across hyphenation points. With an early release of XeTeX, a user reported that OpenType ligatures in certain words such as "diﬀerent" (set with an ff ligature) would intermittently fail (appearing as "different", with two separate f glyphs). This was occurring when automatic hyphenation came into effect and a discretionary break was inserted, breaking the word node into sub-words that were being rendered separately.
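The re-merging step can be sketched as follows, assuming the word has already been cut into fragments such as dif, fer and ent and that the line breaker reports which discretionary breaks it used (both data shapes are hypothetical). Fragments are concatenated again except at a chosen break, so the font engine receives differ- and ent rather than three separate pieces, and the ff ligature can form.

```python
def merge_fragments(fragments, used_breaks, hyphen="-"):
    """Rejoin hyphenation fragments, keeping only the breaks actually used.

    fragments   -- pieces of one word, e.g. ["dif", "fer", "ent"]
    used_breaks -- indices i such that the line was broken after fragments[i]
    Returns the runs of text handed to the font engine.
    """
    runs, current = [], ""
    for i, fragment in enumerate(fragments):
        current += fragment
        if i in used_breaks and i < len(fragments) - 1:
            runs.append(current + hyphen)  # visible hyphen at the line end
            current = ""
    runs.append(current)
    return runs
```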
- a paragraph is built from a list of word boxes
- these boxes are treated as indivisible units in the node lists
- TeX can remain unaware of low-level details
- when an acceptable line break cannot be found, the algorithm tries to hyphenate words:
  - extract the characters from the word nodes
  - find break points using TeX's hyphenation algorithm
  - repackage words as word fragments and discretionary hyphenation nodes
  - modify the node list to allow hyphenation of words

Before hyphenation: Two | glue | different | glue | foxes
After hyphenation: Two | glue | dif | hyphen? | fer | hyphen? | ent | glue | foxes
Problem: the unused hyphenation points break rendering
  Two | glue | dif | fer | - | ent | glue | foxes, typeset as "Two differ- / ent foxes" (no ff ligature)
One has to re-merge word nodes after choosing breaks
  Two | glue | differ- | ent | glue | foxes, typeset as "Two diﬀer- / ent foxes" (with the ff ligature)
2.2.10 Running xetex
As explained in Section 2.1.2, XeTeX is a development of e-TeX, and it builds on Karl Berry's kpathsea library for path searching as implemented in the Web2C version of TeX. The xetex command thus offers essentially the same options (type xetex --help to get a full list) as the tex command (e.g., the version distributed with TeX Live). The more important additional ones are:
-etex enable the e-T
E
X extensions
-no-pdf geneiate XDV (extended DVI) output iathei than PDF (see also page 27)
-output-driver=CMD use CMD as the XDV-to-PDF diivei instead of xdvipdfmx, the default diivei
used bv xetex
2.3 Supplementary commands introduced by XeTeX
XeTeX offers a few additional features, most of which are available with the help of new commands or via the higher-level LaTeX interface of Will Robertson's fontspec package.
XeTeX extends TeX's basic \font command with additional options to address the rich set of features available in OpenType (and AAT) fonts, as follows.
\font\myname="<fontname><font-options>:<font-features>" <TeX font-features>
The only mandatory part of this construct is <fontname>, the actual name of the font (as encoded in the .ttf or .otf files), e.g., TeX Gyre Schola. The optional <TeX font-features> are TeX's usual size specifications, such as at 12pt or scaled 1200.
The xdvipdfmx driver can also use fonts that are not installed in the operating system. The names of such fonts should be specified in square brackets. The full path can be given in the font declaration, as follows:
\font\myname="[/mydir/myfontfile]"
Alternatively, the current directory and the texmf trees can be searched to locate the given filename; e.g., the following will select a Latin Modern font in the user's TeX hierarchy.
\font\myname="[lmroman10-regular]"
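The declarations above can be combined with TeX's usual size specification. A minimal plain XeTeX sketch, assuming that TeX Gyre Schola is installed as a system font and that lmroman10-regular.otf is present in the texmf tree (as in TeX Live); the commented-out line repeats the hypothetical path used above.

```latex
% fonts.tex -- compile with: xetex fonts.tex
% 1. By font name, looked up through the operating system.
\font\schola="TeX Gyre Schola" at 11pt
% 2. By file name in square brackets, searched in the
%    current directory and the texmf trees.
\font\lmfile="[lmroman10-regular]" at 11pt
% 3. By explicit path (hypothetical location; adjust it
%    before uncommenting).
%\font\myfont="[/mydir/myfontfile]" at 11pt
\schola This line uses TeX Gyre Schola.\par
\lmfile This line uses Latin Modern Roman.\par
\bye
```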
¹ The Web2C implementation of the TeX family of programs is a translation of the original WEB sources of these programs into the C programming language to allow easy compilation on all present-day computer systems. A detailed description is available from its Web page (http://www.tug.org/web2c/), where you can also find the kpathsea manual. Currently Web2C is part of TeX Live.
2 XeTeX: TeX MEETS OPENTYPE AND UNICODE
The argument font-options can only be used when the font is selected through the operating system (i.e., without square brackets), and may be any concatenation of the following:
/B Use the bold version of the selected font.
/I Use the italic version of the selected font.
/BI Use the bold italic version of the selected font.
/IB Same as /BI.
/S=x Use the version of the selected font corresponding to the optical size x pt.
/AAT Explicitly use the ATSUI renderer (Mac OS X only).
/ICU Explicitly use the ICU OpenType renderer (only useful on Mac OS X).
The argument font-features is a comma- or semicolon-separated list activating or deactivating various AAT or OpenType font features, which will vary by font. The XeTeX distribution contains the documentation file opentype-info.tex, which lists all supported features available for the various scripts and languages in the specified OpenType font.²
OpenType font features are chosen by specifying their standard tag names,³ separated by a comma or a semicolon, and prepended with a + to turn them on, or a - to turn them off.
Bold italic Minion Pro
Small caps bold italic Minion Pro
\font\wbi="Minion Pro/BI" at 12pt
\wbi Bold italic Minion Pro\par
\font\wbisc="Minion Pro/BI:+smcp" at 12pt
\wbisc Small caps bold italic Minion Pro
Example 2-3-1
XeTeX offers a series of features that are available for any font, namely:
mapping=<font map> Specifies the mapping for the given font. For example, mapping=tex-text enables classical TeX mappings, such as the sequence --- being turned into the proper typographical glyph (an em dash), etc.
color=RRGGBB[TT] Specifies the color for the given font as three pairs of hexadecimal RGB values. An optional fourth pair, TT, lets you specify a transparency value.
letterspace=x A space of x/S is added between letters (S is the font size).
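A minimal plain XeTeX sketch of these font-independent features, assuming Charis SIL is installed. The colour values and the letterspace amount are arbitrary illustrative choices; the transparency pair is simply appended to the RGB value as described above.

```latex
% features.tex -- compile with: xetex features.tex
% Assumption: "Charis SIL" is installed.
% TeX ligatures: -- and --- become en and em dashes.
\font\ligs="Charis SIL:mapping=tex-text" at 12pt
% Red text (RRGGBB = FF0000).
\font\red="Charis SIL:color=FF0000" at 12pt
% Red with the optional transparency pair TT = 80.
\font\redtt="Charis SIL:color=FF000080" at 12pt
% Letterspacing; the value 5 is purely illustrative.
\font\spaced="Charis SIL:letterspace=5" at 12pt
\ligs Pages 10--20 --- see above.\par
\red Red text.\par
\redtt Red text with a transparency value.\par
\spaced LETTERSPACED CAPITALS\par
\bye
```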
Depending on the script and language chosen, a certain number of OpenType features will be activated by default, when available.
Script and language are chosen as follows:
script=<script tag> selects the font script,
language=<lang tag> selects the font language.
Script (alphabet) tags are four-letter codes,⁴ while language tags are three-letter codes.⁵
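Script and language settings are ordinary font features, so they can be combined in one declaration with a semicolon. A sketch, assuming Arial Unicode MS is installed; the feature listing in Figure 2.4 shows that this font provides the arab script with the FAR (Farsi) and URD (Urdu) language systems. The sample words are only illustrations.

```latex
% scripts.tex -- compile with: xetex scripts.tex
% Assumption: "Arial Unicode MS" is installed.
% Arabic script, default language system:
\font\arab="Arial Unicode MS:script=arab" at 14pt
% Arabic script with Farsi (FAR) language rules:
\font\farsi="Arial Unicode MS:script=arab;language=FAR" at 14pt
% Arabic script with Urdu (URD) language rules:
\font\urdu="Arial Unicode MS:script=arab;language=URD" at 14pt
\arab  فارسی\par
\farsi فارسی\par
\urdu  اردو\par
\bye
```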
2.3.1 Specifying languages and scripts
Certain characters have a different presentation depending on the language in which they are used. Below we show how identical input texts are rendered with identical fonts, first in the default language (left) and then in Vietnamese and Turkish, respectively (right).
² A similar file, aat-info.tex, exists for displaying the characteristics of an AAT font.
³ See http://www.microsoft.com/typography/otspec/featuretags.htm for a list of available registered features.
⁴ See http://www.microsoft.com/typography/otspec/scripttags.htm.
⁵ See http://www.microsoft.com/typography/otspec/languagetags.htm.
\font\Doulos="Doulos SIL" \font\DoulosViet="Doulos SIL:language=VIT"
Unicode cung cấp một con số duy nhất
cho mỗi ký tự
Unicode cung cấp một con số duy nhất
cho mỗi ký tự
\font\Minion="Minion Pro" \font\MinionTrk="Minion Pro:language=TRK"
gelen firmalar tarafından    gelen firmalar tarafından
Moreover, certain languages need a language-specific rendering procedure to draw the form of the letters, as the following examples of Arabic and Devanagari show.
\font\x="Code2000:script=arab" \x
\font\x="Code2000:script=deva" \x
2.3.2 Specifying optional features
The font declaration can refer to one or more optional features.
\font\x="Minion Pro" \x Hello TUG2008! 0123456789
Hello TUG2008! 0123456789
\font\x="Minion Pro:+smcp"
Hello TUG2008! 0123456789 (small capitals)
\font\x="Minion Pro Italic:+onum"
Hello TUG2008! 0123456789 (italic, oldstyle figures)
\font\x="Minion Pro Italic:+swsh,+zero"
Hello TUG2008! 0123456789 (italic, swash capitals, slashed zero)
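Features can also be switched off with a minus sign, and several settings can be mixed in one declaration. A sketch, assuming Minion Pro is installed; which tags are actually available depends on the font (opentype-info.tex lists them), and the comment about default features describes the usual behaviour for Latin text rather than a guarantee.

```latex
% Assumption: "Minion Pro" is installed.
% Default features; for Latin script these normally
% include standard ligatures (liga) and kerning (kern).
\font\std="Minion Pro" at 12pt
% Standard ligatures switched off, everything else default:
\font\noliga="Minion Pro:-liga" at 12pt
% Oldstyle figures and small caps switched on together:
\font\oldsc="Minion Pro:+onum;+smcp" at 12pt
\std    office affine 0123456789\par
\noliga office affine 0123456789\par
\oldsc  office affine 0123456789\par
\bye
```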
Certain fonts come in several optical sizes, so that the image of the character is optimized for the typeset size used.
Minion Pro typeset at 7pt, at 10pt, at 18pt, and at 24pt:
seven ten eighteen twenty four
One can force a given optical size, as shown with the following texts, which are all typeset at 16pt but use the optical size specified with the /S= specifier.
Minion Pro/S=7 Minion Pro Caption
Minion Pro/S=10 Minion Pro Text
Minion Pro/S=18 Minion Pro Subhead
Minion Pro/S=24 Minion Pro Display
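In a plain XeTeX file the same effect is obtained by combining /S= with an at clause: the /S= value picks the optical design, while at sets the size at which it is typeset. A sketch, assuming the Minion Pro optical sizes shown above are installed; the font identifiers are hypothetical names.

```latex
% Assumption: Minion Pro with its Caption and Display
% optical sizes is installed.
% Caption design (intended for about 7pt), set at 16pt:
\font\mincap="Minion Pro/S=7" at 16pt
% Display design (intended for about 24pt), set at 16pt:
\font\mindisp="Minion Pro/S=24" at 16pt
\mincap Minion Pro Caption\par
\mindisp Minion Pro Display\par
\bye
```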
2.3.3 Support for pseudo-features
Sometimes it can be useful to fake some features by emulating them when they are not natively available in a given font. Examples are slanting (in the absence of a genuine italic variant) or extending the width of a font (when wider or condensed variants do not exist). These effects can be achieved with the slant and extend pseudo-features, as the following example shows.
Charis SIL normal
Charis SIL slanted
Charis SIL extended
Charis SIL condensed, slanted
Charis SIL condensed, anti-slanted
\font\x="Charis SIL" at 12 pt
\x Charis SIL normal\\[1mm]
\font\x="Charis SIL:slant=0.2" at 12 pt
\x Charis SIL slanted\\[1mm]
\font\x="Charis SIL:extend=1.5" at 12 pt
\x Charis SIL extended\\[1mm]
\font\x="Charis SIL:slant=0.2;extend=0.8" at 12 pt
\x Charis SIL condensed, slanted\\[1mm]
\font\x="Charis SIL:slant=-0.2;extend=0.8" at 12 pt
\x Charis SIL condensed, anti-slanted
Example 2-3-2
2.3.4 Commands extracting information from OpenType fonts
XeTeX provides new commands to extract information from font files.
\XeTeXuseglyphmetrics
A counter that specifies whether the height and depth of each character must be taken into account in the typesetting process (>0, the default), or whether a single height and depth is used for all characters (<1).
m M g G
m M g G
\font\minion="Minion Pro" at 12pt\minion
\XeTeXuseglyphmetrics=0 \fbox{m}\fbox{M}\fbox{g}\fbox{G}
\par\medskip
\XeTeXuseglyphmetrics=1 \fbox{m}\fbox{M}\fbox{g}\fbox{G}
Example 2-3-3
\XeTeXglyph{Glyph slot}
Inserts the glyph in slot Glyph slot of the current font (the result is font specific, i.e., this command will give different output for different fonts).
\XeTeXglyphindex"glyphname"
This command, which must be followed by a space or \relax, returns the glyph slot corresponding to the (possibly font-specific) glyphname in the currently selected font.
\XeTeXcharglyph{charcode}
This command returns the default glyph number of character charcode in the current font (the value zero is returned if the character is absent from the font).
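Because \XeTeXcharglyph returns zero for a character that is absent from the current font, it can serve as a simple test before typesetting that character. A sketch; the macro name \safechar is hypothetical, Minion Pro is assumed to be installed, and U+0E01 (a Thai letter) is used only as a character the font presumably lacks.

```latex
% Assumption: "Minion Pro" is installed.
\font\minion="Minion Pro" at 12pt
\minion
% \safechar{<code>}{<fallback>} (hypothetical helper): print
% the character with the given code if the current font has
% a glyph for it (\XeTeXcharglyph is non-zero), otherwise
% print <fallback>.
\def\safechar#1#2{%
  \ifnum\XeTeXcharglyph#1=0 #2\else\char#1\relax\fi}
Copyright \safechar{"00A9}{(c)} 2010.\par
Thai letter: \safechar{"0E01}{[missing]}\par
\bye
```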
Example 2-3-4
The glyph slot in Minion Pro for the copyright symbol is: 170 (using the font-specific glyph name) or 170 (using the unicode character slot).
This glyph may be typeset with the font-specific glyph slot printed above ©, or directly by storing the slot number in a counter, as follows: ©. The Unicode code can also be used directly to address the character slot, as follows: © (TeX syntax) or © (LaTeX syntax).
\font\minion="Minion Pro"\minion
\raggedright
The glyph slot in Minion Pro for the copyright symbol is:
\the\XeTeXglyphindex"copyright" \space (using the font-specific glyph
name) or \the\XeTeXcharglyph"00A9 \space (using the unicode character
slot).
\newcounter{Cslot}
\setcounter{Cslot}{\the\XeTeXglyphindex"copyright"}
\medskip
This glyph may be typeset with the font-specific glyph slot printed
above \XeTeXglyph170, or directly by storing the slot number in a
counter, as follows: \XeTeXglyph\value{Cslot}. The Unicode code can
also be used directly to address the character slot, as follows:
\char"00A9 \space (\TeX{} syntax) or \symbol{"00A9} (\LaTeX{} syntax).
\XeTeXfonttype{font}
Returns the number corresponding to the renderer that is used for font:
0 for TeX (standard TeX-based .tfm font);
1 for ATSUI (usually an AAT font);
2 for ICU (an OpenType font);
3 for Graphite.
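In plain XeTeX the same report can be produced more compactly with \ifcase, since the renderer numbers run from 0 to 3. A sketch; the macro name \renderer is hypothetical, Charis SIL is assumed to be installed, and cmr10 stands for a classic .tfm font. The numbering follows the list above.

```latex
% renderer.tex -- compile with: xetex renderer.tex
% \renderer<font identifier> (hypothetical helper): print the
% name of the renderer used for the given font.
\def\renderer#1{%
  \ifcase\XeTeXfonttype#1\relax TeX%
  \or ATSUI%
  \or ICU%
  \or Graphite%
  \else unknown%
  \fi}
\font\fa="[lmroman10-regular]"
\font\fb="Charis SIL"   % assumption: installed system font
\font\fc=cmr10          % classic .tfm font
\fa \renderer\fa\par
\fb \renderer\fb\par
\fc \renderer\fc\par
\bye
```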
"[cmtt10]" is rendered by ICU.
"LMRoman10 Regular" is rendered by ICU.
"[lmsans10-bold]" is rendered by ICU.
"Charis SIL" is rendered by ICU.
"Charis SIL/AAT" is rendered by ICU.
\usepackage{ifthen}
\newcounter{Cfont}
\newcommand\whattype[1]{%
\texttt{\fontname#1} is rendered by
\setcounter{Cfont}{\XeTeXfonttype#1}
\ifthenelse{\value{Cfont}=0}{\TeX}{%
\ifthenelse{\value{Cfont}=1}{ATSUI}{%
\ifthenelse{\value{Cfont}=2}{ICU}{%
\ifthenelse{\value{Cfont}=3}{Graphite}%
{\typeout{Renderer number not known}}}}}%
.\par}
\font\fa="[cmtt10]"
\font\fb="LMRoman10 Regular"
\font\fc="[lmsans10-bold]"
\font\fd="Charis SIL"
\font\fe="Charis SIL/AAT"
\whattype\fa\whattype\fb
\whattype\fc\whattype\fd\whattype\fe
Example 2-3-5
\XeTeXOTcountscripts{Font}
Returns the number of scripts present in a font.
The number of scripts in Minion Pro is 4.
The number of scripts in Charis SIL is 2.
The number of scripts in Arial Unicode MS is 8.
The number of scripts in Code2000 is 21.
\newcommand{\NumScripts}[1]{%
\font\testfont="#1"\testfont
The number of scripts in #1 is
\the\XeTeXOTcountscripts\testfont.}
\NumScripts{Minion Pro}\par
\NumScripts{Charis SIL}\par
\NumScripts{Arial Unicode MS}\par
\NumScripts{Code2000}
Example 2-3-6
\XeTeXOTscripttag{Font}{n}
Expands to a counter corresponding to script tag n in the font.
\XeTeXOTcountlanguages{Font}{ScriptTag}
Expands to a counter corresponding to the number of languages supported by the given script in the font.
\XeTeXOTlanguagetag{Font}{ScriptTag}{n}
Expands to a counter corresponding to language tag n in the given script of the font.
\XeTeXOTcountfeatures{Font}{ScriptTag}{LanguageTag}
Expands to a counter corresponding to the number of features for the given script and language tags of the font.
Type (class)     Meaning              Example
\mathord (0)     Ordinary             /
\mathop (1)      Large operator       \int
\mathbin (2)     Binary operation     +
\mathrel (3)     Relation             =
\mathopen (4)    Opening              (
\mathclose (5)   Closing              )
\mathpunct (6)   Punctuation          ,
\mathalpha (7)   Alphabet character   A
Table 2.1: Mathematics symbol types
\XeTeXOTfeaturetag{Font}{ScriptTag}{LanguageTag}{n}
Expands to a counter corresponding to feature tag n for the given script and language tags in the font.
The file opentype-info.tex, which is available with the XeTeX distribution, uses all these commands to list the features for all languages and scripts supported by a given OpenType font.
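The following plain XeTeX sketch uses the same idea, though not the actual code of opentype-info.tex, to list the script tags of a font together with the number of language systems for each script. It assumes that Minion Pro is installed, that script indices run from 0 to \XeTeXOTcountscripts minus 1, and that a tag is returned as a 32-bit integer with its first letter in the highest byte; the helper macros \printbyte and \printtag are hypothetical names.

```latex
% listscripts.tex -- compile with: xetex listscripts.tex
\newcount\tagval \newcount\tagbyte \newcount\tagtmp
\newcount\nscripts \newcount\scriptidx
% \printbyte{<weight>}: print the byte of \tagval found at
% <weight> (16777216, 65536, 256 or 1), then remove it.
\def\printbyte#1{%
  \tagbyte=\tagval \divide\tagbyte by #1\relax
  \tagtmp=\tagbyte \multiply\tagtmp by #1\relax
  \advance\tagval by -\tagtmp
  \char\tagbyte\relax}
% \printtag{<integer>}: print a packed tag as four characters.
\def\printtag#1{%
  \tagval=#1\relax
  \printbyte{16777216}\printbyte{65536}%
  \printbyte{256}\printbyte{1}}
\font\testfont="Minion Pro" at 10pt
\testfont
\nscripts=\XeTeXOTcountscripts\testfont
\scriptidx=0
% \endgraf is used instead of \par because plain TeX's \loop
% does not accept \par in its body.
\loop\ifnum\scriptidx<\nscripts
  Script \the\scriptidx: \printtag{\XeTeXOTscripttag\testfont\scriptidx},
  \the\XeTeXOTcountlanguages\testfont
    \XeTeXOTscripttag\testfont\scriptidx\space language system(s)\endgraf
  \advance\scriptidx by 1
\repeat
\bye
```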
2.3.5 Maths fonts
To handle maths parameters more easily, XeTeX adds a series of new primitives to standard TeX. In the description of these supplementary commands that follows, Fam is a number (0–255) representing the font to use in maths, and MathType is an integer in the range 0–7 (Table 2.1) defining the nature (the class, in TeX terminology; see [4, p. 154]) of the math symbol, i.e., whether it is a binary operator, a relation, etc. (La)TeX needs this information to leave the correct amount of space around the symbol when it is used in a formula (see [5, Section 8.9] for more details).
\XeTeXmathcode{char slot}[=]{MathType}{Fam}{GlyphSlot}
Defines a maths glyph accessible via an input character. Note that, unlike TeX's \mathcode, this command takes three separate arguments.
\XeTeXmathcodenum{CharSlot}[=]{MathType/Fam/GlyphSlot}
Pure extension of \mathcode that uses a bit-packed single number argument. It can also be used to extract the bit-packed mathcode number of the CharSlot if no assignment is given.
\XeTeXmathchardef{cmd}[=]{MathType}{Fam}{GlyphSlot}
Defines a maths glyph accessible via a control sequence.
\XeTeXdelcode{CharSlot}[=]{Fam}{GlyphSlot}
Defines a delimiter glyph accessible via an input character.
\XeTeXdelcodenum{CharSlot}[=]{Fam/GlyphSlot}
Pure extension of \delcode that uses a bit-packed single number argument. It can also be used to extract the bit-packed delcode number of the CharSlot if no assignment is given.
\XeTeXdelimiter{MathType}{Fam}{GlyphSlot}
Typesets the delimiter in GlyphSlot of the specified family, with MathType either 4 (opening) or 5 (closing).
OpenType Layout features found in Arial Unicode MS:
script='arab'
  language='FAR' features='isol' 'init' 'medi' 'fina' 'liga' 'isol' 'fina' 'locl'
  language='URD' features='isol' 'init' 'medi' 'fina' 'liga' 'isol' 'init' 'medi' 'fina' 'locl'
  language=<default> features='isol' 'init' 'medi' 'fina' 'liga' 'mark'
script='deva'
  language=<default> features='nukt' 'akhn' 'rphf' 'blwf' 'half' 'vatu' 'pres' 'abvs' 'blws' 'psts' 'haln' 'abvm' 'blwm' 'dist'
script='gujr'
  language=<default> features='nukt' 'akhn' 'rphf' 'blwf' 'half' 'vatu' 'pres' 'abvs' 'blws' 'psts' 'haln' 'abvm' 'blwm' 'dist'
script='guru'
  language=<default> features='nukt' 'blwf' 'half' 'pstf' 'blws' 'abvs' 'abvm' 'blwm'
script='hani'
  language='JAN' features='vert'
  language='KOR' features='locl' 'vert'
  language='ZHS' features='locl' 'vert'
  language='ZHT' features='locl' 'vert'
  language=<default> features='salt' 'trad' 'smpl' 'vert'
script='kana'
  language='JAN' features='vert'
  language=<default> features='vert'
script='knda'
  language=<default> features='akhn' 'rphf' 'blwf' 'half' 'blws' 'abvs' 'psts' 'haln' 'dist' 'dist'
script='taml'
  language=<default> features='akhn' 'half' 'abvs' 'psts' 'haln'
OpenType Layout features found in Minion Pro:
script='cyrl'
  language=<default> features='aalt' 'c2sc' 'case' 'dlig' 'dnom' 'fina' 'frac' 'hist' 'liga' 'lnum' 'numr' 'onum' 'ordn' 'ornm' 'pnum' 'salt' 'sinf' 'smcp' 'ss01' 'ss02' 'sups' 'tnum' 'zero' 'cpsp' 'kern' 'size'
script='grek'
  language=<default> features='aalt' 'c2sc' 'case' 'dlig' 'dnom' 'fina' 'frac' 'hist' 'liga' 'lnum' 'numr' 'onum' 'ordn' 'ornm' 'pnum' 'salt' 'sinf' 'smcp' 'ss01' 'ss02' 'sups' 'tnum' 'zero' 'cpsp' 'kern' 'size'
script='latn'
  language='AZE' features='aalt' 'c2sc' 'case' 'dlig' 'dnom' 'fina' 'frac' 'hist' 'liga' 'lnum' 'numr' 'onum' 'ordn' 'ornm' 'pnum' 'salt' 'sinf' 'smcp' 'ss01' 'ss02' 'sups' 'tnum' 'zero' 'cpsp' 'kern' 'size'
  language='CRT' features='aalt' 'c2sc' 'case' 'dlig' 'dnom' 'fina' 'frac' 'hist' 'liga' 'lnum' 'numr' 'onum' 'ordn' 'ornm' 'pnum' 'salt' 'sinf' 'smcp' 'ss01' 'ss02' 'sups' 'tnum' 'zero' 'cpsp' 'kern' 'size'
  language='DEU' features='aalt' 'c2sc' 'case' 'dlig' 'dnom' 'fina' 'frac' 'hist' 'lnum' 'numr' 'onum' 'ordn' 'ornm' 'pnum' 'salt' 'sinf' 'smcp' 'ss01' 'ss02' 'sups' 'tnum' 'zero' 'cpsp' 'kern' 'size'
  language='MOL' features='aalt' 'c2sc' 'case' 'dlig' 'dnom' 'fina' 'frac' 'hist' 'liga' 'lnum' 'locl' 'numr' 'onum' 'ordn' 'ornm' 'pnum' 'salt' 'sinf' 'smcp' 'ss01' 'ss02' 'sups' 'tnum' 'zero' 'cpsp' 'kern' 'size'
  language='ROM' features='aalt' 'c2sc' 'case' 'dlig' 'dnom' 'fina' 'frac' 'hist' 'liga' 'lnum' 'locl' 'numr' 'onum' 'ordn' 'ornm' 'pnum' 'salt' 'sinf' 'smcp' 'ss01' 'ss02' 'sups' 'tnum' 'zero' 'cpsp' 'kern' 'size'
  language='SRB' features='aalt' 'c2sc' 'case' 'dlig' 'dnom' 'fina' 'frac' 'hist' 'liga' 'lnum' 'numr' 'onum' 'ordn' 'ornm' 'pnum' 'salt' 'sinf' 'smcp' 'ss01' 'ss02' 'sups' 'tnum' 'zero' 'cpsp' 'kern' 'size'
  language='TRK' features='aalt' 'c2sc' 'case' 'dlig' 'dnom' 'fina' 'frac' 'hist' 'liga' 'lnum' 'numr' 'onum' 'ordn' 'ornm' 'pnum' 'salt' 'sinf' 'smcp' 'ss01' 'ss02' 'sups' 'tnum' 'zero' 'cpsp' 'kern' 'size'
  language=<default> features='aalt' 'c2sc' 'case' 'dlig' 'dnom' 'fina' 'frac' 'hist' 'liga' 'lnum' 'numr' 'onum' 'ordn' 'ornm' 'pnum' 'salt' 'sinf' 'smcp' 'ss01' 'ss02' 'sups' 'tnum' 'zero' 'cpsp' 'kern' 'size'
Figure 2.4: List of features for the scripts and languages supported by the Microsoft Arial and Adobe Minion fonts
\XeTeXradical{Fam}{GlyphSlot}
Typesets the radical in glyph slot GlyphSlot of the specified family.
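A minimal plain XeTeX sketch of the argument order (class, family, slot) used by these maths primitives, assuming Cambria Math (or another Unicode maths font) is installed. The family \unifam is allocated with plain TeX's \newfam, character and glyph slots are given as hexadecimal Unicode values, and the class numbers follow Table 2.1; details of the maths support may vary between XeTeX versions.

```latex
% mathdemo.tex -- compile with: xetex mathdemo.tex
% Assumption: "Cambria Math" is installed.
\font\unimath="Cambria Math" at 10pt
\font\unimaths="Cambria Math" at 7pt
\font\unimathss="Cambria Math" at 5pt
\newfam\unifam
\textfont\unifam=\unimath
\scriptfont\unifam=\unimaths
\scriptscriptfont\unifam=\unimathss
% '+' (U+002B): class 2 (binary operation)
\XeTeXmathcode "002B = 2 \unifam "002B
% '=' (U+003D): class 3 (relation)
\XeTeXmathcode "003D = 3 \unifam "003D
% \alpha: class 7 (alphabet character), slot U+03B1
\XeTeXmathchardef\alpha = 7 \unifam "03B1
% '(' and ')' as delimiters taken from the same family
\XeTeXdelcode "0028 = \unifam "0028
\XeTeXdelcode "0029 = \unifam "0029
$\left(\alpha+\alpha\right)=2\alpha$
\bye
```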
2.3.5.1 Character classes
The idea behind character classes is to define a boundary where tokens can be added to the input stream without explicit markup. It is primarily intended for automatic alphabet/language font switching.
\XeTeXinterchartokenstate
Counter. If positive, enables the character classes functionality.
\XeTeXcharclass{CharSlot}[=]{ClassNumber}
Assigns a class corresponding to ClassNumber (range 0–255) to a CharSlot. Most characters are class 0 by default. Class 1 is for CJK ideographs; classes 2 and 3 are CJK punctuation. Characters assigned the special class 256 are ignored, which is useful for diacritics.
\XeTeXinterchartoks{ClassNum1}{ClassNum2}[=]{token list}
Defines tokens to be inserted at the interface between ClassNum1 and ClassNum2 (in that order).
Example 2-3-7
a[A]a
\XeTeXinterchartokenstate = 1
\XeTeXcharclass `\a 7
\XeTeXcharclass `\A 8
\XeTeXinterchartoks 7 8 = {[\itshape}
\XeTeXinterchartoks 8 7 = {\upshape]}
\Large aAa
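Character classes can switch fonts automatically when the input moves between scripts. A sketch, assuming Minion Pro and MingLiU are installed and that the format assigns class 1 to CJK ideographs as described above; CJK punctuation (classes 2 and 3) would need similar rules, which are omitted here.

```latex
% interchar.tex -- compile with: xetex interchar.tex
% Assumptions: "Minion Pro" and "MingLiU" are installed, and
% the ideographs below carry character class 1.
\font\latinfont="Minion Pro" at 11pt
\font\cjkfont="MingLiU" at 11pt
\XeTeXinterchartokenstate=1
% class 0 (most characters) -> class 1 (CJK ideographs)
\XeTeXinterchartoks 0 1 = {\cjkfont}
% class 1 -> class 0: switch back to the Latin font
\XeTeXinterchartoks 1 0 = {\latinfont}
\latinfont
The words 中文 and 漢字 are set in MingLiU.
\bye
```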
2.3.6 Encodings, linebreaking, etc.
\XeTeXversion \XeTeXrevision
Expand to a number corresponding to the XeTeX version, and to a string corresponding to the XeTeX revision number, respectively.
Example 2-3-8
The XeTeX version is: 0.997
\usepackage{xltxtra}
The \XeTeX\ version is: \the\XeTeXversion\XeTeXrevision
\XeTeXinputencoding{CharsetName}
Defines the input encoding of the following text.
\XeTeXdefaultencoding{CharsetName}
Defines the input encoding of subsequent files to be read.
\XeTeXdashbreakstate{Integer}
Specifies whether line breaks after en- and em-dashes are allowed. Off (0) by default.
\XeTeXlinebreaklocale{LocaleID}
Defines how to break lines for multilingual text. For instance, to break Chinese text, where the characters are not separated by spaces, one can use the following (see also Example 2-4-7):
\XeTeXlinebreaklocale "zh"
\XeTeXlinebreakskip{Glue}
Inter-character line-break stretch.
\XeTeXlinebreakpenalty{Integer}
Inter-character line-break penalty.
\XeTeXupwardsmode{Integer}
If greater than zero, successive lines of text (and rules, boxes, etc.) will be stacked upwards instead of downwards.
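A sketch of these settings in a plain XeTeX preamble. The charset names are assumed to be ICU converter names accepted by your installation, legacy-latin1.tex is a hypothetical file, and the glue and penalty values are illustrative.

```latex
% Read a legacy Latin-1 file, then return to UTF-8
% (legacy-latin1.tex is a hypothetical file name):
\XeTeXdefaultencoding "iso-8859-1"
\input legacy-latin1.tex
\XeTeXdefaultencoding "UTF-8"
% Allow line breaks after en and em dashes:
\XeTeXdashbreakstate=1
% Break Chinese text between ideographs, with a little
% stretch and no extra penalty at each break point:
\XeTeXlinebreaklocale "zh"
\XeTeXlinebreakskip=0pt plus 1pt
\XeTeXlinebreakpenalty=0
```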
2.3.7 Graphics and pdfTeX-related commands
This description is incomplete.
\XeTeXpicfile{Filename}{Options}
Inserts an image.
\XeTeXpdffile{Filename}{Options}
Inserts (pages of) a PDF file. A simple example of how to include a one-page PDF file follows.
\XeTeXpdffile "myfile.pdf"
\pdfpageheight{Dimension}
The height of the PDF page.
\pdfpagewidth{Dimension}
The width of the PDF page.
\pdfsavepos
Saves the current location on the page in the typesetting stream.
\pdflastxpos
Retrieves the horizontal position saved by the above.
\pdflastypos
Retrieves the vertical position saved by the above.
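A sketch of the graphics and position commands in plain XeTeX. The file names are hypothetical, and the page, width and height keywords are the option forms commonly documented for \XeTeXpdffile and \XeTeXpicfile; check them against the documentation of your XeTeX version. Because a saved position is only known when the page is shipped out, it is written with a delayed (non-\immediate) \write, which is expanded at shipout after \pdfsavepos has been processed.

```latex
% graphics.tex -- compile with: xetex graphics.tex
% Assumptions: myfile.pdf (at least two pages) and
% mypicture.png exist in the current directory.
% Page 2 of a PDF file, scaled to a width of 5cm:
\noindent\XeTeXpdffile "myfile.pdf" page 2 width 5cm\par
% A bitmap image scaled to a height of 2cm:
\noindent\XeTeXpicfile "mypicture.png" height 2cm\par
% Record the position of a point on the page in \jobname.pos.
\newwrite\posfile
\immediate\openout\posfile=\jobname.pos
Here\pdfsavepos\write\posfile{x=\the\pdflastxpos sp, y=\the\pdflastypos sp}
\bye
```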
2.4 fontspec
As explained previously, Jonathan Kew's XeTeX lets you easily use all OpenType (and TrueType) fonts available on your computer system with TeX, without having to create a whole series of .tfm, .vf, etc. files. Nevertheless, XeTeX's \font command still has a somewhat cumbersome syntax. Therefore, to allow the use of commands more in line with LaTeX's NFSS syntax, Will Robertson has developed his fontspec package. It offers a simple way to select font families in LaTeX for arbitrary fonts. In particular, it lets you fully control the selection of advanced font features that are available in OpenType or TrueType fonts.
2.4.1 Usage
For basic use, no package options are required:
\usepackage{fontspec}% font selecting commands
\usepackage{xunicode}% unicode character macros
\usepackage{xltxtra} % a few fixes and extras
Ross Moore's xunicode package is highly recommended, as it provides access to LaTeX's various methods for accessing extra characters and accents (for example, \%, \$, \textbullet, \"u, and so on), plus many more Unicode characters.
Will Robertson's xltxtra package, which loads the fontspec and xunicode packages, adds a couple of general improvements to LaTeX under XeTeX. It also provides the \XeTeX macro to typeset the XeTeX logo by loading the metalogo package.
It is important to note that the babel package is not really supported. Many languages, such as Vietnamese, Greek, and Hebrew, might not work correctly. You might have more success with Cyrillic and Latin-based languages. However, fontspec ensures only that the fonts load correctly; hyphenation and other matters are not guaranteed.
fontspec has a list of options:
cm-default The Latin Modern fonts are not loaded;
no-math The maths fonts are not changed;
no-config The configuration file fontspec.cfg is not loaded;
quiet fontspec's warnings are only written to the log file and not to the console.
2.4.2 Latin Modern defaults
fontspec defines a new LaTeX font encoding to allow the Latin Modern fonts (which are Unicode-encoded) to be used by default. Indeed, it does not really make sense to use the legacy Computer Modern fonts with the Unicode-enabled XeTeX. Note that fontspec also requires the euenc package to be installed.
The package option [cm-default] instructs fontspec to ignore the Latin Modern fonts and use TeX's standard Computer Modern fonts instead. This might be useful on a system where the Latin Modern fonts are not installed.
2.4.3 Maths fiddling
By default, fontspec adjusts LaTeX's default maths setup in order to maintain the correct Computer Modern symbols when the roman font changes. However, it will attempt to avoid doing this if another maths font package is loaded (such as mathpazo or Will's upcoming unicode-math package).
If you find that it is not correctly changing the maths font, you should specify the [no-math] package option to suppress its maths font component.
You can customise any part of the fontspec interface, e.g., selecting features or scripts, by creating a file fontspec.cfg, which is automatically loaded by fontspec if it is found. The package option [no-config] suppresses loading this file.
Since the fontspec package is quite verbose with its warning messages, an experienced user, who knows what she is doing, can specify the [quiet] package option, which directs all warning messages to the transcript (.log) file only.
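The options can be combined in the usual way. A minimal sketch, assuming Minion Pro is installed:

```latex
\documentclass{article}
% Leave the maths fonts untouched, skip fontspec.cfg and send
% fontspec's warnings to the .log file only:
\usepackage[no-math,no-config,quiet]{fontspec}
\setmainfont[Mapping=tex-text]{Minion Pro} % assumed installed
\begin{document}
Text in Minion Pro --- maths such as $a^2+b^2=c^2$ keeps
the default maths fonts.
\end{document}
```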
2.4.4 A first overview
fontspec is quite a complex package, since it has to handle a lot of font features. A basic preamble set-up is shown below, to simply select some default document fonts. See the file fontspec-example.tex for a more detailed example.
\usepackage{fontspec}
\defaultfontfeatures{Scale=MatchLowercase}
\setmainfont[Mapping=tex-text]{Minion Pro}
\setsansfont[Mapping=tex-text]{Myriad Pro}
\setmonofont{Courier Std}
2.4.5 Font selection
\fontspec[FontFeatures]{Fontname}
This is the basic command of the fontspec package. It lets you select Fontname as a LaTeX font family. The optional argument FontFeatures is a comma-separated list of features (see Section 1.4.2 on page 11).
As our first example, look how easy it is to select the Minion Pro typeface with the fontspec package:
My first fontspec example.
My first fontspec example. (\itshape)
My first fontspec example. (\scshape)
My first fontspec example. (\scshape\itshape)
My first fontspec example. (\bfseries)
My first fontspec example. (\bfseries\itshape)
My first fontspec example. (\bfseries\scshape)
My first fontspec example. (\bfseries\itshape\scshape)
\usepackage{fontspec,xltxtra}
\providecommand\MyText
{My first fontspec example.\\}
\fontspec{Minion Pro} \MyText
{\itshape \MyText}
{\scshape \MyText}
{\scshape\itshape \MyText}
\bfseries \MyText
{\itshape \MyText}
{\scshape \MyText}
{\itshape\scshape \MyText}
Example 2-4-1
The fontspec package automatically takes care of the necessary font definitions for the shapes shown above. Furthermore, it is not necessary to install the font for XeTeX specifically: every font that is installed in the operating system may be accessed.
2.4.6 Default font families
The \setmainfont, \setsansfont, and \setmonofont commands are used to select the default font families for the entire document. They take the same arguments as \fontspec, for instance:
Example 2-4-2
Famous quick and jumping brown foxes.
Famous quick and jumping brown foxes.
Famous quick and jumping brown foxes.
\usepackage{fontspec,xltxtra}
\providecommand\MyText
{Famous quick and jumping brown foxes.}
\setmainfont{Adobe Garamond Pro}
\setsansfont[Scale=0.86]{Cronos Pro}
\setmonofont[Scale=0.8]{News Gothic Std}
\rmfamily\MyText\par
\sffamily\MyText\par
\ttfamily\MyText
Here, the scales of the fonts have been chosen to equalise their lowercase letter heights. The Scale font feature also allows for automatic scaling, as will be explained later.
A more complex example, which shows the italic and bold variants, follows.
Example 2-4-3
Famous quick and jumping brown foxes.
Famous quick and jumping brown foxes.
Famous quick and jumping brown foxes.
Famous quick and jumping brown foxes.
Famous quick and jumping brown foxes.
Famous quick and jumping brown foxes.
Famous quick and jumping brown foxes.
Famous quick and jumping brown foxes.
Famous quick and jumping brown foxes.
Famous quick and jumping brown foxes.
Famous quick and jumping brown foxes.
Famous quick and jumping brown foxes.
\usepackage{fontspec,xltxtra}
\providecommand\MyText
{Famous quick and jumping brown foxes.}
\setmainfont{Adobe Garamond Pro}
\setsansfont[Scale=0.86]{Cronos Pro}
\setmonofont[Scale=0.8]{News Gothic Std}
\rmfamily\MyText\par
{\itshape\MyText}\par
{\bfseries\MyText}\par
{\itshape\bfseries\MyText}\par
\sffamily\MyText\par
{\itshape\MyText}\par
{\bfseries\MyText}\par
{\itshape\bfseries\MyText}\par
\ttfamily\MyText\par
{\itshape\MyText}\par
{\bfseries\MyText}\par
{\itshape\bfseries\MyText}\par
Since fontspec has to parse and process its arguments at each call, it can be more efficient to create a font instance for a given set of features using the \newfontfamily command, which creates commands that can be used like \rmfamily, \sffamily, etc.
Example 2-4-4
The perfect match is hard to find.
L O G O F O N T
\usepackage{fontspec}
\setmainfont{Georgia}
\newfontfamily\lc[Scale=MatchLowercase]{Verdana}
The perfect match {\lc is hard to find.}\\
\newfontfamily\uc[Scale=MatchUppercase]{Arial}
L O G O \uc F O N T
For cases where only one specific font face is needed, without accompanying italic or bold variants, the \newfontface command is available. In particular, this command can be useful when a font is of a fancy nature, e.g., when it contains script or swash features that are only available in an italic variant and not in upright, etc.
Characters [*349@!?] of a Brushy Nature.
\usepackage{fontspec}
\newfontface\Brush{Brush Script Std Medium}
\Brush Characters [*349@!?] of a Brushy Nature.
Example 2-4-5
Automatic selection of bold, italic, and bold italic for certain fonts might not be adequate, in particular if the given font does not exist in bold or italic variants. In such cases the user might want to choose matching shapes from a completely different font. In other instances, a font can have a range of bold and italic fonts to choose between. The BoldFont and ItalicFont features are provided for these situations. If only one of these is used, the bold italic font is requested as the default from the new font.
Helvetica Neue Ultra Light
Helvetica Neue Ultra Light (italic)
Helvetica Neue Roman (bold)
Helvetica Neue Roman (bold italic)
\usepackage{fontspec}
\fontspec[BoldFont={Helvetica Neue 55 Roman}]
{Helvetica Neue 25 Ultra Light}
Helvetica Neue Ultra Light \\
{\itshape Helvetica Neue Ultra Light (italic)} \\
{\bfseries Helvetica Neue Roman (bold)} \\
{\bfseries\itshape Helvetica Neue Roman (bold italic)}\\
Exa.
2-4-6
In this example we want to use the font Helvetica Neue 25 Ultra Light (its full name has to be specified to the ICU processor), which has no bold variant; hence we tell fontspec to use Helvetica Neue 55 Roman when constructing the bold variants. We can also specify an explicit bold italic variant with the BoldItalicFont feature.
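The following sketch extends the previous example so that all three derived shapes are chosen explicitly, using the ItalicFont and BoldItalicFont features alongside BoldFont. The face names Helvetica Neue 26 Ultra Light Italic and Helvetica Neue 56 Italic are assumptions about the installed family, not part of the original example; substitute the faces available on your system.

```latex
\usepackage{fontspec}
% Assumed face names: adjust them to the faces installed on your system.
\fontspec[BoldFont={Helvetica Neue 55 Roman},
          ItalicFont={Helvetica Neue 26 Ultra Light Italic},
          BoldItalicFont={Helvetica Neue 56 Italic}]
         {Helvetica Neue 25 Ultra Light}
Helvetica Neue Ultra Light \\
{\itshape Helvetica Neue Ultra Light Italic (italic)} \\
{\bfseries Helvetica Neue Roman (bold)} \\
{\bfseries\itshape Helvetica Neue Italic (bold italic)}
```

Because every variant is named, fontspec does not have to derive the bold italic face from the BoldFont choice.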
Fontspec: Chinese, Mandarin (Simplified):
[Chinese text in MingLiU]
XeTeX: Chinese, Mandarin (Traditional):
[Chinese text in MingLiU]
And now the same vertically
[the same two texts, set vertically]
\usepackage{fontspec,xltxtra,graphicx}
\XeTeXlinebreaklocale "zh" % allow linebreaks
\XeTeXlinebreakskip = 0pt plus 1pt minus 0.1pt
\setmainfont[Mapping=tex-text]{Minion Pro}
\providecommand{\ZHS}{%
,}
\providecommand{\ZHT}{%
}
%%%% Use font MingLiU with 'vert' feature
\parbox{45mm}{\raggedright
Fontspec: Chinese, Mandarin (Simplified):\\
\fontspec{MingLiU} \ZHS \\
\rmfamily
\XeTeX: Chinese, Mandarin (Traditional):\\
%%%% Define font in plain xetex
\font\body="MingLiU" \body \ZHT }\\[3mm]
%%%% Rotate glyphs
\rmfamily And now the same vertically\\
\fontspec[Vertical=RotatedGlyphs]{MingLiU}
\quad\rotatebox{-90}{\parbox{45mm}{\ZHS}}
\font\body="MingLiU:vertical" \body
\quad\rotatebox{-90}{\parbox{45mm}{\ZHT}}
Exa.
2-4-7
2.5 XeTeX and other engines
The two key features XeTeX offers are (a) native support for Unicode, including complex non-Latin scripts, and (b) easy use of modern font formats (TrueType and OpenType).
Earlier, Unicode support was offered by Omega (and then Aleph); more recently, it has been incorporated into LuaTeX, which also supports the direct use of OpenType fonts. Nevertheless, according to Jonathan Kew¹ there are major differences in the approaches taken by the different projects, in particular:
XeTeX values | LuaTeX (and predecessors)
ease of setup and use | ultimate flexibility
uses available libraries | control every aspect of the implementation
wherever feasible, do the right thing automatically | provide authors or macro writers with low-level tools
¹Presentation at BachoTeX 2008 (http://www.gust.org.pl/bachotex/2008/presentations/XeTeX-BachoTeX2008-pres.pdf).
CHAPTER 3
Handling all those scripts
3.1 Writing systems 49
3.2 Bidirectional typesetting 55
3.3 Languages using the Arabic alphabet 61
3.4 Typesetting Chinese 79
3.5 Examples of the use of Unicode 88
As shown in Figures 2.2 and 2.3 on page 20, the world has many scripts. In this chapter we first present a brief overview of the world's writing systems. Problems related to bidirectional typesetting and their solutions are described in Section 3.2. Application packages for Arabic and Chinese typesetting are the subject of Sections 3.3.2 and 3.4, respectively. Finally, in Section 3.5 we give hints about where to find information on Unicode fonts and freely available texts in UTF-8.
3.1 Writing systems
It is accepted that every human community possesses language, yet the development and adoption of writing systems occurred only quite recently in the history of mankind. Moreover, writing systems, once they are introduced, generally change rather more slowly than the spoken variant they represent, and they thus often preserve features and expressions which are no longer current in the spoken language. Nevertheless, the great benefit of writing systems is that they maintain a persistent record of information expressed in a language, which can be retrieved independently of the initial act of formulation.
Writing systems require:
a set of defined base elements or symbols, individually termed characters or graphemes, and collectively called a script;
a set of rules and conventions understood and shared by a community, which arbitrarily assign meaning to the base elements, their ordering, and relations to one another;
a language (generally a spoken language) whose constructions are represented and able to be recalled by the interpretation of these elements and rules;
some physical means of distinctly representing the symbols by application to a permanent or semi-permanent medium, so that they may be interpreted (usually visually, but tactile systems have also been devised).
3.1.1 Basic terminology
The study of writing systems has developed along partially independent lines in the examination of individual scripts, and as such the terminology employed differs somewhat from field to field [6].
The generic term text may be used to refer to an individual product of a writing system. The act of composing a text may be referred to as writing, and the act of interpreting the text as reading. In the study of writing systems, orthography refers to the method and rules of observed writing structure (literal meaning, "correct writing"), and, in particular for alphabetic systems, includes the concept of spelling.
Graphemes are the atomic units of a given writing system, i.e., the minimally significant elements which taken together comprise the set of building blocks out of which texts of a given writing system may be constructed, along with rules of correspondence and use. For example, for standard contemporary English, graphemes include the uppercase and lowercase forms of the twenty-six letters of the Latin alphabet (corresponding to various phonemes, the atoms of the spoken language), marks of punctuation (mostly non-phonemic), and a few other symbols such as those for numerals (logograms for numbers).
A given grapheme may be represented in a wide variety of ways, each variation being visually distinct in some regard, but all are interpreted as representing the same grapheme. These individual variations are known as allographs of a grapheme; e.g., the lowercase letter a has different allographs depending on the medium used, the writing instrument, the stylistic choice of the writer, and an individual's handwriting.
The terms glyph, sign, and character are sometimes used to refer to a grapheme. The glyphs of most writing systems are made up of lines (or strokes) and are therefore called linear, but there are glyphs in non-linear writing systems made up of other types of marks, such as Cuneiform and Braille.
Writing systems are conceptual systems, as are the languages to which they refer. Writing systems may be regarded as complete according to the extent to which they are able to represent all that may be expressed in the spoken language.
3.1.2 History of writing systems
http://en.wikipedia.org/wiki/History_of_writing
Writing systems were preceded by proto-writing, systems of ideographic (representing an idea) or early mnemonic (serving as a memory aid) symbols, e.g., the Jiahu script (ca. 6600 BCE, tortoise shells, China), the Vinca script (ca. 4500 BCE, Tartaria tablets, Romania), and the early Indus Harappan script (ca. 3500 BCE, north-western India).
The invention of the first writing systems is roughly contemporary with the beginning of the Early Bronze Age in the late Neolithic (around 3000 BCE). For example, the Sumerian archaic cuneiform script and the Egyptian hieroglyphs, generally considered the earliest writing systems, both emerge out of their ancestral proto-literate symbol systems, with the first coherent texts dating from about 2600 BCE. The Chinese script is considered to have developed independently of these Middle Eastern scripts, around 1600 BCE.
It is generally accepted that the first true alphabetic writing appeared in the Middle Bronze Age (2000–1500 BCE), as a representation of language developed by Semitic workers in central Egypt, probably around 2000 BCE.¹ Over the next five centuries it spread north, and all subsequent alphabets around the world have either descended from it, many via the Phoenician alphabet, or were directly inspired by its design.
¹History of the alphabet, see http://en.wikipedia.org/wiki/History_of_the_alphabet.
Source: http://en.wikipedia.org/wiki/Image:WritingSystemsoftheWorld4.png
Figure 3.1: Writing systems used in the world today
3.1.3 Types of writing systems
Figure 3.1 shows the writing systems used in the world today and their types.
The oldest-known forms of writing were mainly of the logographic type, i.e., they used a single grapheme to represent a morpheme, the atomic unit of meaning in a language.¹ Such forms combined pictographic (a symbol representing a concept, object, activity, place, event, etc. by a drawing) and ideographic (a symbol representing an idea) elements.
Most writing systems can be broadly divided into three categories, namely logographic, syllabic, and alphabetic, although a given writing system can contain two, or all three, in which case one often talks of a complex system.
In more detail, the following types of writing systems can be distinguished:
logographic: a symbol represents a morpheme (e.g., Chinese characters);
syllabic: a symbol represents a syllable (e.g., Japanese kana);
alphabetic: a symbol represents a phoneme, consonant or vowel (e.g., the Latin alphabet);
abugida: a symbol represents a consonant with an inherent vowel (e.g., Indian Devanagari);
abjad: a symbol represents a consonant (e.g., the Arabic alphabet);
featural: a symbol represents a phonetic feature (e.g., Korean hangul).
3.1.3.1 Logographic writing systems
A logogram (see http://en.wikipedia.org/wiki/Logogram) is a single written character which represents a complete grammatical word or morpheme. Thus, many logograms are required to write all the words of a language. The vast array of logograms and the memorization of what they mean are the major disadvantage of logographic systems compared with alphabetic systems. On the other hand, since the meaning is inherent to the symbol, the same logographic system can theoretically be used to represent different languages. In practice, this is only true for closely related languages, like the various dialects of the Chinese language. Speakers of the dialects of the various provinces of China will understand the characters of a given Chinese text but pronounce them in quite different, and sometimes mutually unintelligible, ways. Furthermore, Japanese uses Chinese logograms extensively in its writing system, with most of the symbols carrying the same or similar meanings. However, the semantics, and especially the grammar, are different enough that a long Chinese text is not readily understandable to a Japanese reader without any knowledge of basic Chinese grammar, though short and concise phrases such as those on signs and newspaper headlines are much easier to comprehend.
¹See http://en.wikipedia.org/wiki/Morpheme.
While most languages do not use wholly logographic writing systems, many languages use some logograms. A good example of modern Western logograms are the Hindu-Arabic numerals: everyone who uses those symbols understands what 1 means, whether the symbol is pronounced one, un, eins, yi, odin, ichi, or ehad. Other Western logograms include the ampersand & (used for and), the at sign @ (with its many semantic uses), the % sign (for percent), and many currency symbols ($, €, £, ¥, etc.).
Logograms are sometimes called ideograms, symbols which graphically represent abstract ideas, but this use is somewhat inappropriate for Chinese characters since they often consist of semantic–phonetic compounds, i.e., they include an element that represents the meaning and another that represents the pronunciation.
Today the only important surviving logographic writing system is the Chinese one, whose characters are or were used, with varying degrees of modification, in Chinese, Japanese, Korean, Vietnamese, and other East Asian languages. Ancient Egyptian hieroglyphs and the Mayan writing system are also systems with certain logographic features, although they have marked phonetic features as well, and they are no longer in current use.
3.1.3.2 Syllabic writing systems
A syllabary (see also http://en.wikipedia.org/wiki/Syllabary) is a set of written symbols that represent (or approximate) syllables, which make up words. A symbol in a syllabary typically represents a consonant sound followed by a vowel sound, or just a vowel alone. In a true syllabary there is no systematic graphic similarity between phonetically related characters.¹
Syllabaries are best suited to languages with a relatively simple syllable structure, such as Japanese, where the number of possible syllables is no more than about fifty to sixty. In contrast, English would need many thousands of symbols to represent all its possible syllables. The Japanese language uses Chinese characters (kanji), as well as two syllabaries together called kana, namely hiragana and katakana (developed around 700 CE). The kana are mainly used to write some native words and grammatical elements, as well as foreign words (see Japanese writing system, http://en.wikipedia.org/wiki/Japanese_writing_system).
Languages that use syllabic writing include Mycenaean Greek (Linear B), the Native American language Cherokee, the African language Vai, the English-based creole language Ndyuka (the Afaka script), the Yi language in China, the Nü Shu syllabary of the Yao people in China, and the ancient Filipino script Alibata. The Chinese, Cuneiform, and Maya scripts are largely syllabic in nature, although based on logograms; they are therefore sometimes referred to as logosyllabic.
3.1.3.3 Alphabetic writing systems
An alphabet (see http://en.wikipedia.org/wiki/Alphabet) is a small set of letters, basic written symbols each of which roughly represents a phoneme of a spoken language (either as it is currently pronounced or as it was pronounced in the past).
¹Some syllabaries do exhibit a graphic similarity for the vowels, but in a true syllabary such as hiragana, the characters for ke, ka, and ko show no graphical similarity to indicate their common k phonetic element. This is in contrast to an abugida, where each grapheme typically represents a syllable but characters representing related sounds are graphically similar, i.e., a common consonantal base is annotated in a more or less consistent manner to represent the vowel in the syllable.
In a perfectly phonemic alphabet, the phonemes and letters would correspond perfectly in both directions: a writer could predict the spelling of a word given its pronunciation, and a speaker could predict the pronunciation of a word given its spelling. Examples of languages whose alphabets come close to this ideal are Serbo-Croatian and Finnish, and these have much lower barriers to literacy than languages such as English, whose very complex and irregular spelling has hardly evolved for many centuries, whereas the spoken language has changed considerably. Moreover, since writing systems have been borrowed for languages they were not designed for, the degree to which the letters of an alphabet correspond to the phonemes of a language varies greatly from one language to another, and even within a single language. Although possible, using a truly phonetic alphabet (e.g., the International Phonetic Alphabet (IPA), see http://en.wikipedia.org/wiki/International_Phonetic_Alphabet) for a natural spoken language would be very cumbersome, as it would have to record a huge variety of phonetic variation.
3.1.3.4 Abjads
The first type of alphabet that was developed was the abjad, an alphabetic writing system which uses one symbol per consonant, vowels usually not being marked (see http://en.wikipedia.org/wiki/Abjad).
Almost all abjad scripts are used for Semitic languages and the related Berber languages, which have a morphemic structure that makes the denotation of vowels redundant in most cases.
Some abjads (e.g., Arabic and Hebrew) have markings for vowels as well (in this case they are called impure abjads), although they mostly use them only in special contexts, such as for teaching. On the other hand, when an abjad script was adapted to a non-Semitic language, the derived abjad was extended with vowel symbols to become a full alphabet, the most famous case being the derivation of the Greek alphabet from the Phoenician abjad.
3.1.3.5 Abugida
An abugida (see http://en.wikipedia.org/wiki/Abugida) is an alphabetic writing system in which each letter (basic character) represents a consonant accompanied by a specific vowel; other vowels are indicated by modification of the consonant sign, either by means of diacritics or through a change in the form of the consonant. In some abugidas, the absence of a vowel is indicated overtly. About half the writing systems in the world, including the various scripts used for most Indo-Aryan languages, are abugidas.
For instance, in an abugida there is no sign for k, but instead one for ka, a being the inherent vowel. The syllable ke is written by modifying the ka sign in a way that is consistent with how one would modify la to get le. In many abugidas the modification is the addition of a vowel sign, but other possibilities are imaginable (and used), such as rotation of the basic sign, addition of diacritical marks, and so on (examples for three Indic scripts are shown in Table 3.1). More information on Indic languages can be found at http://www.unicode.org/notes/tn10/indic-overview.pdf.
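In Unicode, an abugida syllable is stored in logical order, as the consonant followed by a dependent vowel sign, and the font's shaping rules decide whether the sign is drawn above, below, to the left, or to the right of the consonant. The sketch below makes this visible for the Devanagari syllables of Table 3.1 by entering each code point with XeTeX's ^^^^ notation (four lowercase hexadecimal digits). It assumes an installed OpenType Devanagari font; Lohit Devanagari is only an example choice.

```latex
\usepackage{fontspec}
% Assumption: Lohit Devanagari is installed; any OpenType Devanagari
% font should do. Script=Devanagari selects the OpenType shaping rules.
\newfontfamily\deva[Script=Devanagari]{Lohit Devanagari}
% Each syllable is typed as consonant + dependent vowel sign, in
% logical order, even when the sign is drawn before the consonant.
{\deva ^^^^0915}          % KA, inherent vowel: ka
{\deva ^^^^0915^^^^0947}  % KA + VOWEL SIGN E: ke, sign above
{\deva ^^^^0915^^^^0941}  % KA + VOWEL SIGN U: ku, sign below
{\deva ^^^^0915^^^^093f}  % KA + VOWEL SIGN I: ki, sign drawn left
{\deva ^^^^0915^^^^0940}  % KA + VOWEL SIGN II: kii, sign right
```

For ki the input order (KA first) differs from the visual order (vowel sign first); reordering the glyphs is the job of the shaping engine, not of the author.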
3.1.3.6 Featural writing systems
A featural script represents finer detail than an alphabet. Here symbols do not represent whole phonemes, but rather the elements (features) that make up the phonemes, such as voicing or place of articulation. The only prominent example is Korean Hangul, where the featural symbols are combined into alphabetic letters, and these letters are in turn joined into syllabic blocks, so that the system combines three levels of phonological representation.
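Table 3.1: Indic consonant–vowel combinations in various Indic abugidas
position | syllable | pronunciation | derived from | script
above | के | /ke/ | क /k(a)/ | Devanagari
below | कु | /ku/ | क /k(a)/ | Devanagari
left | कि | /ki/ | क /k(a)/ | Devanagari
right | की | /kī/ | क /k(a)/ | Devanagari
around | கௌ | /kau/ | க /ka/ | Tamil
within | ಕಿ | /ki/ | ಕ /ka/ | Kannada
Exa.
3-1-1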
3.1.4 Language Resources
http://www.geonames.de/
This website provides a treasure trove of data in many languages and scripts. It provides tables with the countries of the world in their own languages and scripts, with official names, capitals, flags, coats of arms, administrative divisions, national anthems, and translations of the names of the countries and capitals. Also available are translations of the names of the days, months, planets, geographical names (rivers, mountains, etc.), chemical elements, religions, numbers, and an extended glossary with several hundred words translated into languages classified by family.
http://www.lexilogos.com/
Information (in French) on many languages, with examples of phrases.
3.1.5 Freely available Unicode encoded fonts
The site Wazu Japan's Gallery of Unicode Fonts (http://www.wazu.jp/) was created by David McCreedy and Mimi Weiss. Currently the site is maintained by Wazu Japan. The site displays samples of available Unicode fonts ordered by writing system (roughly speaking, by Unicode range). Luc Devroye's web site (http://cg.scs.carleton.ca/~luc/fonts.html) also has a long list of free and shareware fonts classified by language.
3.1.6 Directionality
Different scripts are written in different directions. The early alphabet could be written in any direction: either horizontally (left-to-right or right-to-left) or vertically (up or down). It could also be written boustrophedon: starting horizontally in one direction, then turning at the end of the line and reversing direction. Egyptian hieroglyphic writing is one such flexible script: the beginning of a horizontally written line is indicated by the direction in which the animal and human ideograms are looking.
The Greek alphabet and its successors settled on a left-to-right pattern, from the top to the bottom of the page. Other scripts, such as Arabic and Hebrew, came to be written right-to-left. Scripts that incorporate Chinese characters have traditionally been written vertically (top-to-bottom), from the right to the left of the page, but nowadays are frequently written left-to-right, top-to-bottom, due to Western influences, a growing need to accommodate terms in the Roman alphabet, and technical limitations in popular electronic document formats. The Mongolian script is unusual in being written top-to-bottom, left-to-right; this direction originated from an ancestral Semitic direction by rotating the page 90° counter-clockwise to conform to the appearance of Chinese writing. Scripts with lines written away from the writer, from bottom to top, also exist, such as several used in the Philippines and Indonesia.
3.1.7 Writing systems on computers
Different ISO/IEC standards have been defined to deal with individual writing systems and implement them on computers (or in electronic form). Today most of those standards have been subsumed by a single collective standard, ISO/IEC 10646, whose character repertoire is kept synchronized with the Unicode Standard. In Unicode, each character, in every language's writing system, is in principle given a unique identification number, known as its code point. The computer's software uses the code point to look up the appropriate glyph in the font file, so that the character can be displayed on the page or screen.
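To make the notion of a code point concrete, the sketch below requests characters by number in XeTeX: the \char primitive takes the code point as a number, and XeTeX's ^^^^ notation (followed by four lowercase hexadecimal digits) inserts the same character at input level. It assumes that DejaVu Sans, or some other font covering Hebrew and Arabic, is installed; with UTF-8 input one would normally just type the characters.

```latex
\usepackage{fontspec}
% Assumption: DejaVu Sans is installed; any font that covers
% Hebrew and Arabic will do.
\setmainfont{DejaVu Sans}
Hebrew alef, U+05D0, by number: \char"05D0 \\
The same code point in hat notation: ^^^^05d0 \\
Arabic alef, U+0627, by number: \char"0627
```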
3.2 Bidirectional typesetting
Vafa Khalighi's (vafa@users.berlios.de) bidi package provides a convenient interface for typesetting bidirectional texts with XeLaTeX.¹
This section is intended for people who use bidi directly, people who use other packages that depend on bidi, and developers of packages that depend on bidi.
bidi modifies many LaTeX classes and packages so that you can use them for bidirectional typesetting. bidi currently supports the standard LaTeX kernel, the amsart, amsbook, article, bidibeamer (modified version of the beamer class), bidimemoir (modified version of the memoir class), bidimoderncv (modified version of the moderncv class), bidipresentation, book, bookest, extbook, rapport3, refrep, report, scrartcl, scrbook, and scrreprt classes, and the amsthm, array, booktabs, beamerthemebidiJLTree (modified version of the beamerthemeJLTree package), bidi2in1, bidibeamerbaseauxtemplates (modified version of the beamerbaseauxtemplates package), bidibeamerbasetemplates (modified version of the beamerbasetemplates package), cvthemebidicasual (modified version of the cvthemecasual package), cvthemebidiclassic (modified version of the cvthemeclassic package), dcolumn, draftwatermark, fancyhdr, graphicx, hhline, listings, longtable, minitoc, multirow, pdfpages, pstricks, ragged2e, stabular, supertabular, tabls, tabularx, tabulary, threeparttable, tikz, tocloft, tocstyle, and wrapfig packages. Anything else is not supported yet, but this does not mean that it will not work with bidi; feel free to experiment with other packages and classes, but note that you are then on your own. More classes and packages will be supported in future versions of the bidi package.
3.2.1 Using The bidi Package
You can use the package by simply putting \usepackage{bidi} in the preamble of your document. When using bidi, the following points should be noted.
1. The bidi package automatically loads the amsmath package, so you do not need to load it yourself.
2. The bidi package should be the last package that you load in the preamble of your document. This is because bidi modifies many commands defined in other LaTeX packages so that they can be used for bidirectional typesetting. If you do not load bidi as your last package, its definitions would be overwritten and consequently you would not get the result you expect.
¹In fact, bidi can be used with any e-TeX-based engine, notably PDFLaTeX.
3. There is an exception to the above rule: you should always load the xunicode package after bidi.¹ If you forget to follow this rule, you will get an error message that looks like this:
! Package bidi Error: Oops! you have loaded package xunicode before
bidi package. Please load package xunicode after bidi package, and
then try to run xelatex on your document again.
See the bidi package documentation for explanation.
Type H <return> for immediate help.
...
l.4 \begin{document}
?
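A preamble that respects the three rules above might look like the following sketch, in which graphicx merely stands in for "any other package". One assumption is flagged in the comments: the fontspec version in use must not load xunicode itself, since that would put xunicode before bidi and trigger the error shown above.

```latex
\documentclass{article}
\usepackage{fontspec}  % assumption: this version does not load xunicode
\usepackage{graphicx}  % stands in for any other package
\usepackage{bidi}      % loads amsmath; otherwise the last package
\usepackage{xunicode}  % the one exception: must follow bidi
\begin{document}
\setRTL
This paragraph is typeset from right to left.
\end{document}
```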
3.2.1.1 Package options
There are two options, RTLdocument and rldocument, which are essentially equivalent. They are intended mainly for RTL typesetting with some LTR typesetting, and they automatically activate \setRTL, \RTLdblcol, and \autofootnoterule, which are explained later.
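A minimal sketch of the option in use: following the description above, the text starts in RTL mode without an explicit \setRTL, and an LTR passage is marked with the LTR environment of Section 3.2.2.2. The sample sentences are illustrative only.

```latex
\usepackage[RTLdocument]{bidi}
% No \setRTL needed: the option makes RTL the default direction.
This paragraph is typeset from right to left.

\begin{LTR}
This embedded paragraph is typeset from left to right.
\end{LTR}
```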
3.2.2 Basic Direction Switching
bidi provides several commands and environments for direction switching.
3.2.2.1 Commands for direction switching
\setRTL \setRL \unsetLTR
\setLTR \setLR \unsetRTL \unsetRL
The commands in the first row switch to RTL typesetting and the commands in the second row switch to LTR typesetting.
tesepyt si hcihw hpargarap LTR na si sihT
.tfel ot thgir morf
And this is an LTR paragraph which is type-
set from left to right. Note the blank line that
we put before changing the direction of type-
setting.
\usepackage{bidi}
\setRTL
This is an RTL paragraph which is
typeset from right to left.

\setLTR
And this is an LTR paragraph which
is typeset from left to right. Note the
blank line that we put before changing
the direction of typesetting.
Exa.
3-2-1
¹This is because amsmath should be loaded before the xunicode package, and bidi already loads amsmath. Hopefully this will change in future versions of the bidi package.
3.2.2.2 Environments for direction switching
\begin{RTL} \end{RTL}
\begin{LTR} \end{LTR}
The first environment gives you RTL typesetting and the second environment gives you LTR typesetting.
Exa.
3-2-2 tesepyt si hcihw hpargarap LTR na si sihT
.tfel ot thgir morf
This is an LTR paragraph inside an RTL para-
graph.
ecno edom LTR ni gnittesepyt era ew ereH
.erom
\usepackage{bidi}
\begin{RTL}
This is an RTL paragraph which is
typeset from right to left.
\begin{LTR}
This is an LTR paragraph inside
an RTL paragraph.
\end{LTR}
Here we are typesetting in
RTL mode once more.
\end{RTL}
3.2.3 Typesetting Short RTL and LTR texts
\RLE{} \RL{}
\LRE{} \LR{}
The commands in the first row allow you to typeset a short piece of text from right to left, and the commands in the second row allow you to typeset a short piece of text from left to right.
\usepackage{bidi}
\setRTL
This is an RTL paragraph and \LRE{these words} appeared LTR.

\setLTR
This is an LTR paragraph and \RL{these words sentence} appeared RTL.
Exa.
3-2-3 .RTL deraeppa these words dna hpargarap LTR na si sihT
This is an LTR paragraph and ecnetnes sdrow eseht appeared RTL.
3.2.4 Multicolumn Typesetting
3.2.4.1 Two column typesetting
\RTLdblcol \LTRdblcol
\RTLdblcol allows you to have RTL two-column typesetting and \LTRdblcol allows you to have LTR two-column typesetting, much as with the corresponding options of the class file.
3.2.4.2 Multicolumn typesetting
For RTL multicolumn typesetting, you can use the fmultico package, which has the same syntax as the multicol package.
\usepackage{bidi,fmultico}
\setRTL
\begin{multicols}{3}
EETS was founded in 1864 by Frederick James Furnivall, with the help
of Richard Morris, Walter Skeat, and others, to bring the mass of
unprinted Early English literature within the reach of students. It
was also intended to provide accurate texts from which the New (later
Oxford) English Dictionary could quote; the ongoing work on the
revision of that Dictionary is still heavily dependent on the
Society's editions, as are the Middle English Dictionary and the
Toronto Dictionary of Old English.
\end{multicols}
-vaeh llits si yranoitciD taht
-icoS eht no tnedneped yli
-diM eht era sa ,snoitide syte
eht dna yranoitciD hsilgnE eld
-nE dlO fo yranoitciD otnoroT
.hsilg
saw tI .stneduts fo hcaer eht
-ucca edivorp ot dednetni osla
weN eht hcihw morf stxet etar
-oitciD hsilgnE )drofxO retal(
-ogno eht ;etouq dluoc yran
fo noisiver eht no krow gni
4681 ni dednuof saw STEE
,llavinruF semaJ kcirederF yb
-roM drahciR fo pleh eht htiw
,srehto dna ,taekS retlaW ,sir
detnirpnu fo ssam eht gnirb ot
nihtiw erutaretil hsilgnE ylraE
Exa.
3-2-4
You can also use the vwcol package for RTL multicolumn typesetting.
\usepackage{bidi,vwcol}
\setRTL
\begin{vwcol}[widths={0.3,0.2,0.5},rule=2pt]
EETS was founded in 1864 by Frederick James Furnivall, with the help
of Richard Morris, Walter Skeat, and others, to bring the mass of
unprinted Early English literature within the reach of students. It
was also intended to provide accurate texts from which the New (later
Oxford) English Dictionary could quote; the ongoing work on the
revision of that Dictionary is still heavily dependent on the
Society's editions, as are the Middle English Dictionary and the
Toronto Dictionary of Old English.
\end{vwcol}
hsilgnE )drofxO retal( weN eht hcihw morf
no krow gniogno eht ;etouq dluoc yranoitciD
-vaeh llits si yranoitciD taht fo noisiver eht
sa ,snoitide syteicoS eht no tnedneped yli
eht dna yranoitciD hsilgnE elddiM eht era
.hsilgnE dlO fo yranoitciD otnoroT
-til hsilgnE ylraE
eht nihtiw erutare
.stneduts fo hcaer
-ni osla saw tI
edivorp ot dednet
stxet etarucca
ni dednuof saw STEE
semaJ kcirederF yb 4681
pleh eht htiw ,llavinruF
retlaW ,sirroM drahciR fo
gnirb ot ,srehto dna ,taekS
detnirpnu fo ssam eht
Exa.
3-2-5
3.2.5 More peculiarities for RTL typesetting
3.2.5.1 Handling color
Due to XeTeX's limitations in handling colors, you cannot use the color and xcolor packages for generating RTL colored text. Instead you should use the xecolour package.
3.2.5.2 RTL cases
\rcases is defined in bidi for typesetting RTL cases.
Exa.
3-2-6
nem
nemow
}
sgnieB namuH
\usepackage{bidi}
\setRTL
\[\rcases{\text{men}\cr\text{women}}
\text{Human Beings}
\]
3.2.5.3 Footnotes
\footnote{} \LTRfootnote{} \RTLfootnote{}
\setfootnoteRL \setfootnoteLR \unsetfootnoteRL
\footnote in RTL mode produces an RTL footnote, while in LTR mode it produces an LTR footnote.
\LTRfootnote always produces an LTR footnote, independently of the current mode.
\RTLfootnote always produces an RTL footnote, independently of the current mode.
Specifying a \setfootnoteRL command anywhere makes \footnote produce an RTL footnote.
Specifying either a \setfootnoteLR or an \unsetfootnoteRL command anywhere makes \footnote produce an LTR footnote.
The behavior of footnote rules can also be controlled.
\autofootnoterule \rightfootnoterule
\leftfootnoterule \textwidthfootnoterule
\rightfootnoterule puts the footnote rule on the right-hand side.
\leftfootnoterule puts the footnote rule on the left-hand side.
\textwidthfootnoterule draws the footnote rule with a width equal to \textwidth.
\autofootnoterule draws the footnote rule right- or left-aligned, depending on the direction of the first footnote following the rule (i.e., the first footnote on the current page).
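The following sketch combines several of these commands; it illustrates the behavior described above and is not an example taken from the bidi documentation. The first paragraph has one footnote that follows the current (RTL) mode and one forced to LTR; after \setfootnoteRL, a plain \footnote in an LTR paragraph is still set RTL.

```latex
\usepackage{bidi}
\setRTL
\autofootnoterule
This RTL paragraph has a footnote that follows the current
mode\footnote{An RTL footnote.} and one that is always
left to right\LTRfootnote{An LTR footnote in any mode.}.

\setLTR
\setfootnoteRL
This LTR paragraph still gets an RTL
footnote\footnote{RTL because of the earlier switch.}.
```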
3.2.6 Tabular material in RTL mode
You can typeset any tabular material in RTL mode, as seen below.
C11C12 C13C14 C15C16
C21 C22 C23 C24 C25 C26
C31 C32 C33 C34 C35 C36
C41C44 C45C46
61C51C 41C31C 21C11C
62C 52C 42C 32C 22C 12C
63C 53C 43C 33C 23C 13C
64C54C 44C14C
\usepackage{bidi}
\providecommand\Mytable{%
\begin{tabular}{|l|c|r|r|c|l|}\hline
\multicolumn{2}{|l|}{C11--C12}
& \multicolumn{2}{c|}{C13--C14}
& \multicolumn{2}{r|}{C15--C16}\\\hline
C21 & C22 & C23 & C24 & C25 & C26\\
\cline{2-2}\cline{4-4}\cline{6-6}
C31 & C32 & C33 & C34 & C35 & C36\\
\cline{1-1}\cline{3-3}\cline{5-5}
\multicolumn{4}{|l|}{C41--C44} &
\multicolumn{2}{|r|}{C45--C46}\\
\hline\hline
\end{tabular}}
\Mytable\\[1ex]
\setRTL
\Mytable
Exa.
3-2-7
By comparing the top table (typeset in LTR mode) with the bottom one (typeset in RTL mode), we see that in RTL mode the columns are indeed typeset from right to left, i.e., the leftmost column becomes the rightmost, etc. This behavior includes the numbering of the columns, as used in the \cline command: in RTL mode, e.g., \cline{2-2} refers to the second column from the right. Note that the alignment indicators (l and r) in the arguments of \begin{tabular} and \multicolumn play their usual role of aligning the material flush left and flush right, respectively. A more complex example is the following.
\usepackage{bidi}
\newcommand{\rb}[1]{\raisebox{1.5ex}[0mm]{#1}}
\setRTL
\begin{tabular}{|r||c|r|c|r|c|r|}
\hline
& \multicolumn{2}{c|}{6.15--7.15 pm} & \multicolumn{2}{c|}{7.20--8.20 pm}
& \multicolumn{2}{c|}{8.30--9.30 pm} \\ \cline{2-7}
&& Teacher && Teacher && Teacher \\ \cline{3-3}\cline{5-5}\cline{7-7}
\rb{Day} & \rb{Subj.} & Room & \rb{Subj.} & Room & \rb{Subj.} & Room\\
\hline\hline
&& Dr.~Smith && Ms.~Clark && Mr.~Mills\\
\cline{3-3}\cline{5-5}\cline{7-7}
\rb{Mon.} & \rb{UNIX} & Comp. Ctr & \rb{Fortran} & Hall A
& \rb{Math.} & Hall A \\ \hline
&& Miss Baker && Ms.~Clark && Mr.~Mills\\
\cline{3-3}\cline{5-5}\cline{7-7}
\rb{Tues.} & \rb{\LaTeX} & Conf.~Room & \rb{Fortran} & Conf.~Room
& \rb{Math.} & Hall A \\ \hline
&& Dr.~Smith && Dr.~Jones && Dr.~Jones \\
\cline{3-3}\cline{5-5}\cline{7-7}
\rb{Wed.} & \rb{UNIX} & Comp. Ctr & \rb{C} & Hall A
& \rb{ComSci.} & Hall A \\ \hline
&& Miss Baker && Ms.~Clark & \multicolumn{2}{c|}{} \\
\cline{3-3}\cline{5-5}
\rb{Fri.} & \rb{\LaTeX} & Conf.~Room & \rb{C++} & Conf.~Room
& \multicolumn{2}{c|}{\rb{canceled}}\\ \hline
\end{tabular}
Exa.
3-2-8
[Typeset output: the timetable in RTL mode, with the Day column at the right and all other columns mirrored from right to left.]
You can get an idea of the many additional features available in the bidi package by looking at the examples accompanying it.
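The column-numbering rule for \cline can be checked with a minimal table. In this sketch (the \Demo macro is hypothetical, not one of the bidi examples), \cline{1-1} underlines the first column, which is the leftmost in LTR mode and, per the rule described above, the rightmost in RTL mode.

```latex
\documentclass{article}
\usepackage{fontspec}
\usepackage{bidi}
\newcommand\Demo{%
  \begin{tabular}{|l|c|r|}\hline
  A & B & C\\ \cline{1-1}% partial rule under the FIRST column only
  D & E & F\\ \hline
  \end{tabular}}
\begin{document}
\Demo % LTR: column A is leftmost, so the partial rule is at the left

\setRTL
\Demo % RTL: column A is now rightmost, and so is the partial rule
\end{document}
```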
3.3 Languages using the Arabic alphabet
The Arabic alphabet (see http://en.wikipedia.org/wiki/Arabic_alphabet) is, after the Latin alphabet, the second most widely used alphabet in the world. The alphabet was first used to write texts in Arabic, in particular the Qur'an, the holy book of Islam. With the spread of Islam, it came to be used to write many other languages, such as Persian, Urdu, Pashto, Baloch, Malay, Balti, Brahui, Panjabi (in Pakistan), Kashmiri, Sindhi (in Pakistan), Uyghur (in China), Kazakh (in China), Kyrgyz (in China), Azerbaijani (in Iran) and Kurdish (in Iraq and Iran). In order to accommodate the needs of these (often non-Semitic) languages, new letters and other symbols were added to the original alphabet.
Arabic is written from right to left, in a cursive style. There are 28 basic letters in the Arabic alphabet. Just as the Roman alphabet has a rich set of typefaces, Arabic scripts [3] come in a number of different calligraphic styles (see Figure 3.2 for a few examples).
In the Arabic alphabet there are no distinct upper- and lowercase letter forms. Both printed and written Arabic are cursive, with most of the letters directly connected to the letter that immediately follows. Some non-connecting letters, however, do not connect with the following letter, even in the middle of a word. Each individual letter can have up to four distinct forms, depending on the position of the letter within a word or group of letters, as follows:
Initial: at the beginning of a word; or in the middle of a word, following a non-connecting letter.
Medial: between two connecting letters (non-connecting letters lack a medial form).
Final: at the end of a word, following a connecting letter.
Isolated: at the end of a word, following a non-connecting letter; or used independently.
Some letters look almost the same in all four forms, while others display more variety. In addition, some letter combinations are written as ligatures (special shapes), including lām-alif. In many cases, dots are placed above or below the central part of a letter to distinguish it from other, similar letters.
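The four positional forms can be made visible by forcing a letter's joining context with the zero width joiner U+200D, the character also used by the \hamzaB macro in the Qur'an example of Section 3.3.2. This sketch assumes the arabxetex package and the Scheherazade font described there; the exact glyph shapes depend on the font.

```latex
\documentclass{article}
\usepackage{fontspec}
\usepackage{arabxetex}
\newfontfamily\arabicfont[Script=Arabic,Scale=2]{Scheherazade}
\begin{document}
\begin{arab}[utf]
% beh (U+0628); a U+200D on one side makes the font treat that side as joined
\char"0628 \quad                      % isolated
\char"0628\char"200D \quad            % initial: joins to the left only
\char"200D\char"0628\char"200D \quad  % medial: joins on both sides
\char"200D\char"0628                  % final: joins to the right only
\end{arab}
\end{document}
```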
The Arabic alphabet is an impure abjad, since short vowels are not written, but long ones are. Therefore the reader must know the language in order to restore the vowels. However, in editions of the Qur'an or in didactic works vocalization marks are used, including a sign for vowel omission (sukūn) and one for gemination (doubling or lengthening) of consonants (šadda).
Different styles of the phrase "In the name of God" (top to bottom):
Ruqʿah or Riqʿa is characterized by clipped letters composed of short straight lines and simple curves, as well as by its straight and even lines of text. It is clear and legible and is the easiest script for daily handwriting. It is used in the titles of books and magazines, and in commercial advertisements.
Naskh, Naskhi or Nesih is the most commonly used style for printing Arabic, and usually the first to be taught to children.
Nastaʿlīq or Nastaleeq is one of the main genres of Islamic calligraphy. It has short verticals with no serifs, and long horizontal strokes. It is only used for titles and headings when writing Arabic, but a somewhat less elaborate version serves as the preferred style for writing Persian, Pashto and Urdu (and formerly Ottoman Turkish).
Thuluth is characterized by curved and oblique lines, with one-third of each letter sloping. It is a large and elegant cursive script, used in medieval times for mosque decorations and to write the headings of surahs (chapters of the Qur'an).
Muhaqqaq or Muhakkak is a now rarely used calligraphic script in Arabic, derived from Thuluth by widening the horizontal sections of the letters.
Kufi or Kufic is the oldest calligraphic form of the various Arabic scripts. It was already in use at the time of the emergence of Islam, so that the first copies of the Qur'an were written in this script. Kufic (the example shows Square Kufic) is characterized by straight lines and angles, often with elongated verticals and horizontals. Source: www.islamicarchitecture.org/art/images/calligraphy/
Figure 3.2: Examples of six Arabic calligraphic styles
3.3.1 ArabTeX: Arabic typography with TeX
Since 1992, when Klaus Lagally publicly released Version 2 of his arabtex package,¹ TeX users have been able to typeset Arabic (and Hebrew) texts in a user-friendly way, and for many years ArabTeX has been a standard typesetting tool for many Arabists. However, Lagally's masterful but extremely complex, difficult to understand and monolithic set of TeX macros makes it at present a somewhat outdated piece of software. ArabTeX performs all typesetting tasks with TeX macros: parsing the input encoding, doing the contextual analysis, assembling the various forms of a character, and placing them on the page from right to left. Moreover, ArabTeX can only be used with its specially designed fonts.
Today, with the advent of Unicode-encoded OpenType fonts, many of the formatting issues are encoded in the OpenType fonts and taken care of by the operating system. Therefore, a Unicode-based solution taking full advantage of the many nice Arabic OpenType fonts is highly desirable. The ArabXeTeX system, described in Section 3.3.2, is one way of solving the problem, while Youssef Jabri's arabi package [2] (available on CTAN in the directory /language/arabic/arabi/) provides another.
¹The URL ftp://ftp.informatik.uni-stuttgart.de/pub/arabtex/arabtex.htm gives information about the most recent version of the software (3.11, dated 2 July 2006, at the time of writing).
Table 3.2: ArabTeX's input conventions for Arabic and Persian
Exa. 3-3-1
Input  Letter          Input  Letter          Input  Letter
A      alif            s      sīn             k      kāf
b      bāʾ             ^s     šīn             g      gāf
p      pāʾ             .s     ṣād             l      lām
t      tāʾ             .d     ḍād             m      mīm
_t     ṯāʾ             .t     ṭāʾ             n      nūn
^g     ǧīm             .z     ẓāʾ             h      hāʾ
.h     ḥāʾ             `      ʿayn            w      wāw
_h     ḫāʾ             .g     ġayn            y      yāʾ
d      dāl             f      fāʾ             _A     alif maqṣūra
_d     ḏāl             q      qāf             T      tāʾ marbūṭa
r      rāʾ             v      vāʾ
z      zāy
Table 3.2 shows ArabTeX's input conventions for the Arabic and Persian languages.
A small example of the use of ArabTeX is the following Arabic anecdote about Juha and the 10 donkeys (we will also use the text of Example 3-3-2 in the ArabXeTeX examples). The text is shown fully vocalized (\fullvocalize) and is transliterated inline (\transtrue). The title is centered and typeset in bold (\setnashbf). The short Arabic text of the title is marked up inside the character sequence \< and >, while the longer Arabic text of the body of the story is enclosed inside an arabtext environment. Compare the typeset output with the input text, using the input conventions of Table 3.2. Note the different forms of the letters, which are all composed by ArabTeX's macros.
\usepackage{arabtex,atrans,nashbf}
\setarab\transtrue\fullvocalize
\setnashbf \centerline {\<^gu.hA wa-.hamIruhu al-`a^saraTu>}
\transtrue\setnash
\begin{arabtext}
i^starY ^gu.hA `a^saraTa .hamIriN.
fari.ha bihA wa-sAqahA 'amAmahu,
_tumma rakiba wA.hidaN minhA.
wa-fI al-.t.tarIqi `adda .hamIrahu wa-huwa rAkibuN,
fa-wa^gadahA tis`aTaN.
_tumma nazala wa-`addahA fa-ra'AhA `a^saraTuN fa-qAla:
'am^sI wa-'aksibu .himAraN,
'af.dalu min 'an 'arkaba wa-'a_hsara .himAraN.
\end{arabtext}
[Typeset output: the title and the story in fully vocalized Arabic script, followed by the inline transliteration:
ǧuḥā wa-ḥamīruhu l-ʿašaratu
ištarā ǧuḥā ʿašarata ḥamīrin. fariḥa bihā wa-sāqahā ʾamāmahu, ṯumma rakiba wāḥidan minhā. wa-fī ṭ-ṭarīqi ʿadda ḥamīrahu wa-huwa rākibun, fa-waǧadahā tisʿatan. ṯumma nazala wa-ʿaddahā fa-raʾāhā ʿašaratun fa-qāla: ʾamšī wa-ʾaksibu ḥimāran, ʾafḍalu min ʾan ʾarkaba wa-ʾaḫsara ḥimāran.]
Exa. 3-3-2
3.3.2 ArabXeTeX: Arabic typography with XeTeX
François Charette's arabxetex package is a XeLaTeX adaptation of Klaus Lagally's arabtex (see Section 3.3.1). The main advantage of the package is that it allows you to use all the OpenType Arabic fonts that you have available on your system. In particular, the package requires that you declare the default Arabic font, \arabicfont, with fontspec's \newfontfamily command.
The arabxetex package consists of a set of TECkit mappings (see Section 2.2.5) for converting internally from arabtex's ASCII input convention to Unicode, and a LaTeX style file (arabxetex.sty) that provides a convenient user interface for typesetting in those languages. With respect to arabtex's conventions, arabxetex introduces several additions and a few minor modifications (see the next section). arabxetex relies on the bidi package (see Section 3.2).
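The requirement above translates into a very small preamble. This is a sketch assuming the Scheherazade font is installed (any OpenType Arabic font should do); it shows both the \textarab command and the arab environment with arabtex's ASCII input.

```latex
\documentclass{article}
\usepackage{fontspec}
\usepackage{arabxetex}
% arabxetex needs the default Arabic font declared as \arabicfont
\newfontfamily\arabicfont[Script=Arabic]{Scheherazade}
\begin{document}
The Arabic word for ``book'' is \textarab{kitAb}. % short inline insertion

\begin{arab}[fullvoc]% longer passages, here fully vocalized
kitAbuN
\end{arab}
\end{document}
```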
The arabtex input encoding
Apart from ease and legibility, the arabtex input conventions offer several advantages for typesetting in the Arabic script. As the examples in this section will show, it is straightforward to mix Unicode and arabtex encodings on input, and to switch between romanized transliteration and the Arabic script on output. This comes in handy when one wants to input LaTeX constructs inside Arabic sources or to handle complex multi-layer documents, such as critical editions, where footnotes and annotations abound. There, dealing with a plain ASCII encoding is a genuine advantage, all the more so since ArabTeX's input conventions give you full control of the typographical details.
Support for languages using the Arabic script
The languages supported at present are the same as in arabtex, namely: Arabic, Maghribi Arabic, Farsi (Persian), Urdu, Sindhi, Kashmiri, Ottoman Turkish, Kurdish, Jawi (Malay) and Uighur. arabxetex adds support for several additional Unicode characters, so that some more languages are probably supported de facto as well (such as Western Punjabi).
For Arabic RL (right-to-left) texts, the arabxetex package defines the arab environment and the equivalent \textarab command for short Arabic insertions inside left-to-right texts.¹ For other languages written in the Arabic alphabet, similar environments and commands are available, as follows.
\begin{farsi}[opt]\end{farsi} \farsi[opt]{}
\begin{kashmiri}[opt]\end{kashmiri} \kashmiri[opt]{}
\begin{kurdish}[opt]\end{kurdish} \kurdish[opt]{}
\begin{malay}[opt]\end{malay} \malay[opt]{}
\begin{ottoman}[opt]\end{ottoman} \ottoman[opt]{}
\begin{pashto}[opt]\end{pashto} \pashto[opt]{}
\begin{sindhi}[opt]\end{sindhi} \sindhi[opt]{}
\begin{urdu}[opt]\end{urdu} \urdu[opt]{}
\begin{uighur}[opt]\end{uighur} \uighur[opt]{}
For some entries in this list alternative names exist, namely persian for farsi, turk for ottoman, and jawi for malay.
¹Similarly, for left-to-right Latin insertions inside Arabic text the \textroman command can be used.
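A sketch mixing several of these languages in one document. It declares \farsifont and \urdufont just as Examples 3-3-12 and 3-3-13 do; whether arabxetex falls back to \arabicfont when such a font is not declared is not stated here, so we declare them. The Persian and Urdu input is taken from those examples, and \textroman inserts Latin text inside the Arabic environment.

```latex
\documentclass{article}
\usepackage{fontspec}
\usepackage{arabxetex}
\newfontfamily\arabicfont[Script=Arabic]{Scheherazade}
\newfontfamily\farsifont[Script=Arabic]{Scheherazade}
\newfontfamily\urdufont[Script=Arabic]{Code2000}
\begin{document}
A Persian word inline: \farsi{ketAb}.

\begin{urdu}
,ham `i^sq kE mArO.n kA itnA ,hI fasAna,h ,hae
\end{urdu}

\begin{arab}
kitAb \textroman{(book)}% Latin insertion inside Arabic text
\end{arab}
\end{document}
```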
The optional argument opt of all of these commands or environments can take one or more of the following values. The equivalent command in ArabTeX, when it exists, is given in square brackets.
novoc non-vocalized mode: no diacritics are added (the default global option) [\novocalize].
fullvoc fully vocalized mode: every short vowel written generates the corresponding diacritical mark [\fullvocalize].
voc vocalized mode: as fullvoc, but sukūn and waṣla are not generated [\vocalize].
trans transliteration mode [\transtrue].
utf input in plain UTF-8 encoding. When not in transliteration mode, this option is in principle not strictly needed, since one can mix ArabTeX's ASCII input conventions and UTF-8 input.
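The same ASCII input looks quite different under the modes listed above. This sketch reuses the title of the Juha story (Example 3-3-2) and assumes the Scheherazade font; the last environment uses [utf] with Unicode Arabic input (the greeting as-salāmu ʿalaykum).

```latex
\documentclass{article}
\usepackage{fontspec}
\usepackage{arabxetex}
\newfontfamily\arabicfont[Script=Arabic]{Scheherazade}
\begin{document}
% one ASCII input, four of the modes listed above
\begin{arab}[novoc]   ^gu.hA wa-.hamIruhu \end{arab}\par
\begin{arab}[fullvoc] ^gu.hA wa-.hamIruhu \end{arab}\par
\begin{arab}[voc]     ^gu.hA wa-.hamIruhu \end{arab}\par
\begin{arab}[trans]   ^gu.hA wa-.hamIruhu \end{arab}\par
% utf: plain UTF-8 Arabic input
\begin{arab}[utf] السلام عليكم \end{arab}
\end{document}
```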
Transliteration
At present ArabXeTeX offers arabtex transliteration mappings for Arabic, Persian, Urdu, Sindhi and Pashto. Alternative transliteration conventions are planned for each language, as with arabtex, e.g., ZDMG, Encyclopaedia Iranica, etc. (a list of such schemes is at the URL http://transliteration.eki.ee/pdf/Arabic.pdf).
As with arabtex (see Example 3-3-2), the transliteration is typeset in italics by default. This can be customized with the \SetTranslitStyle command. In the transliteration one can capitalize proper names by prefixing the word with the command \UC, e.g.,
Exa. 3-3-3
[Typeset output: al-shaykh al-ʿālim Naṣīr al-Dīn al-Ṭūsī]
\usepackage{arabxetex}
\newfontfamily\arabicfont[Script=Arabic]{Scheherazade}
\newfontfamily\gentium{Gentium}
\SetTranslitStyle{\gentium\itshape}
\begin{arab}[trans]
al-^say_h al-`Alim \UC na.sIr \UC al-dIn \UC al-.tUsI
\end{arab}
Since the transliteration is coded in Unicode, we must ensure that all the needed Latin Extended characters are available in the font. Therefore we used the Gentium font in this example. Note also that \UC skips the article al- in the transliteration, so that the word following it is capitalized.
Emphasis
In Arabic, emphasis is often indicated with a line over the text to be highlighted. In ArabXeTeX this is achieved with the \aemph command. The following example shows how this works, first without vocalization and then with vocalization.
[Typeset output: the text set twice in Arabic script, first non-vocalized and then vocalized, with the number 45 overlined.]
\usepackage{arabxetex}
\newfontfamily\arabicfont[Script=Arabic,Scale=2.0]
{Scheherazade}
\begin{arab}[novoc]mi_tAl: \aemph{45} darajaT\end{arab}
\begin{arab}[voc] mi_tAl: \aemph{45} darajaT\end{arab}
Exa.
3-3-4
ArabTeX's four representation variants
The following somewhat longer example uses the same text as Example 3-3-2, but shows the four presentation variants introduced previously, one after the other. We use the Traditional Arabic font as the default Arabic font (\arabicfont command) and Gentium as the font for the non-Arabic text (with the \setmainfont command, which sets the main font of the document).
\usepackage[no-math]{fontspec}
\setmainfont{Gentium} \usepackage{arabxetex} \newfontfamily\arabicfont
[Script=Arabic,Scale=1.2]{Traditional Arabic}
% Story of Juha and the 10 donkeys
\begin{arab}% No short vowels shown
\begin{center}\bfseries\large ^gu.hA wa-.hamIruhu al-`a^saraTu\end{center}
i^starY ^gu.hA `a^saraTa .hamIriN.
fari.ha bihA wa-sAqahA 'amAmahu, _tumma rakiba wA.hidaN minhA.
wa-fI al-.t.tarIqi `adda .hamIrahu wa-huwa rAkibuN, fa-wa^gadahA tis`aTaN.
_tumma nazala wa-`addahA fa-ra'AhA `a^saraTuN fa-qAla: \\
'am^sI wa-'aksibu .himAraN, 'af.dalu min 'an 'arkaba wa-'a_hsara .himAraN.
\end{arab}
\begin{arab}[fullvoc]% All short vowels shown
\begin{center}\bfseries\large ^gu.hA wa-.hamIruhu al-`a^saraTu\end{center}
i^starY ^gu.hA `a^saraTa .hamIriN.
fari.ha bihA wa-sAqahA 'amAmahu, _tumma rakiba wA.hidaN minhA.
wa-fI al-.t.tarIqi `adda .hamIrahu wa-huwa rAkibuN, fa-wa^gadahA tis`aTaN.
_tumma nazala wa-`addahA fa-ra'AhA `a^saraTuN fa-qAla: \\
'am^sI wa-'aksibu .himAraN, 'af.dalu min 'an 'arkaba wa-'a_hsara .himAraN.
\end{arab}
\begin{arab}[voc] % All short vowels shown except for sukun and wasla
\begin{center}\bfseries\large ^gu.hA wa-.hamIruhu al-`a^saraTu\end{center}
i^starY ^gu.hA `a^saraTa .hamIriN.
fari.ha bihA wa-sAqahA 'amAmahu, _tumma rakiba wA.hidaN minhA.
wa-fI al-.t.tarIqi `adda .hamIrahu wa-huwa rAkibuN, fa-wa^gadahA tis`aTaN.
_tumma nazala wa-`addahA fa-ra'AhA `a^saraTuN fa-qAla: \\
'am^sI wa-'aksibu .himAraN, 'af.dalu min 'an 'arkaba wa-'a_hsara .himAraN.
\end{arab}
\begin{arab}[trans] % transliteration
\begin{center}\bfseries\large ^gu.hA wa-.hamIruhu al-`a^saraTu\end{center}
i^starY ^gu.hA `a^saraTa .hamIriN.
fari.ha bihA wa-sAqahA 'amAmahu, _tumma rakiba wA.hidaN minhA.
wa-fI al-.t.tarIqi `adda .hamIrahu wa-huwa rAkibuN, fa-wa^gadahA tis`aTaN.
_tumma nazala wa-`addahA fa-ra'AhA `a^saraTuN fa-qAla: \\
'am^sI wa-'aksibu .himAraN, 'af.dalu min 'an 'arkaba wa-'a_hsara .himAraN.
\end{arab}
Exa. 3-3-5
[Typeset output: the story set four times: non-vocalized, fully vocalized, vocalized (without sukūn and waṣla) and transliterated.]
The arabxetex package loads the fontspec package, so that it is easy to select different fonts with Arabic characters. The following example typesets an often-used greeting in various fonts. In the commented line (starting with %), you can see the order in which the Arabic characters are input, i.e., the same as in the Latin transcription given with the \textroman command. The actual definition of the \Salam command shows how the low-level display routines invert the Arabic letters automatically within each word (without TeX having any control). Indeed, the input sequence of the characters is shown in the commented line, where the character U+202D (LRO, for left-to-right override) has been prepended to each word to force the characters to be displayed left to right. Then the same greeting is displayed in five different Arabic fonts. Note the use of the \SCAR command, which defines the script as Arabic and scales the characters so that their form is more visible.
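[Typeset output: the sentence "The most common Arabic language greeting used in both Muslim and Christian cultures means Peace be upon you." (with "Peace be upon you" underlined), the transliteration "As-SalAmu `Alaykum", and the greeting in Arabic script in five fonts.]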
\usepackage[no-math]{fontspec}
\usepackage{arabxetex}
\setmainfont{Arial Unicode MS}
\providecommand\SCAR{Script=Arabic,Scale=2.}
\newfontfamily\arSch[\SCAR]{Scheherazade}
\newfontfamily\arTyp[\SCAR]{Arabic Typesetting}
\newfontfamily\arTra[\SCAR]{Traditional Arabic}
\newfontfamily\arTah[\SCAR]{Tahoma}
\newfontfamily\arAri[\SCAR]{Arial Unicode MS}
\let\arabicfont\arSch
%\providecommand\Salam{^^^^202dالسلام ^^^^202dعليكم}
\providecommand\Salam{السلام عليكم}
The most common Arabic language greeting used
in both Muslim and Christian cultures means
\underline{Peace be upon you}.
\begin{arab}[utf]
\textroman{As-SalAmu `Alaykum}\\
{\arSch\Salam}\newline{\arTyp\Salam}\newline
{\arTra\Salam}\newline{\arTah\Salam}\newline
{\arAri\Salam}
\end{arab}
Exa.
3-3-6
The following example shows how easy it is to include LaTeX commands inside Arabic text. In the Arabic source, each word has been preceded by the LRO character (U+202D, as explained for Example 3-3-6) to show the order (left to right) in which the Arabic characters are input. Note how the flushleft environment effectively typesets the Arabic text flush right.
[Typeset output: in blue, a centered Arabic heading followed by three Arabic paragraphs, each containing a framed (\fbox) word and set flush right.]
\color[rgb]{0,0,1}
\begin{arab}[utf]
\begin{center}
\\[3mm]
\end{center}
\begin{flushleft}
\fbox{}
. \\[2mm]
\fbox{}
.

|
\\[2mm]
\fbox{}
. .
\end{flushleft}
\end{arab}
Contextual analysis of hamza
Our next example is from the ArabXeTeX manual. As with arabtex, a contextual analysis of the input encoding is performed (at the font-mapping level) to determine the carrier of the hamza automatically, as illustrated next.
\usepackage{arabxetex}
\newfontfamily\arabicfont[Script=Arabic,Scale=1.0]{Scheherazade}
\begin{arab}[voc]
'amruN, 'ibiluN, 'u_htuN, '"u_ht"uN, '"Uql"Id"Is, ra'suN, 'ar'asu,
sa'ala, qara'a, bu'suN, 'ab'usuN, ra'ufa, ru'asA'u, bi'ruN, 'as'ilaTuN,
ka'iba, qA'imuN, ri'AsaTuN, su'ila, samA'uN, barI'uN, sU'uN, bad'uN,
^say'uN, ^say'iN, ^say'aN, sA'ala, mas'alaTuN, saw'aTuN, _ha.tI'aTuN,
jA'a, ridA'uN, ridA'aN, jI'a, radI'iN, sU'uN, .daw'uN, qay'iN, .zim'aN
, yatasA'alUna, 'a`dA'akum, 'a`dA'ikum, 'a`dA'ukum maqrU'aT, mU'ibAt,
taw'am, yas'alu, 'a.sdiq^A$\;$'uh_u, ya^g^I'u, s^U'ila
\end{arab}
Exa. 3-3-7
[Typeset output: the words above in vocalized Arabic script, each hamza written on the carrier (alif, wāw, yāʾ or none) determined by its context.]
Typesetting the Qur'an
As the Holy Qur'an plays an important role in Islamic culture, its high-quality typesetting is an important and rather complex task, and editions produced by professional typesetters are often real works of art. Nowadays several OpenType fonts cover the full Unicode character range for the Arabic script, and it is possible to achieve quite acceptable results. The following example from the ArabTeX manual, which uses the Scheherazade font, shows some typographic features that characterize typical printed editions from Saudi Arabia.
Note in particular the definition of the hamza placed directly over the baseline instead of over the alif, something that is usually not encoded in a Unicode font, but that is easily emulated by a TeX macro (\hamzaB).
\usepackage{arabxetex}
\newfontfamily\arabicfont[Script=Arabic,Scale=1.0]{Scheherazade}
\newcommand{\hamzaB}{\char"200D\char"0640\raisebox{-.95ex}{\char"0654}\char"200D}
\begin{arab}[fullvoc]
mina 'l-qur'Ani 'l-karImi, sUraTu 'l-ssajdaTi 15--16:\\
'innamA yu'minu bi-\hamzaB a|"Ay___atinA 'lla_dIna 'i_dA _dukkirUA bihA
_harrUA sujjadaN wa-sabba.hUA bi-.hamdi rabbihim wa-hum lA yastakbirUna
SAJDA [[15]] tatajAfY_a junUbuhum `ani 'l-ma.dAji`i yad`Una rabbahum
_hawfaN wa-.tama`aN wa-mimmA razaqn_ahum yunfiqUna [[16]]\\
sUraTu 'l-baqaraTi 71--72:\\
qAla 'innahu, yaqUlu 'innahA baqaraTuN llA _dalUluN tu_tIru 'l-'ar.da wa-lA
tasq.I 'l-.har_ta musallamaTuN llA ^siyaTa fIhA|^JIM qAluW" 'l-\hamzaB a___ana
ji'ta bi-'l-.haqqi|^JIM fa_daba.hUhA wa-mA kAdduW" yaf`alUna [[71]] wa-'i_d
qataltum nafsaN fa-udda$\,$_ara|'|_i"tum fIhA|^SLY wa-al-ll_ahu mu_hrijuN mmA
kun"tum taktumUna [[72]]
\end{arab}
Exa. 3-3-8
[Typeset output: the two Qur'anic passages in fully vocalized Arabic script, with verse markers and the hamza placed by the \hamzaB macro.]
The following example is a table from a grammar book showing the prefix and suffix constructs of Arabic verbs. It shows how easy it is to mix the Latin and Arabic alphabets and to use a large set of LaTeX commands. We only show the beginning of the source file. As the default Arabic font we select Traditional Arabic. Note how we introduce the Arabic environment arab in the tabular preamble for the third, fifth and seventh columns (the [utf] option is not specified, since it is not needed).
% from http://en.wikipedia.org/wiki/Arabic_grammar
\documentclass[a4paper]{article}
\usepackage[no-math]{fontspec}
\usepackage{array}
\usepackage{arabxetex}
\setmainfont{Minion Pro}
\newfontfamily\arabicfont[Script=Arabic,Scale=1.2]{Traditional Arabic}
\begin{document}
\begin{tabular}{@{}l*3{l>{\begin{arab}}r<{\end{arab}}}@{}}
\multicolumn{7}{c}{Prefixes and suffixes of the Arabic verb}\\
& \multicolumn{2}{c}{Perfective}
& \multicolumn{2}{c}{Imperfective}
& \multicolumn{2}{c}{Subjunctive and Jussive}\\
\multicolumn{7}{c}{\textbf{Singular}} \\
3rd (m.)
& STEM\textbf{-a} &
& \textbf{ya-}STEM &
& \multicolumn{2}{c}{\emph{no written change}}\\
3rd (f.)
& STEM\textbf{-at} &
& \textbf{ta-}STEM &
& \multicolumn{2}{c}{\emph{no written change}}\\
[Typeset output, Latin-script columns only:]
Prefixes and suffixes of the Arabic verb
              Perfective   Imperfective   Subjunctive and Jussive
Singular
3rd (m.)      STEM-a       ya-STEM        no written change
3rd (f.)      STEM-at      ta-STEM        no written change
2nd (m.)      STEM-ta      ta-STEM        no written change
2nd (f.)      STEM-ti      ta-STEM-īna    ta-STEM-ī
1st           STEM-tu      ʾa-STEM        no written change
Dual
3rd (m.)      STEM-ā       ya-STEM-āni    ya-STEM-ā
3rd (f.)      STEM-atā     ta-STEM-āni    ta-STEM-ā
2nd (m. & f.) STEM-tumā    ta-STEM-āni    ta-STEM-ā
Plural
3rd (m.)      STEM-ū       ya-STEM-ūna    ya-STEM-ū
3rd (f.)      STEM-na      ya-STEM-na     no written change
2nd (m.)      STEM-tum     ta-STEM-ūna    ta-STEM-ū
2nd (f.)      STEM-tunna   ta-STEM-na     no written change
1st           STEM-nā      na-STEM        no written change
Exa.
3-3-9
Another grammatical table, showing derivations from sound verbs, is our next example, where we use the Arabic Typesetting font.
Exa. 3-3-10
[Typeset output: the table "Sound verbs (3rd sg. masc.)", giving the past and present of verb forms I to X in the active and passive voice in vocalized Arabic script; the passive of forms VII and IX is marked "not available".]
\usepackage[no-math]{fontspec}
\usepackage{array}
\usepackage{arabxetex}
\setmainfont{Minion Pro}
\newfontfamily\arabicfont[Script=Arabic,Scale=1.2]
{Arabic Typesetting}
\begin{tabular}{@{}c*4{>{\begin{arab}[voc]}r<{\end{arab}}}@{}}
\multicolumn{5}{c}{\textbf{Sound verbs} (3rd sg. masc.)}
\\
& \multicolumn{2}{c}{\textbf{Active voice}}
& \multicolumn{2}{c}{\textbf{Passive voice}}
\\
& \multicolumn{1}{c}{\emph{Past}}
& \multicolumn{1}{c}{\emph{Present}}
& \multicolumn{1}{c}{\emph{Past}}
& \multicolumn{1}{c}{\emph{Present}}
\\
\textbf{I} &fa`ala &yaf`alu &fu`ila &yuf`alu \\
\textbf{II} &fa``ala &yufa``ilu &fu``ila &yufa``alu \\
\textbf{III} &fA`ala &yufA`ilu &fU`ila &yufA`alu \\
\textbf{IV} &'af`ala &yuf`ilu &'uf`ila &yuf`alu \\
\textbf{V} &tafa``ala &yatafa``alu &tufu``ila &yutafa``alu\\
\textbf{VI} &tafA`ala &yatafA`alu &tufU`ila &yutafA`alu \\
\textbf{VII} &infa`ala &yanfa`ilu
& \multicolumn{2}{c}{\emph{not available}}\\
\textbf{VIII}&ifta`ala &yafta`ilu &ufti`ila &yufta`alu \\
\textbf{IX} &if`alla &yaf`allu
& \multicolumn{2}{c}{\emph{not available}}\\
\textbf{X} &istaf`ala &yastaf`ilu &ustuf`ila &yustaf`alu
\end{tabular}
We can get even fancier and specify all Arabic characters on input by their Unicode code position (this is often done on the Web with the character reference syntax &#nnnn;, where nnnn is the decimal code position). The following table of countries in the Arab world is taken from the Web site indicated below (only the first part of the source is shown). The Arial Unicode MS font is used for most of the Arabic, except for the right-hand column of the table, for which Old Antic Bold has been selected. Note the order of typesetting of the columns in this table (from right to left). In English this table would have the following structure:
                country   capital   people
North Africa    Tunisia   Tunis     Tunisians
                Algeria   Algiers   Algerians
For the Arabic version shown below, these columns have to be mirrored by hand from left to right, by specifying the people column entries first, then the capital column entries, etc.
% from http://www.arabiyya.123.fr/spip/spip.php?article13
\documentclass[a4paper]{article}
\usepackage[no-math]{fontspec}
\usepackage{array}
\usepackage{arabxetex}
\setmainfont{Minion Pro}
\newfontfamily\arabicfont[Script=Arabic,Scale=1.0]{Arial Unicode MS}
\newfontfamily\Antic[Script=Arabic,Scale=1.2]{Old Antic Bold}
\begin{document}
\begin{arab}
\renewcommand{\arraystretch}{1.1}
\setlength{\extrarowheight}{1mm}
\begin{tabular}{@{}>{\Antic}l@{\quad}rrr@{}}
& \char1575\char1604\char1588\char1614\char1593\char1618\char1576
& \char1575\char1604\char1593\char1575\char1589\char1616\char1605\char1577
& \char1575\char1604\char1576\char1614\char1604\char1614\char1583 \\\hline
\char1576\char1604\char1583\char1575\char1606 \char1580\char1575\char1605\char1593\char1577
\char1575\char1604\char1583\char1608\char1604 \char1575\char1604\char1593\char1585\char1576\char1610\char1577
&\char1578\char1608\char1606\char1616\char1587\char1610\char1617
& \char1578\char1600\char1615\char1608\char1606\char1616\char1587
& \char1578\char1600\char1615\char1608\char1606\char1616\char1587 \\
& \char1580\char1614\char1586\char1575\char1574\char1616\char1585\char1610\char1617
& \char1575\char1604\char1580\char1614\char1586\char1575\char1574\char1616\char1585
& \char1575\char1604\char1580\char1614\char1586\char1575\char1574\char1616\char1585 \\
Exa. 3-3-11
[Typeset output: the table of Arab countries, their capitals and peoples in Arabic script, with the columns running from right to left.]
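A shorter, self-contained sketch of code-position input: the word salām (U+0633 sīn, U+0644 lām, U+0627 alif, U+0645 mīm) written first with XeTeX's hexadecimal \char"... notation and then with the decimal numbers used in Example 3-3-11 (0x0633 = 1587, 0x0644 = 1604, 0x0627 = 1575, 0x0645 = 1605). Both lines should produce the same word; the font choice is an assumption.

```latex
\documentclass{article}
\usepackage{fontspec}
\usepackage{arabxetex}
\newfontfamily\arabicfont[Script=Arabic]{Arial Unicode MS}
\begin{document}
\begin{arab}[utf]
\char"0633\char"0644\char"0627\char"0645\\ % hexadecimal code positions
\char1587\char1604\char1575\char1605        % the same word, in decimal
\end{arab}
\end{document}
```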
3.3.2.1 ArabXeTeX: typesetting Persian
The following is an example from the ArabTeX manual.
\usepackage{arabxetex}
\newfontfamily\arabicfont[Script=Arabic,Scale=1.0]{Scheherazade}
\newfontfamily\farsifont[Script=Arabic,Scale=1.1]{Farsi Simple Bold}
\begin{farsi}[voc]
_hwAb, xwI^s, _hwod, ^ceH, naH, yal_aH, _hAneH, _hAneHhA, _hAneH-hA,
ketAb-e, U, rAh-e, t_U, nAmeH-i, man, bInI-e, An, mard, pA-i, In,
zan, bAzU-i, In, zan, dAr-_i, man, _hU-_i, t_U, nAmeH-_i, sormeH-_i,
gofteH-_i, ketAb-I, rAh-I, nAmeH-I, dAnA-I, pArU-I, dAnA-I-keH,
pArU-I-keH, rafteH-am, rafteH-Im, AnjA-st, U-st, t_U-st, ketAb-I-st,
be-man, be-t_U, be-An, be-In, be-insAn, beU, be-U, .sA.heb"|_hAneH,
pas"|andAz, naw"|AmUz
\end{farsi}
Exa. 3-3-12
[Typeset output: the words above in vocalized Persian script.]
3.3.2.2 ArabXeTeX: Various ways of typesetting Urdu
Like Persian (Farsi), Urdu is an Indo-European language written in the Arabic alphabet (see http://en.wikipedia.org/wiki/Urdu). However, Urdu letters (and their fonts) have forms that are quite different from their common Arabic equivalents, as the next short example shows.¹ We first use an undifferentiated Arabic font (Code2000).
\usepackage{arabxetex}
\newfontfamily\arabicfont[Script=Arabic,Scale=1.0]{Scheherazade}
\newfontfamily\urdufont[Script=Arabic,Scale=1.1]{Code2000}
\begin{urdu}[novoc]
,ham `i^sq kE mArO.n kA itnA ,hI fasAna,h ,hae\\
rOnE kO na,hI.n kO'I ,ha.nsnE kO zamAna,h ,hae\par
ya,h kiskA ta.sawwur ,hae ya,h kiskA fasAna,h ,hae\\
jO a^sk ,hae A.nkhO.n mE.n tasbI.h kA dAnA ,hae
\end{urdu}
Exa. 3-3-13
[Typeset output: the two couplets in Urdu, set with Code2000.]
Then we show the same example with two other fonts, which have been designed to render the Urdu variants of the letters. The example also shows that it is enough to change the definition of the \urdufont command to contain the OpenType name of the font one actually wants to use.
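As a sketch of that change, the preamble of the previous example could be adapted as follows. The font name Nafees Pakistani Naskh is taken from the sample output; it is an assumption that this font is installed and known to XeTeX under exactly that name.

```latex
\usepackage{arabxetex}
\newfontfamily\arabicfont[Script=Arabic,Scale=1.0]{Scheherazade}
% Only the \urdufont definition changes; the name must be the OpenType
% name of a font installed on the system (assumed here).
\newfontfamily\urdufont[Script=Arabic,Scale=1.1]{Nafees Pakistani Naskh}
\begin{urdu}[novoc]
ya,h kiskA ta.sawwur ,hae ya,h kiskA fasAna,h ,hae
\end{urdu}
```

The transliterated input inside the urdu environment is unchanged; only the font used to render it differs.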
¹The text is borrowed from http://tabish.freeshell.org/u-trans/urducode.html, a short page on ArabTeX coding for Urdu.
3 HANDLING ALL THOSE SCRIPTS
[Example output: the Urdu text typeset with the fonts Nafees Pakistani Naskh and Nafees Riqa.]
The web page referenced in footnote 1 refers to the Urdu font Urdu Nastaliq Unicode, which comes with a few examples, one of which is a ghazal.² We use it to show the difference between typesetting the Urdu text with a general-purpose font for Arabic characters (Arial Unicode MS), seen at the left, and with Urdu Nastaliq Unicode, specifically developed for typesetting Urdu texts, seen at the right.

Exa.
3-3-14
²The ghazal is a poetic form consisting of couplets which share a rhyme and a refrain. Each line must share the same meter. Ghazals are traditionally expressions of love, separation and loneliness, a poetic expression of both the pain of loss or separation and the beauty of love in spite of that pain. The form is ancient, originating in 10th-century Persian verse. It is considered by many to be one of the principal poetic forms the Persian civilization offered to the eastern Islamic world, see http://en.wikipedia.org/wiki/Ghazal. Nowadays the ghazal is most prominently a form of Urdu poetry, see http://www.urdupoetry.com.
3.3.3 Arabic presentation forms
The preferred Unicode block for the Arabic script is Arabic (U+0600–U+06FF), which is complemented by the Arabic Supplement block (U+0750–U+077F), which adds letters mainly used in Northern and Western African languages.
Languages written in the Arabic script often have a long tradition of cursive handwriting on manuscripts. In particular, Arabic itself is closely linked to the spread of the Koran and, more generally, of Islamic culture. Therefore letter sequences, or even words, have presentations that are different from the linear combination of the composing letters. Moreover, these forms often depend on the language. For this reason Unicode contains an Arabic Presentation Forms-A block (U+FB50–U+FDFF). This is subdivided into several parts: glyphs for contextual forms of letters for Persian, Urdu, Sindhi, etc. (U+FB50–U+FBB1), glyphs for contextual forms of letters for Central Asian languages (U+FBD3–U+FBE9), ligatures (two elements, U+FBEA–U+FD3D), punctuation (U+FD3E–U+FD3F), ligatures (three elements, U+FD50–U+FDC7), noncharacters (U+FDD0–U+FDEF), word ligatures (U+FDF0–U+FDFB), a currency sign (U+FDFC), and a symbol (U+FDFD).
There is also an Arabic Presentation Forms-B block (U+FE70–U+FEFF), which mainly contains contextual shape variants that are semantically important for Arabic mathematics: glyphs for spacing forms of Arabic points (U+FE70–U+FE7F), and basic glyphs for Arabic contextual forms (U+FE80–U+FEFC).
One example is U+FDF2 (ARABIC LIGATURE ALLAH ISOLATED FORM), whose support in various fonts is shown here. The typesetting of the name of God in Arabic, which is quite complex, is explained in detail in the ArabXeTeX manual.
Fonts from or licensed to Microsoft: Times New Roman, Arial, Courier New, Microsoft Sans Serif, Arial Unicode MS, Arabic Transparent, Simplified Arabic, Simplified Arabic Fixed, WinSoft Serif Pro Medium, Traditional Arabic, Arabic Typesetting, Old Antic Bold, Farsi Simple Bold.
Urdu: Nastaleeq Like, PakType Naqsh; the latter also contains presentation forms for the following Arabic ligatures:
U+FDFA (SALLALLAHOU ALAYHE WASALLAM)
U+FDFB (JALLAJALALOUHOU)
Adobe (http://www.adobe.com): Adobe Arabic.
SIL (www.sil.org): Scheherazade, Lateef.
Arabeyes (www.arabeyes.org): KacstBook, KacstFarsi.
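To check whether a given font provides the Allah ligature, one can typeset U+FDF2 directly with XeTeX's \char primitive in several fonts. The following is a minimal sketch: the command names \scheherazadefont, \arialunicodefont, \adobearabicfont and \showallah are hypothetical, and the three fonts, taken from the list above, are assumed to be installed.

```latex
\usepackage{fontspec}
% Hypothetical font-switching commands for three of the fonts listed above.
\newfontfamily\scheherazadefont[Script=Arabic]{Scheherazade}
\newfontfamily\arialunicodefont[Script=Arabic]{Arial Unicode MS}
\newfontfamily\adobearabicfont[Script=Arabic]{Adobe Arabic}
% Print U+FDF2 (ARABIC LIGATURE ALLAH ISOLATED FORM) in the font given as #1.
\newcommand\showallah[1]{{#1\char"FDF2}}
\showallah\scheherazadefont\quad
\showallah\arialunicodefont\quad
\showallah\adobearabicfont
```

A font without this glyph simply leaves its position empty, which makes gaps in coverage easy to spot.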
Overview of all input conventions
Table 3.3 shows the complete list of input character combinations used by arabxetex. The input sequences are ordered alphabetically following the most significant letter of the ASCII input code. The characters are accompanied by their (hexadecimal) Unicode number. The following color conventions are used: red means that the glyph is the default for the given input code, and that it is available in all languages except those where different glyphs are shown (in black). That default glyph is also displayed in light gray under each language in which it is featured. Glyphs in blue are archaic forms (e.g., old Urdu). An asterisk after the Unicode number means that the character was not available with arabtex. Green glyphs are special: either they are used to represent defective writing or they provide characters for other languages. Those shown in the column for Arabic are available by default.
Table 3.3: All arabxetex input conventions
code arab farsi urdu pashto sindhi ottoman kurdish kashmiri malay uighur
a

064E
/
A

0627
/
.a

0654
.A

0672
_a

0670
_A

:a
b

0628

B

0640
.b

066E
:b

067B
bh

0680
c

0681

062C

0686
,c

0685

0686

0686
^c


0686

^ch

0687
:c

0682*
.^c

06BF*
d

062F

.d

0636

,d

0688

0689

068A
.,d

068B*
a
^d

06EE*

068E*
_d

0630

:d

068F
::d

0690*
dh

068C
,dh

068D
e

0659

/

06D2+0658

06D0
E


06D2

06D0
/

06D2
ee

ae



Ee

06CD
_e

`e

'E

06D3
f

0641

.f

06A1
g

06AF


0762

G

06AB
.g

063A

:g

06B3
.:g

06B4*
,g

06AC
b
^g

062C

06A0

063A

06A0

063A
gh
h

0647


06BE

H

0647

06C3

0647
.h

062D

,h

06C1

_h

062E

i

0650

066E
I

064A

06CC



/


.I

06CC*
_i

0656
j

062C

0698

0698

:j

0684
jh

06A9
k

0643


06AA

.k

06A9

0642
_k

063A
kh
l

0644

.l

06B6*
^l

06B5
m

0645

.mIN

06FE
'|IN

06FD
n

0646

aN

064B
uN

064C
iN

064D
.n

06BA
..n

06B2*
,n

06BC

06BB
^n

0683

06BD

06AD
:n

06B1
o

0657
/

06C6

O

/

ao

.o

06C4
.O

_o
_O

:o
/
06C6
:O

06FC
p

067E

06A8

ph

06A6
q

0642

.q

066F
r

0631

.r

0694*

0695
,r

0691

0693

0699

0694*
^r

06EF*

0692*
:r

0697*
c
s

0633

.s

0635

,s

069A

0634
^s

0634

_s

062B
:s

069B
t

062A

T

062A

.t

0637

,t

0679

067C

067D
_t

062B

th

067F
,th

067A
u

064F
/ /
06C7
U

/
0648+0657

.u

0655
.U

0673
_u

0657
:u
/
06C8
:U

06C7
d
v

06A4
e
06CF
w

0648


06CB
W

^w

06C9*
:w

06CA*
x

062E

y

064A

06CC

Y

0649
.y

z

0632

.z

0638

,z

0696

0636
^z

0698

_z

0630
f
:z

0636
'

0621


`

0639

a For Western Punjabi (Lahnda).
b Alternative form of in Malay.
c For Dargwa (language of Dagestan).
d For Kirgiz (and Uighur).
e To transliterate dialects and foreign words.
f Alternative to _d.
Maghribi Arabic is identical to Arabic except for the three letters f, q and v, which yield the glyphs U+06A2, U+06A7, and U+06A5, respectively.
3.4 Typesetting Chinese
Ideographic CJK (Chinese, Japanese, Korean) scripts can be handled by XeTeX by directly using the corresponding Unicode characters in the input stream. The following example shows a few Kanji characters and their pronunciation. Note the use of the color argument on the \font command (see Section 2.3 for details of XeTeX's extensions to TeX's standard \font command).
\font\han="STSong:color=660000" at 12pt
\font\rom="Gentium:color=006600" at 8pt
\newcommand\hc[2]{\begin{tabular}{l}
\han #1\\[-1mm]\rom #2\end{tabular}}
\begin{tabular}{l}
\hc{}{ka-ku}\\
\hc{}{motto-mo}\\
\hc{}{sai-go}\\
\hc{}{hatara-ku}\\
\hc{}{umi}
\end{tabular}

ka-ku

motto-mo

sai-go

hatara-ku

umi
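In a LaTeX document the same two colored fonts can also be declared with fontspec, whose Color feature takes the same RRGGBB hexadecimal notation. This is a sketch only: unlike the primitive \font declarations above, the font size is then taken from the current LaTeX size instead of being fixed at 12pt and 8pt.

```latex
\usepackage{fontspec}
% Same fonts and colors as the \font declarations above; \han and \rom
% become font-switching commands that \hc can use unchanged.
\newfontfamily\han[Color=660000]{STSong}
\newfontfamily\rom[Color=006600]{Gentium}
```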
By default, XeTeX does not handle some important aspects of Chinese typesetting, such as automatic font switching between Chinese and Western characters, skip adjustments for fullwidth punctuation, or automatic skip insertion between Chinese and Western characters or math formulas.
3.4.1 The xeCJK Package
Wenchang Sun developed the xeCJK package to help XeLaTeX users typeset texts based on CJK scripts more easily. The xeCJK package offers the following main features:
1. it initializes different default fonts for CJK and other scripts;
2. spaces between CJK characters are automatically ignored;
3. it supports several CJK punctuation processing modes;
4. it can automatically adjust the space between CJK and other characters.
Note that xeCJK needs XeTeX version 0.9995.0 or later.
3.4.1.1 Usage
\usepackage[Options]{xeCJK}
The options are the following:
BoldFont Create synthetic bold fonts for CJK characters. Will be overridden by specifying BoldFont in the definition of a CJK family.
SlantFont Create slanted fonts for CJK characters. Will be overridden by specifying ItalicFont in the definition of a CJK family.
CJKnumber Load the CJKnumb package.
CJKaddspaces Add spaces between CJK and other characters if there are none.
CJKnormalspaces Ignore only spaces between CJK characters and leave spaces between CJK and other characters untouched.
CJKchecksingle Prevent a single Chinese character from ending up alone on the last line of a paragraph.
\setCJKmainfont[<font features>]{font name}
\setCJKsansfont[<font features>]{font name}
\setCJKmonofont[<font features>]{font name}
These three commands, which are analogues of \setmainfont, \setsansfont, and \setmonofont, respectively, set different default fonts for CJK characters only, without affecting other scripts.
When the definition of a CJK typeface specifies an explicit font name with the ItalicFont={...} option, the SlantFont option has no effect for this typeface. Similarly, specifying an explicit bold font with BoldFont={...} in the font feature part suppresses the effect of the global BoldFont option.
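For instance, the following preamble requests synthetic bold and slant globally, but the explicit BoldFont feature makes SimHei the bold face of the main CJK font instead. This is a sketch: the font names SimSun and SimHei are borrowed from the zhspacing defaults shown later in this chapter and are assumed to be installed, and the lowercase option names follow the examples of this section (later xeCJK releases changed this interface).

```latex
\usepackage[boldfont,slantfont]{xeCJK}
% BoldFont= names a real bold face, so the global "boldfont" option
% (synthetic emboldening) is suppressed for this CJK family.
% No ItalicFont= is given, so "slantfont" still produces slanted glyphs.
\setCJKmainfont[BoldFont=SimHei]{SimSun}
```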
\setCJKfamilyfont{familyname}[<font features>]{font name}
This command defines a font for a CJK family, which can be activated for typesetting by the command \CJKfamily{familyname}.
For a full description of the parameters <font features> and font name, we refer to the fontspec package.
The next example shows the effect of some of these commands. TeX Gyre Termes is chosen as the default English typeface and Bitstream CyberCJK as the default Chinese typeface, while AR PL SungtiL GB, a Song typeface, is established as the CJK family song.
This is default font abCD.
This is the bold font abCD.
This is the italic font abCD.
And the bold italic font abCD.
Finally this is Song typeface.
\usepackage{xeCJK}
\setmainfont{TeX Gyre Termes}
\setCJKmainfont{Bitstream CyberCJK}
\setCJKfamilyfont{song}{AR PL SungtiL GB}
This is default font abCD. \\
{\bfseries This is the bold font abCD.} \\
{\itshape This is the italic font abCD.} \\
{\bfseries\itshape And the bold italic font abCD.} \\
{\CJKfamily{song} Finally this is Song typeface.}
Exa.
3-4-1
xeCJK offers improved spacing between Chinese and English text, and can prevent a single Chinese character from ending up alone on the last line of a paragraph. The following example shows the effect of the CJKchecksingle option.
\usepackage[boldfont,slantfont,CJKaddspaces,CJKchecksingle]{xeCJK}
\setCJKmainfont{Bitstream CyberCJK}
\providecommand\mytext{xeCJK }
\section*{First with the option ``checksingle''}
\mytext\par\mytext\par\mytext
\section*{And now without the option ``checksingle''}
\makeatletter
\let\xeCJK@checksingle\xeCJK@notchecksingle
\makeatother
\mytext\par\mytext\par\mytext
Exa.
3-4-2
First with the option checksingle
xeCJK

xeCJK

xeCJK

And now without the option checksingle


xeCJK

xeCJK

xeCJK

3.4.1.2 Advanced settings


\punctstyle{PunctStyle}
Sets the CJK punctuation style. xeCJK predefines the following PunctStyle styles for typesetting punctuation:
quanjiao or fullwidth
typeset all punctuation in full width; when two punctuation marks are adjacent, the first is typeset in half width;
banjiao or halfwidth
typeset all punctuation in half width;
kaiming or mixedwidth
typeset all punctuation in half width except the period, question mark, and exclamation mark;
hangmobanjiao or marginkerning
typeset punctuation at the end of lines in half width;
CCT use the punctuation format of the CCT Chinese TeX system (http://freshmeat.net/projects/ceeceetee/);
plain leave the punctuation untouched.
\xeCJKallowbreakbetweenpuncts \xeCJKnobreakbetweenpuncts
By default, xeCJK prohibits line breaks between punctuation marks. The command \xeCJKallowbreakbetweenpuncts allows such line breaks, while \xeCJKnobreakbetweenpuncts disallows them again.
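A minimal sketch combining these settings; the choice of kaiming is arbitrary, and Bitstream CyberCJK is the font used in the other examples of this section.

```latex
\usepackage{xeCJK}
\setCJKmainfont{Bitstream CyberCJK}
% Half-width punctuation except for the period, question mark and
% exclamation mark (see the list of styles above).
\punctstyle{kaiming}
% Permit line breaks between consecutive punctuation marks,
% which xeCJK forbids by default.
\xeCJKallowbreakbetweenpuncts
```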
\xeCJKsetslantfactor{slant factor}
\xeCJKsetemboldenfactor{embolden factor}
Sets the slant (a value between -0.999 and 0.999) and embolden factors, respectively. The default settings are
\xeCJKsetslantfactor{0.17}
\xeCJKsetemboldenfactor{4}
Note that both macros affect only CJK families that are defined subsequently in the LaTeX source file.
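Because the factors only affect CJK families defined after them, their position in the preamble matters. In this sketch (the values 0.2 and 3 are arbitrary), the new factors apply to the main CJK font because it is declared afterwards.

```latex
\usepackage[boldfont,slantfont]{xeCJK}
% These must come before the font declarations they should affect.
\xeCJKsetslantfactor{0.2}
\xeCJKsetemboldenfactor{3}
% Declared after the two commands, so it uses slant 0.2 and embolden 3.
\setCJKmainfont{Bitstream CyberCJK}
```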
\CJKnormalspaces \CJKaddspaces
By default, xeCJK leaves spaces between CJK and other characters untouched, whereas it ignores spaces between CJK characters. One can use \CJKaddspaces to add a space between CJK and other characters if no blank space is present, and \CJKnormalspaces to change back to the default.
\CJKsetecglue{value}
Allows you to control the spacing between Chinese and English. The default is \CJKsetecglue
\usepackage[boldfont,slantfont,CJKaddspaces]{xeCJK}
\setCJKmainfont{Bitstream CyberCJK}
\providecommand\mytext{%
English {\itshape Chinese} \LaTeX\
\emph{Italic} \textbf{} a $b$ $c$ $d$
\newline
English{\itshape Chinese}\LaTeX\
\emph{Italic}\textbf{}a$b$ $c$ $d$\newline
This is an example.
}
\CJKaddspaces
\CJKsetecglue{\hskip 0.15em plus 0.05em minus 0.05em}
\mytext
\CJKaddspaces
\CJKsetecglue{ }
\mytext
\CJKnormalspaces
\mytext
EnglishChinese LaTeX Italic ab c d
EnglishChinese LaTeX Italic ab c d
This is an example.
English Chinese LaTeX Italic a b c d
English Chinese LaTeX Italic a b c d
This is an example.
English Chinese LaTeX Italic a b c d
EnglishChineseLaTeX Italicab c d
This is an example.
Exa.
3-4-3

THE TEXT BELOW WAS TRANSLATED BY BABELFISH FROM THE CHINESE COMPUSCRIPT
ONCE I UNDERSTAND ITS MEANING THE TEXT WILL BE REWRITTEN
One can see that a space between Chinese text and a following group {<texts>}, between a group {<texts>} and following Chinese text, or between English text and a group {<texts>} is retained as is (it cannot be adjusted); when there is no space, however, one can be added as needed (see the example above).
The spacing between Chinese text and inline mathematical expressions is implemented by redefining \everymath and \everydisplay, which may sometimes fail. The solution is then to add the space manually.
\xeCJKsetcharclass{first}{last}{class}
By default, xeCJK treats the characters between 0x2000 and 0xFFFF as CJK characters; that is, the CJK-related font settings are effective only for characters in this range. The above command can be used to change the character class. For example, the following commands declare the characters between 0x0080 and 0x2FFF to be non-CJK characters, and the characters between 0x20000 and 0x30000 to be CJK characters:
\xeCJKsetcharclass {"80} {"2FFF} {0}
\xeCJKsetcharclass {"20000} {"30000} {1}
Attention: the last parameter can only be 0 or 1. Do not change character classes without good reason.
\xeCJKcaption[<encoding>]{caption}
This command is similar to \CJKcaption; the optional parameter selects the encoding, the default being UTF-8.
\xeCJKsetkern{punctuation 1}{punctuation 2}{kern}
If you are not satisfied with the default settings, you can use this command to set the distance between two punctuation marks. For example,
\xeCJKsetkern{:}{"}{0.3em}

3.4.1.3 Compatibility
CJKfntef
Load the CJKfntef package (from the CJK bundle) after xeCJK to get various effects on CJK characters. This package provides the commands \CJKunderline to draw a line under CJK characters, and \CJKunderdot to draw a dot below such characters. The effects of these two commands can be combined, as the following example shows.
\usepackage[boldfont,slantfont]{xeCJK}
\usepackage{CJKfntef}
\setCJKmainfont{Bitstream CyberCJK}
\setCJKmonofont{Bitstream CyberCJK}
Chinese$x=y$
Chinese $x=y$
\CJKunderline{Chinese$x=y$\CJKunderdot{}}
\CJKunderline{ Chinese $x=y$ \CJKunderdot{}}
\CJKunderline*{\CJKunderdot{}}
\CJKunderdot{\CJKunderline{}}
Chinese x = y
Chinese x = y
Chinese x = y

Chinese x = y

Exa.
3-4-4
CJKnumber
To use the package CJKnumb, one can specify the option CJKnumber when loading xeCJK.
12345 12345 .
67890 67890 .
\usepackage[CJKnumber]{xeCJK}
\setmainfont{TeX Gyre Termes}
\setCJKmainfont{Bitstream CyberCJK}
12345 $12345$ \CJKnumber{12345}.
67890 $67890$ \CJKnumber{67890}.
Exa.
3-4-5
CJK
To be compatible with the CJK-related packages CJKnumb, CJKfntef and CJKulem, xeCJK reimplements some macros defined in the package CJK. Therefore the packages xeCJK and CJK are incompatible, and xeCJK will prevent the user from loading CJK subsequently.
3.4.2 The zhspacing package
A more detailed and expert handling of Chinese typographic peculiarities is possible with Dian Yin's zhspacing package (available from http://code.google.com/p/zhspacing/), which takes advantage of the XeTeX command \XeTeXinterchartoks.
English E = mc²
English E = mc²
English E = mc²
\usepackage[no-math]{fontspec}
\setmainfont[BoldFont=SimHei]{SimSun}
\usepackage{zhspacing}
\raggedright\noindent
English
$E = mc^2$
\par\noindent
English
$E = mc^2$
\par\zhspacing\noindent
English
$E = mc^2$
Exa.
3-4-6
zhspacing can be used with both XeLaTeX and plain XeTeX. In the latter case the source would look like
\input zhspacing.sty
\zhspacing
input text
\bye
This example shows that spaces after Chinese characters are always ignored. Moreover, a noticeable skip is inserted between Chinese characters and English characters as well as math formulas. In fact, all of the following inputs can produce mixed-language output with a skip automatically inserted between Chinese and English characters.
Exa.
3-4-7
Eng, Eng,
Eng , Eng
Eng , Eng ,
Eng , Eng
\usepackage{zhspacing}\zhspacing
\begin{flushleft}
\emptyskipscheme
Eng, Eng,\\
Eng , Eng \\
\simsunskipscheme
Eng, Eng,\\
Eng , Eng
\end{flushleft}
Look closely at the inputs on the first line and you will see that they generate exactly the same output, as do the inputs on the second line. This means that spaces following Chinese characters are ignored if no spacing scheme is activated (\emptyskipscheme). However, after activation of the spacing scheme (\simsunskipscheme) defined in the zhspacing package, a skip is introduced for such a space. Note that on the last two lines the skip between Eng and the following Chinese character is somewhat wider than the skip between a Chinese character and the following Eng. That is because the former is produced by the space token after the letter g, not by the skip automatically inserted by zhspacing's skip mechanism.
3.4.2.1 Punctuation skip adjustment
Proper Chinese typesetting requires that consecutive fullwidth punctuation marks be compressed, and that at a line break before or after a fullwidth punctuation mark the blank part of that punctuation be cut off, so that it aligns with the margin. zhspacing solves these problems and also applies the proper line-breaking prohibitions. Here is an example.
Exa.
3-4-8

3.4.2.2 Advanced usage


Fonts
zhspacing uses an extensible way of selecting fonts. The rules can be summarized as follows:
Western characters, i.e., those that are neither CJKV ideographs nor CJKV punctuation, use the default font.
Chinese characters use separate fonts. Font changes in the document do not affect the font used to display Chinese, unless you use the NFSS scheme to change the font series or shape.
When typesetting basic Chinese ideographs, the command \zhfont is executed.
When typesetting Chinese punctuation, the command \zhpunctfont is executed.
When typesetting CJK Ext-A characters, the command \zhcjkextafont is executed.
When typesetting CJK Ext-B characters, the command \zhcjkextbfont is executed.
When switching from non-Chinese to Chinese characters, the command \zhs@savefont is executed, whereas when switching back the command \zhs@restorefont is executed.
zhspacing's default definitions in XeLaTeX for these commands are:
\newfontfamily\zhfont[BoldFont=SimHei]{SimSun}
\newfontfamily\zhpunctfont{SimSun}
\def\zhcjkextafont{\message{CJK Ext-A}}
\def\zhcjkextbfont{\message{CJK Ext-B}}
\def\zhs@savefont{\zhs@savef@nt{old}}
\def\zhs@restorefont{\zhs@restoref@nt{old}}
The internal macros \zhs@savef@nt and \zhs@restoref@nt save and restore the NFSS-related information for the current font.
The CJK Ext-A/B extension fonts are not defined by default, since not every user has installed the fonts needed. The package author recommends using Sun-ExtA and Sun-ExtB for these fonts. You can define the ext-font macros manually in a way similar to the definition of \zhfont.
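Following the pattern of the default definition of \zhfont, the two extension macros could be defined as in the sketch below. It assumes that the recommended Sun-ExtA and Sun-ExtB fonts are installed under these names, and that, as in the vertical-typesetting example later in this section, zhspacing's defaults can be overridden with \newfontfamily after the package is loaded. If a recent fontspec complains that the command already exists, \setfontfamily, which defines the command unconditionally, can be used instead.

```latex
\usepackage{fontspec}
\usepackage{zhspacing}
% Replace the default \message-only definitions with real fonts
% (font names assumed to be installed).
\newfontfamily\zhcjkextafont{Sun-ExtA}
\newfontfamily\zhcjkextbfont{Sun-ExtB}
\zhspacing
```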
Skips
The zhspacing package uses a flexible skip mechanism which is based on a series of commands rather than on skip registers. This allows the skips to vary according to the current font size. The available skip commands are listed below. They are all defined according to the model \def\skipxxx{\hskip xxxxx}.
\skipzh Skip between adjacent Chinese characters.
\skipenzh Skip between a Chinese character and a Western character or a math formula.
\skipzhopen Skip before fullwidth opening punctuation.
\skipzhinteropen Skip before a fullwidth opening punctuation mark when preceded by another fullwidth punctuation mark.
\skipzhlinestartopen Skip before a fullwidth opening punctuation mark when it occurs at the start of a line.
\skipzhclose Skip after fullwidth closing punctuation.
\skipzhinterclose Skip after a fullwidth closing punctuation mark when followed by another fullwidth punctuation mark.
\skipzhlineendclose Skip after a fullwidth closing punctuation mark when it occurs at the end of a line.
\skipzhjudou Skip after fullwidth judou punctuation.
\skipzhinterjudou Skip after a fullwidth judou punctuation mark when followed by another fullwidth punctuation mark.
\skipzhlineendjudou Skip after a fullwidth judou punctuation mark when it occurs at the end of a line.
\skipnegzhlinestartopen Negative of \skipzhlinestartopen.
\skipnegzhlineendclose Negative of \skipzhlineendclose.
\skipnegzhlineendjudou Negative of \skipzhlineendjudou.
The zhspacing package comes with three predefined skip schemes, namely \simsunskipscheme, \emptyskipscheme and \haltskipscheme. The first scheme should be suitable for the font SimSun and other popular Chinese fonts used in China, which do not support the OpenType feature halt and need negative spaces to be inserted before opening punctuation and after closing or judou punctuation. The second scheme simply adds zero length. The last one should fit OpenType Chinese fonts supporting the halt feature, such as Adobe Song Std, where positive spaces should be inserted before or after certain punctuation marks. You can, of course, define your own skip schemes for customization.
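A custom scheme can follow the model \def\skipxxx{\hskip xxxxx} given above. The sketch below is hypothetical: the name \myskipscheme and all values are invented for illustration, and it assumes that the predefined schemes work by (re)defining these same macros, so that only the four skips it redefines change. Expressing the skips in em makes them scale with the current font size, which is the point of using macros rather than skip registers.

```latex
\usepackage{zhspacing}
% Hypothetical scheme; the name and all values are illustrative only.
\def\myskipscheme{%
  \def\skipzh{\hskip 0pt plus 0.05em\relax}%
  \def\skipenzh{\hskip 0.25em plus 0.1em minus 0.05em\relax}%
  \def\skipzhopen{\hskip -0.5em\relax}%
  \def\skipzhclose{\hskip -0.5em\relax}%
}
\zhspacing
\simsunskipscheme % start from a predefined scheme,
\myskipscheme     % then override four of its skips
```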
Vertical Chinese
Vertical Chinese can be achieved by adding the raw feature vertical to the specified Chinese font. The following example also shows what TeX thinks the bounding box of the characters is.
Exa.
3-4-9

\usepackage[dvipdfm]{graphicx}
\usepackage{zhspacing}\zhspacing
\newfontfamily\zhfont[RawFeature={vertical:}]{SimSun}
\newfontfamily\zhpunctfont[RawFeature={vertical:
+vert:+vhal}]{Adobe Song Std}
\haltskipscheme
\setlength\fboxsep{0mm}
\fbox{\rotatebox{-90}{}}%
\qquad
\setlength\fboxsep{2mm}
\fbox{\rotatebox{-90}{}}
Note that in this example, in order to obtain proper vertical punctuation, we set \zhpunctfont to use the Adobe Song Std font, which supports the vert feature, and change the skip scheme to \haltskipscheme to match the vhal feature specified. Some Chinese fonts have bugs when typesetting vertical Chinese containing punctuation. Moreover, the baseline of vertical Chinese is often incorrect, so mixing Chinese and English in vertical mode can generate ugly results and should be avoided.
Some more vertical typesetting is shown in the following example, which also shows how easy it is to make XeTeX print HTML character references. This comes in handy if you want to typeset text from a Web page where non-Latin characters are encoded with this representation of Unicode characters, which is extremely portable (the HTML source contains only ASCII characters) and is thus quite often used.
This is English.

"

\usepackage[dvipdfm]{graphicx}
\usepackage{fontspec}
\fontspec[Mapping=tex-text,Script=CJK]{Kozuka Mincho Pro-VI}
% macro hacking to read chars represented as character references
\catcode`\&=\active % make & active
\catcode`\#=12 % make # "other"
\def&#{\char} % replace sequence &# by \char
\catcode`\;=\active % make ; active
\def;{\relax} % and make it a no-operation
\fboxsep0pt
\fbox{This is English.
&#12371;&#12428;&#12399;&#26085;&#26412;&#35486;&#12391;&#12377;&#12290;}
\fontspec[Mapping=tex-text,Vertical=RotatedGlyphs,Script=CJK]{Kozuka
Mincho Pro-VI}
\rotatebox{-90}{\fbox{This is English.
&#12371;&#12428;&#12399;&#26085;&#26412;&#35486;&#12391;&#12377;&#12290;}}
Exa.
3-4-10
3.4.2.3 Compatibility
Theoretically, zhspacing should be compatible with all macro packages, except those that change the definition of \hskip and \penalty, in which case special treatment is needed. I haven't found any conflict when using common packages such as hyperref and fancyhdr. However, ulem redefines \hskip and \penalty, and causes unexpected output. Use zhulem, provided along with zhspacing, instead.
Using zhspacing with the ctex package requires some precautions; see the manual for more details (http://www.ctex.org).
3.4.2.4 Character classes and class inheritance
The actual situation concerning Chinese typesetting is so complicated that it is difficult to figure out exactly how many classes are needed and what should be done when changing from one class to another. A more natural approach is to work from the top down: first there are fullwidth and halfwidth characters as well as boundaries, and from these we construct a hierarchical forest where each node represents a character class. In this way common behaviors can be shared between different families of classes, and specific actions can be taken for a particular class pair. That is the idea of class inheritance, the concept behind zhspacing.
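The flat layer underneath this hierarchy is XeTeX's inter-character token mechanism, mentioned above for \XeTeXinterchartoks. The sketch below shows only that primitive level and is not zhspacing's own code; class number 1 and the skip value are arbitrary choices, the LaTeX format may already assign classes to some characters, and the sketch should not be combined with packages such as zhspacing or xeCJK that manage these classes themselves.

```latex
% Sketch of the XeTeX primitives zhspacing builds on; not zhspacing's code.
\XeTeXinterchartokenstate=1 % enable inter-character token insertion
\newcount\cjkcharcode
% Put the CJK Unified Ideographs (U+4E00 to U+9FFF) in class 1.
\cjkcharcode="4E00
\loop
  \XeTeXcharclass\cjkcharcode=1
  \advance\cjkcharcode by 1
\ifnum\cjkcharcode<"A000 \repeat
% Class 0 holds all characters not assigned elsewhere, e.g. Latin letters.
% Insert a small stretchable skip at every 0-1 and 1-0 transition.
\XeTeXinterchartoks 0 1 = {\hskip 0.25em plus 0.1em\relax}
\XeTeXinterchartoks 1 0 = {\hskip 0.25em plus 0.1em\relax}
```

Each class pair needs its own \XeTeXinterchartoks entry; organizing such pairs hierarchically is what the class inheritance described above adds.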
3.5 Examples of the use of Unicode
3.5.1 Unicode fonts and editors
emacs and vi, when adequate fonts are installed on the system (and made known to the applications)
yudit, a freeware editor (http://yudit.org) for Linux and Microsoft Windows
Resources for Unicode fonts
Bitstream Cyberbit¹
a more recent version of the above, TITUS Cyberbit Basic (developed at the University of Frankfurt, Germany, see the URL http://titus.uni-frankfurt.de/)
the shareware fonts Code2000 for Unicode plane 0, Code2001 for plane 1, and Code2002 for plane 2 (see http://www.code2000.net/)
Arial Unicode MS, which comes with Microsoft's Windows XP and Vista systems
Web page WAZU JAPAN's Gallery of Unicode Fonts (http://www.wazu.jp/index.html)
Web page of Luc Devroye (http://www.cccg.ca/~luc/fonts.html)
Web page of Alan Wood (http://www.alanwood.net/unicode/fonts.html)
Web page Unicode tools and fonts (http://www.unifont.org/)
3.5.2 Examples of Unicode texts
The Office of the High Commissioner for Human Rights in Geneva publishes the Universal Declaration of Human Rights (http://www.ohchr.org/french). The site of the Unicode Consortium makes the Universal Declaration of Human Rights available in 324 languages to show the power of Unicode (http://www.unicode.org/udhr).
The site www.sacred-texts.com contains hundreds of sacred texts, many in UTF-8. There is Homer in ancient Greek (cla/homer/greek), a multi-language Bible in English, French, Hebrew, and Latin (bib/poly), the Koran in Arabic and English (isl/uq), Confucius in Chinese and English (cfu/cfu.htm), the Rig Veda in Sanskrit (hin/rvsan), etc.
The Titus project of Indo-Germanic studies (titus.uni-frankfurt.de) and the Perseus Project (http://www.perseus.tufts.edu/cache/perscoll_Greco-Roman.html) contain many classical texts.
¹See ftp://ftp.netscape.com/pub/communicator/extras/fonts/windows/cyberbit.zip
CHAPTER 4
Unicode mathematics
4.1 Unicode for handling math across platforms and applications
4.2 XeTeX handling mathematics fonts
4.1 Unicode for handling math across platforms and applications
It is important to represent math correctly on the Web and in the various typesetting applications. TeX exists for books, and MathML (presentation and content markup) for XML-enabled applications.
Murray Sargent (Microsoft), a member of the W3C MathML Working Group, and his collaborators have developed an extension for OpenType fonts to enable them to handle math (their approach is based on TeX's math typesetting algorithm as described in Appendix G of The TeXbook [4]).
An additional OpenType MATH table contains the parameters needed to typeset math. This effort resulted in the Cambria Math font.
Barbara Beeton, Asmus Freytag, and Murray Sargent wrote the paper Unicode support for mathematics (www.unicode.org/reports/tr25/tr25-7.html), which describes the default math properties of Unicode characters.
In the Unicode technical note Unicode Nearly Plain-Text Encoding of Mathematics (unicode.org/notes/tn28/), Murray Sargent describes how, with a few additions to Unicode, mathematical expressions can usually be represented in a readable Unicode nearly plain-text (linear) format.
Office 2007 now has a built-in math engine (see Murray's presentation Math Editing and display in Office 2007 (research.microsoft.com/workshops/fs2006/presentations/17_Sargent_071706.ppt) and his blog (http://blogs.msdn.com/murrays)). This ad hoc processor is based on the Appendix G algorithm of the TeXbook, uses the Cambria Math font, and relies on the software component MathFont.dll to communicate between the various application programs.
The Microsoft Word 2007 work and the definition of the OpenType MATH table are unpublished. In the interest of the scientific community at large, Paul Topping wrote a position paper, Design Science Proposal to Microsoft to Help STEM (Scientific/Technical/Engineering/Mathematical) Publishers Work with Office 2007 Documents,¹ in which he asks Microsoft to share the information in its specifications.
¹See http://www.dessci.com/en/reference/white_papers/STMOffice2007Proposal.htm
4 UNICODE MATHEMATICS
4.2 XeTeX handling mathematics fonts
XeTeX uses the algorithm in Appendix G of the TeXbook to typeset mathematics:
for a standard TeX math font, XeTeX uses the metric information for each character in the corresponding .tfm file, then xdvipdfmx refers to the .pfb file via the virtual font files (when necessary) and the file dvipdfm.map;
for an OpenType math font, such as Cambria Math, XeTeX reads the metric parameters in the MATH table and transforms them into the values needed by the Appendix G algorithm.
Currently XeTeX does not use a specific processor to handle OpenType math fonts, but such support may later be included in the system middleware (ICU).
Other Unicode math fonts:
• the font developed by the STIX project (Scientific and Technical Information Exchange, see
http://www.stixfonts.org/), in which several scientific publishers have co-financed a Unicode-based
math font containing over 8000 different glyphs;
• Asana-Math, another font that Apostolos Syropoulos (asyropolous@yahoo.com) is developing
with the help of fontforge, and which includes the OpenType MATH tables.
Will Robertson is working on a LaTeX package, unicode-math, to provide a simple interface to
OpenType math fonts with LaTeX.
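For comparison, here is a minimal LaTeX sketch that selects an OpenType math font through unicode-math. The \setmathfont command comes from later, released versions of the package; its interface was still being developed when this chapter was written, so treat it as an assumption relative to the version discussed here. The font names also assume that those fonts are installed.

```latex
% Compile with xelatex. Assumes a released version of unicode-math
% (which also loads fontspec) and that the named fonts are installed.
\documentclass{article}
\usepackage{unicode-math}
\setmainfont{Cambria}          % text font matching Cambria Math
\setmathfont{Cambria Math}     % OpenType font with a MATH table
% Possible alternatives, depending on what is installed:
% \setmathfont{Asana Math}
% \setmathfont{STIX Two Math}  % a later STIX release
\begin{document}
Inline math with Unicode input, $α + β = γ$, and a display:
\[ \int_0^\infty e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2} \]
\end{document}
```

The source file must be saved as UTF-8 so that the Greek letters typed directly in math mode reach unicode-math as Unicode characters.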
xetex-mathematics.tex,v: 2.02 2009/06/15
Bibliography
[1] Adobe Systems. Adobe Type 1 Font Format. Addison-Wesley, Reading, MA, 1990.
This so-called White Book contains the specification of Adobe's Type 1 font format, including information about hints, the encryption mechanism, encodings, and the flex procedure. Available electronically from
http://partners.adobe.com/public/developer/en/font/T1_SPEC.PDF
[2] Youssef Jabri. The Arabi system. TeX writes in Arabic and Farsi. TUGboat, 27(4):147–153, 2006.
This article describes the Arabi package, which introduces support in the LaTeX babel system for languages using the Arabic script, in particular Arabic and Farsi. The package comes with a set of good-quality free fonts, but may also use commercial fonts. It supports many 8-bit input encodings, e.g., CP-1256, ISO-8859-6 and Unicode UTF-8, and can typeset classical Arabic poetry.
http://www.tug.org/TUGboat/Articles/tb27-2/tb87jabri.pdf
[3] Gabriel Mandel Khan. Arabic Script. Abbeville Press Publishers, New York, 2001.
This book provides a detailed look at the Arabic script and its calligraphy, an essential part of Arabic culture. Since Arabic is the language of the Koran, with the spread of Islam to large parts of the world, the Arabic script is now one of the world's major forms of writing. With the help of over 300 two-color and black-and-white pictures, the author describes each letter, its history, meaning, variants, and calligraphic adaptations, as well as its philosophical, theological, and cultural significance.
The book starts with a short introduction sketching the development of the Arabic alphabet and the various scripts in which it has been written. Then, the first major part of the book, "The Letters of the Alphabet", is devoted to the treatment of individual letters and their shapes, which can vary depending on the letter's position within a word. Over thirty different styles, or scripts, are illustrated for each letter. The letter's pronunciation, its characteristic in reciting the Koran, and possible other cultural associations are defined.
The second major part of the book, "Styles, variants, and calligraphic adaptations", provides a large set of historic examples of Arabic writing. Finally, there is a glossary and an index.
[4] Donald E. Knuth. The TeXbook, volume A of Computers and Typesetting. Addison-Wesley, Reading, MA, 1986.
This book is the definitive user's guide and complete reference manual for TeX.
[5] Frank Mittelbach, Michel Goossens, Johannes Braams, David Carlisle, and Chris Rowley. The LaTeX Companion, Second Edition. Addison-Wesley, Reading, MA, 2004.
This book describes over 200 LaTeX packages and presents a whole series of tips and tricks for using LaTeX in both traditional and modern typesetting, in particular how to customize layout features to your own needs, from phrases and paragraphs to headings, lists, and pages. It provides expert advice on using LaTeX's basic formatting tools to create all types of publication, from memos to encyclopedias. It covers in depth important extension packages for tabular and technical typesetting, floats and captions, and multi-column layouts, including reference guides and discussion of the underlying typographic concepts. It details techniques for generating and typesetting indexes, glossaries, and bibliographies, with their associated citations.
[6] Peter D. Daniels and William Bright. The World's Writing Systems. Oxford University Press, New York, 1996.
A detailed description of the major historical and modern writing systems of the world. The more than eighty articles contributed by expert scholars in the field are organized in twelve units, each dealing with a particular group of writing systems defined historically, geographically, or conceptually. Each unit begins with an introductory article providing the social and cultural context in which the group of writing systems was created and developed. Articles on individual scripts detail the historical origin of the writing system in question, its structure (with tables showing the forms of the written symbols), and its relationship to the phonology of the corresponding spoken language. Each writing system is illustrated by a passage of text, accompanied by a romanized version, a phonetic transcription, and a modern English translation. Each article concludes with a bibliography.
Units are arranged according to the chronological development of writing systems and their historical relationship within geographical areas. First, there is a discussion of the earliest scripts of the ancient Near East. Subsequent units focus on the scripts of East Asia, the writing systems of Europe, Asia, and Africa that have descended from ancient West Semitic (Phoenician), and the scripts of South and Southeast Asia. Other units deal with the recent and ongoing process of decipherment of ancient writing systems; the adaptation of traditional scripts to new languages; new scripts invented in modern times; and graphic systems for numerical, music, and movement notation.
[7] The Unicode Consortium. The Unicode Standard, Version 5.0. Addison-Wesley, Reading, MA, 2007.
The reference guide of the Unicode Standard, a universal character-encoding scheme that defines a consistent way of encoding multilingual text. Unicode is the default encoding of HTML and XML. The book explains the principles of operation and contains images of the glyphs for all characters presently defined in Unicode.
Available for restricted use from: http://www.unicode.org/versions/Unicode5.0.0/
xetex-end.tex,v: 2.01 2009/06/15
Index of Commands and Concepts
The index has been split into two parts. We start with a general index that covers all entries. We end with an index of authors.
To make the indexes easier to use, the entries are distinguished by their type, and this is often indicated by one of the following type words at the beginning of the main entry or a sub-entry: boolean, counter, document class, env., file, file extension, font, key, key value, option, package, program, rigid length, or syntax.
The absence of an explicit type word means that the type is either a LaTeX command or simply a concept.
Use by, or in connection with, a particular package is indicated by adding the package name (in parentheses) to an entry or sub-entry. There is one virtual package name, tlgc, that indicates commands introduced only for illustrative purposes in this book.
A blue italic page number indicates that the command or concept is demonstrated in an example on that page.
When there are several page numbers listed, bold face indicates a page containing important information about an entry, such as a definition or basic usage.
When looking for the position of an entry in the index, you need to realize that, when they come at the start of a command or file extension, both the characters \ and . are ignored. All symbols come before all letters, and everything that starts with the @ character will appear immediately before A.
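The following sketch shows one way to obtain these conventions with makeindex. The document does not say which indexing macros the book itself uses. In an \index entry, the text before @ is the sort key and the text after it is what gets printed, so a backslash or leading dot can be left out of the key. A suffix such as |textbf wraps the page number in the named command. The \exampleref command is a hypothetical helper for the blue italic example pages.

```latex
% Build with: xelatex doc; makeindex doc; xelatex doc
\documentclass{article}
\usepackage{makeidx}
\usepackage{xcolor}
\makeindex
% Hypothetical encapsulator: page numbers of examples in blue italic.
\newcommand*\exampleref[1]{\textcolor{blue}{\textit{#1}}}
\begin{document}
% The part before @ is the sort key, the part after it is printed.
% Sorted under "setmainfont" although printed with a backslash:
Main fonts\index{setmainfont@\texttt{\textbackslash setmainfont}}.
% Sorted under "tfm" although printed with the leading dot:
Metrics\index{tfm@\texttt{.tfm} file extension}.
% |textbf sets this page number in bold face (important information):
Font loading\index{fontspec@\textsf{fontspec} package|textbf}.
\newpage
% |exampleref marks a page that shows an example:
An example\index{fontspec@\textsf{fontspec} package|exampleref}.
\printindex
\end{document}
```

Because both fontspec entries share the same sort key and printed text, makeindex merges them into one entry carrying two page numbers, each formatted by its own encapsulator.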
96 (SymbolsF) Index of Commands and Concepts
Symbols
.fonts.conf file, 23
.log file, 44
\<, 3
$HOME/.fonts.conf file, 23
A
\active, 2
\aemph, 65,
Aleph program, 47
amsart document class, 55
amsbook document class, 55
amsmath package, 55, 56
amsthm package, 55
\arab,
arab env., 64, 5, , 68, 9, 70, 71
Arabi package, 93
arabi package, 62
\arabicfont, 64, 5, , 8, 9, 71, 73
arabtex package, 62–64, 65, 68, 75
arabtext env., 3
arabxetex package, ii, xi, 64–78
arabxetex.sty package, 64
array package, 55, 71
article document class, 55
ATSUI program, 21, 30, 31, 34, 37
\autofootnoterule, 56, 59
B
babel package, 43
beamer document class, 55
beamerbaseauxtemplates package, 55
beamerbasetemplates package, 55
beamerthemebidiJLTree package, 55
beamerthemeJLTree package, 55
\beginL, 22
\beginR, 22
\bfseries, 80
bidi package, ii, 55–61, 64
bidi2in1 package, 55
bidibeamer document class, 55
bidibeamerbaseauxtemplates package, 55
bidibeamerbasetemplates package, 55
bidimemoir document class, 55
bidimoderncv document class, 55
bidipresentation package, 55
book document class, 55
bookest document class, 55
booktabs package, 55
\bye, 85
C
\catcode, 2
\char, 25
\chardef, 25
CJK package, 83, 84
\CJKaddspaces, 82, 83
CJKchecksingle option, 80
\CJKfamily, 80
CJKfntef package, 83, 84
\CJKnormalspaces, 82, 83
CJKnumb package, 79, 84
\CJKnumber, 84
\CJKsetecglue, 82, 83
CJKulem package, 84
\CJKunderdot, 83
\CJKunderline, 83
\cline, 0
color package, 58
crop package, 22
ctex package, 88
cvthemebidicasual package, 55
cvthemebidiclassic package, 55
cvthemecasual package, 55
cvthemeclassic package, 55
cyr-lat-iso9 file, 29
cyr-lat-iso9.tex file, 29
D
dcolumn package, 55
\defaultfontfeatures, 44
draftwatermark package, 55
dvipdfm.map file, 92
dvipdfmx program, 21
dvips program, 1, 27
E
emacs program, 88
\emptyskipscheme, 85, 87
\endL, 22
\endR, 22
euenc package, 43
\everydisplay, 83
\everymath, 83
extbook document class, 55
F
fancyhdr package, 55, 88
\farsi, 64
farsi env., 64, 73
\farsifont, 73
\fbox, 68, 88
fc-cache program, 24
fc-list program, 24
fc-match program, 24
.fd file extension, 22
flushleft env., 68
fmultico package, 57
\font, 21, 27, 29, 343, 38, 79
fontconfig program, 23, 24, 27, 31
fontforge program, 92
fontinst package, 43
fontinst program, 21
fonts.conf file, 23, 24
\fontspec, 44, 45, 46
fontspec package, ii, xi, 22, 33, 43–46, 64, 66–68, 71, 80, 84, 88
fontspec.cfg file, 43, 44
FontTools program, 16
\footnote, 59
freetype program, 23, 32
frhyph.tex file, 2
\fullvocalize, 3, 65
G
geometry package, 22
graphics package, 22
graphicx package, 55
H
\haltskipscheme, 87
\hamzaB, 69
hhline package, 55
\hline, 0
\hskip, 88
hyperref package, 22, 88
I
ICU program, 21, 26, 29–31, 34, 37, 46, 92
\ifthenelse, 38
\input, 85
\itshape, 80
K
\kashmiri, 65
kashmiri env., 65
kpathsea program, 33
\kurdish, 65
kurdish env., 65
L
\lccode, 26
\leftfootnoterule, 59
listings package, 55
localfonts.conf file, 23, 24
longtable package, 55
\LR, 57
\LRE, 57
LTR env., 57
\LTRdblcol, 57
\LTRfootnote, 59
M
\malay, 65
malay env., 65
\mathalpha, 39
\mathbin, 39
\mathclose, 39
MathFont.dll program, 91
\mathop, 39
\mathopen, 39
\mathord, 39
mathpazo package, 43
\mathpunct, 39
\mathrel, 39
memoir document class, 55
metalogo package, 43
minitoc package, 55
moderncv document class, 55
multicol package, 57
multicols env., 57
\multicolumn, 0, 71
multirow package, 55
myfont.ttx file, 17
myfontmods.ttx file, 17
N
\newfontface, 45, 4
\newfontfamily, 45, 64, 5, , 8, 9, 71, 73, 8, 87
\novocalize, 65
O
Office program, 12, 13
Omega program, 47
OpenOffice program, 14
Openoffice program, 13
OpenType-info.tex file, 39
otfinfo program, 13
\ottoman, 65
ottoman env., 65
P
packages
amsmath, 55, 56
amsthm, 55
Arabi, 93
arabi, 62
arabtex, 62–64, 65, 68, 75
arabxetex, ii, xi, 64–78
arabxetex.sty, 64
array, 55, 71
babel, 43
beamerbaseauxtemplates, 55
beamerbasetemplates, 55
beamerthemebidiJLTree, 55
beamerthemeJLTree, 55
bidi, ii, 55–61, 64
bidi2in1, 55
bidibeamerbaseauxtemplates, 55
bidibeamerbasetemplates, 55
bidipresentation, 55
booktabs, 55
CJK, 83, 84
CJKfntef, 83, 84
CJKnumb, 79, 84
CJKulem, 84
color, 58
crop, 22
ctex, 88
cvthemebidicasual, 55
cvthemebidiclassic, 55
cvthemecasual, 55
cvthemeclassic, 55
dcolumn, 55
draftwatermark, 55
euenc, 43
fancyhdr, 55, 88
fmultico, 57
fontinst, 43
fontspec, ii, xi, 22, 33, 43–46, 64, 66–68, 71, 80, 84, 88
geometry, 22
graphics, 22
graphicx, 55
hhline, 55
hyperref, 22, 88
packages (cont.)
listings, 55
longtable, 55
mathpazo, 43
metalogo, 43
minitoc, 55
multicol, 57
multirow, 55
pdfpages, 55
pgf, 22
pstricks, 55
ragged2e, 55
stabular, 55
supertabular, 55
tabls, 55
tabularx, 55
tabulary, 55
threeparttable, 55
tikz, 55
tlgc, 95
tocloft, 55
tocstyle, 55
ulem, 88
unicode-math, xi, 43, 92
vwcol, 58
wrapfig, 55
xcolor, 22, 58
xeCJK, 79–84
xecjk, ii
xecolour, 58
xecyk, 79
xltxtra, 43
xunicode, 43, 56
zhspacing, ii, xi, 84–88
zhulem, 88
\pashto, 65
pashto env., 65
\pdflastxpos, 42
\pdflastypos, 42
pdflatex program, 2
\pdfpageheight, 42
pdfpages package, 55
\pdfpagewidth, 42
\pdfsavepos, 42
\penalty, 88
.pfb file extension, 92
pgf package, 22
\pounds, 28
pstricks package, 55
\punctstyle, 81
R
ragged2e package, 55
\raisebox, 0
rapport3 document class, 55
\rcases, 59
refrep document class, 55
report document class, 55
\rightfootnoterule, 59
\RL, 57
rldocument option, 56
\RLE, 57
\rmfamily, 45
\rotatebox, 87, 88
RTL env., 57
\RTLdblcol, 56, 57
RTLdocument option, 56
\RTLfootnote, 59
S
\Salam, 67
scrartcl document class, 55
scrbook document class, 55
scrreprt document class, 55
\setarab, 3
\setCJKfamilyfont, 80
\setCJKmainfont, 80, 82–84
\setCJKmonofont, 80
\setCJKsansfont, 80
\setfootnoteLR, 59
\setfootnoteRL, 59
\setLR, 56
\setLTR, 5, 57
\setmainfont, 22, 44, 45, , 8, 71, 80, 84
\setmonofont, 44, 45, 80
\setnash, 3
\setnashbf, 3
\setRL, 56
\setRTL, 50
\setsansfont, 44, 45, 80
\SetTranslitStyle, 5
\sfcode, 26
\sffamily, 45
\simsunskipscheme, 85, 87
\sindhi, 65
sindhi env., 65
\skipenzh, 86
\skipnegzhlineendclose, 86
\skipnegzhlineendjudou, 87
\skipnegzhlinestartopen, 86
\skipzh, 86
\skipzhclose, 86
\skipzhinterclose, 86
\skipzhinterjudou, 86
\skipzhinteropen, 86
\skipzhjudou, 86
\skipzhlineendclose, 86
\skipzhlineendjudou, 86, 87
\skipzhlinestartopen, 86
\skipzhopen, 86
stabular package, 55
supertabular package, 55
T
tabls package, 55
tabular env., 0, 71
tabularx package, 55
tabulary package, 55
TECkit program, 28, 64
tex program, 33
tex-text-tec file, 28
tex-text.map file, 28
tex-text.tec file, 28
texmf file, 27
\text, 59
\textarab, 64
\textroman, 64, 67
\textwidth, 59
\textwidthfootnoterule, 59
\TeXXeTstate=1, 22
.tfm file extension, 22, 37, 43, 92
tfm file, 27
threeparttable package, 55
tikz package, 55
tlgc package, 95
tocloft package, 55
tocstyle package, 55
\transtrue, 3, 65
.ttc file extension, 18
ttc2ttf program, 18
ttx program, 16, 17
U
\UC, 5
\uccode, 26
\uighur, 65
uighur env., 65
ulem package, 88
unicode-math package, xi, 43, 92
\unsetfootnoteRL, 59
\unsetLTR, 56
\unsetRL, 56
\unsetRTL, 56
\urdu, 65
urdu env., 65, 73
\urdufont, 73
\usepackage, 55
V
.vf file extension, 22, 43
vi program, 88
\vocalize, 65
vwcol env., 58
vwcol package, 58
W
W32tex program, 21
Web2C program, 33
wrapfig package, 55
X
xcolor package, 22, 58
.xdv file extension, 27
xdvipdfmx program, 27, 32, 33, 92
xeCJK package, 79–84
xecjk package, ii
\xeCJKallowbreakbetweenpuncts, 81
\xeCJKcaption, 83
\xeCJKnobreakbetweenpuncts, 81
\xeCJKsetcharclass, 83
\xeCJKsetemboldenfactor, 82
\xeCJKsetkern, 83
\xeCJKsetslantfactor, 82
xecolour package, 58
xecyk package, 79
xelatex program, 19–95
\XeTeX, 41, 43
XeTeX program, 19, 21
xetex program, 19–95
\XeTeXcharclass, 41
\XeTeXcharglyph, 37
\XeTeXdashbreakstate, 41
\XeTeXdefaultencoding, 41
\XeTeXdelcode, 39
\XeTeXdelcodenum, 39
\XeTeXdelimiter, 39
\XeTeXfonttype, 37, 38
\XeTeXglyph, 36, 37
\XeTeXglyphindex, 36, 37
\XeTeXinputencoding, 26, 41
\XeTeXinterchartokenstate, 41
\XeTeXinterchartoks, 41, 84
\XeTeXlinebreaklocale, 42
\XeTeXlinebreakpenalty, 42
\XeTeXlinebreakskip, 30, 42
\XeTeXmathchardef, 39
\XeTeXmathcode, 39
\XeTeXmathcodenum, 39
\XeTeXOTcountfeatures, 38
\XeTeXOTcountlanguages, 38
\XeTeXOTcountscripts, 38
\XeTeXOTfeaturetag, 39
\XeTeXOTlanguagetag, 38
\XeTeXOTscripttag, 38
\XeTeXpdffile, 42
\XeTeXpicfile, 42
\XeTeXradical, 41
\XeTeXrevision, 2, 41
\XeTeXupwardsmode, 42
\XeTeXuseglyphmetrics, 36
\XeTeXuseglyphmetricsfont, 3
\XeTeXversion, 41
xltxtra package, 43
xu-frhyph.tex file, 2
xu-hyphen file, 26
xu-t1.tex file, 2
xunicode package, 43, 56
Y
yudit program, 88
Z
\zhcjkextafont, 8
\zhcjkextbfont, 8
\zhfont, 85, 8, 87
\zhpunctfont, 8, 87
\zhs@restoref@nt, 86
\zhs@restorefont, 8
\zhs@savef@nt, 86
\zhs@savefont, 8
\zhspacing, 84, 85, 87
zhspacing package, ii, xi, 84–88
zhulem package, 88
People
Beeton, Barbara, 91
Berry, Karl, 27, 33
Buchbindei, Adam, xi
Charette, François, ii, xi, 64
Cho, Jin-Hwan, 21
Devroye, Luc, 54
Ferres, Leo, ii, xi
Freytag, Asmus, 91
Goossens, Michel, 93
Jabri, Youssef, 62
Kabel, Rik, xi
Kakuto, Akira, 21
Kew, Jonathan, ii, xi, 19, 21, 43, 47
Khalighi, Vafa, ii, 55
Knuth, Donald, 1, 93
Lagally, Klaus, 62, 64
McCreedy, David, 54
Mittelbach, Frank, 93
Moore, Ross, 21, 43
Píška, Karel, ii, xi
Robertson, Will, ii, 22, 43, 92
Sargent, Murray, 91
Shigeru, Miyata, 21
Sun, Wenchang, 79
Topping, Paul, 91
Weiss, Mimi, 54
Wood, Alan, 5
Yin, Dian, ii, xi, 84