Alena L. Aissing
A wide diversity exists in the current practice of transliterating Cyrillic scripts
for use in bibliographic records in online catalogs. Without knowing which
transliteration table was used, it is difficult to retrieve the desired record
successfully or efficiently. Retrieving an item (e.g., titles or an author's name)
from a library's online catalog (OPAC) where it is given only in transliterated
form can be a confusing task, even for users who know the Russian language
or at least the Cyrillic alphabet. This study explores the problems besetting three
groups of Russian-language students faced with romanized Cyrillic bib-
liographic records. It also tries to investigate students' ability in searching the
Russian records romanized according to the Library of Congress (LC) transliteration
table. Analysis of the test results shows the students' success-and-error
rate before and after instruction. The findings of this study establish that
transliteration is one of the factors limiting access by Russian-language students
to the Slavic collections.
Alena Aissing is German and Slavic Studies Selector and Slavic Studies Cataloger at the University of Florida
George A. Smathers Libraries, Gainesville, Florida 32611.
The author wishes to express her appreciation to the Department of Germanic and Slavic Languages and
Literatures at the University of Florida, the Department of Modern Languages at Florida State University, and the
Department of Slavic Languages and Literatures at the University of Illinois at Urbana-Champaign for their
cooperation; Olga Campara and Graig N. Packard from the Center of Applied Linguistics in Washington, D.C., for
valuable information in the field of Slavic linguistics; Joan Aliprand from the Research Libraries Group for
inspiration, editing, and updates on computer software developments; Frank DiTrolio, Dolores Jenkins, and John
Van Hook from the Collection Management Department at University of Florida Libraries for editing this
manuscript; Bill Covey from the Systems Department at the University of Florida Libraries for editing the statistical
analysis; and my husband, Gerrard Aissing, for encouragement and input on this research design. This research
was funded in part by a grant from the University of Florida, Division of Sponsored Research.
College & Research Libraries May 1995
TABLE 1
EXAMPLE OF TRANSLITERATION OF THE WORDS 'll:IL.4.;7JJ.Hfi AND
.Y.llElilUill4 AND .f().lll.KiilJHl.l IN VARIOUS TRANSLITERATION SYSTEMS
plete transliteration.12 This can present several problems for the users of the catalog
who have to deduce how these diacritics or letter combinations translate back to the
original Cyrillic characters.13 The differences among the various schemes are considerable,
especially for those Cyrillic letters for which no Roman equivalent exists: e, ж, х, ц, ч,
ш, щ, ~q and я.14 In addition, Russian has no h and represents this sound mostly as г;
therefore, transliterating Hamlet from Cyrillic back to Roman script results in Gamlet.15
A Russian name beginning with я might be transliterated into ia, ja, or ya, with major
retrieval problems unless the conversion system is known. A further problem is that certain
phonemes characteristic of Slavic languages cannot be written unambiguously as a single
Roman letter (assuming English pronunciation).16

The Library of Congress offers a separate transliteration table for every Slavic language
written in Cyrillic scripts.17 This can lead to more inconsistencies. For example, i͡e is
used when transcribing Ukrainian є and Old Russian ѣ. That is, the same combination of
Roman letters is used for two completely different Cyrillic letters! The user has to know
or recognize the original language in order to find the corresponding transliteration when
searching the library's online catalog.

Readers of non-Roman documents usually want to see the original script, because it is more
familiar to them than the romanized version.18 Cyrillic script has been implemented on a
number of systems. The first American library to automate Cyrillic script was the New York
Public Library, where Cyrillic script records were included in its book-form "Dictionary
Catalog," phototypeset from machine-readable copy.19 Cyrillic script capability was added
to the Research Libraries Information Network (RLIN) in 1986.20 Cyrillic is one of the
scripts implemented on ALEPH, the Israeli library system.21,22 The British Library's online
catalog includes not only the characters of modern Cyrillic script but also Old Church
Slavonic.23 VTLS, Geac, and IME24 have contracts to develop systems for Russian libraries;
Cyrillic script capability is a fundamental requirement.

Few American libraries have taken advantage of these developments because of the pervasive
belief that romanization is adequate for those languages written in Cyrillic script. (The
results of this study undermine this belief.) Because of this conviction, most libraries do
not own systems that can utilize other scripts, nor do they have enough funding and concern
for the multilingual needs of their community.25 Users cannot search and display a
bibliographic record in the script of the original document.26 Most local OPACs are limited
to Roman character sets and do not provide the proper typographical facilities necessary
for the display of non-Roman languages. Therefore, romanization of non-Roman scripts is
necessary if the automated catalog is to be a comprehensive representation of the library's
holdings.27

Yet there are some hopeful signs that this situation is changing. Several authors have
discussed the problems for users caused by romanization, and the attitude of librarians
toward minority users is changing.28 Allen and Jones cite many examples of practices and
methods that tend to alienate a library's international clientele: awareness is the first
step toward correction.29,30

Recent developments in computer software and standards will eventually do away with this
limitation.31-34 In the past, research and formulation of standards for the automation of
non-Roman scripts was slow and fragmentary. In addition, the library community developed
its own standards in isolation from the standards-making of the computer industry.35 These
standards were incorporated into USMARC and UNIMARC.36,37
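
The retrieval problem described above, in which a name beginning with я may appear as ia,
ja, or ya depending on the scheme, and in which Hamlet comes back from Cyrillic as Gamlet,
can be illustrated with a minimal sketch. It rests on several assumptions: the letter
tables cover only the letters needed here, the LC ligature over "ia" is omitted, the scheme
labels are shorthand, and the surname Якобсон is a hypothetical example that does not come
from the study.

```python
# Simplified romanization tables for a handful of Cyrillic letters.
# Assumption: only the letters needed for the two examples are included,
# and the ligature that LC places over "ia" is left out.
COMMON = {
    "а": "a", "б": "b", "г": "g", "е": "e", "к": "k", "л": "l",
    "м": "m", "н": "n", "о": "o", "с": "s", "т": "t",
}

# The letter я is where the three schemes diverge.
YA_BY_SCHEME = {
    "LC": "ia",
    "scholarly": "ja",
    "BGN/PCGN": "ya",
}


def romanize(word: str, scheme: str) -> str:
    """Romanize a word letter by letter, keeping capital letters capitalized."""
    table = {**COMMON, "я": YA_BY_SCHEME[scheme]}
    pieces = []
    for ch in word:
        roman = table[ch.lower()]
        pieces.append(roman.capitalize() if ch.isupper() else roman)
    return "".join(pieces)


# One Cyrillic surname, three different catalog search keys.
forms = {scheme: romanize("Якобсон", scheme) for scheme in YA_BY_SCHEME}
print(forms)

# Russian г stands in for the h of "Hamlet", so the round trip yields a g.
print(romanize("Гамлет", "LC"))
```

A user who types the form produced by one scheme into a catalog that was romanized with
another will not match the record, which is the access barrier the study measures.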
Today, there is a new universal multiscript character set, International Standard ISO/IEC
10646.38 The Unicode™ character set, which is code-for-code identical with ISO/IEC 10646,
is being implemented in products from leading computer companies.39 The advent of this new
standard facilitates the development of global software capable of processing any script.
No longer will libraries have to develop systems on their own; many of the features needed
for multiscript processing will be included in the standard package or can be added easily.
This will be a boon to the users of various non-Roman scripts. Service to the user and
provision for the most direct access to dissimilar documents (i.e., documents not in the
predominant script used by the library) need to be seen as the prime responsibilities of
libraries.

RESEARCH DESIGN AND METHODOLOGY

Pretest

A sample of 50 Russian language students was randomly chosen from the Department of
Germanic and Slavic Languages and Literatures at the University of Florida during the
spring semester of 1991. None of the students was familiar with the LC transliteration
table. The data were collected using three tests consisting of a list of titles and proper
names in Russian. The students were then asked to transliterate the Russian items on the
list. The objective of these tests was to determine: Test A: How correct are the students'
searches without the knowledge of the transliteration table and what are the problems
involved? Test B: How correct are the students' searches after receiving instruction and
practicing the transliteration in the library? Another test (C) consisted of retrieving and
locating at least three items: one title search, one author search, and one journal search.
Most of the students visited the library, located the items, and brought back the call
numbers and the location codes of the materials requested. Test B was then given to the
students after they practiced what they learned from my instruction. I tried to ascertain
whether they still had problems and how much they had improved. After the data were
statistically analyzed, the findings showed that without the knowledge of the LC
transliteration practice of rendering Russian letter я by i͡a, for example, 80 percent of
the students chose ya, whereas only 7 percent were correct. When students had to deal with
Russian ш (phonetically very similar to English sh), 91 percent were successful even
without knowing how to transliterate. It is likely that any grapheme transliterated by a
letter combination for which the English pronunciation does not resemble the Russian sound
becomes a barrier to access. This pretest concentrated on transliteration of only 7
letters, those for which transliteration could be problematic: я, ю, ц, ~, H:, х, and ж.
None of the students was able to transliterate all 7 tested letters correctly (the average
score of the tested individuals was only 1.1 correct out of 7). The instruction and
approximately one week of practice resulted in an overall improvement in the scores for the
individual letters (the average score increased to 4.3 out of 7). Results of a paired
t-test indicated that the improvement of the test scores is highly significant (p = 0.001).
Finally, only 14.6 percent of the students were able to transliterate all 7 letters.40

Actual Research

Because the sample size was rather small and the Department of Germanic and Slavic
Languages and Literatures at the University of Florida offers only undergraduate classes in
Russian, inferences from the data analyses were limited and showed a high degree of
uncertainty. To get a more reliable picture of the problem, the study was extended to
include more students from Florida State University and the University of Illinois,
including graduate students. One hundred forty-five undergraduate and graduate Russian
language students from these three universities made up the sample for the actual
research. The randomly selected students were tested by the use of three
Cyrillic Transliteration 211
TABLE 2
NUMBERS OF ERRORS MADE IN TESTS A AND B
               Test A            Test B
Interval    Count*    %       Count*    %
0              0      0         20     15
1-5            4      3         59     43
6-10          19     13         20     15
11-15         28     19         13      9
16-20         25     17          7      5
21-25         26     18          6      4
26-30         12      8          5      4
31-35         15     10          2      1
36-40          6      4          2      1
>40           10      7          3      2
Total        145               137
*The difference in total counts was due to the fact that some students did not participate in the second test.
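
The percentage columns of Table 2 follow directly from the counts, and the same arithmetic
gives a quick summary of how the distribution shifted between the two tests. The sketch
below is only an illustrative check: the counts are copied from the table, while the
function names and the choice of "five or fewer errors" as a summary threshold are
assumptions made for this example and are not part of the study's analysis.

```python
# Counts copied from Table 2; each interval is a number of errors per student.
INTERVALS = ["0", "1-5", "6-10", "11-15", "16-20",
             "21-25", "26-30", "31-35", "36-40", ">40"]
TEST_A = [0, 4, 19, 28, 25, 26, 12, 15, 6, 10]
TEST_B = [20, 59, 20, 13, 7, 6, 5, 2, 2, 3]


def percentages(counts):
    """Return each count as a whole-number percentage of the column total."""
    total = sum(counts)
    return [round(100 * c / total) for c in counts]


def share_at_most_five(counts):
    """Percentage of students in the first two intervals (0 and 1-5 errors)."""
    return 100 * (counts[0] + counts[1]) / sum(counts)


for label, counts in (("Test A", TEST_A), ("Test B", TEST_B)):
    print(label, "total:", sum(counts))
    print(dict(zip(INTERVALS, percentages(counts))))
    print(f"five or fewer errors: {share_at_most_five(counts):.1f}%")
```

Read this way, roughly 3 percent of students made five or fewer errors on Test A, compared
with roughly 58 percent on Test B.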
transliteration table, none of the students would be able to conduct a 100 percent
successful search. The lowest number of mistakes made was one, and only one person achieved
this rate. The average number of mistakes made was 21.6. The largest group of students, 28,
made between 11 and 15 mistakes.

For a reader familiar with the language and the original script of the work, the
transliteration could be a serious obstacle resulting in partial or even total loss of
information.

Test B. Test B consisted of 82 letters (in 16 words) which included words similar to the
ones used in the first test (for example, in examining the soft vowel я, test A included
the word первая and test B had the word современная). Table 2 shows the frequency
distribution of mistakes by the number of students who made them. The table shows that the
library instruction and practice resulted in a clear overall improvement. Twenty students
transliterated all 82 characters without any mistakes, followed by 59 students who made
fewer than 6 mistakes in their search structure. This constituted the largest student unit,
forming 43 percent of the total sample. Only one person had all 82 letters wrong. The
average number of mistakes made was 8.9.

Special attention was given to the letters that either cannot be rebuilt reversibly or must
be transliterated by letter combinations. These letters were я, ю, й, ц, ш, х, ж, ч, and щ.
The results for these letters are compiled in figure 1. In the comparison of the two tests
significant improvement is seen. It also becomes clear that the students are dealing with
two groups of letters: those that represent sounds similar to the ones encountered in the
English language and those that represent sounds that are alien to English speakers. The
improvement in the second group is more impressive than that experienced in the first
group. The first group consists of the letters ц, ч, and ш, whereas the second group is
formed by ю, я, й, х, ж, and щ. In the first group the average improvement after
instruction is almost negligible; in the second group it is always more than 27 percent. It
also seems that the vowels are more difficult to transliterate than the consonants. A good
example of the problems in transliterating Cyrillic letters is given by the
FIGURE 1
Comparison of Test A and Test B for the Letters Я, Ю, Ц, Ш, Х, Ж, Ч, Щ, and Й
[Bar chart: count (vertical axis, 0 to 200) by letter (horizontal axis), with separate bars for Test A and Test B.]
letter ю. In modern American English there are several ways one could write this sound,
including ewe, yu, you, or even u. In fact, the version used by LC, "i͡u," is
counterintuitive, because there seems to be no relation between the transliteration and the
everyday "sound" of the Cyrillic letter.

Test C. After instruction, test C was given to the students as well. This test included a
list of transliterated Russian titles and authors' names. If a student were to search a
book written in a Cyrillic script in the online catalog, this would be the way in which it
would display. The user then would have to match the transliterated information with its
Cyrillic version and reestablish the text in its original characters to determine if a
given record matches the one sought for. The reconstruction or back-transliteration can be
performed only between two alphabetic scripts and depends on the rules governing the
relationships between the letters of these two scripts.43 Applying the rules of
transliteration in reverse can cause some difficulties since this process involves at least
three different stages: (1) the user must know how the word appears in the original (i.e.,
Cyrillic) alphabet; (2) the user needs to know how to use the transliteration rules; (3)
the user should be skilled in recognizing that this transliterated
information does indeed match its original equivalent. Obviously, users will discover this
only if they apply the back-transliteration. An additional step would be needed in dealing
with proper names, especially those used in Western languages (e.g., Baker) as well as
adjectives derived from proper names (e.g., Copernican theory). For example, when a user
deals with a text transliterated from the Russian alphabet in which the name "Brown" is
included, the user needs to know how this name is spelled in its original (in this case
English) script. In addition, the user needs to be aware that the Russian version will be
Браун. Cyrillization of foreign names is frequently done by phonological transcription, not
by transliteration, since the latter would result in an unintelligible and unpronounceable
result to a Russian reader. Back-transliteration of "Браун" could also refer to a German
author "Braun," but for the English language, the user needs to know the correct spelling
of the name. This example clearly demonstrates that back-transliteration and/or
retranscription is sometimes impossible without tracing the identity of the original name
and its spelling.

Figure 2 shows the frequency distribution of mistakes made by students on test C. The data
were collapsed into 16 groups showing that after instruction, 3 students did not make any
mistake in reversing the transliteration process. The largest group, consisting of 20
students, made 3 mistakes while only 5 students made as many as 16 to 17 mistakes.

FIGURE 2
Frequency Distribution of Mistakes Made by Students on Test C
Mistakes    Number of Students
missing     11
16-27        5
14           2
13           2
11           2
10           2
9            7
8            8
7            9
6           13
5           13
4           17
3           20
2           19
1           11
0            3

TABLE 3
ERROR COUNT FROM TEST C BY LETTER
Letter        Count    Letter  Count    Letter  Count
а               5      б         0      в         2
г               1      д         3      е        15
[illegible]     1      ж         4      з        10
и             140      й        80      к         2
л               3      м         1      н         5
о               2      п         3      р         5
с               1      т        10      у        25
ф               5      х        35      ц        31
ч              11      ш        11      щ        22
ы              94      ю        90      я        96

Another factor examined in test C was the total number of mistakes that students made
individually for each Russian letter. This analysis is shown in table 3, where, as
predicted, the letters ю and я (represented by combinations of two Roman letters when
romanized) caused a lot of trouble. The most misunderstood letter was и (140 mistakes),
followed by ы (94) and й (80). The underlined parts of the following words show where the
students made most mistakes on test C: A1-14peii. .[xoHTOB, Coq>J::Ul, Aape.u Map11.1:1
Me411'111; сборник фантастических приключений, Закон природы. The word "приключений" was
the most difficult for students to transliterate. The transliterated word "prikli͡ucheniĭ"
is the genitive plural of "приключение" (in English, adventure). Since the Russian language
changes its noun endings in particular cases, the user needs to take this into
consideration when transliterating from Russian and/or back-transliterating into Russian.
When students dealt with transliteration of the word "приключений," in most cases, they
omitted one of the last letters or substituted the English letter y for them.
(Phonetically, Russian й is considered identical with the English y as in yes.) Even though
transliteration assumes following the rule of "write what you see" (i.e., performing
exclusively orthographic transliteration where the user should concentrate only on the
letters, not on their sounds), the example above demonstrates that the users attempted to
base their transliteration on both orthographic and phonetic rules at the same time. In
practice, it could mean that all bibliographical transliteration systems contain some
elements of phonological transcription that are based on the historical habit of
pronouncing certain letters in a certain way. These habits are probably acquired in
childhood, and it is just as difficult to change them as any other phonetic attributes of
articulation such as stress, pitch, and intonation. The study shows
that the students tend to transliterate according to the spelling and pronunciation
convention governing their own (i.e., English) language.

Questionnaire

Undergraduates comprised the largest group of students (79 percent), and, of these, 40
percent were seniors. The remaining 21 percent were primarily graduate students. Only 1
percent of the group were Ph.D. students. Thus the data gathered in this survey represent
primarily undergraduate students' patterns rather than those of the total Russian-language
student population (see table 4).

One of the first questions the students were asked in the survey was "Do you use the
library services?" In response to this question, 9 percent of the students said that they
did not use the services offered by the library while 85 percent answered yes. Nine
students (6 percent) did not answer the question at all.

Additional analysis of the relationship between the familiarity with the online catalog and
the students' years in school is shown in table 4. Based on the frequency distributions, it
was expected that students who spent more years at school would be more familiar with the
library online catalog and the concept of bibliographic access. Table 4 shows this to be
primarily the case. Of those students who were freshmen, only 35 percent said that they
were familiar with the online catalog, while 65 percent of graduate students and 100
percent of doctoral students reported that they were familiar with the online catalog in
the library.

Another question of the survey dealt with the students' experience of problems in the
retrieval of Russian bibliographic records. Table 5 shows that 37 students (26 percent)
answered that they "sometimes" had problems, followed by those who did not know (28 = 19
percent) and those who answered no (27 = 19 percent).

When answering the question "What kind of problems did you have in retrieving a Russian
bibliographic record?" 48 (33 percent) students indicated transliteration as a major
problem (table 6).

One question also dealt with students' familiarity with the transliteration system. Table 7
shows that 73 respondents (50 percent) indicated that they can search Russian materials
with the help of the transliteration table. Those who felt that they could search without a
table numbered 40 (28 percent), and 2 students (1 percent) said that they could not search
at all.

To find out students' opinion about possible use and display of the Cyrillic alphabet in
their online search, the following question was asked: "Do you think that it would be
easier for you if you had the option of using the original Cyrillic alphabet in your
search?" The majority of the Russian-language students (105 = 72 percent) answered yes.

IMPLICATIONS

The intent of this study is to examine the public reaction to online retrieval of material
involving Cyrillic script trans-
TABLE 4
FAMILIARITY WITH ONLINE CATALOG, BY YEARS IN SCHOOL
                                                      Years in School
                                   Freshman   Sophomore   Junior     Senior     Graduate   Doctoral
Familiarity with online catalog    No.   %    No.   %     No.   %    No.   %    No.   %    No.   %
Yes                                  7   35     7   47     24   73    25   54    13   65     2  100
No                                   5   25     3   20      4   12     9   19     3   15     0    0
Little                               8   40     5   33      5   15    12   26     4   20     0    0
Number of students                  20   14    15   10     33   23    46   32    20   14     2    1
χ² = 11.12; df = 8; Cramér's V = 0.194
TABLE 5
FREQUENCY DISTRIBUTION FOR PROBLEMS WITH ONLINE CATALOG
IN GENERAL AND WITH RETRIEVAL OF RUSSIAN TITLES
                 Online Catalog       Retrieval of Russian Titles
Response        Frequency    %        Frequency    %
Unknown            28       19           14       10
Always             12        8            2        1
Sometimes          37       26           41       28
Seldom             10        7           28       19
No                 27       19           41       28
Never tried        31       21           19       13
themselves with the system and teach them how to interpret particular characters in the
Cyrillic script that could be troublesome.

Transitions in computer standards that support multiple character sets in the libraries are
predictably slow. Nevertheless, a multiscript character set, the Unicode standard/ISO
10646, that supersedes the traditional ASCII (American Standard Code for Information
Interchange) character set has been developed. Perhaps academic libraries will eventually
acquire systems based on this new standard. Implementation of this sixteen-bit character
encoding, which can represent the principal written languages collected by American
academic libraries, would mean a revolutionary change in serving foreign language students.
It is up to the librarians, developers, and programmers to make the change.
19. S. Michael Malinconico, Walter R. Grutchfield, and Erik J. Steiner, "Vernacular Scripts in the
NYPL Automated Bibliographic Control System," Journal of Library Automation 10 (Sept.
1977): 205-25.
20. The Research Libraries Group News 10 (May 1986): 3-4.
21. "Panel on Cyrillic Systems: ALEPH, RLIN VTLS," International Federation of the Interna-
tional Library Associations and Institutions Satellite Meeting on Automated Systems for
Access to Multilingual and Multiscript Library Materials (presented at second meeting held
in Madrid: 1993) Proceedings (Munich: Saur, 1994).
22. Susan S. Lazinger, "ALEPH: Israel's Research Library Network: Background, Evolution, and
Implications for Networking in a Small Country," Information Technology and Libraries 10
(Dec. 1991): 275-91.
23. Roger Butcher, "Multi-lingual OPAC Developments in the British Library," Program 27 (Apr.
1993): 165-71.
24. Information from respective vendors. The features of the VTLS system are described in
"Panel on Cyrillic Systems" (see note 21 above).
25. Sylva Simsova, "Coping with Foreign and Nonstandard Character in Libraries," in John
Eyre, ed., Small Computers in Libraries (London: Meckler, 1988), 13-15.
26. James Edward Agenbroad, Nonromanization: Prospects for Improving Automated Cataloging
of Items in Other Writings (Washington D.C.: Library of Congress, 1992).
27. C. Sumner Spalding, "Romanization Reexamined," Library Resources and Technical Services
21 (Jan. 1977): 3-21.
28. Joan M. Aliprand, "Nonroman Scripts in the Bibliographic Environment," Information
Technology and Libraries 11 (June 1992): 105-19.
29. Mary Beth Allen, "International Students in Academic Libraries: A User Survey," College &
Research Libraries 54 (July 1993): 323-33.
30. Plummer Alston Jones, Jr., "Cultural Oasis or Ethnic Ghetto," North Carolina Libraries 50
(Summer 1992): 100-105.
31. Mark Leggott, "Unique, Universal & Uniform Character Encoding," Canadian Library
Journal 48 (Oct. 1991): 345-46.
32. Charles Petzold, "Move Over, ASCII! Unicode Is Here," PC Magazine 12 (Oct. 1993): 374.
33. Charles Petzold, "Viewing a Unicode TrueType Font under Windows NT," PC Magazine 12
(Nov. 1993): 375.
34. Charles Petzold, "Typing Unicode Characters from the Keyboard," PC Magazine 12 (Dec.
1993): 426.
35. John Clews, Language Automation Worldwide (Harrogate: United Kingdom SESAME Com-
puter Projects, 1988).
36. USMARC Specifications for Record Structure, Character Sets, Tapes, prepared by Network
Development and MARC Standards Office (Washington, D.C.: Cataloging Distribution
Service, Library of Congress, 1990).
37. Brian P. Holt, Sally H. McCallum and A.B. Long, UNIMARC Manual (London: International
Federation of Library Associations and Institutions Universal Bibliographic Control and
International MARC Programme, British Library Bibliographic Services, 1987).
38. Joint Technical Committee ISO/IEC JTC1, Information Technology-Universal Multiple-Octet
Coded Character Set, Part 1: "Architecture and Basic Multilingual Plane, ISO/IEC 10646:
1993" (Geneva: International Organization for Standardization, 1993).
39. The Unicode Consortium, The Unicode Standard: Worldwide Character Encoding, Version
1.0 (Reading, Mass.: Addison-Wesley, 1991-92); supplemented by The Unicode Standard,
Version 1.1 (prepublication ed.) (Mountain View, Calif.: Unicode Consortium, 1993). (Uni-
code is a trademark of Unicode, Inc.)
40. Alena L. Aissing, The Effectiveness of Cyrillic Transliteration (poster session given at the
American Library Association Annual Conference, Atlanta, July 1991).
41. Isomorphic representation-data are expressed exactly as found in the original documents,
in this case, in the Russian characters.
42. SAS Institute JMP User's Guide, Version 2.0 (Cary, N.C.: September 1989).
43. Alena L. Aissing, "Computer Oriented Bibliographic Control for Cyrillic Documents with
or without Script Conversion," Information Technology and Libraries 11 (Dec. 1992): 340-44.