Figure 2. Six types of spelling variant plots: (A) showing clean transition, (B) barrel-shaped, (C) with consistent
demarcation, (D) with multiple crossovers, (E) funnel-shaped, (F) showing almost-equal usage preference rates for
both competing spelling variant groups.
Normalized frequency graphs for competing word sets were then created to show transition periods in grammar and orthography from the 1900s to the present.

2.4. Determining the overall conventionalization rate of usage preferences of spelling variants

The regularization value of a group of competing word forms is defined as the ratio of the number of times the most commonly used word form for a particular publication year was seen in the text corpora on hand, over the total number of occurrences of all the competing word forms for that group. This metric can be seen as an indication of how a language standardizes in its use, as seen from the publications of professional writers. For this study, all the spelling variant groups were extracted from the gathered historical corpus. Regularization values for each spelling variant group were then computed for all the years with available data. The regularization values of all the spelling variant groups were afterwards averaged per year to arrive at the overall conventionalization graph of the Philippine national language’s spelling system over the course of the 20th century.

3. RESULTS AND ANALYSIS

Appendix I shows the 29 transformation rules that were hand-crafted from manual inspection of the spelling variant groups extracted from the corpus using Levenshtein edit distance as the similarity measure. There are six prevailing types of graphs observed from the normalized usage preference plots of all 29 spelling variant groups. Figure 2 shows some representative plots of these categories.

Many of the transformation rules (e.g., <qui> vs. <ki>) are remnants of a Spanish-based system of orthography that started to be replaced at the beginning of the century. The spelling variant groups that involve the old Spanish system of orthography tend to show clean transition lines, whose transition regions can be used to demarcate the shift from an old system of orthography to a modern one. Figure 2-A clearly shows that the transition region lies in the first decade of the 20th century, centering on the year 1905. The case of <v> vs. <b> in Figure 2-B shows a funnel-shaped graph for the corresponding usage preference rates, which can mean that recent developments have caused a steady resurgence in the use of the alternate forms. While some of the competing forms in the spelling variant groups show consistent domination of one form over the other, as in Figure 2-C, some spelling variants show multiple crossovers in usage preference rates, as in Figure 2-D, indicating that the rules for these particular spelling variant groups are not yet fully accepted within the writing community. The last two graphs (Figure 2-E and Figure 2-F) reflect the fact that Filipinos are generally confused over the use of <i> vs. <e> and <o> vs. <u> in the Filipino language’s written form. In fact, it was also shown that this confusion is not confined to the Tagalog/Pilipino/Filipino language, but extends to the other major Philippine languages Ilokano and Cebuano-Visayan as well [6].

Figure 3 shows the resulting overall conventionalization graphs of spelling variant groups, aggregated per year, by 5 years, by 10 years, and by 25 years. The per-year plots show that there was a dip in the 1920s in the conventionalization plots, and recent years (from 2000 onwards) also see a gradual decrease of regularity in language usage. The 1920s can be seen as a transition period where the old ways of spelling were gradually being supplanted by new ways of spelling (capoua -> kapuwa, cong -> kong, etc.). The 2000s graph shows sudden variability in preferences, particularly in ways of adopting loan words. The plots aggregated by 5 years and 10 years show that the spelling system becomes increasingly conventionalized from the 1920s until the mid-1980s. From the mid-1980s onward, however, the conventionalization rate starts to drop.

Figure 3. Overall Conventionalization Graph for the Tagalog/Pilipino/Filipino

3. CONCLUSIONS AND FUTURE WORK

We have described our method for objectively tracking the development of the Philippine national language’s system of orthography by investigating usage preference plots of competing spelling variant categories extracted from a historical corpus composed of works published from the 1900s to the present. The normalized usage preference plots yielded six types of spelling variant cases: (1) those that exhibit transitional plots typical of Spanish-influenced spelling conventions that were supplanted by their modern forms in the first decade of the 20th century, (2) those that consistently show one spelling convention being largely preferred over the other, (3) cases showing multiple crossovers, indicating that rules regarding these spelling variants have still not been settled, (4) cases that show barrel-shaped graphs and (5) funnel-shaped graphs showing resurgence in the use of alternate spelling forms, and (6) cases showing almost equal usage preferences, indicating that both alternative spelling forms are widely accepted in the writing community.

The results of this study are particularly interesting to planners of the Filipino language, since they enable them to see, in very objective terms, how the national language develops and what interventions have significantly affected its progress. Thus, a natural follow-up to this study would be to correlate language-related socio-political and legislative historical developments with the usage preference and conventionalization plots culled from the gathered historical corpus.
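As a concrete illustration of the regularization value defined in Section 2.4 and of the per-year averaging behind the overall conventionalization graph, the following is a minimal Python sketch. The function names, the nested-dictionary input layout, and the toy counts are assumptions made for illustration and do not come from the study. In particular, the binning step averages the per-year values inside each bin, which is only one plausible reading of "aggregated by 5 years, by 10 years, and by 25 years".

```python
from collections import defaultdict


def regularization_value(counts):
    """Share of the most frequent word form among all competing forms.

    `counts` maps each competing word form to its number of occurrences
    in one publication year. Returns None when there is no data.
    """
    total = sum(counts.values())
    if total == 0:
        return None
    return max(counts.values()) / total


def overall_conventionalization(groups):
    """Average the regularization values of all groups, year by year.

    `groups` maps a group name to {year: {word_form: count}} (assumed layout).
    """
    per_year = defaultdict(list)
    for yearly_counts in groups.values():
        for year, counts in yearly_counts.items():
            value = regularization_value(counts)
            if value is not None:
                per_year[year].append(value)
    return {year: sum(v) / len(v) for year, v in sorted(per_year.items())}


def aggregate(per_year, bin_size):
    """Average per-year values into bins of `bin_size` years (assumed method)."""
    bins = defaultdict(list)
    for year, value in per_year.items():
        bins[year - year % bin_size].append(value)
    return {start: sum(v) / len(v) for start, v in sorted(bins.items())}


# Hypothetical toy counts, for illustration only (not taken from the corpus).
toy_groups = {
    "<qui> vs. <ki>": {1901: {"aquin": 8, "akin": 2},
                       1910: {"aquin": 1, "akin": 9}},
    "<c> vs. <k>": {1901: {"balcon": 3, "balkon": 1}},
}
per_year = overall_conventionalization(toy_groups)
by_decade = aggregate(per_year, 10)
by_quarter_century = aggregate(per_year, 25)
```

A value near 1.0 means one form dominates its group in that year, while a value near 0.5 for a two-form group means the forms are used about equally often.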
4. ACKNOWLEDGMENTS

The authors would like to thank the Department of Science and Technology - Science Education Institute (DOST-SEI) for funding this research project as part of the Engineering Research and Development for Technology (ERDT) scholarship given to the first author.

5. REFERENCES

[1] A. Ernst-Gerlach and N. Fuhr, "Retrieval in text collections with historic spelling using linguistic and spelling variants," in JCDL '07: Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries, 2007.
[2] R. Giusti, A. J. Candido, M. Muniz, L. Cucatto and S. Aluisio, "Automatic detection of spelling variation in historical corpus: an application to build Brazilian Portuguese spelling variants dictionary," in Corpus Linguistics Conference, University of Birmingham, U.K., 2007.
[3] S. Orasmaa, R. Käärik, J. Vilo and T. Hennoste, "Information Retrieval of Word Form Variants," in Seventh conference on International Language Resources and Evaluation (LREC'10), Valletta, Malta, 2010.
[4] D. Archer, A. Ernst-Gerlach, S. Kempken, T. Pilz and P. Rayson, "The identification of spelling variants in English and German historical texts: manual or automatic?," in Abstracts of Digital Humanities, Paris: Sorbonne, 2006.
[5] V. I. Levenshtein, "Binary codes capable of correcting deletions, insertions, and reversals," Soviet Physics Doklady, vol. 10, no. 8, pp. 707-710, 1966.
[6] J. Ilao and T. G. R. Santos, "Comparative analysis of actual language usage and selected grammar and orthographical rules for Filipino, Cebuano-Visayan and Ilokano: a Corpus-based Approach," in 2nd Philippine Conference Workshop on Mother Tongue-Based Multilingual Education, Iloilo, 2012.

Appendix I. Hand-crafted transformation rules based on spelling variant groups extracted from running lexicon.
Highlighted rows correspond to transformation rules influenced by the Spanish system of orthography which
prevailed until the end of the 19th century.
Number  Transformation Rule                      Example
1       <c> vs. <k>                              balcon / balkon
2       <c> vs. <s>                              princesa / prinsesa
3       <ch> vs. <ts>                            derecho / deretso
4       DASH vs. NODASH                          bahay-kubo / bahaykubo
5       <ñ> vs. <n>                              hañgarin / hangarin
6       <ñ> vs. <ny>                             españa / espanya
7       <f> vs. <p>                              filosopo / pilosopo
8       <ia> vs. <iya>                           biblia / bibliya
9       IPINAGREDUP_vs_IPINAGNODUP               pinag-aagawan / pinapag-agawan
10      IPINAREDUP_vs_IPINANODUP                 ipinamamalas / ipinapamalas
11      <i> vs. <e>                              babai / babae
12      <iye> vs. <ie>                           impiyerno / impierno
13      <iyo> vs. <io>                           kolehiyo / kolehio
14      <j> vs. <h>                              jardin / hardin
15      <ks> vs. <x>                             taksi / taxi
16      <ll> vs. <ly>                            martillo / martilyo
17      MNAI vs. MNA                             maibibili / mabibili; naisasagot / nasasagot
18      MNAKAPAGREDUP vs. MNAKAPAGNODUP          makapag-iisa / makakapag-isa; nakapag-uutos / nakakapag-utos
19      MNAKAREDUP vs. MNAKANODUP                makapag-iisa / makakapag-isa; nakapagsasabi / nakakapagsabi
20      MUTEH vs. WITHH                          ospital / hospital
21      <ng> vs. <n>                             kangi-kangina / kani-kanina
22      <o> vs. <u>                              anu-ano / ano-ano
23      PrefixI vs. WithoutPrefixI               iginawa / ginawa
24      <qui> vs. <ki>                           aquin / akin
25      RepeatingVowels vs. non-repeatingVowels  aakalain / akalain
26      <ui> vs. <wi>                            dalauin / dalawin
27      <v> vs. <b>                              españa / espanya
28      <w> vs. <u>                              asawa / asaua
29      <z> vs. <s>                              luzon / luson
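
The paper states that the spelling variant groups behind these rules were extracted using Levenshtein edit distance as the similarity measure and then inspected manually. The sketch below shows one way such candidate pairs could be surfaced for inspection. The all-pairs comparison, the `max_distance=2` threshold, and the function names are assumptions made for illustration, not details given in the source. The sample words are taken from the Example column above.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def candidate_pairs(words, max_distance=2):
    """Return word pairs within `max_distance` edits (assumed threshold)."""
    words = sorted(set(words))
    return [(w1, w2)
            for i, w1 in enumerate(words)
            for w2 in words[i + 1:]
            if levenshtein(w1, w2) <= max_distance]


# Sample words drawn from the Example column of Appendix I.
lexicon = ["balcon", "balkon", "derecho", "deretso", "luzon", "luson"]
pairs = candidate_pairs(lexicon)
```

Comparing every pair is quadratic in the lexicon size, so a real lexicon would likely need some pre-filtering. Each surfaced pair would still need manual review before being assigned to a transformation rule, as the paper describes.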