
Lingloss

Welcome to The Lingloss Project Page!

In 1967, I designed what was meant to be an international auxiliary language called Lingloss. Like many other such projects, it was never really ready enough to inflict upon the public. In 2012 Lingloss still remains a work in progress; however, I believe I have recently made some progress on one aspect of the overall problem. The reasons for this belief are more fully detailed at http://www.richardsandesforsyth.net/docs/bunnies.pdf .

So I am using this webpage to share some software which, when more fully developed, may help designers of the coming international auxiliary language. (Yes, there will have to be one eventually: the human race can always be relied upon to do the right thing, as Churchill said of the Americans, once they have exhausted the alternatives.)

The software is concerned with the problem of establishing a suitable core vocabulary. This is an obstacle that prior efforts have never convincingly overcome. What you will find when you download and unzip [glossoft.zip] is a pair of programs written in Python3 (along with various ancillary files) which address the following aspects of the vocabulary-building problem:

1. How to choose a core collection of lexical items, i.e. what Hogben (1963) calls a "list of essential semantic units" (LESU), which is concise enough to be learnt in a matter of weeks and at the same time extensive enough to support the great majority of essential communicative functions;
2. How to choose a suitable international word for each of the items in the LESU.
Towards a Core Vocabulary

The program corevox1.py takes in several lists of essential semantic units (formatted one item per line) and produces a consensus list consisting of all the items that occur in at least minfreq of the input lists, where minfreq is an integer from 1 (in which case the output is all the items that occur in any of the input lists) to N, the number of input lists (in which case the output is only those items common to all the input lists).

Where do the input lists come from? Well, to test the program, four files containing previous attempts to come up with a LESU are provided (baslist, hoglist, longlist and maclist). These are, respectively: the Basic English wordlist (Ogden, 1937); the LESU of "Essential World English" (Hogben, 1963); the defining vocabulary of the Longman English Dictionary (Longman, 2003); the defining vocabulary of the MacMillan English dictionary for advanced learners (MacMillan, 2002). [subfolder: lexicons]

Ogden and Hogben were trying to establish minimal subsets of words needed for the majority of communicative purposes in simplified versions of English. Compilers of the Longman and MacMillan dictionaries were trying to establish basic word lists in terms of which all the other entries in their dictionaries could be defined. Thus all four lists represent principled attempts to create concise but effective vocabularies. They didn't all settle on the same words, but any term that appears in more than one of these sets is likely to have a strong claim for inclusion in anyone's core vocabulary.

Note that, although most of the entries in these lists are relatively common, they are not mere frequency lists. They result from attempts to cover the most commonly used concepts without redundancy. Therefore some high-frequency terms will be excluded if they are redundant.

I should perhaps apologize for anglocentric bias here, although in mitigation it should be noted that there is nothing in this software that limits it to the English language. I am most at home with English examples, but I would hope that others could apply the same methods to other languages: the comparisons would be instructive.

Towards an International Vocabulary

The second program, avwords3.py, is more innovative, as far as the field of interlinguistics is concerned. It finds the 'verbal average' of a number of different words. As far as I know, nobody has ever defined what a verbal average might be, so, to be a little more specific, the heart of this program is a function that takes in a number of strings (usually words, though they could be short phrases) and produces a string which is, in a certain sense, the most typical representative of those input strings.

For example, given the following inputs ['cheval', 'caballo', 'cavallo', 'cavalo', 'cal', 'equus', 'cavall'], which are the French, Spanish, Italian, Portuguese, Romanian, Latin and Catalan words for 'horse', the program computes that 'cal' is the most central or typical item.

As currently implemented, it works in 2 stages. Firstly, the string in the group which is most similar to all the others of that group is chosen, using a string-similarity scoring function. Secondly, certain manipulations, such as dropping a character or swapping 2 adjacent characters, are tried to see if they increase the similarity score of that string in relation to the rest and, if so, the modified string is accepted. In this case, no deletions or letter-exchanges make it more typical, so it is retained.

The program works by reading in several (utf8) files in the format exemplified below.

young	giovane
you	voi
yes	sì
yellow	giallo
year	anno
would	sarebbe
work	lavorare
word	parola
wool	lana
woods	bosco
wood	legno
woman	donna
with	con
wire	filo
wing	ala
wine	vino
window	finestra

This is an extract from a simple English-Italian lexicon: each line consists of a source-language term followed by a target-language equivalent, with a tab character separating them. Each of these input lexicons uses the same source language (English in the examples provided) with a different target language (various Romance languages in the examples provided). These sample bilingual lexicons can be found in the lexicons folder after you have unzipped the software.

Incidentally, the part that hasn't been automated is going from the LESU produced as output by corevox1.py to the several lexicons needed as input by avwords3.py. There are lots of public-domain bilingual lexicons, so it would be possible to write software that took a LESU and an existing lexicon (English-to-target-language in the present case) and produced suitable input for avwords3.py, but to do it properly would, I suspect, require human scrutiny anyway, so that task is left as "an exercise for the reader".

The output of avwords3.py is a lexicon in the same format as the inputs, where each source-language item is associated with the 'verbal average' of the terms in the various target languages (intended as a first approximation to an English-Lingloss dictionary). Example output produced from the seven small example inputs in the lexicons folder follows below.

Mon Dec 24 16:28:24 2012
window	fenestra
wine	vin
wing	ala
wire	fil
with	con
woman	mulier
wood	lea
woods	bos
wool	lana
word	parala
work	trabaar
would	voudrais
year	ano
yellow	gallo
yes	si
you	voi
young	jove

On the basis of the example data provided here, Lingloss, if it ever gets into circulation, would look very much like a Romance language, a kind of simplified, modernized Latin. However, that decision is by no means set in stone. The main point of computerizing parts of the process is to permit exploration of alternative design decisions.
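For readers who would like to experiment with the two-stage 'verbal average' procedure described in this section before opening the real program, a minimal sketch follows. It is not the code of avwords3.py. The function names are invented for illustration, the similarity score is Python's standard difflib.SequenceMatcher ratio used as a stand-in for whatever scoring function avwords3.py actually applies, and the repeated hill-climbing in stage 2 is an assumption about how the "manipulations" might be iterated.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity score in [0, 1]; NOT the scoring used by avwords3.py."""
    return SequenceMatcher(None, a, b).ratio()


def total_similarity(candidate, others):
    """Sum of the candidate's similarity to every string in `others`."""
    return sum(similarity(candidate, other) for other in others)


def variants(word):
    """Yield every string made by dropping one character or swapping two neighbours."""
    for i in range(len(word)):
        yield word[:i] + word[i + 1:]
    for i in range(len(word) - 1):
        yield word[:i] + word[i + 1] + word[i] + word[i + 2:]


def verbal_average(words):
    """Return a 'most typical' string for a non-empty list of words (hypothetical helper)."""
    # Stage 1: choose the input string most similar to all the other inputs.
    best_index = max(
        range(len(words)),
        key=lambda i: total_similarity(words[i], words[:i] + words[i + 1:]),
    )
    best = words[best_index]
    rest = words[:best_index] + words[best_index + 1:]
    best_score = total_similarity(best, rest)

    # Stage 2: accept a deletion or adjacent swap only if it raises the score,
    # and keep trying until no single manipulation helps any further.
    improved = True
    while improved:
        improved = False
        for candidate in variants(best):
            if not candidate:
                continue  # never reduce the word to an empty string
            score = total_similarity(candidate, rest)
            if score > best_score:
                best, best_score, improved = candidate, score, True
                break
    return best


if __name__ == "__main__":
    horse = ["cheval", "caballo", "cavallo", "cavalo", "cal", "equus", "cavall"]
    print(verbal_average(horse))
```

Because the similarity function here is only a stand-in, the result for the 'horse' example need not be 'cal'. Swapping in a different similarity function and comparing the outputs is exactly the kind of design exploration the software is meant to encourage.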

The English word 'would' isn't expressed by a single word in these Romance languages, which illustrates the need for human pre-processing or post-processing. In fact, avwords3.py also produces a listing file in which the quality of the 'verbal averages' is shown. This is meant to provide serious users with information to enable them to decide which of the proposed term equivalents need further attention.

Much work remains to be done. For example, comparison of alternative string-similarity scoring functions would be a good idea, as would a test of whether each target word should be rendered into a common phonetic representation or just taken as spelled, and so on. These programs are prototypes, intended to illustrate a particular methodology, which I believe is novel. The main point is to stimulate such work.

Running the programs

To execute the programs you will have to obtain Python (version 3 not 2) if you don't already have it. This can be found at www.python.org

I have tested these programs under Windows7, but I believe they should run without alteration under Linux as well.

Then you will have to unzip the file glossoft.zip, preferably at your top-level directory. This will have subfolders as follows.

lexicons    sample LESUs and small-scale bilingual lexicons
libs        common routines and variables for the programs in p3
op          default directory to receive output
p3          Python3 programs
parapath    directory to hold parameter files

Each program requires certain input parameters, which are put into a text file that can be edited by Notepad, Notepad++ or other text editors. Example parameter files for using the example data provided will be found in the parapath folder once the zipped file has been unpacked.

Each line of a parameter file starts with a parameter name, then one or more spaces, then the value for that parameter. Unknown parameters are ignored. Parameters not given a value in the parameter file receive a default value. A table of parameters used by the programs follows.

parameter name   type                        default                   description
casefold         0..1                        1                         whether to fold uppercase to lower case on input; 0 implies no, 1 implies yes
jobname          alphanumeric string         same name as program      name to link output files
minfreq          integer                     2                         minimum number of input LESU files in which a term must appear to be kept for output
outgloss         Windows or Linux filespec   avwords_glos              output file for consensus lexicon
vocfile          Windows or Linux filespec   corevox_vocs              output file for consensus LESU
voclists         Windows or Linux filespec   lesu.dat / lexicons.txt   input text file containing list of input file-specs, 1 per line
withkey          0..1                        0                         whether to include the source-language term along with the target-language equivalents in avwords (1), or not (0)

The content of coretest.txt, a simple initial parameter file for corevox1.py, is copied below.

voclists c:\glossoft\parapath\lesu.txt
vocfile c:\glossoft\op\corelist.txt
minfreq 2

The content of wordavs.txt, a starter parameter file for avwords3.py, is copied below.

voclists c:\glossoft\parapath\glossies.txt
outgloss c:\glossoft\op\glossout.txt
withkey 0

Pretty simple, eh?

References

Hogben, L. (1943). Interglossa. Harmondsworth: Penguin Books.
Hogben, L. (1963). Essential World English. London: Michael Joseph Ltd.
Longman (2003). Dictionary of Contemporary English. Harlow: Pearson Educational Ltd.
Macmillan (2002). MacMillan English Dictionary for Advanced Learners. Oxford: MacMillan Education.
Ogden, C.K. (1937). The ABC of Basic English. London: Kegan, Paul, Trench, Trubner & Co. Ltd.
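A note for readers who want to try the ideas before downloading anything: the parameter-file conventions described under "Running the programs" and the minfreq rule used by corevox1.py can be sketched in a few lines of Python. This is not the code shipped in glossoft.zip. The function names parse_params and consensus, the alphabetically sorted output, and the tiny word lists are all assumptions made purely for illustration; only the parameter names and the two numeric defaults come from the table of parameters.

```python
from collections import Counter

# Parameter names from the table of parameters; any other name is ignored.
KNOWN = {"casefold", "jobname", "minfreq", "outgloss", "vocfile", "voclists", "withkey"}
# Only two defaults from the table are reproduced in this sketch.
DEFAULTS = {"casefold": "1", "minfreq": "2"}


def parse_params(lines):
    """Read 'name value' lines; unknown names are skipped, defaults fill the gaps."""
    params = dict(DEFAULTS)
    for line in lines:
        parts = line.split(None, 1)  # name, then one or more spaces, then value
        if len(parts) == 2 and parts[0] in KNOWN:
            params[parts[0]] = parts[1].strip()
    return params


def consensus(wordlists, minfreq=2, casefold=True):
    """Return the items that occur in at least `minfreq` of the input lists."""
    counts = Counter()
    for words in wordlists:
        # Count each item at most once per list, however often it is repeated there.
        items = {w.strip() for w in words if w.strip()}
        if casefold:
            items = {w.lower() for w in items}
        counts.update(items)
    # Sorting is an assumption of this sketch, made so the output is repeatable.
    return sorted(w for w, n in counts.items() if n >= minfreq)


if __name__ == "__main__":
    params = parse_params(["minfreq 2", "colour blue"])  # 'colour' is unknown, so ignored
    lists = [  # hypothetical miniature LESUs, one item per entry
        ["water", "horse", "Wine"],
        ["horse", "wine", "wool"],
        ["horse", "yellow"],
    ]
    print(consensus(lists, int(params["minfreq"]), params["casefold"] == "1"))
```

Setting minfreq to 1 gives the union of the lists and setting it to the number of lists gives their intersection, which matches the behaviour described for corevox1.py.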

Appendix

Constructed Auxiliary Languages:

Year   Language                      Surname         Forename(s)
1661   Universal Character           Dalgarno        George
1668   Real Character                Wilkins         Bishop
1699   Characteristica Universalis   Leibniz         Gottfried
1765   Nouvelle Langue               de Villeneuve   Faiguet
1866   Solresol                      Sudre           Francois
1868   Universalglot                 Pirro           Jean
1880   Volapuk                       Schleyer        Martin
1886   Pasilingua                    Steiner         Paul
1887   Bopal                         de Max          Saint
1887   Esperanto                     Zamenhof        Lazarus
1888   Lingua                        Henderson       George
1888   Spelin                        Bauer           Georg
1890   Mundolingue                   Lott            Julius
1892   Latinesce                     Henderson       George
1893   Balta                         Dormoy          Emile
1893   Dil                           Fieweger        Julius
1893   Orba                          Guardiola       Jose
1896   Veltparl                      von Arnim       Wilhelm
1899   Langue Bleu                   Bollack         Leon
1902   Idiom Neutral                 Rosenberger     Waldemar
1903   Latino sine Flexione          Peano           Giuseppe
1906   Ro                            Foster          Edward
1907   Ido                           de Beaufront    Louis
1913   Esperantido                   de Saussure     Rene
1922   Occidental                    de Wahl         Edgar
1928   Novial                        Jespersen       Otto
1943   Interglossa                   Hogben          Lancelot
1944   Mondial                       Heimer          Helge
1951   Interlingua                   Gode            Alexander
1957   Frater                        Thai            Pham Xuan
1961   Loglan                        Brown           James
1967   Lingloss                      Forsyth         Richard
1983   Uropi                         Landais         Joel
1996   Unish                         Jung            Young Hee
1998   Lingua Franca Nova            Boeree          George
2002   Mondlango                     Yafu            He
2011   Angos                         Wood            Benjamin

In Praise of Fluffy Bunnies

Copyright © 2012, Richard Forsyth.

Background

Reading John Lanchester's Whoops!, an entertaining account of how highly paid hotshot traders in a number of prestigious financial institutions brought the world to the brink of economic collapse, I was struck by the following sentence:

"In an ideal world, one populated by vegetarians, Esperanto speakers and fluffy bunny wabbits, derivatives would be used for one thing only: reducing levels of risk." (Lanchester, 2010: 37).

What struck me about this throwaway remark, apart from the obvious implication that derivatives were actually used to magnify risk rather than reducing it (doubtless by carnivores ignorant of Esperanto), was its presumption that right-thinking readers would take it for granted that Esperanto symbolizes well-meaning futility, thus highlighting the author's status as a tough-minded realist.

This is just one illustration that disdain for Esperanto in particular, and auxiliary languages in general, pervades intellectual circles in Britain today, as in many other countries. And if you dare to raise the subject of constructed international languages with a professional translator or interpreter, be prepared not just for disdain but outright hostility. Of course professional interpreters are among the most linguistically gifted people on the planet, and can't see why the rest of us shouldn't become fluent in half a dozen natural languages in our spare time. (Not to mention the fact that a widespread adoption of Esperanto, or one of its competitors, would have a seriously negative impact on their opportunities for gainful employment.)

Thus Esperanto has become a symbol of lost causes, to be dismissed out of hand by practical folk. Yet those risk junkies busily trading complex derivatives who brought us to the brink of ruin also thought of themselves as supremely practical hard-headed folk. It turned out that they were in the grip of a collective delusion whose effects have impoverished us all. Perhaps they have something to learn from vegetarians and Esperanto speakers.

In the world of supposedly practical folk today, during an intercontinental recession, the European Union spends vast sums of money each year on translating thousands of tonnes of documents into 23 different official languages. The demand for simultaneous interpreters in Brussels, Luxembourg, Strasbourg and at the UN consistently outstrips supply. Meanwhile in the UK, cohort after cohort of schoolchildren emerge from secondary education unable to understand any language other than their own, often after years of instruction in French, German or Spanish.

"Never mind," retort the anglophone triumphalists, "English is the international language these days."

If you really believe that English is an adequate lingua franca for Europe, let alone the world, try working in a multinational research project, with English as its official working language. I spent 2 years as the only native English speaker in an EU project, and have been scarred by the experience.

At first glance, this would seem to represent a triumph for the language of Shakespeare and Churchill: our native tongue has conquered the world! Sitting in a meeting, listening to colleagues conversing in Euro-globish heavily laden with mispronounced English jargon, trying to understand and make one's self understood, one starts to realize that this is not the triumph of English after all. It seems more like a devious kind of linguistic ju-jitsu, in which the world takes its revenge for being forced to accommodate monoglot English speakers by twisting their language into a barbarous dialect which they find awkward and unfamiliar.

Admittedly, English began as a creole, the offspring of a shotgun marriage between Anglo-Saxon and Norman French, but it has come a long way since then, and I personally am very fond of it. The anglicized pidgin that passes for English as an international language isn't the language I love, and it isn't a very effective medium of international communication either.

As it happens, the most eloquent exponent of English as a means of communication that I have ever heard was a Hungarian. But most of us have neither the talent nor the dedication to reach such a height in our mother tongue, still less in a foreign language. We do, however, have sufficient ability to achieve communicative competence in Esperanto within three months, and when we employ it we'll be communicating with others in the same position as ourselves, i.e. second-language users. There won't be the fertile soil for misunderstanding that exists when a native speaker instinctively exploits the quirks of the language or a non-native speaker makes a small slip of syntax with serious consequences.

Why then does Esperanto remain a fringe cult? Why doesn't the EU insist that all children in Europe spend even a single term learning Esperanto?

Part of the answer must be that, once you accept the idea of a constructed language, there is always the seductive possibility of doing better. At certain points during a course on Esperanto you will come across a construction (such as using the so-called accusative after a preposition to indicate motion) that makes you ask: why did Zamenhof do it that way? Surely that wasn't a good idea? If I want to learn Chinese, I may be daunted by the tonal system, or the thousands of unfamiliar characters, but I have to accept them: that's the way it is. But with an artificial language I'm tempted to think "that should be changed" whenever I come across a difficult or unappealing aspect.

Esperanto was in several respects superior to Volapuk, and the Idists think that Ido is better in many respects than Esperanto. Not everyone agrees. Jespersen (no mere dabbler, he) believed that Novial was better than either. So it goes on. Hundreds, perhaps thousands, of artificial languages have been proposed in the past couple of centuries.

Most never get used in action. A list of those that have attracted at least some serious attention is given in the Appendix to this essay. Only Esperanto, for all its perceived imperfections, has ever sustained a community of users numbering more than a few thousand for more than a few decades. In fact, after Esperanto, the second most widely used artificial language is probably Klingon, which was deliberately designed to sound harsh and be hard to learn! Other international language projects, apparently more elegant in concept (e.g. Interglossa, Lingua Franca Nova), have remained on the drawing board.

Thus, early in the 21st century, we arrive at a situation where Esperanto stands as a proof of concept, but has failed to take off. In spelling it approaches the ideal of one character for one phoneme more closely than almost any natural language; consequently it is easy to pronounce from the page. Its grammar is far more regular than that of most natural languages; consequently it can be mastered in a month. Its vocabulary contains a large number of roots found in the major European languages; consequently it doesn't impose a forbidding memory load on adult learners, provided that their first language is Indo-European. Above all, it has demonstrated repeatedly that international meetings can proceed smoothly without banks of interpreters sitting in cubicles and wires leading into everyone's ears.

Nevertheless it is generally viewed as merely a hobby for cranks. Linguists sneer at it. EU policy-makers would rather pour rivers of taxpayers' money into translation agencies and an endless stream of machine translation projects that never quite achieve their desired objectives than attempt to introduce Esperanto into the workings of the EU.

Personally, I believe this situation is highly unsatisfactory. I am motivated to attempt to do something about it for two primary reasons:

1. In today's globalized civilization, the need for a common international medium of communication is more urgent than ever before.
2. The strain placed on English in its role as de facto international language is turning it into a monstrosity.

Therefore I intend part of my website to play host to yet another effort to devise a constructed auxiliary language for international communication. I plan to kick off the process and with luck enlist some support.

Why should such a quixotic enterprise succeed, when hundreds before it have failed? Well, it might not, but there is one advantage that neither Zamenhof nor any of the early pioneers enjoyed, and which none of the more recent interlinguists seem to have exploited: the computer.

Take my Word for it!

An international language needs (1) a simple orthography, (2) a regular grammar, and (3) an easily learned vocabulary. Typical interlanguage projects tend to emphasize the first two points but leave the third in the background. Yet choice of lexical units is the most important of the three, as Ogden (1937) and Hogben (1943) pointed out long ago.

When it comes to creating a vocabulary, constructed languages take one of two main approaches:

• Eclectic, where the designers pick from a variety of linguistic sources, sometimes with a small admixture of completely made-up items. Examples include: Esperanto, Loglan, Novial, Unish.
• Coherent, where the vocabulary is drawn predominantly from a single source. Examples include: Latino sine Flexione (from Latin), Interglossa (from Greek), Interlingua (from the Romance languages), Lingua Franca Nova (from the Romance languages, apparently using Catalan as a kind of tie-breaker).

Zamenhof's approach to Esperanto vocabulary building can be described as 'eclectic'. He selected a motley collection of roots from the Germanic, Romance and Slavic languages of Europe. It has been said that Esperanto sounds like a Czech speaking Italian. The effect is not unpleasing, but it is hardly systematic. What he didn't do was employ a clearly stated method to create a concise but effective core vocabulary. Most subsequent projects are open to the same criticism. It is normal for proponents of an auxiliary language to claim that its vocabulary is 'international' in some sense, but the foundation for this claim is almost invariably subjective.

With the notable exception of Hogben's Interglossa (1943), none of these projects paid much attention to word economy, i.e. to establishing a minimal necessary core vocabulary. Indeed, the Interlingua English Dictionary (IALA, 1951) boasts of having 27,000 entries, while the Unish website ( www.unish.org ) has a section soliciting suggested new words from interested readers. In other cases the designers appear to have relied on their intuitions to decide how many and which words were necessary.

A Manifesto for Vegetarians, Esperantists & Other Cute Animals

My contention is twofold: firstly, that the world does need an international language; secondly, that it is possible to create a language that is superior for this purpose, in terms of learnability and usability, than either English or Esperanto.

1. Orthography: it is very easy to improve on English in this aspect, and not difficult to improve on Esperanto, where the accented consonants are an irritant. Several projects have already shown this, e.g. Lingua Franca Nova.
2. Grammar: English grammar is a minefield for the unwary, and Esperanto also contains some unnecessary pitfalls. Again, ways of improving on this have already been demonstrated by Lingua Franca Nova among other projects.
3. Lexis: Esperanto vocabulary is too large and disorderly, English much more so.

It is the third item that is really crucial, and that is where all previous projects have fallen down. I believe the time is ripe for a more systematic approach, with the aid of computer processing.