Bulgarian, Cantonese, Catalan, Caucasian, Cebuano, Chattisgarhi, Chinese, Coorgi, Creole, Croatian, Czech, Danish, Dusun, Dutch, Esperanto, Estonian, Farsi, Finnish, Flemish, French, Fuzhou, Galician, Garwali, German, Greek, Gujarati, Haryanvi, Hawaiian, Hebrew, Hindi, Hokkien, Hungarian, Icelandic, Ilocano, Ilonggo, Indonesian, Irish, Italian, Japanese, Kadazan, Kannada, Kiswahili, Klingon, Konkani, Korean, Kurdish, Kutchi, Latin, Latvian, Lithuanian, Macedonian, Maithli, Malay, Malayalam, Mandarin, Manipuri, Marathi, Marwari, Nepali, Norwegian, Orriya, Pa Dutch, Pig Latin, Plattduitsch, Polish, Portuguese, Punjabi, Pushto, Rajasthani, Romanian, Russian, Sanskrit, Serbian, Shanghainese, Sindhi, Slovenian, Sowrashtra, Spanish, Swedish, Swiss German, Tagalog, Tamil, Telugu, Thai, Tulu, Turkish, Ukrainian, Urdu, Vietnamese, Visayan, Yiddish and Yupik! By smoothing the estimates, it is safe to predict that at least a few hundred more are spoken by AMT workers.
There are some fuzzy (and not so fuzzy) interpretations. Hindi and Urdu are one language with some minor dialectal variation, as are Indonesian and Malay. At the other end, a number of the participants who reported speaking
‘Chinese’ probably speak any number of related languages, as distinct languages are often called ‘dialects’ within China, especially in relation to more prestigious languages. ‘Pig Latin’ is not a language. The one person who claimed to speak Klingon … well, who knows, perhaps they do.
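The post does not say which smoothing method produced the "few hundred more" estimate. As one illustration only (a swapped-in technique, not necessarily the author's), the bias-corrected Chao1 estimator from species-richness statistics extrapolates the total number of categories from how many were seen exactly once (f1) or exactly twice (f2): S_est = S_obs + f1(f1 - 1) / (2(f2 + 1)). The function name and input format below are hypothetical; the input is assumed to be one worker count per observed language.

```python
from collections import Counter


def chao1_bias_corrected(abundances):
    """Estimate total language count from per-language worker counts.

    abundances: list with one entry per observed language, giving how many
    workers reported it (hypothetical input format).
    """
    s_obs = len(abundances)            # languages actually observed
    freq_of_freq = Counter(abundances)  # count -> number of languages with that count
    f1 = freq_of_freq[1]  # languages reported by exactly one worker (0 if none)
    f2 = freq_of_freq[2]  # languages reported by exactly two workers (0 if none)
    # Bias-corrected Chao1: many singletons relative to doubletons
    # suggests many languages that no worker in the sample reported.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
```

For example, `chao1_bias_corrected([50, 10, 5, 1, 1, 1, 2])` has 7 observed languages, three singletons and one doubleton, giving an estimate of 8.5 languages in total. Because so many languages in the list above were reported by only one or two people, an estimator of this kind would extrapolate well beyond the observed count.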
I combined the results with the WALS (World Atlas of Language Structures) database to map the lineage and
origin of many of the languages, which revealed a huge geographical bias in the
distribution. The world’s languages are concentrated in or near the tropics,
but those spoken here were predominantly European or non-tropical Asian in origin. Even so, it is great to see a scattering of less widely spoken languages like Kadazan (Austronesian) and Yupik (Eskimo-Aleut), showing that despite the biases in overall volume there is a
rich variety of languages spoken by AMT workers. Six of the ten most commonly spoken languages (Tamil, Malayalam, Telugu, Kannada, Marathi and Gujarati) do not yet have online translation tools via Google or Bing, so there is clearly great scope to support online translation for new languages, too.
To populate the map in an interesting way, I also calculated the most frequent language reported at each hour of the day, restricting this to one language per timezone. This gives us 24 languages (see below), one for each
hour of the day. I’ve added these to the map at midday for the timezone in
which they were most frequently spoken. This is more for visual effect than anything else, but it does give an idea of the optimal time to run tasks for any specific language, and it correlates strongly with the part of the world that the language originates in (there are surprisingly few crossing lines).
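The hour-by-hour selection described above can be sketched as a greedy assignment. This is a minimal sketch under stated assumptions, not the author's actual procedure: it assumes the raw data is a list of hypothetical `(hour, language)` report pairs, and it interprets "one language per timezone" as "each language may be assigned to at most one hour", which is what yields 24 distinct languages.

```python
from collections import Counter


def assign_languages_to_hours(reports):
    """Greedily pick one distinct language for each hour of the day.

    reports: iterable of (hour, language) pairs with hour in 0..23
    (hypothetical input format). Returns a dict mapping hour -> language.
    """
    counts = Counter(reports)  # (hour, language) -> number of reports
    assigned = {}  # hour -> chosen language
    used = set()   # languages already placed at some hour
    # Visit (hour, language) pairs from most to least frequent;
    # ties are broken by hour, then by language name, for determinism.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0][0], kv[0][1]))
    for (hour, lang), _count in ordered:
        if hour in assigned or lang in used:
            continue  # hour already filled, or language already used elsewhere
        assigned[hour] = lang
        used.add(lang)
    return assigned
```

With reports `[(9, "Tamil")] * 3 + [(9, "Hindi")] + [(10, "Tamil")] * 2 + [(10, "Hindi")] + [(17, "Spanish")] * 2`, the function returns `{9: "Tamil", 10: "Hindi", 17: "Spanish"}`: Tamil wins 9:00, so 10:00 falls to Hindi even though Tamil was more common there. A greedy pass is simple but does not guarantee the largest total count across all hours; an exact version could treat this as an assignment problem (for example with `scipy.optimize.linear_sum_assignment`).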