
Hapax legomenon


A hapax legomenon (/ˈhæpɪks lɪˈɡɒmənɒn/ or /ˈheɪpæks/;[1][2] pl. hapax legomena; sometimes abbreviated to hapax, pl. hapaxes) is a word that occurs only once within a given context: the written record of a language, the works of an author, or a single text. Although technically incorrect, the term is also sometimes used of a word that occurs in only one of an author's works, even though it occurs more than once in that work. Hapax legomenon is a transliteration of Greek ἅπαξ λεγόμενον, meaning "(something) said (only) once".[3] The related terms dis legomenon, tris legomenon, and tetrakis legomenon refer respectively to double, triple, or quadruple occurrences, but are far less commonly used.

Hapax legomena are quite common, as predicted by Zipf's Law,[4] which states that the frequency of any word in a work or corpus is inversely proportional to its rank in the frequency table. For large corpora, about 40% to 60% of the words (counting by type) occurring are hapax legomena, and another 10% to 15% are dis legomena.[5] Thus, in the Brown Corpus of American English, about half of the 50,000 words are hapax legomena within that corpus.[6] Note that the term hapax legomenon refers to a word's appearance in a body of text, not to its origins, nor to its prevalence in speech. It thus differs from a nonce word, which may never be recorded, or may find currency and be recorded widely, or may appear several times in the work which coins it, and so on.
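The type-based percentages above can be checked on any text with a few lines of code. The following is a minimal sketch, not a reference implementation: the tokenizer is a deliberately crude assumption (lowercased runs of letters, with an optional apostrophe suffix), and the names tokenize and legomena_shares are hypothetical. A different rule for what counts as a word will give different figures.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a crude, assumed rule)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def legomena_shares(text):
    """Return (number of types, share of hapax legomena, share of dis legomena).

    Shares are computed by type, i.e. over distinct words, not over tokens.
    """
    counts = Counter(tokenize(text))
    types = len(counts)
    if types == 0:
        return 0, 0.0, 0.0
    hapaxes = sum(1 for c in counts.values() if c == 1)  # words seen exactly once
    dis = sum(1 for c in counts.values() if c == 2)      # words seen exactly twice
    return types, hapaxes / types, dis / types


sample = "the whale and the sea and the ship saw the white whale"
types, hapax_share, dis_share = legomena_shares(sample)
print(types, hapax_share, dis_share)
```

In the toy sentence, four of the seven distinct words occur once and two occur twice; a real corpus would be read from a file and passed to legomena_shares in the same way.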

Rank-frequency plot for words in the novel Moby-Dick. About 44% of the distinct set of words in this novel, such as "matrimonial", occur only once, and so are hapax legomena (red). About 17%, such as "dexterity", are dis legomena (blue). Zipf's Law predicts that the words in this plot, drawn on logarithmic axes, should approximately fit a straight line.
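The straight line follows from Zipf's Law once both axes are logarithmic. As a sketch, writing f(r) for the frequency of the word of rank r and C for a corpus-dependent constant (both symbols are notation assumed here, not taken from the article), inverse proportionality gives:

```latex
\[
  f(r) = \frac{C}{r}
  \quad\Longrightarrow\quad
  \log f(r) = \log C - \log r
\]
```

On log-log axes this is a line of slope -1. Hapax legomena are the words with f(r) = 1, which sit at the low-frequency end of that line.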

Significance
Hapax legomena in ancient texts are difficult to translate and decipher, since it is easier to infer meaning from multiple contexts than from just one. For example, many of the remaining undeciphered Mayan glyphs are hapax legomena, and Biblical (particularly Hebrew) hapax legomena sometimes pose difficult issues in translation. Hapax legomena also pose challenges in natural language processing.[7]

Some scholars consider hapax legomena useful in determining the authorship of written works. For example, each of Shakespeare's plays contains a roughly similar percentage of hapax legomena not found elsewhere in his work. P.N. Harrison, in The Problem of the Pastoral Epistles (1921),[8] made hapax legomena popular among Bible scholars when he argued that there are considerably more of them in the three Pastoral Epistles than in other Pauline Epistles. He argued that the number of hapax legomena in a putative author's corpus indicates his or her vocabulary and is characteristic of the author as an individual.

Harrison's theory has faded in significance due to a number of problems raised by other scholars. For example, in 1896, W.P. Workman found the following numbers of hapax legomena in each Pauline Epistle: Rom. 113, I Cor. 110, II Cor. 99, Gal. 34, Eph. 43, Phil. 41, Col. 38, I Thess. 23, II Thess. 11, Philem. 5, I Tim. 82, II Tim. 53, Titus 33. At first glance, the last three totals (for the Pastoral Epistles) are not out of line with the others.[9] To take account of the varying length of the epistles, Workman also calculated the average number of hapax legomena per page of the Greek text, which ranged from 3.6 to 13, as summarised in the diagram on the right.[9] Although the Pastoral Epistles have more hapax legomena per page, Workman found the differences to be moderate in comparison to the variation between other Epistles. This was reinforced when Workman looked at several plays by William Shakespeare, which showed similar variations (from 3.4 to 10.4 per page of Irving's one-volume edition), as summarised in the second diagram on the right.[9]

Apart from author identity, there are several other factors which can explain the number of hapax legomena in a work, such as:
• text length: this directly affects the expected number and percentage of hapax legomena; the brevity of the Pastoral Epistles also makes any statistical analysis problematic.
• text audience: if the author is writing to a peer rather than a student, or their spouse rather than their employer, again quite different vocabulary will appear.
• text topic: if the author writes on different subjects, of course many subject-specific words will occur only in limited contexts.
• time: over the course of years, both the language itself and a given author's knowledge and use of language will change.[10]

In the particular case of the Pastoral Epistles, all of these variables are quite different from those in the rest of the Pauline corpus, and hapax legomena are no longer widely accepted as a strong indicator of authorship (although the authorship of the Pastorals is subject to debate on other grounds).[11]

There are also subjective questions over whether two forms amount to "the same word": dog vs dogs, clue vs clueless, sign vs signature, and many other grey cases arise. The Jewish Encyclopedia points out that although there are 1500 hapaxes in the Old Testament, only about 400 of those are not obviously related to other attested word forms.[12]

It would not be especially difficult for a forger to construct a work with any percentage of hapax legomena desired. However, it seems unlikely that forgers much before the 20th century would have thought of such a ploy, much less thought it worth the effort.

A final difficulty with the use of hapax legomena for authorship determination is that there is considerable variation among works known to be by a single author, and disparate authors can show very similar values. In other words, it is not a reliable indicator. Authorship studies now usually use a wide range of measures, and look for a pattern across them, rather than relying on a single measurement.

Computer science
In the fields of computational linguistics and natural language processing (NLP), especially corpus linguistics and machine-learned NLP, it is common to disregard hapax legomena (and sometimes other infrequent words), as they are likely to have little value for computational techniques. This has the added benefit of significantly reducing the memory usage of an application, since by Zipf's law, many words are hapaxes.[13]
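The practice of disregarding hapax legomena described under Computer science can be sketched as a frequency cutoff applied while building a vocabulary. This is an illustrative assumption about how such pruning is commonly done, not a description of any particular toolkit: the names build_vocabulary and replace_rare, the min_count parameter, and the "<unk>" placeholder are all hypothetical.

```python
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder that stands in for every pruned word


def build_vocabulary(tokens, min_count=2):
    """Keep only the words that occur at least min_count times.

    With min_count=2 this drops exactly the hapax legomena.
    """
    counts = Counter(tokens)
    return {word for word, c in counts.items() if c >= min_count}


def replace_rare(tokens, vocabulary):
    """Map every out-of-vocabulary token to the UNK placeholder."""
    return [t if t in vocabulary else UNK for t in tokens]


tokens = "the cat saw the dog and the cat saw a flother".split()
vocab = build_vocabulary(tokens)
pruned = replace_rare(tokens, vocab)
print(sorted(vocab))
print(pruned)
```

Because a large share of the distinct words in a corpus occur only once, a cutoff like this shrinks the vocabulary (and any table indexed by it) considerably while touching relatively few tokens.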

Examples
Some examples of hapax legomena in a given language or body of work are:

Hebrew examples
• Gvina (גבינה - cheese) is a hapax legomenon of Biblical Hebrew, found only in Job 10:10. The word has become extremely common in modern Hebrew.
• Akut (אקוט - fought) only appears once in the Hebrew Bible, in Psalms 95:10.
• Lilith (לילית) occurs once in the Hebrew Bible, in Isaiah 34:14, which describes the desolation of Edom. The word is translated into English in several ways.
• Atzei Gopher (עֲצֵי-גֹפֶר - Gopher wood) is mentioned once in the Bible, in Genesis 6:14, in the instruction to make Noah's ark "of gopher wood". Because of the single appearance, the literal meaning is lost. Gopher is simply a transliteration, although scholars today tentatively suggest that the wood intended is cypress.[14]

Greek examples
• autoguos (αυτογυος), an ancient Greek word for a sort of plough, is found once (and exclusively) in Hesiod: the precise meaning remaining obscure.
• panaorios (παναωριος), ancient Greek for "very untimely", is one of many hapax legomena of the Iliad.
• The Greek New Testament contains 686 local hapax legomena, sometimes called "New Testament hapax",[15] of which 62 occur in 1 Peter, and 54 occur in 2 Peter.[16]
• aphedron "latrine" was a hapax legomenon thought to mean "bowel" until an inscription was found in Pergamos.

Latin Examples
• Mnemosynus, presumably meaning a keepsake or aide-memoire, only appears in Poem 12 of Catullus' Carmina.
• Deproeliantis, a participle of the word deproelior, which means "to fight fiercely" or "to struggle violently," only appears in line 11 of Horace's Ode 1.9.

English examples
• Honorificabilitudinitatibus is a hapax legomenon of Shakespeare's works.
• Flother, a synonym for snowflake, is a hapax legomenon of written English pre-1900, found in a manuscript from around 1275.
• Nortelrye, a word for "education", occurs exactly once in Chaucer.
• Slæpwerigne occurs exactly once in the Old English corpus, in the Exeter Book. There is debate over whether it means "weary with sleep" or "weary for sleep".

Italian examples
• Ramogna is mentioned only once in Italian literature, precisely in Dante's Divina Commedia (Purgatorio XI, 25).
• Trasumanar is another hapax legomenon mentioned in Dante's Divina Commedia (Paradiso I, 70, translated as "Passing beyond the human" by Mandelbaum).

Arabic examples
• The proper nouns Iram (Q 89:7, Iram of the Pillars), Bābil (Q 2:102, Babylon), Bakka(t) (Q 3:96, Bakkah), Ǧibt (Q 4:51), Ramaḍān (Q 2:185, Ramadan), ar-Rūm (Q 30:2, Ancient Rome), Tasnīm (Q 83:27), Qurayš (Q 106:1, Quraysh), Maǧūs (Q 22:17, Magi), Mārūt (Q 2:102, Harut and Marut), Makka(t) (Q 48:24, Mecca), Nasr (Q 71:23), (Ḏū) an-Nūn (Q 21:87) and Hārūt (Q 2:102, Harut and Marut) occur only once in the Qurʾān.[17]
• zanǧabīl (زَنْجَبِيل - ginger) is a Qurʾānic hapax (Q 76:17).
• The epitheton ornans aṣ-ṣamad (الصَّمَد - the One besought (Names of God in the Qur'an)) is a Qurʾānic hapax (Q 112:2).

External links
• Open source Java software for text analysis and calculating hapax ratio (JHapax) [18]

References
[1] "hapax legomenon" (http://oed.com/search?searchType=dictionary&q=hapax+legomenon). Oxford English Dictionary. Oxford University Press. 2nd ed. 1989.
[2] "hapax legomenon" (http://dictionary.reference.com/browse/hapax+legomenon). Dictionary.com Unabridged. Random House, Inc.
[3] ἅπαξ (http://www.perseus.tufts.edu/hopper/text?doc=Perseus:text:1999.04.0057:entry=a(/pac). Henry George Liddell, Robert Scott, A Greek-English Lexicon, at Perseus Project.
[4] Paul Baker, Andrew Hardie, and Tony McEnery, A Glossary of Corpus Linguistics, Edinburgh University Press, 2006, page 81, ISBN 0748620184.
[5] András Kornai, Mathematical Linguistics, 2008, page 72, ISBN 1846289858.
[6] Kirsten Malmkjær, The Linguistics Encyclopedia (http://books.google.com.au/books?id=IG7tE4-p-uUC&pg=PA87), 2nd ed., Routledge, 2002, ISBN 0415222109, p. 87.
[7] Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999, page 22, ISBN 0262133601.
[8] P.N. Harrison, The Problem of the Pastoral Epistles, Oxford University Press, 1921.
[9] Workman, "The Hapax Legomena of St. Paul", Expository Times, 7 (1896:418), noted in The Catholic Encyclopedia, s.v. "Epistles to Timothy and Titus" (http://www.newadvent.org/cathen/14727b.htm).
[10] Steven J. DeRose, "A Statistical Analysis of Certain Linguistic Arguments Concerning the Authorship of the Pastoral Epistles." Honors thesis, Brown University, 1982. Terry L. Wilder, "A Brief Defense of the Pastoral Epistles' Authenticity". Midwestern Journal of Theology 2.1 (Fall 2003), 38-4. (on-line (http://www.mbts.edu/pdfs/academics/wilder.pdf))
[11] Mark Harding, What are they saying about the Pastoral epistles?, Paulist Press, 2001, page 12. ISBN 0809139758, 9780809139750.
[12] Article on Hapax Legomena in The Jewish Encyclopedia (http://www.jewishencyclopedia.com/view.jsp?artid=268&letter=H). Includes a list of all the Old Testament hapax legomena, by book.
[13] D. Jurafsky and J.H. Martin (2009). Speech and Language Processing. Prentice Hall.
[14] "Ark, Design and Size", Aid to Bible Understanding, Watchtower Bible and Tract Society, 1971.
[15] e.g. Richard Bauckham, The Jewish world around the New Testament: collected essays I, p431, 2008: "...a New Testament hapax, which occurs 19 times in Hermas..."
[16] John F. Walvoord and Roy B. Zuck, The Bible Knowledge Commentary: New Testament Edition, David C. Cook, 1983, page 860, ISBN 0882078127.
[17] Orhan Elmaz, "Die Interpretationsgeschichte der koranischen Hapaxlegomena." Doctoral thesis, University of Vienna, 2008, page 29.
[18] http://www.javaforge.com/project/4269

Article source: English Wikipedia, "Hapax legomenon" (oldid=427482776).
License: Creative Commons Attribution-Share Alike 3.0 Unported, http://creativecommons.org/licenses/by-sa/3.0/
