Models of Grammar Evolution:
Evolving English
Markus Gerstel
Lincoln College

Supervisors: Dr Stephen Clark (Computing Laboratory)
             Prof. Jotun Hein (Department of Statistics)

Thesis submitted in partial fulfilment of the requirements for the degree of MSc in Computer Science
September 2008



I would like to thank in particular my supervisors Stephen Clark and Jotun Hein, as well as Rune Lyngsø
and Ferenc Huszár and everyone at our presentation sessions, for many insightful and helpful comments.
This work was supported by stipend D/06/48365 from the German Academic Exchange Service, Bonn, Germany.



Table of Contents
Introduction                                                       4
1.1.   Cross-Language Structure                                    5
1.2.   Models of Grammar Evolution – Report Structure              6

Background                                                         8
2.1.   Linguistic Motivation                                       9
2.1.1.   Retracing Language Evolution                              9
2.1.2.   Modelling English Grammar                                10
2.2.   Existing Language Evolution Simulations                    12
2.3.   Project Aims                                               12

Formalities                                                       14
3.1.   Modelling a Probabilistic Context-Free Grammar             15
3.1.1.   Context-Free Grammars                                    15
3.1.2.   Probabilistic Context-Free Grammars                      15
3.1.3.   Limitations of Natural Language CFGs/PCFGs               16
3.1.4.   Beyond Context-Free                                      17
3.1.5.   Inversion Transduction Grammars                          17
3.2.   Treebanks as Grammar Sources                               18
3.2.1.   Penn Treebank                                            19
3.2.2.   CCG Bank                                                 21
3.2.3.   Penn Chinese Treebank                                    22
3.2.4.   German Treebank TIGER                                    22
3.2.5.   German Treebank TüBa-D/S                                 23
3.3.   Compatibility of Language Models                           23
3.4.   Comparability of Language Models                           25
3.5.   Introducing N-Gram Models                                  26
3.6.   Comparing N-Gram Models                                    28
3.7.   Distance Learning                                          29
3.8.   Language Evolution Model – An Overview                     30

Details & Results                                                 31
4.1.   Measuring Distances                                        32
4.2.   Modifying the English Language                             32
4.2.1.   Adding and Removing Rules                                33
4.2.2.   Change Across Rules                                      34
4.2.3.   Change Within Rules                                      35
4.3.   Directed Evolution                                         35
4.4.   Interpreting Results                                       37

Conclusion & Outlook                                              39

References                                                        42

Appendix                                                          45
7.1.   The Penn Treebank Tagset                                   46
7.1.1.   Phrase Level Annotation                                  46
7.1.2.   Clause Level Annotation                                  46
7.1.3.   Word Level Annotation                                    46
7.2.   Mapping Tagsets: Chinese/CTB to English/PTB                47
7.3.   Mapping Tagsets: German/STTS to English/PTB                49
7.4.   Code: evolution.pl                                         53
7.5.   Code: ThsCorpora.pm                                        57
7.6.   Code: ThsFunctions.pm                                      63

Chapter 1 – Introduction

“(…) the best grounding for education is the Latin grammar. I say this, not because Latin is traditional and medieval, but simply because even a rudimentary knowledge of Latin cuts down the labour and pains of learning almost any other subject by at least 50 per cent. It is the key to the vocabulary and structure of all the Romance languages and to the structure of all the Teutonic languages (…)”
— Dorothy L. Sayers, The Lost Tools of Learning (1947), Oxford

1.1. Cross-Language Structure

Anyone who ever considered learning a new language may have come across this or a similar quote. It is quite common to find foreign words incorporated in a language. A lot of German words can be found in the English language (Lied, Bremsstrahlung, …) and vice versa (label, team, …). Some of these words are loanwords, borrowed via cultural exchange. Others have a common ancestry, for example hand/Hand. Some words take more complicated routes: ‘Shop’ in German is borrowed from the English ‘shop’, which in turn has a common ancestor with the German ‘Schuppen’. Tracing the history of these words is very easy if one has an etymological dictionary. Even more visible than words are borrowed letters or digits, like Umlauts or the Arabic numerals.

However, once it comes to grammatical constructions it gets more difficult. A bilingual speaker will usually only notice the cases in which grammatical constructions differ between languages when thinking up a sentence in his native tongue and translating this sentence word for word into a foreign language, resulting in an awkward or ungrammatical sentence. On the other hand a fluent speaker might notice and appreciate sentence structures shared between both languages. The following examples illustrate such a structural relation:

Ich habe dieses Buch gelesen. (German)
Ik heb dit boek gelezen. (Dutch)
(*) I have this book read. (English)

All three sentences express the same fact. They are word for word translations of each other in three West Germanic languages. The German and Dutch sentences are grammatically correct, and if you ever spoke to someone from these countries you may have heard something close to the (ungrammatical) English sentence. (In linguistics an asterisk in front of a phrase indicates that the phrase is ungrammatical.) Correcting that sentence is easy: moving the word ‘read’ in front of ‘this book’ solves the problem. And even when sentence constructions differ it is not always obvious how exactly the sentence structure has to be changed during translation. But how would one know which word to pick and where to put it?

Every sentence has internal syntactic structure, which can be described by a syntax tree. In a syntax tree the words of the sentence can be found at the leaf nodes. The nodes directly connected to leaf nodes are annotated with so-called part of speech tags (POS tags) that describe the function the corresponding word takes in a sentence. For example in the noun phrase (NP) ‘this book’ the word ‘this’ is a determiner (DT) and ‘book’ is a singular noun (NN). Inner nodes, or branch nodes, indicate the function of the phrase beneath. The root ‘S’ of each tree describes the clause in its entirety as a sentence. A full description of the entire syntax tree annotation can be found in the appendix, section 7.1.

Fig. 1 – Syntax tree for the correct English sentence (“I have read this book.”)
Fig. 2 – Syntax tree for the correct German and incorrect English sentence (“Ich habe dieses Buch gelesen.” / (*) “I have this book read.”)

Syntax trees can also be understood as derivation trees: beginning with a starting symbol, in these cases ‘S’, the tree is expanded by derivation rules (e.g. S → NP VP, …) until at the leaves only terminal symbols (words) remain. The tuple of a starting symbol and sets of non-terminals, terminals and these derivation rules constitutes a Context-Free Grammar (CFG). Counting the occurrences of each derivation rule and assigning it a probability results in a Probabilistic Context-Free Grammar. The required change between the German and English sentence can now be pinpointed easily by comparing the two syntax trees: apart from the words, both trees differ only in the ordering of the subtrees of the second, lower verb phrase (VP) node.

1.2. Models of Grammar Evolution – Report Structure

This project aims to create a novel evolutionary model for Probabilistic Context-Free Grammars (PCFG) of natural languages. The question of whether such a CFG model can give an accurate representation of the English language, or in fact any other natural language, will be explored. Chapter 2 will give a more detailed account of the linguistic and computational motivation behind this project.

In chapter 3 the necessary formal techniques will be introduced. Beginning with the formal definition of CFGs/PCFGs and an introduction to treebanks as sources for these grammars, this chapter will also reveal weaknesses and limitations of these CFG models, especially in comparative situations. A way around these limitations will be presented with the introduction of n-gram models. A possible distance metric between two languages will be derived and finally an overview of the entire evolutionary model will be provided.

In chapter 4 practical results will be presented. The proposed distance metric will be tested on four language models derived from English, German and Chinese treebanks. The English language model will then be subjected to the directed evolutionary model to produce Modified English. The idea is to change the English language model in such a way as to move it closer to German, a different language. Finally the changes made to the English language model will be analysed and the results interpreted.

Chapter 5 will summarize the main findings of this project. It will also give suggestions for continuation projects.

Chapter 2 – Background

“What‘s past is prologue.”
— Antonio in William Shakespeare, The Tempest (1611), Act II, scene I

2.1. Linguistic Motivation

Germanic languages like English and German both evolved from a hypothetical common ancestor, Proto-Germanic. In contrast French, Welsh and Farsi (Persian) are not Germanic languages. All these languages however do have a common root in the (again hypothetical) Proto-Indo-European language.

Fig. 3 – Part of the Proto-Indo-European language family tree (Proto-Indo-European branches into Celtic, Germanic and Indo-Iranian; Brythonic: Welsh; Iranian: Farsi; North Germanic: Swedish, Icelandic; West Germanic: Dutch, English, High German)

There is no direct evidence for this language. According to linguistic theory this unified Indo-European language existed somewhere between 10,000 BC and 3,000 BC, and not much is known about how it sounded or how it could have looked like in writing. It may be possible that this Proto-Indo-European language existed but was never used in writing. The earliest evidence for something that could be considered writing emerged around 6,600 BC – and this happened in ancient China, not Europe. This however was not writing with linguistic content. There have been two notable attempts to create a text in reconstructed Proto-Indo-European: Schleicher’s fable originally by A. Schleicher (1868) and “The king and the god” by S. K. Sen, E. P. Hamp et al. (1994).

2.1.1. Retracing Language Evolution

The two main methods by which knowledge about the Proto-Indo-European language was acquired are called internal reconstruction and the comparative method.

The internal reconstruction method looks at the current state of variation within a language and postulates that exceptions to a general case are newer than the general case. For example in Latin the regular perfect forms of verbs are created from the verb’s stem and the added suffix ‘si’. But there are some exceptions to that rule, namely irregular verbs like ‘scribere’.

Fig. 4 – Latin verb forms (carpo/carpere/carpsi; duco/ducere/duxi; figo/figere/fixi, not (*)figsi; scribo/scribere/scripsi, not (*)scribsi)

The internal reconstruction method postulates the original verb forms (*)‘figsi’ and (*)‘scribsi’, and

then tries to reconstruct the exceptions from their original forms. This reconstruction has to satisfy two plausibility principles: Firstly, the changes have to be natural, in other words the required sound changes have to be regular, simple and language-universal. Secondly, the postulate has to satisfy Occam’s Razor: there can not be a different postulate that has the same outcome but makes fewer assumptions (i.e. is simpler). In this case the postulated change is that devoicing occurs in front of voiceless consonants. Hypothetical languages created by means of internal reconstruction carry the prefix ‘pre-’ (e.g. Pre-Germanic).

The comparative method instead compares variation between two or more languages. If a systematic correspondence of some feature can be established, then a hypothetical, common ancestral form can be established.

Fig. 5 – Sound correspondences (day/dies, deus/divine, diabolus/devil, ten/decem not (*)tecem, two/duo not (*)tuo)

The hypothetical common ancestor of the sound correspondence example in figure 5 could either be ‘d’, ‘t’ or an unobserved element ‘X’. Again the plausibility principles are applied: a change d → t is quite uncommon, but the change t → d can be observed quite frequently in other languages. A third element ‘X’ is not required and would violate Occam’s Razor. Hypothetical languages created by means of the comparative method carry the prefix ‘proto-’ (e.g. Proto-Indo-European). (Example and explanation from A. Lohöfer, UE History of English, Session 2: Linguistic Reconstruction, April 2007.)

While the immediate focus of this thesis lies on grammar evolution, there is not much information or evidence – certainly not enough for any kind of statistical approach – for any of the Proto-languages, so the goal of the project will not be to create an accurate model of historic language evolution. However, the generic evolution model presented in the following chapters can at a later stage be modified to accommodate accurate historic grammar evolution. A prerequisite for that is the availability of treebanks that contain information about historic grammar evolution, which is currently not the case.

2.1.2. Modelling English Grammar

Before attempting to describe how something changes it is usually desirable to know the current state of affairs. With natural languages this may be more difficult than it seems. Arguably it is impossible to give a complete and accurate grammar for the English language. This is caused by two separate problems. The first problem lies in the definition of the English language itself. What should be considered ‘English’? The differences between American English and British English are just the tip of the iceberg.

There are differences in the dialects of English spoken in Yorkshire and London. The different settings of place, time and even social hierarchy result in different definitions of English. William Shakespeare’s works are certainly considered well-formed English – but times have changed, and so has the language. Maybe language should be defined, prescribed instead of described, by some authority that would itself rely only on highest quality works from selected writers.

The second problem is the inherent richness of language. This can be illustrated by the simple question “How many words are there in the English language?” While it is a myth that the Inuit have 400 different words for snow, it is certainly possible to give 400 English compounds with snow (snowflake, snowman, snowball, snowmobile, …). If one considers technical terms of any sort, especially chemical terms, then it is usually possible to construct an infinite amount of compounds. Even when one analyses a copy of the entire Wikipedia with over 2.5 million articles and keeps a list of seen words, that list will always continue to grow, for example with ‘words’ like blog, vlog, mlog, blogosphere and blawg. The other extreme would be to only consider words found in a certain dictionary. An approach like this of course fails to keep up with new word creations.

Fig. 6 – Increase in the number of seen distinct English words while analysing the Penn Treebank

Working with a statistical method on something as organic and infinitely complex as language clearly requires more pragmatism than perfectionism. The grammar for the English language in this project will be derived from the Penn Treebank, an annotated corpus built from articles in the Wall Street Journal. It will neither be a 100% complete nor a 100% accurate Probabilistic Context-Free Grammar for the English language. Noam Chomsky (1957) argued that English is so complex that it cannot be described by any context-free grammar, probabilistic or not. While there is proof that this property holds for some languages like Swiss German, the question of whether this is true for English is still open (e.g. M. Mohri and R. Sproat, 2006).

2.2. Existing Language Evolution Simulations

Computer simulation has been used for very early stages of language evolution: the emergence of simple signalling systems allowing shared communication, the appearance of syntax and syntactic universals. The question of a required minimum complexity for any natural language has been studied, as well as the question of how a language can be learned at all. Simulation on the origins of language usually involves an agent-based view, in which independent entities form a society and interact with and learn from each other. These agents can be represented by sets of rules or neural networks. Evolution then can be introduced by e.g. genetic algorithms. Most of these simulations however involve very abstract systems that bear no strong resemblance to any existing language (A. Cangelosi and D. Parisi, 2002; H. Turner, 2002). Others are much closer to an existing language: M. Hare and J. L. Elman (1995) applied a machine learning technique to the morphology of English verbs in an effort to simulate evolution on a word level. It appears that so far there is no existing research on syntax level evolutionary simulation on one or more natural languages.

2.3. Project Aims

A purely linguistic approach to estimate the distance between languages usually relies on one or more of these three methods: The first method is word analysis. The Latin pater became the English father and German Vater. Word analysis becomes more difficult the further apart two languages are: the English ‘sister’ descends from the same root as the Hindi ‘bahan’, but this is not immediately obvious (via Sanskrit ‘svasar’, cf. Old English ‘sweostor’). This kind of analysis relies on known rules of how spelling and pronunciation changed over time (Lohöfer, 2007). It also requires a direct connection between two languages. The second method is estimating the distance by looking at the language family. The language family tree shown on page 9 makes it quite clear that English is much closer related to German than to Welsh. Nothing however can be said about whether Dutch or English is closer to Swedish. The final method is the comparison of parameters in the given languages. These parameters can for example be the number of genders or the number of cases occurring in a language,

as well as more complex properties like word ordering. Ryder (2006) attempts to recreate language trees by applying concepts from molecular biology to a set of 109 parameters per language. With these approaches it is still quite difficult to decide whether English is closer to Welsh or to Farsi.

This project enables a novel statistical, semi-automatic approach that will work even between languages that are not connected by the language tree (e.g. English to Chinese). It will also allow measuring an effective distance between two languages: if a language evolved independently from English but had the same grammatical structure, the reported difference between these languages would be very small. Later analysis would then show how to manipulate one language to get closer to grammatical structures of the other. A model to match grammars between languages may give practical results that can then be reused in other fields working with language: the most obvious application to benefit from a set of fixed rewriting rules would be statistical machine translation, which could incorporate these rules in its model.

Chapter 3 – Formalities

“Colorless green ideas sleep furiously.”
— Noam Chomsky, Syntactic Structures (1957)

3.1. Modelling a Probabilistic Context-Free Grammar

3.1.1. Context-Free Grammars

A context-free grammar (CFG) is usually defined as a 4-tuple (N, T, Σ, R) with N being a set of nonterminal symbols, T a set of terminal symbols, Σ a starting symbol that is also an element of N, and R a (finite) set of derivation rules. These derivation rules have the form X → Y, where X is an element of N and Y is any combination of elements from N and T:

  G_CFG := (N, T, Σ, R)
  N ∩ T = ∅,  Σ ∈ N,
  (X → Y) ∈ R  ⇒  X ∈ N  ∧  Y ∈ (N ∪ T)*

Fig. 7 – CFG definition

A word is a string that contains only terminal symbols. The language corresponding to the grammar is then defined as the (possibly infinite) set of words that can be generated by this grammar. In the context of this project the set of terminal symbols will be the set of parts of speech (POS) tags. In the resulting context-free language each word, a stream of terminal symbols, will therefore correspond to an entire phrase in the natural language. The set of nonterminal symbols will be the set of all phrase level and sentence level annotations plus a dedicated starting symbol Σ. The use of a dedicated starting symbol together with rules like Σ → S is required, since not every allowed phrase is a sentence. For example headlines or titles (“Models of Grammar Evolution”) are considered phrases, but not sentences.

3.1.2. Probabilistic Context-Free Grammars

In a probabilistic context-free grammar there is a probability P(Y|X) defined for every rule X → Y. In a generative view this corresponds to the probability that one of the possible derivation rules is chosen. The probabilities of all rules with the same left hand side X always sum up to 1. Since the grammars in this project will be derived from treebanks, the probabilities will be calculated by taking counts and using a simple maximum likelihood estimation:

  P(Y|X) = #(X → Y) / #(X)

This also means that it is now possible to calculate the probability of a specific derivation tree by multiplying the probabilities of all n involved derivation rules:

  P(DerivationTree) = P(Y1|X1) × P(Y2|X2) × … × P(Yn|Xn)

To account for ambiguous trees, the probability for a specific phrase is then defined to be the sum of the probabilities of all distinct derivation trees leading to that phrase.
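To make the estimation concrete, here is a minimal Perl sketch (Perl being the language of the code listed in the appendix) that turns rule counts into maximum likelihood probabilities and multiplies them along one derivation. The counts and the derivation below are invented for illustration and are not taken from the treebank.

#!/usr/bin/perl
use strict;
use warnings;

# Observed rule counts #(X -> Y), invented for illustration.
my %count = (
    'S'  => { 'NP VP'  => 120 },
    'NP' => { 'PRP'    => 40, 'DT NN'  => 60 },
    'VP' => { 'VBP VP' => 25, 'VBN NP' => 35 },
);

# Maximum likelihood estimation: P(Y|X) = #(X -> Y) / #(X)
my %prob;
for my $lhs (keys %count) {
    my $total = 0;
    $total += $_ for values %{ $count{$lhs} };
    $prob{$lhs}{$_} = $count{$lhs}{$_} / $total for keys %{ $count{$lhs} };
}

# Probability of one specific derivation tree: product over all used rules.
sub derivation_probability {
    my @rules = @_;                              # list of [LHS, RHS] pairs
    my $p = 1;
    $p *= $prob{ $_->[0] }{ $_->[1] } for @rules;
    return $p;
}

# Derivation of the POS sequence "PRP VBP VBN DT NN" ("I have read this book.")
printf "P(tree) = %.4f\n", derivation_probability(
    [ 'S', 'NP VP' ], [ 'NP', 'PRP' ], [ 'VP', 'VBP VP' ],
    [ 'VP', 'VBN NP' ], [ 'NP', 'DT NN' ],
);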

Probabilistic Context-Free Grammars are also used in a range of fields outside of Computer Science and Linguistics; for example RNA structures are routinely described by PCFGs (cf. D. Searls, 1993).

3.1.3. Limitations of Natural Language CFGs/PCFGs

A pure context-free grammar as described so far is very simply structured. Natural languages however can be very intricate and sometimes even appear to defy logic. A plain CFG will therefore lead to a lot of cases where the grammar diverges from real language usage. This is most obvious if a PCFG/CFG is used to create sentences consisting of normal words.

There is for example no notion of agreement. There is no notion of countability in POS tags, so noun phrases like (*)“one dogs” or (*)“two dog” could be generated. One way to deal with this kind of problem is to encode more information into the set of nonterminal symbols: the original rule NP → CD NP, where a noun phrase derives to a number plus a noun phrase, could be split into NP → CD-singular NP-singular and NP → CD-plural NP-plural. This means that the number of rules involving noun phrases essentially doubles. Verbs also may change depending on the number of the subject. Since the information from the nouns has to be ‘transported’ to the verb, rules like S → NP VP will have to be duplicated, and some verb rules will have to be split as well. Assume the singular/plural problem is completely solved – the phrase (*)“much dogs” could still be generated. So the process starts again, duplicating rules for countable and uncountable phrases. Another problem is the possibility of repeated derivation: the perfectly fine rule NP → DT NP in “the book” could be applied a second time and would then lead to (*)“the the book”. Again this can be solved by increasing the information in the nonterminal symbols, e.g. by creating another symbol for a noun phrase that cannot be derived to a leading determiner. Finally there are expressions that are just not idiomatic in a language: “Merry Christmas!” is fine, (*)“Merry Birthday!” is not. But both “Happy Christmas!” and “Happy Birthday!” are acceptable.

Since in this project the PCFG generates POS tags instead of actual words, it is not affected by some of these problems. Number agreement however does exist, for example between third person singular verbs (‘VBZ’) and singular nouns (‘NN’/‘NNP’). Encoding all this information into a context-free grammar would lead to an incredibly complex model. There are of course consequences to putting more information into the grammar symbols: the number of symbols and rules increases dramatically. This increases resource demand for computations.

And finally any statistical approach will require a lot more data, or data sparsity will become a problem. The grammars used in this project will therefore not be augmented by any additional information beyond the POS tag itself.

3.1.4. Beyond Context-Free

One way to increase the capabilities of a grammar is to reduce the constraints on its set of rules. If context-free grammars are not powerful enough to accurately model natural languages, the logical step would be to use context-sensitive grammars. In context-sensitive grammars the derivation rules can take the context of a nonterminal symbol into account. But the increased generative power of context-sensitive grammars also results in increased operation complexity. For example the word problem can be solved in O(n³) for context-free languages with the CYK algorithm; the same problem for context-sensitive languages is solvable in O(|N|ⁿ) = O(2ⁿ).

In order to avoid the exponential complexity of context-sensitive grammars, research focused on extending context-free grammars instead to reach a new class of so-called mildly context-sensitive grammars. These extensions are chosen carefully so as to be as powerful as necessary but as weak as possible. Amongst the independently proposed extensions were Combinatory Categorial Grammars (CCG), Head Grammars, Linear Indexed Grammars and Tree Adjoining Grammars. Interestingly enough, it was later proven by K. Vijay-Shanker and D. Weir (1994) that these four extensions all describe the same class of languages. They all describe languages that are more powerful than context-free languages without incurring the large computational overhead of context-sensitive grammars. While a mildly context-sensitive language model may approximate the English language better than a plain CFG model, it remains unclear whether it is more useful in the context of this project. Since there is no authoritative source for an English grammar (in the sense of a formal language), even a CCG/English grammar would only be an approximation. In section 3.2.2 the CCG treebank will be analysed and its suitability for this project examined.

3.1.5. Inversion Transduction Grammars

Dekai Wu (1997) introduced a novel stochastic inversion transduction grammar (ITG) formalism. This is basically a modified CFG that can simultaneously generate equivalent sentences in two different languages. This is achieved by extending the CFG definition to a 5-tuple (N, T1, T2, Σ, R). The definitions of N and Σ stay unchanged; T1 and T2 are now the sets of terminal symbols in the two respective languages.

. Treebanks can either be created completely manually or semi-automatically with a parser suggesting a syntactic structure that then has to be checked by a linguist. but there is usually a tendency to prefer one word ordering. but for the second sentence the rule gets inverted. . (1) (2) (3) (4) (5) (6) (7) Σ NP NP IN DT NNP NN → → → → → → → [IN NP] [DT NP] NNP NN [of/de] [the/] [gallic/gallico] [war/bello] Fig. Treebanks as Grammar Sources Treebanks are collections of sentences. These pairs can consist of exactly one symbol from set T1 and one GIT G := (N. 9 – Sample ITG derivation rules →1 →2 →5 →3 →4 →6 →7 ( ( ( ( ( ( ( ( Σ IN NP IN DT NP IN the NP IN the NNP NN of the NNP NN of the gallic NN of the gallic war . taggers and other machine learning software working on Natural Language Processing.18 Chapter 3 – Formalities The derivation rules have either the form X Ž [Y] or X Ž <Y>. It is not possible to move the preposition ‘of/de  ’ into the inner noun phrase between ‘gallic  ’ and ‘war  ’ in this example. The inversion operator <> means the derivation is applied to the first sentence as usual. They are a very useful tool in computational linguistics to train and test parsers. . X   is any nonterminal symbol in N   while Y   is a string that can contain nonterminal symbols and pairs of terminal symbols. Either way it is a very labour-intensive and timeconsuming process. Latin is a free-word-order language. one sentence in each language. . (X → [Y ]) ∨ (X → Y ) ∈ R ⇒ X∈N � ∗ ∧ Y ∈ N ∪ (T1 ∪ {}) × (T2 ∪ {}) symbol from T2. 3. R) N � ∩ (T1 ∪ T2 ) = ∅. with a few exceptions monolingual.2. T1 . The most important part of the ITG are the bracket operators. . . Derivations are always applied to a pair of sentences. so ITGs will still work to some IN of/de NP DT the/ NP NNP NN gallic/gallico war/bello Fig. 11 – A syntax tree showing both sentence parses with the triangle marking the inversion operator extent. . 8 – ITG definition multaneously. Σ. Σ IN NP IN DT NP IN NP IN NN NNP de NN NNP de NN gallico de bello gallico ) ) ) ) ) ) ) ) Fig. Theoretically ITGs are of limited use with languages that have a very loose or even completely free word order. Σ ∈ N. The operator []  means the derivation is applied to both sentences normally. but either (or both) can be replaced by the empty word. annotated with syntactic information. 10 – Creating the sentence in both languages simultaneously by applying the ITG derivation rules Σ An ITG can not express all possible word mappings. si- Fig. . T2 .

3.2.1. Penn Treebank

The Penn Treebank is the best known English language treebank. Work on the Penn Treebank started in 1989 at the University of Pennsylvania. In its latest revision it contains more than 7 million words. This project will work on the sections 02–21 of the Wall Street Journal corpus. These sections contain 1.014.129 words in 39.832 sentences. Every sentence is annotated with a syntax tree. A complete list of the 36 POS tags (without punctuation) as well as the clause and phrase annotation can be found in the appendix, section 7.1.

The annotation also contains sentence markers, functional tags, null elements and even a few semantic roles. In the question “What is Tim eating?” there is such a null element: whatever it is that Tim is eating, it should follow the verb ‘eating’ in a normal sentence. In a question however the food gets replaced by the word ‘what’, which then moves to the front of the sentence. The original place of the word is marked by the null element *T*. Both the null element and the corresponding wh-word receive the index number 1. Additionally ‘Tim’ is identified as subject and marked with ‘-SBJ’.

  (SBARQ (WHNP-1 What)
         (SQ (VBZ is)
             (NP-SBJ Tim)
             (VP (VBG eating)
                 (NP (-NONE- *T*-1))))
         (. ?))

Fig. 12 – Example sentence in Penn Treebank annotation
Fig. 13 – Syntax tree of the example sentence

For the grammar model these sentence markers, null elements, functional tags and the words themselves will be filtered. What remains are the three derivation rules SBARQ → WHNP SQ, SQ → VBZ NP VP and VP → VBG NP. With these simplifications a total of 10.330 distinct derivation rules can be extracted from the analysed treebank sections.

Is there an upper bound for the number of distinct rules? By plotting the count of seen distinct derivation rules against the number of analysed sentences it can easily be shown that the rule count is not yet reaching a plateau. The growth behaviour is remarkably similar to that of the number of seen distinct words. This indicates that there are still more new, unseen rules to be learned.

Fig. 14 – Increase in the number of seen distinct derivation rules while analysing the Penn Treebank
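As a hypothetical sketch of how such rules can be read off the bracketed trees (this is not the extraction code from the appendix, and it assumes functional tags, indices and null elements have already been stripped):

#!/usr/bin/perl
use strict;
use warnings;

# Count distinct derivation rules from Penn-Treebank-style bracketed trees.
my %rules;

sub read_tree {                 # "(S (NP (PRP I)) ...)" -> nested array refs
    my ($text) = @_;
    my @tokens = $text =~ /\(|\)|[^\s()]+/g;
    my (@stack, $root);
    for my $tok (@tokens) {
        if    ($tok eq '(') { push @stack, [] }
        elsif ($tok eq ')') {
            my $node = pop @stack;
            if (@stack) { push @{ $stack[-1] }, $node } else { $root = $node }
        }
        else { push @{ $stack[-1] }, $tok }        # node label or word
    }
    return $root;
}

sub collect_rules {
    my ($node) = @_;
    my ($label, @children) = @$node;
    return unless @children and ref $children[0];  # POS tag node: stop here
    $rules{ "$label -> " . join ' ', map { $_->[0] } @children }++;
    collect_rules($_) for @children;
}

collect_rules(read_tree(
    '(S (NP (PRP I)) (VP (VBP have) (VP (VBN read) (NP (DT this) (NN book)))))'
));
printf "%-12s seen %d time(s)\n", $_, $rules{$_} for sort keys %rules;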

Depending on the number of nonterminals N and terminals T the upper bound can be formulated as

  |N| · ( |N ∪ T| + |N ∪ T|² + … + |N ∪ T|ᵏ )

with k being the maximum length of the right hand side of a derivation rule. The Penn Treebank annotation guide does not impose a maximum length. The longest seen derivation rule has length 12 – ignoring one rule with an extraordinary length of 27. The above formula with k = 12 gives a number of 1.03 × 10²³ possible rule combinations. Approximating the real distribution becomes difficult due to data sparsity. Long derivation rules will inevitably lead to a long tail in the rule distribution: a lot of the rules are only observed a few times.

Fig. 15 – Number of distinct derivation rules seen in the Penn Treebank, grouped by length of their right hand side

The reason why the Penn Treebank has very long derivation rules lies in its flat annotation structure. The term “New York Stock Exchange” for example would just result in one rule, NP → NNP NNP NNP NNP, instead of possibly up to three rules. Another example that does not involve a trademark is the rule VP → (VB spin) (PRT off) (NP its textiles operations) (PP to existing shareholders) (PP in a restructuring) (S to boost shareholder value) of length 6.

Fig. 16/17 – Example of structural flatness in the Penn Treebank. On the left the syntax tree of the term “New York Stock Exchange” as it appears in the treebank; an alternative syntax tree is shown on the right.

Having only short rules however is not necessarily better: a maximum rule length of 2 would result in a maximum of 104.832 possible rules. Even though approximating the real distribution becomes much easier due to the reduced number of possible rules, the amount of information carried within a single rule diminishes.

3.2.2. CCG Bank

The Combinatory Categorial Grammar (CCG) Bank is based on the Penn Treebank. The translation algorithm described in Hockenmaier and Steedman (2002) creates a categorial derivation tree for each sentence. In a CCG each word is assigned a category, a syntactic type. These types are either a primitive category (e.g. S, NP, PP, N) or functions. The set of primitive categories is very small and there are only a few function application rules. Function categories are a combination of primitive categories, the forward application operator (/), the backward application operator (\) and appropriate bracketing to establish operator precedence.

In the sentence ‘Tim laughed.’ Tim will be assigned the primitive category NP. The intransitive verb laughed is an entity that will expect one argument (NP), specifically to its left. Once this argument is applied the result will be a sentence (S). Therefore the verb will be tagged S\NP, read: a sentence that is missing an NP to its left. In the sentence ‘IBM bought Lotus’ the transitive verb bought expects two arguments: something that is buying and something that is being bought. Both subject and object are again of type NP. ‘Bought’ expects one argument to its right and one to its left, so the category of ‘bought’ can be defined as (S\NP)/NP. The outcome of this application will result in a category that expects another NP to its left. This type is the same (S\NP) as that of the intransitive verb encountered previously. As in a normal syntax tree, ‘bought Lotus’ is considered a verb phrase on its own. This closer connection dictates operator precedence: ‘Lotus’ has to be applied first.

A selected set of the possible function applications is shown in figure 18:

  (1) X/Y  Y    ⇒ X
  (2) Y    X\Y  ⇒ X
  (3) X/Y  Y/Z  ⇒ X/Z
  (4) X         ⇒ Y/(Y\X)

Fig. 18 – Sample function applications

X and Y themselves stand for any possible category. The forward application (1) means that a function (X/Y) takes an argument (Y) to its right and returns type (X). By applying these rules the complete derivation tree can be built from the bottom up. By design CCG derivation trees can only have a branching factor of 1 or 2. Derivation rules can be deduced by reading these rules from right to left: X → X/Y Y. CCG has the advantage that, by following function application, semantic structure within the sentence can be extracted.

Fig. 19/20 – Example sentence with regular syntax tree on the left; to the right the same sentence annotated with CCG (IBM := NP, bought := (S\NP)/NP, Lotus := NP)

But there are two major drawbacks making CCG unsuitable for this project. Firstly the information is stored at the word level. The derivation tree is defined by the categories, not the other way around. This also causes categories to become increasingly complex in larger sentences. An example from the CCG bank is the category (((S\NP)/((S\NP)/NP))/NP)\((((S\NP)/((S\NP)/NP))/NP)/NP) assigned to the phrase ‘BPC residents’. The number of possible rules therefore is again unbounded, which leads to data sparsity problems. Too much encoded information will also make changing the grammar in a consistent way across many sentences quite difficult.

The second drawback is that modifying the language is much more complicated than with a simple CFG. Since the derivation tree is completely dependent on the word level categories, those have to be modified. Furthermore in more complex sentences there are interdependencies between words. If one category is changed, others have to be changed as well so that a valid tree can still be built. One small change can cause a ripple effect throughout the tree. This will make it very difficult to change the grammar in a controlled way to e.g. get closer to another language model.

Fig. 21 – Word order change
Fig. 22 – Semantic change modifying two categories

3.2.3. Penn Chinese Treebank

The Penn Chinese Treebank was also created at the University of Pennsylvania, and is compiled from sources such as Xinhua newswire, Sinorama news magazine and Hong Kong News. It contains approximately 550.000 words. The treebank uses 33 POS tags (without punctuation) and uses a format similar to that of the Penn Treebank. It is freely available for research purposes.

Fig. 23 – Example from the Chinese Penn Treebank

3.2.4. German Treebank TIGER

The TIGER Treebank was a joint project of institutes at the Saarland University, the Universität Stuttgart and the University of Potsdam. In its latest version 2.1 it contains 890.000 tokens in 50.000 sentences from the German newspaper Frankfurter Rundschau.

The treebank was semi-automatically POS-tagged and annotated with syntactic structure. Its 54 POS tags represent a slightly modified version of the STTS tagset, a standard tagset used by all the major German treebanks. STTS was developed at the University of Stuttgart and the University of Tübingen. Nearly all words in the TIGER treebank are additionally annotated with detailed information about their case, gender, person, number, degree, tense and mood. It is also freely available for scientific purposes.

  <graph root="s4231_VROOT" discontinuous="true">
    <terminals>
      <t id="s4231_1" word="In" lemma="in" pos="APPR" morph="--"/>
      <t id="s4231_2" word="Japan" lemma="Japan" pos="NE" morph="Dat.Sg.Neut"/>
      <t id="s4231_3" word="wird" lemma="werden" pos="VAFIN" morph="3.Sg.Pres.Ind"/>
      <t id="s4231_4" word="offenbar" lemma="offenbar" pos="ADJD" morph="Pos"/>
      (...)
    <nonterminals>
      <nt id="s4231_500" cat="PP">
        <edge label="AC" idref="s4231_1"/>
        <edge label="NK" idref="s4231_2"/>
      </nt>

Fig. 24 – Example from the TIGER Treebank in XML format

3.2.5. German Treebank TüBa-D/S

The Tübingen Treebank of Spoken German (TüBa-D/S) is a syntactically annotated corpus based on spontaneous spoken dialogues, annotated with the STTS tagset. It consists of around 38.000 sentences with about 360.000 words. It does however not come with the additional information available in the TIGER treebank.

3.3. Compatibility of Language Models

Working with data from more than one treebank inevitably leads to the problem that every treebank likes to use its own format, annotation guidelines and tagset. While overcoming the format barrier is merely a programming exercise, working with disparate tagsets can be tedious. To be able to cross the tagset barrier a translation, a mapping from the tags in one language to the tags in the other, is required. The way this mapping will be accomplished in this project is by means of an intermediate (common) POS tagset and two surjective and total translation functions. This common tagset will be kept as close to the English tagset as possible; the only difference to the English tagset will be that elements that cannot be reached from the foreign language will be removed, merged with reachable elements, or as a last resort collected in a special catch-all tag that will be used for all tags that have no correspondent.

While the Penn Treebank offers 36 POS tags (ignoring punctuation tags), the Chinese Treebank has 33 POS tags. The tagset used in the Chinese Treebank was clearly inspired by the Penn Treebank tagset. However out of these 33 POS tags only 31 tags
actually appear in the corpus, out of which 8 are particles and 6 are classified ‘other’. The remaining
17 tags are sometimes quite different from the English set. The Chinese word         can, depending
on context, be translated to destroy, destroys, destroyed, destroying or destruction. Consequently
the Chinese treebank only has 4 POS tags to indicate different verb forms, but these do not map
directly to any of the 7 POS tags of the English treebank. There is a suggested mapping from Chinese
tags to English in Xia (2000), but it is neither total nor surjective nor a function.
A small excerpt of the mapping procedure is presented in figure 25.

Fig. 25 – Example of POS tag mapping functions: the English (PTB) tags VB and MD together with the
Chinese (CTB) tags VV and VA form one verb group, which is mapped to a single common tag; the PTB
tags NN and NNS together with the CTB tags NN and NT form the corresponding noun group.

The Chinese tags VA and VV can both mark verbs equivalent to the English VB. Additionally VV can
mark modal verbs, MD in English. Since it is impossible to distinguish whether a tagged Chinese word
corresponds to VB or MD, a single common tag will be assigned to the entire group. The two translation
functions will then map every involved tag to this new common tag. Likewise Chinese

noun tags lack number information, so they will be mapped to the English singular noun NN  . The
English plural noun NNS   stands isolated. Depending on the view the mapping has to be surjective
and total, so isolated tags are not allowed. NNS   is closest in meaning to the Chinese NN  , so
it is added to the noun cluster. Once all isolated tags are assigned to their closest equivalent (or a
catch-all common tag for tags without an appropriate equivalent) the entire connected component
is assigned a tag in the new common tagset.
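A minimal sketch of this clustering step, assuming the tag correspondences are given as pairs: connected components are found with a small union-find structure, and each component then becomes one tag of the common tagset. The correspondence pairs and the generated common tag names (C1, C2, …) below are illustrative only.

#!/usr/bin/perl
use strict;
use warnings;

# Tags that may correspond to each other are linked; every connected component
# of the resulting graph becomes one common tag. Small illustrative excerpt.
my @links = (
    [ 'PTB:VB', 'CTB:VV' ], [ 'PTB:VB', 'CTB:VA' ], [ 'PTB:MD',  'CTB:VV' ],
    [ 'PTB:NN', 'CTB:NN' ], [ 'PTB:NN', 'CTB:NT' ], [ 'PTB:NNS', 'CTB:NN' ],
);

my %parent;                                    # union-find over tag names
sub find {
    my ($x) = @_;
    $parent{$x} //= $x;
    $parent{$x} = find($parent{$x}) if $parent{$x} ne $x;
    return $parent{$x};
}
sub union { my ($a, $b) = @_; $parent{ find($a) } = find($b) }
union(@$_) for @links;

# Assign one common tag per connected component. Both translation functions
# (PTB -> common, CTB -> common) are total and surjective by construction.
my (%common, %name, $next);
for my $tag (sort keys %parent) {
    my $root = find($tag);
    $name{$root} //= 'C' . ++$next;            # e.g. C1 = {VB, MD, VV, VA}
    $common{$tag} = $name{$root};
}
printf "%-8s -> %s\n", $_, $common{$_} for sort keys %common;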
One possible mapping between English and Chinese tags created by this method is presented in
the appendix, chapter 7.2. In this case the common tagset consists of 18 tags. Generally a higher
number of common tags is expected to give more reliable results later on, as a higher number suggests a higher precision in what this cluster represents. A lower number of common tags represents
less information, and will make the languages appear closer together than they really are. If the common tagset were to contain only a single tag (i.e. WORD  ) then the involved languages would look
identical.
Trying to map CCG categories to POS tags in other languages poses huge difficulties. Due to the
possibly infinite number of distinct categories an automatic mapping process would be needed. Together with the already mentioned problems of modifying the grammar it was decided not to pursue
the use of CCG in this project any further.


The German STTS tagset is also quite different from the English tagset but it is arguably a lot
closer than the Chinese tagset. In fact the additional annotation in the TIGER treebank can be used
together with the German POS tag to find appropriate English tags to map to. Most of the time
a very exact matching, in some cases 1:1, is possible. The common tagset between English/Penn
Treebank and German/TIGER contains 32 tags.
Since TüBa-D/S is lacking the additional annotation of the TIGER treebank some information is
lost. Like with the Chinese treebank singular and plural cannot be distinguished. The third person
verb forms have to be unified with the regular ones, differentiation between different degrees of
adjectives is no longer possible. The resulting common tagset between English/PTB and German/
TüBa-D/S contains only 26 tags.

3.4. Comparability of Language Models

With the introduction of a common tagset it should now be possible to compare two languages.
Comparing languages is required to do directed evolution. Ideally one has a function that returns
a number indicating whether two language models are equal, or how far apart they are.

Fig. 26/27 – The same sentence (“Sue saw Mary”) in two different annotations

However comparing two CFG language models is harder than it seems. Strictly speaking the equivalence
problem for Context-Free Grammars is even undecidable.
With these two sentences from two language models it is not possible to deduce whether the
two models describe the same language. One way of comparing the two languages is to compare
the derivation trees of the sentences. The node annotation is a bit different and would have to be
translated to a common tagset as it was the case with the POS tags. It can be seen easily that a
noun phrase is a hyponym, a sub-type of an object phrase (OP). But apart from the derivation
NP → NNP for Mary, the derivation trees have nothing in common. Another way of comparing is by
looking at the output. The actual words are of course not included in either CFG, but they would be
of no help since the two CFGs are describing different real languages and hardly any words would
match. So the POS tag sequence NNP VBD NNP   has to be used instead.
It is very simple to decide whether this sequence can occur in the other language. The word
problem for Context-Free Grammars can be solved in O(n³) with the CYK algorithm. The algorithm
would however only return true or false – not exactly helpful to determine the distance between
two languages. A better variation is not to determine if a sequence can be generated, but to determine the likelihood that this sequence is generated. The probability of a sequence is the sum of
the probabilities of all its possible parses; the probability of a specific parse is the product of the probabilities of all involved derivation rules.
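For a grammar in Chomsky normal form this sum over parses can be computed with the inside algorithm, a probabilistic variant of CYK. The sketch below uses a tiny invented grammar whose terminals are the POS tags themselves; it only illustrates the computation and is not the grammar extracted from the Penn Treebank.

#!/usr/bin/perl
use strict;
use warnings;

# Inside algorithm for a toy PCFG in Chomsky normal form.
my %binary = (                               # A -> B C
    'S'  => { 'NP VP'  => 1.0 },
    'VP' => { 'VBD NP' => 1.0 },
    'NP' => { 'DT NN'  => 0.4 },
);
my %unary = (                                # preterminal rules A -> tag
    'NP' => { 'NNP' => 0.6 },
    'DT' => { 'DT' => 1.0 }, 'NN' => { 'NN' => 1.0 }, 'VBD' => { 'VBD' => 1.0 },
);

sub sequence_probability {
    my @tags = @_;
    my $n = @tags;
    my @chart;                               # $chart[$i][$j]{$A} = P(A =>* tags i..j)
    for my $i (0 .. $n - 1) {
        for my $A (keys %unary) {
            my $p = $unary{$A}{ $tags[$i] };
            $chart[$i][$i]{$A} = $p if defined $p;
        }
    }
    for my $span (2 .. $n) {
        for my $i (0 .. $n - $span) {
            my $j = $i + $span - 1;
            for my $A (keys %binary) {
                while (my ($rhs, $p) = each %{ $binary{$A} }) {
                    my ($B, $C) = split ' ', $rhs;
                    for my $k ($i .. $j - 1) {
                        my $pb = $chart[$i][$k]{$B}     or next;
                        my $pc = $chart[$k + 1][$j]{$C} or next;
                        $chart[$i][$j]{$A} = ($chart[$i][$j]{$A} // 0) + $p * $pb * $pc;
                    }
                }
            }
        }
    }
    return $chart[0][$n - 1]{'S'} // 0;      # sum over all parses of the sequence
}

printf "P = %g\n", sequence_probability(qw(NNP VBD DT NN));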
A few problems remain with this approach: If there is no possible parse for the sequence then the
sequence will have a probability of zero. Especially with flat grammar structures this could be caused
by data sparsity. It was shown in chapter 3.2.1 that there are 1.03 × 10²³ possible rule combinations
for Penn Treebank derivation rules with a maximum length of 12. Data sparsity is expected to be a
big problem, especially because the two language models will not be based on translations of the
same sentences.

3.5. Introducing N-Gram Models

There are two ways to avoid or mitigate the data sparsity problem. One is to apply some kind of
smoothing to the PCFG model to assign unseen derivations a low but non-zero probability. Smoothing would likely be a quite complicated process, and it is uncertain how successful smoothing on the
PCFG can be.
There is another widely used model in natural language processing, called the n-gram  model. The
probability of a sentence S   occurring in a language L   can also be rewritten using the chain rule and
conditional probabilities. These probabilities can then be approximated by making the independence
  P(S|L) = P(t1 t2 t3 t4 t5 … tn | L)
         = P(t1|L) × P(t2|L, t1) × P(t3|L, t1, t2) × P(t4|L, t1, t2, t3) × P(t5|L, t1, t2, t3, t4) × …

  3-gram ≈ P(t1|L) × P(t2|L, t1) × P(t3|L, t1, t2) × P(t4|L, t2, t3) × P(t5|L, t3, t4) × …
  2-gram ≈ P(t1|L) × P(t2|L, t1) × P(t3|L, t2) × P(t4|L, t3) × P(t5|L, t4) × … × P(tn|L, tn−1)

Fig. 28 – Probability of a sentence occurring in a language according to the chain rule, a trigram and a bigram model

assumption that the probability of any specific tag occurring only depends on the (n-1) preceding
tags. Each of these sub-sequences containing (n-1) tags plus the following tag constitutes an n-gram. Probabilities are estimated by the usual maximum likelihood estimation.
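A minimal sketch of such an n-gram model over POS tag sequences (shown with n = 2 for brevity, whereas the project uses n = 5); the toy corpus is invented.

#!/usr/bin/perl
use strict;
use warnings;

my $n = 2;
my @corpus = (
    [qw(PRP VBP VBN DT NN .)],
    [qw(PRP VBD NN .)],
);

my (%ngram, %history);
for my $sentence (@corpus) {
    # pad with (n-1) start markers so the first real tag has a full history
    my @tags = (('<s>') x ($n - 1), @$sentence);
    for my $i ($n - 1 .. $#tags) {
        my $hist = join ' ', @tags[ $i - $n + 1 .. $i - 1 ];
        $ngram{$hist}{ $tags[$i] }++;
        $history{$hist}++;
    }
}

# P(tag | history) by maximum likelihood estimation
sub p {
    my ($hist, $tag) = @_;
    return 0 unless $history{$hist};
    return ($ngram{$hist}{$tag} // 0) / $history{$hist};
}

printf "P(VBP | PRP) = %.2f\n", p('PRP', 'VBP');   # 0.50 for the toy corpus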

The described view is that of an (n-1)-th order Markov model. As in any other Markov model the past beyond a certain point does not influence the current state. Disregarding parts of the history of a sentence of course means that in some cases long range dependencies cannot be taken into account. Precision can be balanced against data sparsity issues by choosing an appropriate value for n. This project will use 5-gram models.

With n-gram models the problems of directly comparing PCFGs can be avoided. Using n-gram models instead of PCFG parsing probabilities has three more practical advantages: it is not necessary to map phrase and clause level annotations between language models, it is not necessary to implement a parser to calculate the probabilities, and finally it is not necessary to create PCFG models for the foreign languages, as the n-gram models can be extracted directly from the corpora. (The practical viability of n-gram models was for example shown in the practical of the lecture Computational Linguistics: a POS tagger trained on the words and tags of a part of the Penn Treebank could correctly annotate between 88% (on texts with previously unseen words) and 98% (on texts without unseen words) of untagged text with the help of a bigram model.)

Of course there is a very high number of 5-grams that will only be seen a few times or not at all, so again smoothing is crucial. There are some established smoothing methods for n-gram models. The simplest method is to treat any n-gram as if it was seen at least once. However this technique allocates a relatively high probability to the entire mass of unseen n-grams, which is why it is usually not used in practice. A far more sophisticated solution is the Katz back-off smoothing technique, presented in (S. M. Katz, 1987). This technique leaves probability estimates of very common n-grams as they are, as the estimation is deemed to be quite reliable due to a lot of evidence. Probability estimates of rarely seen (less than k times) n-grams are reduced in favour of the probability of unseen n-grams. The probability of unseen n-grams is estimated recursively by looking at the Katz smoothed (n-1)-gram model of the data. For this project k was chosen to have a value of 9 in the English model, 5 in the Chinese model and 10 in the German models.
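The following sketch shows the back-off idea in simplified form. Real Katz smoothing derives its discount ratios from Good–Turing estimates; here a single fixed discount D and a uniform stand-in for the lower-order model are assumed, purely for illustration.

#!/usr/bin/perl
use strict;
use warnings;

my @vocabulary = qw(PRP VBP VBD VBN DT NN .);
my %ngram   = ( 'PRP' => { 'VBP' => 1, 'VBD' => 1 } );   # counts c(history, tag)
my %history = ( 'PRP' => 2 );                            # counts c(history)

my $K = 9;      # counts above K are trusted as they are
my $D = 0.8;    # fixed discount for rare n-grams (assumption, not Good-Turing)

sub p_lower { return 1 / @vocabulary }   # stand-in for the smoothed (n-1)-gram model

sub p_katz {
    my ($hist, $tag) = @_;
    my $total = $history{$hist} or return p_lower($tag);  # unseen history: back off
    my $c = $ngram{$hist}{$tag} // 0;
    return $c / $total      if $c > $K;                   # reliable estimate
    return $D * $c / $total if $c > 0;                    # discounted estimate
    # unseen n-gram: redistribute the freed probability mass according to the
    # lower-order model, normalised over the unseen tags only
    my ($freed, $unseen) = (0, 0);
    for my $v (@vocabulary) {
        my $cv = $ngram{$hist}{$v} // 0;
        if    ($cv > $K) { }                               # nothing freed here
        elsif ($cv > 0)  { $freed  += (1 - $D) * $cv / $total }
        else             { $unseen += p_lower($v) }
    }
    return 0 unless $unseen;
    return $freed * p_lower($tag) / $unseen;
}

printf "P(VBN | PRP) = %.4f\n", p_katz('PRP', 'VBN');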

3.6. Comparing N-Gram Models

With this definition of a sentence probability the next logical step is to define a distance between two language models. Intuitively the total difference between two entire models is the sum of all the differences on a sentence level. One way to measure how far apart two probability distributions over the sentence space are is the so-called relative entropy or Kullback-Leibler (KL) divergence. The KL divergence of two language models L1 and L2 is defined as

  d(L1||L2) = Σᵢ P(i|L1) · log( P(i|L1) / P(i|L2) )

which is just a weighted sum of the probability differences between all sentences. The weighting of the sum causes the KL divergence to become asymmetric, which is the reason why it is called KL divergence rather than KL distance.

This definition is of course rather problematic for any real world scenario. The real sentence probability function P is unknown and is approximated by the Katz smoothed n-gram model K. Hence the real KL divergence can only be approximated with

  d(L1||L2) ≈ Σᵢ K_L1(i) · log( K_L1(i) / K_L2(i) )

The weighted sum means that any sentences with a zero probability could be ignored, but the Katz smoothing of the n-gram model already ensures that there are no sentences with a zero probability. The next problem is that the KL divergence is a sum over the entire sentence space. Since calculating the probabilities of an infinite number of sentences is out of the question, another simplification is needed: the best way to approximate a weighted sum is to only consider the most important summands, the ones with very high weights. Sentences with a high probability K_L1 can be found easily – in the L1 corpus.

Due to the used definition of sentence probability, shorter sentences will usually have a much higher probability than longer sentences. So while it makes sense to compare the probabilities of sentences of the same length between languages, it does not make sense to compare probabilities between sentences of different lengths. Three example sentences are shown in figure 29. For each sentence fictional, but within the example plausible, probabilities of it appearing in English texts and in German texts are stated.

  I have read this book.   (PRP VBP VBN DT NN)   P(S|LE) = 0.00095   P(S|LG) = 0.00001
  I have this book read.   (PRP VBP DT NN VBN)   P(S|LE) = 0.00001   P(S|LG) = 0.00095
  I went home.             (PRP VBD NN)          P(S|LE) = 0.015     P(S|LG) = 0.015

Fig. 29 – Sentence probabilities in English and German language

While technically any sentences from the corpus could be used for the approximation of the KL distance, it seemed prudent to use a set of sentences that is separate from the set of sentences that was used to create the PCFG or n-gram models. For English the Penn Treebank section 00 (containing 1.921 sentences) is used for this task. Similarly in the Chinese and German treebanks the first 1.921 sentences are put aside and are not used to create the language model.

The KL divergence approximation is now defined as

    d(L1 || L2) ≈ Σ_{i=1..n} K_L1(s_i) · log( K_L1(s_i) / K_L2(s_i) )

with s = {s_1, ..., s_n} being the set of L1 sentences put aside for the divergence calculation. After removing an infinite chunk of the original weighted sum it is a natural step to normalize the remaining weights so that they again add up to 1; in the weighted sum only the probability relative to the other sentences in the set s is required. But there is good reason to change the weighting entirely: as was mentioned before, using the sentence probability to compare sentences of different lengths is not very meaningful, and since the sentences in the set s are probably not of uniform length, using the n-gram probability at this point would certainly bias the results towards short sentences. One solution is to re-estimate the sentence probability. A simple way to estimate this probability is a maximum likelihood estimation on the sentences in s; in other words, each sentence in the set s is predicted as equally likely. This finally results in the KL divergence approximation

    d(L1 || L2) ≈ (1/n) · Σ_{i=1..n} log( K_L1(s_i) / K_L2(s_i) )

The original KL divergence was guaranteed to be non-negative. While this will usually still be the case, it can, due to the approximations, no longer be guaranteed. To get from the asymmetrical divergence to a real distance between two language models, the divergence and the inverted divergence can be added up:

    d(L1, L2) := d(L1 || L2) + d(L2 || L1)

3.7. Distance Learning

A controlling instance can now close the loop. It modifies the original English Probabilistic Context-Free Grammar extracted from the Penn Treebank. The output of the PCFG, sentences in POS tag form, is then fed into a new n-gram model, and the new KL divergence is calculated. Depending on whether the modified English got closer to the foreign language, the modification is kept or reverted. The process then starts anew.

In many cases more than one change to the original grammar is needed to get closer to a foreign grammar, and in the intermediate steps the KL divergence will rise. Hence the value of greedy algorithms is limited. This optimization problem can, for example, be tackled by a simulated annealing algorithm or by genetic algorithms.
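A minimal sketch of this approximation and of the keep-or-revert decision made by such a controlling instance is given below. Names and numbers are illustrative; the actual controller is the Perl program listed in the appendix, which additionally relaxes the rule-frequency constraint and lowers the acceptance threshold over the generations. The symmetric distance d(L1, L2) is then simply the sum of the two directed approximations.

#!/usr/bin/perl
# Sketch: the per-generation decision of a directed-evolution controller.
# approx_kl implements d(L1||L2) ~ (1/n) * sum_i log( K_L1(s_i) / K_L2(s_i) )
# from per-sentence log probabilities. All names and numbers are illustrative.
use strict;
use warnings;
use List::Util qw(sum);

sub approx_kl {
    my ($logp_l1, $logp_l2) = @_;          # log K_L1(s_i) and log K_L2(s_i) for held-out s_i
    my $n = @$logp_l1;
    return sum(map { $logp_l1->[$_] - $logp_l2->[$_] } 0 .. $n - 1) / $n;
}

sub consider_change {
    my ($kl_new, $best_kl, $threshold) = @_;
    return 'keep'   if $kl_new < $best_kl;                # real improvement
    return 'keep'   if $kl_new < $best_kl + $threshold;   # tolerated random-walk step
    return 'revert';                                      # otherwise undo the swap
}

# toy numbers: log probabilities of three held-out German sentences under the
# German model (numerator) and under the modified English model (denominator)
my @lp_de  = (-14.2, -17.9, -11.3);
my @lp_men = (-21.0, -25.4, -16.8);
my $kl = approx_kl(\@lp_de, \@lp_men);
printf "d(German||ModEnglish) ~ %.2f => %s\n", $kl, consider_change($kl, 7.00, 0.5);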

3.8. Language Evolution Model – An Overview

The schema below provides an overview of one complete possible (directed) evolution model for natural languages. Numbers next to the elements reference the relevant chapter sections.

Fig. 30 – The entire directed evolution model. The Penn Treebank (3.2.1, sections 02-21) yields the English PCFG (3.1) and, via the controller's modifications, the modified English PCFG and its POS streams in the PTB tagset; the TIGER treebank (3.2.4) yields German POS streams in the STTS tagset, of which the first 1,921 sentences are set aside for the divergence calculation. Both streams are mapped to the common tagset (3.3), turned into Katz-smoothed 5-gram models (3.5) and compared via the KL divergence d(German||ModEnglish) (3.6); the controller (3.7) modifies the grammar so as to minimize the divergence. Small example syntax trees for "IBM bought Lotus" illustrate the grammar modification.

Chapter 4
Details & Results

"Those who know nothing of foreign languages know nothing of their own." – Johann Wolfgang von Goethe

4.1. Measuring Distances

In a first step the existing divergence between the n-gram model of original English and the other languages is determined.

Fig. 31 – KL divergences and distances of English to the three foreign language models (rows for the Penn Chinese Treebank, the German treebank TIGER and the German treebank TüBa-D/S; columns for the KL divergence d(LE||LF), the inverse KL divergence d(LF||LE), the KL distance d(LE, LF) and the size |CTS| of the used common tagset, the tagset sizes being 18, 32 and 26)

The first two columns show the KL divergence and the inverse KL divergence between the languages, the third column contains the KL distance, i.e. the sum of the KL divergences, and the fourth column holds the size of the used common tagset. As was already mentioned in chapter 3, a smaller common tagset will cause the distance between languages to appear smaller than it actually is. It is all the more surprising and reassuring that the calculated divergences between the languages behave as expected. The divergences between English and Chinese and vice versa are the highest across the board, although the associated common tagset is the smallest of the three.

A rather interesting result is that the divergence d(LF||LE) is nearly identical for both German language models: apparently both the written and the spoken German sentences are equally distant from the English language model. This direction of the divergence is calculated by computing the probabilities of sentences from the German corpora under the English model. The model based on TIGER gets closest to English overall, with TüBa-D/S coming in second. This result was to be expected considering that TIGER is, like the used sections of the Penn Treebank, a collection of newspaper text. TüBa-D/S instead is a collection of spoken dialogue. Its model should inherently be a lot more chaotic, as people tend to insert interjections, get interrupted mid-sentence, and generally not adhere to a strict grammar when speaking.

4.2. Modifying the English Language

Within the language evolution model as shown in section 3.8 and explained in section 3.7, the English language PCFG has to be translated into an n-gram model. For this model a number of POS tag sequences is required. There are three ways of doing this. The mathematically correct way is to start with the PCFG and generate all possible sentences. These can then be fed into the n-gram model, and the distribution of POS tag sequences corresponds exactly to the language defined by the PCFG.

Creating an infinite number of sentences is inherently impractical, so an alternative is to use random sampling. Generating random sentences is reasonably simple. By creating random sentences according to the PCFG the distribution of the POS tag sequences will converge to the distribution of sentences in the probabilistic context-free language. Care has to be taken that the length of the generated sentences has a similar distribution as the original language, as there is no sentence length information encoded in a PCFG, and having very long sentences will bias the n-gram model. Another problem with this randomized algorithm is that the output will not be consistent, so the KL divergences obtained before and after a language modification cannot be compared directly.

Finally there is the realistic and pragmatic approach that avoids generating sentences altogether: since the PCFG is generated from real sentences from a corpus, an obvious choice is to use these sentences to create the n-gram model. In order to do this the syntax trees of all those sentences have to be kept. The PCFG can be derived easily from those syntax trees, and due to this dependency the PCFG does not have to be generated explicitly. The advantage of having the syntax trees is that the extracted POS tag sequences mirror the PCFG distribution exactly, as they come from the same source. The POS tag sequences required to create the n-gram model can be recovered from the syntax trees by means of an in-order depth-first search. The drawback of this approach is that manipulating the PCFG directly is no longer possible – otherwise the link to the syntax trees is lost. Any modifications to the grammar have to be made by changing the stored syntax trees.

In the program used in this project to simulate language change the syntax tree model was used. In the syntax tree model sentences are manipulated directly, and it again becomes necessary to generate POS tag sequences for the n-gram model. The consequences of this choice and the possible operations on the language will be explored in the following sections.
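For completeness, the random-sampling alternative described above can be illustrated with a few lines of Perl. This is not the approach used in the project, and the toy grammar and names are invented for the example; in the project the PCFG would be read off the stored Penn Treebank syntax trees instead.

#!/usr/bin/perl
# Sketch: sampling random POS tag sequences from a small hand-made PCFG.
use strict;
use warnings;

# Each left-hand side maps to a list of [probability, "RHS symbols"] pairs;
# the probabilities for one left-hand side must sum to 1.
my %pcfg = (
    'S'  => [ [1.00, 'NP VP'] ],
    'NP' => [ [0.60, 'DT NN'], [0.40, 'NNP'] ],
    'VP' => [ [0.70, 'VBD NP'], [0.30, 'VBD'] ],
);

sub expand {                                   # recursively expand a symbol into POS tags
    my ($sym) = @_;
    return $sym unless exists $pcfg{$sym};     # terminals are POS tags
    my $r = rand();
    for my $rule (@{ $pcfg{$sym} }) {
        my ($p, $rhs) = @$rule;
        $r -= $p;
        return join ' ', map { expand($_) } split ' ', $rhs if $r <= 0;
    }
    # numerical fall-through: use the last rule
    return join ' ', map { expand($_) } split ' ', $pcfg{$sym}[-1][1];
}

print 'START ', expand('S'), " STOP\n" for 1 .. 5;   # five sampled tag streams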

4.2.1. Adding and Removing Rules

The most basic operations on a context-free grammar are to add or remove derivation rules. In a probabilistic CFG this corresponds to assigning a positive or zero probability to a derivation rule and subsequently re-normalizing the probabilities of all derivation rules with the same left-hand side, so that the sum of these probabilities equals 1. These operations were not used in this project.

In the syntax tree model adding rules involves introducing new POS tags into existing sentences, while removing rules deletes existing POS tags. If a rule that is to be deleted has sub-trees, these have to be dealt with, either by deleting them or by re-attaching them to the tree in a different place. It is also unclear how new rules and the places they are applied to are chosen. In a pure PCFG model adding rules can be justified somewhat more easily, because there the rules are applied automatically while generating sentences; how new rules are chosen is of course again undefined. Moreover, the linguistic motivation behind these sentence modifications seems inconclusive. It is also possible to adjust the probabilities of existing rules, making a certain existing construct more probable or obsolete.

4.2.2. Change Across Rules

More complicated operations on grammars can involve changing more than one rule at the same time. In the syntax tree model these are tree operations. A sub-tree can be promoted by removing its root node and reconnecting all its child nodes directly to its parent. Similarly the inverse operation, demotion, works by introducing a new intermediate parent node for a group of siblings. Sub-tree raising leaves all node connections within a sub-tree intact, but assigns a new root – similar to an AVL rebalancing operation. While the shown example operations do change the syntax tree, they do not change the produced sentence; they do have an effect in combination with other modifications. Tree operations are currently not used in the simulation.

Fig. 32 – Possible tree operations to modify a language model (promotion, demotion and sub-tree raising, illustrated on an SBARQ/WHNP/SQ/VP example tree)
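As an illustration, the promotion operation can be sketched on a small nested-array syntax tree. The representation and names are invented for the example and are not those of the appendix program; as noted above, the project does not currently use these tree operations. Note that the leaf sequence, and hence the produced sentence, is unchanged by the operation.

#!/usr/bin/perl
# Sketch: the 'promotion' tree operation. A node is [label, child, child, ...];
# leaves are [POS tag, word].
use strict;
use warnings;

my $tree = ['SBARQ',
    ['WHNP', ['WP', 'What']],
    ['SQ', ['VBZ', 'is'], ['NP', ['DT', 'that']]]];

sub promote_child {                   # remove child $i's root, reattach its children to $node
    my ($node, $i) = @_;
    my $sub = $node->[$i];            # the sub-tree whose root node is removed
    splice @$node, $i, 1, @{$sub}[1 .. $#$sub];
}

promote_child($tree, 2);              # promote the SQ sub-tree: its children now
                                      # hang directly off the SBARQ node
print join(' ', map { $_->[0] } @{$tree}[1 .. $#$tree]), "\n";
# prints: WHNP VBZ NP   (the SQ level has disappeared)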

4.2.3. Change Within Rules

Another possibility to modify an existing grammar is to change existing rules. Changing rules does not only mean introducing or removing symbols from the right-hand side of a derivation rule, but also changing the order of existing symbols. The latter is particularly interesting in the context of this project. Swapping pairs of symbols within production rules of a grammar corresponds to reordering the sub-trees of a node in a syntax tree. A swap was all that was needed to align the sentences "I have read this book"/"Ich habe dieses Buch gelesen" in the first chapter. Since all possible permutations of a derivation rule can be reached by swapping pairs, this operation alone gives the evolutionary model the expressiveness of the previously mentioned Inversion Transduction Grammars. Computing swaps or complete rule inversions is very simple, and since swaps do not directly affect sub-trees or other derivation rules it is also a lot easier to later optimize a model for a low KL divergence.

An analysis of the PCFG extracted from the Penn Treebank showed that 60% of the derivation rules do not have any of their possible permutations appearing in the treebank. This includes rules with a very high frequency: for example VP -> TO VP occurs 12,488 times, while its inverse counterpart VP -> VP TO does not appear in the treebank; SBAR -> IN S starts a relative clause (...that key officials may fail) and occurs 9,958 times, while its inverse counterpart fails to appear even once. These numbers suggest that there is indeed a lot of information stored in some simple derivation rules.

4.3. Directed Evolution

The Perl program presented in the appendix uses the following method to simulate directed evolution: the syntax trees of the English treebank are loaded. Then the program picks any binary rule that appears at least 300 times in the entire corpus and swaps all occurrences of this rule in all the syntax trees. The n-gram models are then recalculated and the change in the KL divergence is observed. If the divergence improves, or at least does not grow beyond a certain threshold above the best known divergence, the rule change is kept and recorded; otherwise it is reverted. The constraint on which rules are chosen is relaxed after some time: after 30 simulated generations, rules that occur at least 100 times are also considered for manipulation. The threshold for keeping grammar modifications is reduced by a small amount in each generation to slowly enforce reaching a possibly local KL divergence minimum.
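The swap operation itself is straightforward. The following is an illustrative sketch on the same kind of nested-array syntax tree used in the previous sketch; the representation and names are not those of the appendix program, which stores trees differently and recomputes the Katz-smoothed 5-gram models after each change.

#!/usr/bin/perl
# Sketch: swapping the two children of every syntax-tree node that matches a
# given binary rule, e.g. turning VP -> TO VP into VP -> VP TO everywhere.
use strict;
use warnings;

# a node is [label, child, child, ...]; leaves are [POS tag, word]
my $tree = ['S',
    ['NP', ['NNP', 'IBM']],
    ['VP', ['VBD', 'wanted'],
           ['VP', ['TO', 'to'], ['VP', ['VB', 'buy'], ['NP', ['NNP', 'Lotus']]]]]];

sub swap_rule {                       # swap children of all nodes matching LHS -> A B
    my ($node, $lhs, $a, $b) = @_;
    return unless ref $node->[1];     # stop at leaves
    my @kids = @{$node}[1 .. $#$node];
    if ($node->[0] eq $lhs && @kids == 2
        && $kids[0][0] eq $a && $kids[1][0] eq $b) {
        @{$node}[1, 2] = @{$node}[2, 1];
    }
    swap_rule($_, $lhs, $a, $b) for @{$node}[1 .. $#$node];
}

sub leaves {                          # in-order POS tag stream, as fed to the n-gram model
    my ($node) = @_;
    return $node->[0] unless ref $node->[1];
    return join ' ', map { leaves($_) } @{$node}[1 .. $#$node];
}

print leaves($tree), "\n";            # NNP VBD TO VB NNP
swap_rule($tree, 'VP', 'TO', 'VP');
print leaves($tree), "\n";            # NNP VBD VB NNP TO  (the TO VP order is inverted)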

Even with heavy use of efficient data structures, intermediate result caching and partially lazy evaluation of the n-gram model, simulating language change is a very time-demanding and memory-intensive process. Currently between 20 and 30 seconds are required for each generation, and around 1 GB of RAM is needed for the language models. Most of the processing time is spent creating the modified Katz-smoothed n-gram models.

The longest running simulation so far was on English and German/TIGER. It spanned 369 generations and took slightly over three hours. In the end 25 swaps were deemed useful and were kept. The KL divergence between German and English was reduced to 11.00 from the original 14.38.

By monitoring the change of the KL divergence during the simulation it becomes obvious that the restrictions on which rules are eligible for swaps play an important role. Most progress in reducing the KL divergence was made in the first 30 generations, with the minimum reached in the 27th generation. Afterwards the KL divergence appears to be just moving around randomly.

Fig. 33 – Change of the KL divergence during the simulation (KL divergence, roughly between 11 and 16, plotted against the generation number, 0 to 400)

4.4. Interpreting Results

The simulation run to modify English grammar to get closer to the model of German obtained from the TIGER treebank returned the following 25 rule sets, in this order:

NP → [CD NNP]    VP → [PP VBG]    VP → [SBAR VBD]   NP → [CD NNP]     VP → [S VBD]
VP → [VBD VP]    VP → [S VBN]     VP → [VB VP]      VP → [NP VBN]     NP → [NNP NNP]
NP → [NN PRP$]   VP → [PP VBP]    VP → [NP VBN]     VP → [PP VBG]     NP → [ADJP NNS]
NP → [NNP NNP]   NP → [NP NP]     ADVP → [PP RB]    NP → [NNP NNPS]   VP → [S VBP]
ADJP → [CD NN]   VP → [MD VP]     VP → [PP VBN]     NP → [NP SBAR]    VP → [ADVP VB]

Fig. 34 – Swapping these 25 rule pairs moves English closer to German

The notation with brackets is meant to convey that each rule stands for a set of affected rules: every occurrence of NP → CD NNP will be changed to NP → NNP CD and vice versa. Interestingly, this swap improved the KL divergence, was reversed a bit later, and the reversal improved the result again, which shows the need for simulation methods like simulated annealing that can avoid getting stuck in local KL minima.

Eventually the set of swaps can be reduced by randomly removing swaps while monitoring the effect on the KL divergence. The set can also be reduced by removing duplicate rules as well as symmetric swaps like NP → NNP NNP. With the remaining 11 rule changes the KL divergence is improved to 10.58:

ADVP → [PP RB]   NP → [NP SBAR]   VP → [MD VP]      VP → [S VBD]      VP → [S VBN]
VP → [S VBP]     VP → [SBAR VBD]  VP → [PP VBN]     VP → [PP VBP]     VP → [VB VP]
VP → [VBD VP]

Fig. 35 – A smaller subset of 11 rules gives even better results
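The reduction of the swap set can be sketched as a simple loop that keeps trying to drop swaps as long as the divergence does not get worse. This is illustrative only: the evaluation function below is a made-up stand-in for re-applying the remaining swaps to the treebank and rebuilding the n-gram models, which is the expensive part in practice.

#!/usr/bin/perl
# Sketch: greedily thinning out a set of swaps while monitoring the KL divergence.
use strict;
use warnings;
use List::Util qw(shuffle);

sub prune_swaps {
    my ($swaps, $evaluate_kl) = @_;       # $evaluate_kl->(\@swaps) returns a KL divergence
    my @kept = @$swaps;
    my $best = $evaluate_kl->(\@kept);
    my $changed = 1;
    while ($changed) {
        $changed = 0;
        for my $i (shuffle 0 .. $#kept) {
            my @trial = @kept;
            splice @trial, $i, 1;         # tentatively drop one swap
            my $kl = $evaluate_kl->(\@trial);
            if ($kl <= $best) {           # keep the removal if the divergence does not worsen
                @kept = @trial;
                $best = $kl;
                $changed = 1;
                last;                     # restart with the reduced set
            }
        }
    }
    return (\@kept, $best);
}

# toy usage: two of the four swaps actually help in this made-up evaluation
my %useful = map { $_ => 1 } ('VP -> [MD VP]', 'VP -> [S VBD]');
my $toy_kl = sub {
    my ($s) = @_;
    my $kept_useful = grep { $useful{$_} } @$s;
    return 10 + 0.1 * (@$s - $kept_useful) + 1.0 * (2 - $kept_useful);
};
my ($set, $kl) = prune_swaps(
    ['NP -> [CD NNP]', 'VP -> [MD VP]', 'NP -> [NNP NNP]', 'VP -> [S VBD]'],
    $toy_kl,
);
print "kept: @$set (KL $kl)\n";           # only the two useful swaps remain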

By applying these swaps to a sample text their effect can be observed and interpreted (Fig. 36). Most of these rules are concerned with verbs. In this case the rule changes have the effect that a verb moves to the end of the sentence or after another constituent. Generally, moving verbs to the end is a good start to get closer to the German language. In the shown example, moving the modal verb to the end of the sentence is one of the required changes to achieve the same POS tag sequence as the German translation. The other required changes are swapping NP → [CD NNP], which was actually found earlier, and ADJP → [JJ NP]. The remaining differences, the introduction of a second determiner and the replacement of the verb 'will' (MD) with its German equivalent 'werden' (VB), cannot be achieved by swapping.

Fig. 36 – Effect of these 11 rules on an English sentence compared to a German translation of the same sentence (the original English "The new rate will be payable Feb. 15", the modified English produced by the swaps, and the German "Die neue Rate wird am 15. Feb. fällig", each shown with their POS tags)

It is obvious that there are limits to how close one can get to another language. With swaps alone it is not possible to reach the POS tag distribution of the foreign language: if one language uses more verbs than the other, this alone leads to a lower bound on the KL divergence. Then there are limits that are caused by the PCFG model itself. German for example is a V2 language, which means that in a declarative sentence the second constituent is always a verb. It is however not possible to express such a constraint in a PCFG (without resorting to tricks that lead to a sharp increase in the number of non-terminals and an equally sharp decrease in the usefulness of the grammar). One should therefore not expect to find a finite set of swapping rules that would lead to a KL divergence of zero.

Chapter 5
Conclusion & Outlook

"There is nothing so trivial as a grammar, and scarce any thing so rare as a good grammar." – G. Miège

In this thesis a novel statistical method to compare language grammars was devised. This method works without any particular knowledge about the languages' heritage. It was then developed into a complete formal model for directed evolution of natural languages, and the advantages and drawbacks of using probabilistic context-free grammars to reproduce natural language change were investigated. The model was implemented and put to the test with the help of annotated Chinese, English and German treebank data. Results show that it is possible to find meaningful grammatical differences that could be used in other fields of natural language processing. The conclusion is that statistical approaches to model grammar change are workable.

While working out the intricacies of the evolutionary model, several problems with the suitability of treebank data for cross-language research in general, and comparative grammar research in particular, were identified. In the case of mapping different POS tagsets a complete workable solution was demonstrated; in other cases viable workarounds were presented.

There is also a distinct need for a common tagset, or even a sufficiently general and simple common feature annotation, that is consistent and consistently used across languages. Existing tagsets are usually created with one specific language in mind, strongly reflect specific language patterns and may to a certain point even convey the author's mentality. These properties in turn deter others from reusing an existing tagset or annotation.

There is, however, a distinct lack of historical data to create historically accurate evolutionary models. The project PROIEL at the University of Oslo aims to create a parallel corpus in a variety of ancient Indo-European languages (Greek, Latin, Old Armenian, Old Church Slavonic and Gothic) (http://www.hf.uio.no/ifikk/proiel/index.html). This may become a very useful resource for cross-language research and it is certainly sensible to keep an eye on it.

The specific implementation of a directed language evolver presented in this thesis has yet to be brought to its full potential.

There are still a lot of options to optimize for speed (e.g. not recalculating the Katz n-gram model from scratch, but changing it incrementally according to the changes in the POS tag sequences) to get a lower simulation time per generation. It will then become viable to do simulation runs on a larger scale that may get a lot closer to the foreign language and give a larger and more detailed set of proposed changes. Replacing the evolution controller with a more sophisticated implementation of a simulated annealing algorithm will also certainly improve the overall results.

The grammar evolver should be extended to allow different changes to the syntax trees. To speed up simulation, rules could be changed according to some pattern: e.g. for all derivation rules containing a verb POS tag at the front and a noun phrase, the verb could be swapped with the right-most instance of a noun phrase. It may also be worthwhile to investigate and implement promotion, demotion and sub-tree raising operations.

The presented grammar evolver is a completely new and innovative model. Further research should go into determining exactly how close this model can evolve a language towards another, and into the subsequent interpretation of the results.

Chapter 6
References

"There are worse crimes than burning books. One of them is not reading them." – Joseph Brodsky

Cangelosi, A., Parisi, D. Computer simulation: A new scientific approach to the study of language evolution. In: Cangelosi, A., Parisi, D. (eds.), Simulating the Evolution of Language, London: Springer-Verlag, 2002, pp. 3-28.
Chomsky, N. Syntactic Structures. The Hague: Mouton, 1957.
Collins, M. Head-Driven Statistical Models for Natural Language Parsing. Computational Linguistics, 29(4):589-637, 2003.
Hare, M., Elman, J. L. Learning and morphological change. Cognition, 56(1):61-98, 1995.
Haspelmath, M., Dryer, M., Gil, D., Comrie, B. (eds.) World Atlas of Language Structures. Oxford University Press, 2005.
Hockenmaier, J., Steedman, M. Acquiring Compact Lexicalized Grammars from a Cleaner Treebank. In: Proceedings of the Third LREC Conference, Las Palmas, Spain, 2002.
Katz, S. M. Estimation of probabilities from sparse data for the language model component of a speech recogniser. IEEE Transactions on Acoustics, Speech, and Signal Processing, 35(3):400-401, 1987.
Kayne, R. Some notes on Comparative Syntax. In: The Oxford Handbook of Comparative Syntax, Oxford University Press, 2005, pp. 3-69.
Lewis, P. M., Stearns, R. E. Syntax-Directed Transduction. Journal of the ACM, 15(3):465-488, 1968.
Lohöfer, A. UE History of English, Session 2: Historical Linguistics and Linguistic Reconstruction. Philipps-Universität Marburg, 2007. http://www.staff.uni-marburg.de/~lohoefea/Sessions/S02_Slides.pdf
Lohöfer, A. UE History of English, Session 3: From PIE to Proto-Germanic. Philipps-Universität Marburg, 2007. http://www.staff.uni-marburg.de/~lohoefea/Sessions/S03_Slides.pdf
Marcus, M., Santorini, B., Marcinkiewicz, M. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313-330, 1993.
Martín-Vide, C. Formal Grammars and Languages. In: Mitkov, R. (ed.), The Oxford Handbook of Computational Linguistics, Oxford University Press, 2003, pp. 157-177.
Mohri, M., Sproat, R. On a Common Fallacy in Computational Linguistics. In: A Man of Measure: Festschrift in Honour of Fred Karlsson on his 60th Birthday, 2006.
Picone, J. Lecture 33: Smoothing N-Gram Language Models. ECE 8463: Fundamentals of Speech Recognition, Mississippi State University. http://www.ece.msstate.edu/research/isip/publications/courses/ece_8463/lectures/current/lecture_33/index.html
Ryder, R. Grammar and Phylogenies. 2006.

Searls, D. The computational linguistics of biological sequences. In: Hunter, L. (ed.), Artificial Intelligence and Molecular Biology, AAAI Press, 1993, pp. 47-120.
Turner, H. An introduction to methods for simulating the evolution of language. In: Cangelosi, A., Parisi, D. (eds.), Simulating the Evolution of Language, London: Springer-Verlag, 2002, pp. 29-50.
Vijay-Shanker, K., Weir, D. The equivalence of four extensions of context-free grammar. Mathematical Systems Theory, 27:511-546, 1994.
Wu, D. Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Computational Linguistics, 23(3):377-404, 1997.
Xia, F. The Part-Of-Speech Tagging Guidelines for the Penn Chinese Treebank (3.0). October 17, 2000.
Xue, N., Chiou, F.-D., Palmer, M. Building a Large-Scale Annotated Chinese Corpus. In: Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), Taipei, Taiwan, 2002.

Chapter 7
Appendix

7.1. The Penn Treebank Tagset

7.1.1. Clause Level Annotation

S – Simple declarative clause
SBAR – Clause introduced by a (possibly empty) subordinating conjunction
SBARQ – Direct question
SINV – Inverted declarative sentence, subject follows tensed verb or modal
SQ – Inverted yes/no question, or main clause of a wh-question

7.1.2. Phrase Level Annotation

ADJP – Adjective Phrase
ADVP – Adverb Phrase
CONJP – Conjunction Phrase
FRAG – Fragment
INTJ – Interjection
LST – List marker. Includes surrounding punctuation.
NAC – Not a Constituent (within an NP)
NP – Noun Phrase
NX – Used within NPs to mark the head of the NP
PP – Prepositional Phrase
PRN – Parenthetical
PRT – Particle. Category for words that should be tagged RP.
QP – Quantifier Phrase within NP
RRC – Reduced Relative Clause
UCP – Unlike Coordinated Phrase
VP – Verb Phrase
WHADJP – Wh-adjective Phrase
WHAVP – Wh-adverb Phrase
WHNP – Wh-noun Phrase
WHPP – Wh-prepositional Phrase
X – Unknown, uncertain, or unbracketable

7.1.3. Word Level Annotation

CC – Coordinating conjunction
CD – Cardinal number
DT – Determiner
EX – Existential there
FW – Foreign word
IN – Preposition or subordinating conjunction
JJ – Adjective
JJR – Adjective, comparative
JJS – Adjective, superlative
LS – List item marker
MD – Modal
NN – Noun, singular or mass
NNS – Noun, plural
NNP – Proper noun, singular
NNPS – Proper noun, plural
PDT – Predeterminer
POS – Possessive ending
PRP – Personal pronoun
PRP$ – Possessive pronoun
RB – Adverb
RBR – Adverb, comparative
RBS – Adverb, superlative
RP – Particle
SYM – Symbol
TO – to
UH – Interjection
VB – Verb, base form
VBD – Verb, past tense
VBG – Verb, gerund or present participle
VBN – Verb, past participle
VBP – Verb, non-3rd person singular present
VBZ – Verb, 3rd person singular present
WDT – Wh-determiner
WP – Wh-pronoun
WP$ – Possessive wh-pronoun
WRB – Wh-adverb

Additional parts of speech tags are used for currency signs, brackets and punctuation including quotation marks.

7.2. Mapping Tagsets: Chinese/CTB to English/PTB

CTB tag – Mapping to PTB tags

AD – RB
AS – X
BA – X
CC – CC
CD – CD
CS – IN
DEC – VBG
DEG – POS
DER – V
DEV – Does not occur on its own; see VA, VC, VE, VP and VV.
DT – DT
ETC – CC
FW – FW
IJ – UH
JJ – JJ
LB – X
LC – IN
M – NN
MSP – TO
NN – NN
NP – NN. This tag should, according to the documentation, not exist on a word level. However it does, and is probably closest to a noun.
NR – NNP
NT – NN
OD – CD
P – IN
PN – PRP
PU – If occurring at the end of the sentence it will be mapped to (.); in any other place it will be mapped to (:).
SB – X
SP – X

VA – If VA is followed by DEV then both are replaced with RB. If VA is not followed by DEV then it is mapped to JJ.
VC – If VC is followed by DEV then both are replaced with RB. If VC is not followed by DEV then it is mapped to V.
VE – If VE is followed by DEV then both are replaced with RB. If VE is not followed by DEV then it is mapped to V.
VP – This tag should, according to the documentation, not exist on a word level. However it does, and is probably closest to a verb. If VP is followed by DEV then both are replaced with RB. If VP is not followed by DEV then it is mapped to V.
VV – If VV is followed by DEV then both are replaced with RB. If VV is not followed by DEV then it is mapped to V.
X – X

PTB tags of the English text have to be mapped as follows (second translation function):

EX → X, LS → X, RP → X and SYM → X (particles without counterpart)
JJR → JJ and JJS → JJ (comparative/superlative adjectives)
RBR → RB, RBS → RB and WRB → RB
NNS → NN and NNPS → NNP (plural forms are not marked in the CTB)
PDT → CD
WDT → DT
MD → V, VB → V, VBD → V, VBN → V, VBP → V and VBZ → V
WP → PRP, WP$ → PRP and PRP$ → PRP
$ → NN and # → NN (currency symbols)

All other tags remain unchanged.

7.3. Mapping Tagsets: German/STTS to English/PTB

Only word level annotation is covered.

STTS tag – Usage – Mapping to PTB tags

ADJA – attributive adjective ([das] große [Haus]) – ADJA corresponds to the PTB tags JJ, JJR and JJS. Contextual information in the TIGER treebank allows a fine-grained mapping.
ADJD – adverbial or predicative adjective ([er fährt] schnell, [er ist] schnell) – Mapped to JJ. This distinction is not made in the PTB.
ADV – adverb (schon, bald, doch) – Adverbs should be mapped to RB, RBR and RBS. The TIGER corpus does however not provide contextual information for varying degrees like oft, öfter (often, more often). Adverbial comparatives and superlatives are quite rare in German and are usually replaced by adjectives: 'am öftesten' (most often) for example is nonsensical, and the adjectival superlative 'am häufigsten' would be used instead. Therefore all adverbs will be mapped to RB, and the distinction between RB, RBR and RBS is lost.
APPR – preposition; circumposition, left part (in [der Stadt], ohne [mich], von [jetzt an]) – IN
APPRART – preposition with implicit article (im [Haus], zur [Sache]) – IN
APPO – postposition ([ihm] zufolge, [der Sache] wegen) – IN
APZR – circumposition, right part ([von jetzt] an) – IN
ART – definite or indefinite article (der, die, das, ein, eine) – DT
CARD – cardinal number (zwei [Männer], [im Jahre] 1994) – CD
FM – foreign material ([Er hat das mit "] A big fish ["übersetzt]) – FW
ITJ – interjection (mhm, ach, tja) – UH
KOUI – subordinating conjunction with zu-infinitive (um [zu leben], anstatt [zu fragen]) – has a similar function to IN
KOUS – subordinating conjunction with sentence (weil, daß, damit, wenn, ob) – has a similar function to IN
KON – coordinating conjunction (und, oder, aber) – CC
KOKOM – comparative conjunction (als, wie) – IN (like)

NN – common noun (Tisch, Herr, [das] Reisen) – Depending on contextual number information this tag is mapped either to NN or to NNS. This STTS tag also encompasses gerund constructions (VBG), which generally act as nouns (das Reisen = travelling).
NE – proper noun (Hans, Hamburg, HSV) – Depending on contextual information this tag is mapped either to NNP or to NNPS.
NNE – combination of common and proper noun (Theodor-Heuss-Stiftung, Niger-Delta) – The Penn Treebank treats these combinations as proper nouns. TIGER does not provide contextual information, but these constructs are usually singular and are thus mapped to NNP.
PDS – substitutive demonstrative pronoun (dieser, jener) – DT
PDAT – attributive demonstrative pronoun (jener [Mensch]) – DT
PIS – substitutive indefinite pronoun (keiner, viele, man, niemand) – NN
PIAT – attributive indefinite pronoun (kein [Mensch], irgendein [Glas]) – DT (any)
PIDAT – attributive indefinite pronoun with determiner ([ein] wenig [Wasser], [die] beiden [Brüder]) – NN (a bit/NN of water). A mapping to DT would also be possible (both/DT brothers), but NN seems to be closer to the function of most of these words.
PPER – irreflexive personal pronoun (ich, er, ihm, mich, dir) – PRP
PPOSS – substitutive possessive pronoun (meins, deiner) – PRP
PPOSAT – attributive possessive pronoun (mein [Buch], deine [Mutter]) – PRP$
PRELS – substitutive relative pronoun ([der Hund,] der) – This tag is mapped to WP, but could also be mapped to WDT.
PRELAT – attributive relative pronoun ([der Mann,] dessen [Hund]) – WP$ (whose)
PRF – reflexive pronoun (sich, einander, dich, mir) – PRP (myself)
PWS – substitutive interrogative pronoun (wer, was) – WP
PWAT – attributive interrogative pronoun (welche [Farbe], wessen [Hut]) – WP$
PWAV – interrogative adverb or adverbial relative pronoun (warum, wo, wann, worüber, wobei) – WRB (why, where, when)
PAV / PROAV (TIGER) – pronominal adverb (dafür, dabei, deswegen, trotzdem) – RB (hence, therefore)

PTKZU – "zu" before infinitive (zu [gehen]) – This tag is a perfect match to the Penn Treebank tag TO. However, to is a much more common word than zu.
PTKNEG – negating particle (nicht) – RB
PTKVZ – separated verb particle ([er kommt] an, [er fährt] rad) – Particles are tagged RP. These constructions do appear in the English language: being better off, to break down.
PTKANT – answer particle (ja, nein, danke, bitte) – These answer particles are not tagged consistently in the Penn Treebank: Yes/UH, No/DT, Thanks/NNS, Thank/VB you, Please/UH, Please/VB. This tag will be mapped to UH.
PTKA – particle with adjective or adverb (am [schönsten], zu [schnell]) – Is mapped to RB (too/RB). Other valid mappings would be more/JJR and most/JJS.
TRUNC – truncated element (An- [und Abreise]) – Similar constructions are annotated differently in the Penn Treebank; the suspended hyphen is considered a separate sentence element: third -/: and fourth-quarter growth. TRUNC will be mapped to (:).
VVFIN – finite full verb ([du] gehst, [wir] kommen [an]) – Will be mapped to VBZ, VBP or VBD according to number and tense information.
VVIMP – imperative, full verb (komm [!]) – VB
VVINF – infinitive, full verb (gehen, ankommen) – VB
VVIZU – infinitive with zu, full verb (anzukommen, loszulassen) – VB
VVPP – past participle, full verb (gegangen, angekommen) – VBN
VAFIN – finite auxiliary verb ([du] bist, [wir] werden) – Will be mapped to VBZ, VBP or VBD according to number and tense information.
VAIMP – imperative, auxiliary verb (sei [ruhig!]) – VB
VAINF – infinitive, auxiliary verb (werden, sein) – VB
VAPP – past participle, auxiliary verb (gewesen) – VBN
VMFIN – finite modal verb (dürfen) – MD
VMINF – infinitive, modal verb (wollen) – MD
VMPP – past participle, modal verb (gekonnt, [er hat gehen] können) – MD

XY – nonword, with special characters (3:7, H2O, D2XW3) – SYM
$, – comma – (,)
$. – final punctuation (. ? ! ; :) – (.)
$( – other punctuation marks (- [ ] ( )) – (:)

PTB tags of the English text have to be mapped as follows (second translation function):

VBG → NN
EX (→ PPER) → PRP
LS → SYM
PDT (→ PIAT) → DT
POS (→ PPOSAT) → PRP$
WDT (→ PRELS) → WP
RBR → RB and RBS → RB (comparative and superlative forms of adverbs)
$ → NN and # → NN (currency symbols)

All other tags remain unchanged.

\n"." sample sentences to German/TIGER format. (0+@TIGERstreams) . use ThsFunctions. } else { for (my $n = 0.svunt(916464). # # # # To which other language should English evolve? 0 = German/TIGER 1 = German/TueBadS 2 = Chinese/PTB use constant SAMPLELANGUAGE => 1. @samplesentences." sample sentences to German/TueBadS format. create Katz-model my $GermanTIGERModel. if (SAMPLELANGUAGE == 1) { for (my $n = 0. $n++) { $samplesentences[$n] = prepare_sentence_for_TueBadS $samplesentences[$n]. (0+@samplesentences) . %derivations. $n < @samplesentences. extract POS-streams." German/TueBadS sentences for sampling. } } $GermanTueBadSModel = create_katz_model(\@TueBadSstreams.4. (0+@samplesentences) .\n". if (OTHERLANGUAGE == 1) { my @TueBadSstreams = @{ get_TueBadS_pos_streams() }.pl #!/usr/bin/perl # # # "Perl terrifies me. print "Read ". use ThsCorpora. Code: evolution. $n <= 1920. my $MEnglishModel. # D) dito CN my $ChineseTBModel.\n".53 Chapter 7 – Appendix 7." German/TIGER sentences for sampling. it looks like an explosion in an ascii factory. } . $n < @samplesentences. } } $GermanTIGERModel = create_katz_model(\@TIGERstreams. # From where should the 1921 sample sentences be taken? # 0 = English/PTB (section 00) # 1 = other selected language my (@treebank. } else { for (my $n = 0. } print "Set aside ". if (OTHERLANGUAGE == 2) { my @Chinesestreams = @{ get_CTB_pos_streams() }. ". $n++) { $samplesentences[$n] = shift @TueBadSstreams. (0+@TueBadSstreams). } print "Set aside ". } print "Converted ". } else { for (my $n = 0. $n <= 1920." # -. ". if (SAMPLELANGUAGE == 1) { for (my $n = 0.\n". (0+@samplesentences) .mrg-sect0'). 10). (0+ @samplesentences). # D) dito DE-2 my $GermanTueBadSModel. $n++) { $samplesentences[$n] = prepare_sentence_for_TIGER $samplesentences[$n]. (0+@samplesentences) . " English sampling sentences. $n++) { $samplesentences[$n] = prepare_sentence_for_CTB $samplesentences[$n]. $n++) { $samplesentences[$n] = shift @TIGERstreams.\n". # Model of the modified English language. use constant OTHERLANGUAGE => 0. " remain. @derivationslist). " remain. } # B) read GER-Treebank. $n < @samplesentences. POS-Streams if (SAMPLELANGUAGE == 0) { my $wsj = read_sentences_from_file('wsj. } print "Set aside ". if (OTHERLANGUAGE == 0) { my @TIGERstreams = @{ get_TIGER_pos_streams() }. 10). Slashdot use strict. $n <= 1920. ". @samplesentences = @{(parse_PTB_file($wsj))[0]}. $n++) { $samplesentences[$n] = shift @Chinesestreams. (0+@Chinesestreams) . " remain.\n". } print "Converted ". if (SAMPLELANGUAGE == 1) { for (my $n = 0. (0+@samplesentences) . obtained from leaves of the trees # A) Load sample sentences (eg sect 00 PTB)." Chinese sentences for sampling.

push @{$sentencedescr{$sortedderivation}}. # Process root nodes. index($descr. (0+@samplesentences) .mrg'). while ($deriv =~ s|(^``\|``$\|\s``\s)| |g) {}. } foreach (@wsjsentences) { my (%sentencedescr. # Only list binary derivations $descr = $nodenumber. $deriv =~ s|(^\s+\|\s+$)||go. '-') > 0). $2). } ($deriv eq '') ? '-NONE-' : $descr/geo. $descr = substr($descr. $i++) { my @s = @{$sentencetree[$i]}.Chapter 7 – Appendix } } 54 print "Converted ". " "/geo. $i++) { if ($wordtree[-$node]->[$i] eq '. my $rule = ''. $nodenumber). (join ' '. '=')) if (index($descr. do { # Process innermost layer. } push @stack. # last element is [0] = parent node } else { for (my $i = 0. my (@wordtree. while ($deriv =~ s|(^''\|''$\|\s''\s)| |g) {}. } until /^\s?$/o. ']'. if ($deriv ne '') { $nodenumber++. brackets that contain only one descriptor s/\s*\(([^()\s]+)\)\s*/$sentencetree[0] = $1. $wordtree[0] = $sentencetree[0]. $rule .\|\. @sentencetree. sort split ' '. $j++) { $s[$j] = '.' if ($s[$j] <= 0)." sample sentences to Chinese format.\n". { my @parseptb = parse_PTB_file($wsj). ' -> [' . pop @stack. $ChineseTBModel = create_katz_model(\@Chinesestreams. $i = 1 + @{$wordtree[-$node]}.$\|\s\. while ($deriv =~ s|(^\. 0. extract syntax trees { my $wsj = read_sentences_from_file('wsj. $j < @s. my @sentencewords = split ' '.') { $wordtree[-$node]->[$i] = '-'. @stack). $rule) . $derivations{$sortedderivation}++ if (($sortedderivation =~ tr: : :) == 3). $rule . for (my $i = 1. @wsjwords = @{$parseptb[4]}. } my $sortedderivation = $descr . do { my $node = pop @stack. $descr.\s)| |g) {}. $descr = substr($descr. '=') > 0). if (@wsjwords > 0) { $sentencedescr{'orig'} = shift @wsjwords. } } } } while (@stack > 0). load WSJ # 2. ($_ > 0 ? $_ : -$node) foreach (reverse @{$sentencetree[$node]}).= ' '. $sentencedescr{'orig'}. $wordtree[0]. 0. push @{$sentencetree[$nodenumber]}.$\|\s\. # E) # 1. $deriv) { push @{$sentencetree[$nodenumber]}. $_. ($sentencetree[$_])->[0] if ($_ > 0). $i < @{$wordtree[-$node]}. $i < @sentencetree. } $sentencedescr{'word'} = \@wordtree. replace brackets by their descriptors s/\(([^()\s]+)\s([^()]+)\)/ my ($descr. while ($deriv =~ s|\s*-(NONE\|LRB\|RRB)-\s*| |g) {}.\s)| |g) {}. (shift @sentencewords). @wsjwords). while ($deriv =~ s|(^\. 5). for (my $j = 1.\|\. @wsjsentences = @{$parseptb[1]}. index($descr. $sentencedescr{'tree'} = \@sentencetree. . } $wordtree[$i] = \@s. '-')) if (index($descr. $deriv) = ($1. $_ unless ($_ > 0). my (@wsjsentences. foreach (split ' '.= ' '. if ($node > 0) { push @stack. $nodenumber.

undef($ChineseTBModel). 55 .= ' ' . $n++) { if ($treebank[$n]->{'pos'} eq '') { my $POSstream = extract_leaves($treebank[$n]->{'tree'}). $samplesentences[$n]. print "Picking one random derivation out of ". } } $samplesentenceprobabilities[$n] = $lpL2. return $leafstream. $leafstream). for (my $n = 0. } @derivationslist = keys %derivations. $lpL2 = sentence_katz_probability($GermanTIGERModel.$POSstream. $n++) { my $lpL2. Select a binary derivation sub pick_derivation { my $minpick = shift. \%sentencedescr. $POSstream = prepare_sentence_for_TIGER($POSstream) if (OTHERLANGUAGE == 0). my @POSstreams.. Generate Katz-smoothed model sub extract_leaves { my (@stack. Generate POS-streams # 4. # and free some RAM undef($GermanTIGERModel). $POSstream = 'START'. 5) if (OTHERLANGUAGE == 1). print " POS-model updated. # Optimization: Precompute foreign language sentence probabilities my @samplesentenceprobabilities. $POSstream = prepare_sentence_for_CTB($POSstream) if (OTHERLANGUAGE == 2). my $pick..Chapter 7 – Appendix } push @treebank. for (my $n = 0. $samplesentences[$n]. print $stack[0]. $treebank[$n]->{'words'} = $words. if (defined $treebank[$n]->{'word'}) { my $words = extract_leaves($treebank[$n]->{'word'}). ": ". 5) if (OTHERLANGUAGE == 0). $_. } } } } push @POSstreams. $lpL2 = sentence_katz_probability($ChineseTBModel. do { $_ = pop @stack. $n < @treebank. $MEnglishModel = create_katz_model(\@POSstreams. undef($GermanTueBadSModel). pop @stack. # last element is [0] = parent node } else { $leafstream . while ($words =~ s/ -/ /go) {}.' STOP'. # print "sentence $n stream : $POSstream\n". my $tree = @_[0]. # 3. print "done\n". #update_model. $treebank[$n]->{'pos'} = $POSstream.\n". 5) if (OTHERLANGUAGE == 2). 9). push @stack. # # F) # 1. (0+@derivationslist).". } } while (@stack > 0). $treebank[$n]->{'pos'}. { print "Precomputing sentence probabilities. } sub update_model { print "Updating POS-model\n". #print_treebank \@treebank. $tree->[0]. if ($_ > 0) { push @stack. $POSstream = prepare_sentence_for_TueBadS($POSstream) if (OTHERLANGUAGE == 1). $n < @samplesentences. $_ foreach (reverse @{$tree->[$_]}). $lpL2 = sentence_katz_probability($GermanTueBadSModel. $samplesentences[$n].

$derivations{$pick} . } while ($derivations{$pick} < $minpick). Test on sample sentences sub kldivergence { my ($sumpos. keep $mutation. If closer to foreign language then keep. otherwise revert (or if within threshold do random walk) # 7. ". } elsif ($kld2 > $kll) { $kll = $kld2. $n <= $#treebank. return $kld / @samplesentences. $lpL2 = $lpL1 .002. } else { $mutation = pick_derivation 1."\n". for (my $n = 0. my $kld2 = kldivergence. KLD=$kld. EF=$evolutionfreedom\n". while ($generation < 2000) { ++$generation. my $generation = 0. print "$pick (occurs ". print "G$generation . $n < @samplesentences. } $evolutionfreedom -= $evolutionpressure. 5). } } } } $treebank[$n]->{'pos'} = ''.. $n++) { if (defined $treebank[$n]->{$pick}) { foreach (@{$treebank[$n]->{$pick}}) { ($treebank[$n]->{'tree'}->[$_]->[1]. } mutate_treebank $mutation. mutate_treebank $mutation. # 2. } else { print "G$generation . print "G$generation .$sumneg. ($treebank[$n]->{'word'}->[$_]->[1]. } } my $kld = $sumpos . KLD=$kld\n".This is within reason.This is much better! keep $mutation. $treebank[$n]->{'word'}->[$_]->[1]) if (defined $treebank[$n]->{'word'}). } elsif ($generation < 600) { $mutation = pick_derivation 10. #print_treebank \@treebank. print "G$generation ." times)\n". EF=$evolutionfreedom\n". return $pick.Doesn't help. $sumneg). print "KL DIVERGENCE = $sumpos . $treebank[$n]->{'tree'}->[$_]->[2]) = ($treebank[$n]->{'tree'}->[$_]->[2]. # 4. $sumpos += $lpL2 if ($lpL2 > 0). $evolutionfreedom = 0 if ($evolutionfreedom < 0). if ($kld2 > $kld) { $kld = $kll = $kld2. $kld / @samplesentences . $sumneg -= $lpL2 if ($lpL2 < 0). } # 6. $treebank[$n]->{'word'}->[$_]->[2]) = ($treebank[$n]->{'word'}->[$_]->[2]. # 3. Update Katz-smoothed model #update_model. $treebank[$n]->{'tree'}->[$_]->[1]). # = kldivergence. update_model. KLD=$kld. my $kld. } elsif ($kld2 > $kld . if ($generation < 30) { $mutation = pick_derivation 300.Chapter 7 – Appendix } do { $pick = $derivationslist[rand @derivationslist]. Reverting.$lpL2. my $KL = $lpL2 * (exp $lpL1). my $mutation. my $kll = $kld. my $evolutionfreedom = 2. Contd.\n". at F1 56 . } elsif ($generation < 100) { $mutation = pick_derivation 100.$evolutionfreedom) { $kll = $kld2. my $lpL2 = $samplesentenceprobabilities[$n].. Swap sub mutate_treebank { my $pick = shift.$sumneg = $kld = avg. $samplesentences[$n]. $n++) { my $lpL1 = sentence_katz_probability($MEnglishModel. for (my $n = 0. my $evolutionpressure = 0.This is better! keep $mutation.

. print 'Sentence '.= "$2 ". ')'. # Cleaning up: # . $buffer if ($bufright > 0). # print $bufleft .$bufleft.5. print 'Input : '. @EXPORT = qw(read_sentences_from_file parse_PTB_file get_CTB_pos_streams prepare_sentence_for_CTB get_TIGER_pos_streams prepare_sentence_for_TIGER get_TueBadS_pos_streams prepare_sentence_for_TueBadS ). print 'Cleaned : '. $/) = "<$filename". <> }. use constant DEBUG => 0. sub read_sentences_from_file { my ($filename) = @_. require Exporter. }.pm #!/usr/bin/perl use strict.'-count'}++."\n" if (DEBUG). foreach (@sentences) { ++$nx." bytes\n".= "$1 "." bytes\n". do { my $buffer. (0 + @sentences) .Collapse consecutive whitespace including line endings s/\s\s+/ /g.$2}++ if (($1 ne '-NONE-') && ($1 ne '-LRB-') && ($1 ne '-RRB-')). if (($1 ne '. length() . tr/\n/ /. print "Reading $filename\n". "-" . # Store the stream of seen POS tags to create an n-gram model later on 57 .') && ($1 ne '


') && ($1 ne '``') && ($1 ne '-NONE-') && ($1 ne '-LRB-') && ($1 ne '-RRB-')) { $POSStream . my $openbrackets = 1. push @sentences. s/\(\s+/\(/g. } while ($bufright < length($wsj)) && ($bufright > 0). my $bufright = 0. $1/geo. } print "Extracted: ".Remove redundant whitespace near brackets # . # Slurp input file $_ = do { local(@ARGV. "\n". ':' . s/\s+\)/\)/g. $_ . } while ($openbrackets > 0). $terminaldistributions{$1. $openbrackets. our (@ISA. ($terminaldistributions{$1}->{$2})++. @POSTagStreams. %terminal. ++$nx . ": ".Chapter 7 – Appendix 7. @sentencetrees. $buffer = substr $wsj. containing all terminal derivations # Replace whole brackets with their non-terminal descriptor # Record distribution of words for POS-tags s/\(([^()\s]+)\s([^()]+)\)/$terminal{$1.') && ($1 ne '. @ISA = qw(Exporter). my $bufleft = $bufright. $WordStream . $WordStream). { my $wsj = @_[0]. $openbrackets = ($buffer =~ tr/(/(/) . $buffer.' -> '. $bufright for 1 . Code: ThsCorpora. $bufright . my ($POSStream. @WordStreams). $bufright . length() .Find all substrings with a matching number of opening # and closing parentheses my @sentences." sentences\n". @EXPORT). # Process innermost layer of brackets. sub parse_PTB_file { # Extract sentences: # . my ($nx. $bufleft. %terminaldistributions. do { $bufright = 1 + index $wsj. } return $_. package ThsCorpora.($buffer =~ tr/)/)/).

\@sentencetrees. s/\(([^ ()]+) [^ ()]+\)/$posstream . $CTBmap{(pop)} = pop while (@_). push @posstreams. $_. ++$numtags.= ' '. $CTBmap{$1} if ($1 ne '-NONE-'). 58 .= ' ' . 'START'. $/) = "<$filename". $posstream =~ s/ PU $/ . my %CTBmap. <> }. $1. \%terminal. return \@posstreams.'STOP'.'STOP'. $_ = <<'CTBMAP'. s/\s\s+/ /g. $POSStream . AD AS BA CC CD CS DEC DEG DT ETC FW IJ JJ LB LC M MSP NN NN-OBJ NN-SBJ NN-SHORT NP NR NR-PN NR-SHORT NT NT-SHORT OD P PN SB SP VC VE VP VV RB X X CC CD IN VBG POS DT CC FW UH JJ X IN NN TO NN NN NN NN NN NNP NNP NNP NN NN CD IN PRP X X V V V V VA DER PU DEV X VA DER PU DEV X CTBMAP tr/\n\t/ /. } print "ChineseTB: read $numsent POS streams with a total of $numtags tags\n". push @WordStreams. while ($posstream =~ s/ DER / V /go) {}.Chapter 7 – Appendix push @POSTagStreams. \%terminaldistributions. $numsent. while ($posstream =~ s/ VA DEV / RB /go) {}. } } # Store the tree structure to create a PCFG later on push @sentencetrees. 'START '. /go. split ' '. return (\@POSTagStreams. $numtags). } my @posstreams. tr/\n/ /. # get_CTB_pos_streams # # Returns a reference to an array containing all streams of POS tags # from the Chinese (Penn) tree bank sub get_CTB_pos_streams { my $filename = 'chtb. my (@sentences. ++$numsent. foreach (@sentences) { my $posstream = ''. while ($posstream =~ s/ VA / JJ /go) {}. while ($posstream =~ s/ V DEV / RB /go) {}. s/<S [^>]*>(. $posstream . $posstream . while ($posstream =~ s/ PU / : /go) {}. $WordStream./geo.fid'. $_ = do { local(@ARGV.*?)<\/S>/push @sentences./geo. \@WordStreams). s/\s+/ /g.

++$numsent. while ($sentence =~ s/ \$ / NN /go) {}./geo. ADJA-Pos ADJA-Comp ADJA-Sup ADJA-*. while ($sentence =~ s/ JJR / JJ /go) {}.*?)<\/terminals>/push @sentences. while ($sentence =~ s/ JJS / JJ /go) {}. while ($sentence =~ s/ WP / PRP /go) {}.*. my $filename = "TIGER/corpus/tiger_release_aug07. <> }. $_ = do { local(@ARGV. $numtags) = (0. # pound } return $sentence.*. VBN / V /go) {}. SYM / X /go) {}.59 Chapter 7 – Appendix # prepare_sentence_for_CTB $sentence # # Replaces PTB tags that have no CTB tags mapped to them sub prepare_sentence_for_CTB { my ($sentence) = @_. s/<terminals>(. VBP / V /go) {}. while ($sentence =~ s/ PDT / CD /go) {}. while ($sentence =~ s/ RBS / RB /go) {}.* ADJD-Pos ADJD-Comp ADJD-Sup ADV APPR APPRART APPO APZR ART CARD FM ITJ KOUI KOUS KON KOKOM NN-* NN-Sg NN-Pl NE-* NE-Sg NE-Pl NNE PDS PDAT PIS PIAT JJ JJR JJS JJ JJ JJR JJS RB IN IN IN IN DT CD FW UH IN IN CC IN NN NN NNS NNP NNP NNPS NNP DT DT NN DT . VBZ / V /go) {}. VBD / V /go) {}. while ($sentence =~ s/ WRB / RB /go) {}. $/) = "<$filename". while ($sentence =~ s/ RBR / RB /go) {}. while while while while while while ($sentence ($sentence ($sentence ($sentence ($sentence ($sentence =~ =~ =~ =~ =~ =~ s/ s/ s/ s/ s/ s/ MD / V /go) {}. s/\s\s+/ /g..\n".. my @sentences. while ($sentence =~ s/ PRP\$ / PRP /go) {}. LS / X /go) {}. # dollar while ($sentence =~ s/ # / NN /go) {}. while ($sentence =~ s/ NNPS / NNP /go) {}. # get_TIGER_pos_streams # # Returns a reference to an array containing all streams of POS tags # from the TIGER tree bank sub get_TIGER_pos_streams { print "Loading TIGER.xml". VB / V /go) {}. 0). while ($sentence =~ s/ NNS / NN /go) {}. tr/\n/ /. while ($sentence =~ s/ WP\$ / PRP /go) {}. $1. while ($sentence =~ s/ WDT / DT /go) {}. $_ = <<'STTSMAP'. while while while while ($sentence ($sentence ($sentence ($sentence =~ =~ =~ =~ s/ s/ s/ s/ EX / X /go) {}. my ($numsent. RP / X /go) {}.

$posstream . } my @posstreams. . $posstream =~ s/ [.Subj VVFIN-Pres VBP VVFIN-Past VBD VVIMP VB VVINF VB VVIZU VB VVPP VBN VAFIN-3.Sg.]//go. # EX -> PPER -> PRP LS / SYM /go) {}. # WDT -> PRELS -> WP RBR / RB /go) {}.Pres.Ind VVFIN-3. split ' '. EX / PRP /go) {}. 'START'. and .Pres.'="([^">]+)"([^>]*)' foreach ('pos'.60 Chapter 7 – Appendix PIDAT NN PPER PRP PPOSS PRP PPOSAT PRP$ PRELS WP PRELAT WP$ PRF PRP PWS WP PWAT WP$ PWAV WRB PAV RB PROAV RB PTKZU TO PTKNEG RB PTKVZ RP PTKANT UH PTKA RB TRUNC : VVFIN-3.$9} || "unknown:$1:$3:$5:$7:$9 ".'-'.= $STTSmap{$1} || $STTSmap{$1.Pres.$3} || $STTSmap{$1. \$ / NN /go) {}.Subj VAFIN-Pres VBP VAFIN-Past VBD VAIMP VB VAINF VB VAPP VBN VMFIN MD VMINF MD VMPP MD XY SYM $. RBS / RB /go) {}. # PDT -> PIAT -> DT POS / PRP\$ /go) {}. # POS -> PPOSAT -> PRP$ WDT / WP /go) {}.Sg.= ' '.= ' '. $STTSmap{(pop)} = pop while (@_). 'morph'.' STOP'. ++$numtags. # dollar # / NN /go) {}. . # filter . $regex . push @posstreams.$7} || $STTSmap{$1. $posstream .'-'./geo. s/ s/ s/ s/ s/ s/ s/ s/ s/ s/ VBG / NN /go) {}.Sg. return \@posstreams. PDT / DT /go) {}. 'number'.$_.Sg. # pound # get_TueBadS_pos_streams # # Returns a reference to an array containing all streams of POS tags # from the TueBadS tree bank . 'degree'. my $regex = "". s/$regex/$posstream . } print "TIGER: read $numsent POS streams with a total of $numtags tags\n". foreach (@sentences) { my $posstream = ''. # prepare_sentence_for_TIGER $sentence # # Replaces PTB tags that have no TIGER tags mapped to them sub prepare_sentence_for_TIGER { my ($sentence) = @_. } while ($sentence =~ while ($sentence =~ while ($sentence =~ while ($sentence =~ while ($sentence =~ while ($sentence =~ while ($sentence =~ while ($sentence =~ while ($sentence =~ while ($sentence =~ return $sentence.$5} || $STTSmap{$1. s/\s+/ /g. $.'-'. $( : VBZ VBZ VBZ VBZ STTSMAP tr/\n\t/ /.Ind VAFIN-3. my %STTSmap. 'tense')..'-'.Pres.

++$numtags. } print "TueBadS: read $numsent POS streams with a total of $numtags tags\n". 61 .= ' '.= $STTSmap{$2} || "unknown:$2". return \@posstreams. s/\s+/ /g. tr/\n/ /. s/\s\s+/ /g. foreach (@sentences) { my $posstream = ''. $2. $posstream . push @posstreams.xml". my %STTSmap. s/<word ([^>]*)pos="([^">]*)"([^>]*)>/$posstream . $numtags) = (0. $STTSmap{(pop)} = pop while (@_). . } my @posstreams./geo. my @sentences. # prepare_sentence_for_TueBadS $sentence # # Replaces PTB tags that have no TueBadS tags mapped to them sub prepare_sentence_for_TueBadS { my ($sentence) = @_. .Chapter 7 – Appendix sub get_TueBadS_pos_streams { my $filename = "TueBadS/tuebads.*?)<\/sentence>/push @sentences. $_ = <<'STTSMAP'.. my ($numsent. 0). <> }. $posstream ./geo. s/<sentence([^>]*)>(. 'START'. $posstream =~ s/ [. ++$numsent. $( : STTSMAP tr/\n\t/ /. $/) = "<$filename". $_ = do { local(@ARGV. $. and . split ' '. ADJA JJ ADJD JJ ADV RB APPR IN APPRART IN APPO IN APZR IN ART DT BS SYM CARD CD FM FW ITJ UH KOUI IN KOUS IN KON CC KOKOM IN NN NN NE NNP NNE NNP PDS DT PDAT DT PIS NN PIAT DT PIDAT NN PPER PRP PPOSS PRP PPOSAT PRP$ PRELS WP PRELAT WP$ PRF PRP PWS WP PWAT WP$ PWAV WRB PAV RB PROAV RB PROP RB PTKZU TO PTKNEG RB PTKVZ RP PTKANT UH PTKA RB TRUNC : VVFIN VERB VVIMP VB VVINF VB VVIZU VB VVPP VBN VAFIN VERB VAIMP VB VAINF VB VAPP VBN VMFIN MD VMINF MD VMPP MD XY SYM $. # filter .' STOP'.]//go.

# pound . # WDT -> PRELS -> WP RBR / RB /go) {}.62 Chapter 7 – Appendix while while while while while while while while while while while while while while while while while } ($sentence ($sentence ($sentence ($sentence ($sentence ($sentence ($sentence ($sentence ($sentence ($sentence ($sentence ($sentence ($sentence ($sentence ($sentence ($sentence ($sentence =~ =~ =~ =~ =~ =~ =~ =~ =~ =~ =~ =~ =~ =~ =~ =~ =~ return $sentence. s/ s/ s/ s/ s/ s/ s/ s/ s/ s/ s/ s/ s/ s/ s/ s/ s/ VBG / NN /go) {}. # dollar # / NN /go) {}. # EX -> PPER -> PRP LS / SYM /go) {}. PDT / DT /go) {}. VBD / VERB /go) {}. NNS / NN /go) {}. VBZ / VERB /go) {}. NNPS / NNP /go) {}. VBP / VERB /go) {}. RBS / RB /go) {}. JJS / JJ /go) {}. EX / PRP /go) {}. # PDT -> PIAT -> DT POS / PRP\$ /go) {}. \$ / NN /go) {}. # POS -> PPOSAT -> PRP$ WDT / WP /go) {}. JJR / JJ /go) {}.

7.5. Code: ThsCorpora.pm

#!/usr/bin/perl
use strict;

package ThsCorpora;
require Exporter;
our (@ISA, @EXPORT);
@ISA = qw(Exporter);

# parse_PTB_file  $wsj  contents of a Penn Treebank section
#
# Extract sentences:
#  - Remove redundant whitespace near brackets
#  - Find all substrings with a matching number of opening
#    and closing parentheses
# Returns references to the streams of POS tags, the sentence trees, the
# terminal derivations, the terminal distributions and the word streams
sub parse_PTB_file {
  my $wsj = $_[0];
  my (@sentences, @POSTagStreams, @sentencetrees, @WordStreams);
  my (%terminal, %terminaldistributions);

  # Remove redundant whitespace near brackets
  $wsj =~ s/\(\s+/\(/g;
  $wsj =~ s/\s+\)/\)/g;
  $wsj =~ s/\s\s+/ /g;

  # Find all substrings with a matching number of opening
  # and closing parentheses
  my $bufright = 0;
  do {
    my $bufleft = $bufright;
    my $openbrackets = 1;
    my $buffer;
    do {
      $bufright = 1 + index $wsj, ')', $bufright;
      $buffer = substr $wsj, $bufleft, $bufright - $bufleft;
      $openbrackets = ($buffer =~ tr/(/(/) - ($buffer =~ tr/)/)/);
    } while ($openbrackets > 0);
    push @sentences, $buffer;
  } while (($bufright < length($wsj)) && ($bufright > 0));
  print "Extracted: " . @sentences . " sentences\n";

  foreach (@sentences) {
    # Store the tree structure to create a PCFG later on
    push @sentencetrees, $_;

    # Process innermost layer of brackets, containing all terminal derivations
    # Replace whole brackets with their non-terminal descriptor
    # Record distribution of words for POS-tags
    my ($POSStream, $WordStream) = ('', '');
    s/\(([^()\s]+)\s([^()]+)\)/
        ++$terminal{$1 . ':' . $2};
        ($terminaldistributions{$1}->{$2})++;
        if (($1 ne '.') && ($1 ne ',') && ($1 ne ':') && ($1 ne "''") && ($1 ne '``')
            && ($1 ne '-NONE-') && ($1 ne '-LRB-') && ($1 ne '-RRB-')) {
          $POSStream  .= ' ' . $1;
          $WordStream .= ' ' . $2;
        }
        $1/geo;

    push @POSTagStreams, 'START' . $POSStream . ' STOP';
    push @WordStreams, $WordStream;
  }
  return (\@POSTagStreams, \@sentencetrees, \%terminal,
          \%terminaldistributions, \@WordStreams);
}

# get_CTB_pos_streams
#
# Returns a reference to an array containing all streams of POS tags
# from the Chinese (Penn) tree bank
sub get_CTB_pos_streams {
  my $filename = 'chtb.fid';

  # Mapping from CTB tags onto (reduced) PTB tags
  my %CTBmap;
  $_ = <<'CTBMAP';
AD RB    AS X      BA X      CC CC     CD CD     CS IN
DEC VBG  DEG POS   DT DT     ETC CC    FW FW     IJ UH
JJ JJ    LB X      LC IN     M NN      MSP TO    NN NN
NN-OBJ NN          NN-SBJ NN           NN-SHORT NN
NP NN    NR NNP    NR-PN NNP           NR-SHORT NNP
NT NN    NT-SHORT NN         OD CD     P IN      PN PRP
SB X     SP X      VC V      VE V      VP V      VV V
VA VA    DER DER   PU PU     DEV DEV   X X
CTBMAP
  tr/\n\t/ /;
  @_ = split ' ';
  $CTBmap{(pop)} = pop while (@_);

  # Slurp input file
  $_ = do { local(@ARGV, $/) = "<$filename"; <> };
  tr/\n/ /;
  s/\s\s+/ /g;

  # Extract sentences
  my @sentences;
  s/<S [^>]*>(.*?)<\/S>/push @sentences, $1/geo;

  my ($numsent, $numtags) = (0, 0);
  my @posstreams;
  foreach (@sentences) {
    my $posstream = '';
    s/\(([^ ()]+) [^ ()]+\)/$posstream .= ' ' . $CTBmap{$1} if ($1 ne '-NONE-'); ++$numtags; $1/geo;

    # Tags that depend on their context are rewritten afterwards
    $posstream =~ s/ PU $/ . /;
    while ($posstream =~ s/ VA DEV / RB /go) {}
    while ($posstream =~ s/ V DEV / RB /go)  {}
    while ($posstream =~ s/ VA / JJ /go)     {}
    while ($posstream =~ s/ DER / V /go)     {}
    while ($posstream =~ s/ PU / : /go)      {}

    push @posstreams, 'START' . $posstream . ' STOP';
    ++$numsent;
  }
  print "ChineseTB: read $numsent POS streams with a total of $numtags tags\n";
  return \@posstreams;
}

# prepare_sentence_for_CTB $sentence
#
# Replaces PTB tags that have no CTB tags mapped to them
sub prepare_sentence_for_CTB {
  my ($sentence) = @_;
  while ($sentence =~ s/ JJR / JJ /go)    {}
  while ($sentence =~ s/ JJS / JJ /go)    {}
  while ($sentence =~ s/ NNS / NN /go)    {}
  while ($sentence =~ s/ NNPS / NNP /go)  {}
  while ($sentence =~ s/ RBR / RB /go)    {}
  while ($sentence =~ s/ RBS / RB /go)    {}
  while ($sentence =~ s/ WRB / RB /go)    {}
  while ($sentence =~ s/ WP / PRP /go)    {}
  while ($sentence =~ s/ WP\$ / PRP /go)  {}
  while ($sentence =~ s/ PRP\$ / PRP /go) {}
  while ($sentence =~ s/ WDT / DT /go)    {}
  while ($sentence =~ s/ PDT / CD /go)    {}
  while ($sentence =~ s/ MD / V /go)      {}
  while ($sentence =~ s/ VB / V /go)      {}
  while ($sentence =~ s/ VBD / V /go)     {}
  while ($sentence =~ s/ VBN / V /go)     {}
  while ($sentence =~ s/ VBP / V /go)     {}
  while ($sentence =~ s/ VBZ / V /go)     {}
  while ($sentence =~ s/ EX / X /go)      {}
  while ($sentence =~ s/ LS / X /go)      {}
  while ($sentence =~ s/ RP / X /go)      {}
  while ($sentence =~ s/ SYM / X /go)     {}
  while ($sentence =~ s/ \$ / NN /go)     {}   # dollar
  while ($sentence =~ s/ # / NN /go)      {}   # pound
  return $sentence;
}

# get_TIGER_pos_streams
#
# Returns a reference to an array containing all streams of POS tags
# from the TIGER tree bank
sub get_TIGER_pos_streams {
  print "Loading TIGER...\n";
  my $filename = "TIGER/corpus/tiger_release_aug07.xml";

  # Mapping from STTS tags (with morphological information) onto PTB tags
  my %STTSmap;
  $_ = <<'STTSMAP';
ADJA-Pos JJ   ADJA-Comp JJR   ADJA-Sup JJS   ADJA-*.* JJ
ADJD-Pos JJ   ADJD-Comp JJR   ADJD-Sup JJS   ADV RB
APPR IN   APPRART IN   APPO IN   APZR IN   ART DT   CARD CD
FM FW     ITJ UH       KOUI IN   KOUS IN   KON CC   KOKOM IN
NN-* NN   NN-Sg NN     NN-Pl NNS   NE-* NNP   NE-Sg NNP   NE-Pl NNPS
NNE NNP   PDS DT       PDAT DT   PIS NN    PIAT DT   PIDAT NN
PPER PRP  PPOSS PRP    PPOSAT PRP$   PRELS WP   PRELAT WP$   PRF PRP
PWS WP    PWAT WP$     PWAV WRB   PAV RB   PROAV RB
PTKZU TO  PTKNEG RB    PTKVZ RP   PTKANT UH   PTKA RB   TRUNC :
VVFIN-3.Sg.Pres.Ind VBZ   VVFIN-3.Sg.Pres.Subj VBZ   VVFIN-Pres VBP   VVFIN-Past VBD
VVIMP VB  VVINF VB     VVIZU VB   VVPP VBN
VAFIN-3.Sg.Pres.Ind VBZ   VAFIN-3.Sg.Pres.Subj VBZ   VAFIN-Pres VBP   VAFIN-Past VBD
VAIMP VB  VAINF VB     VAPP VBN
VMFIN MD  VMINF MD     VMPP MD    XY SYM   $. .   $, ,   $( :
STTSMAP
  tr/\n\t/ /;
  @_ = split ' ';
  $STTSmap{(pop)} = pop while (@_);

  # Slurp input file
  $_ = do { local(@ARGV, $/) = "<$filename"; <> };
  tr/\n/ /;
  s/\s\s+/ /g;

  # Extract sentences (the <terminals> block of every sentence)
  my @sentences;
  s/<terminals>(.*?)<\/terminals>/push @sentences, $1/geo;

  # Terminals carry pos, morph, number, degree and tense attributes
  my $regex = "";
  $regex .= $_ . '="([^">]+)"([^>]*)' foreach ('pos', 'morph', 'number', 'degree', 'tense');

  my ($numsent, $numtags) = (0, 0);
  my @posstreams;
  foreach (@sentences) {
    my $posstream = '';
    s/$regex/$posstream .= ' ' . ($STTSmap{$1} || $STTSmap{$1.'-'.$3} || $STTSmap{$1.'-'.$5}
        || $STTSmap{$1.'-'.$7} || $STTSmap{$1.'-'.$9} || "unknown:$1:$3:$5:$7:$9 "); ++$numtags;/geo;
    $posstream =~ s/ [.,:]//go;          # filter . , and :
    push @posstreams, 'START' . $posstream . ' STOP';
    ++$numsent;
  }
  print "TIGER: read $numsent POS streams with a total of $numtags tags\n";
  return \@posstreams;
}

# prepare_sentence_for_TIGER $sentence
#
# Replaces PTB tags that have no TIGER tags mapped to them
sub prepare_sentence_for_TIGER {
  my ($sentence) = @_;
  while ($sentence =~ s/ VBG / NN /go)    {}
  while ($sentence =~ s/ PDT / DT /go)    {}   # PDT -> PIAT -> DT
  while ($sentence =~ s/ POS / PRP\$ /go) {}   # POS -> PPOSAT -> PRP$
  while ($sentence =~ s/ WDT / WP /go)    {}   # WDT -> PRELS -> WP
  while ($sentence =~ s/ EX / PRP /go)    {}   # EX -> PPER -> PRP
  while ($sentence =~ s/ RBR / RB /go)    {}
  while ($sentence =~ s/ RBS / RB /go)    {}
  while ($sentence =~ s/ LS / SYM /go)    {}
  while ($sentence =~ s/ \$ / NN /go)     {}   # dollar
  while ($sentence =~ s/ # / NN /go)      {}   # pound
  return $sentence;
}

# get_TueBadS_pos_streams
#
# Returns a reference to an array containing all streams of POS tags
# from the TueBadS tree bank
sub get_TueBadS_pos_streams {
  my $filename = "TueBadS/tuebads.xml";

  # Mapping from STTS tags onto PTB tags
  my %STTSmap;
  $_ = <<'STTSMAP';
ADJA JJ   ADJD JJ   ADV RB   APPR IN   APPRART IN   APPO IN   APZR IN   ART DT
BS SYM    CARD CD   FM FW    ITJ UH    KOUI IN   KOUS IN   KON CC   KOKOM IN
NN NN     NE NNP    NNE NNP  PDS DT    PDAT DT   PIS NN    PIAT DT  PIDAT NN
PPER PRP  PPOSS PRP   PPOSAT PRP$   PRELS WP   PRELAT WP$   PRF PRP
PWS WP    PWAT WP$    PWAV WRB   PAV RB   PROAV RB   PROP RB
PTKZU TO  PTKNEG RB   PTKVZ RP   PTKANT UH   PTKA RB   TRUNC :
VVFIN VERB   VVIMP VB   VVINF VB   VVIZU VB   VVPP VBN
VAFIN VERB   VAIMP VB   VAINF VB   VAPP VBN
VMFIN MD     VMINF MD   VMPP MD    XY SYM   $. .   $, ,   $( :
STTSMAP
  tr/\n\t/ /;
  @_ = split ' ';
  $STTSmap{(pop)} = pop while (@_);

  # Slurp input file
  $_ = do { local(@ARGV, $/) = "<$filename"; <> };
  tr/\n/ /;
  s/\s\s+/ /g;

  # Extract sentences
  my @sentences;
  s/<sentence([^>]*)>(.*?)<\/sentence>/push @sentences, $2/geo;

  my ($numsent, $numtags) = (0, 0);
  my @posstreams;
  foreach (@sentences) {
    my $posstream = '';
    s/<word ([^>]*)pos="([^">]*)"([^>]*)>/$posstream .= ' ' . ($STTSmap{$2} || "unknown:$2"); ++$numtags;/geo;
    $posstream =~ s/ [.,:]//go;          # filter . , and :
    push @posstreams, 'START' . $posstream . ' STOP';
    ++$numsent;
  }
  print "TueBadS: read $numsent POS streams with a total of $numtags tags\n";
  return \@posstreams;
}

# prepare_sentence_for_TueBadS $sentence
#
# Replaces PTB tags that have no TueBadS tags mapped to them
sub prepare_sentence_for_TueBadS {
  my ($sentence) = @_;
  while ($sentence =~ s/ VBG / NN /go)    {}
  while ($sentence =~ s/ NNS / NN /go)    {}
  while ($sentence =~ s/ NNPS / NNP /go)  {}
  while ($sentence =~ s/ JJR / JJ /go)    {}
  while ($sentence =~ s/ JJS / JJ /go)    {}
  while ($sentence =~ s/ RBR / RB /go)    {}
  while ($sentence =~ s/ RBS / RB /go)    {}
  while ($sentence =~ s/ VBD / VERB /go)  {}
  while ($sentence =~ s/ VBP / VERB /go)  {}
  while ($sentence =~ s/ VBZ / VERB /go)  {}
  while ($sentence =~ s/ EX / PRP /go)    {}   # EX -> PPER -> PRP
  while ($sentence =~ s/ LS / SYM /go)    {}
  while ($sentence =~ s/ PDT / DT /go)    {}   # PDT -> PIAT -> DT
  while ($sentence =~ s/ POS / PRP\$ /go) {}   # POS -> PPOSAT -> PRP$
  while ($sentence =~ s/ WDT / WP /go)    {}   # WDT -> PRELS -> WP
  while ($sentence =~ s/ \$ / NN /go)     {}   # dollar
  while ($sentence =~ s/ # / NN /go)      {}   # pound
  return $sentence;
}

1;
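The following short driver is not part of the module above; it only sketches how the ThsCorpora routines could be combined. The script itself, the PTB file name wsj_0200.mrg and the example tag sequence are placeholders chosen for illustration.

#!/usr/bin/perl
# Hypothetical driver script (not part of the thesis code).
use strict;
use ThsCorpora;

# Read one Penn Treebank section into the five structures returned by
# parse_PTB_file; 'wsj_0200.mrg' is a placeholder file name.
my $ptbtext = do { local(@ARGV, $/) = "<wsj_0200.mrg"; <> };
my ($postags, $trees, $terminals, $terminaldist, $words)
    = ThsCorpora::parse_PTB_file($ptbtext);
print "PTB: " . @$postags . " POS tag streams\n";

# The other treebanks are read from the fixed paths used in the module.
my $tiger   = ThsCorpora::get_TIGER_pos_streams();
my $tuebads = ThsCorpora::get_TueBadS_pos_streams();
my $ctb     = ThsCorpora::get_CTB_pos_streams();

# Before an English tag stream is scored against a German or Chinese model,
# PTB tags without a counterpart are collapsed by the prepare_* helpers.
my $sentence  = 'START DT NNS VBZ JJ STOP';
my $for_ctb   = ThsCorpora::prepare_sentence_for_CTB($sentence);     # NNS -> NN, VBZ -> V
my $for_tueba = ThsCorpora::prepare_sentence_for_TueBadS($sentence); # NNS -> NN, VBZ -> VERB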

7.6. Code: ThsFunctions.pm

#!/usr/bin/perl
use strict;

package ThsFunctions;
require Exporter;
our (@ISA, @EXPORT);
@ISA = qw(Exporter);
@EXPORT = qw(get_top_n_results print_array print_places get_random_hash_element
             create_katz_model sentence_katz_probability print_treebank);
# @EXPORT_OK = qw(funcTwo $varTwo);

# get_top_n_results \%hash  hash of entity => count
#                   $n      number of results to return (0 = all)
#
# Returns a flat list of (count, entity) pairs for the most frequent entities
sub get_top_n_results {
  my ($hash, $n) = @_;
  my (%counts, @output);
  while (my ($entity, $count) = each(%$hash)) {
    push(@{$counts{$count}}, $entity);
  }
  my @topcounts = reverse sort {$a <=> $b} keys %counts;
  for (my ($place, $results) = (0, 0);
       ($place <= $#topcounts) && (($n == 0) || ($results < $n));
       $place++) {
    foreach (@{$counts{$topcounts[$place]}}) {
      if ((($n == 0) || (++$results <= $n)) && ($topcounts[$place] > 0)) {
        push @output, ($topcounts[$place], $_);
      }
    }
  }
  return @output;
}

sub print_array {
  while (@_) {
    printf "%3dx %s\n", shift @_, shift @_;
  }
}

sub print_places {
  my $n = 0;
  while (@_) {
    printf "  Place %3d: %3dx %s\n", ++$n, shift @_, shift @_;
  }
}

# get_random_hash_element $probabilitysum  sum of all values in the hash
#                         \%rndstruct      hash of entity => probability
#
# Draws a random key from the hash, weighted by its value
sub get_random_hash_element {
  my ($probabilitysum, $rndstruct) = @_;
  my $rndnumber = rand($probabilitysum);
  keys %$rndstruct;                  # reset iterator to beginning of hash
  my $entity;
  while (($entity, my $probability) = each(%$rndstruct)) {
    $rndnumber -= $probability;
    if ($rndnumber <= 0) { last; }
  }
  keys %$rndstruct;                  # reset iterator to beginning of hash
  return $entity;
}

# create_katz_model \@array of streams of POS tags
#                   $K     K constant in katz-smoothing
#
# Creates a 1-, 2-, 3-, 4-, 5-gram model with Katz-smoothing
# Returns a reference to a hash containing the counts of all
# n-grams with 1 < n <= 5 as well as individual word counts
sub create_katz_model {
  my ($TagStreams, $K) = @_;
  my %wordcount;
  my %gramcount;
  my $wordtotal;

  # Count occurences
  foreach (@{$TagStreams}) {
    @_ = split ' ';
    $wordtotal += @_;
    ++$wordcount{$_} foreach (@_);
    for (my $i = 1; $i < @_; $i++) {
      for (my $n = 1; $n <= 4; $n++) {
        if ($n <= $i) {
          ++$gramcount{join ' ', @_[$i-$n..$i-1]}->{$_[$i]};
          ++$gramcount{join ' ', @_[$i-$n..$i-1]}->{''};
        }
      }
    }
  }

  # Inverse counts (counts of counts) per n-gram length, needed for discounting
  my %inversecount;
  foreach (keys %wordcount) {
    $inversecount{1}->{$wordcount{$_}}++;
  }
  foreach my $gc (keys %gramcount) {
    my $len = 2 + $gc =~ tr/ / /;
    foreach (keys %{$gramcount{$gc}}) {
      $inversecount{$len}->{$gramcount{$gc}->{$_}}++ if ($_ ne '');
    }
  }

  my %model;
  $model{'wc'} = \%wordcount;      # individual word counts
  $model{'gc'} = \%gramcount;      # n-gram counts, '' holds the context total
  $model{'ic'} = \%inversecount;   # inverse counts per n-gram length
  $model{'N'}  = $wordtotal;
  $model{'K'}  = $K;
  # $model{'p'}  = probability cache
  # $model{'lp'} = log probability cache
  return \%model;
}

# katz_probability \%katz   reference to the katz model
#                  $input   input so far
#                  $output  next element in stream
#
# Returns the probability of '$output' occuring if '$input' was
# the last observation
sub katz_probability {
  use constant KDBG => 0;
  my ($katz, $input, $output) = @_;

  # result cached?
  return $katz->{'p'}->{$input}->{$output}
    if defined($katz->{'p'}->{$input}->{$output});

  # simple probability for unigrams
  return $katz->{'p'}->{$input}->{$output} = $katz->{'wc'}->{$output} / $katz->{'N'}
    if ($input eq '');

  my $K = $katz->{'K'};
  my $n = 2 + $input =~ tr/ / /;
  my $r = $katz->{'gc'}->{$input}->{$output};
  print "n=$n, r=$r, gc=" . $katz->{'gc'}->{$input}->{''}
      . ", wc=" . $katz->{'wc'}->{$output} . "\n" if (KDBG);

  # if r > k then return traditional estimate
  return $katz->{'p'}->{$input}->{$output} = $r / $katz->{'gc'}->{$input}->{''}
    if ($r > $K);

  if ($r > 0) {
    # discounted estimate for n-grams seen at most K times
    my $p = $r / $katz->{'gc'}->{$input}->{''};
    my $rstar = ($r + 1) * $katz->{'ic'}->{$n}->{$r + 1} / $katz->{'ic'}->{$n}->{$r};
    my $d = ($rstar / $r);
    $d -= ($K + 1) * $katz->{'ic'}->{$n}->{$K + 1} / $katz->{'ic'}->{$n}->{1};
    $d /= 1 - (($K + 1) * $katz->{'ic'}->{$n}->{$K + 1} / $katz->{'ic'}->{$n}->{1});
    print "\nD-P: $p R*: $rstar D-D: $d D: " . ($p * $d) . "\n" if (KDBG);
    return $katz->{'p'}->{$input}->{$output} = $p * $d;
  }

  # unseen n-gram: back off to the shorter context
  my $backoff;
  $backoff = substr($input, index($input, ' ') + 1) if ($n > 2);
  $backoff = '' if ($n <= 2);
  print "Input: $input Backf: $backoff Output: $output\n" if (KDBG);

  my ($alphat, $alphab) = (0, 0);
  my @knownderivations = keys %{$katz->{'gc'}->{$input}};
  foreach (@knownderivations) {
    print "  knownderiv: $input -> $_\n" if (KDBG);
    $alphat += katz_probability($katz, $input,   $_) if ($_ ne '');
    $alphab += katz_probability($katz, $backoff, $_) if ($_ ne '');
  }

  my ($alpha, $ps) = (0, 0);
  $alpha = (1 - $alphat) / (1 - $alphab) if ($alphab != 1);
  print "A = $alpha\n" if (KDBG);
  $ps = katz_probability($katz, $backoff, $output) if ($alpha != 0);
  print "PS ($backoff -> $output) = $ps\n" if (KDBG);
  return $katz->{'p'}->{$input}->{$output} = $alpha * $ps;
}

# sentence_katz_probability \%katz     hash containing the katz model
#                           $sentence  a stream of POS tags
#                           $depth     max n in n-gram
#
# Returns the (log) probability of the given sentence in the given model
sub sentence_katz_probability {
  my ($katz, $sentence, $depth) = @_;
  @_ = split ' ', $sentence;
  $depth = @_ - 1 if ($depth >= @_);
  --$depth;
  my $probability;

  # growing context at the beginning of the sentence
  for (my $i = 1; $i <= $depth; $i++) {
    my $input  = join ' ', @_[0..$i-1];
    my $output = $_[$i];
    if (!defined $katz->{'lp'}->{$input}->{$output}) {
      $katz->{'lp'}->{$input}->{$output} = katz_probability($katz, $input, $output);
      $katz->{'lp'}->{$input}->{$output} = 0.000000875
        if ($katz->{'lp'}->{$input}->{$output} <= 0);
      $katz->{'lp'}->{$input}->{$output} = log $katz->{'lp'}->{$input}->{$output};
      # print " + PR [ $output | $input ] = $katz->{'lp'}->{$input}->{$output}\n";
      # print "ERR: $input -> $output\n" if ($katz->{'lp'}->{$input}->{$output} == 0);
      # print "ERR!:$input -> $output\n" if ($katz->{'lp'}->{$input}->{$output} < 0);
    }
    $probability += $katz->{'lp'}->{$input}->{$output};
  }

  # sliding window of $depth context tags over the rest of the sentence
  for (my $i = 1; $i < @_ - $depth; $i++) {
    my $input  = join ' ', @_[$i..$i+$depth-1];
    my $output = $_[$i+$depth];
    if (!defined $katz->{'lp'}->{$input}->{$output}) {
      $katz->{'lp'}->{$input}->{$output} = katz_probability($katz, $input, $output);
      $katz->{'lp'}->{$input}->{$output} = 0.000000875
        if ($katz->{'lp'}->{$input}->{$output} <= 0);
      $katz->{'lp'}->{$input}->{$output} = log $katz->{'lp'}->{$input}->{$output};
    }
    $probability += $katz->{'lp'}->{$input}->{$output};
  }
  return $probability;
}

# print_treebank \@treebank
#
# Dumps the contents of a treebank structure to standard output
sub print_treebank {
  my @treebank = @{$_[0]};
  for (my $n = 0; $n < @treebank; $n++) {
    print "Sentence $n:\n";
    foreach (keys %{$treebank[$n]}) {
      print "$_ :: ";
      if (($_ eq 'tree') || ($_ eq 'word')) {
        print "[0] = $treebank[$n]->{$_}->[0]\n";
        for (my $i = 1; $i < @{$treebank[$n]->{$_}}; $i++) {
          print " [$i] = ";
          for (my $j = 0; $j < @{$treebank[$n]->{$_}->[$i]}; $j++) {
            print '[' . ($j == 0 ? '-> ' : '') . $treebank[$n]->{$_}->[$i]->[$j] . '] ';
          }
          print "\n";
        }
      } elsif (($_ eq 'pos') || ($_ eq 'post') || ($_ eq 'words') || ($_ eq 'orig')) {
        print $treebank[$n]->{$_} . "\n";
      } else {
        print '[ ';
        print "$_ " foreach (@{$treebank[$n]->{$_}});
        print "]\n";
      }
    }
    print "\n";
  }
}

1;
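As a brief illustration (not part of the thesis code), the exported ThsFunctions routines can be combined as sketched below; the toy tag streams and the smoothing constant K = 5 are placeholder values chosen only for the example.

#!/usr/bin/perl
# Hypothetical example of using the ThsFunctions module.
use strict;
use ThsFunctions;

# Two toy streams of POS tags, delimited by START/STOP as in the corpus readers.
my @streams = (
  'START DT NN VBZ JJ STOP',
  'START DT JJ NN VBZ RB STOP',
);

# Build the Katz-smoothed n-gram model (1 < n <= 5) over the streams.
my $katz = create_katz_model(\@streams, 5);

# Log probability of a new tag stream under the model, using up to 3-grams.
my $logprob = sentence_katz_probability($katz, 'START DT JJ NN VBZ STOP', 3);
print "log P = $logprob\n";

# The most frequent individual tags, printed as "count x tag".
print_array(get_top_n_results($katz->{'wc'}, 5));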
Report","author":"marlonumit","tracking":{"object_type":"document","object_id":230356910,"track":"flattened_recommender","doc_uuid":"xEmTMwmrIO4/Q5K9KeIzuBiq1ek="},"url":"https://www.scribd.com/document/230356910/Cafetiere-Report","top_badge":null},"232982475":{"type":"document","id":232982475,"thumb_url":"https://imgv2-1-f.scribdassets.com/img/document/232982475/149x198/02985295d9/1404784168?v=1","retina_thumb_url":"https://imgv2-2-f.scribdassets.com/img/document/232982475/298x396/0ea9789211/1404784168?v=1","title":"Items Perc22013","short_title":"Items Perc22013","author":"Lucille Edward","tracking":{"object_type":"document","object_id":232982475,"track":"flattened_recommender","doc_uuid":"W8gbFWy24PGQKo1mTGPHw1eGSMI="},"url":"https://www.scribd.com/document/232982475/Items-Perc22013","top_badge":null},"266015392":{"type":"document","id":266015392,"thumb_url":"https://imgv2-1-f.scribdassets.com/img/document/266015392/149x198/73d701a3d2/1432139480?v=1","retina_thumb_url":"https://imgv2-2-f.scribdassets.com/img/document/266015392/298x396/d91e6892c5/1432139480?v=1","title":"English Essay","short_title":"English Essay","author":"Emily H.","tracking":{"object_type":"document","object_id":266015392,"track":"flattened_recommender","doc_uuid":"glEfOOBRKoDH3mbXlkwrHlnuPU8="},"url":"https://www.scribd.com/document/266015392/English-Essay","top_badge":null},"290574426":{"type":"document","id":290574426,"thumb_url":"https://imgv2-1-f.scribdassets.com/img/document/290574426/149x198/cbcafa22ca/1500751997?v=1","retina_thumb_url":"https://imgv2-1-f.scribdassets.com/img/document/290574426/298x396/679d73b63d/1500751997?v=1","title":"English Phrasal Verbs in Arabic","short_title":"English Phrasal Verbs in Arabic","author":"Mohammed Ali","tracking":{"object_type":"document","object_id":290574426,"track":"flattened_recommender","doc_uuid":"1Gbz8Geel7sc375CrTsNCk4Y5sE="},"url":"https://www.scribd.com/document/290574426/English-Phrasal-Verbs-in-Arabic","top_badge":null},"312549443":{"type":"document","id":312549443,"thumb_url":"https://imgv2-2-f.scribdassets.com/img/document/312549443/149x198/548e93617c/1463202544?v=1","retina_thumb_url":"https://imgv2-1-f.scribdassets.com/img/document/312549443/298x396/a287b898a2/1463202544?v=1","title":"5 Steps to a 5 History and Approaches","short_title":"5 Steps to a 5 History and Approaches","author":"krishan","tracking":{"object_type":"document","object_id":312549443,"track":"flattened_recommender","doc_uuid":"LoBjmXyWn5tZsqz8CiAW9YSTMWQ="},"url":"https://www.scribd.com/doc/312549443/5-Steps-to-a-5-History-and-Approaches","top_badge":null},"312550503":{"type":"document","id":312550503,"thumb_url":"https://imgv2-1-f.scribdassets.com/img/document/312550503/149x198/729153a4b5/1463203929?v=1","retina_thumb_url":"https://imgv2-1-f.scribdassets.com/img/document/312550503/298x396/417148d0d4/1463203929?v=1","title":"01. GONDRA","short_title":"01. 
GONDRA","author":"krishan","tracking":{"object_type":"document","object_id":312550503,"track":"flattened_recommender","doc_uuid":"uu7ssf9jnHd3PIE13AENVjvhtJ0="},"url":"https://www.scribd.com/doc/312550503/01-GONDRA","top_badge":null},"312550606":{"type":"document","id":312550606,"thumb_url":"https://imgv2-2-f.scribdassets.com/img/document/312550606/149x198/c83bdeef51/1463204154?v=1","retina_thumb_url":"https://imgv2-2-f.scribdassets.com/img/document/312550606/298x396/5bfa7ddfed/1463204154?v=1","title":"1.b Notes Approaches","short_title":"1.b Notes Approaches","author":"krishan","tracking":{"object_type":"document","object_id":312550606,"track":"flattened_recommender","doc_uuid":"trE0/JVW286hHp68g/RE6RNpkWk="},"url":"https://www.scribd.com/presentation/312550606/1-b-Notes-Approaches","top_badge":null},"312550702":{"type":"document","id":312550702,"thumb_url":"https://imgv2-1-f.scribdassets.com/img/document/312550702/149x198/6bf2719f08/1463204378?v=1","retina_thumb_url":"https://imgv2-2-f.scribdassets.com/img/document/312550702/298x396/ddc447cfdb/1463204378?v=1","title":"09 Chapter 2","short_title":"09 Chapter 2","author":"krishan","tracking":{"object_type":"document","object_id":312550702,"track":"flattened_recommender","doc_uuid":"GykEnOBf05XLqK2nVez6Xe4fYjk="},"url":"https://www.scribd.com/document/312550702/09-Chapter-2","top_badge":null},"312724048":{"type":"document","id":312724048,"thumb_url":"https://imgv2-2-f.scribdassets.com/img/document/312724048/149x198/41a1fb4323/1463375623?v=1","retina_thumb_url":"https://imgv2-1-f.scribdassets.com/img/document/312724048/298x396/30e1377d5b/1463375623?v=1","title":"60 Nielsen","short_title":"60 Nielsen","author":"krishan","tracking":{"object_type":"document","object_id":312724048,"track":"flattened_recommender","doc_uuid":"nWa+e0XAslbZaprOmwWuHO3mX14="},"url":"https://www.scribd.com/doc/312724048/60-Nielsen","top_badge":null},"312855483":{"type":"document","id":312855483,"thumb_url":"https://imgv2-1-f.scribdassets.com/img/document/312855483/149x198/4d4f67fe56/1463463127?v=1","retina_thumb_url":"https://imgv2-1-f.scribdassets.com/img/document/312855483/298x396/c8d27c2dc0/1463463127?v=1","title":"ES-363a","short_title":"ES-363a","author":"krishan","tracking":{"object_type":"document","object_id":312855483,"track":"flattened_recommender","doc_uuid":"x+JhcOITWRgdA/I+Q5H777nzBCg="},"url":"https://www.scribd.com/document/312855483/ES-363a","top_badge":null},"312855550":{"type":"document","id":312855550,"thumb_url":"https://imgv2-1-f.scribdassets.com/img/document/312855550/149x198/a85a710ad4/1463463203?v=1","retina_thumb_url":"https://imgv2-1-f.scribdassets.com/img/document/312855550/298x396/4bd1dc7431/1463463203?v=1","title":"Dialnet-TheRoleOfTheGramamarTeaching-4778849.pdf","short_title":"Dialnet-TheRoleOfTheGramamarTeaching-4778849.pdf","author":"krishan","tracking":{"object_type":"document","object_id":312855550,"track":"flattened_recommender","doc_uuid":"P1RAGlphNicWW4zUvQRdj6AMnKo="},"url":"https://www.scribd.com/doc/312855550/Dialnet-TheRoleOfTheGramamarTeaching-4778849-pdf","top_badge":null},"312855578":{"type":"document","id":312855578,"thumb_url":"https://imgv2-1-f.scribdassets.com/img/document/312855578/149x198/f75ef40703/1463463236?v=1","retina_thumb_url":"https://imgv2-1-f.scribdassets.com/img/document/312855578/298x396/9a668ab509/1463463236?v=1","title":"Grad Infant Development Fall 2005 Lecture 3","short_title":"Grad Infant Development Fall 2005 Lecture 
3","author":"krishan","tracking":{"object_type":"document","object_id":312855578,"track":"flattened_recommender","doc_uuid":"p1Zna3/7iKhhgqMj6z6fDn+Z2XM="},"url":"https://www.scribd.com/document/312855578/Grad-Infant-Development-Fall-2005-Lecture-3","top_badge":null},"312855696":{"type":"document","id":312855696,"thumb_url":"https://imgv2-2-f.scribdassets.com/img/document/312855696/149x198/34bb6942ab/1463463365?v=1","retina_thumb_url":"https://imgv2-1-f.scribdassets.com/img/document/312855696/298x396/b499211db5/1463463365?v=1","title":"2008 a Case Study in Grammar Engineering","short_title":"2008 a Case Study in Grammar Engineering","author":"krishan","tracking":{"object_type":"document","object_id":312855696,"track":"flattened_recommender","doc_uuid":"hcAhOWRZwULvQq4ivyCudWPGL+g="},"url":"https://www.scribd.com/document/312855696/2008-a-Case-Study-in-Grammar-Engineering","top_badge":null},"312855733":{"type":"document","id":312855733,"thumb_url":"https://imgv2-1-f.scribdassets.com/img/document/312855733/149x198/c92b105fee/1463463375?v=1","retina_thumb_url":"https://imgv2-1-f.scribdassets.com/img/document/312855733/298x396/8edff628d6/1463463375?v=1","title":"major steps.pdf","short_title":"major steps.pdf","author":"krishan","tracking":{"object_type":"document","object_id":312855733,"track":"flattened_recommender","doc_uuid":"xwXhyZZrCJNz1M7S8EmWUq2OwkQ="},"url":"https://www.scribd.com/document/312855733/major-steps-pdf","top_badge":null},"338018021":{"type":"document","id":338018021,"thumb_url":"https://imgv2-1-f.scribdassets.com/img/document/338018021/149x198/7d59da8ae5/1485876152?v=1","retina_thumb_url":"https://imgv2-2-f.scribdassets.com/img/document/338018021/298x396/282f4e925a/1485876152?v=1","title":"15 LAKATOS-Andrei Eugen Recovering the Memory 14.09.15","short_title":"15 LAKATOS-Andrei Eugen Recovering the Memory 14.09.15","author":"Tudor Gheorghe","tracking":{"object_type":"document","object_id":338018021,"track":"flattened_recommender","doc_uuid":"Ru3E/r59GRcJyuUQtDG3Xg0AKhs="},"url":"https://www.scribd.com/document/338018021/15-LAKATOS-Andrei-Eugen-Recovering-the-Memory-14-09-15","top_badge":null},"339320984":{"type":"document","id":339320984,"thumb_url":"https://imgv2-1-f.scribdassets.com/img/document/339320984/149x198/d6da03a698/1487095259?v=1","retina_thumb_url":"https://imgv2-2-f.scribdassets.com/img/document/339320984/298x396/44401f0abc/1487095259?v=1","title":"ICAS2017 SamplePaper en A","short_title":"ICAS2017 SamplePaper en A","author":"ujjwal saha","tracking":{"object_type":"document","object_id":339320984,"track":"flattened_recommender","doc_uuid":"VOvqjLD7KKzIMi6jtcdqS6Bc3sQ="},"url":"https://www.scribd.com/document/339320984/ICAS2017-SamplePaper-en-A","top_badge":null},"349321157":{"type":"document","id":349321157,"thumb_url":"https://imgv2-2-f.scribdassets.com/img/document/349321157/149x198/68f56de2eb/1495642101?v=1","retina_thumb_url":"https://imgv2-1-f.scribdassets.com/img/document/349321157/298x396/29e6c5245b/1495642101?v=1","title":"Chapter II Wrong","short_title":"Chapter II Wrong","author":"citra indah 
syaputri","tracking":{"object_type":"document","object_id":349321157,"track":"flattened_recommender","doc_uuid":"asW6EqvYbjIzHzC4Sh0gC3VnH9g="},"url":"https://www.scribd.com/document/349321157/Chapter-II-Wrong","top_badge":null},"352430631":{"type":"document","id":352430631,"thumb_url":"https://imgv2-2-f.scribdassets.com/img/document/352430631/149x198/82c4b00370/1498634602?v=1","retina_thumb_url":"https://imgv2-2-f.scribdassets.com/img/document/352430631/298x396/34eb887c2f/1498634602?v=1","title":"Phrasal Verbs With","short_title":"Phrasal Verbs With","author":"Ruben Dario Barros Gonzalez","tracking":{"object_type":"document","object_id":352430631,"track":"flattened_recommender","doc_uuid":"H/PJTLVFd1wbL8nrFn3/UKleXxw="},"url":"https://www.scribd.com/document/352430631/Phrasal-Verbs-With","top_badge":null},"353421486":{"type":"document","id":353421486,"thumb_url":"https://imgv2-2-f.scribdassets.com/img/document/353421486/149x198/a6b0b13c51/1499716078?v=1","retina_thumb_url":"https://imgv2-1-f.scribdassets.com/img/document/353421486/298x396/a25b8fe8e6/1499716078?v=1","title":"nn.4186","short_title":"nn.4186","author":"Elí Lezama","tracking":{"object_type":"document","object_id":353421486,"track":"flattened_recommender","doc_uuid":"vfUjTi/YmZuthbQzRvZ1pjj93zI="},"url":"https://www.scribd.com/document/353421486/nn-4186","top_badge":null},"372973803":{"type":"document","id":372973803,"thumb_url":"https://imgv2-2-f.scribdassets.com/img/document/372973803/149x198/7937e133a5/1520220822?v=1","retina_thumb_url":"https://imgv2-1-f.scribdassets.com/img/document/372973803/298x396/e9421739c3/1520220822?v=1","title":"SyntaticAnalysis Sample","short_title":"SyntaticAnalysis Sample","author":"FidaHussain","tracking":{"object_type":"document","object_id":372973803,"track":"flattened_recommender","doc_uuid":"a4xy6nozyJUqP/ymRoyWgHt50D0="},"url":"https://www.scribd.com/document/372973803/SyntaticAnalysis-Sample","top_badge":null},"377994927":{"type":"document","id":377994927,"thumb_url":"https://imgv2-2-f.scribdassets.com/img/document/377994927/149x198/193b6ede50/1525270161?v=1","retina_thumb_url":"https://imgv2-2-f.scribdassets.com/img/document/377994927/298x396/17fa385e9f/1525270161?v=1","title":"02-scanning-handout.pdf","short_title":"02-scanning-handout.pdf","author":"mamudu francis","tracking":{"object_type":"document","object_id":377994927,"track":"flattened_recommender","doc_uuid":"VIu13PsDE8oRYnRTHccyGYpEkPo="},"url":"https://www.scribd.com/document/377994927/02-scanning-handout-pdf","top_badge":null}}},"seo_roadblock_props_path":"/doc-page/seo-roadblock-props/312724025","signup_context":null,"toolbar":{"search_path":"/search-4gen?allowed_pages=1%2C2%2C3%2C24%2C25%2C26&auth_token=AIwpT6A%2B71jL1eST3jVUrxPo%2Bls%3D&authenticity_token=M6m8u7klluve8ro3OVeFeZ8xGQDro8I5eD56lkW71KANVkVcrK3E%2FpizW3gSepXoPVyoUgR3DqscY%2FsC6XTnRg%3D%3D&expires=1540445043&wordDocumentId=312724025&wordUploadId=318098619"},"renewal_nag_props":null}-->