
CHART PARSING - EARLEY ALGORITHM & STATISTICAL PARSING

Dr. Sukhnandan Kaur
CSED, TIET
THE EARLEY ALGORITHM - INTRODUCTION
 The Earley algorithm is a dynamic programming parsing algorithm that does not require the underlying grammar to be in Chomsky Normal Form.
 In contrast to the bottom-up search implemented by the CYK algorithm, the Earley algorithm (Earley, 1970) uses dynamic programming to implement a top-down search.
 The core of the Earley algorithm is a single left-to-right pass that fills an array called a chart.
 The chart contains a list of states representing the partial/full parse trees that have been generated so far.
THE EARLEY ALGORITHM - STATES
 The individual states contained within each chart entry contain three kinds of information:
 a subtree corresponding to a single grammar rule,
 information about the progress made in completing this subtree,
 the position of the subtree with respect to the input.
 A • is placed within the right-hand side of a state's grammar rule to indicate the progress made in recognizing it. The resulting structure is called a dotted rule.
 A state's position with respect to the input is represented by two numbers indicating where the state begins and where its dot lies.
THE EARLEY ALGORITHM - STATES
 S → • VP, [0, 0]
 NP → Det • Nominal, [1, 2]
 VP → V NP •, [0, 3]
 The first state, with its dot to the left of its constituent, represents a top-down prediction for this particular kind of S. The first 0 indicates that the constituent predicted by this state should begin at the start of the input; the second 0 reflects the fact that the dot lies at the beginning as well.
 The second state, created at a later stage in the processing of this sentence, indicates that an NP begins at position 1, that a Det has been successfully parsed, and that a Nominal is expected next.
 The third state, with its dot to the right of both of its constituents, represents the successful discovery of a tree corresponding to a VP that spans the entire input.
THE EARLEY ALGORITHM - OPERATION
 The fundamental operation of an Earley parser is to march through the sets of states in the chart in a left-to-right fashion, processing the states within each set in order.
 At each step, one of the three operators PREDICTOR, SCANNER, and COMPLETER is applied to each state, depending on its status.
 In each case, this results in the addition of new states to the end of either the current or the next set of states in the chart.
 The new states are then added to the chart as long as they are not already
present.
 The PREDICTOR and the COMPLETER add states to the chart entry being
processed, while the SCANNER adds a state to the next chart entry.
 The algorithm always moves forward through the chart making additions as it
goes; states are never removed and the algorithm never backtracks to a
previous chart entry once it has moved on.
 The presence of a state S → α•, [0, N] in the list of states in the last chart entry
indicates a successful parse.
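A minimal Python sketch of this state representation (the names State, is_complete, and next_symbol are illustrative, not from the slides):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    lhs: str      # left-hand side of the dotted rule, e.g. "VP"
    rhs: tuple    # right-hand-side symbols, e.g. ("Verb", "NP")
    dot: int      # how many rhs symbols have been recognized so far
    start: int    # input position where the constituent begins
    end: int      # input position of the dot

    def is_complete(self):
        # The dot has advanced past every right-hand-side symbol
        return self.dot == len(self.rhs)

    def next_symbol(self):
        # Category immediately to the right of the dot (None if complete)
        return None if self.is_complete() else self.rhs[self.dot]

# VP -> Verb . NP, [0, 1]
s = State("VP", ("Verb", "NP"), 1, 0, 1)
print(s.next_symbol())  # NP
```

The chart itself can then be a list with one list of states per input position; the three operators below append to it.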
THE EARLEY ALGORITHM – PREDICTOR OPERATION
 PREDICTOR is applied to any state that has a non-terminal immediately to the right of its dot that is not a part-of-speech category.
 This application results in the creation of one new state for each alternative expansion of that non-terminal provided by the grammar.
 These new states are placed into the same chart entry as the generating state.
 They begin and end at the point in the input where the generating state ends (a code sketch follows the example below).
 For example, applying PREDICTOR to the state S → • VP, [0, 0] results in the
addition of the following five states
 VP → • Verb, [0, 0]
 VP → • Verb NP, [0, 0]
 VP → • Verb NP PP, [0, 0]
 VP → • Verb PP, [0, 0]
 VP → • VP PP, [0, 0]
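Continuing the sketch above, a hedged version of PREDICTOR; the grammar is assumed to be a dict from non-terminals to lists of expansions, with part-of-speech categories simply absent from it:

```python
def predictor(state, grammar, chart):
    # One new state per expansion of the predicted non-terminal,
    # placed in the same chart entry, beginning and ending at state.end.
    nt = state.next_symbol()
    for rhs in grammar.get(nt, []):   # POS categories have no entry here
        new = State(nt, tuple(rhs), 0, state.end, state.end)
        if new not in chart[state.end]:
            chart[state.end].append(new)
```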
THE EARLEY ALGORITHM – SCANNER OPERATION
 When a state has a part-of-speech category to the right of the dot,
SCANNER is called to examine the input and incorporate a state
corresponding to the prediction of a word with a particular part-of-
speech into the chart.
 This is accomplished by creating a new state from the input state
with the dot advanced over the predicted input category.
 The new state is then added to the chart entry that follows the one
currently being processed.
 For example, when the state VP → • Verb NP, [0, 0] is processed, SCANNER consults the current word in the input, since the category following the dot is a part-of-speech.
 It then notes that book can be a verb, matching the expectation in the current state. This results in the creation of the new state Verb → book •, [0, 1] (see the sketch below).
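A corresponding sketch of SCANNER, assuming a lexicon dict mapping each word to the set of its possible parts of speech (illustrative, not from the slides):

```python
def scanner(state, words, lexicon, chart):
    # If the next input word can have the predicted part of speech,
    # add a completed POS state to the NEXT chart entry.
    pos = state.next_symbol()
    if state.end < len(words) and pos in lexicon.get(words[state.end], set()):
        new = State(pos, (words[state.end],), 1, state.end, state.end + 1)
        if new not in chart[state.end + 1]:
            chart[state.end + 1].append(new)
```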
THE EARLEY ALGORITHM – COMPLETER OPERATION
 COMPLETER is applied to a state when its dot has reached the right end of
the rule.
 The presence of such a state represents the fact that the parser has
successfully discovered a particular grammatical category over some span
of the input.
 The purpose of COMPLETER is to find, and advance, all previously
created states that were looking for this grammatical category at this
position in the input.
 New states are then created by copying the older state, advancing the dot
over the expected category, and installing the new state in the current chart
entry.
 When the state NP → Det Nominal •, [1, 3] is processed, COMPLETER looks for incomplete states ending at position 1 and expecting an NP.
 It finds the states VP → Verb • NP, [0, 1] and VP → Verb • NP PP, [0, 1]. This results in the addition of the new complete state VP → Verb NP •, [0, 3] and the new incomplete state VP → Verb NP • PP, [0, 3] to the chart (see the sketch below).
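A sketch of COMPLETER in the same style, again assuming the State class from the earlier sketch:

```python
def completer(state, chart):
    # Advance every earlier state that was waiting, at state.start,
    # for the category this completed state has just found.
    for old in list(chart[state.start]):
        if not old.is_complete() and old.next_symbol() == state.lhs:
            new = State(old.lhs, old.rhs, old.dot + 1, old.start, state.end)
            if new not in chart[state.end]:
                chart[state.end].append(new)
```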
THE EARLEY ALGORITHM - EXAMPLE
 Using the Earley algorithm, parse the sentence “Book that flight”. Consider the following phrase structure grammar:

Grammar rules:
S → NP VP
S → VP
S → Aux NP VP
NP → Pronoun
NP → Proper-Noun
NP → Det Nominal
Nominal → Noun
Nominal → Nominal Noun
Nominal → Nominal PP
VP → Verb
VP → Verb NP
VP → Verb NP PP
VP → Verb PP
VP → VP PP
PP → Preposition NP

Lexicon:
Det → that | this | a
Noun → book | flight | meal | money
Verb → book | include | prefer
Pronoun → I | she | me
Proper-Noun → Houston | TWA
Aux → does
Preposition → from | to | on | near | through
THE EARLEY ALGORITHM - EXAMPLE
Chart 0
No.  State                  Begin  End  Operation
S0   γ → • S                0      0    Dummy start state
S1   S → • NP VP            0      0    Predictor from S0
S2   S → • VP               0      0    Predictor from S0
S3   S → • Aux NP VP        0      0    Predictor from S0
S4   NP → • Pronoun         0      0    Predictor from S1
S5   NP → • Proper-Noun     0      0    Predictor from S1
S6   NP → • Det Nominal     0      0    Predictor from S1
S7   VP → • Verb            0      0    Predictor from S2
S8   VP → • Verb NP         0      0    Predictor from S2
S9   VP → • Verb NP PP      0      0    Predictor from S2
S10  VP → • Verb PP         0      0    Predictor from S2
S11  VP → • VP PP           0      0    Predictor from S2
THE EARLEY ALGORITHM - EXAMPLE
Chart 1
No.  State                  Begin  End  Operation
S12  Verb → book •          0      1    Scanner from S7
S13  VP → Verb •            0      1    Completer from S12
S14  VP → Verb • NP         0      1    Completer from S12
S15  VP → Verb • NP PP      0      1    Completer from S12
S16  VP → Verb • PP         0      1    Completer from S12
S17  S → VP •               0      1    Completer from S13
S18  VP → VP • PP           0      1    Completer from S13
S19  NP → • Pronoun         1      1    Predictor from S14
S20  NP → • Proper-Noun     1      1    Predictor from S14
S21  NP → • Det Nominal     1      1    Predictor from S14
S22  PP → • Preposition NP  1      1    Predictor from S16
THE EARLEY ALGORITHM - EXAMPLE
Chart 2
No.  State                     Begin  End  Operation
S23  Det → that •              1      2    Scanner from S21
S24  NP → Det • Nominal        1      2    Completer from S23
S25  Nominal → • Noun          2      2    Predictor from S24
S26  Nominal → • Nominal Noun  2      2    Predictor from S24
S27  Nominal → • Nominal PP    2      2    Predictor from S24
THE EARLEY ALGORITHM - EXAMPLE
Chart 3
No.  State                     Begin  End  Operation
S28  Noun → flight •           2      3    Scanner from S25
S29  Nominal → Noun •          2      3    Completer from S28
S30  NP → Det Nominal •        1      3    Completer from S23, S29
S31  Nominal → Nominal • Noun  2      3    Completer from S29
S32  Nominal → Nominal • PP    2      3    Completer from S29
S33  VP → Verb NP •            0      3    Completer from S12, S30
S34  VP → Verb NP • PP         0      3    Completer from S12, S30
S35  PP → • Preposition NP     3      3    Predictor from S32
S36  S → VP •                  0      3    Completer from S33
S37  VP → VP • PP              0      3    Completer from S33
THE EARLEY ALGORITHM - EXAMPLE
The completed parse tree, with the chart state that produced each node:

[S (S36)
  [VP (S33)
    [Verb (S12) Book]
    [NP (S30)
      [Det (S23) that]
      [Nominal (S29)
        [Noun (S28) flight]]]]]
THE EARLEY ALGORITHM – EXAMPLE 2
 Generate two parse trees for the sentence “Pluck the flower with the stick” using the Earley chart parsing algorithm. Consider the following grammar:
S → NP VP
S → VP
NP → Det Noun
NP → NP PP
NP → Noun
VP → VP NP
VP → Verb
VP → VP PP
PP → Preposition NP
Noun → flower | stick | boy
Preposition → with | in
Det → a | an | the
Verb → pluck | see | eat
THE EARLEY ALGORITHM – EXAMPLE 2
Chart[0]
No.  State                  Begin  End  Operation
S0   γ → • S                0      0    Dummy start state
S1   S → • NP VP            0      0    Predictor from S0
S2   S → • VP               0      0    Predictor from S0
S3   NP → • Det Noun        0      0    Predictor from S1
S4   NP → • NP PP           0      0    Predictor from S1
S5   NP → • Noun            0      0    Predictor from S1
S6   VP → • VP NP           0      0    Predictor from S2
S7   VP → • Verb            0      0    Predictor from S2
S8   VP → • VP PP           0      0    Predictor from S2
THE EARLEY ALGORITHM – EXAMPLE 2
Chart[1]
No.  State                  Begin  End  Operation
S9   Verb → pluck •         0      1    Scanner from S7
S10  VP → Verb •            0      1    Completer from S9
S11  S → VP •               0      1    Completer from S10
S12  VP → VP • NP           0      1    Completer from S10
S13  VP → VP • PP           0      1    Completer from S10
S14  NP → • Det Noun        1      1    Predictor from S12
S15  NP → • NP PP           1      1    Predictor from S12
S16  NP → • Noun            1      1    Predictor from S12
S17  PP → • Preposition NP  1      1    Predictor from S13
THE EARLEY ALGORITHM – EXAMPLE 2
Chart[2]
No.  State                  Begin  End  Operation
S18  Det → the •            1      2    Scanner from S14
S19  NP → Det • Noun        1      2    Completer from S18
THE EARLEY ALGORITHM – EXAMPLE 2
Chart[3]
No.  State                  Begin  End  Operation
S20  Noun → flower •        2      3    Scanner from S19
S21  NP → Det Noun •        1      3    Completer from S18, S20
S22  VP → VP NP •           0      3    Completer from S10, S21
S23  NP → NP • PP           1      3    Completer from S21
S24  S → VP •               0      3    Completer from S22
S25  VP → VP • NP           0      3    Completer from S22
S26  VP → VP • PP           0      3    Completer from S22
S27  PP → • Preposition NP  3      3    Predictor from S23
S28  NP → • Det Noun        3      3    Predictor from S25
S29  NP → • NP PP           3      3    Predictor from S25
S30  NP → • Noun            3      3    Predictor from S25
THE EARLEY ALGORITHM – EXAMPLE 2
Chart[4]
No.  State                  Begin  End  Operation
S31  Preposition → with •   3      4    Scanner from S27
S32  PP → Preposition • NP  3      4    Completer from S31
S33  NP → • Det Noun        4      4    Predictor from S32
S34  NP → • NP PP           4      4    Predictor from S32
S35  NP → • Noun            4      4    Predictor from S32
THE EARLEY ALGORITHM – EXAMPLE 2
Chart[5]
No.  State                  Begin  End  Operation
S36  Det → the •            4      5    Scanner from S33
S37  NP → Det • Noun        4      5    Completer from S36
THE EARLEY ALGORITHM – EXAMPLE 2
Chart[6]
No.  State                  Begin  End  Operation
S38  Noun → stick •         5      6    Scanner from S37
S39  NP → Det Noun •        4      6    Completer from S36, S38
S40  PP → Preposition NP •  3      6    Completer from S31, S39
S41  NP → NP • PP           4      6    Completer from S39
S42  NP → NP PP •           1      6    Completer from S21, S40
S43  VP → VP PP •           0      6    Completer from S22, S40
S44  PP → • Preposition NP  6      6    Predictor from S41
S45  VP → VP NP •           0      6    Completer from S10, S42
S46  NP → NP • PP           1      6    Completer from S42
S47  S → VP •               0      6    Completer from S43
S48  VP → VP • NP           0      6    Completer from S43
S49  VP → VP • PP           0      6    Completer from S43
THE EARLEY ALGORITHM – EXAMPLE 2
Chart[6] (contd.)
No.  State                  Begin  End  Operation
S50  S → VP •               0      6    Completer from S45
S51  VP → VP • NP           0      6    Completer from S45
S52  VP → VP • PP           0      6    Completer from S45
S53  NP → • Det Noun        6      6    Predictor from S46
S54  NP → • NP PP           6      6    Predictor from S46
S55  NP → • Noun            6      6    Predictor from S46
THE EARLEY ALGORITHM – EXAMPLE 2
Parse 1 (the PP “with the stick” attaches to the VP):

[S (S47)
  [VP (S43)
    [VP (S22)
      [VP (S10) [Verb (S9) Pluck]]
      [NP (S21) [Det (S18) the] [Noun (S20) flower]]]
    [PP (S40)
      [Prep (S31) with]
      [NP (S39) [Det (S36) the] [Noun (S38) stick]]]]]
THE EARLEY ALGORITHM – EXAMPLE 2
Parse 2 (the PP “with the stick” attaches to the NP):

[S (S50)
  [VP (S45)
    [VP (S10) [Verb (S9) Pluck]]
    [NP (S42)
      [NP (S21) [Det (S18) the] [Noun (S20) flower]]
      [PP (S40)
        [Prep (S31) with]
        [NP (S39) [Det (S36) the] [Noun (S38) stick]]]]]]
STATISTICAL PARSING
 Sentences tend, on average, to be highly syntactically ambiguous, due to problems like coordination ambiguity and attachment ambiguity.
 Dynamic programming parsing algorithms can represent these ambiguities efficiently, but they are not equipped to resolve them.
 A probabilistic parser offers a solution to the problem: compute the probability of each interpretation, and choose the most probable interpretation.
 The most commonly used probabilistic grammar is the probabilistic context-free grammar (PCFG), a probabilistic augmentation of context-free grammars in which each rule is associated with a probability.
PROBABILISTIC CONTEXT FREE GRAMMAR (PCFG)
 The simplest augmentation of the context-free grammar is the Probabilistic Context-Free Grammar (PCFG), also known as the Stochastic Context-Free Grammar (SCFG), first proposed by Booth (1969).
 A probabilistic context-free grammar augments each rule in R with a conditional probability. A PCFG is thus defined by the following components:
 N: a set of non-terminal symbols (or variables)
 Σ: a set of terminal symbols (disjoint from N)
 R: a set of rules or productions, each of the form A → β [p], where A is a non-terminal, β is a string of symbols from the infinite set of strings (Σ ∪ N)*, and p is a number between 0 and 1 expressing P(β|A)
 S: a designated start symbol
 p is the conditional probability of a given expansion β given the left-hand-side (LHS) non-terminal A. We can represent this probability as P(A → β) or as P(RHS|LHS).
 Thus, if we consider all the possible expansions of a non-terminal, the sum of their probabilities must be 1 (see the sketch below):

∑β P(A → β) = 1

 A PCFG is said to be consistent if the sum of the probabilities of all sentences in the language equals 1.
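A small Python sketch of the per-non-terminal constraint (the dict layout and the toy rules are illustrative assumptions):

```python
# A toy PCFG: each non-terminal maps to its expansions with probabilities
pcfg = {
    "S":  [(("NP", "VP"), 0.8), (("VP",), 0.2)],
    "NP": [(("Det", "Noun"), 0.6), (("Noun",), 0.4)],
}

# For every non-terminal A, the probabilities of its expansions must sum to 1
for lhs, expansions in pcfg.items():
    total = sum(p for _, p in expansions)
    assert abs(total - 1.0) < 1e-9, f"{lhs} expansions sum to {total}"
```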
PCFGS FOR DISAMBIGUATION
 A PCFG assigns a probability to each parse tree T (i.e., each derivation) of a sentence S. This attribute is useful in disambiguation.
 The probability of a particular parse T is defined as the product of the probabilities of all the n rules used to expand each of the n non-terminal nodes in the parse tree T (where each rule i can be expressed as LHSi → RHSi):

P(T) = ∏i=1..n P(RHSi | LHSi)

 For example, consider the two parses of the sentence “Book the dinner flights” shown on the next slide.
 The sensible parse on the left means “Book flights that serve dinner”.
 The nonsensical parse on the right, however, would have to mean something like “Book flights on behalf of ‘the dinner’”.
PCFGS FOR DISAMBIGUATION CONTD…
[Figure: the two parse trees for “Book the dinner flights”, each rule annotated with its probability]
 The probability of each of the trees can be computed by multiplying together the probabilities of each of the rules used in the derivation.
 For example, the probabilities of the left tree (call it Tleft) and the right tree (Tright) can be computed as follows:
 P(Tleft) = .05 × .20 × .20 × .20 × .75 × .30 × .60 × .10 × .40 = 2.2 × 10⁻⁶
 P(Tright) = .05 × .10 × .20 × .15 × .75 × .75 × .30 × .60 × .10 × .40 = 6.1 × 10⁻⁷
 We can see that the left (transitive) tree has a much higher probability than the tree on the right.
 Thus this parse would correctly be chosen by a disambiguation algorithm which selects the parse with the highest PCFG probability (see the check below).
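A quick check of these products in Python (the probability lists are read directly off the two derivations above):

```python
import math

left  = [.05, .20, .20, .20, .75, .30, .60, .10, .40]
right = [.05, .10, .20, .15, .75, .75, .30, .60, .10, .40]

print(f"P(T_left)  = {math.prod(left):.3e}")   # ≈ 2.2e-06
print(f"P(T_right) = {math.prod(right):.3e}")  # ≈ 6.1e-07
```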
 The string of words S is called the yield of any parse tree over S.
 Thus, out of all parse trees with a yield of S, the disambiguation algorithm picks the parse tree which is most probable given S:

T̂(S) = argmax P(T), over all parse trees T such that yield(T) = S
LEARNING PCFG RULE PROBABILITIES
 There are two ways to learn probabilities for the rules of a grammar.
 The simplest way is to use a treebank, a corpus of already-parsed sentences.
 The commonly used Penn Treebank (Marcus et al., 1993) is a collection of parse trees in English, Chinese, and other languages distributed by the Linguistic Data Consortium.
 Given a treebank, the probability of each expansion of a non-terminal can be computed by counting the number of times that expansion occurs and then normalizing (see the sketch below):

P(A → β | A) = Count(A → β) / ∑γ Count(A → γ)
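A minimal sketch of this relative-frequency estimate over a toy list of treebank productions (the data here is illustrative, not from a real treebank):

```python
from collections import Counter

# (lhs, rhs) productions read off a toy treebank
productions = [
    ("S", ("NP", "VP")), ("NP", ("Det", "Noun")), ("VP", ("Verb", "NP")),
    ("S", ("NP", "VP")), ("NP", ("Pronoun",)),    ("VP", ("Verb",)),
]

rule_count = Counter(productions)
lhs_count = Counter(lhs for lhs, _ in productions)

# P(A -> beta | A) = Count(A -> beta) / sum over gamma of Count(A -> gamma)
prob = {rule: c / lhs_count[rule[0]] for rule, c in rule_count.items()}
print(prob[("NP", ("Det", "Noun"))])  # 0.5
```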
LEARNING PCFG RULE PROBABILITIES CONTD…
 If we don't have a treebank, but we do have a (non-probabilistic) parser, we can generate the counts we need for computing PCFG rule probabilities by first parsing a corpus of sentences with the parser.
 This is done by parsing the corpus, incrementing a counter for every rule used in each parse, and then normalizing to get probabilities.
PROBABILISTIC CKY ALGORITHM
 The algorithms for computing the most likely parse are simple extensions of the standard algorithms for parsing;
 there are probabilistic versions of both the CKY and Earley algorithms.
 Most modern probabilistic parsers are based on the probabilistic CKY (Cocke-Kasami-Younger) algorithm, first described by Ney (1991).
 In the probabilistic CKY algorithm, for a sentence of length n and a grammar with V non-terminals, each cell table[i, j, A] contains the probability of a constituent A that spans positions i through j of the input.
PROBABILISTIC CKY ALGORITHM
 The core of the chart filling (see the sketch below):

for all { A | A → B C ∈ grammar, and table[i, k, B] > 0 and table[k, j, C] > 0 }:
    table[i, j, A] ← P(A → B C) × table[i, k, B] × table[k, j, C]
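A compact Python sketch of this recurrence, keeping the best probability per constituent and span; the dict layouts lexical and binary are illustrative assumptions, and back-pointers for recovering the tree are omitted:

```python
from collections import defaultdict

def prob_cky(words, lexical, binary):
    # lexical: {word: [(A, P(A -> word)), ...]}
    # binary:  {(B, C): [(A, P(A -> B C)), ...]}
    n = len(words)
    table = defaultdict(dict)  # table[(i, j)] = {A: best probability}
    for j in range(1, n + 1):
        # Diagonal: lexical rules for the word spanning j-1 .. j
        for a, p in lexical.get(words[j - 1], []):
            cell = table[(j - 1, j)]
            cell[a] = max(cell.get(a, 0.0), p)
        # Wider spans ending at j, combining two smaller spans at split k
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for b, pb in table[(i, k)].items():
                    for c, pc in table[(k, j)].items():
                        for a, pr in binary.get((b, c), []):
                            cand = pr * pb * pc
                            if cand > table[(i, j)].get(a, 0.0):
                                table[(i, j)][a] = cand
    return table
```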
PROBABILISTIC CKY ALGORITHM
 Like the CKY algorithm, the probabilistic CKY algorithm also requires the probabilistic CFG to be in CNF.
 The rules for converting a PCFG to CNF are the same, but the probabilities need to be adjusted.
 Rules that mix terminals and non-terminals can be converted into CNF by simply introducing a new dummy non-terminal that covers only the original terminal. The probability of the original rule stays the same, and the probability of the dummy non-terminal rule is 1.
 For example, a rule for an infinitive verb phrase such as INF-VP → to VP [0.2] would be replaced by the two rules:
INF-VP → TO VP [0.2]
TO → to [1.0]
PROBABILISTIC CKY ALGORITHM
 Unit productions are eliminated by rewriting the right-hand side of the original rules with the right-hand side of all the non-unit production rules that they ultimately lead to. The probability of the new rule is the product of the probability of the unit production and the probability of the rule that leads to the terminal.
 If A → B [0.1] is a unit production and B → u [0.2] is a non-unit production, then we add the rule A → u [0.02] to the grammar.
 Rules with right-hand sides longer than 2 are remedied through the introduction of new non-terminals that spread the longer sequences over several new productions. The probability of the original rule remains the same, and the probability of the dummy non-terminal rules is set to 1 (see the sketch below).
 Formally, if we have a rule like A → B C γ [0.18], we introduce the following rules:
X1 → B C [1.0]
A → X1 γ [0.18]
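A small sketch of this binarization step with the probability bookkeeping (the function and the generated non-terminal names are illustrative):

```python
def binarize(lhs, rhs, prob, counter=[0]):
    # Split an n-ary rule into binary rules; each new dummy non-terminal
    # gets probability 1.0, so the original rule keeps its probability.
    # (counter is a tiny hack to generate fresh names in this sketch.)
    rules = []
    rhs = tuple(rhs)
    while len(rhs) > 2:
        counter[0] += 1
        new_nt = f"X{counter[0]}"
        rules.append((new_nt, rhs[:2], 1.0))   # e.g. X1 -> B C [1.0]
        rhs = (new_nt,) + rhs[2:]
    rules.append((lhs, rhs, prob))             # e.g. A -> X1 gamma [0.18]
    return rules

print(binarize("A", ("B", "C", "gamma"), 0.18))
# [('X1', ('B', 'C'), 1.0), ('A', ('X1', 'gamma'), 0.18)]
```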
PROBABILISTIC CKY ALGORITHM
 Using the probabilistic CKY algorithm, find the most probable parse for the sentence “astronomers saw stars with ears”. Use the following PCFG:
S → NP VP [1]
NP → NP PP [0.4]
NP → Noun [0.6]
VP → V NP [0.7]
VP → VP PP [0.3]
PP → P NP [1]
Noun → astronomers [0.17]
Noun → telescope [0.17]
Noun → ears [0.3]
Noun → stars [0.3]
Noun → saw [0.06]
V → saw [1]
P → with [1]
PROBABILISTIC CKY ALGORITHM
 Converting the PCFG to CNF: the unit production NP → Noun is eliminated by collapsing it with the lexical rules, e.g. P(NP → astronomers) = 0.6 × 0.17 = 0.102.
S → NP VP [1]
NP → NP PP [0.4]
NP → astronomers [0.102]
NP → telescope [0.102]
NP → ears [0.18]
NP → stars [0.18]
NP → saw [0.036]
VP → V NP [0.7]
VP → VP PP [0.3]
PP → P NP [1]
V → saw [1]
P → with [1]
Noun → astronomers [0.17]
Noun → telescope [0.17]
Noun → ears [0.3]
Noun → stars [0.3]
Noun → saw [0.06]

CKY PARSING EXAMPLE CONTD…
0 astronomers 1 saw 2 stars 3 with 4 ears 5

Chart cells (each entry: label, constituent [probability], back-pointers):
[0,1]  A1: NP [0.102]; A2: Noun [0.17]
[1,2]  B1: NP [0.036]; B2: V [1]
[2,3]  C1: NP [0.18]; C2: Noun [0.3]
[3,4]  D1: P [1]
[4,5]  E1: NP [0.18]
[0,2]  empty
[1,3]  F1: VP [0.126] (B2, C1)
[2,4]  empty
[3,5]  G1: PP [0.18] (D1, E1)
[0,3]  H1: S [0.012852] (A1, F1)
[1,4]  empty
[2,5]  K1: NP [0.01296] (C1, G1)
[0,4]  empty
[1,5]  J1: VP [0.009072] (B2, K1); J2: VP [0.006804] (F1, G1)
[0,5]  I1: S [0.000925344] (A1, J1); I2: S [0.000694008] (A1, J2)

The most probable parse is therefore I1, with probability 0.000925344: S → NP (astronomers) VP, where VP → V (saw) NP and NP → NP (stars) PP (with ears), i.e. the PP “with ears” attaches to the NP “stars”.
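Running the prob_cky sketch from earlier on this CNF grammar reproduces the chart's best probabilities (the dict encodings below are illustrative):

```python
lexical = {
    "astronomers": [("NP", 0.102), ("Noun", 0.17)],
    "saw":         [("NP", 0.036), ("V", 1.0), ("Noun", 0.06)],
    "stars":       [("NP", 0.18), ("Noun", 0.3)],
    "with":        [("P", 1.0)],
    "ears":        [("NP", 0.18), ("Noun", 0.3)],
}
binary = {
    ("NP", "VP"): [("S", 1.0)],
    ("NP", "PP"): [("NP", 0.4)],
    ("V", "NP"):  [("VP", 0.7)],
    ("VP", "PP"): [("VP", 0.3)],
    ("P", "NP"):  [("PP", 1.0)],
}
table = prob_cky("astronomers saw stars with ears".split(), lexical, binary)
print(table[(0, 5)]["S"])  # 0.000925344 -- the most probable parse
```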
