
CHART PARSING - EARLEY ALGORITHM & STATISTICAL PARSING

Dr. Sukhnandan Kaur
CSED, TIET
THE EARLEY ALGORITHM - INTRODUCTION
 The Earley algorithm is a dynamic programming parsing algorithm that does not require the underlying grammar to be in Chomsky Normal Form.
 In contrast to the bottom-up search implemented by the CYK algorithm, the Earley algorithm (Earley, 1970) uses dynamic programming to implement a top-down search.
 The core of the Earley algorithm is a single left-to-right pass that fills an array called a chart.
 The chart contains a list of states representing the partial/full parse trees that have been generated so far.
THE EARLEY ALGORITHM - STATES
 The individual states contained within each chart entry contain three kinds of information:
 a subtree corresponding to a single grammar rule,
 information about the progress made in completing this subtree,
 the position of the subtree with respect to the input.
 A • is placed within the right-hand side of a state's grammar rule to indicate the progress made in recognizing it. The resulting structure is called a dotted rule.
 A state's position with respect to the input is represented by two numbers indicating where the state begins and where its dot lies.
THE EARLEY ALGORITHM - STATES
 S → • VP, [0, 0]
 NP → Det • Nominal, [1, 2]
 VP → V NP •, [0, 3]
 The first state, with its dot to the left of its constituent, represents a top-down prediction for this particular kind of S. The first 0 indicates that the constituent predicted by this state should begin at the start of the input; the second 0 reflects the fact that the dot lies at the beginning as well.
 The second state, created at a later stage in the processing of this sentence, indicates that an NP begins at position 1, that a Det has been successfully parsed, and that a Nominal is expected next.
 The third state, with its dot to the right of both of its constituents, represents the successful discovery of a tree corresponding to a VP that spans the entire input.
THE EARLEY ALGORITHM - OPERATION
 The fundamental operation of an Earley parser is to march through the sets of states in the chart in a left-to-right fashion, processing the states within each set in order.
 At each step, one of the three operators PREDICTOR, SCANNER, and COMPLETER is applied to each state, depending on its status.
 In each case, this results in the addition of new states to the end of either the current or the next set of states in the chart.
 The new states are then added to the chart as long as they are not already
present.
 The PREDICTOR and the COMPLETER add states to the chart entry being
processed, while the SCANNER adds a state to the next chart entry.
 The algorithm always moves forward through the chart making additions as it
goes; states are never removed and the algorithm never backtracks to a
previous chart entry once it has moved on.
 The presence of a state S → α•, [0, N] in the list of states in the last chart entry
indicates a successful parse.
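A minimal Python sketch of this state representation (the names State, is_complete, and next_symbol are illustrative, not from the slides):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    lhs: str      # left-hand side of the dotted rule, e.g. "VP"
    rhs: tuple    # right-hand-side symbols, e.g. ("Verb", "NP")
    dot: int      # how many rhs symbols have been recognized so far
    start: int    # input position where the constituent begins
    end: int      # input position of the dot

    def is_complete(self):
        # The dot has advanced past every right-hand-side symbol
        return self.dot == len(self.rhs)

    def next_symbol(self):
        # Category immediately to the right of the dot (None if complete)
        return None if self.is_complete() else self.rhs[self.dot]

# VP -> Verb . NP, [0, 1]
s = State("VP", ("Verb", "NP"), 1, 0, 1)
print(s.next_symbol())  # NP
```

The chart itself can then be a list with one list of states per input position; the three operators below append to it.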
THE EARLEY ALGORITHM – PREDICTOR OPERATION
 PREDICTOR is applied to any state that has a non-terminal immediately to the right of its dot that is not a part-of-speech category.
 This application results in the creation of one new state for each alternative expansion of that non-terminal provided by the grammar.
 These new states are placed into the same chart entry as the generating state.
 They begin and end at the point in the input where the generating state ends (a code sketch follows the example below).
 For example, applying PREDICTOR to the state S → • VP, [0, 0] results in the
addition of the following five states
 VP → • Verb, [0, 0]
 VP → • Verb NP, [0, 0]
 VP → • Verb NP PP, [0, 0]
 VP → • Verb PP, [0, 0]
 VP → • VP PP, [0, 0]
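Continuing the sketch above, a hedged version of PREDICTOR; the grammar is assumed to be a dict from non-terminals to lists of expansions, with part-of-speech categories simply absent from it:

```python
def predictor(state, grammar, chart):
    # One new state per expansion of the predicted non-terminal,
    # placed in the same chart entry, beginning and ending at state.end.
    nt = state.next_symbol()
    for rhs in grammar.get(nt, []):   # POS categories have no entry here
        new = State(nt, tuple(rhs), 0, state.end, state.end)
        if new not in chart[state.end]:
            chart[state.end].append(new)
```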
THE EARLEY ALGORITHM – SCANNER OPERATION
 When a state has a part-of-speech category to the right of the dot,
SCANNER is called to examine the input and incorporate a state
corresponding to the prediction of a word with a particular part-of-
speech into the chart.
 This is accomplished by creating a new state from the input state
with the dot advanced over the predicted input category.
 The new state is then added to the chart entry that follows the one
currently being processed.
 For example, when the state VP → • Verb NP, [0, 0] is processed, SCANNER consults the current word in the input, since the category following the dot is a part-of-speech.
 It then notes that book can be a verb, matching the expectation in the current state. This results in the creation of the new state Verb → book •, [0, 1] (see the sketch below).
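A corresponding sketch of SCANNER, assuming a lexicon dict mapping each word to the set of its possible parts of speech (illustrative, not from the slides):

```python
def scanner(state, words, lexicon, chart):
    # If the next input word can have the predicted part of speech,
    # add a completed POS state to the NEXT chart entry.
    pos = state.next_symbol()
    if state.end < len(words) and pos in lexicon.get(words[state.end], set()):
        new = State(pos, (words[state.end],), 1, state.end, state.end + 1)
        if new not in chart[state.end + 1]:
            chart[state.end + 1].append(new)
```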
THE EARLEY ALGORITHM – COMPLETER OPERATION
 COMPLETER is applied to a state when its dot has reached the right end of
the rule.
 The presence of such a state represents the fact that the parser has
successfully discovered a particular grammatical category over some span
of the input.
 The purpose of COMPLETER is to find, and advance, all previously
created states that were looking for this grammatical category at this
position in the input.
 New states are then created by copying the older state, advancing the dot
over the expected category, and installing the new state in the current chart
entry.
 When the state NP → Det Nominal •, [1, 3] is processed, COMPLETER looks for incomplete states ending at position 1 and expecting an NP.
 It finds the states VP → Verb • NP, [0, 1] and VP → Verb • NP PP, [0, 1]. This results in the addition of the new complete state VP → Verb NP •, [0, 3] and the new incomplete state VP → Verb NP • PP, [0, 3] to the chart (see the sketch below).
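A sketch of COMPLETER in the same style, again assuming the State class from the earlier sketch:

```python
def completer(state, chart):
    # Advance every earlier state that was waiting, at state.start,
    # for the category this completed state has just found.
    for old in list(chart[state.start]):
        if not old.is_complete() and old.next_symbol() == state.lhs:
            new = State(old.lhs, old.rhs, old.dot + 1, old.start, state.end)
            if new not in chart[state.end]:
                chart[state.end].append(new)
```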
THE EARLEY ALGORITHM - EXAMPLE
 Using the Earley algorithm, parse the sentence “Book that flight”. Consider the following phrase structure grammar:

Grammar rules:
S → NP VP
S → VP
S → Aux NP VP
NP → Pronoun
NP → Proper-Noun
NP → Det Nominal
Nominal → Noun
Nominal → Nominal Noun
Nominal → Nominal PP
VP → Verb
VP → Verb NP
VP → Verb NP PP
VP → Verb PP
VP → VP PP
PP → Preposition NP

Lexicon:
Det → that | this | a
Noun → book | flight | meal | money
Verb → book | include | prefer
Pronoun → I | she | me
Proper-Noun → Houston | TWA
Aux → does
Preposition → from | to | on | near | through
THE EARLEY ALGORITHM - EXAMPLE
Chart 0
No.  State                  Begin  End  Operation
S0   γ → • S                0      0    Dummy start state
S1   S → • NP VP            0      0    Predictor from S0
S2   S → • VP               0      0    Predictor from S0
S3   S → • Aux NP VP        0      0    Predictor from S0
S4   NP → • Pronoun         0      0    Predictor from S1
S5   NP → • Proper-Noun     0      0    Predictor from S1
S6   NP → • Det Nominal     0      0    Predictor from S1
S7   VP → • Verb            0      0    Predictor from S2
S8   VP → • Verb NP         0      0    Predictor from S2
S9   VP → • Verb NP PP      0      0    Predictor from S2
S10  VP → • Verb PP         0      0    Predictor from S2
S11  VP → • VP PP           0      0    Predictor from S2
THE EARLEY ALGORITHM - EXAMPLE
Chart 1
No.  State                  Begin  End  Operation
S12  Verb → book •          0      1    Scanner from S7
S13  VP → Verb •            0      1    Completer from S12
S14  VP → Verb • NP         0      1    Completer from S12
S15  VP → Verb • NP PP      0      1    Completer from S12
S16  VP → Verb • PP         0      1    Completer from S12
S17  S → VP •               0      1    Completer from S13
S18  VP → VP • PP           0      1    Completer from S13
S19  NP → • Pronoun         1      1    Predictor from S14
S20  NP → • Proper-Noun     1      1    Predictor from S14
S21  NP → • Det Nominal     1      1    Predictor from S14
S22  PP → • Preposition NP  1      1    Predictor from S16
THE EARLEY ALGORITHM - EXAMPLE
Chart 2
No.  State                     Begin  End  Operation
S23  Det → that •              1      2    Scanner from S21
S24  NP → Det • Nominal        1      2    Completer from S23
S25  Nominal → • Noun          2      2    Predictor from S24
S26  Nominal → • Nominal Noun  2      2    Predictor from S24
S27  Nominal → • Nominal PP    2      2    Predictor from S24
THE EARLEY ALGORITHM - EXAMPLE
Chart 3
No.  State                     Begin  End  Operation
S28  Noun → flight •           2      3    Scanner from S25
S29  Nominal → Noun •          2      3    Completer from S28
S30  NP → Det Nominal •        1      3    Completer from S23, S29
S31  Nominal → Nominal • Noun  2      3    Completer from S29
S32  Nominal → Nominal • PP    2      3    Completer from S29
S33  VP → Verb NP •            0      3    Completer from S12, S30
S34  VP → Verb NP • PP         0      3    Completer from S12, S30
S35  PP → • Preposition NP     3      3    Predictor from S32
S36  S → VP •                  0      3    Completer from S33
S37  VP → VP • PP              0      3    Completer from S33
THE EARLEY ALGORITHM - EXAMPLE
The completed parse tree, with the chart state that produced each node:

[S (S36)
  [VP (S33)
    [Verb (S12) Book]
    [NP (S30)
      [Det (S23) that]
      [Nominal (S29)
        [Noun (S28) flight]]]]]
THE EARLEY ALGORITHM – EXAMPLE 2
 Generate two parse trees for the sentence “Pluck the flower with the stick” using the Earley chart parsing algorithm. Consider the following grammar:
S → NP VP
S → VP
NP → Det Noun
NP → NP PP
NP → Noun
VP → VP NP
VP → Verb
VP → VP PP
PP → Preposition NP
Noun → flower | stick | boy
Preposition → with | in
Det → a | an | the
Verb → pluck | see | eat
THE EARLEY ALGORITHM – EXAMPLE 2
Chart[0]
No.  State                  Begin  End  Operation
S0   γ → • S                0      0    Dummy start state
S1   S → • NP VP            0      0    Predictor from S0
S2   S → • VP               0      0    Predictor from S0
S3   NP → • Det Noun        0      0    Predictor from S1
S4   NP → • NP PP           0      0    Predictor from S1
S5   NP → • Noun            0      0    Predictor from S1
S6   VP → • VP NP           0      0    Predictor from S2
S7   VP → • Verb            0      0    Predictor from S2
S8   VP → • VP PP           0      0    Predictor from S2
THE EARLEY ALGORITHM – EXAMPLE 2
Chart[1]
No.  State                  Begin  End  Operation
S9   Verb → pluck •         0      1    Scanner from S7
S10  VP → Verb •            0      1    Completer from S9
S11  S → VP •               0      1    Completer from S10
S12  VP → VP • NP           0      1    Completer from S10
S13  VP → VP • PP           0      1    Completer from S10
S14  NP → • Det Noun        1      1    Predictor from S12
S15  NP → • NP PP           1      1    Predictor from S12
S16  NP → • Noun            1      1    Predictor from S12
S17  PP → • Preposition NP  1      1    Predictor from S13
THE EARLEY ALGORITHM – EXAMPLE 2
Chart[2]
No.  State                  Begin  End  Operation
S18  Det → the •            1      2    Scanner from S14
S19  NP → Det • Noun        1      2    Completer from S18
THE EARLEY ALGORITHM – EXAMPLE 2
Chart[3]
No.  State                  Begin  End  Operation
S20  Noun → flower •        2      3    Scanner from S19
S21  NP → Det Noun •        1      3    Completer from S18, S20
S22  VP → VP NP •           0      3    Completer from S10, S21
S23  NP → NP • PP           1      3    Completer from S21
S24  S → VP •               0      3    Completer from S22
S25  VP → VP • NP           0      3    Completer from S22
S26  VP → VP • PP           0      3    Completer from S22
S27  PP → • Preposition NP  3      3    Predictor from S23
S28  NP → • Det Noun        3      3    Predictor from S25
S29  NP → • NP PP           3      3    Predictor from S25
S30  NP → • Noun            3      3    Predictor from S25
THE EARLEY ALGORITHM – EXAMPLE 2
Chart[4]
No.  State                  Begin  End  Operation
S31  Preposition → with •   3      4    Scanner from S27
S32  PP → Preposition • NP  3      4    Completer from S31
S33  NP → • Det Noun        4      4    Predictor from S32
S34  NP → • NP PP           4      4    Predictor from S32
S35  NP → • Noun            4      4    Predictor from S32
THE EARLEY ALGORITHM – EXAMPLE 2
Chart[5]
No.  State                  Begin  End  Operation
S36  Det → the •            4      5    Scanner from S33
S37  NP → Det • Noun        4      5    Completer from S36
THE EARLEY ALGORITHM – EXAMPLE 2
Chart[6]
No.  State                  Begin  End  Operation
S38  Noun → stick •         5      6    Scanner from S37
S39  NP → Det Noun •        4      6    Completer from S36, S38
S40  PP → Preposition NP •  3      6    Completer from S31, S39
S41  NP → NP • PP           4      6    Completer from S39
S42  NP → NP PP •           1      6    Completer from S21, S40
S43  VP → VP PP •           0      6    Completer from S22, S40
S44  PP → • Preposition NP  6      6    Predictor from S41
S45  VP → VP NP •           0      6    Completer from S10, S42
S46  NP → NP • PP           1      6    Completer from S42
S47  S → VP •               0      6    Completer from S43
S48  VP → VP • NP           0      6    Completer from S43
S49  VP → VP • PP           0      6    Completer from S43
THE EARLEY ALGORITHM – EXAMPLE 2
Chart[6] (contd.)
No.  State                  Begin  End  Operation
S50  S → VP •               0      6    Completer from S45
S51  VP → VP • NP           0      6    Completer from S45
S52  VP → VP • PP           0      6    Completer from S45
S53  NP → • Det Noun        6      6    Predictor from S46
S54  NP → • NP PP           6      6    Predictor from S46
S55  NP → • Noun            6      6    Predictor from S46
THE EARLEY ALGORITHM – EXAMPLE 2
Parse 1 (the PP “with the stick” attaches to the VP):

[S (S47)
  [VP (S43)
    [VP (S22)
      [VP (S10) [Verb (S9) Pluck]]
      [NP (S21) [Det (S18) the] [Noun (S20) flower]]]
    [PP (S40)
      [Prep (S31) with]
      [NP (S39) [Det (S36) the] [Noun (S38) stick]]]]]
THE EARLEY ALGORITHM – EXAMPLE 2
Parse 2 (the PP “with the stick” attaches to the NP):

[S (S50)
  [VP (S45)
    [VP (S10) [Verb (S9) Pluck]]
    [NP (S42)
      [NP (S21) [Det (S18) the] [Noun (S20) flower]]
      [PP (S40)
        [Prep (S31) with]
        [NP (S39) [Det (S36) the] [Noun (S38) stick]]]]]]
STATISTICAL PARSING
 Sentences tend, on average, to be highly syntactically ambiguous, due to problems like coordination ambiguity and attachment ambiguity.
 Dynamic programming parsing algorithms can represent these ambiguities efficiently, but they are not equipped to resolve them.
 A probabilistic parser offers a solution to the problem: compute the probability of each interpretation, and choose the most probable interpretation.
 The most commonly used probabilistic grammar is the probabilistic context-free grammar (PCFG), a probabilistic augmentation of context-free grammars in which each rule is associated with a probability.
PROBABILISTIC CONTEXT FREE GRAMMAR (PCFG)
 The simplest augmentation of the context-free grammar is the Probabilistic Context-Free Grammar (PCFG), also known as the Stochastic Context-Free Grammar (SCFG), first proposed by Booth (1969).
 A probabilistic context-free grammar augments each rule in R with a conditional probability. A PCFG is thus defined by the following components:
 N: a set of non-terminal symbols (or variables)
 Σ: a set of terminal symbols (disjoint from N)
 R: a set of rules or productions, each of the form A → β [p], where A is a non-terminal, β is a string of symbols from the infinite set of strings (Σ ∪ N)*, and p is a number between 0 and 1 expressing P(β|A)
 S: a designated start symbol
 p is the conditional probability of a given expansion β given the left-hand-side (LHS) non-terminal A. We can represent this probability as P(A → β) or as P(RHS|LHS).
 Thus, if we consider all the possible expansions of a non-terminal, the sum of their probabilities must be 1 (see the sketch below):

∑β P(A → β) = 1

 A PCFG is said to be consistent if the sum of the probabilities of all sentences in the language equals 1.
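A small Python sketch of the per-non-terminal constraint (the dict layout and the toy rules are illustrative assumptions):

```python
# A toy PCFG: each non-terminal maps to its expansions with probabilities
pcfg = {
    "S":  [(("NP", "VP"), 0.8), (("VP",), 0.2)],
    "NP": [(("Det", "Noun"), 0.6), (("Noun",), 0.4)],
}

# For every non-terminal A, the probabilities of its expansions must sum to 1
for lhs, expansions in pcfg.items():
    total = sum(p for _, p in expansions)
    assert abs(total - 1.0) < 1e-9, f"{lhs} expansions sum to {total}"
```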
PCFGS FOR DISAMBIGUATION
 A PCFG assigns a probability to each parse tree T (i.e., each derivation) of a sentence S. This attribute is useful in disambiguation.
 The probability of a particular parse T is defined as the product of the probabilities of all the n rules used to expand each of the n non-terminal nodes in the parse tree T (where each rule i can be expressed as LHSi → RHSi):

P(T) = ∏i=1..n P(RHSi | LHSi)

 For example, consider the two parses of the sentence “Book the dinner flights” shown on the next slide.
 The sensible parse on the left means “Book flights that serve dinner”.
 The nonsensical parse on the right, however, would have to mean something like “Book flights on behalf of ‘the dinner’”.
PCFGS FOR DISAMBIGUATION CONTD…
[Figure: the two parse trees for “Book the dinner flights”, each rule annotated with its probability]
 The probability of each of the trees can be computed by multiplying together the probabilities of each of the rules used in the derivation.
 For example, the probabilities of the left tree (call it Tleft) and the right tree (Tright) can be computed as follows:
 P(Tleft) = .05 × .20 × .20 × .20 × .75 × .30 × .60 × .10 × .40 = 2.2 × 10⁻⁶
 P(Tright) = .05 × .10 × .20 × .15 × .75 × .75 × .30 × .60 × .10 × .40 = 6.1 × 10⁻⁷
 We can see that the left (transitive) tree has a much higher probability than the tree on the right.
 Thus this parse would correctly be chosen by a disambiguation algorithm which selects the parse with the highest PCFG probability (see the check below).
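A quick check of these products in Python (the probability lists are read directly off the two derivations above):

```python
import math

left  = [.05, .20, .20, .20, .75, .30, .60, .10, .40]
right = [.05, .10, .20, .15, .75, .75, .30, .60, .10, .40]

print(f"P(T_left)  = {math.prod(left):.3e}")   # ≈ 2.2e-06
print(f"P(T_right) = {math.prod(right):.3e}")  # ≈ 6.1e-07
```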
 The string of words S is called the yield of any parse tree over S.
 Thus, out of all parse trees with a yield of S, the disambiguation algorithm picks the parse tree which is most probable given S:

T̂(S) = argmax P(T), over all parse trees T such that yield(T) = S
LEARNING PCFG RULE PROBABILITIES
 There are two ways to learn probabilities for the rules of a grammar.
 The simplest way is to use a treebank, a corpus of already-parsed sentences.
 The commonly used Penn Treebank (Marcus et al., 1993) is a collection of parse trees in English, Chinese, and other languages distributed by the Linguistic Data Consortium.
 Given a treebank, the probability of each expansion of a non-terminal can be computed by counting the number of times that expansion occurs and then normalizing (see the sketch below):

P(A → β | A) = Count(A → β) / ∑γ Count(A → γ)
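A minimal sketch of this relative-frequency estimate over a toy list of treebank productions (the data here is illustrative, not from a real treebank):

```python
from collections import Counter

# (lhs, rhs) productions read off a toy treebank
productions = [
    ("S", ("NP", "VP")), ("NP", ("Det", "Noun")), ("VP", ("Verb", "NP")),
    ("S", ("NP", "VP")), ("NP", ("Pronoun",)),    ("VP", ("Verb",)),
]

rule_count = Counter(productions)
lhs_count = Counter(lhs for lhs, _ in productions)

# P(A -> beta | A) = Count(A -> beta) / sum over gamma of Count(A -> gamma)
prob = {rule: c / lhs_count[rule[0]] for rule, c in rule_count.items()}
print(prob[("NP", ("Det", "Noun"))])  # 0.5
```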
LEARNING PCFG RULE PROBABILITIES CONTD…
 If we don't have a treebank, but we do have a (non-probabilistic) parser, we can generate the counts we need for computing PCFG rule probabilities by first parsing a corpus of sentences with the parser.
 This is done by parsing the corpus, incrementing a counter for every rule used in each parse, and then normalizing to get probabilities.
PROBABILISTIC CKY ALGORITHM
 The algorithms for computing the most likely parse are simple extensions of the standard algorithms for parsing;
 there are probabilistic versions of both the CKY and Earley algorithms.
 Most modern probabilistic parsers are based on the probabilistic CKY (Cocke-Kasami-Younger) algorithm, first described by Ney (1991).
 In the probabilistic CKY algorithm, for a sentence of length n and a grammar with V non-terminals, each cell table[i, j, A] contains the probability of a constituent A that spans positions i through j of the input.
PROBABILISTIC CKY ALGORITHM
 The core of the chart filling (see the sketch below):

for all { A | A → B C ∈ grammar, and table[i, k, B] > 0 and table[k, j, C] > 0 }:
    table[i, j, A] ← P(A → B C) × table[i, k, B] × table[k, j, C]
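A compact Python sketch of this recurrence, keeping the best probability per constituent and span; the dict layouts lexical and binary are illustrative assumptions, and back-pointers for recovering the tree are omitted:

```python
from collections import defaultdict

def prob_cky(words, lexical, binary):
    # lexical: {word: [(A, P(A -> word)), ...]}
    # binary:  {(B, C): [(A, P(A -> B C)), ...]}
    n = len(words)
    table = defaultdict(dict)  # table[(i, j)] = {A: best probability}
    for j in range(1, n + 1):
        # Diagonal: lexical rules for the word spanning j-1 .. j
        for a, p in lexical.get(words[j - 1], []):
            cell = table[(j - 1, j)]
            cell[a] = max(cell.get(a, 0.0), p)
        # Wider spans ending at j, combining two smaller spans at split k
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for b, pb in table[(i, k)].items():
                    for c, pc in table[(k, j)].items():
                        for a, pr in binary.get((b, c), []):
                            cand = pr * pb * pc
                            if cand > table[(i, j)].get(a, 0.0):
                                table[(i, j)][a] = cand
    return table
```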
PROBABILISTIC CKY ALGORITHM
 Like the CKY algorithm, the probabilistic CKY algorithm also requires the probabilistic CFG to be in CNF.
 The rules for converting a PCFG to CNF are the same, but the probabilities need to be adjusted.
 Rules that mix terminals and non-terminals can be converted into CNF by simply introducing a new dummy non-terminal that covers only the original terminal. The probability of the original rule stays the same, and the probability of the dummy non-terminal rule is 1.
 For example, a rule for an infinitive verb phrase such as INF-VP → to VP [0.2] would be replaced by the two rules:
INF-VP → TO VP [0.2]
TO → to [1.0]
PROBABILISTIC CKY ALGORITHM
 Unit productions are eliminated by rewriting the right-hand side of the original rules with the right-hand side of all the non-unit production rules that they ultimately lead to. The probability of the new rule is the product of the probability of the unit production and the probability of the rule that leads to the terminal.
 If A → B [0.1] is a unit production and B → u [0.2] is a non-unit production, then we add the rule A → u [0.02] to the grammar.
 Rules with right-hand sides longer than 2 are remedied through the introduction of new non-terminals that spread the longer sequences over several new productions. The probability of the original rule remains the same, and the probability of the dummy non-terminal rules is set to 1 (see the sketch below).
 Formally, if we have a rule like A → B C γ [0.18], we introduce the following rules:
X1 → B C [1.0]
A → X1 γ [0.18]
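A small sketch of this binarization step with the probability bookkeeping (the function and the generated non-terminal names are illustrative):

```python
def binarize(lhs, rhs, prob, counter=[0]):
    # Split an n-ary rule into binary rules; each new dummy non-terminal
    # gets probability 1.0, so the original rule keeps its probability.
    # (counter is a tiny hack to generate fresh names in this sketch.)
    rules = []
    rhs = tuple(rhs)
    while len(rhs) > 2:
        counter[0] += 1
        new_nt = f"X{counter[0]}"
        rules.append((new_nt, rhs[:2], 1.0))   # e.g. X1 -> B C [1.0]
        rhs = (new_nt,) + rhs[2:]
    rules.append((lhs, rhs, prob))             # e.g. A -> X1 gamma [0.18]
    return rules

print(binarize("A", ("B", "C", "gamma"), 0.18))
# [('X1', ('B', 'C'), 1.0), ('A', ('X1', 'gamma'), 0.18)]
```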
PROBABILISTIC CKY ALGORITHM
 Using the probabilistic CKY algorithm, find the most probable parse for the sentence “astronomers saw stars with ears”. Use the following PCFG:
S → NP VP [1]
NP → NP PP [0.4]
NP → Noun [0.6]
VP → V NP [0.7]
VP → VP PP [0.3]
PP → P NP [1]
Noun → astronomers [0.17]
Noun → telescope [0.17]
Noun → ears [0.3]
Noun → stars [0.3]
Noun → saw [0.06]
V → saw [1]
P → with [1]
PROBABILISTIC CKY ALGORITHM
 Converting the PCFG to CNF: the unit production NP → Noun is eliminated by collapsing it with the lexical rules, e.g. P(NP → astronomers) = 0.6 × 0.17 = 0.102.
S → NP VP [1]
NP → NP PP [0.4]
NP → astronomers [0.102]
NP → telescope [0.102]
NP → ears [0.18]
NP → stars [0.18]
NP → saw [0.036]
VP → V NP [0.7]
VP → VP PP [0.3]
PP → P NP [1]
V → saw [1]
P → with [1]
Noun → astronomers [0.17]
Noun → telescope [0.17]
Noun → ears [0.3]
Noun → stars [0.3]
Noun → saw [0.06]

CKY PARSING EXAMPLE CONTD…
0 astronomers 1 saw 2 stars 3 with 4 ears 5

Chart cells (each entry: label, constituent [probability], back-pointers):
[0,1]  A1: NP [0.102]; A2: Noun [0.17]
[1,2]  B1: NP [0.036]; B2: V [1]
[2,3]  C1: NP [0.18]; C2: Noun [0.3]
[3,4]  D1: P [1]
[4,5]  E1: NP [0.18]
[0,2]  empty
[1,3]  F1: VP [0.126] (B2, C1)
[2,4]  empty
[3,5]  G1: PP [0.18] (D1, E1)
[0,3]  H1: S [0.012852] (A1, F1)
[1,4]  empty
[2,5]  K1: NP [0.01296] (C1, G1)
[0,4]  empty
[1,5]  J1: VP [0.009072] (B2, K1); J2: VP [0.006804] (F1, G1)
[0,5]  I1: S [0.000925344] (A1, J1); I2: S [0.000694008] (A1, J2)

The most probable parse is therefore I1, with probability 0.000925344: S → NP (astronomers) VP, where VP → V (saw) NP and NP → NP (stars) PP (with ears), i.e. the PP “with ears” attaches to the NP “stars”.
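Running the prob_cky sketch from earlier on this CNF grammar reproduces the chart's best probabilities (the dict encodings below are illustrative):

```python
lexical = {
    "astronomers": [("NP", 0.102), ("Noun", 0.17)],
    "saw":         [("NP", 0.036), ("V", 1.0), ("Noun", 0.06)],
    "stars":       [("NP", 0.18), ("Noun", 0.3)],
    "with":        [("P", 1.0)],
    "ears":        [("NP", 0.18), ("Noun", 0.3)],
}
binary = {
    ("NP", "VP"): [("S", 1.0)],
    ("NP", "PP"): [("NP", 0.4)],
    ("V", "NP"):  [("VP", 0.7)],
    ("VP", "PP"): [("VP", 0.3)],
    ("P", "NP"):  [("PP", 1.0)],
}
table = prob_cky("astronomers saw stars with ears".split(), lexical, binary)
print(table[(0, 5)]["S"])  # 0.000925344 -- the most probable parse
```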
