Probabilistic Context-Free Grammars
Raphael Hoffmann
590AI, Winter 2009
Outline
PCFGs: Inference and Learning
Parsing English
Discriminative CFGs
Grammar Induction
The velocity of the seismic waves rises to …
Nonterminals N^1, N^2, ..., N^n
Start symbol N^1
Rules N^i → ζ^j,
  where ζ^j is a sequence of terminals and nonterminals
N^i ⇒* ζ : repeated derivation from N^i gives ζ

P(w_1n) = Σ_t P(w_1n, t), summing over all parse trees t
Terminals: with, saw, astronomers, ears, stars, telescopes
Nonterminals: S, PP, P, NP, VP, V
Start symbol: S
Independence assumptions:
1. Place invariance: P(N^j_k(k+c) → ζ) is the same for all k
2. Context-free: P(N^j_kl → ζ | words outside w_k ... w_l) = P(N^j_kl → ζ)
3. Ancestor-free: P(N^j_kl → ζ | ancestor nodes of N^j_kl) = P(N^j_kl → ζ)
What is the probability of a sentence?
  P(w_1m | G)
What is the most likely parse of a sentence?
  arg max_t P(t | w_1m, G)
What rule probabilities maximize the probabilities of sentences?
  Find G that maximizes P(w_1m | G)
high probability in HMM, low probability in PCFG
Probabilistic Regular Grammar
Rules N^i → w^j N^k
      N^i → w^j
Slide based on Foundations of Statistical Natural Language Processing by Christopher Manning and Hinrich Schütze
HMMs and PCFGs
For PCFGs we have
  Outside  α_j(p, q) = P(w_1(p−1), N^j_pq, w_(q+1)m | G)
  Inside   β_j(p, q) = P(w_pq | N^j_pq, G)
  P(w_1m | G) = Σ_j α_j(k, k) P(N^j → w_k)
Inside Probabilities
β_j(p, q) = P(w_pq | N^j_pq, G)
Base case: β_j(k, k) = P(w_kk | N^j_kk, G) = P(N^j → w_k | G)
Induction
Want to find β_j(p, q) for p < q.
Since we assume Chomsky Normal Form, the first rule must be of the form N^j → N^r N^s.
So we can divide the sentence in two at various places and sum the result:

β_j(p, q) = Σ_{r,s} Σ_{d=p}^{q−1} P(N^j → N^r N^s) β_r(p, d) β_s(d+1, q)
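The induction above can be sketched as a CYK-style dynamic program. The grammar encoding below is hypothetical; the rule probabilities are assumed from the Manning & Schütze textbook example these slides use, and are consistent with the chart values on the next slide.

```python
from collections import defaultdict

# Hypothetical encoding of the slides' example grammar; probabilities are
# assumed from the Manning & Schütze textbook example (not given in the text).
LEXICAL = {                       # P(N^j -> w_k)
    ("NP", "astronomers"): 0.1,
    ("NP", "saw"): 0.04,
    ("NP", "stars"): 0.18,
    ("NP", "ears"): 0.18,
    ("NP", "telescopes"): 0.1,
    ("V", "saw"): 1.0,
    ("P", "with"): 1.0,
}
BINARY = {                        # P(N^j -> N^r N^s), Chomsky Normal Form
    ("S", "NP", "VP"): 1.0,
    ("PP", "P", "NP"): 1.0,
    ("VP", "V", "NP"): 0.7,
    ("VP", "VP", "PP"): 0.3,
    ("NP", "NP", "PP"): 0.4,
}

def inside(words):
    """Chart of inside probabilities beta[(j, p, q)] = P(w_pq | N^j_pq, G)."""
    m = len(words)
    beta = defaultdict(float)
    # Base case: beta_j(k, k) = P(N^j -> w_k)
    for k, w in enumerate(words, start=1):
        for (j, word), prob in LEXICAL.items():
            if word == w:
                beta[(j, k, k)] = prob
    # Induction: sum over rules N^j -> N^r N^s and split points d
    for span in range(2, m + 1):
        for p in range(1, m - span + 2):
            q = p + span - 1
            for (j, r, s), prob in BINARY.items():
                for d in range(p, q):
                    beta[(j, p, q)] += prob * beta[(r, p, d)] * beta[(s, d + 1, q)]
    return beta
```

With this grammar, `inside("astronomers saw stars with ears".split())` reproduces the chart on the next slide, e.g. β_VP(2,5) = 0.015876.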
Inside chart for "astronomers saw stars with ears":
  β_NP(1,1) = 0.1   β_NP(2,2) = 0.04, β_V(2,2) = 1.0   β_NP(3,3) = 0.18   β_P(4,4) = 1.0   β_NP(5,5) = 0.18
  β_VP(2,3) = 0.126   β_PP(4,5) = 0.18
  β_S(1,3) = 0.0126   β_NP(3,5) = 0.01296
  β_VP(2,5) = 0.015876
  β_S(1,5) = 0.0015876
Outside Probabilities
Base case: α_1(1, m) = 1, and α_j(1, m) = 0 for j ≠ 1
Induction
α_j(p, q) = Σ_{f,g} Σ_{e=q+1}^{m} α_f(p, e) P(N^f → N^j N^g) β_g(q+1, e)
          + Σ_{f,g} Σ_{e=1}^{p−1} α_f(e, q) P(N^f → N^g N^j) β_g(e, p−1)
Therefore,
P(w_1m, N_pq | G) = Σ_j α_j(p, q) β_j(p, q)
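The outside recursion can be sketched the same way and checked against this identity at the preterminal spans, where some constituent must exist. The grammar encoding and rule probabilities are assumptions matching the Manning & Schütze example used on these slides.

```python
from collections import defaultdict

# Assumed grammar (Manning & Schütze example); not given explicitly in the slides.
LEXICAL = {("NP", "astronomers"): 0.1, ("NP", "saw"): 0.04, ("NP", "stars"): 0.18,
           ("NP", "ears"): 0.18, ("NP", "telescopes"): 0.1,
           ("V", "saw"): 1.0, ("P", "with"): 1.0}
BINARY = {("S", "NP", "VP"): 1.0, ("PP", "P", "NP"): 1.0,
          ("VP", "V", "NP"): 0.7, ("VP", "VP", "PP"): 0.3, ("NP", "NP", "PP"): 0.4}

def inside(words):
    """beta[(j, p, q)] = P(w_pq | N^j_pq, G), bottom-up."""
    m = len(words)
    beta = defaultdict(float)
    for k, w in enumerate(words, start=1):
        for (j, word), prob in LEXICAL.items():
            if word == w:
                beta[(j, k, k)] = prob
    for span in range(2, m + 1):
        for p in range(1, m - span + 2):
            q = p + span - 1
            for (j, r, s), prob in BINARY.items():
                for d in range(p, q):
                    beta[(j, p, q)] += prob * beta[(r, p, d)] * beta[(s, d + 1, q)]
    return beta

def outside(words, beta, start="S"):
    """alpha[(j, p, q)] = P(w_1(p-1), N^j_pq, w_(q+1)m | G), top-down."""
    m = len(words)
    alpha = defaultdict(float)
    alpha[(start, 1, m)] = 1.0          # base case; alpha_j(1, m) = 0 for j != start
    for span in range(m - 1, 0, -1):    # shorter spans depend on longer ones
        for p in range(1, m - span + 2):
            q = p + span - 1
            for (f, j, g), prob in BINARY.items():
                # N^j as left child of N^f -> N^j N^g, right sibling spans q+1..e
                for e in range(q + 1, m + 1):
                    alpha[(j, p, q)] += alpha[(f, p, e)] * prob * beta[(g, q + 1, e)]
            for (f, g, j), prob in BINARY.items():
                # N^j as right child of N^f -> N^g N^j, left sibling spans e..p-1
                for e in range(1, p):
                    alpha[(j, p, q)] += alpha[(f, e, q)] * prob * beta[(g, e, p - 1)]
    return alpha
```

At every preterminal span (k, k), Σ_j α_j(k, k) β_j(k, k) equals the sentence probability β_S(1, m), matching the earlier formula P(w_1m | G) = Σ_j α_j(k, k) P(N^j → w_k).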
Just in the cases of the root node and the preterminals, we know there will be some such constituent.
Training
If we have parsed data, count:
  P(N^j → ζ) = C(N^j → ζ) / Σ_ζ C(N^j → ζ)
else use EM (Inside-Outside Algorithm):
  repeat
    compute α's and β's
    compute
      P(N^j → N^r N^s) = ...
      P(N^j → w^k) = ...
  end
(two really long formulas with α's and β's)
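The supervised case is just relative-frequency estimation over rules sharing a left-hand side. A minimal sketch, with a hypothetical two-tree toy treebank (trees as nested tuples):

```python
from collections import Counter, defaultdict

# Hypothetical toy treebank: each tree is a (label, child, child, ...) tuple;
# leaf children are terminal strings.
trees = [
    ("S", ("NP", "astronomers"), ("VP", ("V", "saw"), ("NP", "stars"))),
    ("S", ("NP", "stars"), ("VP", ("V", "saw"), ("NP", "ears"))),
]

def count_rules(tree, counts):
    """Accumulate C(N^j -> zeta) for every rule used in the tree."""
    label, *children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[(label, rhs)] += 1
    for c in children:
        if isinstance(c, tuple):
            count_rules(c, counts)

counts = Counter()
for t in trees:
    count_rules(t, counts)

# P(N^j -> zeta) = C(N^j -> zeta) / sum over zeta' of C(N^j -> zeta')
totals = defaultdict(int)
for (lhs, rhs), c in counts.items():
    totals[lhs] += c
probs = {(lhs, rhs): c / totals[lhs] for (lhs, rhs), c in counts.items()}
```

By construction the estimates for each left-hand side sum to 1, as the PCFG definition requires.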
Often sufficient is a discriminative model P(y | w).
Easier, because it does not contain P(w).
An HMM cannot model dependent features, so one picks only one feature: the word's identity.
Generative and Discriminative Models
[Diagram: Naïve Bayes → HMMs (sequence) → generative directed models (general graphs)]
Slide based on An Introduction to Conditional Random Fields for Relational Learning by Charles Sutton and Andrew McCallum
Generative and Discriminative Models
[Diagram: Logistic Regression → Linear-chain CRFs (sequence) → ? (tree) → General CRFs (general graphs), the discriminative counterparts]

Discriminative Context-Free Grammars
Terminals w^1, w^2, ..., w^V
Nonterminals N^1, N^2, ..., N^n
Start symbol N^1
Rules N^i → ζ^j,
  where ζ^j is a sequence of terminals and nonterminals
Rule scores
S(N^i → ζ^j, p, q) = Σ_{k=1}^{F} λ_k(N^i → ζ^j) f_k(w_1 w_2 ... w_m, p, q, N^i → ζ^j)
Features can depend on all tokens + span.
Consider feature AllOnTheSameLine:
  "Mavis Wood / Products" (split across lines) vs. "Mavis Wood Products"
[compare to linear CRF: f_k(s_t, s_{t−1}, w_1 w_2 ... w_m, t)]
No independence between features necessary.
Can create features based on words, dictionaries, digits, capitalization, ...
Can still do efficient Viterbi inference in O(m³ r).
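One way to see the O(m³ r) claim: the same CYK dynamic program applies when rule probabilities are replaced by feature-based scores that add along the tree. Everything below (the feature, the weight, the lexical scores) is a hypothetical illustration, not the slides' actual model.

```python
import math

# Hypothetical rule set in Chomsky Normal Form.
RULES = [("S", "NP", "VP"), ("VP", "V", "NP"), ("NP", "NP", "PP"), ("PP", "P", "NP")]

def rule_score(rule, words, p, q):
    """S(rule, p, q) = sum_k lambda_k * f_k(words, p, q, rule); one toy feature here."""
    lhs, _, _ = rule
    f_np_two_words = 1.0 if (lhs == "NP" and q - p == 1) else 0.0  # hypothetical feature
    return 0.5 * f_np_two_words                                    # hypothetical weight

def viterbi_cky(words, lex_scores):
    """Best (max-score) labeled spans: m^2 spans x m split points x r rules."""
    m = len(words)
    best = {}                                  # (label, p, q) -> best additive score
    for k in range(1, m + 1):
        for label, s in lex_scores.get(words[k - 1], {}).items():
            best[(label, k, k)] = s
    for span in range(2, m + 1):
        for p in range(1, m - span + 2):
            q = p + span - 1
            for lhs, r, s in RULES:
                for d in range(p, q):
                    if (r, p, d) in best and (s, d + 1, q) in best:
                        sc = (best[(r, p, d)] + best[(s, d + 1, q)]
                              + rule_score((lhs, r, s), words, p, q))
                        if sc > best.get((lhs, p, q), -math.inf):
                            best[(lhs, p, q)] = sc
    return best
```

Replacing products of probabilities with sums of scores and Σ with max leaves the table shape, and hence the running time, unchanged.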
Slide based on Wikipedia
Empirical Problems
Even finite search spaces can be too big
Noise
Insufficient data
Many local optima
Delete, New Rule, Split, Substitute
Define a binary representation for G, code(D | G)
Accurate segmentation, but inaccurate structural learning
Slide based on Unsupervised Grammar Induction with Minimum Description Length by Roni Katzir
Prototype-Driven Grammar Induction
Semi-supervised approach
Give only a few dozen prototypical examples (for NP e.g. determiner noun, pronouns, ...)
On English Penn Treebank: F1 = 65.1 (52% reduction over naïve PCFG induction)
Aria Haghighi and Dan Klein. Prototype-Driven Grammar Induction. ACL 2006.
Dan Klein and Chris Manning. A Generative Constituent-Context Model for Improved Grammar Induction. ACL 2002.
That's it!