• Verbs can be
– transitive: they take a complement, as in:
eat an apple; read a book; sing a song
– intransitive: verbs that do not take complements, as in:
she laughed; he slept; I lied
• Some tagging distinctions are quite hard for both humans and machines.
• Preposition (IN), particle (RP), and adverb (RB) can have a large overlap, as with
the word ‘around’ in the following cases:
– Mrs./NNP Shaefer/NNP never/RB got/VBD around/RP to/TO joining/VBG
– All/DT we/PRP gotta/VBN do/VB is/VBZ go/VB around/IN the/DT corner/NN
– Chateau/NNP Petrus/NNP costs/VBZ around/RB 250/CD
Issues in tagging
• Particles can often either precede or follow a noun phrase, as in
the following examples:
– She told off/RP her friends
– She told her friends off/RP
• Prepositions, on the other hand, cannot follow their noun phrases:
– She stepped off/IN the train
– *She stepped the train off/IN
• Another difficulty is labeling words that can modify nouns.
– Cotton/NN sweater/NN
– Income-tax/JJ return/NN
• The Problem:

  Word     POS listing in Brown
  heat     noun, verb
  oil      noun
  in       prep, noun, adv
  a        det, noun, noun-proper
  large    adj, noun, adv
  pot      noun
  Possible tags for each word of "She promised to back the bill":

  She       PRP
  promised  VBD, VBN
  to        TO
  back      VB, JJ, NN, RB
  the       DT
  bill      NN, VB

  Intended sequence: PRP VBD TO VB DT NN
Etc… for the ~100,000 words of English with more than 1 tag
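To make the combinatorics concrete, here is a minimal Python sketch; the toy LEXICON below just restates the Brown listings from the table above and is illustrative, not a real tagger dictionary:

    from itertools import product

    # Possible Brown-corpus tags per word, restating the table above (toy data).
    LEXICON = {
        "heat":  ["noun", "verb"],
        "oil":   ["noun"],
        "in":    ["prep", "noun", "adv"],
        "a":     ["det", "noun", "noun-proper"],
        "large": ["adj", "noun", "adv"],
        "pot":   ["noun"],
    }

    words = "heat oil in a large pot".split()
    # Every way of choosing one tag per word: the cross-product of the tag lists.
    sequences = list(product(*(LEXICON[w] for w in words)))
    print(len(sequences))  # 2 * 1 * 3 * 3 * 3 * 1 = 54 candidate tag sequences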
• Let W = w1, w2, …, wn be the observed sequence of words.
• Goal: out of all sequences of tags t1…tn, find the most probable
sequence of POS tags T underlying the observed sequence of words
w1, w2, …, wn: T̂ = argmax_T P(T|W).
• The hat ^ means "our estimate of the best", i.e. the most probable tag
sequence.
• By Bayes' rule, P(T|W) = P(W|T)P(T)/P(W). We can drop the denominator
P(W): it does not change from one tag sequence to the next, since we are
looking for the best tag sequence for the same observation, the same
fixed set of words.
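For reference, the whole chain written out in LaTeX (this is the standard noisy-channel derivation; the last line adds the two usual independence assumptions, each word depending only on its own tag and each tag only on the previous tag, which is the bigram assumption used below):

    \begin{align*}
    \hat{T} &= \operatorname*{argmax}_{T} P(T \mid W) \\
            &= \operatorname*{argmax}_{T} \frac{P(W \mid T)\, P(T)}{P(W)} \\
            &= \operatorname*{argmax}_{T} P(W \mid T)\, P(T) \\
            &\approx \operatorname*{argmax}_{t_1 \ldots t_n}
               \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
    \end{align*}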
Bigram assumption
• Under the bigram assumption, the probability of a tag depends only on the
previous tag, and transition probabilities are estimated from corpus counts:
  P(ti | ti-1) = C(ti-1, ti) / C(ti-1)
• For example, P(NN|DT) = C(DT, NN) / C(DT), i.e. the number of times DT is
followed by NN, divided by the total number of times DT occurs.
• These counts are taken from the Brown corpus tagged with the 45-tag Penn
Treebank tagset. The probability of a common noun after a determiner is 0.49.
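A minimal sketch of how such counts turn into transition probabilities; the tiny hand-tagged corpus here is invented for illustration, while on the real 45-tag Brown data the same count ratio gives the 0.49 reported above:

    from collections import Counter

    # Each sentence is a list of (word, tag) pairs; this corpus is made up for the demo.
    tagged_sents = [
        [("the", "DT"), ("dog", "NN"), ("barked", "VBD")],
        [("a", "DT"), ("loud", "JJ"), ("dog", "NN")],
        [("the", "DT"), ("cat", "NN"), ("slept", "VBD")],
    ]

    bigram_counts = Counter()   # C(t_{i-1}, t_i)
    unigram_counts = Counter()  # C(t_{i-1})
    for sent in tagged_sents:
        tags = [tag for _, tag in sent]
        for prev, cur in zip(tags, tags[1:]):
            bigram_counts[(prev, cur)] += 1
            unigram_counts[prev] += 1

    def p_transition(cur, prev):
        """Maximum-likelihood estimate P(cur | prev) = C(prev, cur) / C(prev)."""
        return bigram_counts[(prev, cur)] / unigram_counts[prev]

    print(p_transition("NN", "DT"))  # 2/3 on this toy corpus; ~0.49 on 45-tag Brown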
Two kinds of probabilities
• Tag transition probabilities, e.g. P(VB|TO), and word likelihoods, e.g.
P(race|VB). A word likelihood answers a question such as: if we were
expecting a verb, how likely is it that the word would be race?
• P(NN|TO) = .00047
• P(VB|TO) = .83
• P(race|NN) = .00057
• P(race|VB) = .00012
• P(NR|VB) = .0027
• P(NR|NN) = .0012
• Multiplying the lexical likelihoods by the tag sequence probabilities, the
verb wins:
• P(VB|TO) P(NR|VB) P(race|VB) = .00000027
• P(NN|TO) P(NR|NN) P(race|NN) = .00000000032
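A few lines of Python just to redo the slide's arithmetic:

    # Transition probabilities and lexical likelihoods from the bullets above.
    p_vb = 0.83 * 0.0027 * 0.00012       # P(VB|TO) * P(NR|VB) * P(race|VB)
    p_nn = 0.00047 * 0.0012 * 0.00057    # P(NN|TO) * P(NR|NN) * P(race|NN)
    print(f"{p_vb:.2e}")   # ~2.69e-07
    print(f"{p_nn:.2e}")   # ~3.21e-10
    print(p_vb > p_nn)     # True: the verb reading wins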
• We could just enumerate all paths given the input and use the
model to assign probabilities to each.
– Not a good idea.
• If there are 30 or so tags in the Penn set, and the average sentence is
around 20 words, how many tag sequences do we have to enumerate to
argmax over? 30^20 ≈ 3.5 × 10^29.
– Luckily, dynamic programming can help us here; see the Viterbi sketch below.
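A minimal dictionary-based Viterbi sketch in Python, with no smoothing. The table names trans and emit and the toy demo values (everything except the four probabilities quoted above) are assumptions for illustration, not figures from the slides:

    def viterbi(words, tags, trans, emit, start="<s>"):
        """Find the argmax tag sequence in O(n * |tags|^2) time instead of |tags|^n.

        trans[(prev_tag, tag)] = P(tag | prev_tag); emit[(tag, word)] = P(word | tag).
        Missing table entries default to probability zero.
        """
        # V[i][t] = (probability of the best path ending in tag t at word i, backpointer)
        V = [{t: (trans.get((start, t), 0.0) * emit.get((t, words[0]), 0.0), None)
              for t in tags}]
        for i in range(1, len(words)):
            col = {}
            for t in tags:
                prev = max(tags, key=lambda p: V[i - 1][p][0] * trans.get((p, t), 0.0))
                score = V[i - 1][prev][0] * trans.get((prev, t), 0.0)
                col[t] = (score * emit.get((t, words[i]), 0.0), prev)
            V.append(col)
        # Backtrace from the best final tag.
        best = max(tags, key=lambda t: V[-1][t][0])
        path = [best]
        for i in range(len(words) - 1, 0, -1):
            path.append(V[i][path[-1]][1])
        return list(reversed(path))

    # Toy demo of the "race" example. The four probabilities quoted above are used;
    # the start transition and the 1.0 emissions are invented to keep the demo small.
    tags = ["TO", "VB", "NN", "NR"]
    trans = {("<s>", "TO"): 1.0, ("TO", "VB"): 0.83, ("TO", "NN"): 0.00047,
             ("VB", "NR"): 0.0027, ("NN", "NR"): 0.0012}
    emit = {("TO", "to"): 1.0, ("VB", "race"): 0.00012, ("NN", "race"): 0.00057,
            ("NR", "tomorrow"): 1.0}
    print(viterbi(["to", "race", "tomorrow"], tags, trans, emit))
    # ['TO', 'VB', 'NR'] -- the verb reading, as computed by hand above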
[Figure 5.13: The Markov chain corresponding to the hidden states of the HMM.
The A transition probabilities are used to compute the prior probabilities.]