
Computational Linguistics HT 2013: Practical 1

Candidate Number: 588 172

Introduction

In this practical, a part-of-speech tagger is implemented using a Hidden Markov
Model (HMM). This approach is a supervised learning algorithm, so a training data set
is needed before the test set can be tagged. Finally, the performance of the tagger is
evaluated by 10-fold cross-validation.
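
As a minimal sketch of the decoding step (in Python; the function and variable names are my own, and the probability tables are assumed to be precomputed from the training counts), first-order Viterbi decoding might look like:

    def viterbi(words, tags, start_p, trans_p, emit_p):
        """Most likely tag sequence for `words` under a first-order HMM.

        start_p[t] = P(t at sentence start), trans_p[t1][t2] = P(t2 | t1),
        emit_p[t][w] = P(w | t); all assumed precomputed from training counts.
        """
        # best[i][t] = probability of the best path ending in tag t at position i
        best = [{t: start_p[t] * emit_p[t].get(words[0], 0.0) for t in tags}]
        back = [{}]
        for i in range(1, len(words)):
            best.append({})
            back.append({})
            for t in tags:
                # pick the predecessor tag that maximises the path probability
                prev, p = max(((t0, best[i - 1][t0] * trans_p[t0][t]) for t0 in tags),
                              key=lambda x: x[1])
                best[i][t] = p * emit_p[t].get(words[i], 0.0)
                back[i][t] = prev
        last = max(best[-1], key=best[-1].get)  # best final tag
        path = [last]
        for i in range(len(words) - 1, 0, -1):  # follow the backpointers
            path.append(back[i][path[-1]])
        return list(reversed(path))

Note that emit_p[t].get(w, 0.0) returns zero for any word never seen with tag t in training, which is exactly the problem addressed in the next section.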

Handling unknown words

Since not all words are contained in the training set, the probability P(w_i | t_j) will be zero for
an unknown word w_i if no smoothing is employed. This gives us no information on how to
tag unknown words, so a smoothing technique is necessary. I started with add-one
smoothing, which results in an average accuracy of 0.88, worse than the zero-order HMM
result due to its poor performance in tagging unknown words. The result is slightly
improved by adding 0.1 instead of 1, which leads to an accuracy close to 0.9.
Nonetheless, this is still more or less the same as the zero-order HMM benchmark.
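
The add-k estimate used here is sketched below (in Python; the count tables and their names are my own illustration):

    def emit_prob_add_k(word, tag, emit_count, tag_count, vocab_size, k=0.1):
        """Add-k estimate of P(word | tag).

        emit_count[(tag, word)] and tag_count[tag] are raw training counts;
        vocab_size counts the distinct word types plus one slot for unseen
        words. k=1 gives add-one smoothing; k=0.1 performed slightly better.
        """
        return (emit_count.get((tag, word), 0) + k) / (tag_count[tag] + k * vocab_size)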

To push the performance of the tagger further, one-count smoothing is eventually used.
The basic idea is to use the singleton information in the training data to estimate tags
for unknown words. It is similar to add-k smoothing, except that the value of k is
determined dynamically by the number of relevant singleton counts. The details of
one-count smoothing can be found in the appendix of the following document:
Intro to NLP, Prof. J. Eisner, http://www.cs.jhu.edu/~jason/465/hw-hmm/hw-hmm.pdf
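
A sketch of the emission estimate, roughly following the formulation in that appendix (the argument names are my own, and the backoff distribution shown is one common choice, not necessarily the exact one used):

    def emit_prob_one_count(word, tag, emit_count, tag_count, word_count,
                            sing_tw, n_tokens, vocab_size):
        """One-count estimate of P(word | tag).

        sing_tw[tag] = number of word types seen exactly once with `tag`;
        tags that generate many singletons lend more mass to unseen words.
        """
        # backoff: add-one-smoothed unigram probability of the word
        p_backoff = (word_count.get(word, 0) + 1) / (n_tokens + vocab_size)
        lam = 1 + sing_tw.get(tag, 0)  # the dynamically chosen "k"
        return ((emit_count.get((tag, word), 0) + lam * p_backoff)
                / (tag_count[tag] + lam))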

With the help of one-count smoothing, an accuracy of 0.93038 is achieved, which is
higher than the baseline performance of 0.91948.

Strengths

With the help of one-count smoothing, we take into account the singleton information in
the training data. For example, since the tag NOUN appears on a large number of different
words in the training set while DETERMINER appears on only a small number of different
words, an unseen word is more likely to be a NOUN. This information dramatically improves
the performance in tagging unknown words.

For known words, the first-order HMM is usually good enough to provide an accuracy
above 97%.

Weaknesses

Since the model is still based on bigrams, structure spanning more than two words is not
considered when tagging. The problem is particularly significant when tagging unknown
words.

Further and Beyond

The accuracy of the part-of-speech tagger can be improved further by adopting the
following approaches:

1. Using trigrams or n-grams instead of bigrams in calculating the transition and
emission probabilities. This results in a higher-order HMM, and more of the
information in the training data is used to tag the sentence (see the sketch at the
end of this section).

2. Using fancier smoothing algorithms, such as the Chinese Restaurant Process
model taken from non-parametric Bayesian statistics, to put a prior distribution on
unseen word/tag combinations.1

1 Kevin Knight. Bayesian Inference with Tears: a tutorial workbook for natural language researchers. Sep 2009. URL: http://www.isi.edu/natural-language/people/bayes-with-tears.pdf (retrieved Feb 2013)
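
As a sketch of the first approach (in Python; the function names and the add-k smoothing of the transition counts are my own illustration, not the method actually implemented), trigram transition probabilities could be estimated as follows:

    from collections import defaultdict

    def trigram_trans_probs(tagged_sents, k=0.1):
        """Estimate P(t3 | t1, t2) from a list of (word, tag) sentences."""
        bigram, trigram, tagset = defaultdict(int), defaultdict(int), set()
        for sent in tagged_sents:
            # pad with two start symbols and one end symbol
            tags = ["<s>", "<s>"] + [t for _, t in sent] + ["</s>"]
            tagset.update(tags)
            for t1, t2, t3 in zip(tags, tags[1:], tags[2:]):
                bigram[(t1, t2)] += 1
                trigram[(t1, t2, t3)] += 1
        def p(t3, t1, t2):
            # add-k smoothing so unseen tag trigrams keep non-zero probability
            return (trigram[(t1, t2, t3)] + k) / (bigram[(t1, t2)] + k * len(tagset))
        return p

The Viterbi decoder would then track pairs of previous tags instead of single tags, which squares the size of the state space.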
