
Computational Linguistics HT 2013: Practical 1

Candidate Number: 588 172

Introduction

In this practical, a part-of-speech tagger is implemented using a Hidden Markov
Model (HMM). This approach is a supervised learning algorithm, so a training data set
is needed before the test set can be tagged. Finally, the performance of the tagger is
evaluated by 10-fold cross-validation.
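
As a minimal sketch of the decoding step (in Python; the function and variable names are my own, and the probability tables are assumed to be precomputed from the training counts), first-order Viterbi decoding might look like:

    def viterbi(words, tags, start_p, trans_p, emit_p):
        """Most likely tag sequence for `words` under a first-order HMM.

        start_p[t] = P(t at sentence start), trans_p[t1][t2] = P(t2 | t1),
        emit_p[t][w] = P(w | t); all assumed precomputed from training counts.
        """
        # best[i][t] = probability of the best path ending in tag t at position i
        best = [{t: start_p[t] * emit_p[t].get(words[0], 0.0) for t in tags}]
        back = [{}]
        for i in range(1, len(words)):
            best.append({})
            back.append({})
            for t in tags:
                # pick the predecessor tag that maximises the path probability
                prev, p = max(((t0, best[i - 1][t0] * trans_p[t0][t]) for t0 in tags),
                              key=lambda x: x[1])
                best[i][t] = p * emit_p[t].get(words[i], 0.0)
                back[i][t] = prev
        last = max(best[-1], key=best[-1].get)  # best final tag
        path = [last]
        for i in range(len(words) - 1, 0, -1):  # follow the backpointers
            path.append(back[i][path[-1]])
        return list(reversed(path))

Note that emit_p[t].get(w, 0.0) returns zero for any word never seen with tag t in training, which is exactly the problem addressed in the next section.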

Handling unknown words

Since not all words are contained in the training set, the probability P(w_i | t_j) will be zero for
an unknown word w_i if no smoothing is employed. This gives us no information on how to
tag unknown words, so a smoothing technique is necessary. I started with add-one
smoothing, which results in an average accuracy of 0.88, worse than the zero-order HMM
result due to its poor performance in tagging unknown words. The result is slightly
improved by adding 0.1 instead of 1, which leads to an accuracy close to 0.9.
Nonetheless, this is still more or less the same as the zero-order HMM benchmark.
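
The add-k estimate used here is sketched below (in Python; the count tables and their names are my own illustration):

    def emit_prob_add_k(word, tag, emit_count, tag_count, vocab_size, k=0.1):
        """Add-k estimate of P(word | tag).

        emit_count[(tag, word)] and tag_count[tag] are raw training counts;
        vocab_size counts the distinct word types plus one slot for unseen
        words. k=1 gives add-one smoothing; k=0.1 performed slightly better.
        """
        return (emit_count.get((tag, word), 0) + k) / (tag_count[tag] + k * vocab_size)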

To push the performance of the tagger further, one-count smoothing is eventually used.
The basic idea is to use the singleton information in the training data to estimate tags
for unknown words. It is similar to add-k smoothing, except that the value of k is
determined dynamically by the number of relevant singleton counts. The details of
one-count smoothing can be found in the appendix of the following document:
Intro to NLP, Prof. J. Eisner, http://www.cs.jhu.edu/~jason/465/hw-hmm/hw-hmm.pdf
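
A sketch of the emission estimate, roughly following the formulation in that appendix (the argument names are my own, and the backoff distribution shown is one common choice, not necessarily the exact one used):

    def emit_prob_one_count(word, tag, emit_count, tag_count, word_count,
                            sing_tw, n_tokens, vocab_size):
        """One-count estimate of P(word | tag).

        sing_tw[tag] = number of word types seen exactly once with `tag`;
        tags that generate many singletons lend more mass to unseen words.
        """
        # backoff: add-one-smoothed unigram probability of the word
        p_backoff = (word_count.get(word, 0) + 1) / (n_tokens + vocab_size)
        lam = 1 + sing_tw.get(tag, 0)  # the dynamically chosen "k"
        return ((emit_count.get((tag, word), 0) + lam * p_backoff)
                / (tag_count[tag] + lam))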

With the help of one-count smoothing, an accuracy of 0.93038 is achieved, which is
higher than the baseline performance of 0.91948.

Strengths

With the help of one-count smoothing, we take into account the singleton information in
the training data. For example, since the tag NOUN appears on a large number of different
words in the training set while DETERMINER appears on only a small number of different
words, an unseen word is more likely to be a NOUN. This information dramatically improves
the performance in tagging unknown words.

For known words, the first-order HMM is usually good enough to provide an accuracy
above 97%.

Weaknesses

Since the model is still based on bigrams, structure spanning more than two words is not
considered when tagging. The problem is particularly significant when tagging unknown
words.

Further and Beyond

The accuracy of the part-of-speech tagger can be improved further by adopting the
following approaches:

1. Using trigrams or n-grams instead of bigrams in calculating the transition and
emission probabilities. This results in a higher-order HMM, and more of the
information in the training data is used to tag the sentence (see the sketch at the
end of this section).

2. Using fancier smoothing algorithms, such as the Chinese Restaurant Process
model taken from non-parametric Bayesian statistics, to put a prior distribution on
unseen word/tag combinations.1

1 Kevin Knight. Bayesian Inference with Tears: a tutorial workbook for natural language researchers. Sep 2009. URL: http://www.isi.edu/natural-language/people/bayes-with-tears.pdf (retrieved Feb 2013)
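
As a sketch of the first approach (in Python; the function names and the add-k smoothing of the transition counts are my own illustration, not the method actually implemented), trigram transition probabilities could be estimated as follows:

    from collections import defaultdict

    def trigram_trans_probs(tagged_sents, k=0.1):
        """Estimate P(t3 | t1, t2) from a list of (word, tag) sentences."""
        bigram, trigram, tagset = defaultdict(int), defaultdict(int), set()
        for sent in tagged_sents:
            # pad with two start symbols and one end symbol
            tags = ["<s>", "<s>"] + [t for _, t in sent] + ["</s>"]
            tagset.update(tags)
            for t1, t2, t3 in zip(tags, tags[1:], tags[2:]):
                bigram[(t1, t2)] += 1
                trigram[(t1, t2, t3)] += 1
        def p(t3, t1, t2):
            # add-k smoothing so unseen tag trigrams keep non-zero probability
            return (trigram[(t1, t2, t3)] + k) / (bigram[(t1, t2)] + k * len(tagset))
        return p

The Viterbi decoder would then track pairs of previous tags instead of single tags, which squares the size of the state space.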
