
Preface

Reader feedback
Feedback from our readers is always welcome. Let us know what you think about
this book—what you liked or disliked. Reader feedback is important for us as it helps
us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail feedback@packtpub.com, and mention
the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing
or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support
Now that you are the proud owner of a Packt book, we have a number of things to
help you to get the most from your purchase.

Downloading the example code


You can download the example code files from your account at
http://www.packtpub.com for all the Packt Publishing books you have purchased. If you
purchased this book elsewhere, you can visit http://www.packtpub.com/support
and register to have the files e-mailed directly to you.

Errata
Although we have taken every care to ensure the accuracy of our content, mistakes
do happen. If you find a mistake in one of our books—maybe a mistake in the text or
the code—we would be grateful if you could report this to us. By doing so, you can
save other readers from frustration and help us improve subsequent versions of this
book. If you find any errata, please report them by visiting
http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form
link, and entering the details of your errata. Once your errata are verified, your
submission will be accepted and the errata will be uploaded to our website or added
to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to
https://www.packtpub.com/books/content/support and enter the name of the book in
the search field. The required information will appear under the Errata section.


www.it-ebooks.info
Piracy
Piracy of copyrighted material on the Internet is an ongoing problem across all
media. At Packt, we take the protection of our copyright and licenses very seriously.
If you come across any illegal copies of our works in any form on the Internet, please
provide us with the location address or website name immediately so that we can
pursue a remedy.

Please contact us at copyright@packtpub.com with a link to the suspected pirated
material.

We appreciate your help in protecting our authors and our ability to bring you
valuable content.

Questions
If you have a problem with any aspect of this book, you can contact us at
questions@packtpub.com, and we will do our best to address the problem.

Introduction to Natural Language Processing
I will start with an introduction to Natural Language Processing (NLP). Language
is a central part of our day-to-day life, and it is fascinating to work on any problem
related to language. I hope this book will give you a flavor of NLP, motivate
you to learn some of its amazing concepts, and inspire you to work on some
challenging NLP applications.

In my own words, NLP is the study of how language can be processed by computers.
People who are deeply involved in the study of language are linguists, while the term
'computational linguistics' applies to the study of language with the help of
computation. Essentially, a computational linguist is a computer scientist who
understands languages well enough to apply their computational skills to model
different aspects of language. While computational linguists address the theoretical
aspects of language, NLP is nothing but the application of computational linguistics.

NLP is more about applying computers to the different nuances of language and
building real-world applications using NLP techniques. In a practical context,
NLP is analogous to teaching a language to a child. Some of the most common
tasks, such as understanding words and sentences and forming grammatically and
structurally correct sentences, come very naturally to humans. In NLP, some of these
tasks translate to tokenization, chunking, part-of-speech tagging, parsing, machine
translation, and speech recognition, and most of them are still among the toughest
challenges for computers. I will focus on the practical side of NLP, assuming that we
all have some background in NLP. The reader is expected to have a minimal
understanding of any programming language and an interest in NLP and language.
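To make the first of these tasks concrete, here is a minimal sketch of tokenization using only Python's standard library. It is a deliberately naive illustration, not how NLTK implements tokenization: the helper names `split_sentences` and `tokenize` and both regular expressions are assumptions made for this example.

```python
import re

# Hypothetical, simplified patterns for illustration only.
# Sentence boundary: whitespace that follows '.', '!' or '?'.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# Token: a run of word characters, or a single punctuation mark.
TOKEN = re.compile(r"\w+|[^\w\s]")


def split_sentences(text):
    """Naively split text into sentences (breaks on abbreviations such as 'Dr.')."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def tokenize(sentence):
    """Return the words and punctuation marks of a sentence, in order."""
    return TOKEN.findall(sentence)


text = "NLP is fun. Computers still find it hard!"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]

for sentence, sentence_tokens in zip(sentences, tokens):
    print(sentence, "->", sentence_tokens)
```

Even this toy version hints at why the task is hard for computers: abbreviations, decimal numbers, and contractions all break such simple rules, which is where a dedicated toolkit helps.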

By the end of this chapter, readers should be able to:

• Describe NLP and related concepts at an introductory level.
• Install Python, NLTK, and other libraries.
• Write some very basic Python and NLTK code snippets.
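As a small aid for the installation goal above, the following sketch checks whether a library can be imported from the current Python environment. The helper name `is_installed` is an assumption made for this example; `nltk` is the import name of the NLTK package, which is commonly installed with `pip install nltk`.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the named top-level module can be found for import."""
    return importlib.util.find_spec(module_name) is not None


# Report on the libraries we care about; extend the tuple as needed.
for name in ("nltk",):
    status = "available" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If the script reports that a library is missing, install it and run the check again before moving on to the code snippets.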

If you have never heard the term NLP, then a quick reading of at least the
Wikipedia page on NLP is a must. It is also worth taking some time to read the
first few chapters of either of the books mentioned here:

• Speech and Language Processing by Daniel Jurafsky and James H. Martin
• Foundations of Statistical Natural Language Processing by Christopher D.
Manning and Hinrich Schütze

Why learn NLP?


I start my discussion with Gartner's new hype cycle, where you can clearly see NLP
at the top of the cycle. Currently, NLP is one of the rarest skill sets required in
the industry. After the advent of big data, the major challenge is that we need more
people who are good with not just structured data, but also semi-structured and
unstructured data. We are generating petabytes of weblogs, tweets, Facebook feeds,
chats, e-mails, and reviews. Companies are collecting all these different kinds of
data for better customer targeting and meaningful insights. To process all these
unstructured data sources, we need people who understand NLP.

We are in the age of information; we can't even imagine our lives without Google.
We use Siri for most basic tasks. We use spam filters to filter out spam e-mails.
We need a spell checker for our Word documents. There are many examples of
real-world NLP applications around us.

Image is taken from http://www.gartner.com/newsroom/id/2819918

Chapter 1

Let me also give you some examples of amazing NLP applications that you may
already use without being aware that they are built on NLP:

• Spell correction (MS Word or any other editor)
• Search engines (Google, Bing, Yahoo!, WolframAlpha)
• Speech engines (Siri, Google Voice)
• Spam classifiers (All e-mail services)
• News feeds (Google, Yahoo!, and so on)
• Machine translation (Google Translate, and so on)
• IBM Watson

Building these applications requires a very specific skill set, with a strong
understanding of language and of the tools that process language efficiently. So it's
not just hype that makes NLP such a sought-after niche; it's the kind of applications
that can be built with NLP that makes it such a valuable skill to have.
To build some of the applications above, and for other basic NLP preprocessing, there
are many open source tools available. Some of them were developed by organizations
for their own NLP applications, while others are community open source projects.
Here is a small list of available NLP tools:

• GATE
• MALLET
• OpenNLP
• UIMA
• Stanford toolkit
• Gensim
• Natural Language Toolkit (NLTK)

Most of these tools are written in Java and offer similar functionality. Some of them
are robust and bundle a wide variety of NLP tools. However, when it comes to ease
of use and explanation of the concepts, NLTK scores really high. NLTK is also a very
good learning kit because Python (the language in which NLTK is written) is quick
to learn. NLTK covers most NLP tasks and is elegant and easy to work with. For all
these reasons, NLTK has become one of the most popular libraries in the NLP
community.
