Agenda:
What is NLP?
What is NLTK?
Naïve Bayes Algorithm
spaCy
Natural Language Processing, or NLP, is broadly defined as the automatic manipulation of natural language, like speech and text, by software.
NLP PIPELINE
A pipeline is a way to design a program in which the output of one module feeds into the input of the next.
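As a minimal sketch of this idea, the stages below are plain Python functions chained so that each one receives the previous stage's output. The stage names (`lowercase`, `split_words`, `count_words`) and the `run_pipeline` helper are hypothetical, chosen only for illustration.

```python
from functools import reduce


def lowercase(text):
    """Stage 1: normalize case."""
    return text.lower()


def split_words(text):
    """Stage 2: split the normalized text on whitespace."""
    return text.split()


def count_words(words):
    """Stage 3: reduce the word list to a single number."""
    return len(words)


def run_pipeline(data, stages):
    """Feed the output of each stage into the next one, in order."""
    return reduce(lambda value, stage: stage(value), stages, data)


result = run_pipeline("NLP Pipelines Are Simple",
                      [lowercase, split_words, count_words])
print(result)
```

Because every stage only depends on the output of the one before it, a stage can be swapped or removed without touching the others.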
The image above illustrates the high-level steps involved in building any NLP model.
Data Cleaning
We convert the raw text into a list of clean words.
A few examples of data cleaning techniques are tokenization, stopword removal, and stemming.
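The three cleaning techniques named here can be sketched in plain Python. This is an illustration under stated assumptions, not a production cleaner: the `STOPWORDS` set is a tiny hand-picked list, and `crude_stem` is a hypothetical suffix stripper standing in for a real stemmer. In practice a library such as NLTK would supply these pieces (for example `word_tokenize`, `stopwords.words("english")`, and `PorterStemmer`).

```python
import re

# Tiny illustrative stopword list (an assumption); real lists are much longer.
STOPWORDS = {"a", "an", "the", "is", "are", "in", "on", "and", "of", "to"}


def tokenize(text):
    """Tokenization: lowercase the text and extract runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens):
    """Stopword removal: drop very common words that carry little meaning."""
    return [token for token in tokens if token not in STOPWORDS]


def crude_stem(token):
    """Stemming (crude stand-in): strip a few common suffixes,
    keeping at least three characters of the word."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def clean(text):
    """Run the three cleaning steps in order."""
    return [crude_stem(token) for token in remove_stopwords(tokenize(text))]


cleaned = clean("The cats are playing in the garden.")
print(cleaned)  # ['cat', 'play', 'garden']
```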
Vectorization
The process of converting words into numbers is called vectorization.
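One simple way to turn words into numbers is a bag-of-words count vector: each document becomes a list of counts, one per vocabulary word. The sketch below assumes the documents are already cleaned token lists; the helper names `build_vocabulary` and `vectorize` are hypothetical. Other schemes such as TF-IDF or word embeddings exist and are not shown.

```python
def build_vocabulary(documents):
    """Map each distinct word to a column index, in sorted order."""
    words = sorted({word for doc in documents for word in doc})
    return {word: index for index, word in enumerate(words)}


def vectorize(doc, vocabulary):
    """Count how often each vocabulary word occurs in one document."""
    vector = [0] * len(vocabulary)
    for word in doc:
        if word in vocabulary:  # words outside the vocabulary are ignored
            vector[vocabulary[word]] += 1
    return vector


docs = [["cat", "play", "garden"], ["dog", "play", "play"]]
vocab = build_vocabulary(docs)
vectors = [vectorize(doc, vocab) for doc in docs]
print(vocab)    # {'cat': 0, 'dog': 1, 'garden': 2, 'play': 3}
print(vectors)  # [[1, 0, 1, 1], [0, 1, 0, 2]]
```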
Perform Classification
Naïve Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes’ theorem with strong independence assumptions between the features.
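To make the independence assumption concrete: for a class c and words w1..wn, the classifier scores each class as P(c) multiplied by the product of P(wi | c), and picks the highest score. The sketch below is one possible multinomial variant with add-one (Laplace) smoothing, working in log space to avoid underflow. The training samples, the labels, and the function names `train` and `predict` are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """samples: list of (tokens, label). Collect the counts used at prediction."""
    label_counts = Counter()            # documents per class
    word_counts = defaultdict(Counter)  # word frequencies per class
    vocabulary = set()
    for tokens, label in samples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return label_counts, word_counts, vocabulary


def predict(tokens, model):
    """Return the class with the highest log-probability score."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # log P(label): the class prior
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            # log P(token | label) with add-one smoothing; each word is
            # treated as independent of the others given the class
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data
samples = [
    (["great", "movie"], "pos"),
    (["love", "great", "acting"], "pos"),
    (["boring", "movie"], "neg"),
    (["terrible", "boring", "plot"], "neg"),
]
model = train(samples)
print(predict(["great", "acting"], model))  # pos
print(predict(["boring", "plot"], model))   # neg
```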
spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python.
spaCy's statistical models are the engines that power it. They enable spaCy to perform several NLP-related tasks, such as part-of-speech tagging, named entity recognition, and dependency parsing.
spaCy’s Processing Pipeline
Why spaCy