
NLP & spaCy

Agenda:
What is NLP?
What is NLTK?
Naïve Bayes Algorithm
spaCy
Natural Language Processing (NLP) is broadly defined as the automatic manipulation of natural language, such as speech and text, by software.
NLP PIPELINE

A pipeline is a way to design a program in which the output of one module feeds into the input of the next.
The image above illustrates the high-level steps involved in building an NLP model.
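The pipeline idea can be sketched in a few lines of plain Python. This is only an illustration: the stage names (lowercase, tokenize, drop_short) and the run_pipeline helper are hypothetical and are not taken from NLTK or spaCy.

```python
from functools import reduce


def lowercase(text):
    # Stage 1: normalise case.
    return text.lower()


def tokenize(text):
    # Stage 2: split the string into words on whitespace.
    return text.split()


def drop_short(tokens):
    # Stage 3: keep only tokens longer than two characters.
    return [t for t in tokens if len(t) > 2]


def run_pipeline(data, steps):
    # Feed the output of each step into the next one, in order.
    return reduce(lambda value, step: step(value), steps, data)


steps = [lowercase, tokenize, drop_short]
result = run_pipeline("NLP is a Fun Field", steps)
print(result)
```

Because each stage only depends on the output of the previous one, stages can be swapped or reordered without touching the rest of the program.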
Data Cleaning
• We convert the raw text into a list of clean words.
• A few examples of data cleaning techniques are tokenization, stopword removal, and stemming.
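The three techniques named above can be sketched with the standard library alone. The stopword list here is a tiny illustrative set, and crude_stem is a deliberately naive suffix stripper written for this example, not the Porter stemmer or any NLTK function; a real project would normally use a library for both.

```python
import re

# Tiny illustrative stopword list (an assumption for this sketch).
STOPWORDS = {"the", "is", "a", "an", "and", "of", "to", "in", "are"}


def tokenize(text):
    # Tokenization: lowercase the text and pull out runs of letters.
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens):
    # Stopword removal: drop very common words that carry little meaning.
    return [t for t in tokens if t not in STOPWORDS]


def crude_stem(token):
    # Stemming (naive): strip one common suffix if enough of the word remains.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def clean(text):
    # Raw text in, list of clean words out.
    return [crude_stem(t) for t in remove_stopwords(tokenize(text))]


cleaned = clean("The cats are chasing the mice in the gardens")
print(cleaned)
```

Note that stems need not be dictionary words: "chasing" becomes "chas" here, which is typical of rule-based stemmers.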
Vectorization
The process of converting words into numbers is called vectorization.
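One simple way to turn words into numbers is a bag-of-words count vector: each position in the vector stands for one vocabulary word, and the value is how often that word appears in the document. The sketch below assumes the documents are already cleaned token lists; build_vocabulary and vectorize are hypothetical helper names, and other schemes (TF-IDF, word embeddings) are equally valid choices.

```python
def build_vocabulary(documents):
    # Map every distinct word to a fixed column index (alphabetical order).
    vocab = sorted({word for doc in documents for word in doc})
    return {word: index for index, word in enumerate(vocab)}


def vectorize(doc, vocabulary):
    # Count how many times each vocabulary word occurs in the document.
    vector = [0] * len(vocabulary)
    for word in doc:
        if word in vocabulary:
            vector[vocabulary[word]] += 1
    return vector


docs = [["good", "movie"], ["bad", "movie", "bad"]]
vocabulary = build_vocabulary(docs)
vectors = [vectorize(d, vocabulary) for d in docs]
print(vocabulary)
print(vectors)
```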
Perform Classification

Text classification is the process of assigning tags or categories to the text according to its contents.
What is NLTK?
The Natural Language Toolkit (NLTK) is a platform for building Python programs that work with human language data, for use in statistical natural language processing (NLP).
NAÏVE BAYES ALGORITHM

Naïve Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes’ theorem with strong independence assumptions between the features.
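As a rough sketch of how this works for text, the example below implements a multinomial Naïve Bayes classifier in plain Python. It scores each class as log P(class) plus the sum of log P(word | class) over the words in the document, which is where the independence assumption enters. The add-one (Laplace) smoothing, the train and predict helper names, and the toy training data are all assumptions made for illustration; they are not taken from the slides or from NLTK.

```python
import math
from collections import Counter, defaultdict


def train(labelled_docs):
    # Count documents per class and word occurrences per class.
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in labelled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def predict(tokens, model):
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        # Start from the log prior, log P(class).
        score = math.log(count / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            if token not in vocabulary:
                continue  # ignore words never seen in training
            # Add log P(word | class) with add-one smoothing.
            numerator = word_counts[label][token] + 1
            denominator = total_words + len(vocabulary)
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for this example.
training = [
    (["great", "fun", "movie"], "pos"),
    (["loved", "great", "acting"], "pos"),
    (["boring", "bad", "movie"], "neg"),
    (["bad", "acting", "awful"], "neg"),
]
model = train(training)
prediction = predict(["great", "movie"], model)
print(prediction)
```

Working in log space avoids multiplying many small probabilities together, which would otherwise underflow on longer documents.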
What’s spaCy?
spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python.
Getting Started with spaCy
spaCy’s Statistical Models

These models are the engines that power spaCy. They enable spaCy to perform several NLP-related tasks, such as part-of-speech tagging, named entity recognition, and dependency parsing.
spaCy’s Processing Pipeline
Why spaCy
