You are on page 1of 20

TEXT CLASSIFICATION

CREATED BY:-
Kshitij Gupta, BIG DATA ANALYTICS
School of Data Science and Forecasting )

1
What is Text Classification
Text classification (a.k.a. text categorization or text
tagging) is the task of assigning a set of predefined
categories to free-text.

Text classifiers can be used to organize, structure,


and categorize pretty much anything.

For example, new articles can be organized by topics,


support tickets can be organized by urgency, chat
conversations can be organized by language, brand
mentions can be organized by sentiment, and so on.

2
APPLICATIONS

• Classifying large textual data helps in standardizing


the platform, make search easier and relevant, and
improves user experience by simplifying navigation.

• Tagging content or products using categories as a way


to improve browsing or to identify related content on
your website.
• Platforms such as E-commerce, news agencies,
content curators, blogs, directories, and likes can use
automated technologies to classify and tag content
and products.
3
APPLICATIONS
• Text classification can also be used to automate CRM tasks.
The text classifier is highly customizable and can be trained
accordingly. The CRM tasks can directly be assigned and
analyzed based on importance and relevance. It reduces
manual work and thus is high time efficient.

• Text Classification of content on the website using tags helps


Google crawl your website easily which ultimately helps in
SEO. Additionally, automating the content tags on website
and app can make user experience better and helps to
standardize them. Another use case for the marketers would
be to research and analyze tags and keywords used by
competitors.

4
APPLICATIONS
• A faster emergency response system can be made by
classifying panic conversation on social media. Authorities
can monitor and classify emergency situation to make a quick
response if any such situation arises. This is a case of very
selective classification

• As marketing is becoming more targeted everyday,


automated classification of users into cohorts can make
marketer’s life simple. Marketers can monitor and classify
users based on how they talk about a product or brand
online. The classifier can be trained to identify promoters or
detractors. Thus, making brands to serve the cohorts better.

5
How Does Text Classification Work?

There are many approaches to automatic text


classification, which can be grouped into three different
types of systems:

- Rule-based systems
- Machine Learning based systems
- Hybrid systems

6
Rule Based Classification
• Rule-based approaches classify text into organized
groups by using a set of handcrafted linguistic rules.
These rules instruct the system to use semantically
relevant elements of a text to identify relevant
categories based on its content. Each rule consists of
an antecedent or pattern and a predicted category.

• For example, this rule-based system will classify the


headline “When is LeBron James' first game with the
Lakers?” as Sports because it counted 1 sport-related
term (Lebron James) and it didn’t count any politics-
related terms.
7
Rule Based Classification
• Rule-based systems are human comprehensible and
can be improved over time. But this approach has
some disadvantages. For starters, these systems
require deep knowledge of the domain. They are also
time-consuming, since generating rules for a complex
system can be quite challenging and usually requires
a lot of analysis and testing. Rule-based systems are
also difficult to maintain and don’t scale well given that
adding new rules can affect the results of the pre-
existing rules.

8
Machine Learning Based Systems
• Instead of relying on manually crafted rules, text
classification with machine learning learns to make
classifications based on past observations. By using
pre-labeled examples as training data, a machine
learning algorithm can learn the different associations
between pieces of text and that a particular output (i.e.
tags) is expected for a particular input (i.e. text).

9
Machine Learning Based Systems
• The first step towards training a classifier with
machine learning is feature extraction: a method is
used to transform each text into a numerical
representation in the form of a vector. One of the most
frequently used approaches is bag of words, where a
vector represents the frequency of a word in a
predefined dictionary of words.

• For example, if we have defined our dictionary to have


the following words {This, is, the, not, awesome, bad,
basketball}, and we wanted to vectorize the text “This
is awesome”, we would have the following vector
representation of that text: (1, 1, 0, 0, 1, 0, 0).
10
Text Classification Algorithms:-
Some of the most popular machine learning algorithms for
creating text classification models include the naive bayes
family of algorithms, support vector machines, and deep
learning.
1- NAÏVE BAYES-
Naive Bayes is based on Bayes’s Theorem, which helps us
compute the conditional probabilities of occurrence of two
events based on the probabilities of occurrence of each
individual event. This means that any vector that represents a
text will have to contain information about the probabilities of
appearance of the words of the text within the texts of a given
category so that the algorithm can compute the likelihood of
that text’s belonging to the category.
11
Text Classification Algorithms:-
2- SVM- SUPPORT VECTOR MACHINE

Like naive bayes, SVM doesn’t need much training data to start
providing accurate results. Although it needs more
computational resources than Naive Bayes, SVM can achieve
more accurate results.
SVM takes care of drawing a “line” or hyperplane that divides a
space into two subspaces: one subspace that contains vectors
that belong to a group and another subspace that contains
vectors that do not belong to that group. Those vectors are
representations of your training texts and a group is a tag you
have tagged your texts with.
12
Text Classification Algorithms:-
3- DEEP LEARNING

The two main deep learning architectures used in text


classification are Convolutional Neural Networks (CNN) and
Recurrent Neural Networks (RNN).
Deep learning algorithms require much more training data than
traditional machine learning algorithms, i.e. at least millions of
tagged examples. On the other hand, traditional machine
learning algorithms such as SVM and NB reach a certain
threshold where adding more training data doesn’t improve
their accuracy. In contrast, deep learning classifiers continue to
get better the more data you feed them with
13
Why is Text Classification Important
It is estimated that around 80% of all information is
unstructured, with text being one of the most common types of
unstructured data. Because of the messy nature of text,
analyzing, understanding, organizing, and sorting through text
data is hard and time-consuming so most companies fail to
extract value from that.
This is where text classification with machine learning steps in.
By using text classifiers, companies can structure business
information such as email, legal documents, web pages, chat
conversations, and social media messages in a fast and cost-
effective way. This allows companies to save time when
analyzing text data, help inform business decisions, and
automate business processes.
14
Why is Text Classification Important
Some of the reasons why companies are leveraging text
classification with machine learning are the following:

1- Scalability
Manually analyzing and organizing text takes time. It’s a slow
process where a human needs to read each text and decide
how to structure it. Machine learning changes this and enables
to easily analyze millions of texts at a fraction of a cost.

15
Why is Text Classification Important
2- Real-time analysis
There are critical situations that companies need to identify as
soon as possible and take immediate action (e.g. PR crises on
social media). Text classifiers with machine learning can make
accurate precisions in real-time that enable companies to
identify critical information instantly and take action right away.
3- Consistent criteria
Human annotators make mistakes when classifying text data
due to distractions, fatigue, and boredom. Other errors are
generated due to inconsistent criteria. In contrast, machine
learning applies the same lens and criteria to all of the data,
thus allowing humans to reduce errors with centralized text
classification models.
16
Text Classification Applications
Text classification can be used in a broad range of contexts
such as classifying short texts (e.g. as tweets, headlines or
tweets) or organizing much larger documents (e.g. customer
reviews, media articles or legal contracts). Some of the most
well-known examples of text classification include sentiment
analysis, topic labeling, language detection, and intent
detection.
1- Sentiment Analysis
Probably the most common example of text classification is
sentiment analysis: the automated process of determining
whether a text is positive, negative, or neutral. Companies are
using sentiment classifiers on a wide range of applications,
such as product analytics, brand monitoring, customer support,
market research, workforce analytics, and much more. 17
Text Classification Applications
2- Topic Labelling:-
Another common example of text classification is topic labeling,
that is, understanding what a given text is talking about. It’s
often used for structuring and organizing data such as
organizing customer feedback by its topic or organizing news
articles according to their subject.

3- Language Detection-
Language detection is another great example of text
classification, that is, the process of classifying incoming text
according to its language. These text classifiers are often used
for routing purposes (e.g. route support tickets according to
their language to the appropriate team). 18
Text Classification Applications
4- Intent Detection:-
Companies are also using text classifiers for automatically
detecting the intent from customer conversations, often used
for generating product analytics or automating business
purposes.

19
20

You might also like