HISTORY OF NLP
We have divided the history of NLP into four phases. The phases have distinctive concerns and
styles.
First Phase (Machine Translation Phase) – Late 1940s to late 1960s
The work done in this phase focused mainly on machine translation (MT). This phase was a
period of enthusiasm and optimism.
Let us now see all that the first phase had in it:
1. Research on NLP started in the early 1950s, after Booth and Richens’ investigation and
Weaver’s memorandum on machine translation in 1949.
2. In 1954, a limited experiment on automatic translation from Russian to English was
demonstrated in the Georgetown-IBM experiment.
3. In the same year, the publication of the journal MT (Machine Translation) started.
4. The first international conference on Machine Translation (MT) was held in 1952 and
the second in 1956.
Second Phase (AI-Flavored Phase) – Late 1960s to late 1970s
In this phase, the work done was mainly related to world knowledge and its role in the
construction and manipulation of meaning representations. That is why this phase is also called
the AI-flavored phase.
The phase had in it, the following:
1. In early 1961, work began on the problems of addressing and constructing a data or
knowledge base. This work was influenced by AI.
2. In the same year, the BASEBALL question-answering system was also developed. The
input to this system was restricted, and the language processing involved was simple.
3. A much more advanced system was described in Minsky (1968). Compared to the
BASEBALL question-answering system, it recognized and provided for the need for
inference on the knowledge base when interpreting and responding to language input.
Third Phase (Grammatico-logical Phase) – Late 1970s to late 1980s
This phase can be described as the grammatico-logical phase. After the failure of practical
system building in the previous phase, researchers moved towards the use of logic for
knowledge representation and reasoning in AI.
1. The grammatico-logical approach, towards the end of the decade, gave us powerful
general-purpose sentence processors like SRI’s Core Language Engine and Discourse
Representation Theory, which offered a means of tackling more extended discourse.
2. In this phase we got some practical resources and tools, such as parsers (e.g. the Alvey
Natural Language Tools), along with more operational and commercial systems, e.g. for
database query.
Fourth Phase (Lexical & Corpus Phase) – The 1990s
We can describe this as the lexical and corpus phase. It featured a lexicalized approach to
grammar that appeared in the late 1980s and became an increasing influence. There was a
revolution in natural language processing in this decade with the introduction of machine
learning algorithms for language processing.
AMBIGUITY IN NLP
Ambiguity, in the context of natural language processing, refers to the capability of being
understood in more than one way. Natural language is highly ambiguous. NLP has to deal with
the following types of ambiguity:
Lexical Ambiguity
The ambiguity of a single word is called lexical ambiguity. For example, the word silver can be
treated as a noun, an adjective, or a verb.
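As a minimal sketch of this idea, a word is lexically ambiguous when the lexicon assigns it more than one part of speech. The tiny hand-built lexicon below is purely illustrative, not a real tagger:

```python
# Toy lexicon mapping each word to its possible parts of speech.
# The entries are hand-picked examples, not a real linguistic resource.
LEXICON = {
    "silver": {"noun", "adjective", "verb"},
    "book":   {"noun", "verb"},
    "the":    {"determiner"},
}

def is_lexically_ambiguous(word):
    """A word is lexically ambiguous if it has more than one possible POS."""
    return len(LEXICON.get(word, set())) > 1

print(is_lexically_ambiguous("silver"))  # True
print(is_lexically_ambiguous("the"))     # False
```

A real system would consult a resource such as WordNet or a trained part-of-speech tagger instead of a hand-built dictionary.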
Syntactic Ambiguity
This kind of ambiguity occurs when a sentence can be parsed in different ways. For example, in
the sentence “The man saw the girl with the telescope”, it is ambiguous whether the man saw a
girl carrying a telescope or saw her through his telescope.
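The two readings correspond to two distinct parse structures. As a rough sketch, each parse can be written as a nested grouping in which the prepositional phrase attaches to a different constituent:

```python
# Two bracketings of "the man saw the girl with the telescope".
# Each parse is a nested tuple; the prepositional phrase ("with the
# telescope") attaches to a different constituent in each reading.

# Reading 1: the PP modifies "the girl" (the girl has the telescope).
parse_np_attachment = ("saw", "the man", ("the girl", "with the telescope"))

# Reading 2: the PP modifies the verb phrase (seeing is done with the telescope).
parse_vp_attachment = (("saw", "the man", "the girl"), "with the telescope")

# The two structures differ, so the sentence is syntactically ambiguous.
print(parse_np_attachment != parse_vp_attachment)  # True
```

A real parser would produce full phrase-structure trees, but the ambiguity is exactly this choice of attachment.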
Semantic Ambiguity
This kind of ambiguity occurs when the meaning of the words themselves can be
misinterpreted. In other words, semantic ambiguity happens when a sentence contains an
ambiguous word or phrase. For example, the sentence “The car hit the pole while it was
moving” has semantic ambiguity because it can be interpreted as “The car, while moving, hit
the pole” or “The car hit the pole while the pole was moving”.
Anaphoric Ambiguity
This kind of ambiguity arises from the use of anaphoric references in discourse. For example:
“The horse ran up the hill. It was very steep. It soon got tired.” Here, the anaphoric reference of
“it” in the two sentences causes ambiguity.
Pragmatic Ambiguity
This kind of ambiguity refers to a situation where the context of a phrase gives it multiple
interpretations. In simple words, pragmatic ambiguity arises when the statement is not specific.
For example, the sentence “I like you too” can have multiple interpretations: I like you (just as
you like me), or I like you (just as someone else does).
Phases of NLP
NLP has the following five phases:
1. Lexical Analysis
The first phase of NLP is lexical analysis. This phase scans the input text as a stream of
characters and converts it into meaningful lexemes. It divides the whole text into paragraphs,
sentences, and words.
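A very rough sketch of this splitting step, using only the standard library (real tokenizers handle abbreviations, quotes, and many edge cases this toy version ignores):

```python
import re

def lexical_analysis(text):
    """Split raw text into paragraphs, sentences, and words (a rough sketch)."""
    # Paragraphs are separated by blank lines.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    # Sentences end in '.', '!' or '?' followed by whitespace (naive rule).
    sentences = [s.strip() for p in paragraphs
                 for s in re.split(r"(?<=[.!?])\s+", p) if s.strip()]
    # Words are runs of letters (punctuation is dropped).
    words = [w for s in sentences for w in re.findall(r"[A-Za-z']+", s)]
    return paragraphs, sentences, words

paras, sents, words = lexical_analysis("NLP is fun. It has five phases.")
print(sents)   # ['NLP is fun.', 'It has five phases.']
print(words)   # ['NLP', 'is', 'fun', 'It', 'has', 'five', 'phases']
```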
2. Syntactic Analysis
Syntactic analysis is used to check grammar and word arrangement, and shows the relationship
among the words.
For example, the sentence “Agra goes to the Poonam” does not make any sense in the real
world, so it is rejected by the syntactic analyzer.
3. Semantic Analysis
Semantic analysis is concerned with the meaning representation. It mainly focuses on the literal
meaning of words, phrases, and sentences.
4. Discourse Integration
Discourse integration depends upon the sentences that precede a given sentence and also
invokes the meaning of the sentences that follow it.
5. Pragmatic Analysis
Pragmatic analysis is the fifth and last phase of NLP. It helps you to discover the intended
effect by applying a set of rules that characterize cooperative dialogues.
NLP APIs
IBM Watson API
Pricing: It offers a free 30-day trial IBM Cloud account. You can also opt for its paid plans.
Chatbot API
Chatbot API allows you to create intelligent chatbots for any service. It supports Unicode
characters, text classification, multiple languages, etc. It is very easy to use and helps you
create a chatbot for your web applications.
Pricing: Chatbot API is free for up to 150 requests per month. You can also opt for its paid
version, which ranges from $100 to $5,000 per month.
Sentiment Analysis API
Sentiment Analysis API, also called ‘opinion mining’, is used to identify the tone of a user
(positive, negative, or neutral).
Pricing: Sentiment Analysis API is free for fewer than 500 requests per month. Its paid
version ranges from $19 to $99 per month.
Translation API by SYSTRAN
The Translation API by SYSTRAN is used to translate text from a source language to a target
language. You can use its NLP APIs for language detection, text segmentation, named entity
recognition, tokenization, and many other tasks.
Pricing: This API is available for free, but commercial users need its paid version.
NLP Libraries
Scikit-learn: It provides a wide range of algorithms for building machine learning models in
Python.
Natural Language Toolkit (NLTK): NLTK is a complete toolkit for all NLP techniques.
Pattern: It is a web mining module for NLP and machine learning.
TextBlob: It provides an easy interface for basic NLP tasks such as sentiment analysis, noun
phrase extraction, or POS tagging.
Quepy: Quepy is used to transform natural language questions into queries in a database query
language.
SpaCy: SpaCy is an open-source NLP library which is used for Data Extraction, Data Analysis,
Sentiment Analysis, and Text Summarization.
Gensim: Gensim works with large datasets and processes data streams.
Machine Learning (ML) and Artificial Intelligence (AI) are terms that are crucial to know
when creating machine-learning-driven solutions. The term NLP (Natural Language
Processing) is also crucial for understanding current machine learning applications that are
created for speech or text, e.g. bots with which you can converse instead of with humans.
So let’s start with a high-level separation of commonly used terms and their meaning:
AI (Artificial Intelligence) is concerned with solving tasks that are easy for humans but hard
for computers.
ML (Machine Learning) is the science of getting computers to act without being explicitly
programmed; it is basically learning through doing. ML is often regarded as a subset of AI.
NLP (Natural Language Processing) is the part of machine learning that has to do with
language (usually written).
METHODOLOGY
NLP technology is important for scientific, economic, social, and cultural reasons. Computers
that can communicate with humans as humans do, including understanding context and
emotions, are a holy grail. Due to the near-infinite variety of cultures and contexts, solving this
problem has proven to be hard. Communication is not only verbal: nonverbal communication is
a significant part of it. Nonverbal communication is the nonlinguistic transmission of
information through visual, auditory, tactile, and kinesthetic (physical) channels.
NLP is experiencing rapid growth as its theories and methods are deployed in a variety of new
machine learning technologies.
More and more NLP techniques are used in products that have a serious impact on our daily
lives. A misinterpreted spoken word like ‘stop’ can have several meanings when used within an
autonomous car: maybe you just meant to warn your children instead of stopping the vehicle in
a dangerous intersection. Creating a good NLP-based application using machine learning is
hard. A simple test that gives an indication of quality is to use the sentence
“Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo”. This sentence is
grammatically correct, but many NLP algorithms and applications cannot handle it very well.
Similar sentences exist in other languages. We humans are by nature good with complicated
linguistic constructs, but many NLP algorithms still fail on them.
Basic NLP functions
NLP is able to perform all kinds of basic text functions, for example text generation.
Machine learning capabilities have proven to be very useful for improving typical NLP based
use cases. This is due to the fact that text in digital format is widely available and machine
learning algorithms typically perform well on large amounts of data.
NLP papers and NLP software come with a typical terminology:
Stemming and lemmatization: normalizing words so that different forms map to a
canonical form.
Part-of-speech detection: identifying a word as a verb, noun, participle, verb phrase, and
so on.
To create NLP-enabled applications you need to set up a ‘pipeline’ of the various software
building blocks. For each step in an NLP development pipeline, another FOSS building block
can be needed. The figure below shows a typical NLP pipeline.
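As a minimal sketch of the pipeline idea, each stage can be a plain function and the pipeline simply applies them in order; the stage names below are illustrative, not standard components:

```python
# A minimal pipeline sketch: each stage is a function from a token list
# to a token list, applied in order. Stage names are made-up examples.
def lowercase(tokens):
    return [t.lower() for t in tokens]

def drop_stopwords(tokens, stopwords=frozenset({"the", "a", "an"})):
    return [t for t in tokens if t not in stopwords]

def pipeline(text, stages):
    tokens = text.split()          # naive tokenization
    for stage in stages:
        tokens = stage(tokens)     # each stage transforms the token list
    return tokens

result = pipeline("The cat sat on the mat", [lowercase, drop_stopwords])
print(result)  # ['cat', 'sat', 'on', 'mat']
```

Libraries such as spaCy build their processing pipelines on the same principle, with each component transforming a shared document object.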
The figure below shows a typical NLP architecture for providing answers to common user
questions. A lot of bot or FAQ-answering systems are created with no machine learning
algorithms at all: often simple keyword extraction is done and a simple match in a database is
performed. More state-of-the-art NLP systems make intensive use of machine learning
algorithms. The general rule is: if an application should be user-friendly and add value, then
learning algorithms are needed, since it is no longer possible to program output based on given
input.
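The non-learning approach described above can be sketched in a few lines: extract keywords, then pick the stored question with the largest overlap. The FAQ entries here are made-up examples:

```python
# A minimal keyword-overlap FAQ matcher: no machine learning, just
# keyword extraction and a lookup. Entries are invented for illustration.
FAQ = {
    "How do I reset my password?": "Use the 'Forgot password' link.",
    "What are your opening hours?": "We are open 9:00-17:00 on weekdays.",
}

def keywords(text):
    """Lowercase the text, strip trailing punctuation, split into a word set."""
    return set(text.lower().strip("?.!").split())

def answer(question):
    """Return the answer whose stored question shares the most keywords."""
    best = max(FAQ, key=lambda q: len(keywords(q) & keywords(question)))
    return FAQ[best]

print(answer("password reset"))  # "Use the 'Forgot password' link."
```

This illustrates why such systems break down quickly: a paraphrase with no overlapping keywords matches nothing useful, which is exactly where learned models earn their keep.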
AllenNLP
AllenNLP is an open-source NLP research library, built on PyTorch, for developing
state-of-the-art deep learning models on a wide variety of linguistic tasks. It makes it easy to
design and evaluate new deep learning models for nearly any NLP problem, along with the
infrastructure to easily run them in the cloud or on your laptop.