
INTRODUCTION

HISTORY OF NLP

We have divided the history of NLP into four phases. The phases have distinctive concerns and
styles.
First Phase (Machine Translation Phase) – Late 1940s to late 1960s
The work done in this phase focused mainly on machine translation (MT). This phase was a
period of enthusiasm and optimism.
Let us now look at the key developments of the first phase:

1. Research on NLP started in the early 1950s, after Booth and Richens’ investigation and
Weaver’s memorandum on machine translation in 1949.

2. In 1954, a limited experiment on automatic translation from Russian to English was
demonstrated in the Georgetown-IBM experiment.

3. In the same year, the publication of the journal MT (Machine Translation) started.

4. The first international conference on Machine Translation (MT) was held in 1952 and
the second in 1956.

5. In 1961, the work presented at the Teddington International Conference on Machine
Translation of Languages and Applied Language Analysis was the high point of this
phase.
Second Phase (AI Influenced Phase) – Late 1960s to late 1970s

In this phase, the work done was mainly related to world knowledge and its role in the
construction and manipulation of meaning representations. That is why this phase is also called
the AI-flavored phase.
The phase included the following:

1. In early 1961, work began on the problems of addressing and constructing data or
knowledge bases. This work was influenced by AI.

2. In the same year, the BASEBALL question-answering system was also developed. The
input to this system was restricted, and the language processing involved was simple.

3. A much more advanced system was described in Minsky (1968). Compared to the
BASEBALL question-answering system, this system recognized and provided for the
need for inference on the knowledge base when interpreting and responding to
language input.

Third Phase (Grammatical-logical Phase) – Late 1970s to late 1980s

This phase can be described as the grammatical-logical phase. Due to the failure of practical
system building in the previous phase, researchers moved towards the use of logic for
knowledge representation and reasoning in AI.

1. The grammatical-logical approach, towards the end of the decade, gave us powerful
general-purpose sentence processors like SRI’s Core Language Engine and Discourse
Representation Theory, which offered a means of tackling more extended discourse.

2. In this phase we got practical resources and tools, such as parsers (e.g., the Alvey
Natural Language Tools), along with more operational and commercial systems (e.g.,
for database query).

3. The work on the lexicon in the 1980s also pointed in the direction of the
grammatical-logical approach.

Fourth Phase (Lexical & Corpus Phase) – The 1990s

We can describe this as the lexical and corpus phase. A lexicalized approach to grammar
appeared in the late 1980s and became an increasing influence. This decade also saw a
revolution in natural language processing with the introduction of machine learning
algorithms for language processing.

Natural Language Processing (NLP) is a branch of Artificial Intelligence and Linguistics
devoted to making computers understand statements or words written in human languages.
Natural language processing came into existence to ease the user’s work and to satisfy the wish
to communicate with the computer in natural language. Since not all users are well-versed in
machine-specific languages, NLP caters to those users who do not have enough time to learn
such languages or to attain proficiency in them.
Study of Human Languages
Language is a crucial component of human life and the most fundamental aspect of our
behavior. We experience it in two main forms: written and spoken. In the written form, it is a
way to pass our knowledge from one generation to the next. In the spoken form, it is the
primary medium for human beings to coordinate with each other in their day-to-day behavior.
Language is studied in various academic disciplines. Each discipline comes with its own set of
problems and a set of solutions to address them.

Ambiguity and Uncertainty in Language

Ambiguity, as the term is used in natural language processing, refers to the property of being
understandable in more than one way. Natural language is very ambiguous. NLP has to deal
with the following types of ambiguity:

Lexical Ambiguity
The ambiguity of a single word is called lexical ambiguity. For example, the word silver can be
used as a noun, an adjective, or a verb.
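
Below is a minimal sketch using NLTK's off-the-shelf POS tagger to show this: the same word
"silver" receives a different part-of-speech tag depending on context (the example sentences are
ours; exact tags may vary with the tagger model).

import nltk
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

# The same surface word is tagged differently depending on context.
print(nltk.pos_tag(nltk.word_tokenize("She won a silver medal")))        # 'silver' as modifier
print(nltk.pos_tag(nltk.word_tokenize("They mine silver in the hills"))) # 'silver' as noun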

Syntactic Ambiguity
This kind of ambiguity occurs when a sentence is parsed in different ways. For example, the
sentence . “ the man saw the girl with the telescope”. It is ambiguous whether the man saw the
girl carrying a telescope or he saw her through his telescope
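
As a minimal sketch, the toy grammar below (our own illustration, written for NLTK's chart
parser) produces both readings as two distinct parse trees: one attaches the prepositional phrase
to the noun phrase, the other to the verb phrase.

import nltk

# Toy grammar in which "with the telescope" can attach to the NP or the VP.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N | Det N PP
VP -> V NP | VP PP
PP -> P NP
Det -> 'the'
N -> 'man' | 'girl' | 'telescope'
V -> 'saw'
P -> 'with'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the man saw the girl with the telescope".split()):
    print(tree)  # prints two trees, one per reading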

Semantic Ambiguity
This kind of ambiguity occurs when the meaning of the words themselves can be
misinterpreted. In other words, semantic ambiguity happens when a sentence contains an
ambiguous word or phrase. For example, the sentence “The car hit the pole while it was
moving” has semantic ambiguity because it can be interpreted as “The car, while moving, hit
the pole” or “The car hit the pole while the pole was moving”.

Anaphoric Ambiguity
This kind of ambiguity arises from the use of anaphoric entities in discourse. For example:
“The horse ran up the hill. It was very steep. It soon got tired.” Here, the anaphoric reference of
“it” in the two situations causes ambiguity.

Pragmatic Ambiguity
This kind of ambiguity refers to situations where the context of a phrase gives it multiple
interpretations. In simple words, pragmatic ambiguity arises when a statement is not specific.
For example, the sentence “I like you too” can have multiple interpretations: I like you (just like
you like me), or I like you (just like someone else does).

Natural Language Generation


Natural Language Generation (NLG) is the process of producing meaningful phrases, sentences
and paragraphs from an internal representation. It is a part of Natural Language Processing and
happens in four phases: identifying the goals, planning how the goals may be achieved by
evaluating the situation and the available communicative resources, and realizing the plans as
text [Figure 3]. It is the opposite of understanding.

Figure 3. Components of NLG

Phases of NLP
NLP has the following five phases:

1. Lexical and Morphological Analysis

The first phase of NLP is lexical analysis. This phase scans the source text as a stream of
characters and converts it into meaningful lexemes. It divides the whole text into paragraphs,
sentences, and words.
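
As a minimal sketch of this phase, the snippet below uses NLTK to divide raw text into
sentences and words (the example text is made up):

import nltk
nltk.download("punkt", quiet=True)  # sentence/word tokenizer models

text = "Agra is a city in India. The Taj Mahal is located there."
sentences = nltk.sent_tokenize(text)                # split text into sentences
words = [nltk.word_tokenize(s) for s in sentences]  # split each sentence into words
print(sentences)
print(words)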

2. Syntactic Analysis (Parsing)

Syntactic analysis is used to check grammar and word arrangement, and it shows the
relationships among the words.

Example: Agra goes to the Poonam

In the real world, “Agra goes to the Poonam” does not make any sense, so this sentence is
rejected by the syntactic analyzer.

3. Semantic Analysis

Semantic analysis is concerned with meaning representation. It mainly focuses on the literal
meaning of words, phrases, and sentences.
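
For a small illustration of literal word meaning, the sketch below lists candidate senses of the
word "bank" from WordNet via NLTK (the word choice is ours, not from the source):

import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet

# Each synset is one candidate literal sense of the word.
for syn in wordnet.synsets("bank")[:3]:
    print(syn.name(), "-", syn.definition())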

4. Discourse Integration

Discourse integration depends upon the sentences that precede a given sentence and also
invokes the meaning of the sentences that follow it.

5. Pragmatic Analysis

Pragmatic analysis is the fifth and last phase of NLP. It helps you discover the intended effect
by applying a set of rules that characterize cooperative dialogues.

For example: "Open the door" is interpreted as a request instead of an order.

NLP APIs

Natural Language Processing APIs allow developers to integrate human-to-machine
communication and complete several useful tasks, such as speech recognition, chatbots,
spelling correction, sentiment analysis, etc.

A list of NLP APIs is given below:

 IBM Watson API

IBM Watson API combines different sophisticated machine learning techniques to
enable developers to classify text into various custom categories. It supports multiple
languages, such as English, French, Spanish, German, Chinese, etc. With the help of the
IBM Watson API, you can extract insights from texts, add automation to workflows,
enhance search, and understand sentiment. The main advantage of this API is that it is
easy to use.

Pricing: It offers a free 30-day trial IBM Cloud account. You can also opt for its
paid plans.

 Chatbot API

Chatbot API allows you to create intelligent chatbots for any service. It supports Unicode
characters, text classification, multiple languages, etc. It is very easy to use and helps you create
a chatbot for your web applications.
Pricing: Chatbot API is free for 150 requests per month. You can also opt for its paid version,
which ranges from $100 to $5,000 per month.

 Speech to text API

Speech to Text API is used to convert speech to text.

Pricing: Speech to Text API is free for converting 60 minutes per month. Its paid version ranges
from $500 to $1,500 per month.

 Sentiment Analysis API

Sentiment Analysis API, also called 'opinion mining', is used to identify the tone of a user
(positive, negative, or neutral).
Pricing: Sentiment Analysis API is free for fewer than 500 requests per month. Its paid
version ranges from $19 to $99 per month.

 Translation API by SYSTRAN

The Translation API by SYSTRAN is used to translate text from a source language to a target
language. You can use its NLP APIs for language detection, text segmentation, named entity
recognition, tokenization, and many other tasks.
Pricing: This API is available for free, but commercial users need to use its paid
version.

 Text Analysis API by AYLIEN


Text Analysis API by AYLIEN is used to derive meaning and insights from textual content.
It is easy to use.
Pricing: This API is available free for 1,000 hits per day. You can also use its paid
version, which ranges from $199 to $1,399 per month.

NLP Libraries

Scikit-learn: It provides a wide range of algorithms for building machine learning models in
Python.
Natural language Toolkit (NLTK): NLTK is a complete toolkit for all NLP techniques.
Pattern: It is a web mining module for NLP and machine learning.
TextBlob: It provides an easy interface for learning basic NLP tasks like sentiment analysis,
noun phrase extraction, or POS tagging (see the sketch after this list).
Quepy: Quepy is used to transform natural language questions into queries in a database query
language.
SpaCy: SpaCy is an open-source NLP library used for data extraction, data analysis,
sentiment analysis, and text summarization.
Gensim: Gensim works with large datasets and processes data streams.
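
As a minimal sketch of one of these libraries, the snippet below uses TextBlob for sentiment
analysis and noun phrase extraction; it assumes the TextBlob corpora have been installed via
"python -m textblob.download_corpora":

from textblob import TextBlob

blob = TextBlob("NLTK is a wonderful toolkit for natural language processing.")
print(blob.sentiment)     # polarity in [-1, 1] and subjectivity in [0, 1]
print(blob.noun_phrases)  # list of extracted noun phrases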

 Machine Learning (ML) and Artificial Intelligence (AI) are terms that are crucial to know
when creating machine-learning-driven solutions. The term NLP (Natural Language
Processing) is also crucial for understanding current machine learning applications created for
speech or text.

 E.g., bots with which you can converse instead of with a human.

 So let us start with a high-level separation of commonly used terms and their meanings:

 AI (Artificial intelligence) is concerned with solving tasks that are easy for humans but hard for
computers.

 ML (Machine learning) is the science of getting computers to act without being explicitly
programmed; it is basically learning through doing. ML is often regarded as a subset of AI.

 NLP (Natural language processing) is the part of machine learning that has to do with
language (usually written).

METHODOLOGY

NLP technology is important for scientific, economic, social, and cultural reasons. Computers
that can communicate with humans as humans do, including understanding context and
emotions, are a holy grail. Due to the vast number of cultures and languages, solving this
problem has proven to be hard. Communication is not only verbal; the nonverbal part is a
significant component of communication. Nonverbal communication is the nonlinguistic
transmission of information through visual, auditory, tactile, and kinesthetic (physical) channels.

NLP is experiencing rapid growth as its theories and methods are deployed in a variety of new
machine learning technologies.
More and more NLP techniques are used in products that have a serious impact on our daily
lives. A misinterpreted spoken word like ‘stop’ can have several meanings when used within an
autonomous car: maybe you just meant to warn your children rather than stop the vehicle at a
dangerous intersection. Creating a good NLP-based application using machine learning is hard.
A simple test that gives an indication of quality is the sentence “Buffalo buffalo Buffalo buffalo
buffalo buffalo Buffalo buffalo”. This sentence is grammatically correct, but many NLP
algorithms and applications cannot handle it very well. Similar sentences exist in other
languages. We humans are by nature good with complicated linguistic constructs, but many
NLP algorithms still fail.
Basic NLP functions
NLP can perform all kinds of basic text functions. A short overview:

 Text Classification (e.g. document categorization).

 Finding words with the same meaning for search.

 Estimating how much time it takes to read a text.

 Estimating how difficult a text is to read.

 Identifying the language of a text.

 Generating a summary of a text.

 Finding similar documents (see the sketch after this list).

 Identifying entities (e.g., cities, people, and locations) in a text.

 Translating a text.

 Text Generation.
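
As a sketch of one item from this list, finding similar documents, the snippet below compares
TF-IDF vectors with cosine similarity using scikit-learn (the example documents are made up):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "Natural language processing makes computers understand human language.",
    "Natural language processing helps machines read human text.",
    "The stock market closed higher today.",
]
tfidf = TfidfVectorizer().fit_transform(docs)
print(cosine_similarity(tfidf[0], tfidf[1]))  # higher: shared vocabulary
print(cosine_similarity(tfidf[0], tfidf[2]))  # near zero: unrelated topics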

Machine learning capabilities

Machine learning capabilities have proven to be very useful for improving typical NLP-based
use cases. This is due to the fact that text in digital format is widely available and machine
learning algorithms typically perform well on large amounts of data.
NLP papers and NLP software come with a typical terminology:

 Tokenizer: Splitting the text into words or phrases.

 Stemming and lemmatization: Normalizing words so that different forms map to the
canonical word with the same meaning. For example, “running” and “ran” map to “run”
(see the sketch after this list).

 Entity extraction: Identifying subjects in the text.

 Part-of-speech detection: Identifying a word as a verb, noun, participle, verb phrase, and
so on.

 Sentence boundary detection: Detecting complete sentences within paragraphs of text.
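
Below is a minimal sketch of stemming versus lemmatization with NLTK, assuming the
WordNet data has been downloaded:

import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer
nltk.download("wordnet", quiet=True)

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
for word in ["running", "ran"]:
    print(word, stemmer.stem(word), lemmatizer.lemmatize(word, pos="v"))
# The stemmer clips "running" to "run" but leaves the irregular "ran" as-is;
# the lemmatizer maps both forms to the canonical verb "run".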

To create NLP-enabled applications, you need to set up a ‘pipeline’ of the various software
building blocks. For each step in an NLP development pipeline, a different FOSS building block
may be needed. The figure below shows a typical NLP pipeline; a code sketch follows.
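
As a minimal code sketch of such a pipeline, the snippet below runs spaCy's standard pipeline,
covering tokenization, part-of-speech tagging, sentence boundary detection, and entity
extraction in one pass; it assumes the small English model has been installed via
"python -m spacy download en_core_web_sm":

import spacy

nlp = spacy.load("en_core_web_sm")  # tokenizer, tagger, parser, and NER in one pipeline
doc = nlp("Apple is opening an office in Paris. Tim Cook announced it on Monday.")

for sent in doc.sents:                                   # sentence boundary detection
    print([(token.text, token.pos_) for token in sent])  # tokens with POS tags
for ent in doc.ents:                                     # entity extraction
    print(ent.text, ent.label_)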

The figure below shows a typical NLP architecture for answering common user questions. A
lot of bot systems or FAQ-answering systems are created with no machine learning algorithms
at all: often simple keyword extraction is done and a simple match in a database is performed.
More state-of-the-art NLP systems make intensive use of machine learning algorithms. The
general rule is: if an application should be user-friendly and add value, then learning algorithms
are needed, since it is no longer possible to program the output for every given input.

NLP Business challenges


When using NLP technology to extract information and insight from text, the starting point is
typically the raw documents stored on websites, along with unstructured and structured
documents. The fact that documents are stored in a variety of formats, like PDF, MS Word, and
TIFF, makes the time needed before text can be sent to an algorithm long and often manually
intensive. Even the most advanced web-scraping techniques (software to store the raw text of
websites) are manually intensive. Unstructured text must be structured first before it can be
used for NLP-driven applications.

AllenNLP is an open-source NLP research library, built on PyTorch, for developing
state-of-the-art deep learning models on a wide variety of linguistic tasks. AllenNLP makes it
easy to design and evaluate new deep learning models for nearly any NLP problem, along with
the infrastructure to easily run them in the cloud or on your laptop.

AllenNLP was designed with the following principles:


• Hyper-modular and lightweight. Use the parts which you like seamlessly with PyTorch.
• Extensively tested and easy to extend. Test coverage is above 90% and the example models
provide a template for contributions.
• Take padding and masking seriously, making it easy to implement correct models without the
pain.
• Experiment friendly. Run reproducible experiments from a JSON specification with
comprehensive logging.
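
A hedged usage sketch: AllenNLP exposes pretrained models through its Predictor interface.
The archive path below is a placeholder, and the "sentence" keyword assumes a predictor that
accepts raw sentence input; consult the AllenNLP documentation for actual model archives.

from allennlp.predictors.predictor import Predictor

# Placeholder path: substitute the URL or file path of a published model archive.
predictor = Predictor.from_path("path/to/model.tar.gz")
result = predictor.predict(sentence="AllenNLP makes NLP research easier.")  # assumed input key
print(result)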
