
CHAPTER: 1- OVERVIEW OF NATURAL LANGUAGE PROCESSING

1.1 INTRODUCTION
Human beings across the world speak a variety of languages, but computers and smart machines
understand only one language: binary, expressed as 1s and 0s. To interact with computers, we
therefore need a system that converts human language into a form the computer can process.
Such systems and processes are known as Natural Language Processing (NLP). In more technical
terms, “Natural Language Processing is the subdivision of Artificial Intelligence that deals
with analyzing, understanding and generating the languages that human beings commonly speak,
in order to communicate with smart machines.” Fig 1.1 shows the relationship of Natural Language
Processing with Artificial Intelligence and Machine Learning.

Fig 1.1 Relationship of NLP with AI and Machine Learning

In the 1950s, computers were first applied to natural language in efforts to mechanize
translation between Russian and English. These systems were not practical because they
required human translators to pre-edit and post-edit the Russian and English text. They were based
on the code-breaking techniques used in World War 2: individual words were looked up in a
glossary and replaced one at a time. A common account of these systems is that many
mistranslations occurred; for example, "hydraulic ram" was interpreted as "water goat".

Natural Language Processing is sometimes confused with Text Mining or Text Analysis; the table
in Fig 1.2 clarifies the difference.

Fig 1.2: Difference between NLP and Text Mining

1.2 NEED OF NATURAL LANGUAGE PROCESSING

• Text data is generated in bulk every day, so we need intelligent systems to analyze it and
translate it into an appropriate form. With NLP, tasks such as automated speech processing
and automated text processing can be carried out in very little time.
• We need Natural Language Processing for automation. Today we want machines to perform
tasks according to our spoken commands; this is done through speech recognition, which
Natural Language Processing makes possible.

1.3 COMPONENTS OF NATURAL LANGUAGE PROCESSING

There are two major components of Natural language Processing as shown in Fig 1.3.

Fig 1.3 Components of Natural language processing

1.4 NATURAL LANGUAGE UNDERSTANDING

Deriving relevant information from text, or understanding its meaning, is the most difficult part
of NLP for a smart machine. The first step is to convert natural language into a form the machine
can handle. This is how speech recognition, or speech-to-text, systems work, and it is the initial
step in NLU. Once the information is in text form, NLU proceeds with the objective of extracting
the meaning from the text. Many speech recognition systems are built on Hidden Markov Models
(HMMs), statistical models that convert a voice signal into text by computing the most probable
sequence of sounds that was said.

HMM-based systems work by breaking the speech signal into small segments (generally 10-30
milliseconds) and matching each segment against previously recorded speech to detect the sound
spoken in it. The system then looks at the resulting series of phonemes and mathematically
determines the most likely words and sentences spoken by the user. The output of this step is
text. The second and more difficult step of NLU is the actual understanding of that text.
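
The decoding step described above, finding the most likely sequence of sounds for a series of
short segments, is commonly done with the Viterbi algorithm. The following is a minimal sketch
under stated assumptions: the three phoneme states, the segment labels ("burst", "vowel", "stop")
and every probability are invented for illustration and do not come from a real acoustic model.

```python
# Toy HMM: hidden states are phonemes, observations are labels for short
# audio segments. All numbers below are invented for illustration.
states = ["k", "ae", "t"]
start_p = {"k": 0.8, "ae": 0.1, "t": 0.1}
trans_p = {
    "k": {"k": 0.3, "ae": 0.6, "t": 0.1},
    "ae": {"k": 0.1, "ae": 0.4, "t": 0.5},
    "t": {"k": 0.1, "ae": 0.1, "t": 0.8},
}
emit_p = {
    "k": {"burst": 0.7, "vowel": 0.1, "stop": 0.2},
    "ae": {"burst": 0.1, "vowel": 0.8, "stop": 0.1},
    "t": {"burst": 0.2, "vowel": 0.1, "stop": 0.7},
}


def viterbi(observations):
    """Return (most probable state sequence, its probability)."""
    # best[s] = (probability of the best path ending in state s, that path)
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # Choose the predecessor that maximizes the path probability.
            prob, path = max(
                (best[prev][0] * trans_p[prev][s] * emit_p[s][obs], best[prev][1])
                for prev in states
            )
            new_best[s] = (prob, path + [s])
        best = new_best
    prob, path = max(best.values())
    return path, prob


path, prob = viterbi(["burst", "vowel", "vowel", "stop"])
print(path, prob)
```

A real recognizer works with continuous acoustic features and far larger models, but the
dynamic-programming idea is the same.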

Different NLP systems use various techniques, but the overall procedure is similar. First, the
computer must work out what each word is: whether it is a noun or a verb, whether it is in the
past or present tense, and so on. This is termed Part-of-Speech (POS) tagging. An NLP system also
contains a lexicon (a vocabulary) and a set of grammar rules that are programmed into it.
Advanced NLP algorithms use statistical machine learning to apply these rules to the natural
language input and find the most likely meaning of what was said. In the last step, the computer
should understand the meaning of what was spoken. This is challenging because of difficulties
such as words having several meanings (polysemy) and different words having similar meanings
(synonymy), so developers train NLU systems to apply the rules correctly.

1.5 NATURAL LANGUAGE GENERATION

NLG translates the computer's internal representation of data into text, and can go a step
further by transforming that text into speech through text-to-speech. First, the NLG system
determines which data is to be converted into text. If we ask the computer a question about the
weather, the system typically searches for the answer online, and from the search results it
decides that the temperature, wind and humidity are the things you want to know and should be
read out. Next, it organizes how it is going to say this. Using a lexicon and the rules of
grammar, the NLG system can formulate complete, meaningful sentences. Finally, if the natural
language text is to be read aloud, text-to-speech comes into action. A text-to-speech system
analyzes the text using a prosody model, which determines breaks, duration and pitch. Then, using
a speech database (prerecorded voice), the system assembles the stored sounds into a single
coherent string of speech. The components of Natural Language Generation are shown in Fig 1.4.

Fig 1.4 Natural Language Generation Components

Speaker and Generator - To generate text, we need a speaker or application, and a generator,
that is, a program that performs the generation task for us.

Components and Levels of Representation - The process of language generation includes the
following tasks.

• Content selection: Data must be gathered and selected for inclusion. Depending on how this
information is parsed into representational units, parts of a unit may be removed while
others are added by default.
• Textual organization: The data must be organized textually in accordance with the grammar.
It must be ordered appropriately both sequentially and in terms of linguistic relations.
• Linguistic resources: Linguistic resources must be chosen to support the realization of
the information.
• Realization: The selected and organized resources must be realized as actual text or
voice output.
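
The generation tasks listed above can be sketched as a small template-based pipeline for the
weather example. This is only an illustration under stated assumptions: the record fields, the
function names (`select_content`, `organize`, `realize`) and the sentence template are
hypothetical and chosen to mirror the task names.

```python
# Hypothetical raw data returned by a weather lookup.
weather_record = {
    "city": "Pune",
    "temperature_c": 31,
    "wind_kmh": 12,
    "humidity_pct": 58,
    "station_id": "X-1042",  # not relevant to the user's question
}


def select_content(record):
    """Content selection: keep only the fields the user asked about."""
    wanted = ("city", "temperature_c", "wind_kmh", "humidity_pct")
    return {key: record[key] for key in wanted}


def organize(content):
    """Textual organization: fix the order in which the facts are presented."""
    return [
        ("temperature", content["temperature_c"], "degrees Celsius"),
        ("wind speed", content["wind_kmh"], "km/h"),
        ("humidity", content["humidity_pct"], "percent"),
    ]


def realize(city, facts):
    """Realization: turn the ordered facts into one grammatical sentence."""
    phrases = [f"the {name} is {value} {unit}" for name, value, unit in facts]
    body = ", ".join(phrases[:-1]) + ", and " + phrases[-1]
    return f"In {city}, {body}."


content = select_content(weather_record)
sentence = realize(content["city"], organize(content))
print(sentence)
```

The resulting string is what a text-to-speech component would then read aloud.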

Application or Speaker – This only maintains the model of the situation. The speaker merely
initiates the procedure and does not take part in the generation of language. It stores the
history, structures the content that is potentially relevant, and maintains a representation of
what it actually knows.

CHAPTER: 2- NATURAL LANGUAGE PROCESSING
WORKING AND EVALUATION

2.1 STAGES IN NATURAL LANGUAGE PROCESSING

The five major stages of Natural Language Processing are shown in Fig 2.1.

Fig 2.1 NLP Block Diagram

Lexical Analysis − This involves identifying and analyzing the structure of words. The lexicon
of a language is the collection of words and phrases in that language. Lexical analysis divides
the whole chunk of text into paragraphs, sentences and words.
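
As a rough illustration of this stage, the sketch below splits a text into paragraphs, sentences
and words with regular expressions. The splitting rules are deliberately naive assumptions
(blank lines separate paragraphs, sentence-final punctuation ends a sentence) and the function
names are hypothetical.

```python
import re


def split_paragraphs(text):
    """Paragraphs are separated by one or more blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    """Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]


def split_words(sentence):
    """Words are runs of letters, digits or apostrophes; punctuation is dropped."""
    return re.findall(r"[A-Za-z0-9']+", sentence)


text = "The boy goes to school. He likes it!\n\nSchool starts at nine."
for paragraph in split_paragraphs(text):
    for sentence in split_sentences(paragraph):
        print(split_words(sentence))
```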

Syntactic Analysis – Also known as parsing, this stage analyzes the words in a sentence for
grammar and arranges them in a way that shows the relationships among the words. For example,
the sentence “The school goes to boy” is rejected by this analysis.

Semantic Analysis – This stage extracts the exact meaning from the given text. The text is
checked for meaningfulness by mapping syntactic structures onto objects in the task domain.
The semantic analyzer rejects phrases such as “hot cold-drink”.

Discourse Integration – The meaning of a sentence sometimes depends on the meaning of the
sentence just before it. In turn, it also influences the meaning of the sentence that
immediately follows.

Pragmatic Analysis – At this stage, what was said is re-interpreted to determine what was
actually meant. This involves deriving those aspects of language that require real-world
knowledge.

Different phases of natural language processing are shown in Fig 2.2.

Fig 2.2 Different Phases of Natural Language Processing

2.2 CHALLENGES IN NATURAL LANGUAGE PROCESSING

Ambiguity is an inseparable companion of human language, and in NLP it is a major challenge to
be overcome. Ambiguity in language can sometimes confuse humans or lead them to misunderstand,
so it is no surprise that machines also face difficulty. Ambiguity in a language arises when a
sentence, or sentences that look alike, can be given different meanings.

No well-defined solution to ambiguity exists in science and technology. Ambiguity generally
depends on the speaker of the language. From a technical point of view, any sentence in a
language with a sufficiently rich grammar can be interpreted in several different ways. However,
most native speakers recognize only the primary interpretation when hearing a phrase, while
alternative interpretations may be more apparent to non-native speakers who, cognitively
speaking, need to rewire their brains in order to learn a new language.

Some ambiguities in NLP are:

• Lexical ambiguity − Ambiguity at the most basic level, that of a single word.
• Syntax-level ambiguity − A sentence can be parsed in different ways.
• Referential ambiguity − It is unclear what a pronoun refers to.

2.3 MAJOR EVALUATIONS AND TASKS

2.3.1 Syntax

(i). Morphological Segmentation

This task separates words into individual morphemes and identifies the class of each morpheme.
Its difficulty depends largely on the complexity of the morphology, that is, the word structure,
of the language under consideration. English has fairly simple morphology, especially
inflectional morphology, so it is often possible to skip this task entirely and simply model all
possible forms of a word (such as "open, opens, opened, opening") as separate words. In Turkish
or in Meitei, a highly agglutinative Indian language, however, such an approach is not feasible,
because each dictionary entry has thousands of possible word forms.
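
For English inflectional morphology, a very simple segmenter can be sketched as suffix stripping
against a stem lexicon. The suffix list, the stem set and the `segment` function below are
illustrative assumptions, not a complete treatment of English morphology.

```python
SUFFIXES = ["ing", "ed", "s"]     # small hand-picked list of inflectional suffixes
STEMS = {"open", "walk", "talk"}  # hypothetical stem lexicon


def segment(word):
    """Return (stem, suffix) when the word is a known stem plus a listed suffix."""
    if word in STEMS:
        return (word, "")
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem in STEMS:
            return (stem, suffix)
    return (word, "")  # unknown word: leave it unsegmented


for w in ["open", "opens", "opened", "opening"]:
    print(w, "->", segment(w))
```

An approach like this does not scale to agglutinative languages, where a word may carry a long
chain of morphemes.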

(ii). Part of speech Tagging

The part of speech of each word in a given sentence is to be determined. Many words, especially
common ones, can serve as several parts of speech. For example, "book" can be a noun ("the
book on the table") or a verb ("to book a flight"); "set" can be an adjective, a noun or a verb;
and "out" can be any of at least five different parts of speech. Some languages have more such
ambiguity than others. Languages with little inflectional morphology, such as English, are
particularly susceptible to it; Chinese is another language that is prone to such ambiguity.
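
The "book" example can be illustrated with a tiny rule-based tagger that uses the tag of the
previous word to choose between noun and verb. This is a sketch only: the lexicon, the tag names
and the single disambiguation rule are assumptions made for the example, and the rule handles
only noun/verb ambiguity.

```python
# Hypothetical lexicon: each word maps to the tags it may take.
LEXICON = {
    "the": ["DET"], "a": ["DET"], "to": ["TO"], "i": ["PRON"],
    "book": ["NOUN", "VERB"], "flight": ["NOUN"], "table": ["NOUN"],
    "on": ["ADP"], "read": ["VERB"],
}


def tag(sentence):
    """Tag each word, using the previous tag to resolve noun/verb ambiguity."""
    tagged = []
    previous = None
    for word in sentence.lower().split():
        candidates = LEXICON.get(word, ["NOUN"])  # unknown words default to NOUN
        if len(candidates) == 1:
            choice = candidates[0]
        elif previous in ("TO", "PRON"):
            choice = "VERB"  # e.g. "to book", "I book"
        else:
            choice = "NOUN"  # e.g. "the book", "a book"
        tagged.append((word, choice))
        previous = choice
    return tagged


print(tag("the book on the table"))
print(tag("to book a flight"))
```

Practical taggers learn such contextual preferences statistically from annotated text rather
than from hand-written rules.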

(iii). Parsing

Parsing is the grammatical analysis of a given sentence. The grammar of natural languages is
ambiguous, and a typical sentence may have multiple possible analyses. Perhaps surprisingly, a
typical sentence may have a huge number of potential parses, most of which seem absurd to a
human. There are two major types of parsing: Dependency Parsing and Constituency Parsing.
Dependency Parsing is concerned with the relationships between the words in a sentence, while
Constituency Parsing is concerned with building a parse tree, for example by using a
Probabilistic Context-Free Grammar (PCFG).
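
Constituency parsing with a PCFG is often implemented with the CYK algorithm. The sketch below
computes only the probability of the best parse, not the tree itself, and assumes a toy grammar
in Chomsky normal form; the rules, the probabilities and the name `cyk` are invented for
illustration.

```python
from collections import defaultdict

# Toy PCFG in Chomsky normal form; rules and probabilities are invented.
BINARY = {  # (left child, right child) -> [(parent, probability)]
    ("NP", "VP"): [("S", 1.0)],
    ("Det", "N"): [("NP", 1.0)],
    ("V", "PP"): [("VP", 1.0)],
    ("P", "NP"): [("PP", 1.0)],
}
LEXICAL = {  # word -> [(tag, probability)]
    "the": [("Det", 1.0)],
    "boy": [("N", 0.5)],
    "school": [("N", 0.5)],
    "goes": [("V", 1.0)],
    "to": [("P", 1.0)],
}


def cyk(words):
    """Return the probability of the best parse rooted in S (0.0 if none exists)."""
    n = len(words)
    # chart[i][j] maps a nonterminal to its best probability over words[i:j]
    chart = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for tag, p in LEXICAL.get(word, []):
            chart[i][i + 1][tag] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for left, pl in list(chart[i][k].items()):
                    for right, pr in list(chart[k][j].items()):
                        for parent, p in BINARY.get((left, right), []):
                            candidate = p * pl * pr
                            if candidate > chart[i][j][parent]:
                                chart[i][j][parent] = candidate
    return chart[0][n].get("S", 0.0)


print(cyk("the boy goes to the school".split()))
print(cyk("the school goes to boy".split()))
```

Under this toy grammar the second sentence receives probability zero because "boy" without a
determiner cannot form a noun phrase.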

(iv). Sentence Breaking or Sentence boundary disambiguation

Given a piece of text, find the sentence boundaries. These boundaries are frequently marked
by punctuation marks such as periods, but the same characters can serve other purposes, such as
marking abbreviations.
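
A minimal sketch of this task, assuming a small hand-made abbreviation list: a token that ends
in sentence-final punctuation closes a sentence unless it is a known abbreviation. The list and
the function name `split_on_boundaries` are hypothetical.

```python
import re

ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e.", "etc."}  # illustrative list


def split_on_boundaries(text):
    """Split at tokens ending in '.', '!' or '?' that are not known abbreviations."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        ends_sentence = re.search(r"[.!?]$", token) is not None
        if ends_sentence and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # keep trailing text that has no final punctuation
        sentences.append(" ".join(current))
    return sentences


sample = "Dr. Rao met Mr. Khan. They talked about NLP, e.g. parsing. Was it useful?"
for s in split_on_boundaries(sample):
    print(s)
```

A fixed list cannot cover every case (an abbreviation may also end a sentence), which is why
this is treated as a disambiguation problem.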

(v). Word segmentation (tokenization)

Separate a chunk of continuous text into distinct words. For a language such as English this is
fairly trivial, since words are generally separated by spaces. Languages such as Chinese and
Japanese do not mark word boundaries in this way, and in those languages text segmentation is a
significant task.
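
One classic baseline for languages without spaces is forward maximum matching: at each position,
take the longest dictionary word that fits. The sketch below assumes a tiny hypothetical
dictionary; real systems rely on large lexicons or statistical models.

```python
# Hypothetical dictionary; a real system would load a large lexicon.
DICTIONARY = {"自然", "语言", "自然语言", "处理", "研究"}
MAX_WORD_LEN = max(len(word) for word in DICTIONARY)


def max_match(text):
    """Greedy left-to-right segmentation that prefers the longest dictionary word."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for length in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i : i + length]
            if candidate in DICTIONARY or length == 1:
                words.append(candidate)
                i += length
                break
    return words


print(max_match("自然语言处理"))
```

Because the method is greedy, it can choose a wrong split when a shorter word would have led to
a better overall segmentation.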

(vi). Terminology Extraction

The objective is to automatically extract relevant terms from a given corpus.

2.3.2 Semantics

(i). Lexical semantics

This task determines the computational meaning of individual words in context.

(ii). Machine translation

Automatically translate text from one human language into another. This is one of the most
difficult problems and belongs to a class of problems known as "AI-complete", meaning that
solving it properly requires all the different types of knowledge that humans possess (grammar,
real-world facts, etc.).

(iii). Named entity recognition (NER)

Given a stream of text, determine which items in the text refer to proper names, such as people
or places, and what the type of each name is (e.g. person, place, thing). Note that although
capitalization can help in recognizing named entities in languages like English, it cannot
help in determining the type of a named entity, and in any case it is often inaccurate.

(vi). Optical character recognition (OCR)

Given an image that contains printed text, identify that text.

(vii). Question answering

Given a question in natural human language, find the answer. Typical questions have a
specific correct answer (such as "What is the capital of India?"), but open-ended questions are
also possible (such as "What is the meaning of humanity?"), as are even more complex questions.

(viii). Textual entailment Recognition

Given two text fragments, determine whether one being true entails the other, entails the
other's negation, or allows the other to be either true or false.

(ix). Extraction of Relation

Given a chunk of text, determine the relationships among named entities, for example who is the
brother of whom.

(x). Sentiment analysis

Extract subjective information, usually from a collection of documents, often using online
reviews to determine the "polarity" of opinions about specific entities. It is particularly
useful for identifying trends in public opinion on social media, for marketing purposes.
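
A very simple way to estimate polarity is to sum scores from a sentiment lexicon. The sketch
below is an illustration under stated assumptions: the word scores, the negator list and the
rule that a negator flips only the next word are all invented for the example.

```python
import re

# Tiny hand-made polarity lexicon; real systems use far larger resources.
POLARITY = {"good": 1, "great": 2, "love": 2, "bad": -1, "poor": -1, "terrible": -2}
NEGATORS = {"not", "never", "no"}


def polarity(review):
    """Sum word polarities; a negator flips the sign of the very next word only."""
    score = 0
    negate = False
    for word in re.findall(r"[a-z']+", review.lower()):
        if word in NEGATORS:
            negate = True
            continue
        value = POLARITY.get(word, 0)
        score += -value if negate else value
        negate = False
    return score


for review in ["Great phone, I love the camera",
               "The battery is not good",
               "Terrible screen and poor sound"]:
    print(polarity(review), review)
```

A positive total suggests a favorable review and a negative total an unfavorable one; learned
classifiers usually handle context better than a fixed lexicon.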

(xi). Topic segmentation and recognition

Given a chunk of text, divide it into segments, each devoted to a particular topic, and
identify the topic of each segment.

(xii). Word sense disambiguation

Many words have more than one meaning; the aim is to select the meaning that makes the most
sense in context. For this problem we are typically given a list of words and their associated
word senses, for example from a dictionary or an online resource.
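
One well-known dictionary-based approach is the simplified Lesk algorithm, which picks the sense
whose definition shares the most words with the surrounding sentence. The sketch below assumes a
tiny hypothetical sense inventory for the word "bank"; the glosses and stopword list are
invented for illustration.

```python
# Hypothetical sense inventory: sense label -> short gloss.
SENSES = {
    "bank": {
        "bank/finance": "an institution that accepts deposits and lends money",
        "bank/river": "sloping land beside a river or lake",
    }
}
STOPWORDS = {"a", "an", "the", "of", "or", "and", "that", "to", "on", "in", "i", "my"}


def content_words(text):
    """Lowercased words of the text with stopwords removed."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def lesk(word, sentence):
    """Pick the sense whose gloss shares the most content words with the sentence."""
    context = content_words(sentence)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(lesk("bank", "I went to the bank to deposit money"))
print(lesk("bank", "we sat on the bank of the river"))
```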

2.3.3 Discourse

(i). Automatic summarization

Produce a readable summary of a given text. This is often used to summarize text of a known
type, for example articles in the business section of a magazine.

(ii). Coreference resolution

Given a sentence or a larger group of sentences, determine which words ("mentions") refer to the
same objects ("entities"). Anaphora resolution is a specific example of this task, concerned
with matching pronouns to the nouns or names to which they refer. The more general task of
coreference resolution also includes identifying so-called "bridging relationships"
involving referring expressions. For example, in a sentence such as "He entered Jordan’s house
through the back door", "the back door" is a referring expression, and the bridging relationship
to be identified is the fact that the door being referred to is the back door of Jordan’s house.

(iii). Discourse analysis

This covers a number of related tasks. One task is to identify the discourse structure of
connected text, that is, the nature of the relationships between sentences (for example
elaboration, explanation, contrast). Another important task is to recognize and classify the
speech acts in a chunk of text.

2.3.4 Speech

(i). Speech recognition

Given an audio clip of a person speaking, determine the textual content of the speech. This
process is the reverse of text-to-speech (TTS) and is one of the most difficult problems, of the
kind known as "AI-complete". In natural speech there are hardly any pauses between consecutive
words, so speech segmentation is a necessary subtask of speech recognition. Note also that in
most spoken languages the sounds representing successive letters blend into each other in a
process termed coarticulation, so converting the analog signal into discrete characters can be
a very difficult process.

2.4 SPEECH SYNTHESIS

Speech synthesis is the artificial production of human speech. A system used for this purpose
is called a speech computer or speech synthesizer, and it can be implemented in software or
hardware. A text-to-speech (TTS) system converts human language text into an audio signal; the
synthesized voice is generated by rearranging and concatenating pieces of recorded speech that
are stored in a database. TTS systems have a wide range of applications in today’s technically
advanced gadgets. Automation is in demand everywhere, and TTS is an important component of it.
The block diagram is shown in Fig 2.3.

Fig 2.3 Block diagram of Speech Synthesis

CHAPTER: 3- NATURAL LANGUAGE PROCESSING
APPLICATIONS

The major applications of Natural Language Processing are:

• Text Classification
• Language Modeling
• Caption Generation
• Machine Translation
• Question Answering
• Automatic Summarization
• Sentiment Analysis

3.1 Text Classification


With text classification, one can assign predefined categories to a document and organize it,
which helps you find the information you need or simplifies certain tasks. Filtering an email
as spam is a good example.
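
The spam example can be sketched with a multinomial Naive Bayes classifier, one common technique
for text classification. This is a minimal illustration: the four training messages, the labels
"spam" and "ham", and the function names are assumptions made for the example.

```python
import math
from collections import Counter

# Tiny invented training set of (text, label) pairs.
TRAINING = [
    ("win money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting schedule for monday", "ham"),
    ("project report attached", "ham"),
]


def train(examples):
    """Count words per class and documents per class."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(text.split())
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocabulary


def classify(text, model):
    """Return the class with the highest log posterior, using add-one smoothing."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in word_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for word in text.split():
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAINING)
print(classify("claim free money", model))
print(classify("monday project meeting", model))
```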

3.2 Language Modeling

Language modeling estimates the probability of a word occurring in a given text. The simplest
models look at the context of a short sequence of words, whereas larger models may work at the
level of sentences or paragraphs. Most commonly, language models operate at the level of words.
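
A bigram model is one of the simplest word-level language models: it estimates the probability
of a word from the single word before it. The sketch below uses maximum-likelihood counts over a
three-sentence corpus that is invented for illustration; the markers `<s>` and `</s>` for
sentence start and end are a common convention assumed here.

```python
from collections import Counter

corpus = [
    "the boy goes to school",
    "the boy reads a book",
    "the girl goes to school",
]

unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    unigrams.update(tokens[:-1])             # every token that has a successor
    bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs


def bigram_probability(previous, word):
    """Maximum-likelihood estimate: count(previous, word) / count(previous)."""
    if unigrams[previous] == 0:
        return 0.0
    return bigrams[(previous, word)] / unigrams[previous]


print(bigram_probability("the", "boy"))
print(bigram_probability("goes", "to"))
```

Unseen word pairs receive probability zero here; practical models apply smoothing to avoid that.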

3.3 Caption Generation

Caption generation is one of the challenging problems in artificial intelligence: given a
photograph, a human-readable textual description must be generated. It requires both image
understanding from the field of computer vision and a language model from the field of natural
language processing. It is important to consider and evaluate various ways to frame a given
predictive modeling problem, and there are indeed several ways to frame the problem of producing
captions for photographs.

3.4 Machine Translation

As the amount of information available on the web grows, the need to access it becomes
increasingly important, and the value of natural language processing applications becomes
clear. Machine translation helps us overcome the language barriers we frequently encounter
by translating technical manuals, support content or catalogs at a lower cost. The major
challenge for machine translation technology is not only to translate words, but also to
understand the meaning of sentences in order to provide a correct translation of the given text.

3.5 Question Answering

As speech understanding and voice synthesis technologies improve day by day, the need for NLP
is also growing swiftly. Question Answering (QA) is becoming more and more popular because of
applications such as OK Google, chatbots and virtual assistants. A QA application is a system
capable of correctly answering a question posed by a human. It may be used as a text-only
interface or as a spoken dialog system. While such systems offer great promise, they still
have a long way to go (take a look at this video to see what happens when two spoken dialog
systems talk to each other: https://youtu.be/WnzlbyTZsQY). This remains an important challenge,
especially for search engines, and is an important application of natural language processing
research.

3.6 Automatic summarization

Information overload is a major problem when we want to access a specific, important piece of
information from a huge knowledge base. Automatic summarization is relevant not only for
summarizing the meaning of documents and information, but also for understanding the emotional
meaning contained in the data, such as data gathered from social networking sites. It is
especially relevant when used to provide an overview of a news item or blog posts, while
avoiding redundancy from multiple sources and maximizing the diversity of the content obtained.
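
A simple extractive summarizer scores each sentence by how frequent its words are in the whole
text and keeps the top-scoring sentences. The sketch below illustrates that idea only; the
stopword list, the scoring formula and the function name `summarize` are assumptions, and it
does not address redundancy across sources.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "for", "on"}


def content_tokens(text):
    """Lowercased alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    frequency = Counter(content_tokens(text))

    def score(sentence):
        tokens = content_tokens(sentence)
        # Average word frequency, so long sentences are not favored automatically.
        return sum(frequency[w] for w in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)


document = "NLP systems process text. Text summarization shortens text. Cats sleep."
print(summarize(document))
```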

3.7 Sentiment analysis

Sentiment analysis is the technique of analyzing people's opinions and views on a particular
subject and deriving a conclusion. Companies use this technique to analyze people's opinions of
their products and, on the basis of the conclusions obtained, make the desired changes to the
product. NLP is widely used for this purpose.

3.8 FUTURE SCOPE OF NATURAL LANGUAGE PROCESSING

The market for Natural Language Processing is projected to reach $22.3 billion by the year 2025.
“Adoption of natural language processing is growing swiftly, but not because of the creation of
new NLP algorithms, as the data science in that regard is developing,” says Data Scientist
Mark Become. “NLP acceptance has increased during the past few years because of accessible,
economical computational power, the rise in digitization of all data, and the merging of NLP with
machine learning (ML) and deep learning (DL).” The growth can be seen in the graph shown in
Fig 3.1.

Fig 3.1 Future Scope of Natural Language Processing
