
Unit 1) Introduction to NLP
Unit 2) Study of Grammar and Semantics
Unit 3) Machine Translation
Unit 4) Lexical Functional Grammar (LFG) and Indian Languages

Book: Natural Language Processing by Akshar Bharati

Tool used: NLTK.


NLP -- Natural Language Processing (NLP) is an automated way to understand or
analyze natural languages and to extract the required information from such
data by applying machine learning algorithms.

Components of NLP
Entity extraction -- extracting entities such as persons, organisations,
geographies, events, etc.
Syntactic analysis -- checking the proper ordering of words (whereas semantic
analysis checks whether the sentence forms a proper meaning).
Pragmatic analysis -- extracting the intended information from text by
interpreting it in context.

tf-idf -- term frequency-inverse document frequency: a numerical statistic that
shows how important a word is to a document within a set of documents.
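
To make the tf-idf definition concrete, here is a minimal sketch in plain Python. It assumes the simplest common variant (tf = word count / document length, idf = log(N / document frequency)); libraries often use smoothed variants, so actual numbers may differ. The function name `tf_idf` and the sample documents are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {word: score} dict per document.

    tf  = count of the word in the document / number of words in the document
    idf = log(number of documents / number of documents containing the word)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            word: (count / len(tokens)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores

docs = ["the cat sat", "the dog sat", "the cat ran home"]
scores = tf_idf(docs)
print(scores[2])
```

A word such as "the" that occurs in every document gets a score of 0, while a word that occurs in only one document ("home") scores highest in that document.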

POS tagger -- a piece of software that reads text and assigns a part of speech to
each word.
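
The idea of a tagger can be sketched with a toy rule-based version. This is only an illustration: the small lexicon, the suffix rules and the name `toy_pos_tag` are assumptions, and a real tagger (for example `nltk.pos_tag` in NLTK, which needs its model data downloaded first) is trained on annotated text and is far more accurate.

```python
# Toy lexicon and suffix rules; both are illustrative assumptions,
# not a trained model.
LEXICON = {"the": "DT", "a": "DT", "is": "VBZ", "on": "IN", "in": "IN"}
SUFFIX_RULES = [("ing", "VBG"), ("ed", "VBD"), ("ly", "RB"), ("s", "NNS")]

def toy_pos_tag(tokens):
    """Assign a tag to each token: lexicon lookup first, then suffix rules."""
    tagged = []
    for token in tokens:
        word = token.lower()
        if word in LEXICON:
            tag = LEXICON[word]
        else:
            # First matching suffix wins; default to a singular noun.
            tag = next((t for suffix, t in SUFFIX_RULES if word.endswith(suffix)), "NN")
        tagged.append((token, tag))
    return tagged

tagged = toy_pos_tag("The dogs barked loudly".split())
print(tagged)
```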

Lemmatization/stemming -- the main aim is to identify and return the root form of
each word, e.g. boy's = boy, cars = car.
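
The two examples above can be reproduced with a very small suffix-stripping sketch. The rules and the name `simple_stem` are assumptions chosen only to cover these cases; real stemmers (such as the Porter stemmer) and lemmatizers use many more rules or a dictionary.

```python
def simple_stem(word):
    """Strip a possessive ending and a plain plural ending from a word."""
    word = word.lower()
    # Drop a possessive ending first: "boy's" -> "boy".
    if word.endswith("'s"):
        word = word[:-2]
    # Plural in "-ies": "cities" -> "city".
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    # Plain plural "-s", but keep words such as "glass" or "bus".
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word

for w in ["boy's", "cars", "cities", "glass"]:
    print(w, "->", simple_stem(w))
```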

Tokenization is the process of splitting a string or text into a list of tokens,
e.g. a word is a token of a sentence, and similarly a sentence is a token of a
paragraph (NLTK provides sent_tokenize for splitting text into sentences).
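
Both levels of tokenization can be sketched with regular expressions. This is a simplified stand-in, not how NLTK does it: the helper names `split_sentences` and `split_words` are made up here, and the sentence rule will break on abbreviations such as "Dr.", which NLTK's tokenizers are designed to handle.

```python
import re

def split_sentences(paragraph):
    # Split after ., ! or ? when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]

def split_words(sentence):
    # Words (optionally with an internal apostrophe) or single punctuation marks.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)

paragraph = "NLP is fun. It splits text into tokens!"
sentences = split_sentences(paragraph)
words = split_words(sentences[0])
print(sentences)
print(words)
```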

A corpus is a large and structured set of machine-readable texts that have been
produced in a natural communicative setting, e.g. the Brown corpus.
Features of a text corpus in NLP:
a. Count of the word in a document
b. Vector notation of the word
c. Part of Speech Tag
d. Basic Dependency Grammar
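
Features (a) and (b) can be illustrated with a short sketch. It assumes that "vector notation" means a simple count vector over the vocabulary; the note could equally refer to learned word embeddings, which are not shown here. The two-sentence corpus is made up.

```python
from collections import Counter

corpus = ["the cat sat on the mat", "the dog sat"]
tokenized = [doc.split() for doc in corpus]

# (a) Count of each word in each document.
counts = [Counter(tokens) for tokens in tokenized]

# (b) Vector notation: one count per vocabulary word, in a fixed order.
vocabulary = sorted({word for tokens in tokenized for word in tokens})
vectors = [[c[word] for word in vocabulary] for c in counts]

print(vocabulary)
print(vectors)
```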

Natural Language Processing interprets a request given in the form of language.

Conversational interface -- mixes voice and chat with images, videos, etc.

Semantic analysis -- the process of relating syntactic structures to their
meanings.

Masked language modelling is the process in which the model has to produce the
original output from a corrupted (masked) input. It helps the model master
downstream tasks.
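
The "corrupted input" can be sketched as follows: some tokens are replaced by a mask symbol, and the original tokens at those positions become the targets the model must predict. The `[MASK]` symbol, the masking probability and the name `mask_tokens` are assumptions borrowed from common practice; no model is trained here.

```python
import random

MASK = "[MASK]"  # assumed mask symbol

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (corrupted tokens, labels); labels are None where nothing was masked."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            corrupted.append(MASK)
            labels.append(token)   # the model must recover this token
        else:
            corrupted.append(token)
            labels.append(None)    # position is not scored
    return corrupted, labels

tokens = "the quick brown fox jumps over the lazy dog".split()
corrupted, labels = mask_tokens(tokens, mask_prob=0.3, seed=1)
print(corrupted)
print(labels)
```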

Pragmatic ambiguity arises when words or sentences have multiple interpretations
depending on the context. For example, "Do you want a cup of coffee?" can be an
informative question or a formal offer to make coffee.

Perplexity in NLP is a way to measure the extent of uncertainty a model has in
predicting some text; the lower the perplexity, the less uncertain the model.
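
As a small worked sketch, perplexity can be computed as the exponential of the average negative log-probability that a model gave to each token. The probability lists below are invented for illustration; in practice they would come from a language model.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

confident = perplexity([0.9, 0.8, 0.95])    # model is fairly sure of each token
uncertain = perplexity([0.25, 0.25, 0.25])  # like guessing among 4 equal options
print(confident, uncertain)
```

A model that gives every token probability 1/4 has a perplexity of about 4, which matches the intuition of choosing among four equally likely words at each step.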

Pragmatic analysis focuses on reinterpreting what was described in terms of what
was actually meant, deriving the various aspects of language that require
real-world knowledge.

An N-gram in NLP is simply a sequence of n words. Counting which sequences appear
more frequently helps in predicting the next word.
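
Next-word prediction from n-gram counts can be sketched with bigrams (n = 2). The sample sentence and the helper names `ngrams` and `predict_next` are made up for illustration; a real model would be built from a large corpus and would usually apply smoothing.

```python
from collections import Counter, defaultdict

def ngrams(tokens, n):
    """Return all sequences of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "i like tea and i like coffee and i like tea".split()
bigram_counts = Counter(ngrams(tokens, 2))

# For each word, count which words follow it.
followers = defaultdict(Counter)
for (first, second), count in bigram_counts.items():
    followers[first][second] += count

def predict_next(word):
    """Return the most frequent follower of word, or None if word was never seen."""
    options = followers.get(word)
    return options.most_common(1)[0][0] if options else None

print(predict_next("like"))
```

Here "like" is followed by "tea" twice and by "coffee" once, so "tea" is predicted.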

