Natural Language Processing

DR.VMS
Sentiment analysis
 A technique used to interpret and classify emotions in subjective data. Sentiment analysis
is often performed on textual data to detect sentiment in emails, survey responses, social
media data, and beyond.
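A minimal sketch of lexicon-based sentiment detection, assuming a tiny hand-made word list. The names POSITIVE, NEGATIVE and sentiment are hypothetical and chosen only for illustration; practical systems typically rely on much larger lexicons or trained models.

```python
import re

# Hypothetical toy lexicon, for illustration only.
POSITIVE = {"good", "great", "love", "happy", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "sad", "terrible"}

def sentiment(text: str) -> str:
    """Label text as positive, negative or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this course, the examples are great"))
print(sentiment("The survey form was terrible"))
```

Counting words ignores negation and sarcasm, so this is only a starting point for the emails, survey responses and social media text mentioned above.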
Text classification
 Text classification is the process of categorizing text into organized groups. By using
Natural Language Processing (NLP), text classifiers can automatically analyze text
and then assign a set of pre-defined tags or categories based on its content.
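The idea of assigning pre-defined tags can be sketched with simple keyword overlap. The tags, keyword sets and the classify function below are assumptions made up for this example; a real classifier would usually be trained on labeled text.

```python
import re

# Hypothetical tags and keyword sets, chosen only for illustration.
TAG_KEYWORDS = {
    "billing": {"invoice", "payment", "refund", "charge"},
    "technical": {"error", "crash", "bug", "install"},
    "shipping": {"delivery", "package", "courier", "tracking"},
}

def classify(text: str) -> str:
    """Return the tag whose keywords overlap most with the text, or 'other'."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {tag: len(words & keywords) for tag, keywords in TAG_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"

print(classify("I was charged twice, please refund the payment"))
print(classify("The app shows an error after install"))
```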
NLP
 Identify, analyze, understand and generate human languages
 Apply computational techniques to natural language
• Explain computational linguistic theories
• Apply artificial Intelligence into possible contexts and makes
• Apply statistical and mathematical models to human language use and usage
NLP
 NLP is used to teach a machine how to read and understand human languages. Trained
machines can extract the relationships between words, identify the entities in a sentence
(entity recognition), and so on.
Tokenizing
Breaking up a stream of characters into words, punctuation marks, numbers and other
discrete items.
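A minimal tokenizer sketch using Python's standard re module. The pattern and the name tokenize are assumptions for this example; it treats numbers (with an optional decimal part), runs of word characters, and single punctuation marks as separate tokens.

```python
import re

# Numbers first, then words, then any single non-space, non-word character.
TOKEN_PATTERN = re.compile(r"\d+(?:\.\d+)?|\w+|[^\w\s]")

def tokenize(text: str) -> list[str]:
    """Split a character stream into words, numbers and punctuation marks."""
    return TOKEN_PATTERN.findall(text)

print(tokenize("It costs 3.50 dollars."))
# → ['It', 'costs', '3.50', 'dollars', '.']
```

Contractions, hyphenated words and abbreviations need extra rules that this sketch does not attempt.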
Parts of speech
 Noun: fish, book, house, pen, procrastination, language
 Proper noun: John, France, Barack, Goldsmiths, Python
 Verb: loves, hates, studies, sleeps, thinks, is, has
 Adjective: grumpy, sleepy, happy, bashful
 Adverb: slowly, quickly, now, here, there
 Pronoun: I, you, he, she, we, us, it, they
 Preposition: in, on, at, by, around, with, without
 Conjunction: and, but, or, unless
 Determiner: the, a, an, some, many, few, 100
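The word lists above can be turned into a simple lookup tagger. This is only a sketch under the assumption that each word has exactly one part of speech; the LEXICON subset and the tag function are hypothetical, and real taggers use context to resolve words such as "fish" that can be a noun or a verb.

```python
# A small subset of the word lists, mapped to their part of speech.
LEXICON = {
    "the": "Determiner", "a": "Determiner",
    "fish": "Noun", "book": "Noun",
    "John": "Proper noun",
    "loves": "Verb", "sleeps": "Verb",
    "grumpy": "Adjective",
    "slowly": "Adverb",
    "he": "Pronoun",
    "in": "Preposition",
    "and": "Conjunction",
}

def tag(tokens):
    """Pair each token with its part of speech, or 'Unknown' if not listed."""
    return [(token, LEXICON.get(token, "Unknown")) for token in tokens]

print(tag("John loves the grumpy fish".split()))
```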
Constituent structure
 (((the | a)(cat | dog)) | (John | Jack | Susan))(barked | slept)

 Sentence → Noun Phrase, Verb Phrase
 Noun Phrase → Determiner, Noun (Example: the dog)
 Noun Phrase → Proper Noun (Example: Jack)
 Noun Phrase → Noun Phrase, Conj, Noun Phrase (Examples: Jack and Jill, the owl and the pussycat)
 Verb Phrase → Verb, Noun Phrase (Example: saw the rabbit)
 Verb Phrase → Verb, Preposition, Noun Phrase (Examples: went up the hill, sat on the mat)
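The rewrite rules above can be sketched as a small hand-written recognizer. The word sets and function names are assumptions for this example. Two choices are mine rather than the slide's: the recursive rule Noun Phrase → Noun Phrase, Conj, Noun Phrase is handled with a loop to avoid infinite left recursion, and a bare verb is accepted as a Verb Phrase so that sentences such as "the dog barked" are covered.

```python
# Hypothetical mini-lexicon built from the examples on the slide.
DETERMINERS = {"the", "a"}
NOUNS = {"cat", "dog", "rabbit", "hill", "mat"}
PROPER_NOUNS = {"John", "Jack", "Jill", "Susan"}
VERBS = {"barked", "slept", "saw", "went", "sat"}
PREPOSITIONS = {"up", "on"}
CONJUNCTIONS = {"and"}

def parse_simple_np(tokens, i):
    """Return the index after a Proper Noun or Determiner+Noun at i, else None."""
    if i < len(tokens) and tokens[i] in PROPER_NOUNS:
        return i + 1
    if i + 1 < len(tokens) and tokens[i] in DETERMINERS and tokens[i + 1] in NOUNS:
        return i + 2
    return None

def parse_np(tokens, i):
    """Noun Phrase: a simple NP followed by any number of 'Conj simple NP'."""
    j = parse_simple_np(tokens, i)
    while j is not None and j < len(tokens) and tokens[j] in CONJUNCTIONS:
        k = parse_simple_np(tokens, j + 1)
        if k is None:
            break
        j = k
    return j

def parse_vp(tokens, i):
    """Verb Phrase: Verb, or Verb NP, or Verb Preposition NP."""
    if i >= len(tokens) or tokens[i] not in VERBS:
        return None
    i += 1
    if i == len(tokens):
        return i  # bare verb, as in "the dog barked"
    if tokens[i] in PREPOSITIONS:
        i += 1
    return parse_np(tokens, i)

def is_sentence(text):
    """Sentence → Noun Phrase, Verb Phrase, consuming every token."""
    tokens = text.split()
    j = parse_np(tokens, 0)
    return j is not None and parse_vp(tokens, j) == len(tokens)

print(is_sentence("Jack and Jill went up the hill"))
print(is_sentence("barked the dog"))
```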
Corpus
 A corpus is a collection of data selected with a descriptive or applied aim.
 A corpus must possess a common set of fundamental properties, including
representativeness, a finite size, and availability in electronic format.
The Linguistic Data Consortium
 Founded in 1992 and based at the University of Pennsylvania in the United States, this
research and development center is financed primarily by the National Science
Foundation (NSF). Its main activities consist of collecting, distributing and annotating
linguistic resources that meet the needs of research centers and American
companies working in the field of language technology. The Linguistic Data Consortium
(LDC) owns an extensive catalog of written and spoken corpora covering a fairly
large number of languages.
LFG-GPSG
 In LFG, one parses sentences and builds up functional structures; in GPSG, sentences are
parsed and translated into formulas of intensional logic. Hardly anyone knows how to
generate from f-structures or from logical formulas.
LFG
 Lexical Functional Grammar arose in the late 1970s through the collaboration of Joan
Bresnan (a linguist) and Ronald Kaplan.
 Lexical Functional Grammar emphasizes the analysis of certain phenomena in lexical and
functional terms.
LFG-Lexical Functional Grammar
 Two levels of structure: c-structure and f-structure
 C-structure (tree)
 LFG c-structures adopt the X-bar model of capturing head-dependent relations, and treat
‘functional’ elements such as Determiners, Complementizers and Inflections as co-heads
of lexical elements such as Nouns and Verbs. LFG c-structures, however, are subject to the
lexical integrity principle, which states that minimal c-structure elements are whole words,
not parts of words or empty categories.
C Structure
F structure
 F-structure (representation of grammatical functions)
 F-structures capture functional information and are sets of paired attributes and values in an
attribute-value matrix. Attributes are morpho-syntactic features (derived from lexical entries)
such as TENSE or NUMBER, or grammatical functions such as SUBJECT and OBJECT.
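An attribute-value matrix can be sketched as a nested Python dictionary. The example structure, the attribute abbreviations (SUBJ, OBJ, PRED) and the unify helper are assumptions for illustration; unification is shown here as one common way to combine partial attribute-value information, not as a full LFG implementation.

```python
# Hypothetical f-structure for "Jack saw the rabbit".
f_structure = {
    "PRED": "see<SUBJ, OBJ>",
    "TENSE": "past",
    "SUBJ": {"PRED": "Jack", "NUMBER": "sg"},
    "OBJ": {"PRED": "rabbit", "NUMBER": "sg"},
}

def unify(a, b):
    """Merge two attribute-value structures; return None if two values clash."""
    result = dict(a)
    for attribute, value in b.items():
        if attribute not in result:
            result[attribute] = value
        elif isinstance(result[attribute], dict) and isinstance(value, dict):
            merged = unify(result[attribute], value)
            if merged is None:
                return None
            result[attribute] = merged
        elif result[attribute] != value:
            return None
    return result

# Information from the verb and from the subject noun combine without conflict.
print(unify({"SUBJ": {"NUMBER": "sg"}}, {"SUBJ": {"PERSON": "3"}, "TENSE": "present"}))
# A singular subject cannot unify with a plural requirement.
print(unify({"NUMBER": "sg"}, {"NUMBER": "pl"}))
```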
F Structure
Generalized Phrase Structure Grammar
 Developed by Gazdar, Klein, Pullum and Sag (1985).
 GPSG is restricted to context-free (CF) grammars.
 CF phrase structure rules allow efficient parsing in GPSG.
 GPSG divides phrase structure rules into immediate dominance (ID) rules and linear precedence
(LP) rules.
 GPSG provides a high-level, compact representation of language.
 GPSG consists of
 ID rules, metarules, LP rules, feature cooccurrence restrictions (FCRs), and feature specification
defaults (FSDs).
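The split between ID and LP rules can be sketched as follows: an ID rule states only which daughters a category has, and LP rules filter the possible orderings. The function linearize and the sample rules are assumptions for this example, and the sketch assumes each daughter category appears at most once in a rule.

```python
from itertools import permutations

def linearize(daughters, lp_rules):
    """Return every ordering of the daughters that respects all LP rules.

    lp_rules is a set of (a, b) pairs meaning "a must precede b".
    """
    orders = []
    for order in permutations(daughters):
        respects_lp = all(
            order.index(a) < order.index(b)
            for a, b in lp_rules
            if a in order and b in order
        )
        if respects_lp:
            orders.append(order)
    return orders

# ID rule: VP → V, NP, PP (unordered). LP rules: V precedes NP and PP.
print(linearize(["V", "NP", "PP"], {("V", "NP"), ("V", "PP")}))
# → [('V', 'NP', 'PP'), ('V', 'PP', 'NP')]
```

One ID rule plus a few LP rules stands in for several ordered phrase structure rules, which is one source of the compactness noted above.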
GPSG categorisation
Feature   Domain                     Meaning
aux       {plus, minus}              modals, auxiliaries
case      {nom, acc, dat}            nominative, accusative, dative
compl     {nil, 'zu'}                complementizer
dass      {plus, minus}              subordinate clause starting with 'dass'
decl      {strong, mixed, weak}      adjective declension
gen       {masc, fem, ntr}           gender
inf       SET OF VERBS               infinitive of a verb
num       {sg, pl}                   number
pers      {1, 2, 3}                  person
prefix    SET OF PREFIXES ∪ {nil}    prefix agreement: verb to sepref; nil: no prefix allowed
prfst     {sep, att, insep, nopref}  prefix status: separated, attached, inseparable, no prefix
pron      {plus, minus}              personal pronoun
slash     SET OF CATEGORIES          slash feature for gap handling
subcat    {1, ..., n}                subcategorization of verb
top       {plus, minus}              topicalized (fronted)
vform     {bse, fin, psp, pas}       verb form: bare infinitive, finite, past participle, passive
vpos      {first, end}               verb-initial or verb-final
Online parsing
 https://www.link.cs.cmu.edu/cgi-bin/link/construct-page-4.cgi#submit
 https://demo.allennlp.org/dependency-parsing/MjYwODE5Ng==
Pronunciation
 Phonology and phonetics are concerned with pronunciation.
 Pronunciation of characters in isolation and in combination
 Both regular and irregular pronunciations need consideration.
 Some words have the same pronunciation but different meanings, such as "weak" and
"week". A computer cannot tell the two apart from their sound alone.
Morphology
 The structure of words in their written (graphemic) form and spoken (phonemic) form. It has two
types, namely inflection and derivation.
 Inflection:
 It is related to the grammatical function of words of the same part of speech,
 e.g. the paradigm of the verb play:
 play, plays, played, playing
 Derivation:
 It is related to the production of new words of different parts of speech,
 e.g. nation (a noun)
 national (an adjective)
 nationalize (a verb)
Morphological Analyzer
 A morphological analyzer can extract the base forms of words from documents supplied to a
computer.
 Applications achieved in this respect include:
 a: hyphenation (segmenting words into their morphs),
 b: spelling correction,
 c: stemming, which reduces related words to a common stem where possible. The difficulty for
such programs is that the input they must cover is very broad. Other
applications are parsing and generating natural language utterances in written or spoken
form, and machine translation (Trost, 2006).
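Stemming can be sketched as naive suffix stripping. The suffix list, the minimum stem length of three letters and the function stem are assumptions for this example; established stemmers apply many more rules and exceptions.

```python
# Longest suffixes first, so "ing" is tried before "s".
SUFFIXES = ["ing", "ed", "s"]

def stem(word: str) -> str:
    """Strip one inflectional suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([stem(w) for w in ["play", "plays", "played", "playing"]])
# → ['play', 'play', 'play', 'play']
```

The length check keeps short words such as "sing" intact, but the sketch still mishandles irregular forms, which illustrates why broad input is hard for such programs.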
Syntax
 Syntax is concerned with the structure of sentences.
 Syntax analysis checks whether text conforms to the rules of a formal
grammar.
 Sometimes the word order of a structure makes a sentence ambiguous:
 E.g. I saw her with a telescope. (Either the speaker used a telescope, or she had one.)
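The ambiguity of the telescope sentence can be made concrete by counting its parse trees. The sketch below is a CYK-style chart parser over a small grammar that I have assumed for this example (BINARY_RULES, LEXICON and count_parses are hypothetical names); it reports two analyses, one attaching "with a telescope" to the verb phrase and one attaching it to "her".

```python
from collections import defaultdict

# Assumed toy grammar: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # the seeing was done with a telescope
    ("NP", "NP", "PP"),   # she is the one with a telescope
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
LEXICON = {"I": "NP", "saw": "V", "her": "NP", "with": "P", "a": "Det", "telescope": "N"}

def count_parses(words, start="S"):
    """Count the distinct parse trees of words under the toy grammar."""
    n = len(words)
    # chart[(i, j)][X] = number of ways category X derives words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        chart[(i, i + 1)][LEXICON[word]] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    chart[(i, j)][parent] += chart[(i, k)][left] * chart[(k, j)][right]
    return chart[(0, n)][start]

print(count_parses("I saw her with a telescope".split()))
# → 2
```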
Semantics
 Semantics deals with the meanings of words, phrases and sentences.
 A single word may have several meanings,
 e.g. chip, well, covers.
 “Hot ice-cream” would be rejected by a semantic analyzer because the combination is improbable.
Pragmatics
 Pragmatics deals with the meaning of an utterance depending on its context.
 Interpretation plays a crucial role in understanding the meaning.
 E.g. "I am waiting"
 can be interpreted as:
 a. an ordinary fact,
 b. a promise, or
 c. a threat.
