Published by Rahul Gupta on Dec 16, 2011
Copyright: Attribution Non-commercial

Chandra Prakash LPU

Speech is the representation of the mind and writing is the representation of speech

 

Language is meant for communicating about the world. By studying language, we can come to understand more about the world. If we can succeed at building computational models of language, we will have a powerful tool for communicating about the world. We look at how we can exploit knowledge about the world, in combination with linguistic facts, to build computational natural language systems.


The problem: English sentences are incomplete descriptions of the information that they are intended to convey. "Some dogs are outside" is incomplete; it can mean:
- Some dogs are on the lawn.
- Three dogs are on the lawn.
- Moti, Hira & Tomy are on the lawn.

"I called Riya to ask her to the movies; she said she'd love to go" leaves unstated that:
- She was home when I called.
- She answered the phone.
- I actually asked her.

The good side: Language allows speakers to be as vague or as precise as they like. It also allows speakers to leave out things that the hearers already know.


The problem: The same expression means different things in different contexts.
- Where's the water? (When you are thirsty, it must be potable.)
- Where's the water? (Dealing with a leaky roof, it can be filthy.)

The good side: Language lets us communicate about an infinite world using a finite number of symbols.

The problem: There are lots of ways to say the same thing:
- Mary was born on October 11.
- Mary's birthday is October 11.

The good side: When you know a lot, facts imply each other. Language is intended to be used by agents who know a lot.

The NLP problem can be divided into two tasks:
- Processing written text, using lexical, syntactic and semantic knowledge of the language as well as the required real-world information.
- Processing spoken language, using all the information needed above plus additional knowledge about phonology, as well as enough added information to handle the further ambiguities that arise in speech.

NLP (Natural Language Processing) includes:
- Understanding: the process of mapping from an input form into a more immediately useful form.
- Generation
- Multilingual translation

STEPS IN NLP

- Morphological Analysis: Individual words are analyzed into their components, and non-word tokens such as punctuation are separated from the words.
- Syntactic Analysis: Linear sequences of words are transformed into structures that show how the words relate to each other, based on the language rules applied by the analyzer. A sequence that violates those rules, such as "Cat this is.", may be rejected.
- Semantic Analysis: The structures created by the syntactic analyzer are assigned meanings; a mapping is made between the syntactic structures and objects in the task domain. Structures for which no such mapping is possible may be rejected.

STEPS IN NLP (CONT…)

- Discourse Integration: The meaning of an individual sentence may depend on the sentences that precede it and may influence the meanings of the sentences that follow it. ("He is a nice guy." "John wanted it.")
- Pragmatic Analysis: The structure representing what was said is reinterpreted to determine what was actually meant. ("Do you know what time it is?")
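The five phases above can be sketched as a simple pipeline. This is an illustrative sketch only: the phase functions and their return values are hypothetical placeholders, not a real implementation of any of the analyses.

```python
import re

def morphological_analysis(text):
    # Separate words and non-word tokens such as punctuation.
    return re.findall(r"\w+|[^\w\s]", text)

def syntactic_analysis(tokens):
    # A real analyzer would build a parse tree; here we just wrap the tokens.
    return ("S", tokens)

def semantic_analysis(tree):
    # Map the structure to objects in the task domain (placeholder).
    return {"meaning_of": tree}

def discourse_integration(meaning, context):
    # Interpret the sentence relative to the preceding discourse.
    return {**meaning, "context": context}

def pragmatic_analysis(interpretation):
    # Decide what was actually meant / what to do (placeholder).
    return interpretation

tokens = morphological_analysis("Do you know what time it is?")
print(tokens)  # ['Do', 'you', 'know', 'what', 'time', 'it', 'is', '?']
```

The point of the sketch is only the ordering of the phases; as the next slide notes, real systems often interleave or collapse them.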

glass jar. Boundaries between these five phases is not so clear The phases are sometime performed in sequence. peanut butter. jar peanut butter. glass jar peanut. butter. and they are sometime performed all at once. Is the glass jar peanut butter How to form 2 NP out of 4 nouns  Is the x y? glass.    .

MORPHOLOGICAL ANALYSIS

Suppose we have an English interface to an operating system and the following sentence is typed:

  I want to print Bill's .init file.

Morphological analysis must do the following things:
- Pull apart the word "Bill's" into the proper noun "Bill" and the possessive suffix "'s".
- Recognize the sequence ".init" as a file extension that is functioning as an adjective in the sentence.

This process will usually assign syntactic categories to all the words in the sentence. Consider the word "prints": this word is either a plural noun or a third-person singular verb (he prints).
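The two tasks above can be sketched in a few lines. This is a minimal sketch for exactly the example sentence; the category labels and rules are illustrative, not a general English morphological analyzer.

```python
import re

def morph_analyze(word):
    """Tiny morphological analyzer for the slide's example sentence only."""
    if word.endswith("'s"):
        # Pull apart "Bill's" into proper noun + possessive suffix.
        return [(word[:-2], "PROPER-NOUN"), ("'s", "POSSESSIVE")]
    if re.fullmatch(r"\.\w+", word):
        # ".init" is a file extension functioning as an adjective.
        return [(word, "ADJ-FILE-EXTENSION")]
    if word.endswith("s"):
        # "prints": plural noun or third-person singular verb.
        return [(word, "PLURAL-NOUN or VERB-3SG")]
    return [(word, "WORD")]

print(morph_analyze("Bill's"))  # [('Bill', 'PROPER-NOUN'), ("'s", 'POSSESSIVE')]
print(morph_analyze(".init"))   # [('.init', 'ADJ-FILE-EXTENSION')]
```

Note how the ambiguity of "prints" is not resolved here; that is left for the later phases, exactly as the slide says.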

SYNTACTIC ANALYSIS

- Syntactic analysis must exploit the results of morphological analysis to build a structural description of the sentence.
- The goal of this process, called parsing, is to convert the flat list of words that forms the sentence into a structure that defines the units that are represented by that flat list.
- The important thing here is that a flat sentence has been converted into a hierarchical structure, and that the structure corresponds to meaning units when semantic analysis is performed.

Parse tree:
- Reference markers are shown in parentheses in the parse tree.
- Each one corresponds to some entity that has been mentioned in the sentence.
- Reference markers are useful later since they provide a place in which to accumulate information about the entities as we get it.

SYNTACTIC ANALYSIS: parse tree for "I want to print Bill's .init file."
(RM: reference marker; NP: noun phrase; VP: verb phrase)

S (RM1)
  NP
    PRO: I (RM2)
  VP (RM3)
    V: want
    VP
      V: print
      NP (RM4)
        ADJS: Bill's (RM5)
        ADJS: .init
        N: file

SEMANTIC ANALYSIS

Semantic analysis must do two important things:
- It must map individual words into appropriate objects in the knowledge base or database.
- It must create the correct structures to correspond to the way the meanings of the individual words combine with each other.

DISCOURSE INTEGRATION

- Specifically, we do not know whom the pronoun "I" or the proper noun "Bill" refers to.
- To pin down these references requires an appeal to a model of the current discourse context, from which we can learn that the current user is USER068 and that the only person named "Bill" about whom we could be talking is USER073.
- Once the correct referent for Bill is known, we can also determine exactly which file is being referred to.

PRAGMATIC ANALYSIS

- The final step toward effective understanding is to decide what to do as a result.
- One possible thing to do is to record what was said as a fact and be done with it. For some sentences, whose intended effect is clearly declarative, that is precisely the correct thing to do.
- But for other sentences, including this one, the intended effect is different. We can discover this intended effect by applying a set of rules that characterize cooperative dialogues.
- The final step in pragmatic processing is to translate from the knowledge-based representation to a command to be executed by the system.
- The result of the understanding process is:

  lpr /wsmith/stuff.init

  where "lpr" is the operating system's file print command.

SOME FACTS

- The results of each of the main processes combine to form a natural language system.
- All of the processes are important in a complete natural language understanding system.
- Not all programs are written with exactly these components.
- Sometimes two or more of them are collapsed. Doing that usually results in a system that is easier to build for restricted subsets of English but one that is harder to extend to wider coverage.

SYNTACTIC PROCESSING

- Syntactic processing is the step in which a flat input sentence is converted into a hierarchical structure that corresponds to the units of meaning in the sentence. This process is called parsing.
- It plays an important role in natural language understanding systems for two reasons:
  - Semantic processing must operate on sentence constituents. If there is no syntactic parsing step, then the semantics system must decide on its own constituents. If parsing is done, on the other hand, it constrains the number of constituents that semantics can consider.
  - Syntactic parsing is computationally less expensive than semantic processing. Thus it can play a significant role in reducing overall system complexity.

SYNTACTIC PROCESSING (CONT…)

- Although it is often possible to extract the meaning of a sentence without using grammatical facts, it is not always possible to do so. Consider the examples:
  - The satellite orbited Mars.
  - Mars orbited the satellite.
- In the second sentence, syntactic facts demand an interpretation in which a planet revolves around a satellite, despite the apparent improbability of such a scenario.

SYNTACTIC PROCESSING (CONT…)

Almost all the systems that are actually used have two main components:
- A declarative representation, called a grammar, of the syntactic facts about the language.
- A procedure, called a parser, that compares the grammar against input sentences to produce parsed structures.

CONTEXT-FREE GRAMMAR

A context-free grammar (CFG) G is a quadruple (V, Σ, R, S) where:
- V: a set of non-terminal symbols
- Σ: a set of terminals (V ∩ Σ = ∅)
- R: a set of rules (R: V → (V ∪ Σ)*)
- S: a start symbol

EXAMPLE

- V = {q, f}
- Σ = {0, 1}
- R = {q → 11q, q → 00f, f → 11f, f → ε}  (equivalently: q → 11q | 00f, f → 11f | ε)
- S = q
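It may help to see what this grammar generates. Every derivation applies q → 11q some number of times, then q → 00f, then f → 11f some number of times, then f → ε, so the language is exactly the strings (11)^n 00 (11)^m. A short sketch (the function names are our own, and the regex works only because this particular CFG happens to describe a regular language):

```python
import re

def derive(n, m):
    """Apply q -> 11q n times, then q -> 00f, then f -> 11f m times, then f -> eps."""
    return "11" * n + "00" + "11" * m

def in_language(s):
    # Membership test for (11)^n 00 (11)^m via a regular expression.
    return re.fullmatch(r"(11)*00(11)*", s) is not None

print(derive(1, 2))           # 11001111
print(in_language("110011"))  # True
print(in_language("0101"))    # False
```

In general a CFG can generate non-regular languages (e.g. balanced brackets), so a regex test like this is a special case, not a general parsing technique.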

- In a context-free grammar, the left-hand side of a production rule is always a single nonterminal symbol. In a general grammar, it could be a string of terminal and/or nonterminal symbols.
- The grammars are called context-free because, since all rules have only a nonterminal on the left-hand side, one can always replace that nonterminal symbol with what is on the right-hand side of the rule. The context in which the symbol occurs is therefore not important.


GRAMMARS AND PARSERS

- The most common way to represent grammars is as a set of production rules.
- Symbols that are further expanded by rules are called nonterminal symbols. Symbols that correspond directly to strings that must be found in an input sentence are called terminal symbols.
- ε represents the empty string; the vertical bar is OR.

A simple context-free phrase structure grammar for English:

S → NP VP
NP → the NP1
NP → PRO
NP → PN
NP → NP1
NP1 → ADJS N
ADJS → ε | ADJ ADJS
VP → V
VP → V NP
N → file | printer
PN → Bill
PRO → I
ADJ → short | long | fast
V → printed | created | want

The first rule can be read as "A sentence is composed of a noun phrase followed by a verb phrase."

GRAMMARS AND PARSERS (CONT…)

- Such grammar formalisms underlie many linguistic theories, which in turn provide the basis for many natural language understanding systems.
- Pure context-free grammars are not effective for describing natural languages.
- NLP systems have less in common with computer language processing systems such as compilers.

CONT…

- The parsing process takes the rules of the grammar and compares them against the input sentence.
- The simplest structure to build is a parse tree, which simply records the rules and how they are matched.
- Every node of the parse tree corresponds either to an input word or to a nonterminal in our grammar.
- Each level in the parse tree corresponds to the application of one grammar rule.

A PARSE TREE FOR A SENTENCE: "Bill printed the file"

S
  NP
    PN: Bill
  VP
    V: printed
    NP
      the
      NP1
        ADJS: ε
        N: file

A PARSE TREE: "John ate the apple"

1. S -> NP VP
2. VP -> V NP
3. NP -> NAME
4. NP -> ART N
5. NAME -> John
6. V -> ate
7. ART -> the
8. N -> apple

S
  NP
    NAME: John
  VP
    V: ate
    NP
      ART: the
      N: apple

CONTEXT-FREE GRAMMAR (as Prolog DCG rules)

sentence          --> noun_phrase, verb_phrase.
noun_phrase       --> article, noun.
noun_phrase       --> article, adjective, noun.
noun_phrase       --> proper_noun.
verb_phrase       --> intransitive_verb.
verb_phrase       --> transitive_verb, noun_phrase.
article           --> [the].
adjective         --> [lazy].
adjective         --> [rapid].
proper_noun       --> [achilles].
noun              --> [turtle].
intransitive_verb --> [sleeps].
transitive_verb   --> [beats].

PARSE TREE: "the rapid turtle beats achilles"

sentence
  noun_phrase
    article: the
    adjective: rapid
    noun: turtle
  verb_phrase
    transitive_verb: beats
    noun_phrase
      proper_noun: achilles

EXAMPLE: a top-down derivation of "the rapid turtle beats achilles"

sentence
→ noun_phrase, verb_phrase                    (sentence --> noun_phrase, verb_phrase)
→ article, adjective, noun, verb_phrase       (noun_phrase --> article, adjective, noun)
→ [the], adjective, noun, verb_phrase         (article --> [the])
→ [the], [rapid], noun, verb_phrase           (adjective --> [rapid])
→ [the], [rapid], [turtle], verb_phrase       (noun --> [turtle])
→ [the], [rapid], [turtle], transitive_verb, noun_phrase
                                              (verb_phrase --> transitive_verb, noun_phrase)
→ [the], [rapid], [turtle], [beats], noun_phrase
                                              (transitive_verb --> [beats])
→ [the], [rapid], [turtle], [beats], proper_noun
                                              (noun_phrase --> proper_noun)
→ [the], [rapid], [turtle], [beats], [achilles]
                                              (proper_noun --> [achilles])

The slides then trace the top-down search space for sentences beginning "the lazy turtle …": the parser expands sentence → np, vp; tries np → pn, np → art, n, and np → art, adj, n in turn; matches [the], [lazy], [turtle]; and then explores the vp alternatives, yielding complete sentences such as:

  the lazy turtle sleeps
  the lazy turtle beats achilles
  the lazy turtle beats the lazy turtle

Failed alternatives (e.g. adjective → [rapid] against the word "lazy") are abandoned by backtracking.

DIFFERENCE LISTS IN GRAMMAR RULES

sentence(NP1-VP2) :-
    noun_phrase(NP1-VP1),
    verb_phrase(VP1-VP2).

The whole sentence spans the difference NP1-VP2: the noun phrase consumes NP1-VP1, and the verb phrase consumes the remainder, VP1-VP2.
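The idea behind difference lists is that each phrase parser receives the whole remaining input and hands back whatever it did not consume; the "difference" between input and output is the phrase. A Python analogue of the same threading (a sketch, with a deliberately tiny vocabulary of our own choosing):

```python
# Each parser takes a word list and returns the unconsumed remainder,
# or None on failure -- mirroring the In-Out pairs of difference lists.

def noun_phrase(words):
    if words[:1] == ["the"] and words[1:2] == ["turtle"]:
        return words[2:]          # consumed "the turtle"; hand back the rest
    return None

def verb_phrase(words):
    if words[:1] == ["sleeps"]:
        return words[1:]
    return None

def sentence(words):
    rest = noun_phrase(words)     # rest plays the role of VP1
    return rest is not None and verb_phrase(rest) == []

print(sentence(["the", "turtle", "sleeps"]))  # True
```

In Prolog, unification threads these In/Out list pairs automatically; DCG notation then hides the pairs entirely.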

EXERCISE: FOR EACH OF THE FOLLOWING SENTENCES, DRAW A PARSE TREE

- John wanted to go to the movie with Sally.
- I heard the story listening to the radio.
- All books and magazines that deal with controversial topics have been removed from the shelves.

A GRAMMAR SPECIFIES TWO THINGS ABOUT A LANGUAGE

- Its weak generative capacity, by which we mean the set of sentences that are contained within the language. This set is made up of precisely those sentences that can be completely matched by a series of rules in the grammar.
- Its strong generative capacity, by which we mean the structure to be assigned to each grammatical sentence of the language.

TOP-DOWN VERSUS BOTTOM-UP PARSING

To parse a sentence, it is necessary to find a way in which that sentence could have been generated from the start symbol. There are two ways this can be done:
- Top-down parsing: Begin with the start symbol and apply the grammar rules forward until the symbols at the terminals of the tree correspond to the components of the sentence being parsed.
- Bottom-up parsing: Begin with the sentence to be parsed and apply the grammar rules backward until a single tree whose terminals are the words of the sentence and whose top node is the start symbol has been produced.
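As a contrast to the top-down derivations shown earlier, bottom-up parsing can be sketched as a shift-reduce loop over the "John ate the apple" grammar. This is a minimal sketch: the greedy reduce-first control strategy below has no backtracking, which happens to suffice for this grammar but not for grammars with reduce conflicts.

```python
# Grammar: S -> NP VP, NP -> NAME, NP -> ART N, VP -> V NP.
LEXICON = {"John": "NAME", "ate": "V", "the": "ART", "apple": "N"}
RULES = [(("NP", "VP"), "S"), (("NAME",), "NP"),
         (("ART", "N"), "NP"), (("V", "NP"), "VP")]

def shift_reduce(words):
    stack = []
    buffer = [LEXICON[w] for w in words]      # word categories, left to right
    while buffer or stack != ["S"]:
        for rhs, lhs in RULES:                # reduce: stack top matches a RHS
            if tuple(stack[-len(rhs):]) == rhs:
                stack[-len(rhs):] = [lhs]
                break
        else:
            if not buffer:
                return False                  # stuck: cannot reduce or shift
            stack.append(buffer.pop(0))       # shift the next category
    return True

print(shift_reduce(["John", "ate", "the", "apple"]))  # True
```

Reading the trace bottom-up: NAME reduces to NP, ART N reduces to NP, V NP reduces to VP, and finally NP VP reduces to S, the start symbol.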

- The choice between these two approaches is similar to the choice between forward and backward reasoning in other problem-solving tasks.
- The most important consideration is the branching factor: is it greater going backward or forward?
- Sometimes these two approaches are combined into a single method called "bottom-up parsing with top-down filtering".

FINDING ONE INTERPRETATION OR FINDING MANY

Four ways of handling sentences:
- All paths: Follow all possible paths and build all the possible intermediate components.
- Best path with backtracking: Follow only one path at a time, but record, at every choice point, what is needed to make another choice if the first fails.
- Best path with patch-up: Follow only one path at a time, but explicitly shuffle around the components that have already been formed.
- Wait and see: Wait until the needed information is available.

AUGMENTED TRANSITION NETWORKS

- An augmented transition network (ATN) is a top-down parsing procedure.
- An ATN is similar to a finite state machine in which the class of labels that can be attached to the arcs that define transitions between states has been augmented.
- An ATN is a type of graph-theoretic structure used in the operational definition of formal languages, used especially in parsing relatively complex natural languages, and having wide application in artificial intelligence.


SEMANTIC ANALYSIS

- Producing a syntactic parse of a sentence is only the first step toward understanding it. We must still produce a representation of the meaning of the sentence.
- Because understanding is a mapping process, we must first define the language into which we are trying to map.
- There is no single definitive language in which all sentence meanings can be described.
- The choice of a target language for any particular NL understanding program must depend on what is to be done with the meanings once they are constructed.

CHOICE OF TARGET LANGUAGE IN SEMANTIC ANALYSIS

There are two broad families of target languages that are used in NL systems, depending on the role that the natural language system is playing in a larger system:
- When natural language is being considered as a phenomenon on its own, as for example when one builds a program whose goal is to read text and then answer questions about it, a target language can be designed specifically to support language processing.
- When natural language is being used as an interface language to another program (such as a DB query system or an expert system), then the target language must be legal input to that other program. Thus the design of the target language is driven by the backend program.

LEXICAL PROCESSING

- The first step in any semantic processing system is to look up the individual words in a dictionary (or lexicon) and extract their meanings.
- Many words have several meanings, and it may not be possible to choose the correct one just by looking at the word itself.
- The process of determining the correct meaning of an individual word is called word sense disambiguation or lexical disambiguation.
- It is done by associating, with each word in the lexicon, information about the contexts in which each of the word's senses may appear.
- Sometimes only very straightforward information about each word sense is necessary. For example, the baseball-field interpretation of "diamond" could be marked as a LOCATION.
- Some useful semantic markers are:
  - PHYSICAL-OBJECT
  - ANIMATE-OBJECT
  - ABSTRACT-OBJECT
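Marker-based disambiguation can be sketched very simply. The lexicon entries below are invented for illustration (the slide gives only the LOCATION sense of "diamond"); the idea is just that the context supplies an expected marker and the lexicon is filtered against it.

```python
# Hypothetical lexicon: each word maps to its senses, tagged with a
# semantic marker of the kind listed on the slide.
LEXICON = {
    "diamond": [
        {"sense": "gemstone",       "marker": "PHYSICAL-OBJECT"},
        {"sense": "baseball-field", "marker": "LOCATION"},
    ],
}

def disambiguate(word, expected_marker):
    """Pick the sense whose marker matches what the context expects."""
    for entry in LEXICON[word]:
        if entry["marker"] == expected_marker:
            return entry["sense"]
    return None

# "I'll meet you at the diamond" -- a locative context expects LOCATION.
print(disambiguate("diamond", "LOCATION"))  # baseball-field
```

Real systems need far richer context models, but this is the shape of the lexicon-lookup-plus-filter step the slide describes.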

SENTENCE-LEVEL PROCESSING

Several approaches to the problem of creating a semantic representation of a sentence have been developed, including the following:
- Semantic grammars, which combine syntactic, semantic and pragmatic knowledge into a single set of rules in the form of a grammar.
- Case grammars, in which the structure that is built by the parser contains some semantic information, although further interpretation may also be necessary.
- Conceptual parsing, in which syntactic and semantic knowledge are combined into a single interpretation system that is driven by the semantic knowledge.
- Compositional semantic interpretation, in which semantic processing is applied to the result of performing a syntactic parse.

SEMANTIC GRAMMAR

- A semantic grammar is a context-free grammar in which the choice of nonterminals and production rules is governed by semantic as well as syntactic function.
- There is usually a semantic action associated with each grammar rule.
- The result of parsing and applying all the associated semantic actions is the meaning of the sentence.

A SEMANTIC GRAMMAR

S → what is FILE-PROPERTY of FILE?
      { query FILE.FILE-PROPERTY }
S → I want to ACTION
      { command ACTION }
FILE-PROPERTY → the FILE-PROP
      { FILE-PROP }
FILE-PROP → extension | protection | creation date | owner
      { value }
FILE → FILE-NAME | FILE1
      { value }
FILE1 → USER's FILE2
      { FILE2.owner: USER }
FILE1 → FILE2
      { FILE2 }
FILE2 → EXT file
      { instance: file-struct extension: EXT }
EXT → .init | .txt | .lsp | .for | .ps | .mss
      { value }
ACTION → print FILE
      { instance: printing object: FILE }
ACTION → print FILE on PRINTER
      { instance: printing object: FILE printer: PRINTER }
USER → Bill | Susan
      { value }

ADVANTAGES OF SEMANTIC GRAMMARS

- When the parse is complete, the result can be used immediately, without the additional stage of processing that would be required if a semantic interpretation had not already been performed during the parse.
- Many ambiguities that would arise during a strictly syntactic parse can be avoided, since some of the interpretations do not make sense semantically and thus cannot be generated by a semantic grammar.
- Syntactic issues that do not affect the semantics can be ignored.

The drawbacks of semantic grammars are:
- The number of rules required can become very large, since many syntactic generalizations are missed.
- Because the number of grammar rules may be very large, the parsing process may be expensive.

CASE GRAMMARS

- Case grammars provide a different approach to the problem of how syntactic and semantic interpretation can be combined.
- Grammar rules are written to describe syntactic rather than semantic regularities, but the structures the rules produce correspond to semantic relations rather than to strictly syntactic ones.
- Consider two sentences:
  - Susan printed the file.
  - The file was printed by Susan.
- The case grammar interpretation of the two sentences would both be:

  (printed (agent Susan)
           (object File))

CONCEPTUAL PARSING

- Conceptual parsing is a strategy for finding both the structure and the meaning of a sentence in one step.
- Conceptual parsing is driven by a dictionary that describes the meanings of words in conceptual dependency (CD) structures.
- The parsing is similar to case grammar.
- CD usually provides a greater degree of predictive power.

DISCOURSE AND PRAGMATIC PROCESSING

There are a number of important relationships that may hold between phrases and parts of their discourse contexts, including:
- Identical entities. Consider the text:
  - Bill had a red balloon.
  - John wanted it.
  The word "it" should be identified as referring to the red balloon. References of this type are called anaphora.
- Parts of entities. Consider the text:
  - Sue opened the book she just bought.
  - The title page was torn.
  The phrase "title page" should be recognized as part of the book that was just bought.

DISCOURSE AND PRAGMATIC PROCESSING (CONT…)

- Parts of actions. Consider the text:
  - John went on a business trip to New York.
  - He left on an early morning flight.
  Taking a flight should be recognized as part of going on a trip.
- Entities involved in actions. Consider the text:
  - My house was broken into last week.
  - They took the TV and the stereo.
  The pronoun "they" should be recognized as referring to the burglars who broke into the house.
- Elements of sets. Consider the text:
  - The decals we have in stock are stars, the moon, item and a flag.
  - I'll take two moons.
  "Moons" means moon decals.

DISCOURSE AND PRAGMATIC PROCESSING (CONT…)

- Names of individuals:
  - Dave went to the movies.
- Causal chains:
  - There was a big snow storm yesterday.
  - The schools were closed today.
- Planning sequences:
  - Sally wanted a new car.
  - She decided to get a job.
- Illocutionary force:
  - It sure is cold in here.
- Implicit presuppositions:
  - Did Joe fail CS101?

DISCOURSE AND PRAGMATIC PROCESSING (CONT…)

We focus on using the following kinds of knowledge:
- The current focus of the dialogue
- A model of each participant's current beliefs
- The goal-driven character of dialogue
- The rules of conversation shared by all participants

STATISTICAL NLP

- Corpora
- Counting the elements in a corpus
- N-grams
- Smoothing
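The last three bullets fit together: count N-grams in a corpus, then smooth the counts so that unseen N-grams do not get zero probability. A minimal sketch with a bigram model and add-one (Laplace) smoothing, over a toy corpus of our own invention:

```python
from collections import Counter

# Toy corpus reusing the lecture's turtle sentences.
corpus = "the turtle sleeps the turtle beats achilles".split()

unigrams = Counter(corpus)                    # counts of single words
bigrams = Counter(zip(corpus, corpus[1:]))    # counts of adjacent word pairs
V = len(unigrams)                             # vocabulary size

def p_smoothed(w1, w2):
    """P(w2 | w1) with add-one smoothing: (count(w1,w2)+1) / (count(w1)+V)."""
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

print(p_smoothed("the", "turtle"))   # seen bigram: 3/7
print(p_smoothed("turtle", "the"))   # unseen bigram: still nonzero, 1/7
```

Add-one smoothing is the crudest scheme; it illustrates the point of smoothing (no zero probabilities), though better methods such as Good-Turing or Kneser-Ney are preferred in practice.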

SPELL CHECKING

- Spell checking is one of the basic tools required for language processing.
- It is used in a wide variety of computing environments, including word processing, character or text recognition systems, and speech recognition and generation.
- It involves:
  - Identifying words and non-words
  - Suggesting possible alternatives for correction

Example: given the input "Divya sar on box", "sar" is identified as a non-word and "sat" is suggested, giving "Divya sat on box".
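The "suggest alternatives" step is commonly done by ranking dictionary words by edit (Levenshtein) distance from the non-word. A sketch, with a deliberately tiny dictionary chosen to match the example (real checkers use full dictionaries and also weight keyboard-adjacency, phonetics, and word frequency):

```python
def edit_distance(a, b):
    """Levenshtein distance via a rolling one-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # delete
                                     dp[j - 1] + 1,    # insert
                                     prev + (ca != cb))  # substitute / match
    return dp[-1]

def suggest(word, dictionary):
    """Return the dictionary word closest to the misspelling."""
    return min(dictionary, key=lambda w: edit_distance(word, w))

print(suggest("sar", {"sat", "on", "box", "ran"}))  # sat
```

Here "sar" is one substitution away from "sat", so "sat" wins; this reproduces the slide's "Divya sar on box" → "Divya sat on box" correction.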

