Rahul Chitturi
Language Technology Research Center, IIIT-Hyderabad, INDIA
rahul_ch@students.iiit.net
Abstract

Human conversations often tend to be incomplete. Many a time, we tend to shorten our conversations. The omission from a text of one or more words that are obviously understood, but that must be supplied to make a construction grammatically correct, is called ellipsis [1]. In a conversation, the computer should be in a position to handle ellipsis depending on the context, previous dialogues and knowledge. Given a domain specific question answering system, we deal with how to handle ellipsis in that particular domain. In this paper we classify ellipsis into three types and try to provide a solution for each of the three cases, taking the Railway Domain as an example. The algorithm is evaluated by comparing its results with those of well known Question Answering Systems, which indicates that this approach is portable to domain specific systems.

1 Introduction

The development of widespread computer technology has changed many of our daily practices. Unfortunately, even today computers lack the very basic sense of naturalness in communicating with people. The creators of computer technology can lessen the disruptive force of the technology by practicing good design. Well designed computer systems should be useful, usable, easily learned, easily communicative, and perform functions that let people do the things they want to do. It is this fundamental necessity that is ultimately leading computer scientists to overcome this barrier by concentrating on natural means of communication. A great deal of research is now being carried out in Natural Language Processing, Vision, etc. The problem we deal with in this paper is Ellipsis Handling in a Natural Language Dialogue System.

Ellipsis structures pose a crucial problem for Natural Language Processing systems designed to provide text understanding or to handle dialogues. They contain information which is not overtly expressed, but which must be recovered through the identification of an antecedent or previous occurrences.

In a domain specific dialogue system, a machine answers queries specific to that domain. For the dialogue to be as natural as possible, the system should be able to handle incomplete questions. In order for the machine to understand the query, the complete query corresponding to an incomplete query has to be generated. Let's see an example of ellipsis in the railway reservation domain. The numbered queries are those in the actual conversation, and their exact meanings are given correspondingly.

Example 1
Query 1: What is the next train to Calcutta?
Answer: Train number 1024.

Query 2: When does it start?
Exact query: When does the train 1024 to Calcutta start?

Query 3: Which platform?
Exact query: To which platform will the train 1024 arrive?

Query 4: When does it arrive in Bangalore?
Exact query: When does that train 1024 arrive in Bangalore?

Query 5: And Delhi?
Exact query: When does that train 1024 arrive in Delhi?

The problem we deal with in this paper is: given a conversation as in Example 1, the exact (complete) queries should be obtained. Complete queries are the queries for which SQL queries can be generated. This problem is first handled with syntactic cues from the preceding queries. If there are not enough such cues, then semantic cues are used, which is not generally done in QA systems. Present QA systems like AnswerBus [8], Quartz [9] and Pai [10] don't take care of this ellipsis, which is quite essential in a natural conversation. Even popular systems like START use only syntactic information to handle ellipsis [7]. In this paper, we present a semantic approach which handles many complex ellipses to make the conversation more natural. This comparison is made in the evaluation section (7).

2 Issues in handling ellipsis in a question answering system

2.1 Identifying the complete queries

Identifying the completeness of a given sentence is the most basic issue in ellipsis handling. In Example 1, the first query is a complete sentence and the rest are incomplete sentences. It is quite complex to identify a complete sentence. Even if the sentence structure is considered, for a given complete structure there can be sentences that are not complete [2].

2.2 Scope of the context

Generally, it is unclear how many queries should be kept in memory so that, if they are referred to, the required knowledge can be appropriately retained. It is difficult to retrieve the desired query from its elliptical notation in the given knowledge base. This is clearly seen in Example 1: in order to handle the ellipsis in query 5, all the information from the first query is indispensable. So, the problem here is how many previous queries should be kept in memory and also the way in which they should be stored.

2.3 Entities in the domain

Generally, there is a mapping difficulty between the entities in the Entity Relationship Diagram of a Database Management System and the entities in the domain that is being queried. It is worthwhile to note that the entities in the DBMS are different from the entities that are to be modeled semantically, as in Dialog Systems. This can be well understood from the discussions in the later part of the paper.

The queries in a question answering system can be divided into three types. This classification also depends on the type of domain and the type of queries that are going to be handled. Based on the experience gained from the structure of the queries in the Railway Reservation Domain, the following classification is generalized to all question answering systems.

3.1 Ellipsis in Discourse

The author of reference [6] mentions that there are various ways to describe the different types of ellipsis occurring in English and other languages. Sanders (1977) uses alphabetic characters to identify the six different positions in which ellipsis can occur, ranging from the first position in the first clause (position A) to the last position in the second clause (position F):

ABC&DEF

Although there is disagreement about precisely which positions permit ellipsis in English, most would agree that English allows ellipsis in positions C, D, and E. Example (2) illustrates C-Ellipsis: ellipsis of a constituent at the end of the first clause (marked by brackets) that is identical to a constituent (placed in italics) at the end of the second clause.

Example 2 (C Type): The author wrote [ ] and the copy-editor revised the introduction to the book.

Examples (3) and (4) illustrate D- and E-Ellipsis: ellipsis of, respectively, the first and second parts of the second clause.

Example 3 (D Type): The students completed their course work and [ ] left for summer vacation.
Example 4 (E Type): Sally likes fish, and her mother [ ] hamburgers.

These types predominantly look at intra-sentential ellipses.

3.2 Ellipsis in Question Answering System

As seen in Example 1, ellipsis in QA systems is very different from general ellipsis. These are basically inter-sentential ellipses. The case in Example 2 doesn't come into the picture in QA systems. Also, Examples 3 and 4 differ considerably in structure from Example 1. The author of reference [6] mentions that 86% of the elliptical coordinations are of type D; C accounts for 2% and E for 5.5%. So, the ellipsis in QA systems cannot be applied to the general ellipsis.

Query 2: To S (station)? Or From T (station)?

4.2 Type 2 (Grouping Based)

Let's see the following example:

In this case there will not be any prepositions. So these can be handled only by identifying the group of the Noun Phrase (NP) to which it belongs. One might ask how this is different from the previous type (refer to 4.1). Let us now see the following example.

Example 7
Query 1: When is the train from Bombay to Delhi?
Query 2: To Calcutta?

If we use the grouping based method, then we give no importance to the preposition. This results in ambiguity as to which entity should be referred to (Bombay? or Delhi?).

4.3 Type 3 (Semantic Based)

All those ellipses which cannot be classified as the above two types come under this type. For this type, a semantic diagram can be built from the Entity Relationship diagram of the DBMS of the given domain. This can be easily understood by looking at the following diagrams (please refer to Fig. 1 and Fig. 2).

Example 8
Query 1: When will the train X arrive?
Query 2: To which platform?

Every query can be handled by this type. But as this type is related to semantics, it gives only basic semantic relations. The first two types, which are syntactically solvable, are more accurate in giving the exact relationship.

5 Algorithm for Ellipsis handling

In this paper the ellipsis handling problem is divided into four parts. First the completeness of the queries is identified. Then the entities in the query are mapped to those of the domain. The queries along with their mapped entities are then analyzed. The analyzed queries are kept in memory so that the ellipsis in subsequent queries can be handled.
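
Before an incomplete query can be completed, the handler has to decide which of the three types (preposition based, grouping based, semantic based) it belongs to. The following is a minimal sketch of such a decision, not the paper's implementation. The preposition set, the entity value lists and the function name `classify_ellipsis` are all assumptions made for illustration.

```python
import re

# Hypothetical domain data (illustrative values only).
TRACKED_PREPOSITIONS = {"to", "from", "on"}
ENTITY_VALUES = {
    "Station": ["delhi", "calcutta", "bombay"],
    "Train_specific": ["delhi express", "train number 4567"],
}


def classify_ellipsis(fragment):
    """Return 'preposition', 'grouping' or 'semantic' for an incomplete query."""
    # Lowercase, drop punctuation and collapse whitespace.
    text = " ".join(re.sub(r"[^\w\s]", " ", fragment.lower()).split())
    entity_seen = False
    for values in ENTITY_VALUES.values():
        for value in values:
            match = re.search(r"\b" + re.escape(value) + r"\b", text)
            if match is None:
                continue
            entity_seen = True
            # A tracked preposition directly before the entity wins first.
            preceding = text[:match.start()].split()
            if preceding and preceding[-1] in TRACKED_PREPOSITIONS:
                return "preposition"
    # A known entity without a preposition is grouping based;
    # anything else falls through to the semantic type.
    return "grouping" if entity_seen else "semantic"


print(classify_ellipsis("To Calcutta?"))
print(classify_ellipsis("What about Delhi Express?"))
print(classify_ellipsis("To which platform?"))
```

The checks run in the same order in which the paper applies the three solutions: the preposition test comes first, so a fragment such as "To Calcutta?" is never treated as grouping based.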
[Diagram labels: Train, Source, Destination, Time, Day, Name, Platform, Station, Passenger, PNR, Address, Booking Counter, Id, Concession, Type, Percentage; relationships: Travels, Arrives, Books, Offers, Avails.]
Figure 1: Entity Relationship Diagram for Railway Reservation System

Example 9
1) Will the {Train_specific} go from {Source} to {Destination}?

Train_specific is a specific train {"Train number 2039", "Delhi Express", etc.}. Source is a station or place {"Delhi", "Mumbai station", etc.}. Destination is a station or place {"Delhi", "Mumbai station", etc.}.

5.2 Matcher

For each entity in the domain, all the possible values for that entity are stored in the semantic graph. So, whenever a noun phrase appears, it is matched against all the possible values of each entity. Thus the noun phrases which are entities in our domain are identified. The entities need not be noun phrases, but in this paper we used only a defined set of noun phrases as the entities. The output of the matcher is given to the ellipsis handler.

5.3 Ellipsis Handler

The following methods have to be employed one after the other, in order.

5.3.1 Preposition Based Ellipsis

The prepositions which are important in handling ellipsis in the given domain are noted. Whenever these prepositions occur before a semantic entity, they can be treated as a separate preposition entity, which is different from the original entity and the preposition. The most recent value of this entity is kept in memory.

Example 10
Is there any train from X (place/station) to Y (place/station)?
To Delhi? ; {To Station_name} is together treated as the entity "destination". Then "to Y" should be replaced with "to Delhi".

5.3.2 Group Based Ellipsis

The entities which are left after processing the prepositions will fall into some group. For example, "Delhi Express", "Train number 4567", etc. refer to a specific train. If an incomplete query comes, then the value for that group in the previous complete query is replaced with the new value.

Example 11
At what time will X (Group: Train_specific) arrive?
What about Y (Group: Train_specific)? This Y is substituted in the previous complete query in the place of X.

5.3.3 Semantic Based Ellipsis

In the semantic graph, some entities have relations with one another. The basic relationships between the possible semantic entities should be kept in the database in the beginning.

Generally, while speaking, more stress is put on the head noun of the sentence. So, the head noun of a complete dialog is identified. Then, whenever an incomplete dialog appears, the relationship between the head noun of the previous complete query and the head noun of the incomplete dialog is identified. If the database contains queries with only those two heads as entities, then they are returned. If no relationship exists between the two entities, then null is returned.

Example 12
When will the train X (Group: Train_specific) arrive?
Train X is the head NP of the query

To which platform (Group: Platform)?
Platform is the head NP of the query

Then the relationship between the Train_specific group and the Platform group is identified. Then all the queries with only these two semantic entities are returned.

These three types (4.1, 4.2, and 4.3) are not mutually exclusive. But the procedure and the order in which they are applied are very important. As shown in Example 7, if the solution for the second type is applied first, then there will be some problems. So, one has to apply the solutions for these types one after the other. As the first two types are more accurate, first apply 4.1, then 4.2. If the queries cannot be handled by these two types, then apply 4.3. This approach would handle most of the ellipsis in that domain.

5.4 Scope of the Context

It is very complex to know how many queries should be kept in memory. It depends on the type of domain. For example, the Interactive NLI agent [4] supports natural language queries and commands along with a search history so that users can use their queries based on the previous search results.

If the dialogues in the domain are kept in memory, it becomes very difficult to handle the queries. So, a hash of all the entities is maintained; that is, only entities are stored. At first, the entities are given default values. If some other value occurs, then the most recent value for that entity is stored.
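
The entity hash of Section 5.4, combined with the preposition entities of Section 5.3.1, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the system described in the paper. The station list, the `PREPOSITION_ENTITIES` mapping, the default value "unknown" and the helper names `extract_entities` and `resolve` are hypothetical, and the template of the last complete query is assumed to be supplied by the caller.

```python
# Hypothetical value list; a real system would load it from the domain database.
STATIONS = {"secundrabad", "delhi", "calcutta", "bangalore"}
# A tracked preposition followed by a station forms a preposition entity.
PREPOSITION_ENTITIES = {"to": "Destination", "from": "Source"}


def extract_entities(query):
    """Return {entity: value} for every 'to <station>' or 'from <station>' pair."""
    words = query.lower().replace("?", " ").split()
    found = {}
    for preposition, word in zip(words, words[1:]):
        if preposition in PREPOSITION_ENTITIES and word in STATIONS:
            found[PREPOSITION_ENTITIES[preposition]] = word
    return found


def resolve(fragment, memory, template):
    """Store the fragment's values as the most recent ones, then refill the template."""
    memory.update(extract_entities(fragment))
    return template.format(**memory)


# Template of the last complete query (assumed to be chosen by the caller).
template = "Is there any train from {Source} to {Destination}?"
# The hash starts with default values and keeps only the latest value per entity.
memory = {"Source": "unknown", "Destination": "unknown"}
memory.update(extract_entities("Is there any train from Secundrabad to Delhi?"))

completed = resolve("To Calcutta?", memory, template)
print(completed)
```

Only one value per entity is stored, so a later fragment such as "From Bangalore?" overwrites the source while the destination set earlier stays available, without keeping the earlier dialogue turns themselves.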
[Diagram labels: Train, Train type, Specific Train (> Train name, > Train number), Passenger name, Pnr, Pnr_number, Address, Booking Counter, Counter id, Platform, Platform number, Concession, Concession Type, Station, Source {To station}, Destination {From station}; relations R1 to R9.]
Figure 2: Semantic Graph. Edges indicate Basic Relations between the semantic entities, which are in ovals, and their attributes, which are in rectangles. An example Basic Relation between Train and Platform: "To which platform will the train arrive?"
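
The lookup of basic relations for semantic based ellipsis (Section 5.3.3) can be illustrated with a small sketch. The relation table below holds a single hypothetical entry modelled on the Train and Platform example in the caption of Figure 2. The names `BASIC_RELATIONS` and `resolve_semantic` are assumptions, and Python's `None` stands in for the "null" result mentioned in the text.

```python
# Basic relations between pairs of semantic entities, each mapped to the
# complete-query templates that involve only those two entities (illustrative).
BASIC_RELATIONS = {
    frozenset({"Train", "Platform"}): [
        "To which platform will the train {Train} arrive?",
    ],
}


def resolve_semantic(previous_head, fragment_head, values):
    """Return the queries relating the two head entities, or None if unrelated."""
    templates = BASIC_RELATIONS.get(frozenset({previous_head, fragment_head}))
    if not templates:
        return None
    # Fill each template with the most recent stored entity values.
    return [template.format(**values) for template in templates]


print(resolve_semantic("Train", "Platform", {"Train": "1024"}))
print(resolve_semantic("Train", "Concession", {"Train": "1024"}))
```

Using an unordered pair as the key means the same relation is found whichever of the two head nouns came from the complete query and whichever from the fragment.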
In Example 1, the word "train" in query 1 is identified as the entity "Train". Similarly, "Calcutta" is identified as "Destination" (Destination is an intermediate station at which the train arrives). In query 3, the word "platform" is identified as "Platform". In query 4, "Bangalore" is identified as "Destination", so Calcutta, the latest stored value for the entity "Destination", is replaced with Bangalore. In query 5, "Delhi" is identified as "Destination", and Bangalore is replaced with Delhi, as Delhi is now the latest value for the entity "Destination".

The Mechanism of Ellipsis Handling
[Diagram components: Natural language query; Defaults & Possible values of entities; Entity Identifier; Templates; Complete query Identifier; Incomplete query; Memory Storage; Ellipsis Classifier; Semantic Graph; Ellipsis Handler; Ellipsis free query.]
Figure 3: Schematic Diagram of the mechanism of ellipsis handling

6. Snapshot of the output

The following is the exact dialog with the system.

user: Tell me about the availability status of all train running from secundrabad to Delhi on 03-03-05?
sys: ***Tell me about the availability status of all {train; Train} running {from secundrabad; Source} {to Delhi; Destination} on {03-03-05; Date}?
sys: THIS IS ASSUMED AS A COMPLETE QUERY
------------------------------------------------------------------
user: to calcutta ?
sys: ***{to calcutta;To_Station} ?
sys: Tell me about the availability status of all train running from secundrabad to Calcutta on 03-03-05?
sys: ***Tell me about the availability status of all {train; Train} running {from Secundrabad; Source} {to Calcutta; Destination} on {03-03-05; Date}?
------------------------------------------------------------------
user: What about on 04-03-05?
sys: ***what about on {04-03-05; Date}?
sys: DID YOU MEAN? Grouping based ellipsis
sys: Tell me about the availability status of all train running from secundrabad to Calcutta on 04-03-05?
sys: ***Tell me about the availability status of all {train; Train} running {from secundrabad; Source} {to Calcutta; Destination} on {04-03-05; Date}?
-----------------------------------------------------------------
user: To which platform?
sys: recognized the query as: ***To which {platform; Platform}?
sys: DID YOU MEAN? Semantic based ellipsis
/* Complete queries with {Train; Platform} Train is the head NP of the latest complete query and the Platform is the head NP of the previous query */
sys: to which platform will the train X (the name of the train running from secundrabad to Calcutta on 04-03-05) arrive ?
sys: ***to which {platform; Platform} will the {train; Train} arrive
-----------------------------------------------------------------

7. Evaluation

Many of the present QA systems do not handle ellipsis. So, the results of our algorithm are compared with those of the START system, which handles ellipsis. Also, the START system handles ellipsis considering only the previous query in a dialogue as the context [7]. Generally, in human conversation this is not the case. In our case, all the incomplete queries are handled until a complete query occurs.

This approach is tested in three domains: Railway Reservation System, Course Registration System, and Library Maintenance System. For each system we took 100 test cases such that all three types of ellipsis are covered. The test cases are dialogs in that domain which have some inter-sentential ellipses as in Example 1. These are tested with our algorithm and the START system. Tables 1-3 show the results for all three systems and types.