
Novel approach of Domain Specific Ellipsis Handling in Question Answering Systems

Rahul Chitturi
Language Technology Research Center, IIIT-Hyderabad, INDIA
rahul_ch@students.iiit.net

Abstract

Human conversations often tend to be incomplete. Many a time, we shorten our conversations. The omission from a text of one or more words that are obviously understood, but that must be supplied to make a construction grammatically correct, is called ellipsis [1]. In a conversation, the computer should be in a position to handle the ellipsis depending on the context, previous dialogues and knowledge. Given a domain specific question answering system, we deal with how to handle ellipsis in that particular domain. In this paper we classify the ellipsis into three types and try to provide a solution for each of the three cases, taking the Railway Domain as an example. The algorithm is evaluated by comparing its results with those of a well known Question Answering System, which shows that this approach is portable across domain specific systems.

1 Introduction

The development of widespread computer technology has changed many of our daily practices. Unfortunately, even today computers lack the very basic sense of naturalness in communicating with man. The creators of computer technology can lessen the disruptive force of the technology by practicing good design. Well designed computer systems should be useful, usable, easily learned, easily communicative and should perform functions that let people do the things they want to do. It is this fundamental necessity which is ultimately leading computer scientists to overcome this barrier by concentrating on natural means of communication. Tremendous research is being carried out nowadays on Natural Language Processing, Vision, etc. The problem which we deal with in this paper is Ellipsis Handling in a Natural Language Dialogue System.

Ellipsis structures pose a crucial problem for Natural Language Processing systems designed to provide text understanding or to handle dialogues. They contain information which is not overtly expressed, but which must be recovered through the identification of an antecedent or previous occurrences.

In a domain specific dialogue system, a machine answers queries specific to that domain. For the dialogue to be as natural as possible, the system should be able to handle incomplete questions. For the machine to understand the query, the complete query corresponding to an incomplete query has to be generated. Let's see an example of ellipses in the railway reservation domain. The numbered queries are those in the actual conversation, and their exact meanings are given correspondingly.

Example 1
Query 1: What is the next train to Calcutta?
Answer: Train number 1024.

Query 2: When does it start?
Exact query: When does the train 1024 to Calcutta start?

Query 3: Which platform?
Exact query: To which platform will the train 1024 arrive?

Query 4: When does it arrive in Bangalore?
Exact query: When does that train 1024 arrive in Bangalore?

Query 5: And Delhi?
Exact query: When does that train 1024 arrive in Delhi?

The problem which we deal with in this paper is: given a conversation as in Example 1, the exact (complete) queries should be obtained. Complete queries are the queries for which SQL queries can be generated. This problem is first handled with the syntactic cues from the preceding queries. If these do not give enough of a clue, then semantic cues are used, which is not generally done in QA systems. Present QA systems like AnswerBus [8], Quartz [9] and Pai [10] don't take care of this ellipsis, which is quite essential in a natural conversation. Even popular systems like START use only syntactic information to handle the ellipsis [7]. In our paper, we present a semantic approach which handles many of the complex ellipses to make the conversation more natural. This comparison is made in the evaluation section (7).

2 Issues in handling ellipsis in a question answering system

2.1 Identifying the complete queries

Identifying the completeness of a given sentence is the most basic issue in ellipsis handling. In Example 1, the first query is a complete sentence and the rest are incomplete sentences. It is quite complex to identify a complete sentence. Even if the sentence structure is considered, for a given complete structure there can be sentences that are not complete [2].

2.2 Scope of the context

Generally there is uncertainty regarding the number of queries that should be kept in memory so that, if they are referred to, the required knowledge is still available. It is difficult to retrieve the desired query from its elliptical notation in the given knowledge base. This is clearly seen in Example 1. In this example, in order to handle the ellipsis in query 5, all the information from the first query is indispensable. So, the problem here is how many previous queries should be kept in memory and also the way in which they should be stored.

2.3 Entities in the domain

Generally, there is a mapping difficulty between the entities in the Entity Relationship Diagram of a Database Management System and the entities in the domain that is being queried. It is worthwhile to note that the entities in the DBMS are different from the entities that are to be modeled semantically in Dialog Systems. This can be well understood from the discussions in the later part of the paper.

The queries in a question answering system can be divided into three types. This classification also depends on the type of domain and the type of queries that are going to be handled. Based on the experience gained from the structure of the queries in the Railway Reservation Domain, the following classification is generalized to all question answering systems.

3 Difference between the ellipsis in discourse and the question answering systems

3.1 Ellipsis in Discourse

The author of reference [6] mentions that there are various ways to describe the different types of ellipsis occurring in English and other languages. Sanders (1977) uses alphabetic characters to identify the six different positions in which ellipsis can occur, ranging from the first position in the first clause (position A) to the last position in the second clause (position F):
ABC&DEF

Although there is disagreement about precisely which positions permit ellipsis in English, most would agree that English allows ellipsis in positions C, D, and E. Example (2) illustrates C-Ellipsis: ellipsis of a constituent at the end of the first clause (marked by brackets) that is identical to a constituent (placed in italics) at the end of the second clause.

Example 2 (C Type): The author wrote [ ] and the copy-editor revised the introduction to the book.

Examples (3) and (4) illustrate D- and E-Ellipsis: ellipsis of, respectively, the first and second parts of the second clause.

Example 3 (D Type): The students completed their course work and [ ] left for summer vacation.
Example 4 (E Type): Sally likes fish, and her mother [ ] hamburgers.

These types predominantly concern intra-sentential ellipses.

3.2 Ellipsis in Question Answering System

As seen in Example 1, the ellipsis in QA systems is very different from the general ellipsis. These are basically inter-sentential ellipses. The case in Example 2 doesn't come into the picture in QA systems. Examples 3 and 4 also differ a lot in structure from Example 1. The author of reference [6] mentions that 86% of the elliptical coordinations are of type D, while C accounts for 2% and E for 5.5%. So, the treatment of general ellipsis does not carry over directly to the ellipsis in QA systems.

4 Classification of elliptical queries in a question answering system

In this paper, we classify the ellipsis in a question answering system into three types. The first two types have syntactic cues. The third type is based on semantic cues.

4.1 Type 1 (Preposition Based)

Though this seems very trivial for ellipsis handling, most of the ellipsis in a domain can be handled by this type. This type of ellipsis is identified by the prepositions in the query or sentence. This is easily understood from the following example:

Example 5
Query 1: Is there any train from X (station) to Y (station)?
Query 2: To S (station)? Or From T (station)?

4.2 Type 2 (Grouping Based)

Let's see the following example:

Example 6
Query 1: At what time will X (Train) arrive?
Query 2: What about Y (Train)?

In this case there will not be any prepositions. So these can be handled only by identifying the group of Noun Phrases (NP) to which the entity belongs. One might wonder how this is different from the previous type (refer 4.1). Let us now see the following example:

Example 7
Query 1: When is the train from Bombay to Delhi?
Query 2: To Calcutta?

If we use the grouping based method then we give no importance to the preposition. This results in ambiguity about which entity the new value should replace (Bombay? or Delhi?).

4.3 Type 3 (Semantic Based)

All those ellipses which cannot be classified as either of the above two types come under this type. For this type, a semantic diagram can be built from the Entity Relationship diagram of the DBMS of the given domain. This can be easily understood by looking at the diagrams (please refer to Fig. 1 and Fig. 2).

Example 8
Query 1: When will the train X arrive?
Query 2: To which platform?

Every query can be handled by this type. But as this type relies on semantics, it gives only basic semantic relations. The first two types, which are syntactically solvable, are more accurate in giving the exact relationship.

5 Algorithm for Ellipsis handling

In this paper the ellipsis handling problem is divided into four parts. First the completeness of the queries is identified. Then the entities in the query need to be mapped to those of the domain. The queries along with their mapped entities are then analyzed. The analyzed queries are kept in memory so that the ellipsis in subsequent queries can be handled.

5.1 Identifying the complete queries

The syntactic structure could be used with its corresponding semantics to obtain the semantics for the complete sentence. In this case, the anaphoric expression is constrained to have the same semantics as the complete expression [3]. But in our case, since this is a domain specific system, the queries in the domain are limited. Finally, these have to be mapped to DBMS queries. So, a set of queries can be identified which are related to that domain and which can be mapped to DBMS queries. These can be treated as complete queries. All these complete queries are stored in the beginning. As simulating a human conversation is a very complex problem, some laborious work has to be done in the initial stages of the system. This can even be automated using speech recognition systems in the field of interest. To enact human conversation, a lot of data is required for training the system. Using speech recognition systems, the queries in the domain can be obtained, and with a little human intervention these queries can be checked for completeness. These can be used as templates for the complete queries.
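The template idea of Section 5.1 can be sketched as follows. This is a minimal illustration, not the paper's implementation: the template strings, the entity value lists and the names `ENTITY_VALUES`, `TEMPLATES`, `template_to_regex` and `match_complete_query` are all assumptions made for the example. A query counts as complete when it matches one of the stored templates with every slot filled by a known entity value.

```python
import re

# Hypothetical entity values; a real system would load these from the domain database.
ENTITY_VALUES = {
    "Train_specific": ["Delhi Express", "Train number 2039"],
    "Source": ["Delhi", "Mumbai", "Bombay"],
    "Destination": ["Delhi", "Mumbai", "Calcutta"],
}

# Hypothetical templates of complete queries, with entity slots in braces.
TEMPLATES = [
    "Will the {Train_specific} go from {Source} to {Destination}?",
    "Is there any train from {Source} to {Destination}?",
]


def template_to_regex(template):
    """Turn a slot template into a regex with one named group per slot."""
    parts = re.split(r"\{(\w+)\}", template)  # literal, slot, literal, slot, ...
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)  # literal text between slots
        else:
            alternatives = "|".join(re.escape(v) for v in ENTITY_VALUES[part])
            pattern += f"(?P<{part}>{alternatives})"
    return re.compile(pattern, re.IGNORECASE)


COMPILED = [template_to_regex(t) for t in TEMPLATES]


def match_complete_query(query):
    """Return the slot values if the query is complete, otherwise None."""
    for regex in COMPILED:
        match = regex.fullmatch(query.strip())
        if match:
            return match.groupdict()
    return None


print(match_complete_query("Is there any train from Delhi to Calcutta?"))
print(match_complete_query("To Calcutta?"))
```

A fragment such as "To Calcutta?" matches no template, so it would be passed on to the ellipsis handling steps.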
[Figure 1: Entity Relationship Diagram for Railway Reservation System. Labels recoverable from the diagram: entities Train, Station, Platform, Passenger, PNR, Booking Counter, Concession; relationships Travels, Arrives, Books, Offers, Avails.]

Example 9
1) Will the {Train_specific} go from {Source} to {Destination}?

Train_specific is a specific train {"Train number 2039", "Delhi Express", etc.}
Source is a station or place {"Delhi", "Mumbai station", etc.}
Destination is a station or place {"Delhi", "Mumbai station", etc.}

5.2 Matcher

For each entity in the domain, all the possible values for that entity are stored in the semantic graph. So, whenever a noun phrase appears, it is matched against all the possible values of each entity. Thus the noun phrases which are entities in our domain are identified. The entities need not be noun phrases, but in this paper we used only a defined set of noun phrases as the entities. The output of the matcher is given to the ellipsis handler.

5.3 Ellipsis Handler

The following methods have to be employed one after the other, in the order given.

5.3.1 Preposition Based Ellipsis

The prepositions which are important in handling ellipsis in the given domain are noted. Whenever one of these prepositions occurs before a semantic entity, the two together can be treated as a separate preposition entity, which is different from both the original entity and the preposition. The most recent value of this preposition entity is kept in memory. So, if the same type of preposition entity occurs in the next incomplete sentence, the previously entered value is replaced by the present value.

Example 10
Is there any train from X (place/station) to Y (place/station)?
To Delhi? ; {To Station_name} is together treated as the entity "destination". Then "to Y" should be replaced with "to Delhi".

5.3.2 Group Based Ellipsis

The entities which are left after processing the prepositions will fall into some group. For example "Delhi Express", "Train number 4567", etc. refer to a specific train. If an incomplete query comes, then the value for that group in the previous complete query is replaced with the new value.

Example 11
At what time will X (Group: Train_specific) arrive?
What about Y (Group: Train_specific)? This Y is substituted in the previous complete query in the place of X.

5.3.3 Semantic Based Ellipsis

In the semantic graph, some entities have relations with one another. The basic relationships between the possible semantic entities should be kept in the database from the beginning.

Generally, while speaking, more stress is put on the head noun of the sentence. So, the head noun of a complete dialog is identified. Then, whenever an incomplete dialog appears, the relationship between the head noun of the previous complete query and the head noun of the incomplete dialog is identified. If the database contains queries with only those two heads as entities, then they are returned. If no relationship exists between the two entities then null is returned.

Example 12
When will the train X (Group: Train_specific) arrive?
Train X is the head NP of the query
To which platform (Group: Platform)?
Platform is the head NP of the query

Then the relationship between the Train_specific group and the Platform group is identified. Then all the queries with only these two semantic entities are returned.

These three types (4.1, 4.2, and 4.3) are not mutually exclusive, but the order in which their solutions are applied is very important. As shown in Example 7, if the solution for the second type is applied first, an ambiguity arises. So, one has to apply the solutions for these types one after the other. As the first two types are more accurate, first apply 4.1, then 4.2. If the queries cannot be handled by these two types, then apply 4.3. This approach would handle most of the ellipsis in that domain.

5.4 Scope of the Context

It is very complex to know how many queries should be kept in memory. It depends on the type of domain. For example, the Interactive NLI agent [4] supports natural language queries and commands along with a search history, so that users can base their queries on the previous search results.

If the dialogues in the domain are kept in memory, it becomes very difficult to handle the queries. So, a hash of all the entities is maintained. That is, only entities are stored. At first, the entities should be given default values. If some other value occurs, then the most recent value for that entity is stored.
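The ordering of the three handlers in Section 5.3 and the entity hash of Section 5.4 can be sketched together as follows. This is a simplified illustration under stated assumptions, not the system described in the paper: the station list, the groups, the head nouns, the relation table and the class `EllipsisHandler` are hypothetical, and entity matching is reduced to whole-word search.

```python
import re

# Illustrative domain knowledge; every name and value here is an assumption.
STATIONS = ["Delhi", "Calcutta", "Bombay", "Bangalore"]
PREPOSITION_ENTITIES = {"to": "Destination", "from": "Source"}  # Type 1 cues
GROUPS = {"Train_specific": ["Train 1024", "Delhi Express"]}    # Type 2 groups
HEAD_NOUNS = {"platform": "Platform"}                           # Type 3 head nouns
BASIC_RELATIONS = {
    ("Train_specific", "Platform"): "To which platform will {Train_specific} arrive?",
}


def mentions(phrase, text):
    """True if phrase occurs in text as whole words, ignoring case."""
    return re.search(rf"\b{re.escape(phrase)}\b", text, re.IGNORECASE) is not None


class EllipsisHandler:
    def __init__(self, defaults=None):
        self.memory = dict(defaults or {})  # entity -> most recent value
        self.template = None                # template of the latest complete query
        self.head = None                    # head entity of the latest complete query

    def remember_complete(self, template, head, values):
        """Store a complete query; its template must only use remembered entities."""
        self.template, self.head = template, head
        self.memory.update(values)

    def resolve(self, fragment):
        """Return (ellipsis type, completed query), or None if nothing applies."""
        if self.template is None:
            return None
        # Type 1: a preposition followed by a station overwrites that slot.
        hit = False
        for prep, entity in PREPOSITION_ENTITIES.items():
            for station in STATIONS:
                if mentions(f"{prep} {station}", fragment):
                    self.memory[entity] = station
                    hit = True
        if hit:
            return "preposition", self.template.format(**self.memory)
        # Type 2: a bare value overwrites the slot of the group it belongs to.
        for group, values in GROUPS.items():
            for value in values:
                if mentions(value, fragment):
                    self.memory[group] = value
                    return "group", self.template.format(**self.memory)
        # Type 3: look up a basic relation between the two head entities.
        for noun, entity in HEAD_NOUNS.items():
            relation = BASIC_RELATIONS.get((self.head, entity))
            if relation and mentions(noun, fragment):
                return "semantic", relation.format(**self.memory)
        return None


handler = EllipsisHandler()
handler.remember_complete(
    "At what time will {Train_specific} arrive?",
    "Train_specific",
    {"Train_specific": "Train 1024"},
)
print(handler.resolve("What about Delhi Express?"))
print(handler.resolve("To which platform?"))
```

Trying the preposition step before the group step mirrors the ordering argument in the text: a fragment like "To Calcutta?" updates only the destination slot instead of being matched as a bare station name.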

[Figure 2 diagram. Labels recoverable from the diagram: Pnr, Pnr_number, Passenger name, Address, Train, Train type, Specific Train, Train name, Train number, Booking Counter, Counter id, Platform, Platform number, Concession, Concession Type, Station, Source {To station}, Destination {From station}; edges labelled R1 to R9.]

Figure 2: Semantic Graph. Edges indicate Basic Relations between the semantic entities, which are in ovals, and their attributes, which are in rectangles. An example Basic Relation between Train and Platform: To which Platform will the train arrive?
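One possible in-memory form of such a semantic graph is sketched below. It is only an illustration: the class `SemanticGraph`, its method names and the stored values are assumptions, and the single relation shown is the Train and Platform example from the caption. Entities carry their known values (used by the matcher), and each undirected edge carries the basic-relation queries that link two entities.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticGraph:
    # entity name -> set of known surface values (lower-cased)
    values: dict = field(default_factory=dict)
    # frozenset({entity_a, entity_b}) -> list of basic-relation queries
    relations: dict = field(default_factory=dict)

    def add_entity(self, entity, values):
        self.values[entity] = {v.lower() for v in values}

    def add_relation(self, a, b, query):
        self.relations.setdefault(frozenset((a, b)), []).append(query)

    def match(self, noun_phrase):
        """Matcher step: every entity whose stored values contain the phrase."""
        phrase = noun_phrase.lower()
        return sorted(e for e, vals in self.values.items() if phrase in vals)

    def basic_relations(self, a, b):
        """Queries linking the two entities, or None if they are unrelated."""
        return self.relations.get(frozenset((a, b)))


graph = SemanticGraph()
graph.add_entity("Specific Train", ["Delhi Express", "Train number 4567"])
graph.add_entity("Station", ["Delhi", "Calcutta"])
graph.add_entity("Platform", ["platform"])
graph.add_relation("Specific Train", "Platform",
                   "To which platform will the train arrive?")

print(graph.match("delhi express"))
print(graph.basic_relations("Platform", "Specific Train"))
print(graph.basic_relations("Platform", "Station"))
```

Keying edges by an unordered pair means the lookup does not depend on which of the two head nouns came from the complete query.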
In Example 1, the word "train" in query 1 is identified as the entity "Train". Similarly "Calcutta" is identified as "Destination" (Destination is an intermediate station at which the train arrives). In query 3, the word "platform" is identified as "Platform". In query 4, "Bangalore" is identified as "Destination", so Calcutta, the stored value for the entity "Destination", is replaced with Bangalore, which becomes its latest value. In query 5, "Delhi" is identified as "Destination", and Bangalore is replaced with Delhi, as Delhi is now the latest value for the entity "Destination".

The Mechanism of Ellipsis Handling

[Figure 3 diagram. Blocks, top to bottom: Natural language query, Entity Identifier, Complete query Identifier, Incomplete query, Ellipsis Classifier, Ellipsis Handler, Ellipsis free query. Side inputs: Defaults & Possible values of entities, Templates, Memory Storage, Semantic Graph.]

Figure 3: Schematic Diagram of the mechanism of ellipsis handling

6. Snapshot of the output

The following is the exact dialog with the system

user: Tell me about the availability status of all train running from secundrabad to Delhi on 03-03-05?

sys: ***Tell me about the availability status of all {train; Train} running {from secundrabad; Source} {to Delhi; Destination} on {03-03-05; Date}?

sys: THIS IS ASSUMED AS A COMPLETE QUERY
------------------------------------------------------------------
user: to calcutta ?
sys: ***{to calcutta;To_Station} ?

sys: DID YOU MEAN? Preposition based ellipsis

sys: Tell me about the availability status of all train running from secundrabad to Calcutta on 03-03-05?
sys: ***Tell me about the availability status of all {train; Train} running {from Secundrabad; Source} {to Calcutta; Destination} on {03-03-05; Date}?
------------------------------------------------------------------
user: What about on 04-03-05?
sys: ***what about on {04-03-05; Date}?

sys: DID YOU MEAN? Grouping based ellipsis

sys: Tell me about the availability status of all train running from secundrabad to Calcutta on 04-03-05?
sys: ***Tell me about the availability status of all {train; Train} running {from secundrabad; Source} {to Calcutta; Destination} on {04-03-05; Date}?
-----------------------------------------------------------------
user: To which platform?

sys: recognized the query as: ***To which {platform; Platform}?

sys: DID YOU MEAN? Semantic based ellipsis

/* Complete queries with {Train; Platform} Train is the head NP of the latest complete query and the Platform is the head NP of the previous query */

sys: to which platform will the train X (the name of the train running from secundrabad to Calcutta on 04-03-05) arrive ?
sys: ***to which {platform; Platform} will the {train; Train} arrive
-----------------------------------------------------------------

7. Evaluation

Many of the present QA systems do not handle ellipsis. So, the results of our algorithm are compared with those of the START system, which does handle ellipsis. However, the START system handles ellipsis considering only the previous query in a dialogue as the context [7]. Generally, in human conversation this is not the case. In our case, all the incomplete queries are handled until the next complete query occurs.

This approach is tested in three domains: Railway Reservation System, Course Registration System, and Library Maintenance System. For each system we took 100 test cases such that all three types of ellipsis are covered. The test cases are dialogs in that domain which have some inter-sentential ellipses as in Example 1. These are tested with our algorithm and with the START system. Tables 1-3 show the results for all three systems and types.

Table 1. Comparison with START system in Railway Reservation Domain

Railway Reservation    Type 1 Ellipsis    Type 2 Ellipsis    Type 3 Ellipsis
Total test cases       40                 35                 25
Algorithm discussed    100%               97.14%             80%
START system           57.5%              42.85%             0%

(Percentages are accuracies for each ellipsis type.)

Table 2. Comparison with START system in Course Registration Domain

Course Registration    Type 1 Ellipsis    Type 2 Ellipsis    Type 3 Ellipsis
Total test cases       35                 30                 35
Algorithm discussed    100%               93.3%              65.71%
START system           54.2%              40%                0%

(Percentages are accuracies for each ellipsis type.)

Table 3. Comparison with START system in Library Maintenance Domain

Library Maintenance    Type 1 Ellipsis    Type 2 Ellipsis    Type 3 Ellipsis
Total test cases       40                 30                 30
Algorithm discussed    100%               96.6%              83.3%
START system           52.5%              50%                0%

(Percentages are accuracies for each ellipsis type.)
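The per-type percentages in Tables 1-3 can be combined into an overall figure per domain with a little arithmetic. The sketch below assumes that each reported percentage is a rounded or truncated ratio of correctly handled cases to test cases, which the paper does not state explicitly; the names `REPORTED` and `overall_accuracy` are introduced only for this example.

```python
# (test cases, reported accuracy in percent) per ellipsis type, copied from Tables 1-3.
REPORTED = {
    "Railway Reservation": {
        "Algorithm discussed": [(40, 100.0), (35, 97.14), (25, 80.0)],
        "START system": [(40, 57.5), (35, 42.85), (25, 0.0)],
    },
    "Course Registration": {
        "Algorithm discussed": [(35, 100.0), (30, 93.3), (35, 65.71)],
        "START system": [(35, 54.2), (30, 40.0), (35, 0.0)],
    },
    "Library Maintenance": {
        "Algorithm discussed": [(40, 100.0), (30, 96.6), (30, 83.3)],
        "START system": [(40, 52.5), (30, 50.0), (30, 0.0)],
    },
}


def overall_accuracy(rows):
    """Recover whole-number correct counts and the pooled accuracy in percent."""
    correct = sum(round(cases * percent / 100) for cases, percent in rows)
    total = sum(cases for cases, _ in rows)
    return correct, total, 100 * correct / total


for domain, systems in REPORTED.items():
    for system, rows in systems.items():
        correct, total, percent = overall_accuracy(rows)
        print(f"{domain:20} {system:20} {correct}/{total} = {percent:.1f}%")
```

Under that assumption, the Railway Reservation figures correspond to 94 of 100 cases for the algorithm discussed and 38 of 100 for the START system.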

8. Future work

The endeavor from here onwards would be to identify ellipses involving NPs and verb groups, and also for semantic nets. The idea would be to make the system respond instantaneously to user queries and to use each query as a supervised input to generate a database which relates to similar patterns easily the next time they are keyed in. A speech component identifying prosody effects is also planned.

9. Acknowledgements

I would like to thank Dr. Dipti Misra Sharma and Prof. Rajeev Sangal, LTRC, IIIT-Hyderabad, who helped a great deal in this project with their feedback.

10. References

1. http://odur.let.rug.nl/~usa/LIT/chap10.htm

2. Mary Dalrymple, Stuart M. Shieber, and Fernando Pereira. 1991. Ellipsis and higher-order unification. Linguistics and Philosophy, 14:399-452.

3. Andrew Kehler. Common Topics and Coherent Situations: Interpreting Ellipsis in the Context of Discourse Inference. In Proceedings of the 32nd Annual Conference of the Association for Computational Linguistics (ACL-94), pp. 50-57, Las Cruces, June 1994.

4. Lee Jong-Hyeok, Rho Hyunchul, Park Young-Tack, Choi Joongmin, Seo Jungyu. Interactive NLI Agent For Multi-Agent Web Search Model. Geunbae, Jong-Hyeok (1998). nlp.postech.ac.kr/lab_papers/9808_iaiw_gblee.ps

5. Koeneman, Olaf, Sergio Baauw & Frank Wijnen (1998). Reconstruction in VP-ellipsis: Reflexive vs. non-reflexive predicates. Poster presented at the 11th Annual CUNY Conference on Human Sentence Processing. New Brunswick, NJ, March 19-21, 1998.

6. Charles F. Meyer, University of Massachusetts, Boston. English Corpus Linguistics: An Introduction. Series: Studies in English Language.

7. Boris Katz, MIT CSAIL. Discourse and Dialog in the START Question Answering System. SIGDial '04.

8. IR-244, 2002. Pinto, D., Branstein, M., Coleman, R., King, M., Li, W., Wei, X. and Croft, W.B. "QuASM: A System for Question Answering Using Semi-Structured Data", the JCDL 2002 Joint Conference on Digital Libraries, pp. 46-55.

9. David Ahn, Valentin Jijkoun, Gilad Mishne, Karin Müller, Maarten de Rijke, and Stefan Schlobach (Informatics Institute, University of Amsterdam). A Recall Oriented Approach to Open Domain Question Answering.

10. http://www.di.unipi.it/~scordino/pai/pai.html

11. Jay Budzik and Kristian J. Hammond. Learning for Question Answering and Text Classification: Integrating Knowledge-Based and Statistical Techniques. AAAI Workshop on Text Classification. Menlo Park, CA, 1998.

12. Sanda Harabagiu, Marius Pasca, and Steven Maiorano. Experiments with open-domain textual question answering. COLING-2000. Association for Computational Linguistics/Morgan Kaufmann, Aug 2000.

13. Sanda Harabagiu, Mihai Surdeanu, Rada Mihalcea, Roxana Girju, Vasile Rus, Finley Lacatusu, Paul Morarescu and Razvan Bunescu. Answering Complex, List and Context Questions with LCC's Question-Answering Server. Tenth Text REtrieval Conference (TREC-10). Gaithersburg, MD. November 13-16, 2001.
