
A Graph-Based Relation Extraction Method for Question Answering System


A thesis submitted in partial fulfillment of the requirements for the degree of
Master of Computer Applications

by

Athulya S. – AM.EN.P2MCA15011
Salma Shaji – AM.EN.P2MCA15022
Department of Computer Science and Applications

Amrita Vishwa Vidyapeetham,
Amrita University,
Amrita School of Engineering,
(Estd. U/S 3 of the UGC Act 1956),
Amritapuri Campus, 690525.

June 2017
AMRITA VISHWA VIDYAPEETHAM,
AMRITA UNIVERSITY
AMRITA SCHOOL OF ENGINEERING, AMRITAPURI CAMPUS

BONAFIDE CERTIFICATE

This is to certify that the thesis entitled “A Graph-Based Relation Extraction Method
for Question Answering System” submitted by Athulya S, AM.EN.P2MCA15011
and Salma Shaji, AM.EN.P2MCA15022 in partial fulfillment of the requirements for
the award of the degree of Master of Computer Applications is a bonafide record of
the work carried out under our guidance and supervision at Amrita School of
Engineering, Amritapuri.

SIGNATURE                         SIGNATURE

Project Coordinator               Project Guide
Manjusha Nair M.                  Veena G
Assistant Professor,              Assistant Professor,
Dept. of CSA                      Dept. of CSA

This thesis is approved and evaluated by us on …………………..

INTERNAL EXAMINER                 EXTERNAL EXAMINER
(Name and Signature)              (Name and Signature)
DECLARATION

We, Athulya S and Salma Shaji, hereby declare that this project entitled “A
Graph-Based Relation Extraction Method for Question Answering System” done at Amrita
Vishwa Vidyapeetham is a record of original work done by us under the guidance of
Mrs. Veena G, Department of Computer Science and Applications, Amrita School of
Engineering, Amritapuri, and that this work has not formed the basis for the award of
any degree, diploma, fellowship or similar award to any candidate in any University,
to the best of our knowledge.

Place: Amritapuri
Date:

Signature of Student:             Signature of Project Guide:


ACKNOWLEDGEMENTS

First of all, we would like to thank the Almighty for giving us the courage to complete
this project work successfully. We express our gratitude to our respected Chancellor,
Sri Mata Amritanandamayi Devi, for being the backbone that helped us complete this
project successfully.

We express our deep gratitude to Dr S.N. Jyothi, Principal, Amrita School of
Engineering, Amritapuri.

We also take this opportunity to thank Mr Ramachandra Kaimal, Chairman,
Department of Computer Science and Applications, Amrita School of Engineering,
Amritapuri, for permitting us to undertake this project.

We would also like to express our deep-felt gratitude to Mrs. Manjusha Nair and
Mrs. Kavitha K.R, Project Coordinators, Department of Computer Science and
Applications, Amrita School of Engineering, Amritapuri, for their support throughout
the project.

Our heartfelt thanks go to our internal guide, Mrs. Veena G, Assistant Professor, Amrita
School of Engineering, Amritapuri, who supported and guided us throughout the
project period with continual encouragement and a relaxed approach.

We would also like to thank our friends for their valuable input and our families
for their moral support.

We extend our thanks to Mrs. Ani R, Vice-Chairperson, Department of Computer
Science and Applications, Amrita School of Engineering, Amritapuri, for her
support throughout the project.

Last but not least, we would like to thank everyone who was directly or indirectly
involved in this project for their support.
ABSTRACT

Question Answering (QA) is the task of automatically answering a question asked by a
human in natural language, using either a pre-structured database or a collection of
documents. It is an emerging information service that has followed the popularization
of search engines. This thesis focuses on a QA system for reading comprehension tests
that picks out the sentence in the passage that best answers a given question by
extracting relations. We used a graph-based approach, in which the input document is
converted to a multigraph using the extracted relations. The goal of this work is to
create a QA system that responds quickly and whose answers are short and as accurate
as possible; ideally, only one sentence or part of one. To improve accuracy, we
included gender analysis, morphological analysis and a synonym check along with
coreference resolution. As a result, the system achieved higher accuracy than the
existing systems it was compared with.
Table of Contents

List of Tables
List of Figures
1 Introduction
1.1 Overview of Question Answering System
1.1.1 Questions
1.1.2 Answers
1.1.3 Data Sources
1.2 Background and Context
1.3 Scope and Objectives
1.4 Related Work
1.5 Overview of the Thesis
2 Problem Description
3 Methodology
3.1 Document Processing
3.1.1 Pre-processing
3.1.1.1 Tokenization
3.1.1.2 Parts-of-Speech Tagging
3.1.1.3 Named Entity Recognition
3.1.1.4 Syntactic Dependency Parsing
3.1.2 Coreference Resolution
3.1.3 Gender Analysis
3.1.3.1 Naïve Bayes Classifier
3.1.4 Relation Extraction
3.1.5 Graph Generation
3.2 Query Processing
3.2.1 Pre-processing
3.2.2 Coreference Resolution
3.2.3 Relation Extraction
3.3 Answer Extraction
4 Results and Analysis
4.1 Result
4.1.1 Document Processing
4.1.2 Query Processing
4.1.3 Answer Extraction
4.2 Evaluation Metrics
4.3 Performance Evaluation
4.4 Analysis
5 Discussion and Conclusion
5.1 Summary
5.2 Future Work
References
Appendix A: Research Paper
LIST OF TABLES

Table 1.1: Difference between search engines and QAS
Table 3.1: Types of entities and their description
Table 3.2: Types of dependency relations
Table 3.3: Example for relations extracted in triplet format
Table 3.4: Sample query triplets
LIST OF FIGURES

Figure 2.1: Reading comprehension test document and query
Figure 3.1: Overview of the proposed system
Figure 3.2: Document processing steps
Figure 3.3: Dependency graph of a sentence
Figure 3.4.a: Sentence before coreference resolution
Figure 3.4.b: Sentence after coreference resolution
Figure 3.5: Relation extraction example
Figure 3.6: Multigraph generated using the extracted relations
Figure 3.7: Query processing steps
Figure 3.8: Answer extraction phase
Figure 3.9: Workflow of the proposed system
Figure 4.1: Result of pre-processing steps for the input comprehension passage
Figure 4.2: Relations extracted from the input passage
Figure 4.3: Multigraph generated for the input passage
Figure 4.4: Pre-processing results of the input query
Figure 4.5: Sample output of the proposed system
Figure 4.6: Precision, recall and accuracy measure graph
Figure 4.7: Comparison graph in terms of accuracy
