You are on page 1of 28

Integration of Heterogeneous

Databases into XML Format with

Translator

A Project report
submitted in partial fulfilment of
the requirements for the award of the degree of
Bachelor of Technology
in
Computer Science and Engineering
by
Abhishek Singh Y2.245
Anurag Singh Y2.246
Khichade Sagar Shivdas Y2.319

Department of Computer Engineering


National Institute of Technology, Calicut
Kerala - 673601
april, 2006
Certificate

This is to certify that the project entitled “Integration of Heterogeneous Databases

into XML Format with Translator” is a bonafide record of the project under-

taken by Abhishek Singh Y2.245, Anurag Singh Y2.246, Khichade Sagar

Shivdas Y2.319, under our supervision and guidance. The project report has been

submitted to the Department of Computer Engineering of National Insti-

tute of Technology, Calicut in partial fulfilment of the award of the Degree of

Bachelor of Technology in Computer Science and Engineering.

Dr. M.P. Sebastian Mr. K. A. Abdul Nazeer


Professor and Project Guide
Head Senior Lecturer
Dept.of Computer Engineering Dept.of Computer Engineering
NIT Calicut NIT Calicut
iii

Abstract

The objective of this project is to explore a method to integrate multiple

heterogeneous databases like MySql, Oracle, Sybase, Postgre, by taking the advan-

tage of some of the latest Web technologies.It provides a global interface for these

databases, accepts a query from the user in a natural language format(english),

translates this query into SQL using a translator, then retrieves data from the

database, displays the result to the user and convert it into a unified format. The

unified format here we are using is XML.


iv

Acknowledgements

We thank Mr. K. A. Abdul Nazeer, Senior Lecturer, Department of Com-

puter Engineering, NIT Calicut for his guidance and co-operation. We also ac-

knowledge the advice and help given to us by our friends.

We would like to extend our gratitude to the entire faculty and staff of the

CSED NITC, who stood by us in all pits and falls we had to face during the

development phase of this project.

Abhishek Singh

Anurag Singh

Khichade Sagar Shivdas


v

Contents

Chapter

1 INTRODUCTION 1

1.1 Problem Specification . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.3 Literature Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.4 Organization of Report . . . . . . . . . . . . . . . . . . . . . . . . . 3

2 DESIGN 4

2.1 Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2.2 Syntax Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2.3 Translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2.4 Execution of SQL Query . . . . . . . . . . . . . . . . . . . . . . . . 6

2.5 Conversion into XML . . . . . . . . . . . . . . . . . . . . . . . . . . 6

3 IMPLEMENTATION 8

4 RESULTS 9

4.1 Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

4.2 Snapshot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

4.2.1 Configure NAT-SQL . . . . . . . . . . . . . . . . . . . . . . 9

4.2.2 Login to database . . . . . . . . . . . . . . . . . . . . . . . . 10

4.2.3 NAT-SQL main window . . . . . . . . . . . . . . . . . . . . 10


vi
4.2.4 Database view . . . . . . . . . . . . . . . . . . . . . . . . . . 13

4.2.5 Create your own SQL . . . . . . . . . . . . . . . . . . . . . . 13

4.2.6 Enter your own Query . . . . . . . . . . . . . . . . . . . . . 13

5 CONCLUSION 19

5.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

5.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

Bibliography 20
Tables

Table
Figures

Figure

2.1 Overall Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.2 Design of Translaror . . . . . . . . . . . . . . . . . . . . . . . . . . 7

4.1 Configure NAT-SQL . . . . . . . . . . . . . . . . . . . . . . . . . . 11

4.2 Login to database . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

4.3 NAT-SQL main window . . . . . . . . . . . . . . . . . . . . . . . . 14

4.4 Database view . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

4.5 Create your own SQL-1 . . . . . . . . . . . . . . . . . . . . . . . . . 16

4.6 Creating your own SQL-2 . . . . . . . . . . . . . . . . . . . . . . . 17

4.7 Enter your own Query . . . . . . . . . . . . . . . . . . . . . . . . . 18


Chapter 1

INTRODUCTION

1.1 Problem Specification

There are many different heterogeneous databases like Mysql, Oracle etc.

Here we will integrate those databases into a unified XML format. To achieve this,

we will implement a translator, which will take user query in natural language

(English) as its input and gives corresponding SQL query as output. These queries

will be applied to underlying database to fetch information from the database and

then these will be displayed to the user. At the same time fetched data will be

converted into XML format and stored in a file as a XML document.

This XML document can be used further for other XML processing like converting

it into HTML or another XML document using XSL

1.2 Motivation

Hospitals have to exchange data between them to transfer the patient to a

suitable hospital in a timely fashion, and to arrange appropriate treatment for the

patient. Taking advantage of the Internet to exchange data can accelerate the

process of transfer of patient related data, it also avoid the unnecessary repeat

examinations and decrease the wastage of medical resources. To transmit data,

we must address the problems of data exchange. The most important thing for

information interchange is that the data format is common which will reduce the
2
complexity of data exchange. The World Wide Web Consortium (W3C) in 1996

suggested that the XML standard would be the solution to this problem. In a

similar way, with the rapid growth of the Internet, lots of other organizations make

use of the Internet to provide information services. In the past few years, XML has

become the standard for exchanging data on the Internet and on intranets in the

enterprise. XML is playing a fundamental role in the integration of heterogeneous

data. Starting out as a standard data exchange format for the Web, it has become

instrumental in the development of electronic commerce applications and online

information services. As a consequence, the amount of XML data being stored and

processed is large and will be increasing at an astonishing rate. This has caused

XML data management to become a focus of research efforts in the database

community. There are several benefits. Among them:

• The data is self-describing.

• The data can be viewed with standard tools.

• Different views of the same data are easy to create with style sheets.

• The data can be manipulated with standard tools.

• XML files compress very well

To make this task more user friendly, a natural language interface can be pro-

vided.

1.3 Literature Survey

Extensive study was done on topics relevant to the project. These topics

include development of translator [2] and XML [5][6][7]. We had to look into

the syntax of input user query in natural language and SQL queries (the output

of translator) of all types for making the translator. Our implementation is in


3
PERL[1][3][4], so we learn necessary issues in Perl like pattern matching, file han-

dling, database connectivity, GUI etc. We can also apply XQUERY on XML data

[8][9][10].

1.4 Organization of Report

The remaining part of the report deals with design of the project, snapshots,

work done, results obtained, conclusion and references. Details will be explained

in the respective sections as follows:

Section 2: This section will describe the design of the whole project. It will

start from the first step in the design from where input is taken till the last step

where we obtain the output.

Section 3: This section will show the snapshots obtained by running the

program.

Section 4: This section will describe the work done in the project. It will

put light on the various stages in which the implementation was done.

Section 5: This section will mainly deal with the results which were obtained

by running the program.

Section 6: This section will conclude the whole document. It will try to put

some light on what was achieved by doing this project and what more has to be

done.

Section 7: This section has all the references which were used while studying,

designing and implementing the project.


Chapter 2

DESIGN

The details of the design stages are as follows:

2.1 Configuration

This stage takes some preliminary information from user which will be used

for database connection. This will be done by a GUI where user can enter the

required informations. These informations are user name, password, database

type, database source. number of maximum rows to be displayed and enable

learning.

2.2 Syntax Analysis

This stage checks for any syntax errors in the input user query/statement

as described in the grammar. In case of any error conforming to the grammar an

error message is reported. The tool used to implement this stage is a Perl Module

known as Parse::RecDescent.

2.3 Translation

This stage translates the input user query into SQL query after syntax anal-

ysis is over. So here input will be a query in english language conforming to the

underlying grammer and output will be a SQL query.


5

Figure 2.1: Overall Design


6
2.4 Execution of SQL Query

This stage executes the translated SQL query. It involves database connec-

tion and then passing the query to the database then get the resulting data back.

This is the stage where the information provided by user in configuration stage

will be used for database connection. All these things will be done by GUI. Result

will be displayed to user also.

2.5 Conversion into XML

This stage converts the result obtained by executing the SQL query into

XML format. It involves Perl file handling.


7

Figure 2.2: Design of Translaror


Chapter 3

IMPLEMENTATION

The project was divided into modules covering all stages and implemented

as stage by stage. The coding was done in Perl.

Coding for configuration part was done as one stage which involves GUI and file

handling.

The second stage was syntax analysis which is completed by writing the grammar

according to Parse::RecDescent specifications and SQL syntax. Coding of transla-

tor is done in this stage.

The third stage was the coding done for executing the SQL query (the output of

translator) and getting the data as a result.

The last stage was the conversion of this data into XML format. It involves file

handling in Perl only.

After doing all above we integrate it and get the code as a whole.
Chapter 4

RESULTS

4.1 Testing

The output of the program was tested with many possible input query in

natural language and compared with expected output and was observed to be

correct for all runs. Thus the code runs successfully for commands SELECT queries

with other functions like AVG, COUNT, SUM, DISTINCT. The test data was

chosen such that the errors covered could be checked. Any query which does not

match with the grammar will be reported as an error.

Thus we have successfully done our project and we use it to convert any database

into XML format.

4.2 Snapshot

4.2.1 Configure NAT-SQL

When you run NAT-SQL for the first time, following configuration window

will come up. You need to enter all the following information before continuing to

the NAT-SQL.

Source - The location or the name of the data source


10
Type - List of all the available database drivers installed in you computer.

Select the type of the database if you already know or otherwise chose the option

”Do not Know”. NAT-SQL will attempt to find the database type if you do not

know the database type.

If type of database is not in he list (i.e. ”Other”), then the NAT-SQL cannot

proceed any further until you install the appropriate database driver.

Available sources - These are the available data sources for the selected

database type. You can either select the data source from this list or type it in the

source entry field.

User - user name to access the database.

Password - password to access the database.

No of rows - Maximum number of rows to display when the SQL statement

is executed. It is strongly recommended to keep this value below 100 as this may

consume lots of computer resources. (default - 20)

Enable learning - Enables or disables the learning functionality of NAT-

SQL. (Default - Enabled)

4.2.2 Login to database

Since NAT-SQL does not store your password to access the database, you

need to enter the database password every time you run the program. Therefore

NAT-SQL will always bring up the following login window at the start-up.

User - user name will be asked here for the database connection

Password - password will be asked to validate the user.

4.2.3 NAT-SQL main window

Translate the query to SQL - As the name suggests, this button calls the

NAT-SQL parser to translate your natural language query to SQL. The results of
11

Figure 4.1: Configure NAT-SQL


12

Figure 4.2: Login to database


13
the translation is displayed the text area immediately below this button.

Clear - clear the content in the natural language query area. Create your

own SQL - This brings up the window where you can easily create SQL statements

by selecting tables and columns.

Execute SQL - Execute the SQL statement and display the results in the

results table. If here is an error in the SQL statement, an error message is displayed.

Note that you can only execute SELECT queries. Trying to execute any

other type of SQL statement will generate and error message.

Database - brings up the window that displays the database structure.

Configure - brings up the configuration window where you can change

database and other information.

4.2.4 Database view

It will give the view of the data source i.e. the names of tables and all

columns in each table

4.2.5 Create your own SQL

Here a GUI will be shown to user with table names, column names in each

tables and a space to give conditions. Here user need not remember the grammer

of the input language. He can create SQL query by just selecting the necessary

table, then columns and at last he can give conditions.

4.2.6 Enter your own Query

Here user has to enter the english language query. So it is necessary that

user should know the grammer of the input language.


14

Figure 4.3: NAT-SQL main window


15

Figure 4.4: Database view


16

Figure 4.5: Create your own SQL-1


17

Figure 4.6: Creating your own SQL-2


18

Figure 4.7: Enter your own Query


Chapter 5

CONCLUSION

5.1 Summary

We implemented our own translator as an interface to the databases which

translates natural language query into SQL commands. This translator works only

for SELECT commands because here we are not giving privilege to normal user

to make any change to the underlying databases. After retrieving the data from

database, we are converting it into XML format which can be use for further pro-

cessing. Overall this project helped us to learn the development of the translator

and issues related to XML.

5.2 Future Work

The translator works only for SELECT commands but it can be easily ex-

tended to other commands like CREATE, UPDATE, DELETE, TRUNCATE etc.

by just modifying the grammar and writing actions for the same.
Bibliography

(1) Mike Glover, Aidan H., Ed Weiss,Perl 5 How To Techmedia


1st Edition
(2) Christopher M. Frenz,Pro Perl Parsing , APress, 25 Jul 2005
(3) Alligator Descartes, Tim Bunce, Programming the Perl DBI:
Database Programming with Perl , 2nd Edition
(4) www.perl.com
(5) Elliotte Rusty Harold,XML Bible , WILEY- dreamtech India
Pvt.Ltd, 2nd Edition
(6) Giovanni Guardalben,Integrating XML and Relational
Database Technology , ACM Journal, vol. no. 10, issue March
04
(7) Anglea C. Duta, ConvRel:Relationship Conversion to XML
nested structures , ACM Journal, vol. no. sac 04 March 14-17,
2004
(8) J. Robie, J. Lapp, D. Schach,XML Query Language
(9) World Wide Web Consortium, Extensible Markup Language
1.0, W3C Recommendation
(10) World Wide Web Consortium, XPath , W3C Recommendation

You might also like