
International Journal of Computer Information Systems, Vol. 3, No. 6, 2011

Structure of XML Database Using Xindice


Mr. Pravin R. Nerkar
Department of Information Technology, Sipnas C.O.E.T., Amravati, India. nerkarpravin@yahoo.com

Abstract

Apache Xindice (pronounced zeen-dee-chay) is a native XML database, that is, a database designed from the ground up to store XML data. An XML database is like any other database system, the only difference being that the data is stored in XML format. XML has become a norm for data transport, so applications increasingly need to accept XML input directly and to return XML output. In this project we develop a GUI-based front end for an XML database maintained using Xindice. The XQuery language will be used to send queries to the database. From the results returned by the database, parameters such as the execution time and the size of the query will be calculated. We will then try to optimize these parameters by modifying the query, and the results of the modified query will be tested against those of the original query. An XML database will be developed on Xindice from an existing relational database, and this database will then be used for query processing and optimization. The overall aim of the study is to understand XML databases.
Keywords: Xindice; Collection; Document; XML File

Prof. Ms. S. S. Dhande


Department Of Computer Science & Engineering Sipnas C.O.E.T., Amravati. Amravati, India. dhande_123@rediffmail.com

I. INTRODUCTION

A. Introduction

An XML database is like any other database system, the only difference being that the data is stored in XML format. XML has become a norm for data transport, so applications increasingly need to accept XML input directly and to return XML output. In such cases, which are becoming more and more common, an XML database is of great help because it spares you from attending to how the mappings actually take place. Apache Xindice (pronounced zeen-dee-chay) is a native XML database, that is, a database designed from the ground up to store XML data. The benefit of a native solution is that you don't have to worry about mapping your XML to some other data structure: you insert the data as XML and retrieve it as XML. You also gain a lot of flexibility through the semi-structured nature of XML and the schema-independent model used by Xindice[1]. This is especially valuable when you have very complex XML structures that would be difficult or impossible to map to a more structured database. Similar to JDBC and ODBC for relational databases, there is the XAPI interface for XML databases, which provides implementation-independent access to the XML database.

Here are some key features of Apache Xindice:

- Document Collections: Documents are stored in collections that can be queried as a whole. You can create collections that contain only documents of the same type, or you can create a collection to store all your documents together. The database doesn't care.
- XPath Query Engine: To query the document collections you use XPath as defined by the W3C. This provides a reasonably flexible mechanism for querying documents by navigating and restricting the result tree that is returned.
- XML Indexing: To improve the performance of queries over large numbers of documents, you can define indexes on element and attribute values. This can dramatically speed up query response time.
- XML:DB XUpdate Implementation: When you store XML in the database, you may want to change that data without retrieving the entire document. XUpdate is the mechanism to use for server-side updates of the data. It is an XML-based language for specifying XML modifications, and it allows those modifications to be applied to entire document collections as well as to single documents.
- Java XML:DB API Implementation: For Java programmers, Xindice provides an implementation of the XML:DB API. This API is intended to bring portability to XML database applications, just as JDBC has done for relational databases. Most applications developed for Xindice will use the XML:DB API.

II. LITERATURE REVIEW / RELATED WORKS

A. Overall Xindice Architecture

Xindice is a native XML database engine written entirely in Java. As such, it must always be hosted by a Java Virtual Machine (JVM)[2].
When running, a Xindice instance stores a number of data items in Java objects inside the JVM, the most important of which are:

- Object representation of the collection hierarchy

December Issue

Page 51 of 72

ISSN 2229 5208

- Client connection state information
- Various cached data items

In addition, Xindice needs access to disk files containing the XML data and related metadata. The files are stored inside a directory hierarchy on disk that starts at a location called the database root.

1. Access modes: Xindice can be set up to run in a JVM in two different ways, depending on how clients want to use it.

In embedded mode, a complete Java application sets up a Xindice instance in its own JVM. Only that one Java application is able to access and manipulate the data in Xindice. Clients using the XML:DB API use the embedded driver to access the Xindice instance running inside the same JVM as the host application.

In server mode, Xindice runs as a standard J2EE web application in a web application container such as Apache Tomcat. In this mode, the JVM hosting Xindice is in fact the JVM running the web application container. Clients connect to Xindice from different JVMs, possibly located on different machines, using XML-RPC, a remote procedure call standard designed to work on top of HTTP (which is why Xindice is packaged as a web application in this mode).

B. Organization Of Collections

Logically, all XML data stored in Xindice is organized into a hierarchy of collections. A collection is exactly what its name suggests: it contains any number of XML documents and can, in addition, contain its own child collections, thus providing a hierarchy. The "root" collection is also called the database[3]. It is special in that:

- It has no parent.
- It can contain no XML documents of its own.
- It only has child collections.

Each collection in the database is represented in Java by an object of class org.apache.xindice.core.Collection.
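The hierarchy rules described above can be pictured with a small stand-alone Java sketch. The class below (ToyCollection, a hypothetical name) is not Xindice's org.apache.xindice.core.Collection and makes no claim about its internals; it only assumes the rules stated in this section: a collection holds documents and child collections, and the root collection holds child collections only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical toy model of a collection hierarchy; NOT the Xindice implementation. */
public class ToyCollection {
    private final String name;
    private final ToyCollection parent; // null for the root ("database")
    private final Map<String, ToyCollection> children = new LinkedHashMap<>();
    private final Map<String, String> documents = new LinkedHashMap<>(); // key -> XML text

    private ToyCollection(String name, ToyCollection parent) {
        this.name = name;
        this.parent = parent;
    }

    /** Creates a root collection, which has no parent. */
    public static ToyCollection root(String name) {
        return new ToyCollection(name, null);
    }

    /** Returns the child collection with this name, creating it if needed. */
    public ToyCollection addChild(String childName) {
        return children.computeIfAbsent(childName, n -> new ToyCollection(n, this));
    }

    /** Stores a document; the root may only contain child collections. */
    public void addDocument(String key, String xml) {
        if (parent == null) {
            throw new IllegalStateException("the root collection cannot hold documents");
        }
        documents.put(key, xml);
    }

    public int documentCount() {
        return documents.size();
    }

    /** Path from the root, for example /db/student. */
    public String path() {
        return (parent == null ? "" : parent.path()) + "/" + name;
    }

    public static void main(String[] args) {
        ToyCollection db = ToyCollection.root("db");
        ToyCollection student = db.addChild("student");
        student.addDocument("add12", "<student><name>A</name></student>");
        System.out.println(student.path() + " holds " + student.documentCount() + " document(s)");
    }
}
```

The names db, student and add12 are only sample values; any attempt to call addDocument on the root throws, which mirrors the rule that the database itself stores no documents.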
Like many Java objects inside Xindice, a collection object is initialized using an XML configuration description, a piece of XML describing the properties of the collection. This XML configuration is modeled in Java as an object of class org.apache.xindice.util.Configuration. To set up the configuration of a collection, Xindice calls the collection's setConfig() method, passing it an appropriately obtained org.apache.xindice.util.Configuration object.

1) The Database Organization: The database, or "root" collection, is the Java object that provides the link to everything else used by the Xindice instance. When Xindice starts, its first act is to create and initialize an object of class org.apache.xindice.core.Database, which extends org.apache.xindice.core.Collection. The database object is initialized using an XML configuration file that is obtained from outside the database (i.e., it is stored simply as a file somewhere). In the case of an embedded Xindice instance, the file is referenced using the Java property xindice.configuration, whereas in server mode the file is referenced by the parameter xindice-configuration in the Xindice web application's web.xml file. The format of the XML configuration file used to initialize the database object is as follows:

    <xindice>
      <root-collection dbroot="./db/" name="db">
        <queryengine>
          <resolver autoindex="false" class="org.apache.xindice.core.query.XPathQueryResolver"/>
          <resolver class="org.apache.xindice.core.xupdate.XUpdateQueryResolver"/>
        </queryengine>
      </root-collection>
    </xindice>

In fact, if the XML configuration file cannot be found during initialization, then rather than throw an error, Xindice assumes a default configuration, which is exactly the one shown above.

2) The database root directory: This directory contains all data and metadata for the XML content of the database. The directory structure inside the database root directory reflects the child collection structure of the database. So if there is a child collection named mycol in the database, the database root directory will contain a subdirectory named mycol, and so on. Each collection's directory contains at a minimum a file with the extension .tbl that holds all the XML documents stored in that collection. The file is not human-readable.

3) The System Collection: One special collection, called system, always exists within a Xindice database. When the Xindice database is initialized, it automatically also loads the system collection, as this is known to always exist. The structure of the system collection is simple: it contains no documents of its own, but contains two child collections, SysConfig and SysSymbols. The SysConfig collection[3] contains exactly one document, called database.xml.
SysSymbols contains various documents that are in fact the symbol tables used for storing the element and attribute names of all XML content in the database. The document database.xml is the XML configuration file that is used to initialize all other collections in the database. It is located in the database itself, because it needs to be updated each time collections are added to or removed from the database. Its structure is shown below (you can check your own configuration by issuing the command-line tool invocation xindice rd -c /db/system/SysConfig -n database.xml):

    <database name="db">
      <collections>
        <collection compressed="true" name="james">
          <filer class="org.apache.xindice.core.filer.BTreeFiler" />
          <indexes>
            <index class="org.apache.xindice.core.indexer.ValueIndexer" name="myidx" pattern="sub" />
          </indexes>
          <collections>
            <collection compressed="true" name="sub">
              <filer class="org.apache.xindice.core.filer.BTreeFiler" />
              <indexes />
            </collection>
          </collections>
        </collection>
        <collection compressed="true" name="james_sub">
          <filer class="org.apache.xindice.core.filer.BTreeFiler" />
          <indexes />
        </collection>
      </collections>
    </database>

As you can see, it is here that the XML configuration for all remaining collections is stored. Note that the system collection is not mentioned here. The XML configuration for the system collection is hard-coded into Xindice as follows:

    <collection name="system">
      <!-- No filer for system collection: it contains no documents itself -->
      <collections>
        <collection name="SysSymbols" compressed="true">
          <filer class="org.apache.xindice.core.filer.BTreeFiler" />
          <symbols>
            <symbol name="symbols" id="0" />
            <symbol name="symbol" id="1" />
            <symbol name="name" id="2" />
            <symbol name="id" id="3" />
            <symbol name="nsuri" id="4" />
          </symbols>
        </collection>
        <collection name="SysConfig" compressed="false">
          <filer class="org.apache.xindice.core.filer.BTreeFiler" />
        </collection>
      </collections>
    </collection>

The only way to modify this configuration is to change the Xindice source code (the org.apache.xindice.core.SystemCollection class) and recompile.

4) Other Collections: The XML configuration data used to initialize collection objects in Java (of class org.apache.xindice.core.Collection)[4] is, as shown in the examples above, located in an XML document in the system collection or, in the case of the system collection itself, hard-coded into Xindice. The important aspects of this configuration data are:

- A collection is represented by a collection configuration element.
- The name attribute indicates the collection's name.
- The compressed attribute, usually true, indicates how XML data should be encoded by the collection's filer.
- Each collection element contains a filer element. This element tells the collection the name of a Java class that it can use to read its own data file (the file with the .tbl extension mentioned earlier). org.apache.xindice.core.filer.BTreeFiler is the most common, and indeed the standard, filer class used in Xindice.
- If a collection has no filer (because the filer element is missing, or because the Java class it points to could not be loaded), then it cannot store any XML documents. That is the case, for example, for the database (root collection) and the system collection.
- A collection element may contain a collections element, which contains one collection element per child collection of the collection under discussion.
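To make the nesting of collection, filer and collections elements concrete, the following sketch walks a database.xml-style fragment with the JDK's built-in DOM parser and prints one line per collection. It is an illustration only, not Xindice code: the class name CollectionConfigWalker and its methods are hypothetical, and the sample XML is a shortened copy of the configuration shown earlier.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Illustrative reader for a database.xml-style configuration; not Xindice code. */
public class CollectionConfigWalker {

    /** Returns one line per collection: "<path> filer=<class or none>". */
    public static List<String> describe(String xml) throws Exception {
        Element database = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        List<String> out = new ArrayList<>();
        walk(database, "/" + database.getAttribute("name"), out);
        return out;
    }

    // Visits the <collections>/<collection> children of the given element, depth first.
    private static void walk(Element parent, String parentPath, List<String> out) {
        Element collections = firstChild(parent, "collections");
        if (collections == null) {
            return;
        }
        for (Node n = collections.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && "collection".equals(n.getNodeName())) {
                Element c = (Element) n;
                String path = parentPath + "/" + c.getAttribute("name");
                Element filer = firstChild(c, "filer");
                out.add(path + " filer=" + (filer == null ? "none" : filer.getAttribute("class")));
                walk(c, path, out);
            }
        }
    }

    // Returns the first direct child element with the given tag name, or null.
    private static Element firstChild(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && tag.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<database name=\"db\"><collections>"
                + "<collection compressed=\"true\" name=\"james\">"
                + "<filer class=\"org.apache.xindice.core.filer.BTreeFiler\"/>"
                + "<collections><collection compressed=\"true\" name=\"sub\">"
                + "<filer class=\"org.apache.xindice.core.filer.BTreeFiler\"/>"
                + "</collection></collections></collection>"
                + "</collections></database>";
        describe(xml).forEach(System.out::println);
    }
}
```

A collection without a filer element is reported with filer=none, which corresponds to the case described above in which the collection cannot store documents of its own.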

C. XPath Queries

The org.apache.xindice.core.query.XPathQueryResolver class implements a query resolver for the XPath language[7]. A query resolver provides two query methods: one for immediate execution and another that compiles and stores the query for later invocation. Internally, the XPath query resolver always compiles queries into org.apache.xindice.core.query.XPathQueryResolver.XPathQuery objects. It then may or may not execute the query immediately, depending on which method was called on the resolver. Analyzing and compiling the query is actually handled by Xalan, not Xindice: Xalan contains XPath manipulation and evaluation classes, and Xindice uses these.

When an XPath query needs to be evaluated, a set of candidate documents is selected from the collection. The XPath is then evaluated using the Xalan classes against each of these documents in turn: the document is loaded into a DOM tree using the compressed DOM and B-tree filer classes, and the XPath is evaluated against this DOM tree. The results of all evaluations are aggregated and returned. It follows that there is no performance gain in using Xindice to evaluate an XPath against a single document; evaluating it against a parsed XML file would give exactly the same performance. Xindice's main contribution lies in searching through a large collection of documents, because in this case it can use indexes to intelligently select a set of candidate documents.

1) Selecting candidate documents from a collection: When performing a query, it is possible to specify which documents should be considered. If this is done, Xindice uses the provided set as the candidate set and executes the XPath query against each of these documents, reading them into memory first. If, however, no explicit set of documents is specified, Xindice tries to locate an appropriate index based on the XPath query. This index can then provide an intelligent set of candidate documents, and the XPath is evaluated against only these documents. If no appropriate index is found, Xindice resorts to "brute force": it evaluates the XPath expression against every document in the collection, thus effectively reading in, parsing and searching each document.

III. ANALYSIS OF PROBLEM

A. Analysis

Xindice provides many features for accessing and updating data in the XML database. There are query languages designed specifically for writing and executing queries on the database, such as XPath, XUpdate and XQuery. The data retrieved from the database can be presented in various forms, such as records and B-trees. Although I have reviewed XML databases and Xindice, I did not come across any graphical user interface (GUI) front end to Xindice, which is what I intend to develop in my project work. Figure 1 shows the process of accessing data from a browser-based application developed in PHP and deployed on an Apache server. The request is sent through HTTP to the server, which in turn fires an XPath query at Xindice[9]. Xindice returns the XML results to the browser application, converting them to a DHTML document.

Figure 1. Architecture of Xindice

B. System Requirement

After analyzing the whole system, I found the following hardware and software requirements.

Minimum hardware requirements:
Processor: 2 GHz Pentium IV
Main Memory: 2 GB RAM

Software requirements:
Operating System: Windows XP, Windows 7
Programming Language: Java
Setup: Xindice version 1.0

IV. IMPLEMENTATION

In the proposed work I have implemented a user-friendly graphical interface to the Xindice database system that performs operations such as accessing XML documents and adding them to the database. The application also manages the collections stored in Xindice: it can create, add and delete collections. The application is developed in Java and has Explorer-like windows with a panel that shows the collections and the documents inside a particular collection. After listing the documents, we can view each one as XML and as a tree. There is also a feature for writing XPath queries, executing them against the database and displaying the results in an output window.

A. Structure Of Database

Figure 2. Structure Of Database

To create the structure of the database, we first create a user collection and then add XML documents (that is, XML files) to it. After creating them, we can also delete a particular document or collection[10].

1) Adding a Collection: We can add a collection to the root collection with

    xindiceadmin.bat add_collection -c /db -n

where
-c  The collection context under which to create the new collection.
-n  The name of the collection to create.

2) Deleting a Collection: We can delete a collection from the root collection with

    xindiceadmin.bat delete_collection -c /db -y -n

where
-c  The collection context under which to delete the collection.
-n  The name of the collection to delete.

3) List Collections: We can view the list of collections in the root collection with

    xindiceadmin.bat list_collections -c /db

where
-c  The collection context under which all sub-collections are listed.
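A GUI front end of the kind described here could assemble these collection commands programmatically. The sketch below is an assumption about how that might look, not the author's implementation: the class and method names are hypothetical, it assumes the -n flag is followed by the collection name as the flag descriptions above indicate, and it only builds the argument lists without launching any process.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that assembles the collection commands listed above. */
public class CollectionCommands {
    // Script name as used in this section; adjust for the local installation.
    private static final String ADMIN = "xindiceadmin.bat";

    /** add_collection: -c is the parent context, -n the new collection's name. */
    public static List<String> add(String context, String name) {
        return Arrays.asList(ADMIN, "add_collection", "-c", context, "-n", name);
    }

    /** delete_collection: the -y flag is passed as it appears in the command above. */
    public static List<String> delete(String context, String name) {
        return Arrays.asList(ADMIN, "delete_collection", "-c", context, "-y", "-n", name);
    }

    /** list_collections: lists the sub-collections of the given context. */
    public static List<String> list(String context) {
        return Arrays.asList(ADMIN, "list_collections", "-c", context);
    }

    public static void main(String[] args) {
        // "student" is only a sample collection name.
        System.out.println(String.join(" ", add("/db", "student")));
        System.out.println(String.join(" ", delete("/db", "student")));
        System.out.println(String.join(" ", list("/db")));
    }
}
```

Each returned list could be handed to java.lang.ProcessBuilder to run the command; keeping the arguments as a list rather than one string avoids quoting problems when a name contains spaces.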


4) Adding a Document: Adding a document to a collection requires two parameters: the collection it will be stored under, and the file path to the document.

    xindice.bat add_document -c /db/ -f

where
-c  The collection context under which to add the document.
-f  The complete file path to the document being added.

5) Deleting a Document: Deletes an existing document from a collection or nested collection within the database[3].

    xindice.bat delete_document -c /db/ -y -n

where
-c  The collection context under which to delete the document.
-n  The key of the document to be deleted.

6) Retrieving a Document: Retrieves an existing document from a collection or nested collection within the database. The complete path where the document will be stored is required.

    xindice retrieve_document -c /db/ -n -f

where
-c  The collection context under which to retrieve the document.
-n  The key of the document to be retrieved.
-f  The file path to store the document under.

B. Query Processing

Figure 3. Structure Of XPath

In an XPath query we write the root node name of a particular XML file. During processing, that name is searched for in the corresponding database file and the whole root node structure is retrieved.

The implemented GUI looks as follows.

Screenshot 1:- Basic GUI

V. EXPERIMENTAL RESULTS

This section presents screenshots of the detailed results of the proposed system in order to demonstrate the complete process. First we have to start the server and open the GUI.

Screenshot 2:- Basic GUI

There are buttons to create a collection, delete a collection, add a document, delete a document, etc., as shown in Screenshot 2. To create a collection, click the Create Collection button; an input box appears in which we write the name of the collection (e.g., student), as shown in Screenshot 3.


Screenshot 3:- Adding collection

Afterwards we can see the created collection in the list, as shown in Screenshot 4.

Screenshot 4:- Added collection

Then we can add an XML file, that is, a document (e.g., add12), to that collection (e.g., student) using the Add Document button. After clicking the button, an input box appears again, in which we write the name of the collection and choose the path of an XML file. After adding documents to a collection, we can see how many documents it contains by selecting the collection and clicking Select Collection. After listing the documents, we can view each one as XML and as a tree, as shown in Screenshots 5 and 6.

Screenshot 5:- Tree view

Screenshot 6:- XML View

This covers creating a collection, adding a document, and viewing the XML and tree views. In the same way, we can delete collections and documents using the buttons. To delete a collection, click the Delete Collection button; an input box opens in which we write the name of the collection to delete.

Screenshot 7:- Deleting collection


Lastly, to delete a document, click the Delete Document button; an input box opens again, in which we write the name of the collection where the document is stored and then the name of the document, and the document is deleted. As described for the XPath query structure, we write the name of the root tag and the result is then displayed. The XPath result produced by the implementation is shown in Screenshot 8.

Screenshot 8:- Xpath Result

This result shows that the GUI is much handier for users.

A. Advantages Of XML Database Of Xindice Versus Other Databases:

- An XML database is more secure than other databases, since we cannot see where the collections are created.
- It is faster in execution.
- It requires very little space to install on a system compared with other databases such as SQL databases.
- As all data is in XML format, it requires less space to store the data.
- It is freely available.

VI. CONCLUSION AND FUTURE SCOPE

Traditional database systems occupy a large amount of disk space after installation, and creating databases and populating them with data takes up more. Many database systems slow down the computer's operation. Accessing and managing the database through insert, delete and update operations may also be time-consuming. Native XML database systems are implemented to store large amounts of data using XML documents. XML databases have many advantages over traditional relational databases: they take very little disk space, since all XML documents are in plain text format, and they are very easy to manage and maintain. In a nutshell, the XML database can ideally replace conventional database systems. The proposed project work would be developed as a user-friendly front end for the Xindice database, which could be used by organizations, institutions or any establishment where the data is in very large volumes and storage space is a major criterion. The GUI-based tool will replace the command-line operations of the Xindice system and reduce the tedious and lengthy queries fired at the database; at the same time, it would generate interest in and awareness of the Xindice system and native XML databases among database users and administrators. In future, big projects that require large databases can be shifted to Xindice.

REFERENCES

[1] S. Helmer, C. Kanne, G. Moerkotte. Isolation in XML Bases. Technical Report, University of Mannheim, 2001.
[2] E. M. Dashofy, A. Hoek, R. N. Taylor. A Highly-Extensible, XML-Based Architecture Description Language. Proc. of Working IEEE/IFIP Conference on Software Architecture, 2001.
[3] Apache Xindice native XML database. http://xml.apache.org/xindice/index.html
[4] J. Vana. Integrity of XML Data (in Czech). Master Thesis, Dept. of Software Engineering, Charles University, Prague, 2001.
[5] N. Fuhr, N. Gövert, G. Kazai, M. Lalmas. Initiative for the Evaluation of XML Retrieval (INEX). Proceedings of the First INEX Workshop, ERCIM Workshop Proceedings, ERCIM, Sophia Antipolis, France, 2003.
[6] Scott Boag, Don Chamberlin, Mary F. Fernandez, Daniela Florescu, Jonathan Robie, Jerome Simeon. XQuery 1.0: An XML Query Language. W3C Recommendation, 23 January 2007.
[7] Anders Berglund, Scott Boag, Don Chamberlin, Mary F. Fernandez, Michael Kay, Jonathan Robie, Jerome Simeon. XML Path Language (XPath) 2.0. W3C Recommendation, 23 January 2007.
[8] Andreas Laux, Lars Martin. XUpdate. Working Draft, 2000-09-17. http://xmldb-org.sourceforge.net/xupdate/xupdate-wd.html
[9] E. Rahm, H. Do, S. Maßmann. Matching Large XML Schemas. SIGMOD Record 33(4), December 2004.
[10] Elliotte Rusty Harold. Managing XML Data: Native XML Databases: Theory and Reality. 6 June 2005.

AUTHORS PROFILE

1) Mr. Pravin R. Nerkar has completed his B.E. (I.T.) and is pursuing his M.E. (I.T.) at Sipnas C.O.E.T., Amravati (Amravati University).


2) Ms. S. S. Dhande has completed her M.E. (Computer Science & Engg.) and is working as an Associate Professor at Sipnas C.O.E.T., Amravati.
