
Published by: ijcsis on Jan 20, 2011
Copyright:Attribution Non-commercial


 
(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 9, 2010
FRAGMENTATION INVESTIGATION AND EVALUATION IN DISTRIBUTED DBMS USING JDBC AND OGSA-DAI
 
Ahmed Almadi, Ahmed Manasrah, Omer Abouabdalla, Homam El-Taj
National Advanced IPv6 Center of Excellence (NAv6)
UNIVERSITI SAINS MALAYSIA
Penang, Malaysia
{almadi, ahmad, omar, homam}@nav6.org
Abstract: This research investigates and evaluates the impact of fragmentation on different database retrieval modes, based on derived horizontal fragmentation, by either generating and distributing the query to the servers (distributed search) or sending the query to the direct server (direct search). Moreover, it provides recommendations on suitable query execution strategies based on a proposed fitness fragmentation formula. Furthermore, it examines suitable technologies, such as OGSA-DAI and JDBC, in grid databases to measure the time overhead in distributed systems and grid environments under different conditions, such as database size or number of servers. The results show that fragmentation has a clearly positive effect on time performance as the database size or the number of servers increases. On the other hand, OGSA-DAI showed slower execution times in all conducted scenarios, and the difference in execution time reaches up to 70% as the size of the data or the number of servers increases. In addition, this research tested direct search against distributed search, where the former submits the query to the direct server(s) and the latter distributes the query to all servers. The results show that direct search is around 70% faster than distributed search in the JDBC case and around 50% faster in the OGSA-DAI case.
Keywords: JDBC; OGSA-DAI; Fragmentation; Distributed DBMS
I. INTRODUCTION
 
In distributed systems, data are fragmented, located, and retrieved in a transparent manner among the distributed sites [3]. Therefore, distributed data at different locations are accessed through a “View” of the data. Technically, however, a catalogue database is essential in distributed systems, since it records the physical location of each fragment [11].

Moreover, web services play a big role in retrieving the fragmented database and applying certain services. Fragmentation is considered one of the most important phases of distributed database design. Yet the impact of fragmentation on performance, in terms of increasing or decreasing the overhead, is unclear.

In addition, noticeable overheads arise from several new technologies in distributed systems, such as the OGSA-DAI middleware. One reason is that high-level technologies and processing add overhead. In particular, these overheads appear clearly when retrieving databases and accessing distributed systems. From this point of view, the main research question is: "How to reduce the overhead in the grid systems and distributed systems on distributed database retrieval service?"

Sub-questions arising from the research are as follows:
 
What is the best database size at which to apply fragmentation, considering performance, in order to generate sub-queries to the servers or just to the local server?
What is the tradeoff between transparency and performance when using JDBC and OGSA-DAI?
This paper focuses on the impact of fragmentation on different cases of database systems, and on JDBC performance under several layers of execution compared with OGSA-DAI. The evaluation is quantitative, and execution time overhead is its main attribute.

II. LITERATURE REVIEW
 
1) Distributed Query Performance
Two approaches to processing an index partitioning scheme in response to a range query are presented in [12]. These approaches are compared with other similar schemes, and their performance is assessed in terms of response time, system throughput, network utilization, and disk utilization, while varying the number of nodes and the query mix [12].

Sidell (1997) presented the distributed query processing problem in Mariposa and compared its performance with that of a traditional cost-based distributed query optimizer [9].
http://sites.google.com/site/ijcsis/ ISSN 1947-5500
 
The Mariposa system is shown to adapt to a dynamic workload. In addition, adaptive distributed query processing in Mariposa and its interaction with multi-user workloads, network latencies, and query size are investigated. The performance results show that the Mariposa system outperforms a static optimizer because it distributes work evenly among the available sites. The overhead introduced by Mariposa's bidding protocol is insignificant for large, expensive queries, and even for small queries it is outweighed by the benefits of load balancing. Comparisons based on the TPC-D benchmark show that how the approach compares with a static optimizer is influenced by network latency and query size [9].

In 2003, the paper “Distributed Query Processing on the Grid” [10] argued for the significance of distributed query processing in the Grid and discussed the facilities in the Grid that support the producers of distributed query processing. The paper describes Polar, a prototype implementation of distributed query processing running over Globus, and uses a bioinformatics case study to illustrate the benefits of the approach [10].

Oliveira et al. presented a paper in 2007 that reviews developments in grid computing and compares two algorithms for planning the distribution and parallelization of database queries on grid computing [8]. Their main contribution was to show that the partial order planning algorithm with resource and monitoring constraints is the best choice for distributing and parallelizing DBMS queries.
2) Investigating the OGSA-DAI
In grid computing, many investigations and studies of OGSA technology aim to clarify the importance of OGSA-DAI and the benefits of its services.

An overview of the design and implementation of the core components of the OGSA-DAI project was presented at a high level in [2]. The paper describes the design decisions made and the project’s interaction with the Data Access and Integration Working Group of the Global Grid Forum, and provides an overview of implementation characteristics. Implementation details are available from the project web site [2].

The OGSA-DAI team described its experience in designing and building a database access layer using the OGSI and the emerging DAIS GGF recommendations [7]. They designed this middleware to enable other UK e-Science projects that need database access. It also provides basic primitives for higher-level services such as Distributed Query Processing. In addition, OGSA-DAI intends to produce one of the required reference implementations of the DAIS specification once this becomes a proposed recommendation and, until then, to scope out their ideas, provide feedback, and contribute directly to the GGF working group [7].

That paper presents issues that arose in tracking the DAIS and OGSI specifications. These issues appeared while developing a software distribution using the Grid services model, trying to serve the needs of the various target communities, and using the Globus Toolkit OGSI core distribution [7].

In 2008, Hoarau and Tixeuil presented an experimental study of OGSA-DAI [5]. The results were quite stable, and the system performed quite well in scalability tests, which were executed on Grid5000. The study also discusses that OGSA-DAI WSI uses a SOAP container (Apache Axis 1.2.1) that suffers from severe memory leaks. The default configuration of OGSA-DAI is shown not to be affected by that problem; however, a small change in the configuration of a Web service could lead to very unreliable execution of OGSA-DAI [5].

OGSA-DQP is an open source, service-based distributed query processor that supports the evaluation of queries. OGSA-DQP operates over several layers of service-oriented infrastructure. Experiences in investigating the impact of these infrastructure layers are discussed in [1]. In addition, that study presents an understanding of the performance issues, identifies bottlenecks, and improves the response times of queries. It also describes the experiments carried out and presents the results gained [1].

However, as illustrated in Figure 1, the processes in OGSA-DAI, shown in a high-level schematic representation, pass through several layers, with an interface between each pair of layers. This accounts for the time overhead incurred when OGSA-DAI is used to communicate with and retrieve from the database [6].
Figure 1. OGSA-DAI architecture and flow processes
III. FRAGMENTATION FRAMEWORK
Derived fragmentation is a fragmentation in which databases are fragmented according to a specific attribute. Consequently, a catalogue table is required in the main server to keep track of all fragmented databases.

The catalogue database is a database that contains information about the distributed database: the site, the name of the database, and the attribute on which the database was fragmented.

We use a basic student database that consists of student information such as the name, id, and year of each student. Table I shows the catalogue table used in the research implementation.
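As a minimal sketch of the catalogue described above, the following Python snippet stores the two rows listed in Table I in an in-memory SQLite database and looks up where the fragments of a table live. The use of SQLite, the lower-casing of names, and the helper names `fragmentation_attribute` and `locate_fragments` are assumptions made for illustration; the paper does not state how the catalogue is implemented.

```python
import sqlite3

# In-memory stand-in for the catalogue database kept on the main server.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE catalogue (
           TableName     TEXT,
           FAbase        TEXT,
           Dbname        TEXT,
           Serverid      TEXT,
           FAconstraints TEXT
       )"""
)
# Rows taken from Table I; table and attribute names are normalised to
# lower case here (an assumption made for this sketch).
conn.executemany(
    "INSERT INTO catalogue VALUES (?, ?, ?, ?, ?)",
    [
        ("student", "year", "DB1", "S1", "1"),
        ("student", "year", "DB2", "S2", "2"),
    ],
)


def fragmentation_attribute(table_name):
    """Return the FAbase recorded for a table, or None if it is not catalogued."""
    row = conn.execute(
        "SELECT DISTINCT FAbase FROM catalogue WHERE TableName = ?",
        (table_name.lower(),),
    ).fetchone()
    return row[0] if row else None


def locate_fragments(table_name, fa_value=None):
    """Return (Serverid, Dbname) pairs that hold fragments of the table.

    With fa_value given, only the fragment whose FAconstraints matches is
    returned; without it, every fragment of the table is returned.
    """
    sql = "SELECT Serverid, Dbname FROM catalogue WHERE TableName = ?"
    params = [table_name.lower()]
    if fa_value is not None:
        sql += " AND FAconstraints = ?"
        params.append(str(fa_value))
    sql += " ORDER BY Serverid"
    return conn.execute(sql, params).fetchall()
```

Calling `locate_fragments("student", 2)` returns only the fragment recorded for year 2, while omitting the value returns every fragment of the table, which is the information a search engine would need to choose between contacting one server and contacting all of them.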
TABLE I. CATALOGUE DATABASE

TableName   FAbase   Dbname   Serverid   FAconstraints
Student     Year     DB1      S1         1
student     year     DB2      S2         2

Table I consists of five main attributes that hold the main information about the fragmented relations (tables) in the distributed system:

1. Table Name: contains the name of the fragmented table.
2. FAbase: contains the attribute according to which the table was fragmented.
3. Dbname: contains the name of the database on the distributed server that holds the fragmented table.
4. Serverid: contains the server ID, which refers to the server’s IP for the distributed database.
5. FAconstraints: contains the value of the fragmentation base attribute for each table.

IV. MAIN FRAMEWORK FOR ENGINE SEARCHING
In distributed systems, database retrieval is achieved in two main ways. The first is to directly access the server that contains the required data. The second is to distribute the search query to all of the distributed servers. In this research, we call these two techniques direct search and distributed search, respectively.

The system chooses the search method after analyzing the SQL statement. The decision is based on whether the fragmentation attribute appears in the query; if it does, the system chooses the direct search. The presence of the fragmentation attribute in the query means that the system can get the data directly from the distributed server(s) by obtaining the site of that server from the catalogue database, referring to the FAbase in that table. In this case, performance is higher, since it saves much of the time that the distributed search would take.

Figure 2 shows the main architecture of the research framework. The processes can be categorized into five main layers: User interface, SQL Processing, System Management, Connections (JDBC), and Servers pool.
Figure 2. Main framework and architecture
Figure 3. Flowchart for database retrieving based on catalogue database
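To make the difference between the two retrieval modes concrete, here is a small Python sketch in which in-memory SQLite databases stand in for the servers pool. The sample rows, the `year_to_server` mapping, and the function names are hypothetical and only illustrate the idea; the paper's own implementation uses JDBC and OGSA-DAI connections, which are not reproduced here.

```python
import sqlite3


def make_server(rows):
    """Create an in-memory database holding one fragment of the student table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE student (id INTEGER, name TEXT, year INTEGER)")
    db.executemany("INSERT INTO student VALUES (?, ?, ?)", rows)
    return db


# Hypothetical servers pool: each fragment holds the students of one year.
servers = {
    "S1": make_server([(1, "Ali", 1), (2, "Sara", 1)]),
    "S2": make_server([(3, "Omar", 2)]),
}
# Fragmentation attribute value -> server id, as the catalogue would record it.
year_to_server = {1: "S1", 2: "S2"}


def direct_search(sql, params, year):
    """Run the query only on the server that holds the requested year."""
    server_id = year_to_server[year]
    return servers[server_id].execute(sql, params).fetchall()


def distributed_search(sql, params=()):
    """Run the same query on every server and concatenate the results."""
    results = []
    for server_id in sorted(servers):
        results.extend(servers[server_id].execute(sql, params).fetchall())
    return results
```

In this sketch the direct search issues one query, whereas the distributed search issues one query per server, which is the source of the extra time that grows with the number of servers.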
A. SQL Processing Search
In the SQL processing step, the engine performs several procedures to process the SQL statement:

1. Getting the SQL statement from the web server: in the interface, the user writes the SQL statement for retrieving certain data as a string.
2. Select checking (validation): a searching method checks whether the statement starts with “SELECT”. If yes, continue to the next step; otherwise, give an error message and return to the main page.
3. Table Name Searching: after getting the SQL statement, a table searching method is called. Its main job is to search for the table's name inside the SQL statement (string).
4. Checking Name: checks whether the table name was found (the SQL statement is correct so far). If yes, save the table name and continue to the next step; otherwise, give an error message and return to the main page.
5. Retrieving Fragmentation Attribute (FA): get the FA style for that table from the catalogue database, which is stored in the main server.
6. FA Searching: another searching method is called to search for the FA style in the SQL statement.
7. Found: if the FA was found in the SQL statement, the system chooses the direct search; otherwise it chooses the distributed search.
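The steps above can be sketched as a single Python function. This is an illustration under stated assumptions rather than the authors' engine: the catalogue is reduced to a dictionary from table name to fragmentation attribute, the table name is taken to be the identifier after `FROM`, and the FA is searched for only in the `WHERE` clause (the paper says only that it is searched for in the SQL statement). The function name `choose_search` is hypothetical.

```python
import re


def choose_search(sql, catalogue):
    """Follow the seven processing steps and return the chosen search mode.

    catalogue maps a lower-case table name to its fragmentation attribute.
    Returns ("direct" | "distributed", table_name) or raises ValueError.
    """
    statement = sql.strip()

    # Step 2: validation, the statement must start with SELECT.
    if not re.match(r"select\b", statement, re.IGNORECASE):
        raise ValueError("only SELECT statements are supported")

    # Steps 3-4: search for the table name after FROM and check it was found.
    match = re.search(r"\bfrom\s+([A-Za-z_]\w*)", statement, re.IGNORECASE)
    if match is None:
        raise ValueError("no table name found in the statement")
    table = match.group(1).lower()

    # Step 5: retrieve the fragmentation attribute (FA) from the catalogue.
    fa = catalogue.get(table)
    if fa is None:
        raise ValueError(f"table {table!r} is not in the catalogue")

    # Steps 6-7: search for the FA in the WHERE clause (an assumption of
    # this sketch) and choose the search mode accordingly.
    where = re.split(r"\bwhere\b", statement, maxsplit=1, flags=re.IGNORECASE)
    condition = where[1] if len(where) > 1 else ""
    if re.search(rf"\b{re.escape(fa)}\b", condition, re.IGNORECASE):
        return "direct", table
    return "distributed", table
```

Restricting the FA search to the `WHERE` clause is a design choice of the sketch: a query that merely selects the `year` column gives no value with which to pick a server, so it falls back to the distributed search.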