Published by ijcsis on Jul 07, 2011. Copyright: Attribution Non-commercial.
(IJCSIS) International Journal of Computer Science and Information Security, Vol. 9, No. 6, 2011

Performance Prediction of Single Static String Algorithms on Cluster Configurations
Prasad J. C.
Research Scholar, Dept. of CSE, Dr. M.G.R University, Chennai, cum Asst. Professor, Dept. of CSE, FISAT, Angamaly, India. cheeeran.prasad@gmail.com

K. S. M. Panicker
Professor, Dept. of CSE, Federal Institute of Science and Technology [FISAT], Angamaly, India
Abstract—This paper studies various factors in the performance of static single-pattern searching algorithms and predicts the searching time for the operation performed in a cluster computing environment. A master-worker model of parallel computation and communication is designed for the searching algorithms with MPI technologies. Prediction and analysis are based on the KMP, BM, BMH, ZT, QS, BR, FS, SSABS, TVSBS, ZTBMH and BRFS algorithms. Performances are compared and the results discussed; both theoretical and practical results are presented. This work implements the same algorithms in two different cluster architecture environments, Dhakshina-I and Dhakshina-II, which has improved the reliability of the prediction method. The results of the algorithms help us to predict and apply this technology to text search, DNA search and Web-related fields in a cost-effective manner.

Keywords—Static String Searching, Beowulf Cluster, Parallel Programming
 
I. INTRODUCTION
The importance of pattern searching and its prediction is increasingly relevant with the latest advancements in DNA sequencing, web search engines, database operations, signal processing, error detection, and speech and pattern recognition, areas that require the pattern searching problem to process terabytes of data. Most researchers usually focus on achieving high throughput with expensive software and hardware resources. The cluster architecture considered here is based on the Beowulf architecture [1] with open source technologies and 1 Gbps networks. During the HPL (Linpack) benchmark test [2], the system ran at 80 gigaflops with 32 nodes. This paper therefore focuses on the implementation details of high-speed but low-cost string matching with limited resources. The implementation is based on Message Passing Interface (MPI) standard [3, 4, 5] parallel programming technology.

II. MASTER-WORKER MODEL STATIC PATTERN SEARCHING ENVIRONMENT

A single string pattern matching problem can be defined as follows. Let T be a large text of n characters and P a pattern of length m: T = T[1]T[2]...T[n] and P = P[1]P[2]...P[m]. The characters of both T and P belong to a finite alphabet S, and m << n. The search process identifies all occurrences of the pattern P in the text T. Two types of input data are considered for the evaluation of the algorithms: natural-language strings and DNA sequence strings. The actual task of searching is done in parallel among the processors 0 to p-1.

The master node decomposes the text into r subtexts and distributes them to the available workers (nodes) [6]. Each subtext contains t = [(n - m + 1) / r] + m - 1 successive characters of the complete text, where t denotes the subtext length. There is an overlap of m - 1 characters between successive subtexts, so there is a redundancy of r(m - 1) characters in processing. Since the objective is to compare the results of searching with different algorithms, this redundancy of searching is not significant for the system [7].
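The decomposition into r overlapping subtexts of [(n - m + 1) / r] + m - 1 characters can be sketched as follows. This is an illustrative sketch, not the paper's MPI code; the helper names `partition_text` and `count_occurrences` are hypothetical:

```python
def partition_text(text, m, r):
    """Split text into r subtexts, each overlapping its successor by m-1
    characters, so no occurrence spanning a boundary is missed."""
    n = len(text)
    # "fresh" characters per worker: ceil((n - m + 1) / r)
    chunk = -(-(n - m + 1) // r)
    subtexts = []
    for i in range(r):
        start = i * chunk
        end = min(start + chunk + m - 1, n)  # extend by the m-1 overlap
        if start < n:
            subtexts.append(text[start:end])
    return subtexts

def count_occurrences(subtext, pattern):
    """Count (possibly overlapping) occurrences of pattern in subtext."""
    count, pos = 0, subtext.find(pattern)
    while pos != -1:
        count += 1
        pos = subtext.find(pattern, pos + 1)
    return count

text = "abracadabra" * 3
pattern = "abra"
# Summing per-subtext counts matches a direct search over the whole text,
# because an m-character match cannot fit entirely inside an (m-1)-character overlap.
total = sum(count_occurrences(s, pattern)
            for s in partition_text(text, len(pattern), 4))
```

Because a match of length m cannot lie wholly within an overlap of m - 1 characters, each occurrence is counted exactly once across the workers, which is why the redundancy affects only work, not correctness.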
III. RELATED WORKS

Smith [8] compared the theoretical and practical running times of the Knuth-Morris-Pratt (KMP), Boyer-Moore (BM) and Brute-Force (BF) algorithms. Hume and Sunday [9] constructed a taxonomy and explored the performance of most existing versions of the single-keyword Boyer-Moore pattern matching algorithm; their experiments selected efficient versions for use in practical applications. Pirklbauer [10] compared several versions of the Knuth-Morris-Pratt algorithm, several versions of the BM algorithm [11] and the Brute-Force algorithm. Since Pirklbauer did not classify the algorithms, it is difficult to compare them, and his testing of the Boyer-Moore variants is not as extensive as the Hume and Sunday taxonomy.

A comparative study on the performance of today's most prominent retrieval models is presented below to identify the parameters involved in searching. Two Beowulf cluster configurations [12] are considered, of which the second is a software upgrade with better speed and better resource management in the hardware systems. The second configuration uses heterogeneous operating systems: a customized Debian 5 Linux with BSD as the firewall system, and the Sun Grid Engine acting on the master node. The Beowulf architecture is used with both
This work is sponsored by Centre for High Performance Computing, FISAT
300http://sites.google.com/site/ijcsis/ISSN 1947-5500
 
configurations.

This work compares the results of searching with sequential and two Beowulf cluster configuration (Dakshina-I and Dakshina-II) implementations to identify the parameters involved in searching. To get a reliable and consistent performance result, the values reported in the tables are averages of ten executions for different patterns of constant length.
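As a reference for the algorithm families timed in the tables below, here is a minimal Python sketch of the Boyer-Moore-Horspool (BMH) shift-table idea. It is an illustration only, not the paper's MPI implementation:

```python
def bmh_search(text, pattern):
    """Boyer-Moore-Horspool: count all (possibly overlapping) occurrences.

    The bad-character table maps each character of pattern[:-1] to how far
    the search window may slide when the window's last character is examined;
    characters absent from the pattern allow a full-length shift of m.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return 0
    shift = {ch: m - i - 1 for i, ch in enumerate(pattern[:-1])}
    count = 0
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            count += 1
        # slide by the shift of the character under the window's last cell
        pos += shift.get(text[pos + m - 1], m)
    return count
```

The preprocessing is O(m + |S|) and the typical-case scan is sublinear, which is why BMH-family variants dominate KMP in the tables for larger m.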
TABLE 1: SEQUENTIAL EXECUTION TIME IN SECONDS FOR A FILE SIZE OF 12 MB (n = 12661224)

m        5      10     15     20     25     50
KMP      1.47   1.414  1.401  1.381  1.372  1.394
BM       0.783  0.754  0.703  0.644  0.536  0.257
BMH      0.786  0.746  0.712  0.664  0.576  0.294
ZT       1.296  1.243  1.156  1.074  1.007  0.989
QS       0.523  0.48   0.452  0.435  0.413  0.244
BR       0.673  0.664  0.631  0.602  0.574  0.307
FS       0.523  0.512  0.481  0.463  0.433  0.283
SSABS    0.796  0.723  0.694  0.662  0.614  0.507
TVSBS    0.667  0.654  0.642  0.605  0.576  0.521
ZTMBH    0.963  0.921  0.809  0.653  0.462  0.186
BRBMH    0.482  0.456  0.423  0.397  0.368  0.214

TABLE 2: PARALLEL EXECUTION TIME IN SECONDS FOR A FILE SIZE OF 12 MB WITH 5 NODES IN BEOWULF CLUSTER DAKSHINA-I

m        5      10     15     20     25     50
KMP      0.531  0.51   0.475  0.464  0.45   0.444
BM       0.284  0.272  0.254  0.232  0.194  0.093
BMH      0.284  0.269  0.257  0.24   0.208  0.106
ZT       0.469  0.449  0.417  0.388  0.364  0.357
QS       0.189  0.175  0.163  0.158  0.15   0.089
BR       0.244  0.24   0.228  0.217  0.208  0.112
FS       0.189  0.183  0.174  0.167  0.156  0.103
SSABS    0.287  0.261  0.251  0.239  0.222  0.183
TVSBS    0.241  0.236  0.232  0.218  0.208  0.188
ZTMBH    0.348  0.322  0.292  0.236  0.167  0.067
BRBMH    0.174  0.166  0.154  0.144  0.134  0.078

TABLE 3: PARALLEL EXECUTION TIME IN SECONDS FOR A FILE SIZE OF 12 MB WITH 10 NODES IN BEOWULF CLUSTER DAKSHINA-I

m        5      10     15     20     25     50
KMP      0.411  0.395  0.376  0.368  0.361  0.355
BM       0.219  0.212  0.199  0.18   0.15   0.074
BMH      0.221  0.21   0.199  0.187  0.167  0.082
ZT       0.398  0.369  0.351  0.345  0.334  0.327
QS       0.146  0.135  0.128  0.123  0.117  0.069
BR       0.189  0.187  0.178  0.169  0.16   0.085
FS       0.146  0.143  0.134  0.131  0.122  0.08
SSABS    0.223  0.203  0.194  0.187  0.172  0.144
TVSBS    0.187  0.183  0.182  0.169  0.162  0.146
ZTMBH    0.27   0.258  0.228  0.183  0.129  0.053
BRBMH    0.136  0.129  0.119  0.11   0.104  0.061

TABLE 4: PARALLEL EXECUTION TIME IN SECONDS FOR A FILE SIZE OF 12 MB WITH 5 NODES IN BEOWULF CLUSTER DAKSHINA-II

m        5      10     15     20     25     50
KMP      0.304  0.289  0.278  0.264  0.261  0.257
BM       0.161  0.156  0.144  0.132  0.107  0.054
BMH      0.162  0.153  0.145  0.138  0.122  0.061
ZT       0.292  0.268  0.254  0.253  0.247  0.236
QS       0.106  0.097  0.093  0.09   0.084  0.052
BR       0.138  0.135  0.131  0.124  0.118  0.061
FS       0.106  0.103  0.097  0.096  0.088  0.057
SSABS    0.161  0.149  0.142  0.136  0.124  0.104
TVSBS    0.137  0.134  0.132  0.122  0.117  0.105
ZTMBH    0.196  0.189  0.165  0.132  0.093  0.037
BRBMH    0.098  0.093  0.086  0.081  0.075  0.043

TABLE 5: PARALLEL EXECUTION TIME IN SECONDS FOR A FILE SIZE OF 12 MB WITH 10 NODES IN BEOWULF CLUSTER DAKSHINA-II

m        5      10     15     20     25     50
KMP      0.231  0.22   0.212  0.207  0.204  0.194
BM       0.128  0.117  0.114  0.109  0.086  0.045
BMH      0.126  0.114  0.111  0.106  0.093  0.049
ZT       0.225  0.207  0.195  0.195  0.183  0.182
QS       0.081  0.075  0.071  0.068  0.065  0.038
BR       0.105  0.104  0.099  0.093  0.089  0.047
FS       0.081  0.079  0.074  0.072  0.069  0.044
SSABS    0.124  0.113  0.108  0.103  0.086  0.082
TVSBS    0.104  0.101  0.101  0.094  0.09   0.083
ZTMBH    0.152  0.145  0.129  0.103  0.072  0.029
BRBMH    0.075  0.072  0.068  0.061  0.057  0.033
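To read these tables, the observed speedup is the sequential time divided by the parallel time. A quick check against the m = 50 column of Tables 1 and 2 (values copied from the tables above):

```python
# Times in seconds for m = 50, from Table 1 (sequential) and
# Table 2 (Dakshina-I, 5 nodes).
seq_m50  = {"KMP": 1.394, "QS": 0.244, "BRBMH": 0.214}
par5_m50 = {"KMP": 0.444, "QS": 0.089, "BRBMH": 0.078}

# Observed speedup on 5 nodes, rounded to two decimals.
speedup = {alg: round(seq_m50[alg] / par5_m50[alg], 2) for alg in seq_m50}
# KMP ~3.14x, QS ~2.74x, BRBMH ~2.74x
```

The speedup on 5 nodes stays near 3x rather than 5x, consistent with the communication and sequential-fraction limits discussed in the next section.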
IV. FACTORS AFFECTING THE SEARCHING PROCESS

Both hardware and software factors [13] affect the searching results. These factors can be classified into algorithmic factors, architectural factors, network factors and I/O factors. Only one user login in the cluster interacts with the interconnection network during the searching process. The getrusage system call was used as the timing function to find the running time of the algorithms, and data are loaded from main memory before the timing function begins. In a string matching algorithm, the length of the searched pattern (m), the length of the text being searched (n) and the size of the alphabet (o) are the direct parameters affecting the performance [14] of the searching operations.

In a cluster computing environment with a large number of compute nodes, the result does not remain attractive because of traffic in the interconnection network. The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program, as per Amdahl's law [15]. Amdahl's law states that if P is the proportion of the program that can be made parallel (i.e., benefit from parallelization),
and (1 - P) is the proportion that cannot be parallelized (remains serial), then the maximum speedup that can be achieved using N processors is S = N / (N - NP + P). The reverse argument (against Amdahl's law) [16, 17] holds that the problem size scales with the number of processors: when more powerful processors are available, the problem generally expands to make use of the increased facilities. Users have control over such things as grid resolution, number of time steps, difference-operator complexity and other parameters, which are usually adjusted to let the program run in some desired amount of time. Hence it may be most realistic to assume that run time, not problem size, is constant.

The resource manager used with Dakshina-I is Torque, whereas Dakshina-II uses the Sun Grid Engine (SGE). SGE needs tight integration, whereas the TORQUE [18] Resource Manager is a distributed resource manager providing control over batch jobs and distributed compute nodes. The Parallel Virtual Machine (PVM) [19] is a software tool for parallel networking of heterogeneous computers in Dakshina-I. Dakshina-II uses the Sun Grid Engine, which performs scalable parallel job startup with qrsh as a replacement for rsh or ssh when starting remote tasks. It is possible to start binaries with qrsh; if the special command-line option -inherit is specified, qrsh starts a subtask in an existing Grid Engine parallel job.

The RAM size of Dakshina-I is 256 MB, whereas that of Dakshina-II is 1.23 GB. RAM does not make the processor itself work faster: in our work nodes, the CPU takes approximately 200 ns to access RAM, compared with 12,000,000 ns to access the hard drive [18]. With more RAM, the probability of running out of memory and having to fall back on the hard-disk swap file is smaller, and thus performance increases.

The MPI used with Dakshina-I is LAM MPI, whereas Dakshina-II uses Open MPI. LAM MPI implements the MPI-1.2 standard and much of MPI-2; Open MPI implements the MPI-2 standard [20]. Open MPI is therefore able to combine expertise, technologies and resources from across the high-performance computing community to build the best MPI library available, offering advantages for system and software vendors, application developers and computer science researchers.

The file system used with Dakshina-I is PVFS, whereas Dakshina-II uses NFS. PVFS consists of a server process and a client library, both written entirely as user-level code; a Linux kernel module and a pvfs-client process allow the file system to be mounted and used with standard utilities. The Network File System (NFS), originally developed by Sun, allows a user on a client computer to access files over a network in a manner similar to local storage; NFS builds on the Open Network Computing Remote Procedure Call (ONC RPC) system. The authentication system used with configuration I is NIS, whereas configuration II uses LDAP. NIS provides neither distributed administration nor hierarchical data administration; LDAP organizes data into a hierarchy and allows distributed administration [21]. The Linux kernel used with configuration I is a customized Debian 2.6.18, whereas configuration II runs version 2.6.26. During HPL benchmarking [2], Dakshina-II recorded 9600 core floating-point operations per second, whereas Dakshina-I performed 7000 core floating-point operations per second.

V. SEARCHING TIME PREDICTION

Based on the factors above, a theoretical prediction method for the string searching process can be formed. This formula helps decide the optimum number of processors to use for the searching process in a cluster computing environment, given the computation-communication ratio limit of Amdahl's law. The string matching problem can achieve data parallelism with the following simple data partitioning technique: the text string is decomposed into a number of subtexts according to the number of processors allocated, and these subtexts are stored on the local disks of the processors. Following this partitioning approach, the static master-worker model proceeds in three steps. First, the master distributes the pattern string to all the workers. Second, each worker reads its subtext from the local disk into main memory and searches it using the string matching algorithms. Finally, the workers send the numbers of occurrences back to the master.

Let T1 be the time required for the initial setup [22] of the master node. The master node reads the text of length n and the search pattern of length m, then divides the text into subparts based on the availability of work nodes for broadcasting; thus reading the text requires n accesses. Let t_avg be the average time to perform one I/O step. Open MPI does not always read all the values from these files during startup: only when division of the text becomes necessary does it read the text and send it to all nodes in the job (Dhakshina-II), whereas Dhakshina-I uses LAM MPI, which reads both text and pattern initially; in the case of the pattern, the master node spends no extra time on the reading operation. The files are read on each node during each process's startup; this is intended behavior, since it allows per-node customization, which is especially relevant in heterogeneous environments. Thus, for configuration I,
T1 = t_avg * (n + m) + ε1        Equation (5.1)

and for configuration II,

T1 = t_avg * n + ε1        Equation (5.2)
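Amdahl's bound from Section IV and the two setup-time formulas can be put in runnable form as a sanity check. The numeric values in the usage lines are illustrative placeholders (t_avg and ε1 are not measured constants from the paper):

```python
def amdahl_speedup(P, N):
    """Maximum speedup with N processors when a fraction P of the
    program benefits from parallelization: S = N / (N - N*P + P)."""
    return N / (N - N * P + P)

def setup_time(t_avg, n, m, eps, reads_pattern):
    """Predicted master-node setup time T1.

    Configuration I (LAM MPI) reads both text and pattern:
        T1 = t_avg * (n + m) + eps    (5.1)
    Configuration II (Open MPI) reads only the text:
        T1 = t_avg * n + eps          (5.2)
    """
    chars = n + m if reads_pattern else n
    return t_avg * chars + eps

# Illustrative usage: 95% parallelizable code on 10 processors,
# and setup time for the 12 MB file (n = 12661224, m = 50).
s10 = amdahl_speedup(0.95, 10)
t1_config1 = setup_time(1e-8, 12661224, 50, 0.01, reads_pattern=True)
t1_config2 = setup_time(1e-8, 12661224, 50, 0.01, reads_pattern=False)
```

Note that with N = 1 the bound collapses to S = 1, and as N grows it approaches 1 / (1 - P), which is the communication-free ceiling the section's prediction method works against.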
Let T2 be the time required for communication from the master node to the slave nodes. The cluster nodes in configurations I and II are interconnected over Gigabit Ethernet, with a Realtek 8169 Gigabit network card on each compute node and two HP ProCurve 1400-24G (39078A) switches. Switch latency is less than 4.7 μs and less than 3.0 μs (64-byte packets) [18], with a maximum throughput of 35.7 million packets per second (pps). Data transfer takes place at around 990 Mb per second among the servers and around 665 Mb per second among the