An Efficient Approach of Fast External Sorting Algorithm in Data Warehouse

Journal of Computing, ISSN 2151-9617, http://www.journalofcomputing.org

Published by: Journal of Computing on Sep 09, 2011
Copyright:Attribution Non-commercial

An Efficient Approach of Fast External Sorting Algorithm in Data Warehouse
Abhishek Purohit, Naveen Hemrajani, Savita Shiwani, Ruchi Dave
 
Abstract: Sorting of bulk data in a data warehouse is possible through external sorting, and the performance of external sorting is analyzed in terms of both time and I/O complexities. The proposed method is a hybrid technique that uses quick sort and in-place merging in two distinct phases. Both the time and I/O complexities of the proposed algorithm are analyzed here. The proposed algorithm uses a special in-place merging technique, which creates no extra backup file for manipulating huge records. As a result, the algorithm saves the large amount of disk space that would otherwise be needed to hold a copy of the large file. This also reduces the time complexity and makes the algorithm faster.

Index Terms: External sorting, in-place merging, merge and quick sort, algorithm, time and space complexity.
1 INTRODUCTION
Although the memories of current computers have been increasing rapidly, there still exists a need for external sorting for large databases. Sorting has continued to account for roughly one-fourth of all computer cycles, and the problem of how to sort data efficiently has been widely discussed. The main concern with external sorting is to minimize disk access, since reading a disk block takes about a million times longer than accessing an item in RAM. The number of I/Os is therefore a more appropriate measure of the performance of external sorting and other external problems, because the I/O speed is much slower than the CPU speed.

The most common external sorting algorithm still uses the merge sort as described by Knuth [1]. In two-way merge sort, a file is divided into two sub files. The records of the two sub files are written to two auxiliary files, whereby, through pairwise comparison, the smaller record is always written first, thus producing sorted runs of two records each. During the second pass, the runs from the output files are compared, thereby producing new runs of four records each, which are in sorted sequence. This process continues until the entire file is sorted. This routine makes use of temporary disk files.

Dufrene and Lin [2] proposed an algorithm in which no other external file is needed; only the original file (the file to be sorted) is used. M. N. Adnan et al. proposed a hybrid external sorting algorithm with no additional disk space. Another similar algorithm was proposed by M. R. Islam [3]. In all three of these algorithms the authors paid attention to the time complexities but not to the I/O complexities. In this paper we study the I/O complexities of these algorithms. To this end, in the next section we review the external sorting algorithms that use no additional disk space.
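The run-doubling behaviour of the two-way merge sort described above can be illustrated with a minimal Python sketch. It is only an illustration under simplifying assumptions: an in-memory list stands in for each disk file, and the function name `two_way_merge_sort` and the returned pass counter are hypothetical names introduced here, not part of the original paper.

```python
from heapq import merge  # standard-library merge of already-sorted iterables


def two_way_merge_sort(records):
    """Bottom-up two-way merge sort; each pass doubles the sorted run length.

    Returns the sorted records and the number of passes over the 'file'.
    """
    data = list(records)
    run = 1      # every single record is a sorted run of length 1
    passes = 0
    while run < len(data):
        out = []  # stands in for the auxiliary (temporary) file
        for start in range(0, len(data), 2 * run):
            left = data[start:start + run]
            right = data[start + run:start + 2 * run]
            out.extend(merge(left, right))  # merge two adjacent sorted runs
        data = out
        run *= 2
        passes += 1
    return data, passes


if __name__ == "__main__":
    print(two_way_merge_sort([5, 2, 9, 1, 7, 3, 8, 6]))
```

Each pass rewrites the whole file into the auxiliary list, which is the extra disk space that the algorithms reviewed below try to avoid.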
2 RELATED WORK
In this section we review three external sorting algorithms that use no additional disk space. The proposed algorithm is based on the algorithms proposed by Dufrene and Lin [2] and M. R. Islam et al. [3]. Of these algorithms, the overall performance of the algorithm of M. R. Islam et al. [3] is better, so we also review that algorithm in the subsections below.
2.1 An efficient external sorting algorithm
This algorithm, proposed by Dufrene and Lin, is essentially a generalization of the internal Bubble Sort, where the individual record in the internal sort is replaced by a block of records in the external sort. At the first iteration, Block_1 and Block_N are read into the lower half and upper half of the memory array, respectively. These two blocks are then sorted using Quick Sort. The records of the lower half, which are the lowest sorted records of Block_1 and Block_N, are retained in the memory array, and the records of the upper half of the memory array are returned to the Block_N area of the external file. Now Block_N-1 comes into the upper half and the process continues.
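The block-level bubble pass described above can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: the "external file" is modelled as a list of equal-sized, non-empty blocks, Python's built-in `sorted` stands in for Quick Sort on the memory array, and repeating the pass for every later starting block is an assumption that follows from the description of the method as a generalization of Bubble Sort. The function name `block_bubble_sort` is hypothetical.

```python
def block_bubble_sort(blocks):
    """Sort a 'file' given as a list of equal-sized blocks.

    Only a memory array of two blocks is used; no auxiliary file is created.
    """
    blocks = [list(b) for b in blocks]   # work on a copy of the 'file'
    n = len(blocks)
    if n == 1:
        return [sorted(blocks[0])]
    for first in range(n - 1):
        lower = blocks[first]            # read the first unsorted block into the lower half
        size = len(lower)
        for t in range(n - 1, first, -1):
            memory = sorted(lower + blocks[t])  # stand-in for Quick Sort on the memory array
            lower = memory[:size]               # lowest records stay in memory
            blocks[t] = memory[size:]           # upper half is returned to Block_t
        blocks[first] = lower            # lowest records of the remaining file
    return blocks


if __name__ == "__main__":
    print(block_bubble_sort([[9, 4], [7, 1], [8, 2], [6, 3]]))
```

Every pass reads and rewrites all remaining blocks, so the number of block transfers grows quadratically with the number of blocks, which is why the I/O cost of this family of algorithms deserves attention.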
2.2 A faster external sorting algorithm to sort bulk data
This algorithm, proposed by M. N. Adnan, is also a generalization of the internal Bubble Sort. The algorithm works in two phases. In the first phase, it works like the algorithm proposed by Dufrene and Lin, which was reviewed in the previous section. After this phase, we get [...] the external file simultaneously in the position of Block_N-1 in the external file until the block is full. So, half of the records in the memory array will be sorted by merging and written to the position of Block_N-1 in the external file. The remaining records in the lower half (if any) are copied into the upper half of the memory array. Now the upper half of the memory array contains the highest records of Block_N and Block_N-1. Then Merge Sort is applied again to sort the records in the upper half of the memory array. The additional space required for Merge Sort is the lower half of the memory array. After this, Block_N-2 [...]. The next iteration starts with Block_N-2 and Block_N-1 being read into the lower half and upper half of the memory array, respectively. At the end of this iteration, the upper half is read into the lower half of the memory array. The merging and Merge Sort terminate when Block-B has been read into the lower half of the memory array and processed accordingly. After this iteration the upper half of the memory array contains the highest sorted records, and they are written to the position of Block_N in the external file.

Abhishek Purohit is an M.Tech Software Engineering Scholar at Suresh Gyan Vihar University, Jaipur, India.
Naveen Hemrajani is Vice Principal at Suresh Gyan Vihar University, Jaipur, India.
Savita Shiwani is Assistant Professor at the Department of Computer Science & Engineering, SGVU, Jaipur, India.
Ruchi Dave is Assistant Professor at the Department of Computer Science & Engineering, SGVU, Jaipur, India.

JOURNAL OF COMPUTING, VOLUME 3, ISSUE 8, AUGUST 2011, ISSN 2151-9617
HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/
WWW.JOURNALOFCOMPUTING.ORG
© 2011 Journal of Computing Press, NY, USA, ISSN 2151-9617
2.3 In Place Merging Algorithm
The algorithm works in two phases. In the first phase, the algorithm works like the algorithm proposed by Dufrene and Lin: Block_1 and Block_S are read into the lower half and upper half of the memory array, respectively, and they are sorted using Quick Sort. This phase terminates when Block_2 has been read into the upper half of the memory array and sorted with the remaining records in the lower half of the memory array. Thus we get sorted runs. After this phase, the lower half of the memory array contains the lowest sorted records of the entire file. Then the algorithm switches to its second phase, in which the sorting process continues by considering the following two cases:

Case 1: The required blocks are read, and if the last record of the lower half of the memory array is smaller than the first record of the upper half of the memory array, the records of the memory array do not need to be sorted and the next block is read.

Case 2: This is the general case. The in-place merging technique is used here.

In the second phase, Block_S-1 and Block_S are read into the lower and upper halves of the memory array, respectively. For Case 1 the blocks do not need to be written back to the external file. In Case 2, after applying the in-place merging, the upper half of the memory array contains the highest ordered records of Block_S and Block_S-1, and the lower half is sent back to its corresponding position in the external file (Fig. 6). After this, Block_S-2 is read into the lower half of the memory array and checked against the conditions specified in Case 1 and Case 2. In this way, when Block_2 has been processed, the upper half of the memory array contains the highest sorted records of the entire file, and they are written to the position of Block_S in the external file for Case 2.

The next iteration starts with Block_S-2 and Block_S-1 being read into the lower and upper halves of the memory array, respectively. At the end of this iteration, the upper half of the memory array contains the highest sorted records among the blocks Block_2, Block_3, ..., Block_S-1, and they are written to the position of Block_S-1 in the external file for Case 2. After each pass, the part of the external file that remains to be sorted shrinks by one block. The last two blocks to be processed are Block_2 and Block_3; once they have been processed, the entire file is sorted.
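The Case 1 and Case 2 decision for a single memory array can be sketched as below. The text does not specify which in-place merging technique is used, so the shifting merge shown here is only an assumed stand-in: it needs no second buffer but takes quadratic time in the worst case. The function name `merge_halves_in_place` and its boolean return value are hypothetical; the comparison uses "less than or equal", so equal boundary records are also treated as already in order.

```python
def merge_halves_in_place(memory, mid):
    """Merge the sorted halves memory[:mid] and memory[mid:] inside the array.

    Returns False for Case 1 (halves already in order, nothing to write back)
    and True for Case 2 (a merge was performed).
    """
    if mid == 0 or mid == len(memory) or memory[mid - 1] <= memory[mid]:
        return False  # Case 1: last record of lower half <= first record of upper half
    i, j = 0, mid
    while i < j < len(memory):
        if memory[i] <= memory[j]:
            i += 1
        else:
            # Move memory[j] down to position i by shifting memory[i:j] right by one.
            value = memory[j]
            for k in range(j, i, -1):
                memory[k] = memory[k - 1]
            memory[i] = value
            i += 1
            j += 1
    return True


if __name__ == "__main__":
    array = [2, 5, 8, 1, 3, 9]
    print(merge_halves_in_place(array, 3), array)
```

When the function returns False, neither half has changed, which matches the statement above that blocks are not written back to the external file in Case 1.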
2.3.1 Algorithm: An external sorting algorithm using in-place merging with no additional disk space
1. Declare the size of each block in the external file to be half of the memory array. Let the blocks be Block_1, Block_2, …, Block_S-1, Block_S
2. If there is only one block in the external file then quicksort the entire memory array
3. Read Block_1 into the lower half of memory array. Set T = S //Begins first phase
4. Read Block_T into upper half of memory array
5. Sort the entire memory array using quicksort
6. Write upper half of memory array to Block_T area of external file
7. Decrement Block_T by one block
8. Repeat from step 4 if Block_T is not equal to Block_1
9. Write lower half of memory array to Block_1 area of external file
10. Set P = S //Begins second phase
11. Read Block_P into the upper half of memory array and set Q = P-1
12. Read Block_Q into the lower half of memory array
13. If last element of the lower half is greater than first element of the upper half then sort (merge) the memory array using in-place merging and write lower half of memory array to Block_Q area of external file
14. Decrement Block_Q by one block
15. Repeat from step 12 if Block_Q ≠ Block_1
16. Write upper half of memory array to Block_P area of the external file. Decrement Block_P by one block
17. Repeat from step 11 if Block_P ≠ Block_2
18. //End of sorting procedure
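The eighteen steps above can be traced with a short Python simulation. It is a sketch under stated assumptions, not the authors' code: the external file is a list of equal-sized, non-empty blocks, indices are zero-based (so Block_1 is `blocks[0]`), and the built-in `sorted` stands in both for quicksort and for the in-place merge of step 13. The function name `external_sort_in_place` is hypothetical.

```python
def external_sort_in_place(blocks):
    """Simulate steps 1-18 on a 'file' given as a list of equal-sized blocks."""
    blocks = [list(b) for b in blocks]
    s = len(blocks)
    if s == 0:
        return blocks
    half = len(blocks[0])               # step 1: a block is half of the memory array
    if s == 1:
        return [sorted(blocks[0])]      # step 2

    # First phase (steps 3-9): pull the lowest records of the file into Block_1.
    lower = blocks[0]                   # step 3
    for t in range(s - 1, 0, -1):       # steps 4-8: T = S down to 2
        memory = sorted(lower + blocks[t])   # stand-in for quicksort
        lower, blocks[t] = memory[:half], memory[half:]
    blocks[0] = lower                   # step 9

    # Second phase (steps 10-17): push the highest records towards Block_P.
    for p in range(s - 1, 1, -1):       # P = S down to 3
        upper = blocks[p]               # step 11
        for q in range(p - 1, 0, -1):   # steps 12-15: Q = P-1 down to 2
            lower = blocks[q]
            if lower[-1] > upper[0]:    # step 13: Case 2, a merge is needed
                memory = sorted(lower + upper)   # stand-in for in-place merging
                lower, upper = memory[:half], memory[half:]
                blocks[q] = lower       # write lower half back to Block_Q
        blocks[p] = upper               # step 16
    return blocks


if __name__ == "__main__":
    print(external_sort_in_place([[9, 4], [7, 1], [8, 2], [6, 3]]))
```

In this simulation a block is written back during the second phase only when the Case 2 test of step 13 succeeds, which is where the algorithm saves write operations compared with re-sorting every pair of blocks.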
3 PROPOSED MODEL
3.1 Sorting of bulk data in data warehouse using less time and space
The proposed algorithm works in several phases. In the first phase, the external file is divided into equal-sized blocks, where the size of each block is approximately equal to the available main memory (RAM) of the computer. If the size of the available internal memory is M, then the size of each block is M, and if the size of the external file is N, then the number of blocks is S = N/M. The blocks are labelled Block_A, Block_B, and so on, up to the last block, Block_N. Block_A is read into memory. The records in main memory are then sorted using quick sort (figure 1(A)) and written back to Block_A. The process continues until the last block, Block_N, has been processed. After this the proposed algorithm switches to its next phase.

Each sorted block is divided into two sub blocks, B_1 and B_2 (figure 1(B)). Sub block B_1 of Block_A and sub block B_1 of Block_B are read into the lower and upper halves of the main memory (RAM) array, respectively. The records of the lower half and the upper half of the main memory, which are individually sorted, are then merged using the in-place merging technique. After merging, the records of the upper half of the main memory are written to B_1 of Block_B, and the records of sub block B_1 of Block_C are read into the upper half of the main memory. The records of the main memory are again merged using the in-place merging technique, the records of the upper half are written to the position of B_1 of Block_C, and the records of sub block B_1 of Block_D are read into the upper half of the main memory. This process is repeated until the records of sub block B_1 of Block_N have been read into the upper half of the main memory and processed. Now the lower half of the main memory contains the lowest records, in sorted form, among the records from Block_A to Block_N, and it is written to the position of B_1 of Block_A.
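The first phase and the forward sweep over the B_1 sub blocks described above can be sketched in Python as follows. This is an illustration under explicit assumptions rather than the authors' implementation: the file is an in-memory list whose length N is an exact multiple of the memory size M, M is even, `sorted` stands in for quick sort, and `heapq.merge` stands in for the in-place merging technique. The function name `sort_blocks_and_sweep_b1` is hypothetical.

```python
from heapq import merge


def sort_blocks_and_sweep_b1(records, memory_size):
    """Phase 1 (sort each block) followed by the forward sweep over B_1.

    Returns the lists of B_1 and B_2 sub blocks after the sweep.
    """
    n, m = len(records), memory_size
    assert m % 2 == 0 and n % m == 0, "sketch assumes even M and N divisible by M"
    s = n // m                                   # number of blocks, S = N / M
    half = m // 2

    # Phase 1: read each block, sort it in memory, write it back.
    blocks = [sorted(records[i * m:(i + 1) * m]) for i in range(s)]

    # Split every sorted block into its two sub blocks B_1 and B_2.
    b1 = [block[:half] for block in blocks]
    b2 = [block[half:] for block in blocks]

    # Forward sweep: the lower half of memory keeps the lowest records seen so far.
    lower = b1[0]
    for k in range(1, s):
        memory = list(merge(lower, b1[k]))       # stand-in for in-place merging
        lower, b1[k] = memory[:half], memory[half:]  # upper half goes back to B_1 of block k
    b1[0] = lower                                # lowest records of the whole file
    return b1, b2


if __name__ == "__main__":
    data = [12, 3, 9, 7, 1, 15, 6, 10, 4, 14, 2, 8]
    print(sort_blocks_and_sweep_b1(data, 4))
```

Because every B_1 holds the smaller half of its own sorted block, the M/2 lowest records of the file must lie in the B_1 sub blocks, which is why sweeping only those sub blocks is enough to fill B_1 of Block_A correctly.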
Next, sub block B_2 of Block_N and sub block B_2 of Block_N-1 are read into the upper and lower halves of the memory array, respectively. The records of the lower half and upper half of the main memory (RAM) are then merged using the in-place merging technique. After merging, the records of the lower half of the main memory are written to B_2 of Block_N-1, and the records of sub block B_2 of Block_N-2 are read into the lower half of the main memory. The records of the main memory are again merged using the in-place merging technique, the records of the lower half are written to the position of B_2 of Block_N-2, and the records of sub block B_2 of Block_N-3 are read into the lower half of main memory (figure 2(A)). This process is repeated until the records of sub block B_2 of Block_A have been read into the lower half of the main memory and processed. Now the upper half of the main memory contains the highest records, in sorted form, among the records from Block_A to Block_N, and it is written to the position of B_2 of Block_N. At this point sub block B_1 of Block_A and sub block B_2 of Block_N contain the lowest and highest sorted records, respectively.

Now B_2 of Block_A and B_1 of Block_B are read into the main memory and, after merging, the lower and upper halves are written to the positions of B_2 of Block_A and B_1 of Block_B, respectively.