Authorized licensed use limited to: Ingrid Nurtanio. Downloaded on October 01,2020 at 02:16:13 UTC from IEEE Xplore. Restrictions apply.
original data structure type. The index pointers are stored in index[i], where i corresponds to the position of the element in the original array and h[i].key is the keyword, for i = 0, 1, ..., N-1.

    #include <string.h>   /* strcmp */

    /* element (with a string member key) and N are assumed to be defined earlier */
    void indexBIS(element *h, int *index)
    {
        int i, j, low, high, mid, m;
        for (i = 1; i < N; i++)
        {
            m = index[i];            // index[i] temporarily kept in m
            low = 0;                 // the sorted part is index[0..i-1]
            high = i - 1;
            while (low <= high)      // look for the insertion position between h[index[low]] and h[index[high]]
            {
                mid = (low + high) / 2;
                if (strcmp(h[m].key, h[index[mid]].key) >= 0)
                    low = mid + 1;   // insertion point is in the high half zone
                else
                    high = mid - 1;  // insertion point is in the low half zone
            }
            for (j = i - 1; j >= low; j--)
                index[j + 1] = index[j];   // index pointers move backwards
            index[high + 1] = m;           // store the index pointer at the insertion position
        }
    }

After the program executes, reading the records in the order given by the index pointers yields the ordered dataset.

3. A cooperative Merge-Sort algorithm

3.1. Parallel computer and cooperative computing

Cooperative computing means that, in a parallel computer, an application is decomposed into multiple subtasks which are distributed to different processors. The processors carry out the subtasks cooperatively and concurrently, so the problem takes less time to solve.

A parallel computer is one in which two or more processors are connected by an interconnection network and communicate with each other.

The structure of an SMP parallel computer system [8] is shown in Figure 1.

Fig. 1. Typical SMP system architecture

In which:
Microprocessor (CPU & Cache): direct input-output (I/O).
System bus: all microprocessors connect and communicate with each other through the system bus.
Memory (Mem): the memory is composed of multiple memory modules, as shown in Figure 1. These modules and the processor nodes are distributed symmetrically on the two sides of the system bus.

3.2. Parallel algorithm

Based on the index sort algorithm, an improved parallel Merge-Sort algorithm is proposed in this paper. Suppose that the data records of the dataset are A[N], that A[N] needs to be sorted, and that the number of elements in the dataset is N. First, the records can be regarded as N ordered sub-sequences, each of length 1. By merging every two sub-sequences, we get ⌈N/2⌉ ordered sub-sequences of length 2 or 1. We repeatedly merge every two sub-sequences in the same way until we get one ordered sequence of length N. Every processor in the parallel system can merge two sub-sequences independently, so, compared with a single processor, waiting time can be reduced or avoided. While a processor merges two sub-sequences, it exchanges index pointers instead of data records. Based on this idea, the abstract description of the algorithm is as follows:

1) Given an array A[N], the first step is to merge every two sub-sequences using ⌈N/2⌉ microprocessors. If the time spent in merging two sub-sequences is T, we save (⌈N/2⌉ - 1)T in contrast with a single microprocessor.
2) Step 1) yields ⌈N/2⌉ sub-sequences. These are again merged two by two, using ⌈N/4⌉ processors in parallel. If the time spent in merging two sub-sequences is T1, this saves (⌈N/4⌉ - 1)T1 compared with a single microprocessor.
3) In the same way, we adopt the method of 1) and 2) to deal with the ⌈N/4⌉, ⌈N/8⌉, ....., ⌈N /N/8⌉ sub-sequences.
4) Finally, two ordered sub-sequences are left. One processor merges them into a single ordered sequence of length N.

In general, it is not practical to provide enough microprocessors for every merge: the number of elements in the sequence is not known in advance, and the hardware cost would be too high. Even with only a few microprocessors, however, the executing efficiency of the parallel algorithm is remarkable. Accordingly, we give the architecture of the index-based Merge-Sort algorithm, shown in Fig. 2 (P stands for processor while T stands for sub-sequence).
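The merging scheme of Section 3.2 can be illustrated with a minimal C sketch. It is not the authors' implementation: the record layout `element`, the helper `merge_runs` and the function `index_merge_sort` are hypothetical names introduced here, and the passes run sequentially. The point it shows is that every pair merge within one pass writes to its own slice of the index table, which is what would allow the merges of a pass to be given to different processors, and that only index values move while the records stay in place.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type: only key is used for ordering. */
typedef struct {
    char key[16];
    char payload[64];
} element;

/* Merge the sorted index runs src[lo..mid) and src[mid..hi) into dst[lo..hi).
   Only index values are moved; the records in h are never copied. */
void merge_runs(const element *h, const int *src, int *dst,
                int lo, int mid, int hi)
{
    int a = lo, b = mid, k = lo;
    while (a < mid && b < hi) {
        /* <= takes the left run first on equal keys, so the merge is stable */
        if (strcmp(h[src[a]].key, h[src[b]].key) <= 0)
            dst[k++] = src[a++];
        else
            dst[k++] = src[b++];
    }
    while (a < mid) dst[k++] = src[a++];
    while (b < hi)  dst[k++] = src[b++];
}

/* Bottom-up merge sort over the index table.
   Returns 0 on success and -1 if the scratch table cannot be allocated. */
int index_merge_sort(const element *h, int *index, int n)
{
    assert(n >= 0);
    if (n < 2) return 0;               /* nothing to merge */

    int *tmp = malloc((size_t)n * sizeof *tmp);
    if (tmp == NULL) return -1;

    int *src = index, *dst = tmp;
    for (int width = 1; width < n; width *= 2) {
        /* One pass: runs of length width are merged two by two.
           Each iteration writes only dst[lo..hi), so the iterations are
           independent and could be assigned to separate processors. */
        for (int lo = 0; lo < n; lo += 2 * width) {
            int mid = lo + width < n ? lo + width : n;
            int hi  = lo + 2 * width < n ? lo + 2 * width : n;
            merge_runs(h, src, dst, lo, mid, hi);
        }
        int *t = src; src = dst; dst = t;   /* output of this pass feeds the next */
    }
    if (src != index)
        memcpy(index, src, (size_t)n * sizeof *index);
    free(tmp);
    return 0;
}
```

To use the sketch, fill `index` with 0, 1, ..., n-1, call `index_merge_sort(h, index, n)`, and then read the records as `h[index[0]]`, `h[index[1]]`, and so on. With only a few processors available, as in the paper's experiments, one plausible division is to give each processor a contiguous share of the pair merges in every pass.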
Sort, insertion and Merge-Sort algorithms to do these
experiments.
Fig. 5. Insertion and index-insertion sort

Table 2. Executing CPU time difference in every algorithm

Table 2 lists the time difference between each original sorting algorithm and the corresponding sorting algorithm based on the index table. Plotting the data of Table 2 as curves gives Figure 6, Figure 7 and Figure 8, which are more intuitive.

Fig. 7. Time difference between merging and index-merge

Analyzing Figure 6, Figure 7 and Figure 8, we can easily see:
1) As the number of records in the data set grows, the time difference shown by the curve increases continuously.
2) From Figure 6 to Figure 8, the merging algorithm based on the index table is more efficient than the original merging algorithm.

4.3. Index-based Merge-Sort algorithm under the parallel computing

Experiments in parallel computing need two or more processors. Because of the limitations of our laboratory conditions, we use two processors for the parallel computing experiments. It must be pointed out that, in theory, a sufficient number of processors gives the maximum executing efficiency [7] of the merging algorithm based on parallel computing. For instance, with 50 microprocessors, a sorting sequence of length 100 would be the best choice in this experiment.
In order to measure the executing CPU time, the dataset is sorted 10000 times by every sort algorithm. The executing CPU time of every algorithm is shown in Table 3.
The length of the dataset is N1 (N1 = 100). The records of the data set are less than N2. The time cost of each algorithm is shown in Table 3, and the results of these experiments are plotted as a histogram in Figure 9.
(Note: two microprocessors are used in the parallel Merge-Sort algorithm.)
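The measurement procedure used in these experiments (sort the same 100-record dataset 10000 times and read the CPU time) can be sketched in C as follows. This is an illustrative harness, not the code used for Table 3: the record layout, the key generator `fill_keys`, and the functions `index_bis` and `time_index_sort` are assumptions made for the example, and `clock()` from `<time.h>` is used as the CPU-time source. Only the index-based insertion sort is timed here; another algorithm would be measured by swapping the call inside the loop.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define N 100          /* dataset length, as in the experiment (N1 = 100) */
#define REPEAT 10000   /* number of repetitions stated in the text */

/* Hypothetical record layout: a short string key plus some payload. */
typedef struct {
    char key[16];
    char payload[112];
} element;

/* Index-based binary insertion sort: only the entries of index move. */
void index_bis(const element *h, int *index, int n)
{
    for (int i = 1; i < n; i++) {
        int m = index[i];
        int low = 0, high = i - 1;
        while (low <= high) {
            int mid = (low + high) / 2;
            if (strcmp(h[m].key, h[index[mid]].key) >= 0)
                low = mid + 1;
            else
                high = mid - 1;
        }
        for (int j = i - 1; j >= low; j--)
            index[j + 1] = index[j];
        index[low] = m;
    }
}

/* Fill the keys with a fixed, scrambled sequence of zero-padded numbers
   so that every run sorts the same input. */
void fill_keys(element *h, int n)
{
    for (int i = 0; i < n; i++)
        snprintf(h[i].key, sizeof h[i].key, "%05d", (i * 37) % 101);
}

/* CPU time in seconds for sorting the same dataset repeat times.
   The index table is reset before each run; that reset is part of the
   measured time. */
double time_index_sort(const element *h, int n, int repeat)
{
    int index[N];
    assert(n <= N);
    clock_t start = clock();
    for (int r = 0; r < repeat; r++) {
        for (int i = 0; i < n; i++)
            index[i] = i;
        index_bis(h, index, n);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

A typical call is `fill_keys(h, N)` followed by `time_index_sort(h, N, REPEAT)`. Repeating the sort many times is what makes the measurement usable: a single sort of 100 records usually finishes in less time than `clock()` can resolve.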
Acknowledgement
This work was supported by Guangdong Provincial
Natural Science Foundation (Grant No. 06021484),
Guangdong Provincial science and technology project
(Grant No.2005B10101077) and Yuexiu Zone
Guangzhou city science & technology project (Grant
No. 2007-GX-075).
References

[1] Y. Li, D. Wang, G. Wang, et al, “Block Sorting Index-based Techniques for Local Alignment Searches on Biological Sequences”, Computer Science, 2005, 32(12), pp.159-163, 205.
[2] N. Liu, Z. Tong, “Analysis of improving the C language quick sort algorithm”, Applications of the Computer System, 2008, 1, pp.113-116.
[3] C. Lan, “Optimizing of Bubble Sort Algorithm”, Ordnance Industry Automaton, 2006, 25(12), pp.50-52.
[4] J. Zhang, T. Ke, M. Sun, “The Parallel Computing Based on Cluster Computer in the Processing of Mass Aerial Digital Images”, 2008 International Symposiums on Information Processing, Chengdu, China, Jul 2008, pp.389-393.
[5] W. Wang, Y. Qiou, “A new algorithm for parallel Mergesorting”, Computer Engineering and Applications, 2005, 41(5), pp.71-72, 81.
[6] C. Zhong, G. Chen, “An Optimal Parallel Sorting Algorithm for Multisets”, Journal of Computer Research and Development, 2003, 40(2), pp.336-341.
[7] W. Yan, W. Wu, “Data Structure (C language)”, Tsinghua University Press, April 1997.
[8] L. Zhang, “Parallel computing introduction”, Tsinghua University Press, Nov. 2006.