Professional Documents
Culture Documents
by K. E. BATCHER
Goodyear Aerospace Corporation
Akron, Ohio
a1
and
A L c
1
a2 BH
A L c2
max(a1 ; an+1 ); max(a2 ; an+2 ); :::; max(an ; a2n ) (2)
b1 A L
BH c3 that each of these sequences is bitonic and no number
c
of (1) is greater than any number of (2).
b2 BH 4 This fact gives us the iterative rule illustrated in
Figure 6. A bitonic sorter for 2n numbers can be con-
a1 c
structed from n comparison elements and two bitonic
a2
A L 1 sorters for n numbers. The comparison elements form
BH A L A L c2 the sequences (1) and (2) and since each is bitonic they
a3 A L BH BH c3 are sorted by the two n-number bitonic sorters. Since
no number of (1) is greater than any number of (2) the
a4 BH
A L c4 output of one bitonic sorter is the lower half of the sort
BH c
and the output of the other is the upper half.
b1 A L 5 A bitonic sorter for 2 numbers is simply a compari-
b2 BH A L A L c6 son element and using the iterative rule bitonic sorters
for 2p numbers can be constructed for any p. Figure
b3 A L BH BH c7
7 shows bitonic sorters for 4 numbers and 8 numbers.
b4 BH c A 2p -number bitonic sorter requires p levels of 2p,1
8 elements each for a total of p:2p,1 elements. It can
act as a merging network for any two input lists whose
Figure 5- Construction of ‘‘2 by 2’’ and ‘‘4 by 4’’ total length equals 2p .
odd-even merging networks Large bitonic sorters can be constructed from a
number of smaller bitonic sorters; for instance, a 16-
number bitonic sorter can be constructed from eight
4-number bitonic sorters, as shown in Fig. 8. This
allows large networks to be built of standard modules
Readers may recognize the similarity between the topologies of the bitonic sort and the fast-fourier-transform.
310 Spring Joint Computer Conference, 1968
of convenient size. a1 A L A L c
1
a1 c1 a2 BH BH c2
A L
a2 BH c2 b1 c3
A L A L
a3 c3 c
A L . n-ITEM b2 BH BH
. 4
. BH .
. BITONIC .
. SORTER .
. A L a1 A L A L A L c
BH cn-2 1
an-2 a2 BH BH BH
c n-1 c2
a n-1
. cn a3 c3
an . A L A L A L
. cn+1
an+1 a4 BH BH BH c4
c n+2
a n+2 a5 A L c
c n+3 A L A L 5
A L
a n+3 n-ITEM a6 BH BH c6
BH BH
. . BITONIC .
. . SORTER . c7
A L . . a7 A L A L A L
.
a2n-2 BH c2n-2 a8 c
BH BH BH 8
a 2n-1 c 2n-1
A L
a 2n BH c 2n
Figure 7- Construction of bitonic sorters for 4 numbers and for
a1 , a , . . . , a 2n 8 numbers
2 IS BITONIC
c1 <_ c <_. . . <
_ c 2n
2
a1 c
Figure 6-Iterative rule for bitonic sorters 1 c
a5 a c 2
9 3 c
Sorting networks a 13 4
a2 c
5 c
a6 a
A sorter for arbitrary sequences can be constructed 10 c 6
7 c
from odd-even merges or bitonic sorters using the well- a
14 8
known sorting-by-merging scheme: The numbers are a3 c
combined two at a time to from ordered lists of length a a
9 c
two; these lists are merged two at a time to form or- 7 11 c 10
11 c
dered lists of length four, etc. until all numbers are a
15 12
merged into one ordered list. a4 c
a a 13 c
To sort 2p numbers using odd-even merges requires 8 12 c 14
2p,1 comparison elements followed by 2p,2 \2-by-2" a 15 c
16 16
merging networks followed by 2p,3 \4-by-4" merging
networks, etc,. etc. The longest path will go through
( 21 )p(p + 1) elements and the shortest path through p
Figure 8- A 16 number bitonic sorter constructed from eight
elements. The network requires (p2 , p + 4)2p,2 , 1 4-number bitonic sorters
comparison elements. A sorter of 1024 numbers will have 55 levels and
24,063 elements with odd-even merges or 28,160 el-
To sort 2p numbers using bitonic sorters requires ements with bitonic sorters. With a 40 nanosecond
( 12 )p(p + 1) levels each with 2p,1 elements for (p2 + propagation delay per level the total delay is 2.2 mi-
p)2p,2 elements. Each path goes through ( 12 )p(p + 1) croseconds. Serial transmission of the bits would re-
levels. quire about this much time between successive bits of
Sorting Networks and Their Applications 311
the numbers unless re-clocking occurs within the net- by their priority number. The ordered set of m-input
work. Parallel-input-parallel-output registers of 1024 items is merged with a set of n items, each containing
bits each can be placed between certain levels to per- a xed output address and a control bit equal to 0.
form this task or the re-clocking may be incorporated At the right side of the m by n merge the m+n items
within each comparison element with a pair of
ip- are in one ordered list; each address-inserter item will
ops on the outputs. The latter scheme does not add be directly below any input items with the same ad-
to the terminal count of the comparison element so dress. The adjacent word transfer network, looking at
the cost of the added
ip-
ops on the comparison el- the control bits, connects each address-inserter item to
ement chip is small. One can use any of the familiar the input item directly above it if one exists (the in-
techniques for driving shift registers such as the \A-B" put item with lowest priority number is picked in each
technique where successive levels are clocked out-of- case). The elements in the sort and the merge are bi-
phase with each other. With present circuit and wiring directional so two-way paths are formed from input to
techniques a bit rate of 10 megahertz may be possible output. The adjacent word transfer sends back sig-
with 50 nanosecond delay per level (2.75 microsecond nals over each path to signal each input and output
delay from input to output of a 1024-word sorter). line whether or not a connection has been established.
With re-clocking in the element and odd-even Data can then be transmitted over each of the con-
merges extra elements are needed to balance the nected input lines.
unequal-length paths. Bitonic sorters do not have N INPUT LINES
MERGING NETWORK
‘‘M BY N’’
Applications
INPUT ITEM
ADDRESS INSERTER
DESIRED OUTPUT 1 PRIORITY
M+N
The fast sorting capability of these networks allows
N OUTPUT LINES
CONTROL BIT
AND
memory" of multiprocessors. The self-sorting capabil-
OUTPUT REQUESTS
ity could be useful for keeping \task lists" up to date
and performing other housekeeping tasks.
Other uses may be as a message \store and for-
OUTPUT
OUTPUT WORDS
MSB LSB
a unique address which it continually interrogates; in-
put devices send their data to these addresses.
__ _ _ _ _ _ EMPTY
1 1 1 1 _ _ _ _ 1 1
1 ADDRESS 0_ _ _ _
_ _ _
0 0
OUTPUT positions which govern the ordering of the words. Ad-
dressing can then take place on any part of the words.
REQUEST
Figure 10 - A multi-access memory As long as the same eld positions are being searched
more than one search can be accommodated simulta-
for \output requests". Logic between adjacent words neously.
causes an output request to aect the word directly Multi-processor
above it. By adding processing logic to perform additions,
During one recirculation cycle new words and out- subtractions, etc., on groups of adjacent words of a
put requests are entered into memory. During the next sorting memory one can implement a multi-processor.
recirculation cycle all words are recirculated with no The sorting capability is used to transmit operands
new entries. At the end of the cycle the LSB of each between processors. Merely by changing address elds
word will proceed the MSB of the same word (no re- the multiprocessor can be recongured quickly. Such a
ordering occurs in the second cycle). Output requests multi-processor can keep up with the \dynamic topol-
are identied by a \0" in the LSB and for each request ogy" of certain real-time problems.
logic performs the following action: if the word above To simplify the processing logic one might use the
the request is a normal word (\1" in the LSB) change same network or another network to perform table
its MSB to a \0" and empty the request (change all look-up arithmetic. It is possible to have all the pro-
its bits to \1" as they
y by), if the word above the cessors search the same table simultaneously.
request is another request change the MSB of the rst
request to \0". During the following recirculation cy- SUMMARY
cle the selected words and unfullled requests
ow to
the low end of memory and are read by output lines. Sorting networks capable of sorting thousands of
Because the request itself is outputted if no word is items in the order of microseconds can be constructed
found, as many outputs as original requests occur. If with present-day hardware. Such fast sorting capabil-
the original requests were in order the outputs directly ity can be used to manipulate large sets of data quickly
correspond to them (a second sorting network can put and solve some of the communications problems asso-
the original output requests in order). ciated with large-scale computing systems.
In use the more-signicant part of each word is used Standard modules of convenient sizes can be picked
as an address and the rest as data. To request a certain and used in any size network to lower the cost. Large-
address an output request is sent in with that address scale integration can be applied if the problem laying
and zeros for data. The word returned will be at that out the rather complex topology of the network can be
address or a higher address if the requested address is solved. Studies of this problem are being conducted at
empty. Goodyear Aerospace.
Sorting Networks and Their Applications 313