Int. J. on Recent Trends in Engineering & Technology, Vol. 05, No. 01, Mar 2011
I. INTRODUCTION
The Cell Broadband Engine (Cell/B.E.) processor is a heterogeneous multi-core chip that is significantly different from conventional multiprocessor or multi-core architectures. It consists of a traditional microprocessor (the PPE) that controls eight SIMD co-processing units called synergistic processor elements (SPEs), a high-speed memory controller, and a high-bandwidth bus interface (termed the element interconnect bus, or EIB), all integrated on a single chip. Fig. 1 gives an architectural overview of the Cell/B.E. processor. The PPE runs the operating system and coordinates the SPEs. It is a 64-bit PowerPC core with a vector multimedia extension (VMX) unit, 32 KByte L1 instruction and data caches, and a 512 KByte L2 cache. The PPE is a dual-issue, in-order execution design with two-way simultaneous multithreading. Ideally, all the computation should be partitioned among the SPEs, and the PPE only handles the control flow. Each SPE consists of a synergistic processor unit (SPU) and a memory flow controller (MFC). The MFC includes a DMA controller, a memory management unit (MMU), a bus interface unit, and an atomic unit for synchronization with other SPUs and the PPE [8, 11].

Figure 1. Cell Broadband Engine Architecture

II. PROBLEM WITH THE TRADITIONAL SEARCH
If the shallowest goal node is at some finite depth, say d, breadth-first search (BFS) will eventually find it after expanding all shallower nodes [1]. However, the time taken to find a solution is large. Depth-first search (DFS), in contrast, is an uninformed search that progresses by expanding the first child node of the search tree that appears, going deeper and deeper until a goal node is found or until it hits a node that has no children. The search then backtracks, returning to the most recent node it has not finished exploring. If the key to be found resides at a very great depth, the DFS algorithm may run into infinite looping, whereby the searching process becomes incomplete.

III. PROPOSED SEARCHING TECHNIQUE
The combination of these two algorithms enables efficient searching on the Cell Broadband Engine. With modern processors moving towards greater parallelization and multithreading, it has become impossible to obtain performance gains with older compilers as technology advances [3]. Any multicore architecture relies on improving parallelism rather than on improving single-core performance. The main advantage
© 2011 ACEEE
DOI: 01.IJRTET.05.01.69
Short Paper
V. SYSTEM ARCHITECTURE
The overall process and control flow of the parallelized and synchronized hybrid search, implemented using the Cell SDK 3.1 simulator, is shown in Fig. 2. The PowerPC processor element (PPE) reads the following inputs:
a) KEY
b) SPE COUNT
c) TREE SIZE
where KEY is the element to be searched for in the tree, SPE COUNT is the number of SPEs to be utilized in the searching process, and TREE SIZE is the total number of elements to be searched.

Figure 2. Hybrid search Architecture

The PPE determines the tree size for the SPEs performing DFS depending on two factors, namely the input tree size and the number of SPEs to be utilized. The following relation gives the tree size for each SPE performing depth-first search:

Tree size for each SPE = (Total number of elements to be processed) / (Number of SPEs to be utilized)

The PPE initiates the first SPE, which searches its tree in breadth-first fashion, by sending the key element to be searched. The entire tree is given as input to this SPE, as it searches the entire set of elements in breadth-first fashion [2]. The PPE then initiates all other SPEs by sending the tree size prescribed for each SPE and the key element to be searched. The number of SPEs invoked depends on the SPE COUNT entered by the user.

The processing of the SPEs starts now. Each SPE creates a binary tree with its own prescribed set of elements. The nodes of the binary tree are created in breadth-first fashion. For the last SPE, the tree size differs slightly, as the total number of elements is not always exactly divisible by the number of SPEs [4]. For the last SPE, the quotient obtained from the above relation is therefore added to the remainder of the division, so that no element within the prescribed TREE SIZE is left out.

The nodes are created and added to the tree in level order. For example, consider the set of nodes 1, 2, 3, 4, 5, 6. These nodes are added to the tree with 1 as the root node, followed by 2 as its left child and 3 as its right child. Then element 4 is inserted as the left child of 2, 5 as the right child of 2, 6 as the left child of 3, and so on. This is how the tree is constructed in each SPE's local store [6].

As the SPE's local store memory is very limited, this poses a limit on the number of elements that can be processed by the Cell/B.E. architecture.

The actual searching process starts after the binary trees are created by all of the SPEs [7]. The first SPE uses the breadth-first search strategy to search the elements of its tree. The rest of the SPEs search their own trees in depth-first fashion.

VI. PARALLELIZED ALGORITHM

PPU side
No_of_ele := read no of elements
Spe_count := read no of SPEs to be used
Key := key to be searched
Start := spe[0]->thread(no_of_ele, key, spe_id)   // invoke first SPE to search in BFS
for (i = 1; i < spe_count; i++)
{
    Start := spe[i]->thread(no_of_ele, key, spe_id)   // invoke other SPEs to search in DFS
}
Found := spu_read_out_mbox();   // read status from outbound mailbox of SPEs
If (Found)
    Print "Key found"
else
    Print "Key not found"
Terminate(spe[i]->thread)
exit(0)
end

BFS_SPU
create Tree(no of ele);
bfs search tree(node*root, int key)
If key found   // if key found, mark found as 1
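
The partitioning relation, the level-order tree construction and the two search strategies described in Section V can be illustrated with a small sequential sketch. This is not the authors' Cell SDK implementation: it is plain Python, it runs the "SPEs" one after another instead of in parallel, and it uses no mailbox or DMA calls. All names (Node, partition_sizes, build_level_order, bfs_search, dfs_search, hybrid_search) are hypothetical. The sketch also assumes that the divisor in the tree-size relation is the number of SPEs performing DFS, which the text does not state explicitly.

```python
from collections import deque


class Node:
    """Binary tree node (hypothetical helper, not from the paper)."""

    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def partition_sizes(total, dfs_spe_count):
    """Tree size per DFS SPE; the last SPE also receives the remainder."""
    quotient, remainder = divmod(total, dfs_spe_count)
    sizes = [quotient] * dfs_spe_count
    sizes[-1] += remainder
    return sizes


def build_level_order(values):
    """Build a binary tree by filling each level from left to right."""
    if not values:
        return None
    nodes = [Node(v) for v in values]
    for i, node in enumerate(nodes):
        left, right = 2 * i + 1, 2 * i + 2
        if left < len(nodes):
            node.left = nodes[left]
        if right < len(nodes):
            node.right = nodes[right]
    return nodes[0]


def bfs_search(root, key):
    """Breadth-first search: visit nodes level by level using a queue."""
    queue = deque([root] if root is not None else [])
    while queue:
        node = queue.popleft()
        if node.value == key:
            return True
        queue.extend(c for c in (node.left, node.right) if c is not None)
    return False


def dfs_search(root, key):
    """Depth-first search with an explicit stack; left subtree first."""
    stack = [root] if root is not None else []
    while stack:
        node = stack.pop()
        if node.value == key:
            return True
        # Push right before left so that the left child is popped first.
        stack.extend(c for c in (node.right, node.left) if c is not None)
    return False


def hybrid_search(elements, key, dfs_spe_count):
    """Sequential stand-in for the PPE control flow (assumption)."""
    # "First SPE": BFS over the tree built from the entire element set.
    found = bfs_search(build_level_order(elements), key)
    # "Remaining SPEs": DFS over trees built from each partition.
    start = 0
    for size in partition_sizes(len(elements), dfs_spe_count):
        chunk = elements[start:start + size]
        found = dfs_search(build_level_order(chunk), key) or found
        start += size
    return found
```

For instance, partition_sizes(10, 3) assigns three elements to each of the first two DFS workers and four to the last, and build_level_order([1, 2, 3, 4, 5, 6]) reproduces the tree of the worked example, with 1 as the root and 6 as the left child of 3. On the Cell/B.E. each search would run on its own SPE and report its status back to the PPE instead of returning a value.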
VII. CONCLUSION
Thus the performance issues encountered using the existing searching algorithms are overcome by the hybrid search algorithm. The hybrid algorithm works well with all the cores of the Cell Broadband Engine. The accuracy of the results is also greatly improved.