of chip area and power consumption. Finding a universal solution suited to a wide range of DSP algorithms remains an open task. To reach the required real-time performance, the solution must be a multiprocessor architecture. At the architectural level, the main interest is the overall organization of the system, which is composed of processing elements (PEs), memories, communication channels and control elements. One of the possible approaches is the so-called shared-memory architecture. Our architecture can obtain a high area efficiency and high performance for implementing industrial applications.

II. PRINCIPLE OF SHARED MEMORY BASED PROCESSOR

In this section, we review the shared-memory approach for DSP applications. The shared-memory architecture is shown in Figure 1. The idea is very simple. In order to simultaneously provide the PEs with input data, we need to partition the shared memory into blocks. Processing elements (PEs) usually perform a simple memoryless mapping of the input values to a single output value. Using a rotating access scheme, each processor gets access to the memories once per N cycles, where N is the number of PEs. During this time slot the processor either writes data to or reads data from memory. All processors have time slots of the same duration to access the memories, so access conflicts are completely avoided. The disadvantage of the shared-memory architecture is the memory bandwidth bottleneck. In order to avoid this bottleneck and simultaneously provide the processors with several (K) input data, the shared memory is partitioned into K memories (Figure 3).
Fig. 1. Shared-memory architecture
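The rotating access scheme described in this section can be illustrated with a small simulation. The following Python fragment is only a sketch under stated assumptions: the function names `slot_owner` and `simulate` are hypothetical, and it assumes that in every cycle exactly one PE owns the access slot and touches all K memories at once.

```python
def slot_owner(cycle, num_pes):
    """Index of the PE that owns the memory access slot in a given cycle.

    Assumption: slots rotate in fixed order, so PE p is served in every
    cycle where cycle mod N == p.
    """
    return cycle % num_pes


def simulate(num_pes, num_mems, num_cycles):
    """Run the rotating schedule and record which PE accesses which memories."""
    accesses = [0] * num_pes   # number of slots granted to each PE
    log = []                   # (cycle, pe, memories touched) per cycle
    for cycle in range(num_cycles):
        pe = slot_owner(cycle, num_pes)
        # The owning PE reads (or writes) one word in each of the K memories.
        log.append((cycle, pe, list(range(num_mems))))
        accesses[pe] += 1
    return accesses, log


if __name__ == "__main__":
    counts, log = simulate(num_pes=4, num_mems=2, num_cycles=12)
    print(counts)
    for entry in log[:4]:
        print(entry)
```

Because only one PE holds the slot in any cycle, no two PEs can address the same memory at the same time, and every PE receives the same number of slots over a whole number of rotations.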
In this paper, a special instance of that architecture is presented. The main goal is to find a balance between the complexity of the interconnection network, the computation model of the PEs (serial vs. parallel), the number of PEs and the memory size. The chosen compromise should satisfy the following factors: required performance, minimal power consumption and low cost in terms of chip area. Another important requirement is to create a flexible, easily reconfigurable architecture suited to a wide range of DSP algorithms.
Processing Elements (PEs)
Processing elements usually perform a simple memoryless mapping of the input values to a single output value. The PEs can be implemented in bit-parallel or bit-serial fashion. The parallel form requires a parallel data bus and careful design because of delays and carry propagation. It completes an arithmetic operation in one clock cycle but consumes more chip area than the serial form. Serial PEs receive their inputs bit-serially, and their results are also produced bit-serially. Hence, only a single wire is required for each signal. The design process can be simpler and more robust. The cost in terms of chip area and power consumption is therefore low. However, to achieve the required performance, bit-serial communication leads to high clock frequencies.
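The difference between the two forms can be made concrete with a behavioural model of a bit-serial adder. This is a minimal sketch, not the PE design of this paper: the function name `bit_serial_add` is hypothetical, and unsigned operands processed least significant bit first are assumed.

```python
def bit_serial_add(a, b, word_length):
    """Add two unsigned words one bit per clock cycle, LSB first.

    Returns the sum modulo 2**word_length and the number of clock
    cycles used. A bit-parallel adder would deliver the same sum in
    a single cycle at the cost of word_length full adders.
    """
    carry = 0
    result = 0
    for cycle in range(word_length):
        bit_a = (a >> cycle) & 1      # one input wire per operand
        bit_b = (b >> cycle) & 1
        total = bit_a + bit_b + carry
        result |= (total & 1) << cycle
        carry = total >> 1            # carry is held in a single flip-flop
    return result, word_length


if __name__ == "__main__":
    print(bit_serial_add(5, 9, 8))
```

The model needs only one full adder and one carry flip-flop, which reflects the low area cost, while the cycle count equal to the word length shows why a serial PE needs a higher clock frequency for the same throughput.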
Fig. 2. Shared-memory architecture
Memory elements are slow compared to PEs. It is desirable to make a trade-off between additional registers and RAM to achieve read and write performance appropriate to that of the PEs. With a bit-parallel PE, a high-speed register plays the role of a trivial (one-word) cache memory. With a bit-serial PE, there must be a shift register. Data can be shifted into and out of the register at high speed. A complete word can then be written into the RAM. The RAM addressing requires only cyclic operation. Data is read from the RAM bit-parallel and stored in the shift register. The number of RAM words should be sufficient to store all variables of the algorithm being realized.
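The interplay of shift register and RAM for a bit-serial PE can be sketched as follows. The class name `SerialMemoryPort` and its methods are hypothetical, and the LSB-first bit order and the single cyclic write pointer are assumptions made for illustration.

```python
class SerialMemoryPort:
    """Shift register in front of a small RAM with a cyclic write address."""

    def __init__(self, num_words, word_length):
        self.ram = [0] * num_words
        self.word_length = word_length
        self.write_addr = 0   # advances cyclically, no random addressing
        self.shift_reg = 0
        self.bits_in = 0

    def shift_in(self, bit):
        """Shift in one bit (LSB first); store the word once it is complete."""
        self.shift_reg |= (bit & 1) << self.bits_in
        self.bits_in += 1
        if self.bits_in == self.word_length:
            # Bit-parallel write of the whole word into the RAM.
            self.ram[self.write_addr] = self.shift_reg
            self.write_addr = (self.write_addr + 1) % len(self.ram)
            self.shift_reg = 0
            self.bits_in = 0

    def load_word(self, addr):
        """Bit-parallel read of one word, returned as an LSB-first bit list."""
        word = self.ram[addr % len(self.ram)]
        return [(word >> i) & 1 for i in range(self.word_length)]


if __name__ == "__main__":
    port = SerialMemoryPort(num_words=4, word_length=8)
    for bit in [1, 0, 1, 0, 0, 0, 0, 0]:   # value 5, LSB first
        port.shift_in(bit)
    print(port.ram, port.load_word(0))
```

In this model the RAM is accessed only once per word, so it can run at a fraction of the bit clock, while the fast shifting is confined to the register.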
Interconnection network (ICN)
The interconnection network provides the communication channels needed to supply the PEs with the proper data and parameters, and to store results in the proper memories. The data movement should be kept simple, regular and uniform. Major design issues involve the topology of the communication network and its bandwidth.

III. BALANCED MODIFIED SHARED MEMORY ARCHITECTURE

When realizing a single, basic arithmetic operation such as addition or multiplication, the bit-parallel version of a PE obviously has several times higher performance than the bit-serial one. However, taking into account the whole module with PEs, input and output registers, memory and interconnection network, the advantage of the parallel form is not so clear. When power consumption and chip area are included, the serial form could be more convenient. Generally, a smaller chip area and a lower clock frequency lead to lower power consumption. The requirement on the PE is that it completes its operation within the specified time limit. Naturally, the chip area of a single serial PE is much smaller than that of a parallel PE, but it needs a faster clock to reach the same performance. Parallel PEs lead to more connection lines and consequently more area and power. A parallel PE appears to have several times greater computational throughput than a serial PE at the same clock frequency; however, when considering 2-3 PEs in shared-memory
(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 4, July 2010, p. 199, http://sites.google.com/site/ijcsis/, ISSN 1947-5500