U.maheswaran Presentation
Submitted by
CONTENTS:
Introduction
Problem Definition
Technical Background
Proposed Idea
Design Methodology
Design Aspects
Results
Conclusions
Queries
INTRODUCTION
Why reconfigurable computing?
What are CGRAs?
Why not FPGAs for complex applications?
Reconfigurable computing & mapping applications?
PROBLEM DEFINITION
FPGAs are fine-grained architectures with bit-level granularity, so mapping complex applications onto them reduces flexibility and increases complexity.
CGRAs address this with word- or nibble-level granularity.
However, typical CGRAs are domain specific and support only integer arithmetic.
We propose a new architecture that supports both integer and floating-point operations.
TECHNICAL BACKGROUND
In the current scenario, systems with reconfigurable logic modules have a great impact on many technical applications. FPGAs are widely used across technical domains to implement complex algorithms. However, their fine-grained architecture makes them less flexible and less efficient for such algorithms.
CONTINUED..
If we use such a fine-grained architecture for complex algorithms, flexibility has to be sacrificed and the system becomes more complex. Coarse-grained architectures have greater granularity: the resource entities (hardware/problem) are divided into larger grains.
CONTINUED..
Hence, using a CGRA for complex algorithms preserves flexibility. But typical CGRAs come without a floating-point unit and are domain specific. To overcome these barriers, heuristic mapping functions are used so that a floating-point unit can be dynamically created by the mapping algorithm.
CONTINUED..
Hence this floating-point-enabled CGRA can be used for complex applications involving floating-point arithmetic, e.g. DSP filter design, graphics accelerators, and many multimedia applications. Hardware flexibility is improved by using high-performance hardware, and programming flexibility is achieved through mapping algorithms.
PROPOSED IDEA
The real challenge before us now is the grain size of the reconfigurable device. The aim is to improve its granularity by grouping the basic units of the reconfigurable device around a data bus of a particular data width.
CONTINUED..
The target architecture consists of a reconfigurable computing module (RCM) and a general-purpose processor that controls it, connected by a shared bus. The RCM executes loop kernel code segments. Each processing element (PE) can be dynamically reconfigured to perform an arithmetic or logic operation.
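As a rough illustration of a dynamically reconfigurable PE, the sketch below models a PE whose function is selected by a configuration word. The class name `PE`, the opcode strings, and the 16-bit default width are assumptions made for this example only; the slides do not specify them.

```python
import operator


class PE:
    """Toy model of a word-level processing element (hypothetical)."""

    # Opcode names are illustrative, not taken from the slides.
    OPS = {
        "add": operator.add,
        "sub": operator.sub,
        "and": operator.and_,
        "or": operator.or_,
        "xor": operator.xor,
    }

    def __init__(self, width=16):
        # Assumed data-path width; results wrap around at this width.
        self.mask = (1 << width) - 1
        self.opcode = None

    def configure(self, opcode):
        """Load a new configuration, i.e. select the operation to perform."""
        if opcode not in self.OPS:
            raise ValueError(f"unsupported opcode: {opcode}")
        self.opcode = opcode

    def execute(self, a, b):
        """Apply the configured operation to two word-sized operands."""
        if self.opcode is None:
            raise RuntimeError("PE has not been configured")
        return self.OPS[self.opcode](a, b) & self.mask


pe = PE()
pe.configure("add")
wrapped_sum = pe.execute(0xFFFF, 1)   # wraps around at 16 bits
pe.configure("xor")                   # same PE, new function
xor_result = pe.execute(0b1100, 0b1010)
```

The same PE object serves two different operations in sequence, which is the property the slide refers to as dynamic reconfiguration.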
DESIGN FLOW
The design flow consists of control path and data path designs.
Control path design: generation of set of control
The mapped CGRA contains a coprocessor (KCPSM3 PicoBlaze) on the host FPGA, used to reconfigure the grain size of the FPGA.
KCPSM3 (Constant (K) Coded Programmable State Machine) is a free soft processor core from Xilinx for use in their FPGAs.
CONTINUED..
Xilinx documents the PicoBlaze as requiring just 96 FPGA slices. It runs the kernel in a loop and reconfigures the CLBs into the required PEs. Reconfiguration details are stored in configurable caches. The floating-point adder unit is then synthesized on the mapped CGRA and addition is performed.
DESIGN ASPECTS
A pair of PEs is used for floating-point operations: one PE computes the mantissa and the other handles the exponent.
Steiner tree routing is preferred for faster routing performance.
After ILP/QEA, heuristic approaches are followed to increase performance.
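The mantissa/exponent split across a PE pair can be sketched in software as follows. This is a minimal sketch under stated assumptions: it handles only positive, normal single-precision values, truncates instead of rounding, and ignores exponent overflow. The function names `exponent_pe`, `mantissa_pe`, and `fp_add_positive` are hypothetical and do not come from the slides.

```python
import struct


def unpack_f32(x):
    """Split a float32 into (sign, biased exponent, 24-bit mantissa)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF
    man = (bits & 0x7FFFFF) | 0x800000  # restore hidden 1 (normal numbers only)
    return sign, exp, man


def pack_f32(sign, exp, man):
    """Rebuild a float32 from its fields, dropping the hidden leading 1."""
    bits = (sign << 31) | (exp << 23) | (man & 0x7FFFFF)
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def exponent_pe(exp_a, exp_b):
    """Exponent PE (assumed role): choose the larger exponent and the
    right-shift each mantissa needs for alignment."""
    if exp_a >= exp_b:
        return exp_a, 0, exp_a - exp_b
    return exp_b, exp_b - exp_a, 0


def mantissa_pe(man_a, man_b, shift_a, shift_b):
    """Mantissa PE (assumed role): align, add, and renormalize on carry."""
    total = (man_a >> shift_a) + (man_b >> shift_b)
    carry = total >> 24  # 1 if the sum no longer fits in 24 bits
    if carry:
        total >>= 1      # truncating renormalization, no rounding
    return total, carry


def fp_add_positive(a, b):
    """Add two positive normal float32 values using only integer steps."""
    _, exp_a, man_a = unpack_f32(a)
    _, exp_b, man_b = unpack_f32(b)
    exp, shift_a, shift_b = exponent_pe(exp_a, exp_b)
    man, carry = mantissa_pe(man_a, man_b, shift_a, shift_b)
    return pack_f32(0, exp + carry, man)
```

For example, `fp_add_positive(1.5, 2.25)` aligns the mantissa of 1.5 by one bit and produces 3.75. A real floating-point unit would also need sign handling, rounding, and special values, which are left out here.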
CONTINUED
Thus, each operation in a loop body is spatially mapped to a dedicated PE. The main advantage of spatial mapping is that a PE need not be reconfigured during execution of a loop, because its functionality is fixed. Its disadvantage is that spreading all operations of the loop body over the reconfigurable array may require a very large array.
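The trade-off of spatial mapping can be illustrated with a small sketch that gives every loop-body operation its own PE and fails when the array is too small. The row-by-row placement and the function name `spatial_map` are assumptions for illustration; the actual mapping described in these slides uses ILP/QEA followed by heuristics.

```python
def spatial_map(loop_ops, rows, cols):
    """Assign each loop-body operation to a dedicated PE, row by row.

    loop_ops: list of unique operation names (hypothetical labels).
    Returns a dict {operation: (row, col)}.
    Raises ValueError if the array has fewer PEs than operations.
    """
    if len(loop_ops) > rows * cols:
        raise ValueError(
            f"{len(loop_ops)} operations do not fit on a {rows}x{cols} array"
        )
    # divmod(i, cols) yields (row, col) for the i-th PE in row-major order.
    return {op: divmod(i, cols) for i, op in enumerate(loop_ops)}


# Four ALU operations of a loop body placed on a 2x2 array.
placement = spatial_map(["t0=a+b", "t1=c-d", "t2=t0&t1", "t3=t2^a"], 2, 2)
```

Because each operation keeps its PE for the whole loop, no reconfiguration is needed while the loop runs, but a loop body with one more operation would already exceed the 2x2 array.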
CONTINUED
The operations that a PE (or a pair of PEs) in our CGRA can execute are classified into three groups.
1) Arithmetic/logical operations: a PE can execute ALU operations in one clock cycle.
2) Multiply/divide/load/store operations: these are executed in several clock cycles by dedicated functional resources located outside the PE array.
3) Floating-point operations: a pair of PEs can execute floating-point operations, taking several clock cycles.
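The three groups can be expressed as a simple lookup that also counts how many PEs a list of operations occupies. The mnemonics in each set are hypothetical examples chosen for this sketch; only the grouping itself and the PE counts (one PE, none, or a pair) follow the classification above.

```python
# Hypothetical mnemonics for each group; the slides name only the groups.
ALU_OPS = {"add", "sub", "and", "or", "xor", "shl", "shr"}
SHARED_OPS = {"mul", "div", "load", "store"}
FP_OPS = {"fadd", "fsub", "fmul"}

# PEs occupied per operation: one PE for ALU operations, none for
# operations handled outside the PE array, a pair for floating point.
PES_NEEDED = {"alu": 1, "shared": 0, "fp": 2}


def classify(op):
    """Return the group an operation belongs to."""
    if op in ALU_OPS:
        return "alu"
    if op in SHARED_OPS:
        return "shared"
    if op in FP_OPS:
        return "fp"
    raise ValueError(f"unknown operation: {op}")


def pe_demand(ops):
    """Count the PEs occupied when every operation gets its own resources."""
    return sum(PES_NEEDED[classify(op)] for op in ops)


demand = pe_demand(["load", "fadd", "add", "store"])
```

Here the two memory operations use the shared resources outside the array, so only the floating-point add (a PE pair) and the integer add (one PE) count toward the array size.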
CONCLUSION
Thus, this mapping process over a reconfigurable device achieves increased performance and flexibility in both programming and hardware. Faster, more flexible reconfigurable hardware mapped to support floating-point operations in this way can draw considerable attention in the embedded systems industry.
Queries?
AUTHORS
U. Maheswaran, M.E. [A.E], PG Scholar, M.N.M. Jain Engineering College, Chennai, Tamil Nadu, India
P. Venugopal, M.E. [A.E], M.B.A., Asst. Professor, Dept. of ECE, M.N.M. Jain Engineering College, Chennai, Tamil Nadu, India
er.maheswaran@live.com +91-9944215357
kpsvenu@gmail.com +91-9444420128