Shared Memory Architecture
For Embedded DSP
J.L.Mazher Iqbal
Assistant Professor,
ECE Department,
Rajalakshmi Engineering College,
Chennai 602 105, India
mazheriq@gmail.com
Abstract—Reconfigurable computing accelerates a wide variety of applications and has therefore become the subject of a great deal of research. It can perform computations in hardware to increase performance while retaining much of the flexibility of a software solution. In addition, reconfigurable computers contain functional resources that may easily be modified after field deployment in response to changing operational parameters and datasets. To date, the core processing element of most reconfigurable computers has been the field-programmable gate array (FPGA) [3]. This paper presents a reconfigurable FPGA-based hardware accelerator for embedded DSP. Reconfigurable FPGAs have significant logic, memory and multiplier resources, which can be used in parallel to implement very high-performance DSP processing. The advantages of DSP design using FPGAs are a high number of instructions per clock, a high number of multipliers, high bandwidth, and flexible I/O and memory connectivity. The proposed processor is a reconfigurable architecture consisting of processing elements (PEs), memories, an interconnection network and control elements. A processing element based on bit-serial arithmetic (multiplication and addition) is also given. In this paper, it is established that a specific universal balanced architecture implemented in an FPGA is a universal solution suited to a wide range of DSP algorithms. First the principle of a modified shared-memory based processor is shown, and then the specific universal balanced architecture is proposed. An example processor for the TVDFT transformation on the given accelerator is also given. With the proposed architecture we could reduce cost, area and hence power relative to the best-known designs in Xilinx FPGA technology.
Keywords—Reconfigurable architectures; FPGA; Pipeline; Processing Element; Hardware Accelerator
Dr. S. Varadarajan
Associate Professor,
ECE Department,
Sri Venkateswara College of Engineering,
Sri Venkateswara University,
Tirupati 517 502, India
varadasouri@gmail.com

I. INTRODUCTION

Now that design rules have stopped shrinking for ASICs, ASSPs and the like, these devices seem likely to be replaced by FPGAs. With their design rules entering the 40 nm generation, FPGAs will soon be level with ASICs and ASSPs in terms of circuit size and performance. The circuit configuration of FPGAs can be freely revised by equipment manufacturers on the spot, which means they need not pay development costs such as mask sets. Better still, FPGAs require no circuit fabrication after design, which means faster equipment development.
Nowadays, consumer appliances are more advanced than ever. They are required to be more functional and more portable, and product life spans have become shorter. There are two key issues in LSI development: time and cost. Developing a new LSI demands huge investment and carries substantial risk. Programmable devices such as CPUs, DSPs and FPGAs have become key to resolving these issues, and hardware reconfigurability has attracted attention because of its high performance [3]. The FPGA has high flexibility and is suitable for implementing control circuits, but it suffers from low area efficiency when implementing data-dominated circuits. When implementing industrial application systems, the area of an FPGA implementation is far larger than that of an ASIC implementation because of the high reconfigurability. A reconfigurable architecture has the capability to configure connections between programmable logic elements, registers and memory in order to construct a highly parallel implementation of the processing kernel at run time. This feature makes such architectures attractive, since a specific high-speed circuit for a given instance of an application can be generated at compile time or even at run time. Since the appearance of the first reconfigurable computing systems, DSP applications have served as important test cases in reconfigurable architecture and software development. In the area of special-purpose architectures for digital signal processing, systolic arrays are recognized as a standard for high performance. Systolic designs represent an attractive architectural paradigm for efficient hardware implementation of computation-intensive DSP applications, supported by features such as simplicity, regularity and modularity of structure. In addition, they possess significant potential to yield high throughput rates by exploiting a high level of concurrency using pipelining, parallel processing, or both [1]. Today's objective is to tailor system performance to a given task at minimal cost in terms of chip area and power consumption. Finding a universal solution suited to a wide range of DSP algorithms remains an open task. To reach the required real-time performance, a multiprocessor architecture is needed. At the architectural level, the main interest is the overall organization of the system composed of processing elements (PEs), memories, communication channels and control elements. One possible approach is the so-called shared-memory architecture. Our architecture achieves high area efficiency and high performance for implementing industrial applications.

(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 4, July 2010, ISSN 1947-5500, http://sites.google.com/site/ijcsis/
II. PRINCIPLE OF SHARED-MEMORY BASED PROCESSOR

In this section, we review the shared-memory approach for DSP applications [13]. The shared-memory architecture is shown in Figure 1. The idea is very simple. In order to provide the PEs with input data simultaneously, we partition the shared memory into blocks. Processing elements (PEs) usually perform a simple memoryless mapping of the input values to a single output value. Using a rotating access scheme, each processor gets access to the memories once per N cycles (N = number of PEs). During this time slot the processor either writes data to or reads data from memory. All processors have time slots of the same duration, and access conflicts are completely avoided. The disadvantage of the shared-memory architecture is the memory-bandwidth bottleneck. To avoid this bottleneck and provide the processors with several (K) input data simultaneously, the shared memory is partitioned into K memories (figure 3).
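As an illustration (our own word-level sketch; the paper does not give the arbitration function), the rotating access scheme with as many memory banks as PEs can be modeled by offsetting each PE's bank index by the cycle count, which makes every cycle's assignment a permutation and hence conflict-free:

```python
def rotating_access(num_pes, num_banks, cycle):
    """Bank granted to each PE in the given cycle under a rotating
    (round-robin) scheme; with num_banks == num_pes the assignment is a
    permutation, so no two PEs ever contend for the same bank, and each
    PE revisits a given bank once per num_pes cycles."""
    return [(pe + cycle) % num_banks for pe in range(num_pes)]
```

For example, with 4 PEs and 4 banks, `rotating_access(4, 4, 0)` grants banks `[0, 1, 2, 3]`, and each later cycle rotates the pattern by one position.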
Fig. 1. Shared-memory architecture
In this paper, a special instance of that architecture is presented. The main goal is to find a balance between the complexity of the interconnection network, the computation model of the PEs (serial vs. parallel), the number of PEs and the memory size. The chosen compromise should satisfy the following requirements: the required performance, minimal power consumption and minimal cost in terms of chip area. Another important requirement is a flexible, easily reconfigurable architecture suited to a wide range of DSP algorithms.
A. Processing Elements (PEs)

Processing elements usually perform a simple memoryless mapping of the input values to a single output value. The PEs can operate in parallel or bit-serial fashion. The parallel form requires a parallel data bus and careful design because of delays and carry propagation. It completes an arithmetic operation in one clock cycle but, compared to the serial form, consumes more chip area. Serial PEs receive their inputs bit-serially, and their results are also produced bit-serially. Hence only a single wire is required for each signal, the design process is simpler and more robust, and the cost in terms of chip area and power consumption is low. However, to achieve the required performance, bit-serial communication demands high clock frequencies.
Fig. 2. Shared-memory architecture
B. Memory elements

Memory elements are slow compared to PEs. It is desirable to trade off additional registers against RAM to achieve read and write performance appropriate to the PEs. With a bit-parallel PE, a high-speed register plays the role of a trivial (one-word) cache memory. With a bit-serial PE there must be a shift register: data can be shifted into and out of the register at high speed, and then a whole word can be written into the RAM, so the RAM addressing only needs to work cyclically. Data are read bit-parallel and stored into the shift register. The number of RAM words should be sufficient to store all variables of the realized algorithm.
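A behavioral sketch of such a shift register acting as a serial/parallel converter between a bit-serial PE and word-wide RAM (the class name and the LSB-first convention are our assumptions; the paper does not specify them):

```python
class ShiftRegister:
    """Word-wide shift register bridging a bit-serial PE and word-wide RAM."""

    def __init__(self, width):
        self.width = width
        self.bits = [0] * width        # bits[0] is the LSB end

    def shift_in(self, bit):
        # one serial clock: the oldest bit (LSB end) falls out,
        # the new bit enters at the MSB end
        out = self.bits[0]
        self.bits = self.bits[1:] + [bit]
        return out

    def load_word(self, word):
        # parallel load from RAM, ready for bit-serial shifting out
        self.bits = [(word >> i) & 1 for i in range(self.width)]

    def read_word(self):
        # parallel read toward RAM after `width` serial shifts
        return sum(b << i for i, b in enumerate(self.bits))
```

Shifting a word in LSB first for `width` clocks leaves it ready for a single parallel write to RAM, which matches the cyclic RAM addressing described above.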
C. Interconnection network (ICN)

The interconnection network provides the communication channels needed to supply the PEs with the proper data and parameters and to store results in the proper memories. The data movement should be kept simple, regular and uniform. The major design issues are the topology of the communication network and its bandwidth.
III. BALANCED MODIFIED SHARED-MEMORY ARCHITECTURE

For a single basic arithmetic operation such as addition or multiplication, the bit-parallel version of a PE obviously has several times higher performance than the bit-serial one. However, considering the whole module with PEs, input and output registers, memory and interconnection network, the advantage of the parallel form is not so clear. Taking power consumption and chip area into account, the serial form can be more convenient: generally, a smaller chip area and a slower clock lead to lower power consumption. The requirement on the PE is that it completes its operation within the specified time limit. The chip area of a single serial PE is clearly much smaller than that of a parallel PE, but obtaining the same performance needs a faster clock. Parallel PEs lead to more connection lines and consequently more area and power. A parallel PE appears to have several times the computational throughput of a serial PE at the same clock rate; however, with 2–3 PEs in a shared-memory architecture it may be impossible to use a high-speed clock because of noise in signal propagation along long parallel buses. Moreover, the control part of the whole system in the serial-PE version can be microprogrammed, so that the implemented algorithm is changed simply by changing the control-memory contents. With parallel PEs, the control part must be changed significantly when the type of computational task changes. These brief considerations show that a shared-memory architecture with serial PEs is easier to implement and better suited to a wide range of DSP algorithms.
The proposed balanced shared-memory module, based on the approach of [13], is shown in figure 3. The processing elements are capable of performing three computing functions: bit-serial full addition (including carry), bit-serial multiplication, and negation, with two inputs and one output. Other arithmetic operations are performed as sequences of additions. Because of the two serial inputs and one serial output, each PE is equipped with four shift registers (two inputs and one output for two independent memories). Those registers are used as single-word cache memories and as serial/parallel and parallel/serial translators on the communication path to the RAM. Hence the interconnection network is very simple, which leads to a small chip area and the possibility of using a high-speed clock. The number of PEs should be 2, with one RAM block per multiplied output of a PE. The multiplier output of PE1 is shifted into the RAM block, using signals s1=1 and s2=0, and passed to PE2 via a shift register. PE2 accumulates the multiplier output and writes the result back to the output buffer. Our shared-memory architecture offers a good balance of chip area, power consumption, computational throughput and flexibility. If the required performance is lacking, the proposed module can be "multiplied", i.e. connected as shown in figures 5 and 6. Using a parallel (figure 5) or cascade (figure 6) connection of modules, it is easy to create a processor suited to a wide range of algorithms, and almost any required performance can be achieved. In the next part of this article, an example realization of a multiplier based on serial arithmetic and of the TVDFT transformation on the proposed architecture is presented. The heart of the proposed architecture is the PE (Figure 4). This element performs all bit-serial arithmetic calculations, i.e. multiplication and addition. Moreover, the PE can negate the "b" input when the control signal "not_b" is high. Control signals s1 and s2 enable multiplication and addition respectively.
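A word-level behavioral model of this PE may help fix ideas (our own sketch: the real PE is bit-serial and its timing is abstracted away; the interpretation that exactly one of s1/s2 is asserted at a time is our assumption):

```python
def pe(a, b, s1, s2, not_b):
    """Behavioral model of the PE of figure 4: not_b negates the 'b'
    input; s1 selects the multiplier path, s2 the serial-adder path."""
    if not_b:
        b = -b                 # negation of the 'b' input
    if s1 and not s2:
        return a * b           # multiplier path
    if s2 and not s1:
        return a + b           # serial-adder path
    raise ValueError("exactly one of s1/s2 must be asserted")
```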
Fig. 3. Instance of the specific universal balanced architecture
Fig. 4. PE architecture
Fig. 5. Parallel form
Fig. 6. Cascade form
A. Multiplier based on serial arithmetic

Most bit-serial multipliers are in practice based on the shift-and-add algorithm, in which several bit-products are added in each time slot. We present such a bit-serial multiplier. Bit-serial addition is based on the equations given below:
Sum_i = XOR(A_i, B_i, D_i) = A_i ⊕ B_i ⊕ D_i
C_i = Majority(A_i, B_i, D_i) = A_i·B_i + A_i·D_i + B_i·D_i        (1)
D_{i+1} = C_i,  for i = 0, 1, ..., W−2, W−1
D_0 = 0

where W is the data word length.
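Equations (1) can be exercised directly; a minimal sketch using LSB-first bit lists (the function and helper names are ours):

```python
def bit_serial_add(a_bits, b_bits):
    """Bit-serial addition per equations (1): bits arrive LSB first and
    the carry is held in the D flip-flop between bit positions."""
    d = 0                                        # carry flip-flop, reset
    sum_bits = []
    for ai, bi in zip(a_bits, b_bits):
        s = ai ^ bi ^ d                          # Sum_i = A_i xor B_i xor D_i
        d = (ai & bi) | (ai & d) | (bi & d)      # D_{i+1} = C_i = Majority(...)
        sum_bits.append(s)
    return sum_bits, d                           # W sum bits plus final carry


def to_bits(x, w):
    """LSB-first bit list of an unsigned w-bit number."""
    return [(x >> i) & 1 for i in range(w)]
```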
In bit-serial arithmetic the numbers are normally processed least-significant bit first. In a bit-serial carry-save adder, the carries are saved from one bit position to the next, and at the beginning of the computation the D flip-flop is reset. The n-bit serial multiplier is shown in figure 7. In the first step, the register (a shift register with serial input and parallel output) takes the least-significant bit of the multiplier, while the first delay element D1 holds the least-significant bit of the multiplicand. Unit delay D1 holds bits for one clock cycle and unit D2 for two cycles, respectively. The complete product emerges after 2n clock cycles; for example, an 8-bit serial multiplication takes 16 clock cycles.
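The shift-and-add behavior can be sketched as follows (our own model, not the exact circuit of figure 7): operating LSB first, product bit i is final once the multiplier bits up to position i have been accumulated, so the 2n product bits of an n-by-n multiplication emerge over 2n cycles:

```python
def bit_serial_multiply(a, b, n=8):
    """Shift-and-add multiplication of two unsigned n-bit numbers;
    one product bit (LSB first) is emitted per cycle for 2n cycles."""
    acc = 0
    out_bits = []
    for cycle in range(2 * n):
        if cycle < n and (b >> cycle) & 1:
            # each set multiplier bit adds a shifted copy of the multiplicand
            acc += a << cycle
        # bit `cycle` can no longer change: later partial products
        # only touch higher bit positions
        out_bits.append((acc >> cycle) & 1)
    product = sum(bit << i for i, bit in enumerate(out_bits))
    return product, out_bits
```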
Fig. 7. Bit-serial multiplier
IV. TVDFT TRANSFORMATION ON THE GIVEN ARCHITECTURE
A systolic system consists of an array of processing elements (typically multiplier-accumulator units) in a pipeline structure, used for applications such as image and signal processing. The systolic approach can speed up a compute-bound computation in a relatively simple and inexpensive manner; in particular, a systolic array achieves higher computational throughput without increasing memory bandwidth. A wide variety of signal-processing functions can be hosted on the shared-memory processor, including complete subsystems that encompass multiple algorithms. The TVDFT transformation [8] will be used as the example. The TVDFT is given by the equation:
X(k) = Σ_{n=0}^{N−1} x(n) e^{−jφ(n,k)} w(n),   k = 0..K        (2)

where X(k) is the spectral component corresponding to the k-th harmonic, N the length of the analysis frame, x(n) the input signal, w(n) the time window, and K the number of orders in the input signal.
φ(0,k) = 0
φ(n,k) = Σ_{i=1}^{n} 2πk (f0(i) + f0(i−1)) / (2 Fs)        (3)

where f0(i) is the fundamental frequency at the time specified by i, and Fs is the sampling frequency.
In the case of a linear change of the fundamental frequency, formula (3) can be written as follows:

φ(n,k) = (2πnk / Fs) (f0 + nΔf / (2N))        (4)

where f0 is the fundamental frequency at the beginning of the analysis frame and Δf is the fundamental-frequency change within the analysis frame. Hence, the TVDFT formula (2) can be written as follows:
Re X(k) = Σ_{n=0}^{N−1} x_w(n) cos( (2πnk / Fs) (f0 + nΔf / (2N)) )
Im X(k) = Σ_{n=0}^{N−1} x_w(n) sin( (2πnk / Fs) (f0 + nΔf / (2N)) )        (5)

where x_w(n) = x(n) w(n).
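As a software cross-check of formula (5) (the function and parameter names are ours; this is a reference model, not the hardware schedule):

```python
import math

def tvdft_bin(x, w, k, f0, df, fs):
    """Re/Im parts of the k-th TVDFT order per formula (5): a DFT whose
    phase tracks a fundamental changing linearly from f0 to f0 + df
    across the N-sample analysis frame."""
    N = len(x)
    re = im = 0.0
    for n in range(N):
        xw = x[n] * w[n]                        # windowed sample x_w(n)
        phi = 2.0 * math.pi * n * k / fs * (f0 + n * df / (2.0 * N))
        re += xw * math.cos(phi)
        im += xw * math.sin(phi)
    return re, im
```

With df = 0 this reduces to an ordinary DFT bin at frequency k·f0.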
Formula (5) shows that for a practical realization of the TVDFT, two sine-wave generators with linear change of frequency can be used [10]. The balance for this algorithm needs two serial PEs, so only six shift registers are required for communication between RAM and PEs. The memory size for the given example is three 8-bit words, RAM[0..2]. Such an architecture can be used as a bit-serial computational unit (figure 3). According to the proposed TVDFT computation in formula (5), we must first calculate the value of the sine and cosine functions and then multiply those values by the input signal. The algorithm can thus be realized in one bit-serial computational unit in two main steps. Step one is the generation of the sine/cosine value [10].
In the main step of the TVDFT algorithm, the generated sine/cosine value must be multiplied by the value of the input signal. This is done by the detailed steps given below:
1) Load data into input registers.
• R1.1 ← IN1, R2.1 ← IN2;
• R1.2 ← RAM[0], R2.2 ← R3.2;
2) Calculations.
• PE1 ← R1.1 * R2.1;
• PE2 ← R1.2 + R2.2;
3) Write the results into the output registers.
• R3.1 ← PE1;
• R3.2 ← PE2;
4) Write the result from the output registers to the memories.
• RAM[0] ← R3.1;
• R2.2 ← R3.2;

where IN1 is the input signal from the sine-generation module and IN2 is the input signal x(n).
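One iteration of this four-step schedule might be modeled at word level as follows (our own sketch: the register names mirror the paper, bit-serial timing is abstracted away, and PE2 adds the previous product fetched from RAM[0], so the accumulated sum lags the products by one iteration):

```python
def mac_step(state, in1, in2):
    """One pass of the four-step schedule; state holds RAM[0] (the last
    product) and R3.2 (the running sum fed back through R2.2)."""
    # 1) load input registers
    r1_1, r2_1 = in1, in2                  # sine/cosine value, sample x(n)
    r1_2, r2_2 = state['RAM0'], state['R3_2']
    # 2) calculations
    pe1 = r1_1 * r2_1                      # PE1: multiply
    pe2 = r1_2 + r2_2                      # PE2: accumulate
    # 3) write results to output registers
    r3_1, r3_2 = pe1, pe2
    # 4) write back to memory / feedback register
    state['RAM0'] = r3_1
    state['R3_2'] = r3_2
    return state
```

A final flush iteration with zero inputs drains the last product into the accumulator.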
As shown, the scheduling of the TVDFT algorithm consists of two parts: first, computing the value of the sine/cosine function, and second, multiplying the 8-bit input sample by the 8-bit sine/cosine value according to formula (5). These main steps are repeated N times for each real and imaginary part. To improve computational performance, a parallel or cascade connection of computation units can be used.
V. HARDWARE IMPLEMENTATION

The FPGA implementation of the specific universal balanced architecture targets the XILINX VIRTEX-II family (selected device: 2vp2fg256-6). Simulation is critical in verifying the behavior of the developed design. Functional and timing simulation of the design was done with Mentor Graphics ModelSim, using a developed test bench and appropriate stimuli to validate the design. Table I shows the area utilization of the given architecture, compared with the area utilization of previous architectures [1] [3] [13] [14]. Table II shows the timing summary and Table III the timing constraint. The simulation result of the TVDFT transform on the specific universal balanced architecture is shown in figure 8. The layout design was created using Microwind 3.1 to verify the impact of the physical structure on the behavior of the developed design, as shown in
figure 9. The final supply voltage and maximum Idd current of the proposed architecture are 0–1 V and 0–2 mA.
TABLE I. Device utilization summary
2vp2fg256-6 Area Used Utilization
Number of Slices 40 out of 1408 2%
Number of Slice Flip Flops 48 out of 2816 1%
Number of 4 input LUTs 72 out of 2816 2%
Number of bonded IOBs 28 out of 140 20%
Number of MULT18X18s 2 out of 12 16%
Number of GCLKs 1 out of 16 6%
TABLE II. Timing summary
Maximum Frequency: 271.444 MHz
Minimum Period: 4.160 ns
Minimum input arrival time before clock: 3.316 ns
Maximum output required time after clock: 3.670 ns
A. Comparison of Architectures

Details of the performance of the specific universal balanced architecture of Section III in terms of the basic design metrics are tabulated alongside those of other comparable existing architectures in Table IV and figure 10. The proposed implementation clearly outperforms the existing implementations in three important metrics: the area occupied, the maximum usable frequency, and the gate count.
TABLE III. Timing constraint: default period analysis for clock 'clk'

Cell:in->out        Fanout  Gate Delay  Net Delay  Logical Name (Net Name)
MULT18X18S:C->P4    2       0.705       0.561      PE1/multiplier1/Mmult__old_pdt_int_11_inst_mult_0
LUT2_D:I1->O        6       0.313       0.524      PE1/output4<4>1 (PE_1<4>)
MULT18X18S:A4       -       2.057       -          PE2/multiplier1/Mmult__old_pdt_int_11_inst_mult_0
Total: 4.160 ns (3.075 ns logic, 1.085 ns route) (73.9% logic, 26.1% route)
Fig. 8. Simulation of the specific universal balanced architecture
Table IV. Comparison of performance of the proposed implementation and the existing reconfigurable implementations

2vp2fg256-6                 Proposed  Systolic FIR Filter [14]  2-D Systolic Structure for FIR Filters [1]  Yoo et al. [13]
Number of Slices            40        122                       133                                         146
Number of Slice Flip Flops  48        668                       48                                          48
Number of 4 input LUTs      72        -                         -                                           -
Frequency [MHz]             271.44    84.5                      74                                          70
Period [ns]                 4.160     11.8                      14.5                                        14.0
Fig. 9. Layout design of the specific universal balanced architecture
[Bar-chart data for figure 10 — proposed architecture vs. existing shared-memory architecture [3]: Number of Slices 40 vs. 160; Slice Flip Flops 48 vs. 268; 4-input LUTs 72 vs. 261; bonded IOBs 28 vs. 54; GCLKs 1 vs. 2.]
Fig. 10. Comparison of Proposed Architecture and Existing Shared Memory Architecture [3]
VI. CONCLUSION

Reconfigurable hardware is becoming increasingly complex and heterogeneous. To develop a smaller yet powerful reconfigurable processor, we have proposed a universal computation module as a balanced architecture based on a modified shared-memory approach. The balance was achieved between the processing elements (PEs), the number of memories and the interconnection network. As an example of applying the balanced architecture to a DSP algorithm, the TVDFT transformation was implemented, and a processing element based on bit-serial arithmetic (multiplication and addition) was also given. As presented in this paper, the shared-memory balanced architecture implemented in an FPGA is a universal solution suited to a wide range of DSP algorithms.
REFERENCES
[1] Pramod Kumar Meher and Abbes Amira, "FPGA Realization of FIR Filters by Efficient and Flexible Systolization Using Distributed Arithmetic", IEEE Trans. Signal Processing, vol. 56, no. 7, pp. 3009–3017, July 2008.
[2] Khaled Benkrid, "High Performance Reconfigurable Computing: From Applications to Hardware", IAENG International Journal of Computer Science, 35:1, IJCS_35_1_04, Feb. 2008.
[3] G. Rubin, M. Omieljanowicz and A. Petrovsky, "Reconfigurable FPGA-based hardware accelerator for embedded DSP", in Proc. MIXDES'07 Conf., pp. 147–151, June 2007.
[4] Makoto Okada, Tatsuo Hiramatsu, Hiroshi Nakajima, Makoto Ozone, Katsunori Hirase and Shinji Kimura, "A Reconfigurable Processor based on ALU array architecture with limitation on the interconnect", in Proc. IEEE IPDPS'05, pp. 1–6, 2005.
[5] H. Yoo and D. V. Anderson, "Hardware-efficient distributed arithmetic architecture for high-order digital filters", in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), Mar. 2005, vol. 5, pp. v/125–v/128.
[6] Multiprocessor Systems-on-Chips, edited by A. A. Jerraya and W. Wolf, Elsevier Inc., 2005.
[7] Jae-Jin Lee and Gi-Yong Song, "Implementation of a Bit-level Super-Systolic FIR Filter", 2004 IEEE Asia-Pacific Conference on Advanced System Integrated Circuits (AP-ASIC 2004), Aug. 4–5, 2004, pp. 206–209.
[8] A. Petrovsky, P. Zubrycki and A. Sawicki, "Tonal and noise separation based on a pitch synchronous DFT analyzer as a speech coding method", in Proc. ECCTD'03, vol. III, Cracow, pp. 169–172, 2003.
[9] Katherine Compton and Scott Hauck, "Reconfigurable Computing: A Survey of Systems and Software", ACM Computing Surveys, vol. 34, no. 2, pp. 171–210, June 2002.
[10] M. Omieljanowicz, P. Zubrycki, A. Petrovsky and G. Rubin, "FPGA-based algorithms and hardware for generating digital sine waves", MIXDES 2002, Wroclaw, pp. 279–284, 2002.
[11] R. Tessier and W. Burleson, "Reconfigurable computing and digital signal processing: past, present, and future", in Programmable Digital Signal Processors: Architecture, Programming, and Applications, edited by Yu Hen Hu, Marcel Dekker, Inc., pp. 147–185, 2002.
[12] R. Hartenstein, "Reconfigurable Computing: the Roadmap to a New Business Model and its Impact on SoC Design", SBCCI 2001, 15th Symposium on Integrated Circuits and Systems Design, Brasilia, DF, Brazil, Sep. 2001.
[13] L. Wanhammar, DSP Integrated Circuits, Academic Press, USA, 1999.
[14] R. Reeves, K. Sienski and C. Field, "Reconfigurable Hardware Accelerator for Embedded DSP", ICSPAT '97, San Diego, 1997, pp. 929–933.
[15] http://www.xilinx.com
AUTHORS PROFILE

J. L. Mazher Iqbal received the B.E. degree in Electronics and Communication Engineering from Madurai Kamaraj University in 1998 and the M.E. degree in Applied Electronics from Anna University in 2005. He is pursuing the Ph.D. at Sri Venkateswara University College of Engineering, Sri Venkateswara University, Tirupati. Currently, he is working as Assistant Professor in the Department of Electronics and Communication Engineering, Rajalakshmi Engineering College, Chennai, Tamil Nadu, India.

Dr. S. Varadarajan received the B.Tech degree in Electronics and Communication Engineering from Sri Venkateswara University, Tirupati in 1987 and the M.Tech degree in Instrumentation from NIT, Warangal in 1981. He obtained the Ph.D. from Sri Venkateswara University, Tirupati in 1997. He is a fellow of IETE and a member of IEEE. Currently, he is working as Associate Professor in the Department of Electronics and Communication Engineering, Sri Venkateswara University College of Engineering, Sri Venkateswara University, Tirupati, Andhra Pradesh, India.