You are on page 1of 12

Author Name et. al.

/ International Journal of Engineering Research and Technology


Vol. 1 (02), 2012, ISSN 2278 - 0181

DESIGN OF AN OPTIMIZED CORDIC FOR FIXED ANGLE OF ROTATION

Abstract 1. Introduction
The CORDIC algorithm is a technique for The Coordinate Rotation DIgital Computer
efficiently implementing the trigonometric (CORDIC) algorithm has been used for many years for
functions and hyperbolic functions.Vector efficient implementation of vector rotation operations
rotation through fixed and known angles has in hardware. CORDIC or Coordinate Rotation Digital
wide applications in various areas. But there is Computer is a simple and hardware-efficient algorithm
no optimized coordinate rotation digital for the implementation of various elementary,
computer (CORDIC) design for vector-rotation especially trigonometric functions. Instead of using
through specific angles. FPGA based CORDIC Calculus based methods such as polynomial or rational
design for fixed angle of rotation provides functional approximation, it uses simple shift, add,
optimization schemes and CORDIC circuits for subtract and table look-up operations to achieve this
fixed and known rotations with different levels objective. By making slight adjustments to the initial
of accuracy. For reducing the area and time conditions and the LUT values, it can be used to
complexities, a hardwired pre-shifting scheme
efficiently implement Trigonometric, Hyperbolic,
in barrel-shifters of the CORDIC circuits is
Exponential functions, Coordinate Transformations etc.
used. Pipelined schemes are suggested further
using the same hardware. Since it uses only shift-add
for cascading dedicated single-rotation units
arithmetic, VLSI implementation of such an algorithm
and bi-rotation CORDIC units for high-
throughput and reduced latency is easily achievable.
implementations. The fixed-point mean- The CORDIC algorithm was first proposed by
squared-error of the CORDIC circuit is Jack E Volder in 1959 [9], [10]. It is an iterative
analyzed and reduced. This design offers technique and consists of two modes of operation called
higher throughput, less latency and less area- rotation mode and vectoring mode. In either mode, the
delay product. The fixed-angle CORDIC algorithm is rotation of an angle vector by a definite
rotation would have wide applications in signal angle but in variable directions. This fixed rotation in
processing, games, animation, graphics and
variable direction is implemented through an iterative
robotics. The coding is done in Verilog
sequence of addition/subtraction followed by bit-shift
language, synthesized using Xilinx ISE 14.1 and
operation. The final result is obtained by appropriately
simulated using ISim.
scaling the result obtained after successive iterations.
Keywords— Coordinate rotation digital computer
All of the trigonometric functions can be computed or
(CORDIC), digital signal processing (DSP) chip, VLSI. derived from functions using vector rotation. CORDIC
works by rotating the coordinate system through

ESRSA Publication © 2012 http://www.ijert.org


Author Name et. al. / International Journal of Engineering Research and Technology
Vol. 1 (02), 2012, ISSN 2278 - 0181

constant angles until the angle is reduces to zero. 2. Shift-add operations for corresponding scaling
CORDIC algorithm is very attractive for hardware circuits.
implementation because less power dissipation,
3. A hardware pre-shifting scheme is suggested for
compactable & simple in design.
reduction of barrel-shifter complexity.
Less power dissipation and compact because it
4. A cascaded pipelined circuit in which more than one
uses only elementary shift-and-add operations to
CORDIC module in separate stages in a cascade. The
perform the vector rotation so it only needs the use of 2
cascade of CORDIC modules is faster and involves less
shifter and 3 adder modules. Simplicity is due to
area-delay complexity.
elimination of multiplication with only addition,
subtraction and bit-shifting operation. Either if there is 5. An efficient CORDIC circuit using a pair of micro-
an absence of a hardware multiplier (e.g. μC, μP) or rotations, and named as Bi rotation CORDIC.
there is a necessity to optimize the number of logic
gates (e.g. FPGA) CORDIC is the preferred choice. 3. System architecture
However, the major disadvantage of the CORDIC 3.1 Reference Cordic Circuit
algorithm are large number of iterations required for
accurate results and thus the speed is low and time Reference CORDIC circuit [1] for vector
delay is high, power consumption is high in some rotation with initial vector (X, Y) rotate through an
architecture types angle θ to obtain final vector (X’, Y’) through
,whenever a hardware multiplier is available, e.g. in a successive iterations is shown below. The rotation-
DSP microprocessor mode CORDIC algorithm to rotate a vector U= [Ux Uy]
Rotation of vectors through a fixed and known through an angle θ to obtain a rotated vector V= [Vx Vy]
angle has wide applications in robotics, graphics, is given by
games, and animation [4] [6]. Locomotion of robots is
very often performed by successive rotations through (Ux)i+1= (Ux) i – (Uy )i 2-i ……3.1
small fixed angles and translations of the links. The
(Uy)i+1= (Uy) i + (Ux )i 2-i ……3.2
translation operations are realized by simple additions
of coordinate values while the new coordinates of a Φi+1 = Φi - tan-1(2-i) ……3.3
rotational step could be accomplished by suitable
successive rotations through a small fixed angle which such that when n is sufficiently large
could be performed by a CORDIC circuit for fixed
rotation. Similarly, interpolation of orientations
between key-frames in computer graphics and
T ……3.4
animation could be performed by fixed CORDIC
rotations. There are plenty of examples of uniform
rotation starting from electrons inside an atom to the where = -1 if Φi <0 and σi = 1 otherwise, and K is
planets and satellites. A simple example of uniform the scale-factor of the CORDIC algorithm, given by,
rotations is the hands of an animated mechanical clock
which perform one degree rotation each time. There are
K= ……3.5
several cases where high-speed constant rotation is
required in games, graphic, and animation. The objects
with constant rotations are very often used in A reference CORDIC circuit for fixed rotations
simulation, modeling, games, and animation. The according to eqns 3.1 and 3.2 is shown in figure 3.1.
contributions of this paper are as follows,
1. Algorithm for optimization schemes for reducing the
number of micro-rotations and for reducing the
complexity of barrel-shifters for fixed-angle vector-
rotation.
ESRSA Publication © 2012 http://www.ijert.org
Author Name et. al. / International Journal of Engineering Research and Technology
Vol. 1 (02), 2012, ISSN 2278 - 0181

An adder/subtractor is a circuit that is


capable of adding or subtracting binary numbers. Here
addition or subtraction occurs according to the sign bits
stored in the sign bit register (SBR).
Sign bit register (SBR)
In case of fixed rotation, αi could be pre-
computed and the sign-bits corresponding to could
be stored in a sign-bit register (SBR) in CORDIC
circuit.
Fig 3.1 Reference CORDIC circuit for fixed
The CORDIC circuit therefore need not compute the
rotations
remaining angle αi during the CORDIC iterations.

X0 and Y0 are fed as set/reset input to the pair of input 3.2 Optimization of Cordic Circuit
registers and the successive feedback values Xi and Yi
Optimization schemes [1] are for reducing the
at the iteration are fed in parallel to the input registers.
number of micro-rotations and for reducing the
Note that conventionally we feed the pair of input
complexity of barrel-shifters for fixed-angle vector-
registers with the initial values X0 and Y0 as well as the
rotation. Optimization is done for both micro rotations
feedback values Xi and Yi through a pair of
and scaling with the help of algorithm.
multiplexers.
The steps involved in optimizing the
Registers
Elementary angle set of Micro-rotations are as follows:
A register is one of a small set of data
 εΦ, maximum tolerable error between desired
holding places that are part of a computer processor.
angle and approximated angle is given as an
Here two registers are used to store the values of X and
input.
Y. A register may hold a computer instruction, a
 Optimization algorithm searches starts with
storage address, or any kind of data (such as a bit
single micro rotation
sequence or individual characters).
 If the micro-rotation that has smaller angle of
Barrel shifters deviation than εΦ cannot be found, the number
of micro-rotations is increased by one.
A barrel shifter is a digital circuit that can
 Optimization algorithm is run again.
shift a data word by a specified number of bits in one
 Based on the obtained micro-rotations, the
clock cycle. It is simply a bit-rotating shift register. The
parameters for scaling operation can be
bits shifted out the MSB end of the register are shifted
searched with the different objective function.
back into the LSB end of the register. In a barrel shifter,
The steps involved in optimizing the scaling process
the bits are shifted the desired number of bit positions
are as follows:
in a single clock cycle. For example, an eight-bit barrel
 εk, maximum tolerable error between desired
shifter could shift the data by three positions in a single
angle and approximated angle is given as an
clock cycle. If the original data was 11110000, one
input.
clock cycle later the result will be 10000111.
 Optimization algorithm searches starts with
Functionally, since any bit can end up in any bit
single term of scaling.
position, multiplexers are used to place the bits
 The number of scaling terms is increased by
correctly for proper storage. Thus, a barrel shifter is
one until ∆k is smaller than the given
implemented by feeding an N- bit data word into N, N-
maximum deviation εk, where ∆k is deviation
bit-wide multiplexers. For e.g., an eight-bit barrel
of ratio of approximated scale factor to actual
shifter is built out of eight flip-flops and eight 8-to-1
scale factor from 1.
multiplexers; a 32-bit barrel shifter requires 32 registers
 Optimization algorithm is run again.
and thirty-two, 32-to-1 multiplexers.
For rotation of a vector through a known
Adder / Subtractor and fixed angle of rotation using a rotation-mode

ESRSA Publication © 2012 http://www.ijert.org


Author Name et. al. / International Journal of Engineering Research and Technology
Vol. 1 (02), 2012, ISSN 2278 - 0181

CORDIC circuit, a set of a small number of The major contributors to the hardware-complexity in
predetermined elementary angles { , for 0 i m- the implementation of a CORDIC circuit are the barrel-
shifters and the adders.
1}, where = arctan (2-k(i)) is the elementary angle to The optimization of CORDIC circuit is shown in figure
be used for the ith micro-rotation in the CORDIC 3.2.
algorithm and m is the minimum necessary number of
The components are
micro-rotations. Meanwhile, it is well known that the
rotation through any angle, 0 < θ 2π can be mapped i. Registers
ii. Barrel shifter
into a positive rotation through 0 < θ π/4 without any
iii. Adder/ Subtractor
extra arithmetic operations. Hence, as a first step of iv. Sign Bit Register
optimization, we perform the rotation mapping so that v. ROM Module
the rotation angle lies in the range of 0 < θ π/4. In
the
next step, we minimize the number of elementary Read only memory (ROM)
ROM is a memory that is capable of
angles in the set { } according to the accuracy
holding data and having that data read from the chip,
requirements. The rotation mode CORDIC algorithm of
but not written to. It is non-volatile which means it
eqns 3.1, 3.2 and 3.3 therefore, can be modified
keeps its contents regardless if it has power or not. Here
accordingly to have
ROM contains the control-bits for the number of shifts
corresponding to the micro-rotations to be implemented
=
by the barrel-shifter and the directions of micro-
rotations are stored in the sign-bit register (SBR).
……3.6 3.3 Hardwired Pre-Shifting Scheme
The set of micro rotations is optimized according to the The barrel-shifter used in the optimization
algorithm which is described in chapter 2. Since the CORDIC circuit increases the complexity of the circuit.
In order to avoid the complexity of barrel-shifter, a
elementary angles and direction of micro-rotations are hardwired pre-shifting scheme [1] is used. A barrel-
predetermined for the given angle of rotation, the angle shifter for maximum of S shifts for word-length L is
estimation data-path is not required in the CORDIC implemented by [log2(S+1)] stages of demultiplexors,
where each stage requires L number of 1:2 line
circuit for fixed and known rotations. Moreover,
MUXes. The hardware-complexity of barrel-shifter,
because only a few elementary angles are involved in therefore, increases linearly with the word-length and
this case, the corresponding control- bits could be logarithmically with the maximum number of shifts.
We can reduce the effective word-length in the MUXes
stored in a ROM of few words.
of the barrel-shifters, and so also the number of stages
of MUXes by simple hardwired pre-shifting.
The time involved in a barrel-shifter could
also be reduced by hardwired pre-shifting, since the
delay of the barrel-shifter is proportional to the number
of stages of MUXes, and it also possible to reduce the
number stages by hardwired pre-shifting. The
hardwired pre-shifting scheme is shown in figure.3.3.

Figure 3.2 Optimization of CORDIC circuit

ESRSA Publication © 2012 http://www.ijert.org


Author Name et. al. / International Journal of Engineering Research and Technology
Vol. 1 (02), 2012, ISSN 2278 - 0181

The rotation-module in this case does not


require input from SBR since each adder module
always performs either addition or subtraction.

Figure 3.3 Hardwired pre-shifting circuit


If l is the minimum number of shifts in the set of
selected micro-rotations, we can load only the (L-l)
more- significant bits (MSBs) of an input word from
the registers to the barrel-shifters, since the l less
significant bits (LSBs) would get truncated during
shifting. The barrel-shifter, therefore, needs to
implement a maximum of(s- l) shifts only, where s is
the maximum number of shifts in the set of selected
micro-rotations. The output of the barrel-shifters are Figure 3.4 Multi-stage single-rotation cascaded
loaded as the (L-l) LSBs to the add/subtract units, and cordic circuit
the l MSBs of the corresponding operand of
add/subtract unit are hardwired to 0. Therefore, the
3.5 SCALING OPTIMIZATION
hardware-complexity of a barrel-shifter could be
reduced by the hardwired pre-shifting approach The generalized expression for the scale-
factor given by eqn 3.5 can be expressed explicitly for
the selected set of m1 micro rotations as
3.4 Multi-Stage Single-Rotation Cascaded Cordic
Circuit
K= ……3.7
Multi-stage cascaded CORDIC circuit [1]
connects n number of single rotations in cascaded form. where k(i) for 0 m1 is the number of shifts in the
Here the initial vector values are provided to the first ith micro-rotation
rotation module, whose output is provided as the input
to the second rotation module and this goes on till the An approximate scale-factor is the product of shift-add
nth rotation module finally to obtain the final vector X n terms of form
and Yn. The Multi-stage single-rotation cascaded cordic
circuit is shown in figure.3.4. K A= ……3.8
Each stage of the cascaded design consists
where s(i) is the number of shifts performed for the ith
of a dedicated rotation-module that performs a specific
micro-rotation. Each rotation-module consists of a pair iteration of scaling, = 1, and m2 is maximum
of adders or subtractors (depending on the direction of number of scaling iterations required for the
micro-rotation which it is required to implement). Each approximation. The scaling circuit using hardwired pre-
of the adders/subtractors loads one of the pair of inputs shifting scheme is shown in figure 3.5. The scaling
directly and loads the other input in a pre-shifted form circuit based on approximated scaling factor, Ka shown
at (L-s(i)) LSB locations, where s(i) is the number of in figure 3.5 can use hardwired pre-shifting for
right- shifts required to be performed to implement the minimizing barrel-shifter complexity.
micro- rotation. The s(i)MSB locations are hardwired
to be zero.

ESRSA Publication © 2012 http://www.ijert.org


Author Name et. al. / International Journal of Engineering Research and Technology
Vol. 1 (02), 2012, ISSN 2278 - 0181

value is then add/subtract to the input values according


to sign bit values stored in SBR.

Inputs - X = 0001 and Y = 0000


Shifted values- q11 = 0001 and q21 = 0001
Outputs – S1 = 0001 and S2 = 0001

Figure 3.5 Scaling circuit using hardwired pre-


shifting scheme

4. Simulation Results

The design entry is modeled using Verilog in Xilinx


ISE Design Suite 14.1 and the simulation of the design
is performed using ISim from Xilinx ISE to validate the
Fig 4.2 Reference CORDIC circuit for 2nd iteration
functionality of the design. Here CORDIC design for
vector rotation through fixed and known angle is In figure 5.2 the input X and Y is shifted for the
simulated.
iteration value corresponding to 26.6 (one shift). The
4.1 Reference CORDIC circuit for single iteration shifted value is then add/subtract to the input values
according to sign bit values stored in SBR.
Here the initial vector (X0, Y0) which is rotated
through a series of micro-rotations or iteration say, 1 st Inputs - X = 1000 and Y = 0010
iteration, 2nd iteration etc to obtain the final vector (Xn, Shifted values- q11 = 0100 and q21 = 0001
Yn). Outputs – S1 = 1001 and S2 = 0110

4.2 Multi-stage single-rotation cascaded CORDIC


circuit

Multi-stage cascaded CORDIC circuit


connects ‘n’ number of single rotations in cascaded
form Here the initial vector values are provided to the
first rotation module, whose output is provided as the
in

input to the second rotation module and this goes on till


the nth rotation module finally to obtain the final vector
Xn and Yn. (Xn, Yn) is the final vector obtained after
Fig 4.1 Reference CORDIC circuit for 1st iteration

In figure 5.1 input X and Y is shifted for the iteration


value corresponding to 45 (zero shift). The shifted

ESRSA Publication © 2012 http://www.ijert.org


Author Name et. al. / International Journal of Engineering Research and Technology
Vol. 1 (02), 2012, ISSN 2278 - 0181

vector rotation through n iterations from the initial where s is the maximum number of shifts in the set of
vector (X0, Y0). selected micro-rotations. In this (L-l) MSBs are load to
Figure 5.3 Multi-stage single-rotation barrel shifters from the registers and load (l-l) LSBs
cascad
ed
COR
DIC

locationsb
y
keepingl

circuit MSBs as 0 to
adder/subtractor.
In figure 5.3 the input initial values (X, Y) = (1,
Figure 5.4 Hardwired pre-
0) is rotated by a fixed angle taken as 20 . For 20 , shifting scheme
there

are 7 number of iterations. The initial vector is rotated

by the given angle to obtain the final vector (Xn, Yn). hardwired pre-shifting scheme is used. The barrel-shifter
needs to implement a maximum of(s- l) shifts only,
Outputs – Xn = 000000000000000011111000

and Yn = 000000000000000000111110

Angle – Z0 = 0000000000000110110100010

5.1.3 Hardwired pre-shifting scheme

The barrel-shifter used in the optimization


CORDIC circuit increases the complexity of the circuit.
In order to avoid the complexity of barrel-shifter, a

ESRSA Publication © 2012 http://www.ijert.org


Author Name et. al. / International Journal of Engineering Research and Technology
Vol. 1 (02), 2012, ISSN 2278 - 0181

In figure 5.4 the input initial values (X, Y) =

(1, 0) is rotated by a fixed angle taken as 30 . For 30 ,


there are 9 number of iterations. The initial vector is
rotated by the given angle to obtain the final vector
(Xn, Yn).

Outputs – Xn =

000000000000000010011111 and Yn

= 000000000000000010010111

Angle – Z0 = 000000000001010001110011

ESRSA Publication © 2012 http://www.ijert.org


Author Name et. al. / International Journal of Engineering Research and Technology
Vol. 1 (02), 2012, ISSN 2278 - 0181

Figure 5: Simulation result of lockstep scheme

Re-computing with shifted operands based on


DWC-CED technique identifies the core with error in
the system. Figure 6 shows the simulation results of the
RESO method to identify the core with error in the
system and to switch the correct core to the output.

Figure 6: Simulation result of RESO method

Each core produces two outputs. One due to


actual input and other due to shifted inputs. The output
from picoblaze core1 due to actual input is signal q7
and output due to shifted input is shown by signal
q6.The output from picoblaze core2 due to actual input
is signal q14 and output due to shifted input is shown
by signal q13.Now the actual output from both cores is
detecting the presence of error defined based
compared using comparator indicated by signal out1.
on a hardware redundancy technique known as
The out1 is zero, which indicates the mismatch between
Duplication with comparison.
the actual outputs. The output due to shifted inputs is
faulty one. Now the correct core is switched to right shifted by 1 bit, ie, q6 and q13 is right shifted by
output by using MUX module. one bit to obtain the signal q15 and q16 respectively.
Now q15 is compared with q7 and the output of
The design entry is modelled using VHDL in Xilinx
comparator shown by signal out2. Out2 is high which
ISE Design Suite 13.2 and the simulation of the design
indicate the picoblaze core1 has no error. Now the
is performed using Isim from Xilinx ISE to validate the
signal q16 is compared with q14 and the output of
functionality of the design. comparator shown by signal out3. Out3 is low which
Lockstep scheme which is based on indicate the picoblaze core2 has error. Now core
Duplication with Comparison technique defined at the without error, core1 is switched to the output by using
processor level is designed using a pair of picoblaze MUX indicated by out4.
cores, a comparator and a mux which detects the
presence of error in the system. The outputs from 5. Conclusions
picoblaze core 1 and core 2 are provided as the inputs
to the comparator which compares the output of both SRAM based FPGA’s are sensitive to
the cores. The out1 port indicates the output of the radiation induced faults and require protection to work
comparator. If it is low, it indicates the mismatch in harsh environments due to it’s to high logic density
in terms of SRAM memory cells. SRAM based FPGAs
between the outputs of picoblaze cores. Now once the
are affected by radiation induced temporary faults
core with the error is detected, the core without error is called as single event upsets (SEUs) or soft errors. That
switched to the output and the core with error is may alter the logic states of any static memory
switched to fault tolerant configuration engine for error elements. The paper is intended to design a new
recovery. The out4 port indicates the output of the architecture for soft error detection technique which
mux module. can be incorporated on any SRAM based FPGA with
integrated Softcore processors. PicoBlaze is used as the
Softcore processor, which is a compact, capable, and
Figure 5 shows the simulation result of the
cost-effective fully embedded8-bit RISC virtual soft
lockstep scheme based on DWC technique to detect the core optimized for the Xilinx FPGA families. Lockstep
presence of error in the system. scheme based on DWC technique at the processor level
which is used to detect the presence of error in the

ESRSA Publication © 2012 http://www.ijert.org


Author Name et. al. / International Journal of Engineering Research and Technology
Vol. 1 (02), 2012, ISSN 2278 - 0181

system and RE-computing

ESRSA Publication © 2012 http://www.ijert.org


Author Name et. al. / International Journal of Engineering Research and Technology
Vol. 1 (02), 2012, ISSN 2278 - 0181

with Shifted Operand (RESO) method based on DWC-


CED to detect the core with error are designed and
simulated.

6. References

[1] Hung-Manh Pham, SebastienPillement, Stanisław J.


Piestrak “Low-Overhead Fault Tolerance Technique
for a Dynamically Reconfigurable Softcore
Processor”, IEEE transactions on computers, vol. 62,
no. 6, june 2013
[2] Bertrand Le Gal and Christophe Jego “Soft core
Processor Optimization According to Real-
Application Requirements” IEEE embedded systems
letters vol. 5, no.1,pp 4-7. march 2013
[3] P. Yiannacouras, J. G. Steffan, and J. Rose,
“Exploration and customization of FPGA-based soft
processors,” IEEE Trans. Comput.-Aided Design
Integr. Circuits Syst.,vol. 26, no. 2, pp. 266–277,
Feb.2007.
[4] DirettoredellaScuolaand Andrea Manuzzato“SINGLE
EVENT EFFECTS ON FPGAs”, Vol 3, december
2006

[5] Jonathan Johnson, William Howes, and Michael


Wirthlin “Using Duplication with Compare for On-
line ErrorDetection in FPGA-based
Designs”December 2007
[6] VatsyaTiwari , Pratap Singh Patwal “ Design and
analysis of software fault-tolerant techniques for
softcore processors in reliable sram-based
fpga“pratap Singh Patwal , Int. J. Comp. Tech. Appl.,
Vol 2 (6), 1812-1819 June 2006
[7] Xilinx “Error detection and correction Techniques for
Virtex FPGAs” February 2005.
[8] PicoBlaze 8-bit Embedded Microcontroller User
Guide for Extended Spartan®-3 and Virtex®-5
FPGAs Introducing PicoBlaze for Spartan-6, Virtex-6
and 7 Series FPGAs, UG129 June 22, 2011
[9] Ken Chapman PicoBlaze 8-Bit Microcontroller for
Virtex-E and Spartan-II/IIE Devices (v2.1) February
4, 2003.

ESRSA Publication © 2012 http://www.ijert.org


ESRSA Publication © 2012 http://www.ijert.org

You might also like