
Author Name et. al.

/ International Journal of Engineering Research and Technology


Vol. 1 (02), 2012, ISSN 2278 - 0181

DESIGN OF AN OPTIMIZED CORDIC FOR FIXED ANGLE OF ROTATION

Abstract

The CORDIC algorithm is a technique for efficiently implementing trigonometric and hyperbolic functions. Vector rotation through fixed and known angles has wide applications in various areas, but there is no optimized coordinate rotation digital computer (CORDIC) design for vector rotation through specific angles. The FPGA-based CORDIC design for fixed angle of rotation provides optimization schemes and CORDIC circuits for fixed and known rotations with different levels of accuracy. To reduce the area and time complexities, a hardwired pre-shifting scheme is used in the barrel-shifters of the CORDIC circuits. Pipelined schemes are further suggested for cascading dedicated single-rotation units and bi-rotation CORDIC units for high-throughput and reduced-latency implementations. The fixed-point mean-squared-error of the CORDIC circuit is analyzed and reduced. This design offers higher throughput, lower latency and a smaller area-delay product. The fixed-angle CORDIC rotation would have wide applications in signal processing, games, animation, graphics and robotics. The coding is done in Verilog, synthesized using Xilinx ISE 14.1 and simulated using ISim.

Keywords: Coordinate rotation digital computer (CORDIC), digital signal processing (DSP) chip, VLSI.

1. Introduction

The Coordinate Rotation DIgital Computer (CORDIC) algorithm has been used for many years for efficient implementation of vector rotation operations in hardware. CORDIC is a simple and hardware-efficient algorithm for the implementation of various elementary functions, especially trigonometric functions. Instead of using calculus-based methods such as polynomial or rational function approximation, it uses simple shift, add, subtract and table look-up operations to achieve this objective. By making slight adjustments to the initial conditions and the LUT values, it can be used to efficiently implement trigonometric, hyperbolic and exponential functions, coordinate transformations, etc. using the same hardware. Since it uses only shift-add arithmetic, VLSI implementation of such an algorithm is easily achievable.

The CORDIC algorithm was first proposed by Jack E Volder in 1959 [9], [10]. It is an iterative technique and consists of two modes of operation called rotation mode and vectoring mode. In either mode, the algorithm rotates a vector through a sequence of definite angles in variable directions. This fixed rotation in variable direction is implemented through an iterative sequence of additions/subtractions followed by bit-shift operations. The final result is obtained by appropriately scaling the result obtained after successive iterations. All of the trigonometric functions can be computed or derived from functions using vector rotation. CORDIC works by rotating the coordinate system through

ESRSA Publication © 2012 http://www.ijert.org


constant angles until the angle is reduced to zero. The CORDIC algorithm is very attractive for hardware implementation because it has low power dissipation and is compact and simple in design.

It has low power dissipation and is compact because it uses only elementary shift-and-add operations to perform the vector rotation, so it needs only 2 shifter and 3 adder modules. The simplicity is due to the elimination of multiplication, leaving only addition, subtraction and bit-shifting operations. If there is no hardware multiplier (e.g. μC, μP) or if the number of logic gates must be optimized (e.g. FPGA), CORDIC is the preferred choice. However, the major disadvantages of the CORDIC algorithm are the large number of iterations required for accurate results, which makes the speed low and the time delay high, and the high power consumption in some architecture types; these disadvantages weigh most whenever a hardware multiplier is available, e.g. in a DSP microprocessor.

Rotation of vectors through a fixed and known angle has wide applications in robotics, graphics, games, and animation [4] [6]. Locomotion of robots is very often performed by successive rotations through small fixed angles and translations of the links. The translation operations are realized by simple additions of coordinate values, while the new coordinates of a rotational step could be obtained by suitable successive rotations through a small fixed angle, which could be performed by a CORDIC circuit for fixed rotation. Similarly, interpolation of orientations between key-frames in computer graphics and animation could be performed by fixed CORDIC rotations. There are plenty of examples of uniform rotation, from electrons inside an atom to the planets and satellites. A simple example of uniform rotation is the hands of an animated mechanical clock, which perform a one-degree rotation each time. There are several cases where high-speed constant rotation is required in games, graphics, and animation. Objects with constant rotations are very often used in simulation, modeling, games, and animation. The contributions of this paper are as follows:

1. Algorithms for optimization schemes that reduce the number of micro-rotations and the complexity of barrel-shifters for fixed-angle vector rotation.
2. Shift-add operations for the corresponding scaling circuits.
3. A hardwired pre-shifting scheme for reduction of barrel-shifter complexity.
4. A cascaded pipelined circuit in which more than one CORDIC module is placed in separate stages of a cascade. The cascade of CORDIC modules is faster and involves less area-delay complexity.
5. An efficient CORDIC circuit using a pair of micro-rotations, named bi-rotation CORDIC.

3. System architecture

3.1 Reference Cordic Circuit

The reference CORDIC circuit [1] for vector rotation, in which an initial vector (X, Y) is rotated through an angle θ to obtain the final vector (X', Y') through successive iterations, is shown below. The rotation-mode CORDIC algorithm to rotate a vector U = [Ux Uy] through an angle θ to obtain a rotated vector V = [Vx Vy] is given by

$$(U_x)_{i+1} = (U_x)_i - \sigma_i (U_y)_i \, 2^{-i} \tag{3.1}$$

$$(U_y)_{i+1} = (U_y)_i + \sigma_i (U_x)_i \, 2^{-i} \tag{3.2}$$

$$\Phi_{i+1} = \Phi_i - \sigma_i \tan^{-1}(2^{-i}) \tag{3.3}$$

such that when n is sufficiently large

$$[V_x \;\; V_y]^T = K \, [(U_x)_n \;\; (U_y)_n]^T \tag{3.4}$$

where $\sigma_i = -1$ if $\Phi_i < 0$ and $\sigma_i = 1$ otherwise, and K is the scale-factor of the CORDIC algorithm, given by

$$K = \prod_{i=0}^{n-1} \frac{1}{\sqrt{1 + 2^{-2i}}} \tag{3.5}$$

A reference CORDIC circuit for fixed rotations according to eqns 3.1 and 3.2 is shown in figure 3.1.
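
The rotation-mode recurrence of eqns 3.1 to 3.3, followed by scaling with K, can be sketched in software as follows. This is only an illustrative floating-point model, not the Verilog design of this paper: the function name `cordic_rotate`, the default of 16 iterations, and the product range i = 0 to n-1 for K are assumptions made for the example.

```python
import math

def cordic_rotate(x, y, theta, n=16):
    """Rotate (x, y) by theta radians with n rotation-mode micro-rotations.

    Hypothetical helper: floating point stands in for the fixed-point
    shift-and-add hardware, so y * 2**-i models a right shift by i bits.
    """
    phi = theta  # remaining angle, driven towards zero
    for i in range(n):
        sigma = -1 if phi < 0 else 1          # direction of the micro-rotation
        # Both updates use the old x and y (tuple assignment), as in eqns 3.1/3.2.
        x, y = x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        phi -= sigma * math.atan(2.0 ** -i)   # eqn 3.3
    # Scale factor K compensates the gain of the n micro-rotations (assumed i = 0..n-1).
    k = 1.0
    for i in range(n):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return k * x, k * y

if __name__ == "__main__":
    vx, vy = cordic_rotate(1.0, 0.0, math.radians(30.0))
    print(vx, vy)  # close to cos(30 deg) and sin(30 deg)
```

In a fixed-rotation circuit the signs `sigma` would be precomputed for the known angle, so the `phi` update disappears from the data path.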

Fig 3.1 Reference CORDIC circuit for fixed rotations

X0 and Y0 are fed as set/reset inputs to the pair of input registers, and the successive feedback values Xi and Yi at the ith iteration are fed in parallel to the input registers. Note that conventionally the pair of input registers is fed with the initial values X0 and Y0 as well as the feedback values Xi and Yi through a pair of multiplexers.

Registers
A register is one of a small set of data holding places that are part of a computer processor. Here two registers are used to store the values of X and Y. A register may hold a computer instruction, a storage address, or any kind of data (such as a bit sequence or individual characters).

Barrel shifters
A barrel shifter is a digital circuit that can shift a data word by a specified number of bits in one clock cycle. It is simply a bit-rotating shift register: the bits shifted out of the MSB end of the register are shifted back into the LSB end of the register. For example, an eight-bit barrel shifter could shift the data by three positions in a single clock cycle. If the original data was 11110000, one clock cycle later the result will be 10000111. Functionally, since any bit can end up in any bit position, multiplexers are used to place the bits correctly for proper storage. Thus, a barrel shifter is implemented by feeding an N-bit data word into N multiplexers, each N bits wide. For example, an eight-bit barrel shifter is built out of eight flip-flops and eight 8-to-1 multiplexers; a 32-bit barrel shifter requires 32 registers and thirty-two 32-to-1 multiplexers.

Adder / Subtractor
An adder/subtractor is a circuit that is capable of adding or subtracting binary numbers. Here addition or subtraction occurs according to the sign bits stored in the sign bit register (SBR).

Sign bit register (SBR)
In case of fixed rotation, αi could be pre-computed and the sign-bits corresponding to σi could be stored in a sign-bit register (SBR) in the CORDIC circuit. The CORDIC circuit therefore need not compute the remaining angle αi during the CORDIC iterations.

3.2 Optimization of Cordic Circuit

The optimization schemes [1] reduce the number of micro-rotations and the complexity of barrel-shifters for fixed-angle vector rotation. Optimization is done for both the micro-rotations and the scaling with the help of an algorithm.

The steps involved in optimizing the elementary angle set of micro-rotations are as follows:
▪ εΦ, the maximum tolerable error between the desired angle and the approximated angle, is given as an input.
▪ The optimization algorithm starts its search with a single micro-rotation.
▪ If no micro-rotation with an angle of deviation smaller than εΦ can be found, the number of micro-rotations is increased by one.
▪ The optimization algorithm is run again.
▪ Based on the obtained micro-rotations, the parameters for the scaling operation can be searched with a different objective function.

The steps involved in optimizing the scaling process are as follows:
▪ εk, the maximum tolerable deviation of the approximated scale factor from the actual scale factor, is given as an input.
▪ The optimization algorithm starts its search with a single term of scaling.
▪ The number of scaling terms is increased by one until ∆k is smaller than the given maximum deviation εk, where ∆k is the deviation from 1 of the ratio of the approximated scale factor to the actual scale factor.
▪ The optimization algorithm is run again.

For rotation of a vector through a known and fixed angle of rotation using a rotation-mode
CORDIC circuit, a set of a small number of predetermined elementary angles $\{\alpha_i,\ 0 \le i \le m-1\}$ is used, where $\alpha_i = \arctan(2^{-k(i)})$ is the elementary angle to be used for the ith micro-rotation in the CORDIC algorithm and m is the minimum necessary number of micro-rotations. Meanwhile, it is well known that the rotation through any angle 0 < θ ≤ 2π can be mapped into a positive rotation through 0 < θ ≤ π/4 without any extra arithmetic operations. Hence, as a first step of optimization, we perform the rotation mapping so that the rotation angle lies in the range 0 < θ ≤ π/4. In the next step, we minimize the number of elementary angles in the set $\{\alpha_i\}$ according to the accuracy requirements. The rotation-mode CORDIC algorithm of eqns 3.1, 3.2 and 3.3 can therefore be modified accordingly to have

$$\begin{bmatrix} (U_x)_{i+1} \\ (U_y)_{i+1} \end{bmatrix} = \begin{bmatrix} 1 & -\sigma_i 2^{-k(i)} \\ \sigma_i 2^{-k(i)} & 1 \end{bmatrix} \begin{bmatrix} (U_x)_i \\ (U_y)_i \end{bmatrix} \tag{3.6}$$

The set of micro-rotations is optimized according to the algorithm which is described in chapter 2. Since the elementary angles and directions of micro-rotations are predetermined for the given angle of rotation, the angle estimation data-path is not required in the CORDIC circuit for fixed and known rotations. Moreover, because only a few elementary angles are involved in this case, the corresponding control-bits could be stored in a ROM of a few words.

Figure 3.2 Optimization of CORDIC circuit

The major contributors to the hardware-complexity in the implementation of a CORDIC circuit are the barrel-shifters and the adders. The optimization of the CORDIC circuit is shown in figure 3.2.

The components are
i. Registers
ii. Barrel shifter
iii. Adder/Subtractor
iv. Sign Bit Register
v. ROM Module

Read only memory (ROM)
ROM is a memory that holds data which can be read from the chip but not written to. It is non-volatile, which means it keeps its contents regardless of whether it has power or not. Here the ROM contains the control-bits for the number of shifts corresponding to the micro-rotations to be implemented by the barrel-shifter, and the directions of the micro-rotations are stored in the sign-bit register (SBR).

3.3 Hardwired Pre-Shifting Scheme

The barrel-shifter used in the optimized CORDIC circuit increases the complexity of the circuit. In order to avoid the complexity of the barrel-shifter, a hardwired pre-shifting scheme [1] is used. A barrel-shifter for a maximum of S shifts for word-length L is implemented by ⌈log2(S+1)⌉ stages of demultiplexers, where each stage requires L 1:2 line MUXes. The hardware-complexity of the barrel-shifter, therefore, increases linearly with the word-length and logarithmically with the maximum number of shifts. We can reduce the effective word-length in the MUXes of the barrel-shifters, and also the number of stages of MUXes, by simple hardwired pre-shifting.

The time involved in a barrel-shifter could also be reduced by hardwired pre-shifting, since the delay of the barrel-shifter is proportional to the number of stages of MUXes, and it is also possible to reduce the number of stages by hardwired pre-shifting. The hardwired pre-shifting scheme is shown in figure 3.3.
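
To make the complexity argument concrete, the following sketch counts MUX stages and 1:2 MUXes for a barrel-shifter with and without hardwired pre-shifting. It assumes, as this section describes, that pre-shifting leaves a shifter of (L - l) bits handling at most (s - l) shifts, where l and s are the minimum and maximum shifts in the selected micro-rotations. The function names and the numbers in the usage lines are illustrative only and are not taken from the synthesized design.

```python
def barrel_shifter_cost(word_length, max_shift):
    """Return (stages, mux_count) for a barrel-shifter of the given size.

    stages = ceil(log2(max_shift + 1)); for non-negative integers this
    equals max_shift.bit_length(), which avoids floating-point rounding.
    Each stage needs word_length 1:2 MUXes.
    """
    stages = max_shift.bit_length()
    return stages, stages * word_length

def preshifted_cost(word_length, min_shift, max_shift):
    """Cost when the min_shift common shifts are hardwired (pre-shifted)."""
    return barrel_shifter_cost(word_length - min_shift, max_shift - min_shift)

if __name__ == "__main__":
    # Illustrative values: 16-bit words, shifts between 4 and 7.
    print(barrel_shifter_cost(16, 7))   # (3, 48)
    print(preshifted_cost(16, 4, 7))    # (2, 24)
```

In this example both the number of stages (delay) and the number of MUXes (area) drop, which is the effect the pre-shifting scheme aims for.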

Figure 3.3 Hardwired pre-shifting circuit

If l is the minimum number of shifts in the set of selected micro-rotations, we can load only the (L-l) more-significant bits (MSBs) of an input word from the registers to the barrel-shifters, since the l less-significant bits (LSBs) would get truncated during shifting. The barrel-shifter, therefore, needs to implement a maximum of (s-l) shifts only, where s is the maximum number of shifts in the set of selected micro-rotations. The outputs of the barrel-shifters are loaded as the (L-l) LSBs to the add/subtract units, and the l MSBs of the corresponding operand of the add/subtract unit are hardwired to 0. Therefore, the hardware-complexity of a barrel-shifter could be reduced by the hardwired pre-shifting approach.

3.4 Multi-Stage Single-Rotation Cascaded Cordic Circuit

The multi-stage cascaded CORDIC circuit [1] connects n single-rotation modules in cascaded form. Here the initial vector values are provided to the first rotation module, whose output is provided as the input to the second rotation module, and this goes on till the nth rotation module to obtain the final vector Xn and Yn. The multi-stage single-rotation cascaded CORDIC circuit is shown in figure 3.4.

Each stage of the cascaded design consists of a dedicated rotation-module that performs a specific micro-rotation. Each rotation-module consists of a pair of adders or subtractors (depending on the direction of the micro-rotation which it is required to implement). Each of the adders/subtractors loads one of the pair of inputs directly and loads the other input in a pre-shifted form at (L-s(i)) LSB locations, where s(i) is the number of right-shifts required to implement the micro-rotation. The s(i) MSB locations are hardwired to zero. The rotation-module in this case does not require input from the SBR, since each adder module always performs either addition or subtraction.

Figure 3.4 Multi-stage single-rotation cascaded CORDIC circuit

3.5 SCALING OPTIMIZATION

The generalized expression for the scale-factor given by eqn 3.5 can be expressed explicitly for the selected set of m1 micro-rotations as

$$K = \prod_{i=0}^{m_1} \frac{1}{\sqrt{1 + 2^{-2k(i)}}} \tag{3.7}$$

where $k(i)$ for $0 \le i \le m_1$ is the number of shifts in the ith micro-rotation.

An approximate scale-factor is the product of shift-add terms of the form

$$K_A = \prod_{i=1}^{m_2} \left[ 1 + \delta_i \, 2^{-s(i)} \right] \tag{3.8}$$

where $s(i)$ is the number of shifts performed for the ith iteration of scaling, $\delta_i = \pm 1$, and $m_2$ is the maximum number of scaling iterations required for the approximation. The scaling circuit based on the approximated scaling factor $K_A$, shown in figure 3.5, can use hardwired pre-shifting to minimize barrel-shifter complexity.
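
As a rough illustration of the two quantities discussed in Sections 3.2 and 3.5, the sketch below selects a small set of micro-rotations for a given angle and then evaluates the scale factor of eqn 3.7 for that set. It is a brute-force stand-in, not the optimization algorithm of this paper: the function names, the limits `max_shift` and `max_terms`, and the choice to allow a shift to be reused are all assumptions made for the example.

```python
import itertools
import math

def find_micro_rotations(theta, eps, max_shift=10, max_terms=4):
    """Smallest list of (sigma, k) with |theta - sum(sigma*atan(2^-k))| <= eps.

    Starts with one micro-rotation and adds one more each time no set
    within the tolerance eps is found. Returns None if max_terms is not enough.
    """
    shifts = range(max_shift + 1)
    for m in range(1, max_terms + 1):
        best = None
        for ks in itertools.combinations_with_replacement(shifts, m):
            for sigmas in itertools.product((1, -1), repeat=m):
                approx = sum(s * math.atan(2.0 ** -k) for s, k in zip(sigmas, ks))
                err = abs(theta - approx)
                if best is None or err < best[0]:
                    best = (err, list(zip(sigmas, ks)))
        if best[0] <= eps:
            return best[1]
    return None

def scale_factor(rotations):
    """Scale factor K for the selected micro-rotations (form of eqn 3.7)."""
    return math.prod(1.0 / math.sqrt(1.0 + 2.0 ** (-2 * k)) for _, k in rotations)

if __name__ == "__main__":
    rotations = find_micro_rotations(math.radians(20.0), eps=0.01)
    print(rotations, scale_factor(rotations))
```

The scaling step would then look for a short product of terms of the form of eqn 3.8 whose ratio to this K deviates from 1 by less than εk.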

Figure 3.5 Scaling circuit using hardwired pre-shifting scheme

4. Simulation Results

The design entry is modeled using Verilog in Xilinx ISE Design Suite 14.1, and the simulation of the design is performed using ISim from Xilinx ISE to validate the functionality of the design. Here the CORDIC design for vector rotation through a fixed and known angle is simulated.

4.1 Reference CORDIC circuit for single iteration

Here the initial vector (X0, Y0) is rotated through a series of micro-rotations or iterations (1st iteration, 2nd iteration, etc.) to obtain the final vector (Xn, Yn).

Fig 4.1 Reference CORDIC circuit for 1st iteration

In figure 4.1 the inputs X and Y are shifted for the iteration value corresponding to 45° (zero shift). The shifted values are then added to or subtracted from the input values according to the sign-bit values stored in the SBR.

Inputs - X = 0001 and Y = 0000
Shifted values - q11 = 0001 and q21 = 0001
Outputs - S1 = 0001 and S2 = 0001

Fig 4.2 Reference CORDIC circuit for 2nd iteration

In figure 4.2 the inputs X and Y are shifted for the iteration value corresponding to 26.6° (one shift). The shifted values are then added to or subtracted from the input values according to the sign-bit values stored in the SBR.

Inputs - X = 1000 and Y = 0010
Shifted values - q11 = 0100 and q21 = 0001
Outputs - S1 = 1001 and S2 = 0110

4.2 Multi-stage single-rotation cascaded CORDIC circuit

The multi-stage cascaded CORDIC circuit connects n single-rotation modules in cascaded form. Here the initial vector values are provided to the first rotation module, whose output is provided as the input to the second rotation module, and this goes on till the nth rotation module to obtain the final vector Xn and Yn. (Xn, Yn) is the final vector obtained after
vector rotation through n iterations from the initial vector (X0, Y0).

Figure 5.3 Multi-stage single-rotation cascaded CORDIC circuit

In figure 5.3 the initial input vector (X, Y) = (1, 0) is rotated by a fixed angle of 20°. For 20°, there are 7 iterations. The initial vector is rotated by the given angle to obtain the final vector (Xn, Yn).

Outputs - Xn = 000000000000000011111000 and Yn = 000000000000000000111110

Angle - Z0 = 0000000000000110110100010

5.1.3 Hardwired pre-shifting scheme

The barrel-shifter used in the optimized CORDIC circuit increases the complexity of the circuit. In order to avoid the complexity of the barrel-shifter, a hardwired pre-shifting scheme is used. The barrel-shifter needs to implement a maximum of (s-l) shifts only, where s is the maximum number of shifts in the set of selected micro-rotations. In this scheme the (L-l) MSBs are loaded to the barrel-shifters from the registers, and the shifter outputs are loaded to the (L-l) LSB locations of the adder/subtractor while its l MSBs are kept at 0.

Figure 5.4 Hardwired pre-shifting scheme

In figure 5.4 the initial input vector (X, Y) = (1, 0) is rotated by a fixed angle of 30°. For 30°, there are 9 iterations. The initial vector is rotated by the given angle to obtain the final vector (Xn, Yn).

Outputs - Xn = 000000000000000010011111 and Yn = 000000000000000010010111

Angle - Z0 = 000000000001010001110011
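
The data path of the hardwired pre-shifting scheme described above can be checked with a small software model. The sketch assumes unsigned operands (signed data would need sign extension instead of zeros in the MSBs), and the function name and the 8-bit word in the usage lines are chosen only for illustration.

```python
def preshifted_right_shift(x, shift, min_shift, word_length):
    """Right-shift an unsigned word using a reduced barrel-shifter.

    Models the scheme with l = min_shift and L = word_length: only the
    (L - l) MSBs reach the shifter, which then shifts by (shift - l).
    """
    assert min_shift <= shift < word_length
    msbs = x >> min_shift                          # hardwired: drop the l LSBs
    shifted = msbs >> (shift - min_shift)          # barrel-shifter: at most (s - l) shifts
    mask = (1 << (word_length - min_shift)) - 1    # result sits in the (L - l) LSBs
    return shifted & mask                          # the l MSBs stay hardwired to 0

if __name__ == "__main__":
    x = 0b11110000
    # Same result as an ordinary right shift by 3, using only a 1-position shifter.
    print(bin(preshifted_right_shift(x, shift=3, min_shift=2, word_length=8)))
```

For unsigned data the result equals a plain right shift by `shift`, which is why the pre-shift can be wired in without changing the computed values.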

... detecting the presence of error defined based on a hardware redundancy technique known as Duplication with Comparison.

... faulty one. Now the correct core is switched to the output by using the MUX module.

The design entry is modelled using VHDL in Xilinx ISE Design Suite 13.2, and the simulation of the design is performed using ISim from Xilinx ISE to validate the functionality of the design.

The lockstep scheme, which is based on the Duplication with Comparison technique defined at the processor level, is designed using a pair of PicoBlaze cores, a comparator and a mux, and detects the presence of error in the system. The outputs from PicoBlaze core 1 and core 2 are provided as the inputs to the comparator, which compares the outputs of both cores. The out1 port indicates the output of the comparator. If it is low, it indicates a mismatch between the outputs of the PicoBlaze cores. Once the core with the error is detected, the core without error is switched to the output and the core with error is switched to the fault tolerant configuration engine for error recovery. The out4 port indicates the output of the mux module.

Figure 5 shows the simulation result of the lockstep scheme based on the DWC technique to detect the presence of error in the system.

Figure 5: Simulation result of lockstep scheme

Re-computing with shifted operands based on the DWC-CED technique identifies the core with error in the system. Figure 6 shows the simulation results of the RESO method to identify the core with error in the system and to switch the correct core to the output.

Figure 6: Simulation result of RESO method

Each core produces two outputs, one due to the actual input and the other due to the shifted input. The output from PicoBlaze core1 due to the actual input is signal q7, and the output due to the shifted input is shown by signal q6. The output from PicoBlaze core2 due to the actual input is signal q14, and the output due to the shifted input is shown by signal q13. The actual outputs from both cores are compared using a comparator, whose output is indicated by signal out1. Out1 is zero, which indicates a mismatch between the actual outputs. The outputs due to the shifted inputs are right-shifted by 1 bit, i.e., q6 and q13 are right-shifted by one bit to obtain the signals q15 and q16 respectively. Now q15 is compared with q7, and the output of the comparator is shown by signal out2. Out2 is high, which indicates that PicoBlaze core1 has no error. The signal q16 is then compared with q14, and the output of the comparator is shown by signal out3. Out3 is low, which indicates that PicoBlaze core2 has an error. The core without error, core1, is switched to the output by using the MUX, as indicated by out4.

5. Conclusions

SRAM based FPGAs are sensitive to radiation-induced faults and require protection to work in harsh environments due to their high logic density in terms of SRAM memory cells. SRAM based FPGAs are affected by radiation-induced temporary faults called single event upsets (SEUs) or soft errors, which may alter the logic states of any static memory elements. The paper is intended to design a new architecture for a soft error detection technique which can be incorporated on any SRAM based FPGA with integrated softcore processors. PicoBlaze is used as the softcore processor, which is a compact, capable, and cost-effective fully embedded 8-bit RISC virtual soft core optimized for the Xilinx FPGA families. A lockstep scheme based on the DWC technique at the processor level, which is used to detect the presence of error in the system, and the RE-computing with Shifted Operand (RESO) method based on DWC-CED to detect the core with error are designed and simulated.

6. References

[1] Hung-Manh Pham, Sebastien Pillement, Stanisław J. Piestrak, "Low-Overhead Fault Tolerance Technique for a Dynamically Reconfigurable Softcore Processor", IEEE Transactions on Computers, vol. 62, no. 6, June 2013.
[2] Bertrand Le Gal and Christophe Jego, "Soft core Processor Optimization According to Real-Application Requirements", IEEE Embedded Systems Letters, vol. 5, no. 1, pp. 4-7, March 2013.
[3] P. Yiannacouras, J. G. Steffan, and J. Rose, "Exploration and customization of FPGA-based soft processors", IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 26, no. 2, pp. 266-277, Feb. 2007.
[4] Direttore della Scuola and Andrea Manuzzato, "Single Event Effects on FPGAs", Vol 3, December 2006.
[5] Jonathan Johnson, William Howes, and Michael Wirthlin, "Using Duplication with Compare for On-line Error Detection in FPGA-based Designs", December 2007.
[6] Vatsya Tiwari, Pratap Singh Patwal, "Design and analysis of software fault-tolerant techniques for softcore processors in reliable SRAM-based FPGA", Int. J. Comp. Tech. Appl., Vol 2 (6), 1812-1819, June 2006.
[7] Xilinx, "Error detection and correction Techniques for Virtex FPGAs", February 2005.
[8] PicoBlaze 8-bit Embedded Microcontroller User Guide for Extended Spartan®-3 and Virtex®-5 FPGAs, Introducing PicoBlaze for Spartan-6, Virtex-6 and 7 Series FPGAs, UG129, June 22, 2011.
[9] Ken Chapman, PicoBlaze 8-Bit Microcontroller for Virtex-E and Spartan-II/IIE Devices (v2.1), February 4, 2003.
