Fully-Pipeline CORDIC-based FPGA Realization For A 3-DOF Hexapod-Leg Inverse Kinematics Calculation

CONFIDENTIAL. Limited circulation. For review only.
Guillermo Evangelista, Member, IEEE, Carlos Olaya, Member, IEEE, Erick Rodríguez, Member, IEEE
Abstract: This paper presents a CORDIC-based FPGA realization of the inverse kinematics calculation for a three-degree-of-freedom (3-DOF) hexapod leg. The architecture is developed first by analyzing the 3-DOF hexapod-leg inverse kinematics equations and adapting them to an architecture scheme based on CORDIC operations. Next, the 3-DOF hexapod-leg working area is analyzed to obtain the CORDIC convergence requirements. Consequently, an iterative, high-accuracy, 32-bit floating-point CORDIC entity was designed, and this design met the convergence and accuracy requirements. Finally, the results obtained by the proposal are compared with the kinematic calculations obtained in software, including the angle equations, to illustrate the precision, hardware requirements and processing speed.

Keywords: CORDIC, FPGA, kinematics calculation, hexapod-leg.

I. INTRODUCTION

Legged robots have many characteristics desirable for terrestrial and space applications, including omnidirectional motion, variable geometry, discrete contact points, access to diverse terrain and unique modes of locomotion [1]. Hexapod robots are a kind of legged walking robot that is programmable and, in some cases, provided with autonomy. The six legs move within their workspaces in order to achieve translational and rotational displacements.

Hexapod walking robots also benefit from a lower impact on the terrain and have greater mobility in natural surroundings. This is especially important in dangerous environments like mine fields, or where it is essential to keep the terrain largely undisturbed for scientific reasons [2]. The development of hexapod robots is an area of interest because of the versatility of their displacement, in contrast to wheeled vehicles, on irregular terrain or when climbing obstacles of complex geometry.

Some of their current disadvantages include higher complexity and cost, low energy efficiency, and relatively low speed [3]. In fact, hexapods are complex machines consisting of mechanisms, actuators, sensors and support hardware. The low speed refers to how quickly kinematics and locomotion are calculated, partly because the kinematic equations of a hexapod leg involve a large number of trigonometric functions that must be evaluated in real time. In a conventional approach, these functions are implemented in a processor using Taylor series, look-up tables or other methods. This approach, combined with the performance of sequential controllers, is not an effective solution, so it is necessary to find an efficient method to accelerate the calculation.

TABLE I. HARDWARE SPECIFICATIONS OF HEXAPOD ROBOTS

| Robot | LAURON V | CRIXUS | - | - |
|---|---|---|---|---|
| Year | 2014 | 2014 | 2015 | 2015 |
| Author | A. Roennau | G. Evangelista | M. Zak | A. Cully |
| Processor | Intel Core i7 | ARM Cortex-M3 NXP LPC1768 | Raspberry Pi | Intel Xeon E5-260 |
| Speed | 3 GHz | 96 MHz | 1.2 GHz | 2 GHz |
| Cores | 4 | 1 | 4 | 8 |
| Motor Control Units | 9 | 1 | 1 | - |
| Power (W) | 100-150 | 60 | - | - |

Table I compares the resources of several hexapod robots. Their main characteristic is the central processor, which is responsible for solving the routines required by the robot: control, sensor readings, trajectory generation and communication with slave units, among others.

Focusing on the processor, it can be seen that all of them are sequential and that, depending on the complexity of the control, the designers chose to increase the processor speed, the number of cores and the number of associated actuator controllers. Increasing sequential processing speed requires a lot of effort because performance is physically limited by the architecture. In contrast, parallelism in computer systems is presented as a solution. For example, adding parallel processing units can increase the speed of complex calculations. In robotics, parallel processing at the kinematic level has two advantages. First, the hardware architecture of the controller reflects the hardware architecture of the robot, which makes the system easier to develop and debug. Second, these schemes are statically extensible and algorithmically scalable, e.g. another robotic joint can be added to a parallel control unit [4].

Efficient and low-cost applications are being developed on parallel architectures using CORDIC algorithms. These algorithms generate trigonometric, logarithmic and transcendental functions, e.g. for the forward and inverse kinematics calculation of robot manipulators [5]. For this reason, this paper is limited to the study and design of an embedded architecture that calculates the inverse kinematics of a 3-DOF hexapod leg based on the CORDIC algorithm.
The authors are with the School of Electronic Engineering, Antenor Orrego Private University, Trujillo 13008 PERU (phone: 51-913030332; e-mail: gevangelistaa@upao.edu.pe, colayar@upao.edu.pe, erodriguezd@upao.edu.pe).
Preprint submitted to WRC Symposium on Advanced Robotics and Automation 2018.

Received June 10, 2018.
II. LEG STRUCTURE

A. Inverse Kinematics

Following the Denavit-Hartenberg convention [6], the reference systems are first assigned as shown in Figure 1. The hexapod leg has three rotation angles $\theta_1$, $\theta_2$, $\theta_3$.

Figure 1. Reference systems assigned on hexapod-leg.

Equations (1), (2) and (3) provide the joint values according to the Cartesian position of the end effector and the link lengths: $l_1 = 0.0275$ m, $l_2 = 0.0963$ m, $l_3 = 0.1051$ m [7]. Equations (4) to (11) define intermediate variables that allow the architecture to be developed in segments.

$\theta_1 = \operatorname{atan}\left(y_e / x_e\right)$ (1)
$\theta_2 = \operatorname{atan}\left(G / \sqrt{1 - G^2}\right) - \operatorname{atan}\left(\sin\theta_3 / (F + D)\right)$ (2)
$\theta_3 = \operatorname{atan}\left(\sqrt{1 - D^2} / D\right)$ (3)
$r = \sqrt{x_e^2 + y_e^2}$ (4)
$A = 2 l_1 r$ (5)
$B = \sqrt{(r - l_1)^2 + z_e^2}$ (6)
$C = r^2 + z_e^2 + l_1^2 - l_2^2 - l_3^2 - A$ (7)
$D = C\,C_a = \cos\theta_3$ (8)
$G = z_e / B$ (9)
$C_a = 1 / (2 l_2 l_3)$ (10)
$F = l_2 / l_3$ (11)

B. End-effector Trajectory

The working-area surface of the end effector for a quadruped walk [8] with a crab angle $\alpha \in [0, 90]^\circ$ is shown in Figure 2. This analysis is necessary because it determines the convergence range required of the CORDIC operators.

Figure 2. Working area for hexapod-leg end-effector.

III. ARCHITECTURE MODELING

A. Fully Pipeline Architecture Model

Based on the inverse kinematics model, a CORDIC-based fully pipelined architecture is proposed in Figure 3. This proposal is composed of 9 CORDIC modules, a finite state machine (FSM), 2 multipliers and 3 adders/subtractors. The six-stage structure is fixed in order to achieve high performance and throughput.

Figure 3. Fully-pipeline CORDIC-based architecture.

Four types of CORDIC operators are considered: circular rotational, circular vectorial, hyperbolic vectorial and linear vectorial [9], whose functions and symbology are represented in Table II. In addition, both circular rotational and circular vectorial can be implemented in the same CORDIC module by changing the operation mode.

TABLE II. CORDIC OPERATION REFERENCE

| Operator | Symbol | x output | y output | z output |
|---|---|---|---|---|
| Circular Rotational | CR | $K(x \cos z - y \sin z)$ | $K(y \cos z + x \sin z)$ | $0$ |
| Circular Vectorial | CV | $K\sqrt{x^2 + y^2}$ | $0$ | $z + \tan^{-1}(y/x)$ |
| Hyperbolic Vectorial | VH | $K'\sqrt{x^2 - y^2}$ | $0$ | $z + \tanh^{-1}(y/x)$ |
| Linear Vectorial | VL | $x$ | $0$ | $z + y/x$ |

Each operator takes the inputs $x$, $y$ and $z$.
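As an illustration of how the CV operator of Table II maps onto equations (1) and (4), the following Python sketch computes $r$ and $\theta_1$ with a software circular-vectoring CORDIC and evaluates the remaining equations (2) to (11) with the standard math library. It is a minimal floating-point model under stated assumptions, not the FPGA design: the function names `cordic_cv`, `leg_ik` and `leg_fk` are hypothetical, `atan2` replaces `atan` in (2) and (3) to keep the correct quadrant, and the forward-kinematics check assumes the planar two-link convention implied by (2) and (3).

```python
import math

# Link lengths in metres, as given in the text.
L1, L2, L3 = 0.0275, 0.0963, 0.1051


def cordic_cv(x, y, n_iter=16):
    """Circular vectoring (CV row of Table II), assuming x > 0.

    Returns (sqrt(x^2 + y^2), atan(y / x)); the gain K is divided out.
    """
    z, gain = 0.0, 1.0
    for i in range(n_iter):
        d = 1.0 if y >= 0 else -1.0          # drive y towards zero
        x, y = x + d * y * 2.0 ** -i, y - d * x * 2.0 ** -i
        z += d * math.atan(2.0 ** -i)        # value a ROM look-up would supply
        gain *= math.sqrt(1.0 + 4.0 ** -i)   # per-iteration growth of the vector
    return x / gain, z


def leg_ik(xe, ye, ze):
    """Joint angles from equations (1)-(11); assumes the target is reachable."""
    r, theta1 = cordic_cv(xe, ye)                            # (4) and (1)
    A = 2.0 * L1 * r                                         # (5)
    B = math.sqrt((r - L1) ** 2 + ze ** 2)                   # (6)
    C = r * r + ze * ze + L1 * L1 - L2 * L2 - L3 * L3 - A    # (7)
    D = C / (2.0 * L2 * L3)                                  # (8) with (10)
    G = ze / B                                               # (9)
    F = L2 / L3                                              # (11)
    theta3 = math.atan2(math.sqrt(1.0 - D * D), D)           # (3)
    theta2 = (math.atan(G / math.sqrt(1.0 - G * G))
              - math.atan2(math.sin(theta3), F + D))         # (2)
    return theta1, theta2, theta3


def leg_fk(theta1, theta2, theta3):
    """Forward kinematics used only as a check (assumed planar two-link form)."""
    r = L1 + L2 * math.cos(theta2) + L3 * math.cos(theta2 + theta3)
    z = L2 * math.sin(theta2) + L3 * math.sin(theta2 + theta3)
    return r * math.cos(theta1), r * math.sin(theta1), z


if __name__ == "__main__":
    target = (0.12, 0.05, -0.08)   # hypothetical end-effector position in metres
    angles = leg_ik(*target)
    print("angles (rad):", angles)
    print("round trip  :", leg_fk(*angles))
```

Feeding the angles back through `leg_fk` should reproduce the target position to within the resolution of the 16 CORDIC iterations, which is the same kind of round-trip check that direct kinematics provides.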

B. Architecture Finite State Machine

To calculate the values of $r$ and $D$ in (4) and (8) that are not CORDIC-dependent, an FSM was designed at stage two using 2 multipliers and one adder, as shown in Figure 4. This FSM obtains these values in three clock cycles and sets its outputs on the pipeline register.

Figure 4. Finite state machine architecture.

IV. CORDIC DESIGN

A. Convergence Range Analysis

The end effector can occupy any point within the working area (Figure 2), hence it is mandatory to determine the minimum convergence requirements. Since the CORDIC algorithm has a well-defined convergence range (Table III), each operator of the architecture in Figure 3 is analyzed based on its input parameters in order to satisfy this convergence for each gait step.

TABLE III. EQUATIONS FOR OPERATOR'S EXPANSION

| Method | Convergence range (Rotational) | Convergence range (Vectorial) |
|---|---|---|
| Circular | $\lvert z_{in} \rvert \le 1.7433$ (99.9°) | $\lvert \operatorname{atan2}(y_{in}, x_{in}) \rvert \le 1.7433$ |
| Linear | $\lvert z_{in} \rvert \le 1$ | $\lvert y_{in} / x_{in} \rvert \le 1$ |
| Hyperbolic | $\lvert z_{in} \rvert \le 1.1182$ | $\lvert \tanh^{-1}(y_{in} / x_{in}) \rvert \le \theta_{max} \approx 1.1182$ |

Figure 5 shows that the gait calculations of Circular C3 exceed the basic range of convergence; consequently, an expansion is needed. This is achieved by adding two negative-index terms to the calculation, which is enough to enlarge $\theta_{max}$ and satisfy the convergence requirements.

Figure 5. Circular C3 CV Range of Convergence Analysis.

Figure 6 shows that Hyperbolic C8 and Linear C7 stay far enough from their convergence limits that they do not need to be expanded. Hyperbolic C2, however, comes close to the limit of its convergence range, and for this reason Hyperbolic C2 is expanded.

Figure 6. Linear and Hyperbolic Range of Convergence Analysis.

B. Hyperbolic Convergence Range Expansion

In order to cover the minimum working area of Figure 2, the basic convergence range of the hyperbolic CORDIC algorithm is expanded.

For $i \le 0$:

$\begin{bmatrix} x_{i+1} \\ y_{i+1} \end{bmatrix} = \begin{bmatrix} 1 & -\delta_i (1 - 2^{i-2}) \\ -\delta_i (1 - 2^{i-2}) & 1 \end{bmatrix} \begin{bmatrix} x_i \\ y_i \end{bmatrix}$ (12)

$z_{i+1} = z_i + \delta_i \tanh^{-1}(1 - 2^{i-2})$

For $i > 0$:

$\begin{bmatrix} x_{i+1} \\ y_{i+1} \end{bmatrix} = \begin{bmatrix} 1 & -\delta_i 2^{-i} \\ -\delta_i 2^{-i} & 1 \end{bmatrix} \begin{bmatrix} x_i \\ y_i \end{bmatrix}$ (13)

$z_{i+1} = z_i + \delta_i \tanh^{-1}(2^{-i})$

Since only vectoring mode is used here, no $z_{i+1}$ calculation is involved. Moreover, non-positive indexes are avoided by letting $0 \le i < M + N$, where $M$ is the number of non-positive iterations and $N$ the number of positive iterations. Hence, (12) and (13) can be rewritten as follows:

For $i \le M$:

$\begin{bmatrix} x_{i+1} \\ y_{i+1} \end{bmatrix} = \begin{bmatrix} 1 & -\delta_i (1 - 2^{i-M-2}) \\ -\delta_i (1 - 2^{i-M-2}) & 1 \end{bmatrix} \begin{bmatrix} x_i \\ y_i \end{bmatrix}$ (14)

For $i > M$:

$\begin{bmatrix} x_{i+1} \\ y_{i+1} \end{bmatrix} = \begin{bmatrix} 1 & -\delta_i 2^{M-i} \\ -\delta_i 2^{M-i} & 1 \end{bmatrix} \begin{bmatrix} x_i \\ y_i \end{bmatrix}$ (15)
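The expanded hyperbolic vectoring iterations can be sketched in software as follows. This is a hedged floating-point model, not the RTL of the paper. It uses the unshifted index of (12) and (13), which is equivalent to (14) and (15) after the shift $i \to i + M$. It assumes $\delta_i = \operatorname{sign}(x_i y_i)$ as the vectoring decision, removes the accumulated gain with one final division, and repeats iterations 4 and 13, which the standard hyperbolic CORDIC needs for convergence but which the text does not mention. The function name and its defaults (`m_neg=1`, `n_pos=16`, matching the index range [-1:16] quoted for the Hyperbolic module) are illustrative.

```python
import math


def hyperbolic_vectoring(x, y, m_neg=1, n_pos=16):
    """Return sqrt(x^2 - y^2) for x > |y| using expanded hyperbolic CORDIC.

    Indices -m_neg..0 use the factor (1 - 2^(i-2)) of (12);
    indices 1..n_pos use the factor 2^-i of (13).
    """
    indices = list(range(-m_neg, 1))
    for i in range(1, n_pos + 1):
        indices.append(i)
        if i in (4, 13):          # assumed repeats, standard for hyperbolic CORDIC
            indices.append(i)

    gain = 1.0
    for i in indices:
        f = 1.0 - 2.0 ** (i - 2) if i <= 0 else 2.0 ** -i
        d = 1.0 if x * y >= 0 else -1.0        # assumed delta_i = sign(x_i * y_i)
        x, y = x - d * f * y, y - d * f * x    # symmetric update of (12)/(13)
        gain *= math.sqrt(1.0 - f * f)         # each step scales sqrt(x^2 - y^2)
    return x / gain


if __name__ == "__main__":
    # atanh(0.9) is about 1.47, beyond the basic limit of 1.1182 in Table III.
    print(hyperbolic_vectoring(1.0, 0.9), math.sqrt(1.0 - 0.9 ** 2))
```

With the two extra non-positive iterations, the input ratio 0.9 in the usage line falls inside the enlarged range, whereas the basic range of Table III would not cover it.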

The implementation of the bit-parallel iterative architecture is shown in Figure 7.

Figure 7. Bit-parallel iterative architecture.

C. CORDIC Design

Three IEEE 754 floating-point, bit-parallel CORDIC modules were implemented. Since only vectoring mode is used for both the Linear and Hyperbolic CORDIC modules, a single ROM is instantiated, for the Circular CORDIC look-up table:

• Circular CORDIC: This module is shown in Figure 8 (basic convergence). According to Figure 9, it requires 16 iterations/cycles of processing time, and 18 cycles of processing time when expanded.

Figure 8. RTL level for Circular CORDIC. Ports: Xin[31:0], Yin[31:0], Zin[31:0], ROM Out[31:0], Mode, Start, Xout[31:0], Yout[31:0], Zout[31:0], Data ready, ROM Address[4:0].

Figure 9. Test bench for Circular CORDIC.

• Look-up table sharing: For storing the Circular CORDIC atan parameters, a dual-port Xilinx ROM Memory IP Core resource is used (Figure 10). This LUT module stores the atan parameters for both the basic and the expanded Circular ranges of convergence, starting from the third position in memory. Since the pipeline architecture forces all CORDIC modules to start with the pipeline clock, all instantiations are synchronized. This allows all Circular CORDIC modules to share the same look-up ROM, which reduces hardware cost.

Figure 10. RTL level for IP CORE ROM atan. Ports: ADDR_A[4:0], ADDR_B[4:0], ENA, Dout_A[31:0], Dout_B[31:0]. Total port read latency from rising edge of read clock: 1 clock cycle.

• Linear CORDIC: This module is shown in Figure 11 (basic convergence). According to Figure 12, it requires 16 iterations/cycles of processing time.

Figure 11. RTL level for Linear CORDIC. Ports: Xin[31:0], Yin[31:0], Start, Xout[31:0], Data Ready.

Figure 12. Test bench for Linear CORDIC.

• Hyperbolic CORDIC: This module is shown in Figure 13 (expanded convergence). According to Figure 14, it requires 18 iterations [-1:16] and 20 cycles of processing time.

Figure 13. RTL level for Hyperbolic CORDIC. Ports: Xin[31:0], Yin[31:0], Start, Xout[31:0], Data Ready.

Figure 14. Test bench level for Hyperbolic CORDIC.

V. FULLY-PIPELINE DESIGN IMPLEMENTATION

A. Module Design

The instantiation of the proposed pipeline architecture module, presented in Figure 15, is the minimum required to perform the inverse kinematics calculation. Therefore, some modules are

implemented to generate a VLSI architecture, which is expanded further below. This module outputs the joint values in IEEE 754 floating-point format; these can be fed back to the processor unit or decoded in a next stage to obtain the output signals for the robot actuators.

B. VLSI Architecture

The present design is intended to be part of an SoC architecture, so some considerations are taken. First, the inverse kinematics input parameters should be provided by an asynchronous Trajectory Generator module, which may or may not be in the same clock domain, causing synchronization issues due to different data rates. Using FIFO memories is usually a good approach to solve this issue, with the FIFO size as one limitation. Thus, three single-clock-domain FIFOs are implemented as shown in Figure 15, assuming that the Trajectory and Inverse Kinematics modules are in the same clock domain and making the FIFOs large enough to store the inverse kinematics input parameters for all hexapod legs. An FSM is in charge of the control signaling for writing and reading data in the FIFOs, which is explained further below. In this way, the kinematics parameters are written into the architecture synchronously.

C. Timing Analysis

Another consideration is clock generation. For the proposed architecture presented in Figure 15, two timing approaches are taken. The first one is the clock period, which is based on the synthesis tool analysis that considers the minimum setup time for input signals and the maximum path from input to output. The timing summary from ISE Design is shown in Figure 16:

Timing Summary:
---------------
Speed Grade: -3

Minimum period: 33.586ns (Maximum Frequency: 29.774MHz)
Minimum input arrival time before clock: 1.406ns
Maximum output required time after clock: 1.063ns
Maximum combinational path delay: No path found

Figure 16. Timing Summary from ISE Design.

On the other hand, since the period of the pipeline clock should be large enough to give the signals of the slowest stage sufficient time to traverse it, the pipeline architecture frequency is limited by the slowest stage. As presented in the CORDIC Design section, the Circular C3 and Hyperbolic C2 modules are the bottleneck of the design, because these modules have the longest processing time, 20 clock cycles.

Consequently, the pipeline clock signal must be fitted to obtain the minimum period for the pipeline stages while handling the FIFO read signaling. The pipeline period is therefore calculated from the pipeline bottleneck period, the pipeline register period and the FIFO read period. Thus, a period of 21 clock cycles is used as the pipeline clock.

Figure 17. Test bench for timing.

Figure 17 shows the clock signaling, described as follows:

• Read_fifo: Sends the clock signal to the FIFO modules to read the next inverse kinematics input data.
• Stage_clock: Pipeline register clock signal.
• Read_clock: Clock signal to read the FIFO output and register it into the first pipeline input stage.
• Start_clock: CORDIC module start clock signal.
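The timing figures above can be combined into a rough throughput estimate. The short Python calculation below is only a back-of-the-envelope sketch: it takes the 33.586 ns minimum period from Figure 16, the 21-cycle pipeline clock and the six pipeline stages of Section III, and assumes that one set of input parameters enters the pipeline on every pipeline clock and that each stage takes exactly one pipeline clock. The variable and function names are hypothetical.

```python
# Values taken from the text.
CLK_PERIOD_NS = 33.586      # minimum clock period reported by synthesis
CYCLES_PER_STAGE = 21       # pipeline clock period, in system clock cycles
STAGES = 6                  # pipeline depth of the proposed architecture
LEGS = 6                    # a hexapod has six legs


def pipeline_estimate(clk_ns=CLK_PERIOD_NS, cycles=CYCLES_PER_STAGE,
                      stages=STAGES, legs=LEGS):
    """Estimate stage period, throughput and latency of the pipeline."""
    stage_ns = clk_ns * cycles                      # one pipeline clock period
    throughput = 1e9 / stage_ns                     # IK solutions per second
    latency_ns = stages * stage_ns                  # first result after filling
    all_legs_ns = (stages + legs - 1) * stage_ns    # results for every leg
    return stage_ns, throughput, latency_ns, all_legs_ns


if __name__ == "__main__":
    stage_ns, throughput, latency_ns, all_legs_ns = pipeline_estimate()
    print(f"pipeline clock period : {stage_ns:.3f} ns")
    print(f"throughput            : {throughput / 1e6:.3f} M solutions/s")
    print(f"latency, one leg      : {latency_ns / 1e3:.3f} us")
    print(f"latency, six legs     : {all_legs_ns / 1e3:.3f} us")
```

Under these assumptions the pipeline clock period is 21 x 33.586 ns = 705.306 ns, so the estimate is dominated by the 21-cycle stage length rather than by the pipeline depth.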
Figure 15. VLSI Architecture for Trajectories and IK.

VI. RESULTS

Table IV shows the error in the inverse kinematics calculation. The mean percentage error of $\theta_2$ is greater than 5%. This is because $\theta_2$ takes values close to zero at certain gait iterations, so even though this percentage error is high, the deviation itself is close to zero. Thus, the accuracy of the inverse kinematics calculation is acceptable, with negligible error and a deviation close to zero.
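The effect described for $\theta_2$ can be reproduced with a few lines of arithmetic. The sketch below applies the same absolute deviation, the $3.9705 \times 10^{-7}$ rad reported for $\theta_2$, to reference angles of different magnitude. The reference angles are hypothetical values chosen for illustration, not data from the experiments, and `percent_error` is an assumed helper using the usual definition of relative error.

```python
def percent_error(reference, measured):
    """Relative error in percent; undefined for a zero reference."""
    return abs(measured - reference) / abs(reference) * 100.0


DEVIATION_RAD = 3.9705e-7   # absolute deviation reported for theta_2

if __name__ == "__main__":
    # Hypothetical reference angles, from a large angle down to nearly zero.
    for reference in (1.0, 1e-3, 7.5e-6):
        error = percent_error(reference, reference + DEVIATION_RAD)
        print(f"reference {reference:.1e} rad -> {error:.6f} % error")
```

The deviation is identical in every row, yet the percentage error grows as the reference angle approaches zero, which is why the absolute deviation is the more meaningful figure for $\theta_2$.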

TABLE IV. IK ANGLES ERROR CALCULATION

| Type | $\theta_1$ | $\theta_2$ | $\theta_3$ |
|---|---|---|---|
| Percent Error (%) | 0.004888 | 5.272004 | 0.001002 |
| Deviation (rad) | $2.6451 \times 10^{-7}$ | $3.9705 \times 10^{-7}$ | $2.6451 \times 10^{-7}$ |

A good way to measure the effective error in robot motion is to use direct kinematics. Table V shows the mean percentage error of the direct kinematics parameters for each hexapod robot gait. The architecture achieves high accuracy on both the Tripod and Pentapod gaits. On the other hand, it shows a slightly greater error in the Quadruped gaits, with a maximum of 2.43%, which represents a deviation of about 7 mm.

TABLE V. EQUATIONS FOR OPERATOR'S EXPANSION

| Gait | Percent Error (%) in $x$ | Percent Error (%) in $y$ | Percent Error (%) in $z$ |
|---|---|---|---|
| Tripod | $8.06451 \times 10^{-4}$ | $4.62962 \times 10^{-4}$ | $14.7633 \times 10^{-4}$ |
| Quadruped | 0.583768 | 0.583333 | 0.80507 |
| Quadruped 4+2 | 0.583768 | 0.583333 | 0.80507 |
| Pentapod | $2.98408 \times 10^{-3}$ | $2.20439 \times 10^{-3}$ | $3.41625 \times 10^{-3}$ |

Figure 18 shows the accuracy of the hexapod robot gaits. Although, as mentioned previously, the Quadruped gaits show a greater error, this actually represents a maximum deviation of about 7 mm in a real gait. This can be masked by the actuator precision and may be improved in further work by approximation methods.

Figure 18. Gait Accuracy.

APPENDIX

Programs, simulations and algorithms:
https://drive.google.com/open?id=1u2vgMB8w0MLEd5NnFU24oTn9zw3128cL

ACKNOWLEDGMENT

The authors are grateful to the professional school of electronic engineering. Over time we have been able to create this line of research in polyarticulated walking robots, as well as to encourage research.

REFERENCES

[1] D. Chávez, "Gait Optimization for Multi-Legged Walking Robots, with Application to a Lunar Hexapod", Ph.D. dissertation, Dept. Aeronautics and Astronautics, Stanford University, USA, 2011, p. 6.
[2] M. Cigola, A. Pelliccio, O. Salotto, G. Carbone, E. Ottaviano, M. Ceccarelli, "Application of robots for inspection and restoration of historical sites", in Proceedings of the International Symposium on Automation and Robotics in Construction of the Conference, Ferrara, Italy, 11-14 September 2005, p. 37.
[3] P. Gregorio, M. Ahmadi, M. Buehler, "Design, Control and Energetics of an Electrically Actuated Legged Robot", IEEE Trans. Systems, Man and Cybernetics, vol. 27, pp. 626-634, Aug. 1997.
[4] D. Henrich, T. Höniger, "Parallel Processing Approaches in Robotics", IEEE International Symposium on Industrial Electronics, Guimarães, Portugal, 7-11 July 1997, pp. 2-6.
[5] M. Arora, R. Chauhan and L. Bagga, "FPGA Prototyping of Hardware Implementation of CORDIC Algorithm", International Journal of Scientific & Engineering Research, vol. 3, issue 1, January 2012, p. 1.
[6] J. Denavit and R. S. Hartenberg, "A kinematic notation for lower-pair mechanisms based on matrices", ASME Journal of Applied Mechanics, vol. 77, 1955.
[7] G. Evangelista, "Design and Modeling of a Mobile Research Platform based on Hexapod Robot with Embedded System and Interactive Control", 23rd International Conference on Methods and Models in Automation and Robotics, Międzyzdroje, Poland, 27-30 August, pp. 294-299.
[8] G. Evangelista, D. Lázaro, "Translational Motion Analysis of a Hexapod Walking Robot", IEEE International Engineering Summit, Coatzacoalcos, Mexico, 29-31 October, pp. 167-172.
[9] X. Hu, R. Harber and S. Bass, "Expanding the Range of Convergence of the CORDIC Algorithm", IEEE Transactions on Computers, vol. 40, no. 1, pp. 2-6, January 1991.

