
Journal of Signal Processing Systems 2008

© 2008 Springer Science + Business Media, LLC. Manufactured in The United States.
DOI: 10.1007/s11265-007-0146-6

Efficient Mapping of CORDIC Algorithm for OFDM-Based WLAN
F. ANGARITA, M. J. CANET, T. SANSALONI, A. PEREZ-PASCUAL AND J. VALLS
Department of Electronic Engineering, Polytechnic University of Valencia, 46730 Grao de Gandía, Valencia, Spain

Received: 9 June 2007; Revised: 3 September 2007; Accepted: 3 September 2007

Abstract. In an orthogonal frequency division multiplexing (OFDM)-based wireless local area network (WLAN) receiver, three operations can be performed by a single coordinate rotation digital computer (CORDIC) processor, because they are needed at different time instants: the rotation of a vector, the computation of the angle of a vector, and the computation of the reciprocal. This paper proposes a common CORDIC architecture that implements the three operations with only a small increase in hardware cost with respect to a single-operation CORDIC. The proposed architecture has been validated on field programmable gate array (FPGA) devices, and the implementation results show an area saving of around 28% and a throughput increment of 64%.

Keywords: OFDM, CORDIC, wireless LAN

1. Introduction

Coordinate rotation digital computer (CORDIC) was introduced in 1959 by Volder [1]. It is an easy-to-implement and versatile algorithm, widely used in digital signal processing applications [2] and communication systems [3]. It iteratively computes the rotation of a two-dimensional vector in a circular coordinate system using only add and shift operations. These rotations can also be used to compute the angle and the modulus of the vector. In 1971, Walther [4] generalized the algorithm to the rotation of a vector in linear and hyperbolic coordinate systems, which allows it to compute operations such as division, logarithm or square root.

Wireless local area network (WLAN) standards in the 5 GHz band, Hiperlan/2 and IEEE 802.11a, and in the 2.4 GHz band, IEEE 802.11g, are based on orthogonal frequency division multiplexing (OFDM) transmission [5], due to its good performance on highly dispersive channels such as the indoor scenarios where these standards are used. They have been designed to provide data rates of up to 54 Mbps in order to support broadband multimedia communications.

Synchronization and channel compensation in WLAN systems are obtained from a preamble. In this preamble, shown in Fig. 1 for Hiperlan/2, three sections can be distinguished: A, B and C. Section A is used for automatic gain control and frame detection, section B is intended for time synchronization and coarse carrier frequency offset (CFO) estimation, and section C can be used for fine CFO estimation and channel estimation.

Figure 2 shows a block diagram with the stages of a WLAN baseband receiver: frame detection, timing synchronization, coarse and fine CFO estimation and correction, fast Fourier transform (FFT)-based OFDM demodulation, channel estimation and compensation, and phase tracking. First, time synchronization is achieved by autocorrelating the received data. The angle of the autocorrelation output in section B is proportional to the CFO, and it can be calculated with a CORDIC configured to obtain the angle of a vector.
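The relation between the autocorrelation angle and the CFO can be illustrated with a small delay-and-correlate sketch. This is not taken from the paper: the preamble period D = 16 samples and the 20 MHz sampling rate are assumptions (802.11a-like values), and `coarse_cfo_estimate` and `apply_cfo` are hypothetical helpers. `cmath.phase` stands in for the vectoring-mode CORDIC that computes the angle, and rotating by the negated estimate stands in for the rotation-mode CORDIC that corrects the input data.

```python
import cmath
import math

FS = 20e6  # assumed sampling rate (Hz)
D = 16     # assumed period of the repeated preamble symbols (samples)

def coarse_cfo_estimate(r, lag=D, fs=FS):
    """Delay-and-correlate estimate: angle(sum r[n+lag]*conj(r[n])) = 2*pi*cfo*lag/fs."""
    acc = sum(r[n + lag] * r[n].conjugate() for n in range(len(r) - lag))
    angle = cmath.phase(acc)  # the angle a vectoring-mode CORDIC would return
    return angle * fs / (2 * math.pi * lag)

def apply_cfo(x, cfo, fs=FS):
    """Rotate sample n by 2*pi*cfo*n/fs (the job of a rotation-mode CORDIC)."""
    return [s * cmath.exp(2j * math.pi * cfo * n / fs) for n, s in enumerate(x)]
```

With a lag of D samples the estimate is unambiguous only while |2π·CFO·D/fs| < π, which is why a coarse estimate on the short-period section is followed by a fine one.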

Then, the input data is rotated (with the CORDIC configured to rotate a vector) using the obtained coarse CFO. After this correction, the residual CFO is 10 kHz, which is not small enough to keep the SNR loss below 0.5 dB. So, a fine CFO estimation and correction are needed: another correlation must be done in section C, where the coarse CFO has already been removed. Again, the autocorrelation output is proportional to the fine CFO, which is obtained with a CORDIC computing the angle of a vector. Next, the coarse and fine CFO are eliminated from the input data by means of a CORDIC configured to rotate a vector.

After time and frequency synchronization, the channel response must be estimated, and then the subcarriers (FFT output) must be equalized. This equalization needs the operation 1/x, which can be obtained with a CORDIC working in linear coordinates.

The final task is phase tracking. After fine CFO compensation, a residual CFO (around 1 kHz) continuously rotates the phase of the received OFDM signal, causing a constellation rotation. This forces the receiver to track the carrier phase while data symbols are received. First, the pilot subcarriers detect the phase rotation by comparing the received pilot subcarrier data against the known pilot subcarrier data (the angle of this comparison is obtained with the CORDIC). After calculating it, this estimated phase is removed (rotated with a CORDIC) from the equalized subcarriers as a constant phase.

Figure 1. Hiperlan/2 broadcast preamble.

Figure 2. Receiver structure.

Therefore, most of the operations required to recover the OFDM signal can be implemented with the help of a CORDIC processor. Moreover, as each operation needs to be done at a different time, the same CORDIC can be reused for all the required operations. However, most CORDIC implementations are designed for a specific pair of mode (rotation or vectoring) and coordinate system [6–9], and few are designed for dual mode and a single coordinate system [10]. Nevertheless, it is possible to design an optimized architecture that executes different pairs of mode and coordinate system and achieves almost the same area and throughput as an optimized specific CORDIC implementation. By doing so, this generic CORDIC can be reused in different stages of the receiver, leading to a significant overall area saving at the receiver without reducing the signal processing data rate.

This work is organized as follows. Section 2 introduces the CORDIC algorithm. Section 3 presents the single-mode/single-coordinate architectures and their extension to dual-mode/single-coordinate architectures. Section 4 details the proposed common architecture, derived from the architectures presented in Section 3. Section 5 summarizes the main implementation results for field programmable gate array (FPGA) devices. Finally, Section 6 presents the conclusions.

2. CORDIC Algorithm

CORDIC is an iterative algorithm for calculating the rotation of a two-dimensional vector (Fig. 3) in linear, circular and hyperbolic coordinate systems using only add and shift operations. It has two operating modes: the rotation mode (RM), where a vector (X0, Y0) is rotated by an angle θ to obtain a new vector (XN', YN'), and the vectoring mode (VM).

In VM, the algorithm computes the length R and the angle α, with respect to the x-axis, of a vector (X0, Y0).

Figure 3. The rotation and vectoring modes of the CORDIC algorithm.

The algorithm was originally described only for circular coordinates [1], and was later extended to linear and hyperbolic coordinates and described in a generalized form [2] by the set of equations (1), executed as a finite number of micro-rotations indexed by i = 0, …, N−1:

$$
\begin{aligned}
X_{i+1} &= X_i - m\, d_i\, 2^{-i}\, Y_i \\
Y_{i+1} &= Y_i + d_i\, 2^{-i}\, X_i \\
Z_{i+1} &= Z_i - d_i\, \alpha_i
\end{aligned}
\qquad
d_i = \begin{cases} \operatorname{sign}(Z_i) & \text{for RM} \\ -\operatorname{sign}(Y_i) & \text{for VM} \end{cases}
\qquad (1)
$$

By selecting appropriate values for the parameters m and αi, a different coordinate system is selected. When m = 0, 1 or −1, the algorithm works in linear, circular or hyperbolic coordinates, respectively, and the values of αi are 2^−i, tan^−1 2^−i or tanh^−1 2^−i, resulting in different equations for XN, YN and ZN. For a straightforward interpretation of the generalized algorithm, Table 1 summarizes the results for each Mode and value of m (in hyperbolic coordinates the micro-rotations start at i = 1, hence the subscript 1 on the inputs).

Table 1. Generalized CORDIC algorithm.

$$
\begin{array}{c|l|l}
m & \text{RM} & \text{VM} \\ \hline
1 & X_N = K(X_0\cos Z_0 - Y_0\sin Z_0) & X_N = K\sqrt{X_0^2 + Y_0^2} \\
  & Y_N = K(Y_0\cos Z_0 + X_0\sin Z_0) & Y_N = 0 \\
  & Z_N = 0 & Z_N = Z_0 + \tan^{-1}(Y_0/X_0) \\ \hline
0 & X_N = X_0 & X_N = X_0 \\
  & Y_N = Y_0 + X_0 Z_0 & Y_N = 0 \\
  & Z_N = 0 & Z_N = Z_0 + Y_0/X_0 \\ \hline
-1 & X_N = K'(X_1\cosh Z_1 + Y_1\sinh Z_1) & X_N = K'\sqrt{X_1^2 - Y_1^2} \\
   & Y_N = K'(Y_1\cosh Z_1 + X_1\sinh Z_1) & Y_N = 0 \\
   & Z_N = 0 & Z_N = Z_1 + \tanh^{-1}(Y_1/X_1)
\end{array}
$$

where K = ∏_i √(1 + 2^(−2i)) ≈ 1.647 is the gain of the circular micro-rotations and K' is the corresponding gain of the hyperbolic ones.

It is well known that the convergence range of the CORDIC algorithm is given by the sum of all αi, i = 0, …, N−1. In circular coordinates this sum is approximately 1.74 rad, so the convergence range only covers about ±π/2; in linear coordinates the maximum value of this sum is approximately 2. Nevertheless, an extension of the algorithm allows the range to be extended to ±π. This extension implies an extra pre-operation, described by the set of equations (2):

$$
\begin{aligned}
X_0 &= -d_{-1}\, Y_{-1} \\
Y_0 &= d_{-1}\, X_{-1} \\
Z_0 &= Z_{-1} - d_{-1}\, \alpha_{-1}, \qquad \alpha_{-1} = \pi/2
\end{aligned}
\qquad
d_{-1} = \begin{cases} \operatorname{sign}(Z_{-1}) & \text{for RM} \\ -\operatorname{sign}(Y_{-1}) & \text{for VM} \end{cases}
\qquad (2)
$$

3. Single-Mode/Single-Coordinate Architectures

Given that the CORDIC processor is intended to carry out a significant part of the signal processing required by an OFDM-based broadband wireless digital receiver, its implementation must have a suitable architecture to achieve high data rates and low power. Therefore, a parallel architecture is chosen. The aim of this section is to show that the single-mode/single-coordinate architectures have strong similarities that can be exploited to derive a common architecture implementing a dual-mode/dual-coordinate CORDIC in a look-up-table (LUT)-based FPGA, without a significant hardware increase in comparison to the single-mode/single-coordinate implementations. Figures 4 and 5 represent two common parallel architectures for RM and VM, one for each coordinate system (circular and linear).
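As a reference for the architectures discussed in this section, the generalized iteration of Eq. (1) and the angle-extension pre-operation of Eq. (2) can be checked with a short floating-point model. This is a sketch, not the authors' code: the function name `cordic`, the convention sign(0) = +1 and the default of 16 iterations are assumptions, and hyperbolic coordinates are left out because the receiver does not use them.

```python
import math

def sign(v):
    # Convention assumed in this sketch: sign(0) = +1
    return 1 if v >= 0 else -1

def cordic(x, y, z, m, mode, n_iter=16, extend=False):
    """Floating-point model of Eq. (1) for circular (m=1) or linear (m=0) coordinates.

    mode "RM" drives Z to zero, mode "VM" drives Y to zero. With extend=True and
    m=1, the +/-pi/2 pre-rotation of Eq. (2) is applied before the iterations.
    """
    if m not in (0, 1) or mode not in ("RM", "VM"):
        raise ValueError("expected m in {0, 1} and mode in {'RM', 'VM'}")

    def direction(y_val, z_val):
        # d_i = sign(Z_i) in RM, -sign(Y_i) in VM
        return sign(z_val) if mode == "RM" else -sign(y_val)

    if extend and m == 1:
        d = direction(y, z)
        x, y, z = -d * y, d * x, z - d * math.pi / 2  # Eq. (2), alpha_{-1} = pi/2

    for i in range(n_iter):
        alpha = math.atan(2.0 ** -i) if m == 1 else 2.0 ** -i
        d = direction(y, z)
        x, y, z = x - m * d * 2.0 ** -i * y, y + d * 2.0 ** -i * x, z - d * alpha
    return x, y, z
```

For instance, `cordic(1.6, 1.0, 0.0, 0, "VM")` drives Y to zero and leaves Z_N ≈ 0.625 = 1/1.6, which is how a linear-coordinate CORDIC can provide the reciprocal needed for equalization (as long as |Y0/X0| stays within the linear convergence range of about 2).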

It must be remembered that the CORDIC algorithm for hyperbolic coordinates (m = −1) is not implemented, because it is not used in the OFDM-based receiver. Each of the four possible pairs (Mode, m) implemented in this work has specific optimizations, which are discussed in the following subsections.

Figure 4. A common parallel architecture for hardware implementation for circular coordinates.

Figure 5. A common parallel architecture for hardware implementation for linear coordinates.

3.1. Circular Coordinates (m=1)

Figure 4 represents the parallel architecture of the CORDIC algorithm for circular coordinates. Clearly, the architecture can be split into rows, representing the data-paths (X, Y and Z), and columns, representing the iterations. This architecture is common for VM and RM. The only difference is the way the control signal di is generated: sign(Zi) for RM and −sign(Yi) for VM. Depending on the operation mode, the control signal di is used in each data-path (X, Y and Z) to calculate the outputs of the iteration, which are used as inputs to the next iteration. For RM, di is the MSB (most significant bit) of Zi, and for VM, di is obtained by inverting the MSB of Yi. Evidently, due to the common architecture for RM and VM, it is possible to implement a dual-mode CORDIC (RM and VM) for circular coordinates (see row 1 of Table 1) simply by adding, as control hardware, a multiplexer controlled by a Mode signal and a NOT gate, which implement the selection between sign(Zi) and −sign(Yi). The NOT operation does not need extra hardware, because it can be implemented inside each add-sub operator.

Nevertheless, before starting the computations, the sign of the inputs (X0, Y0 for RM or X0, Z0 for VM) must be extended in order to avoid overflows due to the addition and subtraction operations. On the other hand, Zi (for RM) or Yi (for VM) converges to zero as i goes from 0 to N. Therefore, after each iteration, the number of bits needed to represent Zi or Yi can be reduced by i − 1 bits.

There is a significant speed reduction when the control signal is derived directly from Zi or Yi, due to its high fan-out. Thus, the control signals are replicated three times, one for each data-path. Remember that when the control signal is replicated, an additional register per replication is needed, yielding an increase of three slices per iteration. Nevertheless, thanks to the reduction of the number of bits of data-path Y or Z, the LUTs associated with these registers can be used to implement the needed additional operations, using the same slices as the control replication.

Figure 6. Add-sub (left) and add-sub/buffer (right) hardware implementations in a Xilinx FPGA.

Taking LXY as the length of X0 and Y0, and LZ as the length of Z0, Eq. (3) gives the total area, in Xilinx slices, of the hardware implementation. Eq. (3) takes into account the area optimization due to the reduction of the number of bits of Yi or Zi, as well as the replication of the control hardware:

$$
N_{\text{slices}} = \frac{M\,(2L_{XY} + L_Z + 5) - (2M - 1)}{2}, \qquad M = N - 1 \qquad (3)
$$

However, this bit-reduction optimization is not possible when the angle extension of Eq. (2) is used.
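The MSB-based generation of di and the Mode multiplexer described above can be illustrated with an integer model of the dual-mode circular CORDIC. This is a sketch under stated assumptions: 14 fractional bits, 16 iterations, and hypothetical names (`dual_mode_circular`, `mode_rm`). Python integers behave as unbounded two's-complement words, so the sign extension is implicit and overflow cannot occur; the bit-width reduction and control replication of Eq. (3) are not modelled.

```python
import math

FRAC = 14     # assumed number of fractional bits
N_ITER = 16   # assumed number of iterations
ATAN_LUT = [round(math.atan(2.0 ** -i) * (1 << FRAC)) for i in range(N_ITER)]

def msb(v):
    """Sign bit of a two's-complement word: 1 if the value is negative."""
    return 1 if v < 0 else 0

def dual_mode_circular(x, y, z, mode_rm):
    """Circular CORDIC in RM (mode_rm=True) or VM (mode_rm=False) on fixed-point ints."""
    for i in range(N_ITER):
        # Mode multiplexer: MSB(Z) in RM, NOT MSB(Y) in VM.
        # c = 1 encodes d_i = -1, c = 0 encodes d_i = +1.
        c = msb(z) if mode_rm else 1 - msb(y)
        if c:  # d_i = -1
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_LUT[i]
        else:  # d_i = +1
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_LUT[i]
    return x, y, z
```

The right shifts `>> i` are arithmetic shifts on negative values, matching the wired shifts between columns of the parallel architecture; only the control bit `c` depends on the mode, which is the property that makes the dual-mode version cheap.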

When using angle extension, the extra pre-operation implies, in terms of area, an extra iteration, i.e. M = N. Thus, the new equation for calculating the area of the dual-mode circular CORDIC is:

$$
N_{\text{slices}} = \frac{M\,(2L_{XY} + L_Z + 7)}{2}, \qquad M = N - 1 \qquad (4)
$$

3.2. Linear Coordinates (m=0)

The architecture for linear coordinates is quite similar to that for circular coordinates. The main differences are the signal αi and the data-path X (see Eq. (1)). For linear coordinates Xi = X0, i.e. data-path X does not perform any operation, except registering the data in each iteration due to the pipeline architecture. Figure 5 shows the parallel hardware architecture of the CORDIC algorithm for linear coordinates. In this architecture there is no need to replicate the control hardware for data-path X, which, when implementing the hardware, translates into a saving of half a slice per iteration. Bearing in mind that the control signal is the same as in the previous architecture for circular coordinates, and that the same optimizations can be performed in RM and VM for data-paths Y and Z, the equation that gives the total area, in slices, of the hardware implementation for linear coordinates, derived from Eq. (3), is:

$$
N_{\text{slices}} = \frac{M\,(2L_{XY} + L_Z + 3) - (2M - 1)}{2}, \qquad M = N - 1 \qquad (5)
$$

In the same way, Eq. (6) for the dual-mode linear CORDIC is derived from Eq. (4) for the dual-mode circular CORDIC:

$$
N_{\text{slices}} = \frac{M\,(2L_{XY} + L_Z + 5)}{2}, \qquad M = N - 1 \qquad (6)
$$

Figure 7. 2's Complement/buffer element implementation in a Xilinx FPGA device.

Figure 8. Details of the muxKs/add-sub element implementation in a Xilinx FPGA device.

4. Dual-Mode/Dual-Coordinate Implementation

In this section, a common dual-mode/dual-coordinate CORDIC, based on the dual-mode/single-coordinate CORDICs introduced previously, is presented. This implementation is optimized for Xilinx devices; nevertheless, it is also valid for any LUT-based FPGA with 4-input LUTs. In the following, each CORDIC data-path is explained separately, beginning with data-path X, which presents the most complexity.

The difficulty in developing a common architecture for data-path X is that there are no common elements between the linear and the circular dual-mode architectures.

As can be seen from Figs. 4 and 5, data-path X contains a register in the linear architecture and an add-sub circuit in the circular one. Therefore, a new macro cell that uses the same hardware as the previous architectures must be designed. To do this, the LUT logic of the add-sub has been rearranged, as shown in Fig. 6. This new macro is called add-sub/buffer.

Another issue appears when moving from the single-coordinate to the common architecture if the angle extension pre-operation is applied. Remember that for circular coordinates a sign extension is necessary for X0, which is not the case for linear coordinates; however, in the common implementation the sign extension affects both coordinates. Moreover, the angle extension pre-operation must affect only the circular coordinates: this operation (two's complement) is only needed in the case of circular coordinates, while in the common architecture the linear coordinates need an extra register operation (buffer) before the first iteration to keep compliance in the structure. The element that performs these operations is the 2's Complement/buffer, shown in Fig. 7.

The structure of data-path Y is the same in both architectures, so it is reproduced in the common CORDIC implementation. Once again, for the angle extension pre-operation we have to reuse the previously presented 2's Complement/buffer.

The operations performed within data-path Z involve the αi, which are treated as constants. From Eq. (1), the values of αi differ depending on the coordinates used. Hence, we first need to multiplex the two possible values and then execute either an addition or a subtraction. Taking advantage of the fact that the αi are constants and can be precalculated, a new macro called muxKs/add-sub has been designed (Fig. 8). To build this macro, the LUT logic of the add-sub has to be rearranged again, but in this case the MULT_AND element of the slice is also needed. In this macro, the values of the constants are inferred in the LUT logic. When the angle extension pre-operation is implemented, the α−1 that corresponds to linear coordinates is zero. As a consequence, the above-mentioned operations can be implemented with the same resources used by the single-coordinate architectures for data-path Z, now as a common architecture, as in the case of data-path X.

Figure 9. Architecture for the dual-mode/dual-coordinate CORDIC.
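What each of the three new macros computes can be summarized with a functional model of the common CORDIC. This integer sketch describes only the arithmetic behaviour of the add-sub/buffer, 2's Complement/buffer and muxKs/add-sub elements, not how they are mapped onto LUTs, carry chains or MULT_AND gates; the fixed-point format, the function names and the `circular`/`rm` flags are hypothetical.

```python
import math

FRAC = 14                 # assumed number of fractional bits
N_ITER = 16               # assumed number of iterations
ONE = 1 << FRAC
# Precalculated constants selected by the muxKs/add-sub macro
K_CIRC = [round(math.atan(2.0 ** -i) * ONE) for i in range(N_ITER)]
K_LIN = [round(2.0 ** -i * ONE) for i in range(N_ITER)]
HALF_PI = round(math.pi / 2 * ONE)  # alpha_{-1} in circular coordinates

def add_sub_buffer(x, y, i, d, circular):
    """Data-path X: add-sub in circular coordinates, plain register (buffer) in linear."""
    return x - d * (y >> i) if circular else x

def twos_complement_buffer(v, negate):
    """Angle-extension stage: conditional two's complement, or a buffer when negate is False."""
    return -v if negate else v

def muxks_add_sub(z, i, d, circular):
    """Data-path Z: select the precalculated alpha_i for the coordinate system, then add/subtract."""
    k = K_CIRC[i] if circular else K_LIN[i]
    return z - d * k

def common_cordic(x, y, z, circular, rm, extend=True):
    def direction(y_val, z_val):
        # d = sign(Z) in RM, -sign(Y) in VM, with sign(0) = +1
        return (1 if z_val >= 0 else -1) if rm else (-1 if y_val >= 0 else 1)

    if extend and circular:
        d = direction(y, z)
        # Eq. (2) through two 2's Complement/buffer stages:
        # X0 = -d*Y_{-1}, Y0 = d*X_{-1}, Z0 = Z_{-1} - d*pi/2
        x, y = twos_complement_buffer(y, d > 0), twos_complement_buffer(x, d < 0)
        z -= d * HALF_PI
    # In linear coordinates alpha_{-1} = 0, so the stage is only a buffer (identity here).

    for i in range(N_ITER):
        d = direction(y, z)
        x, y, z = (add_sub_buffer(x, y, i, d, circular),
                   y + d * (x >> i),          # data-path Y: identical in both systems
                   muxks_add_sub(z, i, d, circular))
    return x, y, z
```

With `circular=True` and the extension enabled, rotation angles beyond ±π/2 (for example 2.5 rad) converge; with `circular=False, rm=False` and Y0 = 1 the Z output approaches 1/X0.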

Figure 9 presents the resulting dual-mode/dual-coordinate CORDIC architecture proposed in this work. In this figure, as a matter of simplification, the new macros (2's Complement/buffer, muxKs/add-sub and add-sub/buffer) are represented by their equivalent blocks. In this way, the block diagram corresponds exactly to the hardware implementation. The resulting area equation for the common implementation is:

$$
N_{\text{slices}} = \frac{M\,(2L_{XY} + L_Z + 6)}{2}, \qquad M = N - 1 \qquad (7)
$$

Comparing Eq. (7) with Eq. (4), it is observed that the area required by the common CORDIC is essentially the same as that of the dual-mode circular CORDIC.

5. FPGA Implementation Results

Table 2 shows the resulting area and the maximum operating frequency of the implementations of all the architectures presented in Sections 3 and 4. The results are for pipelined parallel CORDIC architectures with 16-bit data-paths X, Y and Z and 16 iterations, which gives enough precision for an OFDM-WLAN receiver. The results show that, in the worst case, the area of the dual-mode/dual-coordinate CORDIC architecture is 32% larger than the area of a single-mode/single-coordinate CORDIC, and only 8% larger than that of the dual-mode/single-coordinate architectures.

Table 2. Area and frequency for CORDIC architectures implemented on a Virtex-4 device (xc4vsx55-12).

Architecture                           Area [slices]   Throughput [MHz]
Dual-mode/dual-coordinate              551             322
Dual-mode/circular coordinate          510             323
Dual-mode/linear coordinate            464             323
Rotation mode/circular coordinate      461             327
Vectoring mode/circular coordinate     461             328
Rotation mode/linear coordinate        417             329
Vectoring mode/linear coordinate       417             329

In order to show the advantages of the proposed mapping, the dual-mode/dual-coordinate CORDIC has been modelled in VHDL using three different design styles: a structural style using vendor primitives and placement information (relative placed macros, RPM); a register transfer level (RTL) description, where the proposed mapping is specified; and a behavioural description, where no information about the mapping is given. With the aim of obtaining generic results, these circuits have been synthesized for an Altera Stratix II and a Xilinx Virtex-4 device. Note that the structural design cannot be implemented in Altera devices, because of the use of Xilinx primitives in the VHDL code. Tables 3 and 4 show the results for the fully pipelined and non-pipelined versions, respectively.

Table 3 shows that an area saving (throughput increment) of 28% (64%) and 25% (33%) is obtained with the RPM and RTL versions, respectively, with respect to the behavioural one for the Virtex-4 device. It also shows that an area saving (throughput increment) of 16% (57%) is obtained with the RTL version with respect to the behavioural one for the Stratix II device.

Table 3. Implementation results of fully pipelined dual-mode/dual-coordinate CORDIC.

Design method   Xilinx Virtex-4 XC4VSX55-12   Altera Stratix II EP2S15-3
RPM             551 slices / 322 MHz          N/A
RTL             576 slices / 262 MHz          1036 ALUTs / 322 MHz
Behavioral      770 slices / 197 MHz          1227 ALUTs / 205 MHz

Table 4 shows that, for the non-pipelined version, an area saving (throughput increment) of 50% (29%) and 33% (23%) is obtained with the RPM and RTL versions, respectively, with respect to the behavioural one for the Virtex-4 device, and an area saving (throughput increment) of 15% (16%) is obtained with the RTL version with respect to the behavioural one for the Stratix II device.

Table 4. Implementation results of non-pipelined dual-mode/dual-coordinate CORDIC.

Design method   Xilinx Virtex-4 XC4VSX55-12   Altera Stratix II EP2S15-3
RPM             493 slices / 22 MHz           N/A
RTL             555 slices / 21 MHz           945 ALUTs / 21 MHz
Behavioral      741 slices / 17 MHz           1090 ALUTs / 18 MHz

Therefore, the proposed mapping improves the implementation of a dual-mode/dual-coordinate CORDIC in both technologies.
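The relative area figures quoted in this section can be recomputed directly from the table entries. The short script below is only an arithmetic check; the dictionary layout and helper names are illustrative, and "overhead" and "saving" are taken as area/reference − 1 and 1 − area/reference, respectively.

```python
# Area figures transcribed from Tables 2 and 3 (Virtex-4 slices, Stratix II ALUTs)
TABLE2_AREA = {
    "dual-mode/dual-coordinate": 551,
    "dual-mode/circular": 510,
    "dual-mode/linear": 464,
    "RM/circular": 461,
    "VM/circular": 461,
    "RM/linear": 417,
    "VM/linear": 417,
}
TABLE3_AREA = {  # fully pipelined: (design method, device) -> area
    ("RPM", "Virtex-4"): 551,
    ("RTL", "Virtex-4"): 576,
    ("Behavioral", "Virtex-4"): 770,
    ("RTL", "Stratix II"): 1036,
    ("Behavioral", "Stratix II"): 1227,
}

def overhead_pct(area, reference):
    """How much larger `area` is than `reference`, in percent."""
    return 100.0 * (area / reference - 1.0)

def saving_pct(area, reference):
    """Area saved by `area` with respect to `reference`, in percent."""
    return 100.0 * (1.0 - area / reference)

common = TABLE2_AREA["dual-mode/dual-coordinate"]
print(f"vs RM/linear:          +{overhead_pct(common, TABLE2_AREA['RM/linear']):.1f}%")
print(f"vs dual-mode/circular: +{overhead_pct(common, TABLE2_AREA['dual-mode/circular']):.1f}%")
for method, device in [("RPM", "Virtex-4"), ("RTL", "Virtex-4"), ("RTL", "Stratix II")]:
    s = saving_pct(TABLE3_AREA[(method, device)], TABLE3_AREA[("Behavioral", device)])
    print(f"{method} on {device}: {s:.1f}% area saving vs behavioral")
```

Rounded to whole percent, these values agree with the 32% and 8% overheads and the 28%, 25% and 16% area savings stated above.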

6. Conclusions

This paper has proposed an efficient mapping of a common CORDIC architecture for circular and linear coordinates and for rotation and vectoring modes that can be applied to OFDM-based WLAN systems. The architecture has been validated by means of its implementation on Xilinx and Altera FPGA devices. The results show that the area of the dual-mode/dual-coordinate CORDIC architecture is 32% larger than the area of a single-mode/single-coordinate CORDIC, and only 8% larger than that of the dual-mode/single-coordinate architectures. It is also shown that an improvement in area and throughput is achieved when the proposed mapping is applied.

Acknowledgements

This research was supported by FEDER, the Spanish Ministerio de Educación y Ciencia, under Grant No. TEC2005-08406-C03-01, and Generalitat Valenciana, under Grant No. GV06/114.

References

1. J. E. Volder, "The CORDIC Trigonometric Computing Technique," IRE Trans. Electron. Comput., vol. EC-8, no. 3, pp. 330–334, Sep. 1959.
2. Y. H. Hu, "CORDIC-Based VLSI Architectures for Digital Signal Processing," IEEE Signal Process. Mag., vol. 9, no. 3, pp. 16–35, Jul. 1992.
3. J. Valls, T. Sansaloni, A. Perez-Pascual, V. Torres and V. Almenar, "The Use of CORDIC in Software Defined Radios: A Tutorial," IEEE Commun. Mag., vol. 44, no. 9, pp. 46–50, Sep. 2006.
4. J. S. Walther, "A Unified Algorithm for Elementary Functions," in Proc. of the Spring Joint Computer Conference, Atlantic City, May 1971, pp. 379–.
5. R. van Nee and R. Prasad, OFDM for Wireless Multimedia Communications, Artech House Publishers, 2000.
6. R. Andraka, "A survey of CORDIC algorithms for FPGAs," in FPGA '98, Proceedings of the 1998 ACM/SIGDA Sixth International Symposium on Field Programmable Gate Arrays, Monterey, CA, Feb. 1998, pp. 191–200.
7. M. Kuhlmann and K. K. Parhi, "Evaluation of CORDIC Algorithms for FPGA Design," J. VLSI Signal Process., vol. 32, no. 3, pp. 207–222, Nov. 2002.
8. Xilinx, Inc., CORDIC v3.0, LogiCORE Product Specification, DS249, 2005.
9. Altera, CORDIC Reference Design v1.0, Application Note 263, 2004.
10. Actel, CoreCORDIC RTL Generator v2.0, DirectCore Product Summary, 2006.

Fabián Angarita (faanpre@doctor.upv…) received his telecommunication engineering degree from the Pontificia Universidad Javeriana, Colombia, in 2001. He is currently a Ph.D. student in electronics engineering at the Universidad Politécnica de Valencia, Spain. His current research interests include the design of FPGA-based systems, computer arithmetic, VLSI signal processing, and digital communications systems.

Mª José Canet (macasu@eln.upv…) received her telecommunication engineering degree from the Universidad Politécnica de Valencia in 2001. She is currently a Ph.D. student in electronics engineering at the Universidad Politécnica de Valencia, Spain. Her current research interests include the design of FPGA-based systems, computer arithmetic, signal processing, and digital communications.

Trini Sansaloni received her telecommunication engineering and Ph.D. (telecommunication engineering) degrees from the Universidad Politécnica de Valencia in 1994 and 2001, respectively. She is currently an Associate Professor in the Department of Electronics at Universidad Politécnica de Valencia. Her current research interests include the design of FPGA-based systems and VLSI signal processing.

Asun Pérez-Pascual received her telecommunication engineering and Ph.D. (telecommunication engineering) degrees from the Universidad Politécnica de Valencia, Spain, in 1997 and 2002, respectively. She has been an Associate Professor in the Department of Electronics at Universidad Politécnica de Valencia since 2002. Her current research interests include the design of FPGA-based systems, computer arithmetic, VLSI signal processing, and digital communications.

Javier Valls (jvalls@eln.upv…) received his telecommunication engineering degree from the Universidad Politécnica de Cataluña, Spain, and his Ph.D. degree in telecommunication engineering from the Universidad Politécnica de Valencia, Spain, in 1993 and 1999, respectively. He has been an Associate Professor in the Department of Electronics at Universidad Politécnica de Valencia since 1996. His current research interests include the design of FPGA-based systems, computer arithmetic, VLSI signal processing, and digital communications.