Jun CHEN, Rong LUO, Huazhong YANG and Hui WANG Department of Electronic Engineering, Tsinghua University, Beijing, 100084 China
In this paper, a low-power ROM-less direct digital frequency synthesizer (DDFS) is presented. A preset value pipelined accumulator (PVPA) is proposed, achieving update rates in excess of 500 MHz by careful choice of the 12-7-7-6 four-stage pipelined architecture. Power dissipation is reduced by removing redundant registers, and no phase latency is introduced when switching frequency. The phase to sine amplitude converter is made up entirely of combinational logic without ROM; a modified Sunderland approximation and a power-gating technique are used to reduce its area and power, respectively. Moreover, a 2MSB truncated phase is introduced to the one-quadrant phase to sine amplitude converter to improve the spurious free dynamic range (SFDR) by 10 dB. The design was implemented in a 0.18 µm CMOS technology. It occupies a core area of 0.04 mm² and dissipates 17.2 mW at a 1.8 V supply voltage and a 500 MHz clock.
Index Terms - Direct digital frequency synthesizer (DDFS), preset value pipelined accumulator (PVPA), ROM-less look up table, low power
2.1 Direct digital frequency synthesizer
The digital frequency synthesizer was originally presented by Tierney, Rader, and Gold in 1971. In general, a DDFS consists of a phase accumulator, a phase to sine-amplitude converter (PSAC), a digital-to-analog converter (DAC), and a low-pass filter (LPF), as shown in Fig. 1. The synthesizer has two inputs: a clock reference fclk and a frequency control word FCW. The phase accumulator integrates the value of the FCW on every clock cycle, producing a ramp whose slope is directly proportional to FCW. This gives the frequency of the output sine-wave as
$f_{out} = \dfrac{f_{clk} \cdot FCW}{2^{M}}$
where M is the width of phase accumulator. An approximation to the sinusoid amplitude of the equivalent angles is produced by the PSAC.
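As a quick numerical illustration of the tuning relation above, the following sketch evaluates it for the 32-bit accumulator and 500 MHz clock quoted in this paper. The helper names `output_frequency` and `fcw_for` and the 10 MHz target are assumptions made only for this example.

```python
# Minimal sketch of the DDFS tuning equation f_out = f_clk * FCW / 2**M.
# M = 32 and f_clk = 500 MHz come from the paper; the function names and
# the 10 MHz target frequency are hypothetical, chosen for illustration.

def output_frequency(f_clk, fcw, m):
    """Output frequency for a given frequency control word."""
    return f_clk * fcw / 2 ** m


def fcw_for(f_target, f_clk, m):
    """Nearest integer control word for a requested output frequency."""
    return round(f_target * 2 ** m / f_clk)


F_CLK = 500e6   # clock reference in Hz
M = 32          # phase accumulator width in bits

# The frequency resolution is the output frequency for FCW = 1.
resolution = output_frequency(F_CLK, 1, M)

# Control word and achieved frequency for a 10 MHz request.
fcw = fcw_for(10e6, F_CLK, M)
f_out = output_frequency(F_CLK, fcw, M)

print(f"resolution = {resolution:.4f} Hz")
print(f"FCW = {fcw}, f_out = {f_out:.4f} Hz")
```

With these values the step size is a fraction of a hertz, which is the sub-Hertz resolution mentioned in the introduction; the achieved frequency differs from the request by at most half a step.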
[Figure 1 block diagram: FCW, phase accumulator, phase to amplitude converter, DAC/LPF]
Direct digital frequency synthesizers (DDFS) generate sine (cosine) outputs with the advantages of fast settling time, sub-Hertz frequency resolution, large bandwidth, continuous-phase switching response, and low phase noise. These advantages have made the technology popular in spread-spectrum communication, radar systems, test instrumentation, and electronic warfare. In this paper, a new low-power ROM-less DDFS using a preset value pipelined accumulator (PVPA) is proposed. The PVPA reduces the power consumption and increases the throughput of the accumulator without introducing phase latency. Also, power gating is used in the ROM-less phase-to-amplitude converter to reduce the dynamic power. In Section 2, a conventional DDFS and a pipelined accumulator are introduced. In Section 3, we present the PVPA and DDFS design. In Section 4, simulation results are given. The conclusion is given in Section 5.
Figure 1. Conventional DDFS architecture
In order to improve the frequency resolution, a wide phase accumulator is usually used and the output of the phase accumulator is truncated. The PSAC is usually implemented as a ROM lookup table (LUT). The LUT is a power-hungry circuit, and the power consumption of the DDFS is dominated by the ROM required in the phase-to-amplitude conversion. A large ROM LUT is also the bottleneck for the highest clock frequency of the DDFS. Therefore, many ROM-less architectures and ROM compression algorithms have been proposed to lower power consumption and to improve clock frequency.
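The conventional ROM-based arrangement described above can be sketched in a few lines. This is a behavioural model only, assuming a full-wave sine table and the word lengths quoted later in the paper (32-bit accumulator, 12-bit truncated phase, 10-bit amplitude); the names `ROM`, `ROM_BITS` and `ddfs_samples` are hypothetical.

```python
import math

# Behavioural sketch of a conventional ROM-LUT DDFS (not the proposed design).
M = 32   # accumulator width in bits
P = 12   # truncated phase width in bits
D = 10   # amplitude width in bits

FULL_SCALE = 2 ** (D - 1) - 1

# Full-wave sine ROM: one D-bit word per truncated phase value.
ROM = [round(FULL_SCALE * math.sin(2 * math.pi * k / 2 ** P))
       for k in range(2 ** P)]
ROM_BITS = len(ROM) * D   # storage the LUT would need, in bits


def ddfs_samples(fcw, n):
    """Return n output samples for a fixed frequency control word."""
    acc = 0
    out = []
    for _ in range(n):
        out.append(ROM[acc >> (M - P)])      # truncate phase to P MSBs
        acc = (acc + fcw) & (2 ** M - 1)     # accumulate modulo 2**M
    return out


# FCW = 2**M / 8 gives one sine period every 8 clock cycles.
samples = ddfs_samples(2 ** M // 8, 16)
print(samples)
print("ROM size in bits:", ROM_BITS)
```

Even this modest table needs tens of kilobits of storage, which is the cost that ROM-less and ROM-compression schemes aim to avoid.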
2.2 Architecture of pipelined accumulator
A wide phase accumulator is often used in a DDFS to obtain fine frequency resolution at a high clock frequency, but a wide accumulator cannot finish one addition within a short clock period because of the delay caused by the carry bits propagating through the adder. A conventional solution is to pipeline the phase accumulator as m stages of L bits each, such that m × L = M, as shown in Fig. 2. Each adder generates an (L+1)-bit output: L sum bits and one carry output bit. The carry output is latched between successive adders. Every new frequency input word is moved into the pipeline circuits consisting of D-flip-flops (DFF) and delay elements. The speed of the accumulator based on this architecture can be increased up to m times.
Figure 2. A conventional pipelined accumulator
However, the pipeline circuit requires considerable area and power and introduces more frequency switching latency. At the same time, increasing the number of pipelined blocks would increase the loading of the clock network.
This project was sponsored in part by NSFC under grant
Proceedings of the 19th International Conference on VLSI Design (VLSID’06)
1063-9667/06 $20.00 © 2006 IEEE
3. Proposed DDFS Design
3.1 Proposed preset value pipelined accumulator
Considering the pipeline circuits used in the pipelined accumulator shown in Fig. 2, we find that the values stored in the DFFs of the same row are equal when the frequency is not being switched, so the last column of DFFs is enough for the pipeline circuits. We assume that the values of the sum and carry have been initialized at frequency switching, so that only the last column of DFFs is needed to store the input frequency control word FCW. At the time of frequency switching, suppose FCW is changed from φi to φs. Then
$\Delta\phi_i = \Delta\phi_{i0} + i \cdot \Delta\phi_{si} + \sum cin_{i+1}$ (2)
From the above, $cin_i$, $\sum cin_i$ and $\Delta\phi_i$ can be found from $\Delta\phi_{i0}$, $\Delta\phi_{si}$ and $cin_{i+1}$:
$cin_i = f_1(\Delta\phi_{i0}, \Delta\phi_{si}, cin_{i+1})$ (3)
$\sum cin_i = f_2(\Delta\phi_{i0}, \Delta\phi_{si}, cin_{i+1})$ (4)
We then obtain the values of $\Delta\phi_i$ and $cin_i$ from (3) and (4) to initialize the pipelined accumulator when the frequency is switched.
To reduce the PSAC complexity, the output of the phase accumulator is usually truncated and only some MSBs are used as the input to the PSAC. An accumulator can therefore be split into two parts, an MSBs phase accumulator and an LSBs carry generator, as shown in Fig. 3. If we make the MSBs accumulator one stage of the pipelined accumulator, no clock latency is introduced in the truncated phase output. Based on this idea, we propose a new 12-7-7-6 four-stage pipelined accumulator. In our design, a 32-bit pipelined accumulator is divided into four stages of 12, 7, 7 and 6 bits; the length of the first stage is equal to the length of the truncated output phase to reduce the phase output latency. For the four stages, (2) and (3) become
$\Delta\phi_{out} = \Delta\phi_0 + \sum cin_1$
$\Delta\phi_1 = \Delta\phi_{10} + \Delta\phi_{s1} + \sum cin_2$
$\Delta\phi_2 = \Delta\phi_{20} + 2 \cdot \Delta\phi_{s2} + \sum cin_3$
$\Delta\phi_3 = \Delta\phi_{30} + 3 \cdot \Delta\phi_{s3}$
$cin_1 = f_1(\Delta\phi_{10}, \Delta\phi_{s1}, \sum cin_2)$
$cin_2 = f_1(\Delta\phi_{20}, \Delta\phi_{s2}, \sum cin_3)$
$cin_3 = f_1(\Delta\phi_{30}, \Delta\phi_{s3})$
As it is difficult for the preset logic to produce the exact values of $\Delta\phi_i$ and $cin_i$ in a short clock cycle, we approximate $\Delta\phi_i$ and $cin_i$ as follows. From Fig. 3 we can easily see that $\Delta\phi_{10}$, $\Delta\phi_{20}$, and $\Delta\phi_{30}$ have little impact on the truncated phase output. To lower the complexity of the preset logic block, we simply set $\Delta\phi_{10}$, $\Delta\phi_{20}$, and $\Delta\phi_{30}$ all to zero. A constant error is then introduced which is smaller than one LSB of the truncated phase output; because the error is constant, it has no influence on the output frequency. At the same time we find that $\sum cin_3$ equals the MSB of $\Delta\phi_{s3}$ and $\sum cin_2$ equals zero, so that
$\Delta\phi_{out} = \Delta\phi_0$
$\Delta\phi_1 = \Delta\phi_{s1}$
$\Delta\phi_2 = 2 \cdot \Delta\phi_{s2} + \mathrm{MSB}(\Delta\phi_{s3})$
$\Delta\phi_3 = 3 \cdot \Delta\phi_{s3} = 2 \cdot \Delta\phi_{s3} + \Delta\phi_{s3}$ (5)
$cin_1 = f_1(\Delta\phi_{s1})$
$cin_2 = f_1(\Delta\phi_{s2}, \Delta\phi_{s3})$
$cin_3 = f_1(\Delta\phi_{s3})$ (6)
A 32-bit PVPA is implemented as shown in Fig. 4. The PVPA is made up of a 12-bit MSB accumulator and 3 preset value accumulators. The 3 preset value accumulators operate as the LSBs carry generator, as shown in Fig. 4. The detailed structure of a preset value accumulator is given for preset value accumulator 3 in Fig. 4. The preset logic block in the preset value accumulator is used to realize formulas (5) and (6) above. When the FCW is changed, the sum and carry DFFs are initialized by the preset logic blocks, except for stage one, which is controlled by Fce. The speed of the proposed PVPA is limited by the first stage, and a carry select adder is used there to achieve high speed. The maximum width of the PVPA is 4 times the width of the first stage.
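The claim in Section 3.1 that zeroing the lower-stage contents at a frequency switch produces only a constant phase error below one LSB of the truncated output can be checked with a small model. The sketch below is a flat (non-pipelined) behavioural approximation, not the authors' circuit: it simply clears the 20 bits held by the 7-, 7- and 6-bit stages at the switching instant. The starting phase, the new FCW and the helper name `run` are hypothetical values chosen for illustration.

```python
# Flat behavioural check of the zero-preset approximation (assumed model).
M = 32                      # accumulator width
P = 12                      # truncated phase width (first stage)
LOW_BITS = M - P            # 20 bits held by the 7-, 7- and 6-bit stages
MASK = 2 ** M - 1
LOW_MASK = 2 ** LOW_BITS - 1


def run(acc, fcw, n):
    """Accumulate fcw n times starting from acc; return every state."""
    states = []
    for _ in range(n):
        acc = (acc + fcw) & MASK
        states.append(acc)
    return states


old_phase = 0x12345678      # hypothetical accumulator state at the switch
new_fcw = 0x0ABCDEF1        # hypothetical new frequency control word

exact = run(old_phase, new_fcw, 1000)                # keeps the low bits
preset = run(old_phase & ~LOW_MASK & MASK, new_fcw, 1000)  # low bits zeroed

# Full-resolution phase error, modulo 2**M, at every clock cycle.
errors = {(e - p) & MASK for e, p in zip(exact, preset)}

# Difference seen at the 12-bit truncated output, modulo 2**P.
trunc_diffs = {((e >> LOW_BITS) - (p >> LOW_BITS)) & (2 ** P - 1)
               for e, p in zip(exact, preset)}

print("distinct phase errors:", errors)
print("truncated output differences:", trunc_diffs)
```

In this model the error set contains a single value smaller than 2**20, so the two accumulators advance at the same rate and differ only by a fixed phase offset, which is why the output frequency is unaffected.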
Figure 3. Two block phase accumulator architecture
Figure 4. Preset value pipelined accumulator
The phase error introduced by the preset logic can be reduced as
$cin_1 = f_1(\Delta\phi_{s1}, \phi_m)$
$\Delta\phi_1 = \Delta\phi_{s1} + \phi_m$ (7)
where $\phi_m$ in (7) can be some MSBs of $\Delta\phi_{10}$ or can equal $\Delta\phi_{10}$.
3.2 ROM-less phase to amplitude converter
Sunderland et al. split the phase word into three bit slices, A, B, and C. When C is small enough, an approximation for the sine of the sum of three angles is made as (8).
$\sin(\phi) = \sin(A+B+C) \approx \sin(A+B) + \cos A \sin C$ (8)
Figure 5. Sunderland architecture
Nicholas et al. proposed that minimizing the mean-square error provides the lowest total spur energy, and minimizing the maximum absolute error tends to reduce the value of the greatest spurs.
We propose a new ROM-less PSAC based on equation (8), as shown in Fig. 6. The most significant two phase bits are used to decode the quadrant, while the remaining 10 bits are used for the one-quadrant phase to sine amplitude converter. The remaining 10 phase bits are divided into three bit slices, A, B, and C, having 3, 4, and 3 bits, respectively. Equation (8) can now be rewritten as follows:
$Amp_c = \sin(a+b) = \sum_{i} s_i(a)\, c_i(b), \quad i = (1, 2, \ldots, 8)$ (9)
$Amp_f = \cos a \sin c \approx \sum_{i} s_i(a)\, f_i(c), \quad i = (1, 2, \ldots, 8)$ (10)
Figure 6. Detailed block diagram of proposed phase to sine amplitude converter
A is used to generate the signals $s_i(a)$ that control two blocks: the Input Latch and the MUX. The Input Latch block is used as a power gating block to hold or pass the data of B, C, and 2MSB to one of eight phase to amplitude converter logic blocks. In every clock cycle, B, C, and 2MSB are used as inputs of one phase to amplitude converter logic block only; the inputs of the other phase to amplitude converter logic blocks are unchanged, so no dynamic power is consumed in them. The MUX block is used to select the coarse amplitude and fine amplitude as the inputs of a 9-bit adder.
To improve the SFDR performance, 2MSB is introduced to the one-quadrant phase to sine amplitude converter to reduce the error resulting from the 1's complement approximation. 2MSB, which is used to determine whether the sine amplitude is increasing or decreasing, and C together generate the fine amplitude. Equation (10) is rewritten as (11).
$Amp_f = \cos a \sin c \approx \sum_{i} s_i(a)\, f_i(c, 2MSB), \quad i = (1, 2, \ldots, 8)$ (11)
A phase to amplitude converter logic is made up of a coarse logic block and a fine logic block, shown in Fig. 7. The coarse logic block is used to realize $c_i(b)$ in (9), and the fine logic block is used to realize $f_i(c, 2MSB)$ in (11). In both the coarse logic block and the fine logic block, only logic circuits are used.
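To give a feel for the accuracy of approximation (8) with the 3/4/3-bit split of Section 3.2, the sketch below evaluates it in floating point over all 1024 one-quadrant phase codes. It covers only the basic Sunderland form, not the modified version with 2MSB, and it ignores amplitude quantization; the table and function names (`COARSE`, `FINE`, `to_rad`, `split`, `sunderland`) are hypothetical.

```python
import math

# Floating-point sketch of sin(A+B+C) ~ sin(A+B) + cos(A)*sin(C)
# for a 10-bit one-quadrant phase split into A (3b), B (4b), C (3b).
A_BITS, B_BITS, C_BITS = 3, 4, 3
N = A_BITS + B_BITS + C_BITS


def to_rad(code):
    """Map a 10-bit one-quadrant phase code to an angle in [0, pi/2)."""
    return (math.pi / 2) * code / 2 ** N


def split(code):
    """Split a phase code into its A, B and C bit slices."""
    a = code >> (B_BITS + C_BITS)
    b = (code >> C_BITS) & (2 ** B_BITS - 1)
    c = code & (2 ** C_BITS - 1)
    return a, b, c


# One coarse and one fine table per value of A, mirroring the eight
# converter logic blocks selected by s_i(a) in (9) and (10).
COARSE = [[math.sin(to_rad((a << (B_BITS + C_BITS)) | (b << C_BITS)))
           for b in range(2 ** B_BITS)] for a in range(2 ** A_BITS)]
FINE = [[math.cos(to_rad(a << (B_BITS + C_BITS))) * math.sin(to_rad(c))
         for c in range(2 ** C_BITS)] for a in range(2 ** A_BITS)]


def sunderland(code):
    """Approximate sine: only the block selected by A contributes."""
    a, b, c = split(code)
    return COARSE[a][b] + FINE[a][c]


max_err = max(abs(math.sin(to_rad(code)) - sunderland(code))
              for code in range(2 ** N))
print(f"maximum absolute error: {max_err:.6f}")
```

The worst-case error of this unquantized model is on the order of one part in 500 of full scale, which indicates why refinements to the fine term, such as the 2MSB input described above, can matter for spur levels.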
Figure 7. Phase to amplitude converter logic
To improve the highest clock frequency, the PSAC block uses a 2-stage pipelined topology.
4. Simulation and Comparison
Two DDFS architectures were simulated in Matlab. Both have a 32-bit-wide accumulator with a 12-bit truncated phase output and a 10-bit output to the DAC. The architecture introducing 2MSB for the one-quadrant phase to sine amplitude converter results in a 10 dB improvement in SFDR, as shown in Fig. 8.
Figure 8. (a) Output spectrum based on Sunderland approximation. (b) Output spectrum based on modified Sunderland approximation.
A DDFS based on the architecture proposed in the previous section was designed in Verilog HDL and synthesized using a SMIC 0.18 µm CMOS library. Post-synthesis simulation results, using worst-case library parameters, indicate that the design can be operated at a clock frequency of 500 MHz. The core cells occupy 0.04 mm² and the total power dissipation is approximately 17.2 mW. Table 1 shows the performance comparisons with recent publications. It shows that the design based on the proposed architecture has the lowest power and achieves high speed with only 2 pipeline levels.
Table 1 Performance comparisons
(Values for the three compared designs are listed in ascending order per row; their column alignment is not preserved in this copy.)
Parameter                      Ours            Compared designs
FCW                            32 bit          11, 24 bit, 32 bit
Truncated phase                12 bit          11, 13 bit, 14 bit
Amplitude output               10 bit          11, 12 bit, 12 bit
SFDR (dBc)                     70              58, 80, 84
Area (mm²)                     0.040           0.075, 0.090, 0.150
Max. clock (MHz)               500             30, 150, 480
Power dissipation (µW/MHz)     34.4            72, 290, 500
Process (µm)                   0.18            0.18, 0.25, 0.35
Supply voltage (V)             1.8             1.8, 2.5, 3.3
Pipeline levels (cycle)        2               1, 4, 6
Note                           Single output   Quadrature output (all three)
5. Conclusion
A new ROM-less low-power DDFS has been proposed. A preset value method is used to improve the operating speed of the 32-bit, 4-stage pipelined phase accumulator up to 500 MHz, and no clock latency is introduced. It uses a new mapping technique for the sine function based on a ROM-less lookup table, resulting in a small area. A power gating method is used to lower the power consumption of the PSAC. Simulation results show that the average power is 17.2 mW at 500 MHz with a 70 dBc SFDR in SMIC 0.18 µm CMOS technology. This shows that the proposed DDFS is suitable as an IP core in low-power applications.
References
J. Tierney, C. M. Rader, and B. Gold, “A Digital Frequency Synthesizer”, IEEE Trans. Audio Electroacoust., vol. AU-19, pp. 48–57, Mar. 1971.
J. Vankka and K. Halonen, Direct Digital Synthesizers: Theory, Design and Applications. Boston, MA: Kluwer, 2001.
A. Bellaouar, M. S. O’brecht, A. M. Fahim, and M. I. Elmasry, “Low power direct digital frequency synthesis for wireless communications”, IEEE J. Solid-State Circuits, vol. 35, pp. 385–390, Mar. 2000.
J. M. P. Langlois and D. Al-Khalili, “A low power direct digital frequency synthesizers in 0.18 µm CMOS”, IEEE CICC’03, pp. 283–286.
D. De Caro, E. Napoli, and A. G. M. Strollo, “High speed Direct Digital Frequency Synthesizers in 0.25 µm CMOS”, IEEE CICC’04, pp. 163–166.
Jian Dong Jiang and Edward K. F. Lee, “A low-power segmented nonlinear DAC-based direct digital frequency synthesizer”, IEEE J. Solid-State Circuits, vol. 37, pp. 1326–1330, Oct. 2002.
K. I. Palomaki and J. Niittylahti, “A low-power, memoryless direct digital frequency synthesizer architecture”, ISCAS’03, pp. II77–II80.
D. A. Sunderland, R. A. Strauch, S. S. Wharfield, H. T. Peterson, and C. R. Cole, “CMOS/SOS frequency synthesizer LSI circuit for spread spectrum communications”, IEEE J. Solid-State Circuits, vol. SC-19, pp. 497–505, Aug. 1984.
H. T. Nicholas, H. Samueli, and B. Kim, “The Optimization of Direct Digital Frequency Synthesizer Performance in the Presence of Finite Word Length Effects”, in Proc. 42nd Annu. Frequency Contr. Symp., June 1988, pp. 357–363.
Li Jincheng, Yang Huazhong, “A New Architecture of Twiddle Factor Generator for Radix-2 1024-Point FFT”, Chinese Journal of Semiconductors, vol. 25, pp. 377–382, Apr. 2004.