

S. Soliman, F. Yuan, and K. Raahemifar

Department of Electrical and Computer Engineering, Ryerson University, Toronto, Ontario, Canada M5B 2K3

ABSTRACT

An overview of the recent developments in the design techniques of CMOS phase detectors and an in-depth examination of the advantages and limitations of these techniques are presented. Both linear and nonlinear phase detectors are examined. Critical design issues, such as the sampling mechanism, the lock condition, sensitivity to the input data pattern, and reliability, are investigated in detail.



1. INTRODUCTION

Clock and data recovery (CDR) circuits are used extensively in digital systems to extract timing information from data, to reduce clock jitter, and to suppress clock skew. A vital building block of a CDR circuit is the phase detector (PD). The main function of the PD is to detect and amplify the phase/frequency difference between the input signal and the output of the local oscillator (LO) that the CDR circuit employs to recover the clock and data. The performance of a CDR circuit depends critically on the characteristics of its PD. Recently, significant effort has been devoted to the design of high-speed, low-power CMOS PDs for telecommunications and digital systems, and many novel configurations and design techniques have emerged. An overview and in-depth examination of these techniques, however, is not available. In this paper, we investigate the advantages and limitations of these emerging techniques for CMOS PDs.

Phase detectors can be generally categorized into linear and nonlinear PDs. Linear PDs [2] compare the input signal with a reference signal, then detect and amplify their phase difference. Because the phase error of the inputs ideally drops to zero in the lock condition, linear PDs do not generate significant activity on the control line of the voltage-controlled oscillator (VCO), and the output jitter is therefore smaller. Nonlinear PDs [6] use the VCO output(s) to sample the input signal. Since the output of the PD does not drop to zero in the lock condition, significant activity is generated on the VCO control line, and the output jitter is therefore larger.

Fig. 1. Gilbert cell PD.

2.1. Analog Multiplier PD

A typical implementation of combinational PDs is the Gilbert cell shown in Fig. 1 [1]. When signals of small amplitude are applied to the input ports of the cell, it behaves as an analog multiplier. If the phase error of the inputs is in the vicinity of 90°, the average value of the output varies approximately linearly with the phase error. An advantage of the analog-multiplier PD is its high operating speed compared with other implementations. However, it suffers from high static power consumption, its gain depends on the amplitude of the inputs, and it cannot detect the frequency error of the inputs.
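The behavior of the multiplier near 90° can be illustrated with a minimal numerical sketch. It assumes ideal sinusoidal inputs of equal frequency and an ideal multiplier; the function name `multiplier_pd_average` and its parameters are hypothetical and are not taken from [1]. Under these assumptions the average output equals (A·B/2)·cos(φ), which crosses zero at φ = 90° and scales with the input amplitudes.

```python
import math


def multiplier_pd_average(phase_error_deg, amp_a=1.0, amp_b=1.0, samples=10_000):
    """Average the product of two equal-frequency sinusoids over one period.

    Idealized model (assumption): pure sinusoids and a perfect multiplier.
    """
    phi = math.radians(phase_error_deg)
    total = 0.0
    for k in range(samples):
        theta = 2.0 * math.pi * k / samples
        total += amp_a * math.sin(theta) * amp_b * math.sin(theta - phi)
    return total / samples


if __name__ == "__main__":
    # Sweep the phase error around 90 degrees to see the near-linear region.
    for phase in (70, 80, 90, 100, 110):
        print(phase, multiplier_pd_average(phase))
    # Doubling both amplitudes changes the gain, as noted in the text.
    print(multiplier_pd_average(80, amp_a=2.0, amp_b=2.0))
```

The sweep shows the sign change at 90° and the amplitude dependence of the gain; it does not model the static bias current or the limited frequency-detection capability of the cell.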

2.2. XOR PD
If the amplitude of the inputs to the Gilbert cell is large, the cell behaves as an XOR gate. As the phase error of the inputs deviates from 90°, the output duty cycle departs from 50%, resulting in a dc output that is proportional to the phase error. The advantage of XOR PDs is the improved acquisition range (0° to 180°), as shown in Fig. 1. Their drawback is the dependence of the dc output on the duty cycle of the inputs.
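The duty-cycle argument can be checked with a small behavioral sketch. It assumes ideal square waves of equal frequency and an ideal XOR gate; the helper names `square_wave` and `xor_pd_duty` and the `duty_a`/`duty_b` parameters are hypothetical. The returned value is the fraction of the period during which the XOR output is high, which is proportional to the dc output.

```python
def square_wave(phase_deg, duty=0.5):
    """Return 1 during the first `duty` fraction of each 360-degree period."""
    return 1 if (phase_deg % 360.0) < 360.0 * duty else 0


def xor_pd_duty(phase_error_deg, duty_a=0.5, duty_b=0.5, steps=3600):
    """Fraction of one period during which A XOR B is high.

    Idealized model (assumption): zero rise/fall times and no gate delay.
    """
    high = 0
    for k in range(steps):
        t = 360.0 * k / steps
        a = square_wave(t, duty_a)
        b = square_wave(t - phase_error_deg, duty_b)
        high += a ^ b
    return high / steps


if __name__ == "__main__":
    # With 50% duty-cycle inputs the output rises steadily from 0 to 180 degrees.
    for phase in (0, 45, 90, 135, 180):
        print(phase, xor_pd_duty(phase))
    # A non-50% input duty cycle shifts the output at the same phase error.
    print(xor_pd_duty(90, duty_a=0.4))
```

Running the sweep with 50% duty-cycle inputs gives an output that grows with the phase error over the 0° to 180° range, while the last call illustrates the duty-cycle dependence named as the drawback.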


Based on their characteristics, phase detectors can be broadly categorized into (i) Combinational PDs, (ii) XOR PDs, and (iii) Edge-triggered PDs. In what follows, we investigate various implementations of these PDs and examine the advantages and limitations of these implementations.

2.3. R-S Latch PD

An edge-triggered PD can be implemented using an R-S latch, as shown in Fig. 2 [1]. Its differential output changes sign every time a rising edge at one input is followed by a rising edge at the other. The advantages of the R-S latch PD are the independence of the average value of its output from the duty cycle of the inputs and the improved acquisition range (0° to 360°), as shown in Fig. 2. The drawbacks include: (1) CDR circuits employing an R-S latch PD cannot perform frequency synthesis because the frequency of the PD output is the same as that of the inputs, (2) CDR circuits may lock to a higher harmonic of the input because the PD generates a nonzero dc output if the frequency of one of its inputs is an integer multiple of that of the other, and (3) output jitter exists due to the metastability that occurs in the lock condition.

Fig. 2. R-S latch PD.

0-7803-7448-7/02/$17.00 ©2002 IEEE

2.4. D-Flipflop PFD

To detect both the phase and frequency differences of incoming signals, phase-frequency detectors (PFDs) are needed. The block diagram of a D-flipflop PFD is shown in Fig. 3 [1]. It employs two edge-triggered, resettable D flip-flops with their D inputs connected to logic HIGH. Signals A and B act as the clock inputs of the two flip-flops. If QA = QB = 0, a LOW-to-HIGH transition on A causes QA to go HIGH. Subsequent transitions on A have no effect on QA. When B goes from LOW to HIGH, the AND gate activates the RESET of both flip-flops, resetting both QA and QB. The outputs QA and QB are called the UP and DOWN signals. The advantages of this PFD are the improved acquisition range and lock speed, as it detects both the frequency and phase errors of the inputs. As shown in Fig. 3, the PFD has a constant gain over the phase error range ±2π. The D-flipflop PFD suffers from a number of drawbacks: (1) when the delay between the UP and DOWN signals becomes comparable to their switching delay in the vicinity of the lock condition, a dead zone is generated, causing the output to jitter, (2) output jitter exists due to the metastability that occurs in the lock condition, and (3) it is sensitive to input data patterns.

Fig. 3. D flip-flop PFD.

2.5. Two-XOR PFD

The block diagram of the CDR circuit employing a two-XOR PFD is shown in Fig. 4 [3]. The VCO generates eight differential clock signals, CLK1 through CLK4, each spaced by 45°. The PFD generates two signals: DATAlead by XORing CLK1 and CLK3, and DATAlag by XORing CLK1 and the input data. DATAlead is periodic. The opposite applies to DATAlag. DATAlead acts as the charge-down signal that increases the VCO output frequency, while DATAlag acts as the charge-up signal that reduces the VCO output frequency. The waveform of the detected signal is shown in Fig. 4. A lock condition is achieved if the input data is phase-aligned with CLK3. DATAlead is considered as a reference signal and DATAlag as an error signal. By comparing these two signals, phase and frequency errors can be detected. As an example of phase correction, when the input data leads CLK3, the high state of DATAlead is longer than that of DATAlag. The surplus part of DATAlead reduces the charge in the loop filter and shortens the clock period. As an example of frequency correction, when the input data frequency is smaller than that of CLK3, the surplus part of DATAlag reduces the charge in the loop filter, thus speeding up the clock. The advantages of this technique include simple implementation, a large acquisition range, and high lock speed, as it detects both phase and frequency errors of the input signals. This type of PFD, however, is sensitive to the input data pattern and to the metastability that occurs in the lock condition.

Fig. 4. Two-XOR PFD.

2.6. Sample-and-Hold PD

The schematic of a sample-and-hold (S/H) PD is shown in Fig. 5 [4]. It is realized as a master-slave S/H circuit (an analog D flip-flop). Each rising transition of Din samples the VCO output. The circuit generates an output that is linearly proportional to the phase error of the inputs. Both the master and slave stages are realized using a differential pair whose tail current and load devices turn off simultaneously, thereby storing the instantaneous value of VVCO on the parasitic capacitances Cp1 through Cp4. The geometry of transistors MT, M3, and M4 is chosen such that when MT is on, M3 and M4 are forced into the triode region, eliminating the need for common-mode feedback. The control of the S/H circuit is implemented using PMOS transistors to allow operation from a low supply voltage. The behavior of the PD in the vicinity of lock can be seen from its characteristics in Fig. 5. The advantages of S/H PDs are low power consumption and low activity on the VCO control line in the lock condition. S/H PDs are, however, sensitive to the input data pattern. Although the flip-flop is implemented using analog circuitry, the speed of the PD is limited by that of the current-steering circuitry.
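As a rough behavioral illustration of the S/H PD, the sketch below holds the VCO value captured at each rising data transition. It is a simplification under stated assumptions, not the circuit of [4]: the VCO output is taken to be an ideal sinusoid that completes one period per bit, so that every rising edge arrives at the same phase offset from the VCO zero crossing, and the sampler is ideal. The function name `sample_and_hold_pd` and its parameters are hypothetical.

```python
import math


def sample_and_hold_pd(data_bits, phase_error_rad, vco_amplitude=1.0):
    """Return the held PD output after each bit of `data_bits`.

    Assumptions: sinusoidal VCO, one VCO period per bit, ideal sampling.
    The output is updated only on a rising data transition (0 -> 1).
    """
    held = 0.0
    previous = 0
    outputs = []
    for bit in data_bits:
        if previous == 0 and bit == 1:
            # Sample the VCO at the instant of the rising data edge.
            held = vco_amplitude * math.sin(phase_error_rad)
        outputs.append(held)
        previous = bit
    return outputs


if __name__ == "__main__":
    pattern = [0, 1, 0, 1, 1, 0, 1]
    # Small phase errors give an output close to the phase error itself.
    print(sample_and_hold_pd(pattern, 0.05)[-1])
    # In lock (zero phase error) the held value is zero.
    print(sample_and_hold_pd(pattern, 0.0)[-1])
    # Without rising transitions the output never updates.
    print(sample_and_hold_pd([0, 0, 0, 0], 0.3))
```

In this model the held value is A·sin(φ), which is approximately linear in φ near lock and is zero in lock, consistent with the low control-line activity noted above. The last call shows that the output depends on the presence of data transitions.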

2.7. Half-Rate PD
The block diagram of a half-rate PD is shown in Fig. 6 [5]. It consists of four latches and two XOR gates. The data is applied to two sets of cascaded latches, each cascade constituting a flip-flop. Since the flip-flops are driven by a half-rate clock, they demultiplex the original input sequence if the clock samples the data in the middle of the bit eye. The operation of the PD can be described using the waveforms shown in Fig. 6. The basic unit employed in the circuit is a latch whose output tracks its input for one half of the clock period and holds its value for the other half. Because $\mathrm{Error} = X_1 \oplus X_2$, where $\oplus$ is the exclusive-or operator, the error signal is equal to ONE only if a data transition has occurred. Since the input data is random in nature and the clock is periodic, the average value of the error signal is pattern dependent. To convey this dependence, a reference signal $Y_1 \oplus Y_2$ is generated. As can be seen in Fig. 6, the width of the error pulses is only half that of the reference pulses in the lock condition. This dictates scaling the amplitude of the error signal up by a factor of two. The difference between their average values then drops to zero in the lock condition, and the phase error remains linearly proportional to this difference in the vicinity of lock. To generate a full-rate output, the demultiplexed sequences are combined by a multiplexer clocked at half the clock rate. It is important to note that the two XOR gates must be symmetric with respect to their two differential inputs. Otherwise, a difference in propagation delay will result in a systematic phase error. The advantages of this technique are the use of the half-rate architecture, which allows the circuit to work at a high speed while maintaining reliable operation, and the removal of the dead zone from its characteristics. Its main drawback is its complexity.

2.8. Bang-bang PD

The block diagram of the CDR circuit employing a bang-bang PD is shown in Fig. 7 [6].
Five positive-edge-triggered D flip-flops (DFFs) are used to sample the input data. The VCO generates five clock signals, CLK1 through CLK5, phase-shifted by π/4. The PD generates Pd1, Pd2, Pu1, and Pu2, which provide the digitized phase error information. These signals are fed to a frequency detection circuit where frequency acquisition is performed. The optimum sampling point occurs when CLK3 is phase-aligned with the input data transitions. The PD output pulses represent the occurrence of the data transitions between the adjacent sampling clocks. Their width is fixed at half the clock period. The time delay of the Delay unit connected to CLK5 is equal to a flip-flop propagation delay. The PD output controls the charge pump to create four differential voltage steps, ±V1 and ±V2, at the input of the VCO. The direction of the voltage steps depends on the occurrence of the data transition versus CLK3. These voltage steps result in corresponding frequency steps ±Δf1 and ±Δf2. A larger step is used in the vicinity of lock to enhance the acquisition range and lock speed. A smaller one is used during the lock condition to reduce the activity on the VCO control line. An advantage of this technique is the reduced activity on the VCO control line. Its drawback is the metastability that occurs in the lock condition.

Fig. 5. Sample-and-Hold PD.

Fig. 6. Half-Rate PD.

Fig. 7. Bang-bang PD.

3. DESIGN CONSIDERATIONS

3.1. Sampling Mechanism

The phase/frequency difference between the input and the output of the VCO can be obtained by sampling the VCO output using the input signal or vice versa. Because the duty cycle of the input signal varies, input-clocked PDs suffer from high ripple on the VCO control line and, consequently, high jitter [4]. VCO-clocked PDs are therefore preferred in general.

3.2. Analog versus Digital PDs

We have shown in the preceding sections that digital PDs employ one or more flip-flops that are typically implemented using two cascaded latches, only one of which is active at a time. Because each latch must be active sufficiently long to ensure that its output has stabilized, digital PDs in general suffer from a long lock time. Analog PDs are therefore more suitable for high-speed applications. The configuration of analog PDs is, however, in general more complex. In addition, they consume static power.

3.3. The Lock Condition
A well-defined lock condition, in which the output of the PD reduces to zero, helps reduce the charge pump activity and, consequently, the activity on the VCO control line and the output jitter. The master-slave configuration in Fig. 5 eliminates the transparent path from the input to the output so that the activity on the VCO control line is reduced. Another example is the bang-bang PD in Fig. 7, where the output changes only when a sufficiently large error is detected.
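The effect of the lock condition on control-line activity can be illustrated with a small sketch that compares three idealized PD characteristics for the same residual phase errors. All three models, the unit gain, the threshold value, and the list of residual errors are hypothetical choices made for illustration only; `thresholded_pd` only loosely mirrors a detector whose output changes when a sufficiently large error is detected and is not the circuit of [6].

```python
def linear_pd(phase_error, gain=1.0):
    """Output proportional to the phase error; zero in perfect lock."""
    return gain * phase_error


def binary_pd(phase_error):
    """Output is always +1 or -1, even for a very small phase error."""
    return 1.0 if phase_error >= 0.0 else -1.0


def thresholded_pd(phase_error, threshold=0.1):
    """Output is nonzero only when the error magnitude exceeds `threshold`."""
    if phase_error > threshold:
        return 1.0
    if phase_error < -threshold:
        return -1.0
    return 0.0


def mean_activity(pd, phase_errors):
    """Mean absolute PD output, used here as a crude proxy for the
    activity that the charge pump places on the VCO control line."""
    return sum(abs(pd(e)) for e in phase_errors) / len(phase_errors)


if __name__ == "__main__":
    # Hypothetical residual phase errors (in radians) around the lock point.
    residual = [0.01, -0.02, 0.015, -0.005, 0.0, -0.01]
    for pd in (linear_pd, binary_pd, thresholded_pd):
        print(pd.__name__, mean_activity(pd, residual))
```

For small residual errors, the linear and thresholded characteristics produce little or no output, whereas the purely binary characteristic keeps driving the control line at full amplitude.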

3.4. Sensitivity to Input Data Pattern

Sensitivity to the input data pattern is also critical to the design of PDs. Most PD implementations require the input data to have a minimum timing content, below which improper operation of the PD results. For example, if a long string of ones or zeros is encountered in the two-XOR PFD studied earlier, high jitter is generated in the output of the VCO due to the activity introduced on its control line, although lock is still maintained. Another example is the S/H PD investigated before: when a long string of zeros is encountered before lock, a lock delay is generated because the VCO output does not affect the PD output. On the other hand, a long string of ones causes the PD output to saturate since the load devices are biased in the triode region. The bang-bang PD is also sensitive to the input data pattern. If a long string of zeros or ones is encountered before lock is established, it causes the output of the PD to be at logic 0, resulting in a lock delay. On the other hand, if it is encountered after lock is established, it causes CLK3 to drift away from the middle of the bit eye. The drift direction depends on the accumulated charge in the loop filter and the mismatches in the charge pump circuits. In summary, the effect of data-pattern dependency ranges from undesired jitter in the VCO output to frequent loss of lock.

3.5. Reliability

With the continuous increase in operating speed, reliable operation becomes a critical issue. The increased speed of operation not only complicates the design of circuits but also increases the switching noise. To ensure reliable operation of CDR circuits, the half-rate architecture [5], in which the VCO runs at half the frequency of the input, is often the preferred choice. To minimize the effect of switching noise on the potential of the power rails, PDs should be implemented in a differential mode.

4. CONCLUSION

In this paper, we have presented an overview of the recent developments of CMOS phase detectors and an in-depth examination of the advantages and limitations of these design techniques. Analog phase detectors offer a high operating speed, whereas digital phase detectors provide a large lock range and a better lock condition. To increase the speed of digital PDs, current-mode logic (CML) circuits should be employed.

5. REFERENCES

[1] B. Razavi, Monolithic Phase-Locked Loops and Clock Recovery Circuits, Theory and Design, New York: IEEE Press, 1996.
[2] C. Hogge, A self correcting clock recovery circuit, J. Lightwave Tech., vol. LT-3, pp. 1312-1314, Dec. 1985.
[3] J. Kang and D. Kim, A CMOS clock and data recovery with two-XOR phase-frequency detector circuit, ISSCC Dig. Tech. Papers, vol. IV, pp. 226-269, 2001.
[4] S. Anand and B. Razavi, A CMOS clock recovery circuit for 2.5-Gb/s NRZ data, IEEE J. Solid-State Circuits, vol. 36, no. 3, pp. 432-439, March 2001.
[5] J. Savoj and B. Razavi, A 10-Gb/s CMOS clock and data recovery circuit with a half-rate linear phase detector, IEEE J. Solid-State Circuits, vol. 36, no. 5, pp. 761-767, May 2001.
[6] M. Ramezani and C. Salama, An improved bang-bang phase detector for clock and data recovery applications, ISSCC Dig. Tech. Papers, vol. I, pp. 715-717, 2001.