Jafar Savoj
Transpectrum Technologies, Inc.
Behzad Razavi
University of California, Los Angeles
No part of this eBook may be reproduced or transmitted in any form or by any means, electronic,
mechanical, recording, or otherwise, without written consent from the Publisher.
List of Figures
4.2 Output SNR for a system (a) without and (b) with a matched filter.
4.3 Matched filter for rectangular pulse.
4.4 Single matched filter.
4.5 Interleaved matched filters.
4.6 Interface architecture.
4.7 (a) Seven-stage amplifier, (b) the first common-gate stage, (c) the following common-source stages.
4.8 Amplifier’s transfer function.
4.9 (a) Overall input-referred noise and (b) output eye of the amplifier.
4.10 Stacked inductor.
4.11 (a) High-speed integrate-and-dump circuit, (b) corresponding waveforms.
4.12 (a) Addition of hold phase, (b) corresponding waveforms.
4.13 Demultiplexer.
4.14 Clock buffers.
4.15 Die photograph.
4.16 Eye diagram of the output.
4.17 Addition of matched filtering to optical receivers.
5.1 Generic CDR architecture.
5.2 Half-rate CDR architecture.
5.3 Effect of non-ideal duty cycle.
5.4 (a) Three-stage ring oscillator, (b) implementation of each stage, (c) transistor-level schematic.
5.5 Small-signal (a) gain and (b) phase response of each delay stage.
5.6 VCO gain partitioning: (a) fine control and (b) coarse control.
5.7 (a) Phase detector, (b) operation of the circuit.
5.8 Symmetric XOR gate.
5.9 Determination of PD gain.
5.10 Charge pump and loop filter.
5.11 Lock acquisition.
5.12 Chip photograph.
5.13 (a) Spectrum of the recovered clock, (b) recovered clock in the time domain.
5.14 Measured jitter transfer characteristic.
5.15 (a) Recovered demultiplexed data, (b) recovered full-rate data.
6.1 CDR architecture.
x HIGH-SPEED CMOS CIRCUITS FOR OPTICAL RECEIVERS
INTRODUCTION
The volume of the data transported over the Internet backbone has
increased with the exponential growth of the number of Internet users.
As shown in Fig. 1.1, the load on the global Internet backbone will be
as high as 11 Tb/s by the year 2005. This means that the bandwidth
requirements will increase by a factor of 50 to 100 every seven years.
Among the available transmission media, optical fibers achieve the
highest bandwidth and the lowest loss. These characteristics make them
an attractive medium for transmission of data over long distances.
Despite the unique transmission capabilities of optical fibers, the data
needs to be regenerated after a few tens of miles. Data gets distorted as
it travels through the fiber, mostly because of the fiber dispersion. This
distortion leads to the closure of the data eye. The signal amplitude is
also reduced due to the loss throughout the fiber.
Restoration of the original data at the receiver side with an acceptable
bit-error rate (BER) is possible only if the signal maintains the required
signal-to-noise ratio (SNR). The data must therefore be regenerated
along the way to prevent degradation of its SNR. Extensive research has
been performed on techniques for regenerating the data in the optical
domain. However, most of these techniques are still under study, and
the majority of commercial systems employ electronic interfaces for
regeneration of the data. As a result, the optical pulses are first
converted into electric current, regenerated and processed in the electrical
domain, and then converted back into optical pulses.
The complexity of the procedure that takes place in the regenerator
introduces latency. Furthermore, the maximum data rate is determined
by the speed of the electronic interface. The operating speed of the back-
bone can be increased by either designing faster electronic interfaces to
handle a higher data rate, or by using a number of parallel regenerators
and wavelength-division multiplexing (WDM) to combine a number of
high-speed optical data streams on one fiber channel.
Throughout this book, various approaches for increasing the operat-
ing rate of the regenerators are addressed. These approaches introduce
innovations at both the system and circuit levels. Special attention has
been paid to reducing the complexity of the circuits, so that a number
of transceivers can be placed on one chip if parallelism is used.
In this work, we have targeted a data rate of 10 Gb/s. With the
operating rates increasing to 10 Gb/s, and costs staying the same, new
applications can be introduced that will become more attractive as the
cost of transport per bit decreases.
The majority of the backbone optical communication systems are
based on the SONET standard. Short for Synchronous Optical Network,
it was proposed by Bellcore in the mid-1980s and is now an ANSI standard.
SONET defines a hierarchy that allows data streams of different rates
to be multiplexed. SONET recommends optical carrier (OC) levels that
are integer multiples of 51.84 Mb/s. This standard has allowed different
communication carriers to interconnect their existing fiber optic systems
[1].
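As a quick numerical check of the hierarchy, each OC-N line rate is N times the OC-1 base rate of 51.84 Mb/s. The sketch below (function and constant names are illustrative, not taken from any standard document) tabulates a few levels, including OC-48 and OC-192, which are discussed next.

```python
OC1_RATE_MBPS = 51.84  # SONET OC-1 (STS-1) line rate in Mb/s

def oc_line_rate_mbps(n: int) -> float:
    """Line rate of SONET level OC-n, an integer multiple of the OC-1 rate."""
    return n * OC1_RATE_MBPS

for level in (1, 3, 12, 48, 192):
    print(f"OC-{level}: {oc_line_rate_mbps(level):.2f} Mb/s")
```

OC-192 works out to 9953.28 Mb/s, which is why it is loosely referred to as the 10-Gb/s rate.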
The SONET OC-192 standard has been specified for 10 Gb/s optical
communication. SONET recommends two types of architectures for use
in metropolitan and long-haul areas: ring and point-to-point (Fig. 1.2)
[2]. An OC-192 ring replaces multiple pre-existing OC-48 rings operating
at lower speeds. Furthermore, it allows a larger number of nodes to be
placed on the ring and provides the capability to process more added
Introduction 3
and dropped traffic at each node. For this reason, the complexity of the
network is drastically reduced.
Point-to-point architecture allows flexible routing between different
nodes and point-to-point services require connections on a per-customer
basis.
Other services provided by the OC-192 standard include video con-
ferencing and ATM-based services like LAN interconnections.
fiers. Both the transmitter and the receiver equipment employ optical
amplifiers. On the transmitting side, the transmitter is followed by a
booster amplifier. At the receiving end, a preamplifier and an optical
filter process the optical pulse before it reaches the receiver. The parameters
should be chosen to provide an overall BER of better than
The data transmitted over the fiber is encoded in nonreturn-to-zero
(NRZ) format. Therefore, the data stream does not carry an explicit
clock signal, and its spectrum contains no component at the frequency
of the data rate. The only measure of the clock that can be derived
from the data sequence is the minimum spacing between consecutive
zero crossings of the data. This measure can be extracted through
nonlinear circuit techniques such as edge detection, which detects the
timing information contained in the transitions between unequal adjacent
bits. As a result, the edge-detected signal contains a tone at the data rate.
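The absence of a data-rate tone in NRZ data, and its appearance after edge detection, can be checked numerically. The sketch below is an illustration only: it models a random NRZ stream as an oversampled 0/1 waveform and evaluates the single DFT bin that falls exactly at the bit rate. The oversampling factor, the random seed, and the edge detector (an XOR of the waveform with a copy delayed by half a bit) are assumptions chosen for clarity; a practical circuit may use a different nonlinearity.

```python
import cmath
import random

def dft_bin(x, k):
    """Normalized magnitude of the k-th DFT coefficient of the sequence x."""
    n_total = len(x)
    acc = sum(v * cmath.exp(-2j * cmath.pi * k * n / n_total) for n, v in enumerate(x))
    return abs(acc) / n_total

def nrz_waveform(bits, samples_per_bit):
    """Oversampled NRZ waveform: each bit is held for samples_per_bit samples."""
    return [b for b in bits for _ in range(samples_per_bit)]

def edge_detect(x, delay):
    """XOR of the waveform with a circularly delayed copy: one pulse per transition."""
    n_total = len(x)
    return [x[i] ^ x[(i - delay) % n_total] for i in range(n_total)]

random.seed(1)
SPB = 8                                    # samples per bit (assumed)
bits = [random.randint(0, 1) for _ in range(1000)]
data = nrz_waveform(bits, SPB)
edges = edge_detect(data, SPB // 2)        # half-bit delay gives T/2-wide pulses
k_rate = len(bits)                         # DFT bin located exactly at the bit rate
print(f"NRZ data at bit rate:      {dft_bin(data, k_rate):.2e}")
print(f"edge-detected at bit rate: {dft_bin(edges, k_rate):.3f}")
```

Each full-width NRZ pulse integrates to zero over one cycle of the bit-rate frequency, so the first value vanishes to within rounding error, whereas every half-bit edge pulse contributes with the same phase, producing a clear tone.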
An NRZ stream can contain long sequences of ones and zeros with
no transitions in between. If the number of transitions is too low, syn-
chronization at the receiver end will become very difficult. For example,
if the receiver contains a phase-locked loop (PLL), the frequency of the
oscillator can drift during these long sequences of identical bits such that
the recovery of the data would no longer be possible.
To overcome this difficulty, high-speed communication systems encode
the data such that the maximum length of a continuous sequence of
ones or zeros is limited. A widely accepted technique is the 8B/10B
encoding [4], which has been used in some of the systems operating at
2.5 Gb/s and generates an encoded stream at 3.125 Gb/s. Using this
coding technique, each eight-bit data byte is converted into ten bits. As
a result, the minimum and the maximum numbers of consecutive zeros
or ones are one and five, respectively. In addition to providing a higher
transition density, this type of encoding limits the low-frequency content
of the data stream such that the sequence has no dc component on
average. As a result, the optical modules can be ac coupled. Finally, this
encoding scheme allows many signaling errors to be detected.
A new encoding scheme is the 64B/66B encoding [5], in which two
additional bits are added to every 64 bits. If this encoding scheme is
applied to a 10-Gb/s stream, it will result in a high-speed sequence of
approximately 10.3 Gb/s.
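The rate overhead of both block codes, and the five-bit run-length limit of 8B/10B, can be verified with a few lines. In this sketch the function names are illustrative, and the K28.5 comma codeword (running disparity negative) is used only as a known example of a maximum-length run; it is not a full 8B/10B encoder.

```python
from itertools import groupby

def encoded_rate_gbps(payload_rate_gbps: float, data_bits: int, code_bits: int) -> float:
    """Line rate after a block code maps data_bits payload bits onto code_bits."""
    return payload_rate_gbps * code_bits / data_bits

def max_run_length(bits: str) -> int:
    """Longest run of identical consecutive symbols in a bit string."""
    return max(len(list(run)) for _, run in groupby(bits))

print(encoded_rate_gbps(2.5, 8, 10))    # 8B/10B at 2.5 Gb/s payload -> 3.125
print(encoded_rate_gbps(10.0, 64, 66))  # 64B/66B at 10 Gb/s payload -> 10.3125
print(max_run_length("0011111010"))     # K28.5 codeword -> 5
```

The 64B/66B overhead is only 3.125%, compared with 25% for 8B/10B, which is why it is preferred at 10 Gb/s.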
ing this technology benefit from a sound backup foundry, and a shorter
pre-process waiting period. Due to the huge momentum of the digital
market, the CMOS process develops faster than other processes. The
migration of to CMOS process has taken place in only
two years.
Newly developed SiGe BiCMOS processes are another good alternative
for the development of high-speed integrated circuits. These processes
offer very fast bipolar devices suitable for building analog front-ends and
dense CMOS devices for the digital portion. A modified BiCMOS pro-
cess that does not have the trench isolation [6] has a fabrication cost and
turn-around time that is not much different from a pure CMOS process.
The number of fabrication masks for this BiCMOS process is slightly
higher than that of a CMOS process. The drawbacks of the BiCMOS
process are the small number of supplying foundries and the fact that
the scaling of their CMOS devices is usually not as aggressive as that
of the fastest available CMOS processes. The digital circuits fabricated
using these BiCMOS processes cannot operate as fast as those circuits
fabricated in a pure CMOS process.
Benefiting from such capabilities, CMOS technology is an excellent
candidate for implementing systems that employ parallelism. A number
of transceivers are placed on one chip to handle the incoming high-speed
sequences. These signals are carried over either a bundle of fibers
or a single fiber that uses wavelength-division multiplexing (WDM). The
complexity and the power dissipation of the transceivers are critical, as
they determine the number of transceivers that can be placed on one chip.
Figure 1.6 depicts a fiber optic transceiver consisting of a transmitter
and a receiver. In the transmitter, parallel sequences of data at lower
rates are combined in a multiplexer to generate a single high-speed se-
rial signal. Multiplexing of data is performed in multiple steps, with a
gradual increase in the rate of the merging sequences. The multiplexer
therefore operates with a number of clock signals whose frequency dou-
bles as the multiplexing advances to the next level. Figure 1.7 depicts a
conceptual topology of 4-to-1 and 2-to-1 multiplexers.
Shown in Fig. 1.7(b), multiplexing at any level is performed by a
combination of five latches and a multiplexer. Four of the latches
retime the data. The fifth latch skews the data in one of the
signal paths by one half of a clock period. As a result, the multiplexer
selects each sequence starting from the middle of its data
period.
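The retiming and half-period skew can be illustrated with a tick-level behavioral model of one 2-to-1 stage. This is a sketch under stated assumptions: the clock period is divided into an even number of ticks, input A is retimed on the clock, input B is additionally delayed by half a clock period (the role of the fifth latch), and the multiplexer passes A during the second half of each clock period and B during the first half. The function and its tick convention are hypothetical.

```python
def mux_2to1(a_bits, b_bits, ticks_per_clock=4):
    """Tick-level model of one 2-to-1 multiplexing stage (ticks_per_clock must be even).

    Returns the serialized output (one symbol per half clock period) and the
    smallest distance, in ticks, between a selected input's last transition and
    the start of the window in which the multiplexer passes it.
    """
    period = ticks_per_clock
    half = period // 2
    out, margin = [], period
    for t in range(half, len(a_bits) * period + half, half):
        if t % period >= half:             # second half of the clock period: pass A
            out.append(a_bits[t // period])
            margin = min(margin, t % period)
        else:                              # first half: pass the half-period-skewed B
            out.append(b_bits[(t - half) // period])
            margin = min(margin, (t - half) % period)
    return out, margin

out, margin = mux_2to1([1, 0, 1, 1], [0, 0, 1, 0])
print(out, margin)  # [1, 0, 0, 0, 1, 1, 1, 0] 2
```

The output carries two symbols per clock period, and each input is passed only from the middle of its own data period onward, half a clock period away from its transitions.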
Figure 1.8 depicts the structure of the current-steering latch and
multiplexer used for high-speed applications. This structure allows for a
reduced voltage swing that is well defined. Reduction of the output
considering that the data rate at the output of the multiplexer is twice
the frequency of the clock signal that drives the circuit. However, it
is critical to retime the output of the last multiplexer with a flipflop
operating at a frequency equal to the data rate for two reasons. First, in
the presence of mismatches the multiplexer will exhibit different delays
from each of its inputs to the output. This inherent mismatch increases
the static ISI on the output eye diagram of the multiplexer. Second, the
two input data sequences experience timing mismatch on their path to
the multiplexer, resulting in static bimodal jitter.
The design of the CDR circuit is the most complicated part of imple-
menting an optical transceiver. It entails many challenges that will be
addressed in the following chapters.
3. Overview of Topics
Chapter 2 describes the high-speed front-end circuits of the optical
receivers, covering the design of transimpedance amplifiers and voltage
limiters. Chapter 3 provides an overview of existing CDR architectures,
and describes the implementation of their building blocks. Chapter 4 de-
scribes the techniques for optimizing detection in a high-speed receiver,
focusing on wideband amplification and matched filtering. An interface
built in CMOS process utilizing these techniques is introduced.
The remainder of this book concentrates on designing high-speed CDR
circuits operating at 10 Gb/s. Chapter 5 describes a CDR circuit incor-
porating a half-rate linear phase detector and a ring oscillator. Chap-
ter 6 covers the design of a CDR circuit that uses a half-rate binary
phase/frequency detector and a multi-phase LC oscillator. Chapter 7
concludes this book.
Chapter 2
1. TIAs
Transimpedance amplifiers play a critical role in optical receivers.
Trade-offs between noise, speed, gain, and supply voltage present many
challenges in TIA design. As TIAs experience a tighter performance
envelope with technology scaling at the device level and speed scaling at
the system level, it becomes necessary to design the cascade of the TIA,
the limiter, and the decision circuit concurrently. The TIA bandwidth
is typically chosen to be 0.7 times the bit rate, a reasonable
compromise between the total integrated noise and the intersymbol
interference (ISI) resulting from limited bandwidth.
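The two sides of this trade-off can be quantified under a deliberately simple model. The sketch below assumes a single-pole TIA response and white input-referred noise (both assumptions, not claims about any particular TIA): residual ISI is estimated as the fraction of a step still unsettled one bit period after a transition, and integrated rms noise scales with the square root of bandwidth, normalized to its value at 0.7 times the bit rate.

```python
import math

def residual_isi_single_pole(bw_to_bitrate: float) -> float:
    """Fraction of a unit step still unsettled one bit period after the edge,
    for a single-pole response with f_3dB = bw_to_bitrate * bit rate."""
    # exp(-T_b / tau), with tau = 1 / (2*pi*f_3dB) and T_b = 1 / bit rate
    return math.exp(-2 * math.pi * bw_to_bitrate)

def relative_rms_noise(bw_to_bitrate: float) -> float:
    """Integrated white-noise rms, normalized to its value at 0.7x the bit rate."""
    return math.sqrt(bw_to_bitrate / 0.7)

for ratio in (0.5, 0.7, 1.0):
    print(f"BW = {ratio:.1f} x bit rate: residual ISI {residual_isi_single_pole(ratio):.3f}, "
          f"rms noise x{relative_rms_noise(ratio):.2f}")
```

Under these assumptions, lowering the bandwidth from 0.7 to 0.5 times the bit rate saves only about 15% in rms noise while roughly tripling the residual ISI, which illustrates why 0.7 is a common choice.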
Shown in Fig. 2.1, the common-gate (or common-base) topology is
a candidate for TIAs as it provides a relatively low input impedance,
2. Limiters
The voltage swing produced by TIAs at the minimum light level is
usually inadequate to drive the CDR circuit, necessitating further am-
plification. Used to boost the binary swings, limiters typically consist of
a cascade of differential pairs with enough bandwidth and a relatively
linear phase response so as to amplify the signal with negligible ISI.
The high small-signal gain requires low-frequency negative feedback to
prevent the offset voltages of the differential pairs from saturating the
later stages.
Interestingly, limiter design must cope with difficulties at both the
low corner and the high corner of the passband. Consider the limiter
topology shown in Fig. 2.3, where the feedback network suppresses the
offset of the last three stages. Since some optical standards require
that the low end of the passband fall around a few tens of kilohertz,
the values of the feedback resistor and capacitor (denoted here $R_F$
and $C_F$) must be very large. More specifically, with a small-signal
gain of $A$ per stage, the low corner frequency is approximately
$f_L \approx A^3/(2\pi R_F C_F)$, demanding an $R_F C_F$ product on the
order of 1 ms if $A$ is around 5. For this reason, the capacitors are usually placed off chip,
raising the number of package pins and also the possibility of crosstalk
from other bond wires. New circuit topologies may resolve this issue.
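The loop-gain argument behind these numbers can be checked with a few lines of arithmetic. The sketch assumes that the offset-cancellation feedback spans three stages, so that the loop gain $A^3$ multiplies the inherent $1/(2\pi RC)$ corner of the feedback network, giving $f_L \approx A^3/(2\pi RC)$; this form is our reading of the argument rather than a formula quoted from the figure, and the function names are illustrative.

```python
import math

def limiter_low_corner_hz(gain_per_stage: float, rc_seconds: float,
                          stages_in_loop: int = 3) -> float:
    """Approximate low corner A**k / (2*pi*R*C) of a limiter whose offset-cancellation
    feedback spans k stages (loop gain A**k >> 1 assumed)."""
    return gain_per_stage ** stages_in_loop / (2 * math.pi * rc_seconds)

def required_rc_seconds(gain_per_stage: float, f_low_hz: float,
                        stages_in_loop: int = 3) -> float:
    """RC product needed to place the low corner at f_low_hz."""
    return gain_per_stage ** stages_in_loop / (2 * math.pi * f_low_hz)

print(f"{limiter_low_corner_hz(5, 1e-3) / 1e3:.1f} kHz")  # A = 5, RC = 1 ms
print(f"{required_rc_seconds(5, 20e3) * 1e3:.2f} ms")      # target corner of 20 kHz
```

With $A = 5$ and $RC = 1$ ms the corner lands near 20 kHz, consistent with the "few tens of kilohertz" requirement; an RC of 1 ms implies, for example, 1 MΩ with 1 nF, which is impractical on chip.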
At the upper end of the passband, high-speed amplification techniques
must provide a well-behaved magnitude and phase response for both
small and large signals. Shown in Fig. 2.4, configurations such as the
Cherry-Hooper amplifier [8, 9] and the Gilbert gain cell [10] have been
used but their utility becomes more limited as the supply voltage falls.
In particular, the voltage drops across and in Fig. 2.4(a) and
TIAs and Limiters 17
the cascode in Fig. 2.4(b) both constrain the voltage headroom and
mandate level-shift circuits between the stages.
An attractive solution for low-voltage broadband amplifiers is induc-
tive peaking. Owing to the extensive work on monolithic inductors in
RF design, this method can now be realized with accurate modeling and
prediction of the performance in optical communication circuits as well.
Interestingly, inductor quality factors (Q’s) as low as 3 to 4 prove
adequate for increasing the bandwidth, allowing the use of simple, compact
spiral structures.
Figure 2.5(a) shows a limiting stage incorporating inductive peaking.
It can be shown that an ideal inductor increases the bandwidth by ap-
create both substantial current switching from the supply and a finite
supply rejection, allowing a component to travel from the output stage
through the supply and back to the input stage. With a finite bond
wire inductance, the gain around the loop may exceed unity, leading
to high-frequency oscillation. The issue of course becomes much more
severe if a single-ended TIA shares the same supply lines with the lim-
iter. For this reason, separate supply lines, careful bypassing, symmetric
layout, and accurate package modeling are essential.
Notes
1 The input-referred noise current of is neglected for simplicity.
Chapter 3
In order to reduce jitter on the recovered clock signal, the filter should
have a very high selectivity to suppress the unwanted data-dependent
signal that results in amplitude and phase modulation.
Integration of highly selective band-pass filters operating at very high
frequencies is not practical using available fabrication processes. This
limitation calls for the use of external components such as SAW filters.
These filters, however, suffer from high loss and a relatively low speed
of operation that limits their applicability to 10-Gb/s operation.
(VCO) in a phase-locked loop (PLL). The idea is that during each data
transition, the location of the data transition with respect to the clock
edge is detected. If the data leads the clock, the clock is sped up. If the
data lags the clock, the clock is slowed down. If the zero crossings of
the data and the clock coincide, the clock frequency is kept constant to
ensure phase lock.
Figure 3.5 shows a generic CDR circuit. The VCO generates a clock
signal. The phase and frequency of this signal are compared with those of
the incoming data in the phase detector, generating an error signal that is
passed through the charge pump and the low-pass filter to set the voltage
required by the VCO to oscillate at the frequency of interest. Phase
locking of the clock to the data means that their phases differ by
a small constant offset. As a result, the derivatives of their phases,
that is, their frequencies, are identical.
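The phase-tracking behavior described above can be illustrated with a minimal discrete-time model. It assumes a linear phase detector, a proportional-plus-integral path standing in for the charge pump and low-pass filter, and arbitrary gains kp and ki chosen only to keep the loop stable; none of these values, nor the function name, come from a specific design.

```python
def simulate_cdr_loop(data_freq_offset, kp=0.1, ki=0.005, steps=2000):
    """Discrete-time phase-tracking loop with a proportional + integral path.

    Phases are in UI and frequencies in UI per step, both relative to the VCO's
    free-running frequency. Returns (final phase error, final VCO frequency).
    """
    phase_data = 0.0
    phase_vco = 0.0
    integ = 0.0                         # integral path (charge pump into a capacitor)
    freq_vco = 0.0
    for _ in range(steps):
        error = phase_data - phase_vco  # linear phase detector output
        integ += ki * error
        freq_vco = kp * error + integ   # VCO control: proportional + integral
        phase_vco += freq_vco
        phase_data += data_freq_offset
    return phase_data - phase_vco, freq_vco

err, freq = simulate_cdr_loop(0.002)            # proportional + integral path
err0, freq0 = simulate_cdr_loop(0.002, ki=0.0)  # proportional path only
print(f"PI loop: phase error {err:.2e} UI, VCO offset {freq:.6f} UI/step")
print(f"P only:  phase error {err0:.4f} UI, VCO offset {freq0:.6f} UI/step")
```

In both cases the VCO frequency converges to the data frequency. With only a proportional path the phases settle at a constant offset equal to data_freq_offset/kp, whereas the integral path drives the static phase offset to zero.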
The generated clock signal is also used to retime the data in the
decision circuit. As the incoming data is regenerated in this block, its
additive noise and ISI are suppressed while its amplitude is significantly
increased.
Some of the design issues of the CDR circuits are mentioned in the
following:
Speed: The throughput of a high-speed receiver is determined by the
maximum operating rate of the CDR circuit. The CDR circuit consists
of blocks such as phase detectors and digital latches utilizing positive
feedback for regeneration at high speed. As the data rate increases, the
regeneration time of the latches becomes comparable to the data period,
thus limiting the maximum operating rate of a latch [12]. Another crit-
ical issue is sustaining the integrity of the clock signal generated by the
VCO, which operates at very high frequencies. For these reasons, available
low-cost fabrication processes such as CMOS technology can only
marginally handle data rates as high as 10 Gb/s. This limitation can be
overcome by innovations at both the architecture and the circuit levels.
Later in this book, a number of approaches for increasing the speed of
the system are described.
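The latch speed limit can be made concrete with the usual first-order model of a regenerative latch, in which an initial imbalance grows as $V(t) = V_{in} e^{t/\tau}$. In the sketch below the time constant, the full swing, and the assumption that half a bit period is available for regeneration are all hypothetical values chosen for illustration.

```python
import math

def regeneration_time(tau_s: float, v_in: float, v_full: float) -> float:
    """Time for a latch modeled as V(t) = v_in * exp(t / tau) to reach v_full."""
    return tau_s * math.log(v_full / v_in)

def min_resolvable_input(tau_s: float, t_avail_s: float, v_full: float) -> float:
    """Smallest initial imbalance that reaches v_full within t_avail_s."""
    return v_full * math.exp(-t_avail_s / tau_s)

TAU = 10e-12   # hypothetical regeneration time constant (s)
V_FULL = 0.4   # hypothetical full logic swing (V)
for rate_gbps in (2.5, 10.0):
    t_avail = 0.5 / (rate_gbps * 1e9)  # assume half a bit period is available
    print(f"{rate_gbps} Gb/s: minimum input "
          f"{min_resolvable_input(TAU, t_avail, V_FULL) * 1e3:.3g} mV")
```

Because the minimum resolvable input grows exponentially as the available time shrinks, a latch that is comfortably fast at 2.5 Gb/s needs millivolt-level inputs at 10 Gb/s under these assumptions.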
Clock and Data Recovery Architectures 25
loop VCO jitter is significant within a frequency offset close to the loop
bandwidth with respect to the center frequency. Therefore, a wider loop
bandwidth removes a larger portion of the VCO jitter.
The capture range of an unaided CDR circuit is close to the loop
bandwidth. However, this limitation can be overcome by means of a
frequency acquisition scheme.
The acquisition time becomes shorter for a larger loop bandwidth.
Since the nominal loop bandwidth is defined by the standard, an adap-
tive bandwidth mechanism can be employed to speed up the acquisition.
During startup, the loop bandwidth is increased to provide faster
acquisition. As the circuit acquires lock, the bandwidth is set
back to the nominal value, suppressing the jitter entering the system.
A fiber link consists of many cascaded regenerators. Jitter can
accumulate along the link because the small jitter peaking of each
regenerator adds up to a large total. To alleviate this difficulty, the
jitter peaking of each regenerator should be kept below 0.1 dB.
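A worst-case estimate shows why the per-regenerator limit is so tight. The sketch assumes that the peaks of all cascaded jitter transfer functions occur at the same frequency, so their magnitudes multiply and their dB values add; the 50-regenerator link length is a hypothetical example.

```python
def accumulated_peaking_db(peaking_per_stage_db: float, n_regenerators: int) -> float:
    """Worst case: aligned peaks, so cascaded magnitudes multiply and dB values add."""
    return peaking_per_stage_db * n_regenerators

def db_to_amplitude_ratio(db: float) -> float:
    """Convert a gain in dB to an amplitude (jitter) ratio."""
    return 10 ** (db / 20)

for per_stage in (0.1, 1.0):
    total = accumulated_peaking_db(per_stage, 50)
    print(f"{per_stage} dB x 50 regenerators -> {total:.1f} dB "
          f"(jitter amplified x{db_to_amplitude_ratio(total):.2f})")
```

Even at 0.1 dB per regenerator, 50 regenerators can amplify jitter near the peaking frequency by a factor of about 1.8; at 1 dB per regenerator the factor exceeds 300.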
Jitter tolerance is a measure of the ability of the CDR circuit to track
a jittered input data signal. Jitter on the input signal can be considered
as phase modulation. The CDR must provide a clock signal that tracks
this phase modulation in order to accurately retime jittered data. The
jitter tolerance is defined as a mask that relates the maximum amount
of phase modulation that can be corrected by the loop to the frequency
offset with respect to the data rate (Fig. 3.7).
Power Dissipation: Until recently, low power dissipation was not
considered a critical requirement for optical transceivers. One reason was
that in contrast to handheld wireless transceivers, optical transceivers do
not run from a battery. Another reason was that high-quality transceivers
could only be integrated in power-hungry III-V processes.
Development of bipolar technologies along with introduction of deep
sub-micron CMOS processes has allowed circuit designers to build sys-
tems with significantly reduced power consumption. This aspect be-
comes more attractive when a number of transceivers are placed on a
single chip in order to increase the operating rate of the system. The
power dissipation and the integrability of the circuit in VLSI technolo-
gies determine the number of transceivers that can be placed on one
chip. Lower power dissipation also eases packaging and alleviates
heat-sinking issues.
Supply Scaling: Supply scaling has been a distinguishing feature of
the trend towards deep sub-micron scaling in CMOS processes. While
resulting in reduced power consumption, the supply scaling limits the
choice of circuit topologies for high-speed applications. In a
This comparison can be performed if the clock frequency equals the data
rate (Fig. 3.8(a)). As a result, retiming of the data can be performed
using flipflops that operate on either the rising or the falling edge of the
clock signal.
falling edges of the clock signal is different from half the clock period,
the width of the data eye sampled by the rising edge is different from
that sampled by the falling edge, resulting in bimodal jitter (Fig. 3.9).
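The size of this effect in a half-rate system follows directly from the clock duty cycle. In the sketch below, both clock edges sample the data and the clock period equals two bit periods; the 45% duty cycle is a hypothetical example.

```python
def half_rate_sampling_intervals(bit_rate_hz: float, duty_cycle: float):
    """Spacing between consecutive sampling instants of a half-rate clock.

    Rising and falling edges both sample data, and the clock period is two bit
    periods. Returns (rise-to-fall, fall-to-rise) intervals in seconds; ideally
    both equal one bit period.
    """
    t_clk = 2.0 / bit_rate_hz
    return duty_cycle * t_clk, (1.0 - duty_cycle) * t_clk

rise_to_fall, fall_to_rise = half_rate_sampling_intervals(10e9, 0.45)
print(f"{rise_to_fall * 1e12:.0f} ps / {fall_to_rise * 1e12:.0f} ps (ideal 100 ps each)")
```

At 10 Gb/s, a 45% duty cycle makes the sampling instants alternate between 90-ps and 110-ps spacings, so the two interleaved data streams see different eye widths.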
The focus of the work presented in this book is the design of systems
employing half-rate architectures. Although the CMOS technology used
here could only marginally support a full-rate system, the reduction of
power consumption afforded by half-rate operation makes this approach a
strong candidate. Furthermore, since the scaling of CMOS processes cannot
keep up with the growing demand for systems operating at higher data
rates, half-rate approaches will become increasingly attractive for
high-speed design in the near future.
In Chapters 5 and 6 we describe two half-rate CDR circuits. However,
in the remainder of this chapter, after a general review of the CDR
circuit’s building blocks, some of the existing full-rate architectures are
addressed by describing their phase and frequency detectors.
2.2. Oscillators
As an integral part of phase-locked loops, oscillators are used for clock
generation in these systems. The design of the VCO directly impacts the
jitter performance and the reproducibility of the CDR circuit. While LC
topologies achieve a potentially lower jitter and higher center frequency,
their limited tuning range makes it difficult to obtain a target frequency
without design and fabrication iterations.
Timing Jitter. In the CDR circuit, the main source of timing jitter
is the inherent thermal and shot noise of the active and passive devices
that make up each delay stage of the VCO. 1/f noise is usually of little
practical importance because it lies at low frequencies, where the loop
suppresses the VCO phase noise. Therefore,
minimizing the impacts of thermal and shot noise in the basic delay
stage becomes the key to attaining low timing jitter.
It can be shown that the thermal jitter improves with the square root
of power consumption [17]. To design for low jitter, the overdrive voltage
of the devices used in the delay stage should be maximized. For a fixed
delay and fixed current, the small-signal gain of each stage should also
be minimized. However, this gain must be large enough for oscillation
to occur.
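The square-root relation implies a steep power cost for jitter reduction. The sketch below assumes jitter proportional to $1/\sqrt{P}$, as obtained when all device widths and currents in the delay stage are scaled together so that delay and swing stay unchanged; the function names are illustrative.

```python
import math

def jitter_after_power_scaling(jitter_ref: float, power_ratio: float) -> float:
    """Thermal jitter after multiplying the power by power_ratio (jitter ~ 1/sqrt(P))."""
    return jitter_ref / math.sqrt(power_ratio)

def power_ratio_for_jitter(jitter_ref: float, jitter_target: float) -> float:
    """Power increase needed to reduce the jitter from jitter_ref to jitter_target."""
    return (jitter_ref / jitter_target) ** 2

print(jitter_after_power_scaling(1.0, 2.0))  # doubling the power: jitter x 0.707
print(power_ratio_for_jitter(2.0, 1.0))      # halving the jitter: 4x the power
```

Halving the thermal jitter therefore costs four times the power, which is why the topology-level choices above (high overdrive, low stage gain) matter as much as the power budget.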
2.2.3 LC Oscillators
Monolithic LC oscillators are formed by a resonant tank that consists
of a spiral inductor (L) and a variable capacitor (C) that resonate at a
frequency of $\omega_0 = 1/\sqrt{LC}$.
The inductors and capacitors suffer from a resistive component. This
component is mostly dominated by the resistance of the metal wire used
in the inductor and by eddy-current and displacement losses through the
substrate. Figure 3.13 depicts the substrate loss versus its
sheet resistance. The eddy loss reduces as the sheet resistance of the
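It can be shown that the resistance seen between the two drains of a
cross-coupled differential pair equals $-2/g_m$, where $g_m$ is the
transconductance of each transistor. Therefore, for the oscillation to
occur, it is necessary that $2/g_m \le R_p$, where $R_p$ is the equivalent
parallel resistance of the tank between the two drains (Fig. 3.15).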
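The resonance frequency and the startup requirement can be evaluated together for a sample tank. This sketch assumes that the negative resistance of the cross-coupled pair is $-2/g_m$ and that $R_p$ denotes the equivalent differential parallel resistance of the tank; the inductance, capacitance, and resistance values are hypothetical.

```python
import math

def lc_resonance_hz(l_henry: float, c_farad: float) -> float:
    """Resonance frequency f0 = 1 / (2*pi*sqrt(L*C)) of an ideal LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))

def min_gm_for_startup(r_parallel_diff: float) -> float:
    """The pair presents -2/gm between the drains; oscillation requires 2/gm <= R_p,
    that is, gm >= 2 / R_p."""
    return 2.0 / r_parallel_diff

L_TANK, C_TANK = 1e-9, 250e-15   # hypothetical 1 nH and 250 fF
R_P = 500.0                      # hypothetical differential parallel resistance (ohms)
print(f"f0 = {lc_resonance_hz(L_TANK, C_TANK) / 1e9:.2f} GHz")
print(f"gm >= {min_gm_for_startup(R_P) * 1e3:.1f} mS")
```

In practice the transconductance is set with some margin above this minimum to guarantee startup across process and temperature.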
Tuning. Only the inductor and capacitor values can be varied to tune
the frequency of an LC oscillator. Other parameters such as bias cur-
rents and transistor transconductances have a negligible effect on the
oscillation frequency. Since it is difficult to vary the value of the mono-
lithic inductor, the tank’s capacitance can be changed for tuning. The
amount of tuning achieved is reduced as the supply voltage gets lower.
Also, to maximize the tuning range, constant capacitances in the tank
must be minimized.
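The penalty of fixed tank capacitance follows from $\omega \propto 1/\sqrt{C}$: the tuning ratio is the square root of the ratio of total capacitances, so any constant capacitance dilutes the varactor's range. In this sketch the 100-fF to 200-fF varactor and the fixed-capacitance values are hypothetical.

```python
import math

def tuning_ratio(c_var_min: float, c_var_max: float, c_fixed: float) -> float:
    """f_max / f_min of an LC tank whose capacitance is c_fixed + c_var,
    with c_var swept between c_var_min and c_var_max (L cancels out)."""
    return math.sqrt((c_fixed + c_var_max) / (c_fixed + c_var_min))

# Hypothetical varactor with a 2:1 capacitance ratio (100 fF to 200 fF)
for c_fixed in (0.0, 100e-15, 300e-15):
    pct = (tuning_ratio(100e-15, 200e-15, c_fixed) - 1) * 100
    print(f"fixed C = {c_fixed * 1e15:.0f} fF -> tuning range {pct:.1f}% above f_min")
```

A 2:1 varactor alone gives about 41% tuning, but adding 300 fF of fixed capacitance (device and wiring parasitics, for example) cuts this to about 12%.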
The variable capacitor can be formed using either pn junctions or
MOS varactors. The former is formed by a p+ diffusion in an N-well
(Fig. 3.16(a)), whereas the latter is formed by placing an NMOS
device in an N-well (Fig. 3.16(b)).
differential pairs coupling the output of each oscillator to the input of the
bias current. The common-mode voltage and hence the oscillation fre-
quency can be varied by changing the on-resistance of [24]. As
increases, enters saturation mode and the voltage at node
P experiences a sudden change, resulting in nonlinearity in the VCO
characteristic. A second transistor, driven by a source-followed version
of can be added to the circuit to provide an effective resistance be-
tween P and the supply that smoothly varies with the control voltage.
A different tuning mechanism for quadrature oscillators is by chang-
ing the coupling coefficient between the two oscillators [22]. Figure 3.20
depicts the structure of this oscillator. The output phasor at node A
is determined by vector adding the phasor of the stand-alone oscillator
and the phasor of the coupling differential pair. As the coupling
coefficient increases, the magnitude of the coupling phasor, which is 90°
away from the stand-alone phasor, increases. This means that the angle
between the sum phasor and the stand-alone phasor increases, so the
quadrature oscillator oscillates at a larger frequency offset with
respect to the stand-alone oscillation frequency. As a result, the amount
of coupling can be changed to cover a very wide tuning range. The
coupling cannot be reduced indefinitely, because the two oscillators lose
synchronization if the coupling is too small. Phase noise sets a limit
on the maximum amount of coupling. As the frequency of oscillation
deviates from the resonance frequency of the stand-alone oscillator, the
Q of the tank at the frequency of oscillation reduces and the phase noise
is degraded.
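The phasor argument can be turned into a frequency estimate under a simple model. The sketch assumes that the coupling current is exactly 90° from the core current with relative magnitude m, so the sum phasor is rotated by atan(m), and that a parallel RLC tank of quality factor Q must supply the opposite phase, which requires Q(x - 1/x) = m with x the ratio of the oscillation frequency to the tank resonance. The upward-shift root is reported; depending on the coupling polarity the frequency may instead move downward by a similar amount. The function name and values are illustrative.

```python
import math

def quadrature_freq_ratio(coupling_ratio: float, tank_q: float) -> float:
    """Oscillation frequency / tank resonance for a coupled quadrature oscillator.

    Solves tank_q * (x - 1/x) = coupling_ratio for the positive root x, i.e. the
    frequency at which a parallel RLC tank supplies a phase shift of
    atan(coupling_ratio) to cancel the rotation caused by the coupling phasor.
    """
    k = coupling_ratio / tank_q
    return (k + math.sqrt(k * k + 4.0)) / 2.0

Q_TANK = 5.0  # hypothetical tank quality factor
for m in (0.1, 0.3, 0.5):
    x = quadrature_freq_ratio(m, Q_TANK)
    print(f"coupling {m:.1f}: offset {100 * (x - 1):.2f}% "
          f"(small-offset estimate m/2Q = {100 * m / (2 * Q_TANK):.2f}%)")
```

For small offsets the shift is approximately m/(2Q), so wide tuning requires strong coupling, and the tank then operates away from its resonance, where its effective Q and the phase noise degrade.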
The quadrature oscillator can also be used to generate a differential
signal at twice the frequency [25]. As shown in Fig. 3.21, the fully
differential topology allows for the possibility of sensing the common-source
nodes as the output at twice the frequency. The common-mode node
must be followed by proper buffering stages to ensure reasonable swings.
Another implementation of coupled oscillators is the circuit of [26].
Shown in Fig. 3.22, the circuit consists of a number of cross-coupled
oscillators that are placed in a ring. The idea is to improve phase noise
by providing a higher amount of noise filtering through several high-Q
tanks. If $n$ oscillators are cascaded in a loop, the output noise filtering
goes up by a factor of $n^2$. However, since the number of noise sources
increases proportionally with $n$, it can be assumed that the output noise
power density reduces by a factor of $n$. Meanwhile, the signal at the
oscillation frequency is amplified by a factor of $n$ and its power scales
up by a factor of $n^2$. As a result, it can be assumed that the phase
tuning range and phase noise becomes very difficult. The quality of the
inductors degrades as they operate at speeds close to their self-resonance
frequency. Variable capacitors added at the output of the oscillator to
provide tuning deteriorate the integrity of the output signal of oscillators.
An alternative solution for implementation of oscillators at very high
speeds is the distributed oscillator. The oscillator is formed by connect-
ing the output of a distributed amplifier to its input. Design of these
oscillators has advanced significantly in recent years [28, 29].
where is the gain of the phase detector, and is the input phase
difference.
Although this simple approach proves to be useful for applications
where the two inputs have identical frequencies and different phases,
it falls short in providing frequency error information as the two input
frequencies start to grow apart from each other.
The reason is that if the two frequencies are not equal, the detector
generates a beat signal whose average value is zero (Fig. 3.25).
The beat signal can still provide useful information about the phase
and frequency difference if the two frequencies are only slightly different.
To improve the capture range of the phase detector, modern phase-locked
systems incorporate additional means of frequency acquisition.
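The zero-average beat can be reproduced with a multiplier-type phase detector model. The sketch below is illustrative: it assumes sinusoidal inputs at normalized frequencies, a fixed phase of π/3, and an averaging window of exactly one beat period (20 time units for a 5% frequency error); the detector in Fig. 3.25 may be a different circuit, but the averaging argument is the same.

```python
import math

def mixer_pd_average(f1: float, f2: float, phase: float, duration: float, fs: float) -> float:
    """Average output of a multiplier phase detector driven by cos(2*pi*f1*t)
    and cos(2*pi*f2*t + phase), sampled at fs over the given duration."""
    n = int(round(duration * fs))
    total = 0.0
    for i in range(n):
        t = i / fs
        total += math.cos(2 * math.pi * f1 * t) * math.cos(2 * math.pi * f2 * t + phase)
    return total / n

same = mixer_pd_average(1.0, 1.0, math.pi / 3, duration=20.0, fs=100.0)
beat = mixer_pd_average(1.0, 1.05, math.pi / 3, duration=20.0, fs=100.0)
print(f"equal frequencies:  {same:.4f} (0.5*cos(phase) = {0.5 * math.cos(math.pi / 3):.4f})")
print(f"5% frequency error: {beat:.4f}")
```

With equal frequencies the average output is $\tfrac{1}{2}\cos\phi$, a usable phase error; with a frequency error it swings through a full beat cycle and averages to zero, so the loop receives no net frequency information.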
A circuit that can detect both phase and frequency difference proves
extremely useful because it significantly increases the acquisition range
and lock speed of PLLs. The sequential phase/frequency detector (PFD)
provides a large capture range for periodic waveforms [31]. Figure
3.26(a) shows the implementation of this circuit and the corresponding
waveforms when the two inputs have different frequencies and phases.
If the frequency of input A is greater than that of input B, then the PFD
samples its input on the rising edge of the clock and the other one
samples it at the falling edge. As shown in the waveforms, if the three
signals, A, and are applied to the two XOR gates, the resulting
signals will have the property of linear phase detectors. One will carry
a pulse for every transition of the data with a width proportional to the
phase difference between the clock and the data. The other one will have
pulses as wide as half the clock period.
An important feature of the Hogge phase detector is the automatic
retiming of the incoming sequence. In the locked condition, the zero
crossings of the clock signal appear in the middle of a bit, meaning that
the clock samples the bit at its optimum point.
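The operation of the Hogge detector can be reproduced with a sample-level model. The sketch below assumes ideal edge-triggered flip-flops, a random bit pattern treated as periodic, and a full-rate clock whose rising edge sits clk_offset samples into each bit; the first flip-flop retimes the data on rising edges and the second samples its output on falling edges. The function name and parameters are illustrative.

```python
import random

def hogge_pd_average(bits, spb, clk_offset):
    """Sample-level model of the Hogge phase detector.

    bits: NRZ bit pattern, treated as periodic; spb: samples per bit (even);
    clk_offset: sample index of the rising clock edge within each bit (0..spb-1).
    Returns the average of (XOR1 - XOR2) pulse widths per data transition, in samples.
    """
    n = len(bits) * spb
    half = spb // 2

    def data(i):
        return bits[(i % n) // spb]

    def q1(i):  # first flip-flop: samples the data on rising edges k*spb + clk_offset
        k = (i - clk_offset) // spb
        return data(k * spb + clk_offset)

    def q2(i):  # second flip-flop: holds q1 as sampled on the falling edges, half a clock later
        k = (i - clk_offset - half) // spb
        return data(k * spb + clk_offset)

    xor1 = sum(data(i) ^ q1(i) for i in range(n))  # pulse width tracks the phase error
    xor2 = sum(q1(i) ^ q2(i) for i in range(n))    # fixed half-period reference pulses
    transitions = sum(bits[j] != bits[j - 1] for j in range(len(bits)))
    return (xor1 - xor2) / transitions

random.seed(3)
bits = [random.randint(0, 1) for _ in range(200)]
for offset in (30, 50, 80):
    print(offset, hogge_pd_average(bits, spb=100, clk_offset=offset))
```

The average difference per transition equals clk_offset - spb/2, so it vanishes exactly when the rising clock edge sits in the middle of the bit, and the first flip-flop then provides the retimed data.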
control the first and second down-ramp. This indicates that each phase
measurement persists for two clock periods and that the charge-pump
contributions of the four outputs cancel each other, so the triwave
transient has a net area of zero. This effect significantly reduces the
pattern-dependent jitter at the output of the CDR circuit.
repeatedly moves back and forth around the zero crossings of the data.
This is in contrast to linear phase detectors, whose output goes to zero
in phase lock. This characteristic of binary phase detectors can
inherently lead to higher charge-pump activity, possibly increasing the
clock jitter.
One of the most commonly-used binary phase detectors is the circuit
presented by Alexander [33], in which the zero crossings of the data are
measured as early or late events when compared with the transitions of
the clock signal. Similar to the Hogge phase detector, the structure of
the Alexander phase detector allows for automatic retiming of the data.
During any particular clock interval, this phase detector provides
three binary samples of the data signal: the previous bit (A), a sam-
ple of the current bit at the zero crossing (B), and the current bit (C)
(Fig. 3.30(a)). Figures 3.30(b),(c) depict the value of these samples for
the early and late clocks, respectively. The retimed data can be taken
A ≠ B = C if the clock is late.
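In logic form, and assuming that B is taken at the nominal data transition while A and C are taken mid-bit, the decision reduces to comparing B with its neighbors: when the clock is early, B still equals the previous bit A; when it is late, B already equals the current bit C; when A = C there is no transition and hence no phase information. The Python sketch below encodes this rule; the helper name and the sign convention (+1 for an early clock) are assumptions.

```python
def alexander_pd(a, b, c):
    """Early/late decision from three data samples (hypothetical helper).

    a: previous bit, b: sample at the nominal data transition, c: current bit.
    Returns +1 if the clock is early, -1 if it is late, 0 if no transition occurred.
    """
    if a == c:
        return 0   # no transition: the samples carry no phase information
    if b == a:
        return 1   # B still shows the old bit, so it was taken before the transition
    return -1      # B already shows the new bit, so it was taken after the transition


# Each decision is only a sign, which is why the detector behaves as a bang-bang element.
samples = [(0, 0, 1), (1, 0, 0), (0, 1, 1), (1, 1, 1)]
print([alexander_pd(*s) for s in samples])
```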
clock signal applied to its input and the data signal applied to its trigger.
Figures 3.32(b) and (c) depict the waveforms for the two cases when
the clock is early or late. Using both the rising and falling edges
doubles the correction rate of the CDR circuit. This can result in
smaller output jitter because the VCO phase is corrected more often.
Since the bang-bang nature of this phase detector creates significant
ripple on the control line in the locked condition and hence produces
large jitter at the VCO output, the latches forming the flipflops can be
replaced by sample-and-hold circuits, making the phase detector
characteristic more linear. In [35], the phase
detector is formed as a master-slave sample-and-hold circuit (Fig. 3.33).
The rising data transitions sample the instantaneous value of the VCO
output. The circuit thus generates an output that is linearly propor-
tional to the phase difference in the vicinity of the lock point.
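The near-linear behavior around lock can be illustrated with a Python sketch that assumes a sinusoidal VCO output of amplitude A and a lock point at one of its zero crossings, so that a data edge arriving with phase error φ samples A·sin φ. The helper names are hypothetical; the sketch only shows how quickly the characteristic departs from the ideal linear gain A.

```python
import math


def sampled_vco(phase_error_rad, amplitude=1.0):
    """Value of a sinusoidal VCO output sampled by a data edge.

    The lock point is taken at a VCO zero crossing, so the sample is
    amplitude * sin(phase error).
    """
    return amplitude * math.sin(phase_error_rad)


def linearity_error(phase_error_rad, amplitude=1.0):
    """Relative deviation from the ideal linear characteristic amplitude * phase."""
    return 1.0 - sampled_vco(phase_error_rad, amplitude) / (amplitude * phase_error_rad)


for phi in (0.05, 0.1, 0.3, 0.5):
    print(f"{phi:.2f} rad: output {sampled_vco(phi):.4f}, deviation {100 * linearity_error(phi):.2f}%")
```

Within about ±0.1 rad of the lock point the deviation stays below 0.2%, consistent with the "linear in the vicinity of the lock point" behavior described above.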
the data when the circuit turns on and the VCO starts to oscillate at a
frequency that is very different from the data rate.
This limitation calls for an aided acquisition mechanism. Various
frequency detection schemes have been introduced that operate with or
without a reference signal. The idea is that as the circuit turns on, the
frequency detector pushes the VCO frequency to a value close to the
data rate. When the difference between the oscillation frequency and
the data rate is small enough to fall in the capture range of the phase
detector, the frequency detector is disabled and the phase detector takes
over. Eventually in the phase-lock condition, the phases of clock and
data signals are within a constant offset from each other, ensuring that
the clock frequency equals the data rate.
We describe a number of mechanisms for referenced and referenceless
frequency acquisition. Similar to the phase detectors, the frequency de-
tectors can operate with a full-rate or half-rate clock. We briefly review
a number of full-rate frequency detection schemes in this section. In
chapter 6, a new approach for half-rate frequency detection is described.
Because of this decomposition, the VCO gain from the fine control to the
output can be very small. A smaller VCO gain translates the ripple on
the control line into a smaller amount of output jitter. Meanwhile, the
coarse control guarantees phase lock over a very wide frequency range.
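The benefit of a small fine-control gain can be quantified with a first-order estimate, sketched below in Python. It assumes a sinusoidal ripple of amplitude Vr at frequency fr on the control line, which frequency-modulates the VCO. The ripple amplitude and frequency and the 5-GHz VCO frequency are hypothetical, and the two gains are borrowed from the fine and coarse values quoted for the VCO of Chapter 5 purely for comparison.

```python
import math


def peak_jitter_s(kvco_hz_per_v, ripple_v, ripple_freq_hz, f_vco_hz):
    """Peak timing deviation caused by sinusoidal ripple on the VCO control line.

    The ripple frequency-modulates the VCO: the peak frequency deviation is
    kvco * ripple, the peak phase deviation is that deviation divided by the
    ripple frequency (radians), and dividing by 2*pi*f_vco converts it to time.
    """
    peak_deviation_hz = kvco_hz_per_v * ripple_v
    peak_phase_rad = peak_deviation_hz / ripple_freq_hz
    return peak_phase_rad / (2 * math.pi * f_vco_hz)


# Hypothetical 1-mV ripple at 1 MHz on a 5-GHz VCO.
fine = peak_jitter_s(150e6, 1e-3, 1e6, 5e9)
coarse = peak_jitter_s(2.5e9, 1e-3, 1e6, 5e9)
print(f"fine path: {fine * 1e12:.2f} ps, coarse path: {coarse * 1e12:.2f} ps")
```

For the same ripple, the timing deviation scales directly with the gain seen by the ripple, so in this example the fine path produces about 17 times less jitter than the coarse path would.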
The two oscillators used in this circuit should be placed far apart from
each other; otherwise, injection locking between the two VCOs can result
in false lock. In theory, identical oscillators provide equal oscillation
frequencies for equal control voltages. However, inherent mismatches
between the two VCOs can be significant precisely because they must be
placed relatively far apart. For this reason, the CDR circuit should
use a means of narrowband frequency detection to achieve phase lock for
small frequency mismatches.
On the other hand, the circuit described in [37] consists of a single
VCO, a phase detector, and a frequency detector. The phase detector
and the frequency detector are connected to the loop filter through a
multiplexer (Fig. 3.35). When the circuit turns on, the multiplexer ac-
tivates Loop I and the circuit locks to the reference clock. Then the
multiplexer switches to the other mode and Loop II is activated. As the
loop locks to the random data, the frequency detector is turned off, re-
ducing the power consumption. The operating mode of the multiplexer
is determined by a lock detector that measures the frequency difference
between the reference clock and VCO frequency. It can be formed as
a counter that counts the number of pulses on one of the signals when
clocked by the other one.
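Such a counter-based lock detector can be sketched in Python as follows. The reference frequency, divide ratio, counting window, and tolerance are hypothetical; integer arithmetic models a counter that registers only complete VCO periods.

```python
def vco_count(f_vco_hz: int, f_ref_hz: int, n_ref_cycles: int) -> int:
    """Complete VCO periods counted while the counter is gated by n_ref_cycles reference periods."""
    return (f_vco_hz * n_ref_cycles) // f_ref_hz


def frequency_locked(f_vco_hz: int, f_ref_hz: int, ratio: int,
                     n_ref_cycles: int = 1024, tolerance: int = 2) -> bool:
    """Declare lock when the count is within `tolerance` of ratio * n_ref_cycles."""
    expected = ratio * n_ref_cycles
    return abs(vco_count(f_vco_hz, f_ref_hz, n_ref_cycles) - expected) <= tolerance


# Hypothetical 156.25-MHz reference and a divide ratio of 32 (5-GHz target).
for f_vco in (5_000_000_000, 5_000_300_000, 5_010_000_000):
    print(f_vco, frequency_locked(f_vco, 156_250_000, 32))
```

With these numbers the window holds 32,768 VCO periods, so a tolerance of two counts corresponds to roughly 60 ppm; a longer window trades detection time for resolution.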
correlator depends on the existence of a tone at the data rate and the
random data does not contain a spectral component at this frequency,
edge detection should be performed.
In this circuit, edge detection is performed by differentiating and rec-
tifying the data. The circuit uses stacking to integrate the tasks of
differentiation, rectification, mixing, and low-pass filtering in one block.
This will result in substantial power reduction since the circuits reuse
the same bias current. Stacking also yields a smaller chip area since
routing between various stages can be eliminated.
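The need for edge detection can be checked numerically. The Python sketch below assumes ideal rectangular NRZ data and uses a first difference followed by an absolute value as a stand-in for the differentiator and rectifier; it compares the spectral component at the bit rate before and after this operation. The helper name and signal parameters are illustrative.

```python
import numpy as np


def bit_rate_tone(signal, n_bits):
    """Magnitude of the DFT bin at the bit rate (signal spans exactly n_bits bit periods)."""
    return abs(np.fft.fft(signal)[n_bits])


rng = np.random.default_rng(0)
n_bits, samples_per_bit = 1000, 8
bits = rng.choice([-1.0, 1.0], size=n_bits)
nrz = np.repeat(bits, samples_per_bit)
# "Differentiate" with a circular first difference, then "rectify" with abs().
edges = np.abs(nrz - np.roll(nrz, 1))
print(bit_rate_tone(nrz, n_bits), bit_rate_tone(edges, n_bits))
```

The NRZ waveform has a spectral null at the bit rate, whereas the rectified edge signal contains a discrete tone there whose amplitude grows with the number of data transitions.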
The quadricorrelator can also be implemented using digital elements
to produce a binary error signal. Two examples are the rotational
frequency detector [41] and the circuit introduced by Pottbacker [34].
The operation of the rotational frequency detector can be described
as follows: In the presence of a frequency difference between the data
and the clock, the phase relationship will change with time at a rate
proportional to the frequency difference, producing a beat frequency. A
circular phasor diagram of the oscillator signal can be used to express the
concept (Fig. 3.38). The diagram is divided into four quadrants, A, B, C,
and D. For simplicity, the phasor for the clock is assumed to be constant,
serving as a reference, and the phasor for the data moves around the
circle. The direction of this rotation determines whether the data rate
is faster or slower than the clock frequency.
When the clock frequency is lower than the data rate, the data phasor
rotates counterclockwise. The direction of rotation can be distinguished
by noting the two consecutive quadrants in which the phasor is detected.
For example, as the phasor moves from B to C, the clock is found to be
fast, whereas a transition from C to B denotes a slow clock.
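The quadrant bookkeeping of the rotational frequency detector can be sketched in Python. The quadrant labels (A through D in order of increasing angle of the clock phase relative to the data) and the output sign are assumptions chosen so that a B-to-C move indicates a fast clock, as in the text. The sampling step must be short enough that the phase advances by less than one quadrant per sample; frequencies are in normalized units.

```python
import math

QUADRANTS = "ABCD"


def quadrant(theta):
    """Hypothetical labeling: A = [0, 90), B = [90, 180), C = [180, 270), D = [270, 360) degrees."""
    index = int((theta % (2 * math.pi)) // (math.pi / 2)) % 4
    return QUADRANTS[index]


def sample_quadrants(f_clock, f_data, n_samples, t_step):
    """Quadrant of the clock phase relative to the data at each sampling instant.

    theta advances at 2*pi*(f_clock - f_data), so it increases when the clock is fast.
    """
    return [quadrant(2 * math.pi * (f_clock - f_data) * k * t_step) for k in range(n_samples)]


def rotational_fd(quadrants):
    """+1 for every B->C move (clock fast), -1 for every C->B move (clock slow)."""
    score = 0
    for prev, cur in zip(quadrants, quadrants[1:]):
        if (prev, cur) == ("B", "C"):
            score += 1
        elif (prev, cur) == ("C", "B"):
            score -= 1
    return score


fast = rotational_fd(sample_quadrants(1.01, 1.0, 1000, 1.0))
slow = rotational_fd(sample_quadrants(0.99, 1.0, 1000, 1.0))
print(fast, slow)
```

Each full rotation of the beat contributes one count, so over a fixed observation window the magnitude of the output grows with the frequency difference while its sign gives the direction.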
In the Pottbacker frequency detector, shown in Fig. 3.32, two beat
signals, at a frequency equal to the difference between the clock
frequency and the data rate, are generated at the outputs, one leading
the other. The direction of the frequency difference can be determined
from the relative spacing of these two signals: if the first leads the
second, the clock is slow, and if it lags, the clock is fast. The relative
spacing of the two signals is extracted using a DETFF in which one beat
signal samples the other.
CDR loops employing frequency detectors that operate with random
data exhibit only a moderate capture range, not exceeding of
the center frequency. This limitation can be explained with the aid
of the characteristic plotted in Fig. 3.39 for the frequency detector of
Fig. 3.32(a). We note that for a large difference between the data rate
and the VCO frequency, the average output is close to zero, carrying
1. Introduction
This interface is used in a cryogenic radar system (Fig. 4.1). The
received radar signal is converted to a digital bit stream by means of a
Josephson junction analog-to-digital converter. The resulting output is a
pseudo-differential return-to-zero (RZ) signal with a bit rate of 1.2 Gb/s
and an amplitude of The interface must convert this serial data
into 8 parallel streams each having a peak-to-peak amplitude of 1 V.
The principal challenge in this design is the combination of high speed
and low signal levels in a moderate technology such as CMOS
process. To appreciate this challenge, we describe some of the important
issues in the design of the interface, which consists of a pre-amplifier
and a decision circuit.
For bandwidth calculations, an RZ signal can be roughly considered
as a nonreturn-to-zero (NRZ) signal with twice the bit rate. In or-
der to suppress intersymbol interference (ISI) in a broadband system,
the bandwidth should be at least 70% of the (equivalent NRZ) bit rate,
about 1.7 GHz in this system. Increasing the bandwidth beyond this point
reduces the intersymbol interference but at the cost of higher power dis-
sipation and, more importantly, higher total integrated thermal noise.
In other words, there exists a trade-off between signal integrity and sen-
sitivity. The bandwidth is chosen to be approximately equal to 2 GHz
as a compromise between these two factors.
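The bandwidth figures above follow from simple arithmetic, reproduced in the Python sketch below. The 70% rule and the treatment of RZ data as NRZ at twice the bit rate come from the text; modeling the noise as white and the bandwidth as an ideal brick-wall filter is an assumption used only to show how the rms noise scales.

```python
import math


def min_bandwidth_hz(rz_bit_rate_bps, fraction=0.7):
    """RZ treated as NRZ at twice the bit rate, then the 70% rule of thumb."""
    return fraction * 2 * rz_bit_rate_bps


def relative_rms_noise(bw_hz, reference_bw_hz):
    """RMS noise ratio for white noise through ideal bandwidths: sqrt of the ratio."""
    return math.sqrt(bw_hz / reference_bw_hz)


bw_min = min_bandwidth_hz(1.2e9)                  # about 1.7 GHz
noise_penalty = relative_rms_noise(2e9, bw_min)   # cost of choosing 2 GHz instead
print(f"minimum bandwidth = {bw_min / 1e9:.2f} GHz")
print(f"rms noise increase at 2 GHz = {100 * (noise_penalty - 1):.1f}%")
```

Under these assumptions, widening the bandwidth from 1.68 GHz to 2 GHz raises the integrated rms noise by roughly 9%, which is the sensitivity side of the trade-off.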
To achieve acceptable ISI, the amplifier should have a transfer function
with small ripples in magnitude and a linear phase across this bandwidth.
To overcome the offset of the decision circuit and provide enough
overdrive voltage, the amplifier must exhibit sufficient gain, on the order
of 40 dB.
Finally, to obtain a bit error rate (BER) of the input-referred
noise density must be lower than This is an important
concern because the bandwidth requirement does not allow a large gain
in the first stage of this amplifier, making the noise of the following
stages significant.
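The link between a BER target and a noise budget can be sketched in Python under the usual textbook assumptions: Gaussian noise, equiprobable bits, and a decision threshold midway between the two levels, so that BER = Q(Vpp/2σ). The BER target and signal swing below are hypothetical, since the values in the text are not recoverable.

```python
import math


def ber_from_q(q):
    """Bit error rate for Gaussian noise: Q(q) = 0.5 * erfc(q / sqrt(2))."""
    return 0.5 * math.erfc(q / math.sqrt(2))


def q_for_ber(target_ber, lo=0.0, hi=20.0, iterations=100):
    """Invert ber_from_q by bisection; ber_from_q decreases monotonically with q."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if ber_from_q(mid) > target_ber:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def max_rms_noise(v_pp, target_ber):
    """Largest rms noise for which a binary signal of swing v_pp meets target_ber.

    Assumes the threshold sits midway between the two levels, so q = v_pp / (2 * sigma).
    """
    return v_pp / (2 * q_for_ber(target_ber))


target = 1e-12                                 # hypothetical BER target
print(f"required Q = {q_for_ber(target):.2f}")
print(f"rms noise limit for a 1-mV swing = {max_rms_noise(1e-3, target) * 1e6:.1f} uV")
```

Dividing such an rms limit by the square root of the noise bandwidth gives the corresponding bound on a white input-referred noise density.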
Simulations indicate that it is difficult to simultaneously satisfy all
of these requirements in a CMOS technology, even if power dis-
sipation is not critical. As a result, a means of relaxing the trade-offs
between these parameters must be introduced.
2. Matched Filtering
First, we assume that a stream of rectangular pulses with amplitude A
and period experiences additive noise and subsequently goes through
a low-pass system with bandwidth (Fig. 4.2(a)).
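The benefit of matched filtering for rectangular pulses can be illustrated with a simplified Python simulation. It assumes discrete-time white Gaussian noise and uses integrate-and-dump, the matched filter for a rectangular pulse, against a detector that takes a single sample per bit; the low-pass system of Fig. 4.2(a) itself is not modeled. All parameter values are hypothetical.

```python
import numpy as np


def snr(decisions):
    """Mean over standard deviation of sign-corrected decision variables."""
    return decisions.mean() / decisions.std()


def compare_detectors(amplitude=1.0, sigma=1.0, samples_per_bit=16, n_bits=20_000, seed=0):
    """SNR of single-point sampling versus integrate-and-dump for rectangular pulses."""
    rng = np.random.default_rng(seed)
    bits = rng.choice([-1.0, 1.0], size=n_bits)
    clean = np.repeat(amplitude * bits, samples_per_bit)
    noisy = clean + rng.normal(0.0, sigma, size=clean.size)
    frames = noisy.reshape(n_bits, samples_per_bit)
    single = frames[:, samples_per_bit // 2] * bits    # sample once, mid-bit
    integrated = frames.sum(axis=1) * bits             # integrate over the whole bit
    return snr(single), snr(integrated)


snr_single, snr_integrated = compare_detectors()
print(f"single sample: {snr_single:.2f}, integrate-and-dump: {snr_integrated:.2f}")
```

Integration adds the signal coherently while the noise adds in power, so with 16 independent samples per bit the amplitude SNR improves by about sqrt(16) = 4.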
A CMOS Interface for Detection of 1.2-Gb/s RZ Data 63
3. Architecture
Figure 4.6 depicts the interface architecture [43]. The input signal is
When one matched filter resets, the other integrates. Each decision
circuit begins to sample at the end of the integration mode, producing
a logical level at the output.
4. Building Blocks
4.1. Low-Noise Wideband Amplifier
The wideband amplifier in this interface must boost the signal level
with minimal ISI. This amplifier consists of 7 stages: The first is a
common gate topology and the following six are common-source stages
(Fig. 4.7(a)-(c)).
the switches in the matched filter. The eye opening is about 70% and
the jitter is about 40 ps. The signal slews with a slope of 1 V/ns.
In order to integrate the inductors in a reasonable area, a stacked
structure consisting of metal 2 and metal 3 has been used (Fig. 4.10).
Since in this circuit the self-resonance frequency is more critical than
the quality factor, the line width is only The values range from
11 nH to 17 nH.
circuit of Fig. 4.11 (a), and convert the input voltage to current
and the result flows through the total capacitance at nodes X and Y.
Simulations indicate that the parasitic capacitance already available at
nodes X and Y is sufficient to allow fast integration with considerable
voltage swings.
The dimensions of the input transistors are chosen as a compromise
between input-referred offset (around 10 mV) and speed. Each matched
filter is biased at a total current of 1 mA and provides a voltage gain of
approximately two at the end of the integrate phase.
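The gain of such an integrator follows from charge balance: a transconductance gm driven by a constant input delivers a current gm·Vin that charges the node capacitance C for the integration time T, so the output changes by gm·Vin·T/C. The Python sketch below evaluates this ideal relation; the transconductance value is hypothetical, and output conductance, reset, and the RZ pulse shape are ignored.

```python
def integrator_gain(gm_s, t_int_s, c_farad):
    """Voltage gain of an ideal transconductor integrating onto a capacitor: gm * T / C."""
    return gm_s * t_int_s / c_farad


def capacitance_for_gain(gm_s, t_int_s, gain):
    """Capacitance that yields `gain` at the end of an integration window t_int."""
    return gm_s * t_int_s / gain


bit_period = 1 / 1.2e9            # integration over one 1.2-Gb/s bit period
gm = 2e-3                         # hypothetical transconductance of the input pair
c_needed = capacitance_for_gain(gm, bit_period, 2.0)
print(f"C for a gain of two: {c_needed * 1e15:.0f} fF")
```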
As shown in Fig. 4.11(b), the RZ bit stream amplified by the pre-
amplifier is applied to the input and is integrated for one bit period.
The reset switch, an NMOS device, resets the integrator at the end
of the integration phase. Its dimensions are chosen as a trade-off
between faster reset and less parasitic capacitance at the output.
In addition, common-mode feedback and a hold phase must be added to
this circuit.
As shown in Fig. 4.12(a), the common-mode feedback consists of two
relatively large resistors, which are implemented by small
PMOS devices. The hold mode is controlled by a switch that disables
the differential pair at the end of the integration mode and freezes
the output voltage. In reality, the output does experience a small
degradation (Fig. 4.12(b)). The dip seen in the hold mode results from
the relatively large capacitance from the sources of the input devices
to ground.
Matched filtering along with interleaving provides a hold period, mak-
ing the sampling less susceptible to jitter. Proper choice of device di-
mensions leads to a small input offset for the matched filter. However,
the devices cannot be very large because of the limitations on speed and
loading on the previous stage. The output voltage of the matched fil-
ter drops during the hold mode. To alleviate this problem, the decision
circuit starts to sample the output of the matched filter at the start
of the hold mode rather than at the end of it. The non-square wave-
form of the input slightly degrades the improvement in SNR because the
corresponding matched filter is not exactly an integrator.
4.3. Demultiplexer
The matched filter is followed by a master-slave D flipflop (Fig. 4.13).
To achieve short set-up and hold times, the flipflops use current steering
with 2 V of differential swing. Each latch consists of a pre-amplifier that
senses the amplitude during half the period and a regenerative circuit
that boosts the level of the signal. Each latch uses a bias current of
1 mA and load resistors of
filters. The inverters are sized to drive the load capacitance with short
rise and fall times.
Since the I and Q clock signals experience different loading, these
inverters are made large enough to maintain reasonable matching in the
interface environment. In this design, rail-to-rail swings are preferred
because they can switch nodes with different dc levels.
A resistor serves as a termination. This resistor also equalizes the
mismatch between the common-mode levels and the phases of the
signals applied to the inputs of the two gates.
5. Experimental Results
The interface has been fabricated in a CMOS technology. Fig-
ure 4.15 depicts the die photograph. The circuit occupies an area of
The circuit was tested with a 3.3-V supply in a
chip-on-board assembly.
6. Conclusion
The concepts introduced in this work can be applied to other
applications as well. For example, in a fiber-optic receiver (Fig. 4.17),
the current generated by the photodetector is amplified by a transimpedance
amplifier. We can then interpose interleaved matched filters between the
amplifier and the decision circuits to improve the SNR. Since the power
consumption and complexity of the matched filters can be quite low, the
boost in performance is obtained at minimal cost.
A CMOS interface for detection of 1.2-Gb/s RZ data incorporates
wideband amplification, matched filtering, and demultiplexing. Low-
noise amplifiers with bandwidths exceeding 2 GHz can be implemented
1. Architecture
The choice of the CDR architecture is primarily determined by the
speed and supply voltage limitations of the technology as well as the
power dissipation and jitter requirements of the system.
In a generic CDR circuit, shown in Fig. 5.1, the phase detector com-
pares the phase of the incoming data to the phase of the clock generated
by the voltage-controlled oscillator (VCO), producing an error that is
proportional to the phase difference between its two inputs. The error is
then applied to a charge pump and a low-pass filter so as to generate the
oscillator control voltage. The clock signal also drives a decision circuit,
thereby retiming the data and reducing its jitter.
If attempted in a CMOS technology, the architecture of
Fig. 5.1 poses severe difficulties for 10-Gb/s operation. Although it
benefits from aggressive device scaling, the CMOS process used in this work
provides marginal performance for such speeds. For example, even sim-
ple digital latches or three-stage ring oscillators fail to operate reliably
at these rates. These issues make it desirable to employ a “half-rate”
CDR architecture, where the VCO runs at a frequency equal to half of
the input data rate. The concept of half-rate clock has been used in
[44]-[47]. However, [44] and [45] incorporate a bang-bang phase detector
(PD), possibly creating large ripple on the control line of the oscilla-
tor and hence high jitter. The circuit reported in [46] inherently has a
smaller output jitter as a result of using a linear phase detector, but it
fails to operate at speeds above 6 Gb/s in CMOS technology.
The circuit of [47] benefits from a new linear phase detection scheme,
but it may not operate properly with certain data patterns.
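The half-rate sampling idea can be illustrated with a behavioral Python sketch: a clock at half the bit rate samples the incoming NRZ bits on both edges, so the rising edges collect the even-numbered bits and the falling edges the odd-numbered ones, producing two demultiplexed streams at half the input rate. Ideal sampling and a clock phase expressed in bit periods are assumptions of the sketch.

```python
def half_rate_sample(bits, clock_phase=0.5):
    """Sample NRZ bits with a half-rate clock on both edges (behavioral model).

    The clock period spans two bits; rising edges occur at (2k + clock_phase)
    bit periods and falling edges one bit period later (0 <= clock_phase < 1).
    Returns (rising-edge samples, falling-edge samples).
    """
    if not 0 <= clock_phase < 1:
        raise ValueError("clock_phase must lie in [0, 1)")
    n_pairs = len(bits) // 2
    rising = [bits[int(2 * k + clock_phase)] for k in range(n_pairs)]
    falling = [bits[int(2 * k + 1 + clock_phase)] for k in range(n_pairs)]
    return rising, falling


data = [1, 0, 1, 1, 0, 0, 1, 0]              # 10-Gb/s input, one entry per bit
even, odd = half_rate_sample(data)            # two 5-Gb/s output streams
print(even, odd)
```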
Another critical issue in the architecture of Fig. 5.1 relates to the
inherently unequal propagation delays for the two inputs of the phase
detector: Most phase detectors that operate properly with random data
(e.g., a D flipflop) are asymmetric with respect to the data and clock
inputs, thereby introducing a systematic skew between the two in phase-
lock condition. Since it is difficult to replicate this skew in the decision
circuit, the generic CDR architecture suffers from a limited phase margin
unless the raw speed of the technology is much higher than the data
rate.
The problem of the skew demands that phase detection and data
regeneration occur in the same circuit such that the clock still samples
the data at the midpoint of each bit even in the presence of a finite skew.
For example, the Hogge PD [32] automatically sets the clock phase to the
optimum point in the data eye, but it fails to operate properly with a
half-rate clock.
The above considerations lead to the CDR architecture shown in
Fig. 5.2. Here, a half-rate phase detector produces an error proportional
to the phase difference between the 10-Gb/s data stream and the 5-GHz
output of the VCO. Furthermore, the PD automatically retimes and de-
multiplexes the data, generating two 5-Gb/s sequences and
A 10-Gb/s Linear Half-rate CMOS CDR Circuit 79
2. Building Blocks
2.1. VCO
The design of the VCO directly impacts the jitter performance and
the reproducibility of the CDR circuit. While LC topologies achieve
a potentially lower jitter, their limited tuning range makes it difficult
to obtain a target frequency without design and fabrication iterations.
Since the circuit reported here was our first design in this technology,
a ring oscillator was chosen so as to provide a tuning range wide enough
to encompass process and temperature variations.
A three-stage differential ring oscillator [Fig. 5.4(a)] driving a buffer
operates no faster than 7 GHz in CMOS technology. The half-
rate CDR architecture overcomes this limitation, requiring a frequency
of only 5 GHz.
As shown in Fig. 5.4(b), each stage consists of a fast and a slow path
whose outputs are summed together. By steering the current between
the fast and the slow paths, the amount of delay achieved through each
stage and hence the VCO frequency can be adjusted. All three stages
in the ring are loaded by identical buffers to achieve equal rise and fall
times and hence improve the jitter performance. Figure 5.4(c) shows the
transistor implementation of each delay stage. The fast and slow paths
are formed as differential circuits sharing their output nodes. The tuning
is achieved by reducing the tail current of one and increasing that of the
other differentially. Since the low supply voltage makes it difficult to
control provides a gain of 150 MHz/V and the coarse control provides
2.5 GHz/V. The tuning range is 2.7 GHz (Fig. 5.6).
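The frequency relations behind this VCO can be sketched in Python. The ring relation f = 1/(2·N·td) is standard; the linear interpolation of the stage delay between the fast-path and slow-path delays is a hypothetical first-order model, and the path delays used below are illustrative. The default tuning gains are the fine (150 MHz/V) and coarse (2.5 GHz/V) values quoted above, applied in a linearized model.

```python
def ring_frequency_hz(n_stages, stage_delay_s):
    """Oscillation frequency of an N-stage ring oscillator: 1 / (2 * N * t_d)."""
    return 1.0 / (2 * n_stages * stage_delay_s)


def interpolated_delay_s(alpha, t_fast_s, t_slow_s):
    """Hypothetical first-order model of the delay-interpolating stage.

    alpha is the fraction of tail current steered into the fast path (0..1);
    the stage delay is assumed to move linearly between the two path delays.
    """
    return alpha * t_fast_s + (1 - alpha) * t_slow_s


def tuned_frequency_hz(f_center_hz, dv_fine, dv_coarse, k_fine=150e6, k_coarse=2.5e9):
    """Linearized two-input tuning with the fine and coarse gains quoted in the text."""
    return f_center_hz + k_fine * dv_fine + k_coarse * dv_coarse


# A 3-stage ring needs about 33 ps per stage to reach the 5-GHz half-rate clock.
print(ring_frequency_hz(3, 1 / 30e9))
# Sweep the hypothetical interpolation between 25-ps and 45-ps path delays.
for alpha in (0.0, 0.5, 1.0):
    print(alpha, ring_frequency_hz(3, interpolated_delay_s(alpha, 25e-12, 45e-12)) / 1e9)
```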
applied to the inputs of two sets of cascaded latches, each cascade con-
stituting a flipflop that retimes the data. Since the flipflops are driven
by a half-rate clock, the two output sequences and are the de-
for the charge pump. The operation of the XOR circuit is as follows. If
the two logical inputs are not equal, then one of the input transistors
on the left and one of the input transistors on the right turn on, thus
turning off. If the two inputs are identical, one of the tail currents
flows through Since the average current produced by the Error
XOR gate is half of that generated by the Reference XOR gate, transis-
tor is scaled differently, making the average output voltages equal
for zero phase difference. Channel length modulation of transistor
reduces the precision of current scaling between the two XOR gates.
This effect can be avoided by increasing the length of the device.
The gain of the phase detector is determined by the value of the resis-
tor and the tail current sources The voltage is generated on
chip in order to track the variations over temperature and process. This
voltage equals the output common-mode level of the latches preceding
the XOR gate. It is generated using a differential pair that is a replica
of the preamplifier section of the latch. Current source raises the
common-mode level of the differential signal formed by the Error and
Reference signals, making compatible with the input of the charge
pump.
It is instructive to plot the input/output characteristic of the PD to
ensure linearity and absence of a dead zone. This is accomplished by
obtaining the average values of Error and Reference while the circuit
operates at maximum speed. Figure 5.9 shows the simulated behavior
as the phase difference varies from zero to one bit period. The Reference
average exhibits a notch where the clock samples the metastable points
of the data waveform. The Error and Reference signals cross at a phase
difference approximately 55 ps from the metastable point, indicating
that the systematic offset between the data and the clock is very small.
The linear characteristic of the phase detector results in minimal charge
pump activity and small ripple on the control line in the locked condition.
The choice of the logic family used for the XOR gates and the latches
is determined by the speed and switching noise considerations. While
rail-to-rail CMOS logic achieves relatively high speeds, it requires am-
plifying the data swings generated by the stage preceding the CDR cir-
cuit (typically a limiting amplifier). Furthermore, CMOS logic produces
enormous switching noise in the substrate and on the supplies, disturb-
ing the oscillator considerably. For these reasons, the building blocks
incorporate current-steering logic. The phase detector incorporates an
input buffer with on-chip resistive matching.
where and are the gains of the VCO and PD, respectively,
and denotes the conversion gain of the charge pump. Equation (5.1)
can be used to determine the value of The amount of the jitter
3. Experimental Results
The CDR circuit has been fabricated in a CMOS process.
Figure 5.12 shows a photograph of the chip, which occupies an area of
ESD protection diodes are included for all pads except the
high-speed ones. Nonetheless, since all of these lines have a termi-
4. Conclusion
CMOS technology holds great promise for optical communication
circuits. The raw speed resulting from aggressive scaling, along with high
levels of integration, provides high performance at low cost. A 10-Gb/s
clock and data recovery circuit designed in CMOS technology
performs phase locking, data regeneration, and demultiplexing with 1 ps
of rms jitter.
Chapter 6
1. Introduction
The majority of the CDR circuits employ ring and LC oscillators to
generate a clock signal. Ring oscillators have been predominantly used to
implement systems operating at lower speeds, such as OC-3 and OC-12.
They provide a wide tuning range and differential control that makes the
circuit less susceptible to supply and substrate noise. Furthermore, they
benefit from a compact layout, easing the routing of high-speed signals,
and yielding a smaller area. The output jitter of these oscillators is small
enough to meet the OC-3 and OC-12 standard specifications.
As the data rate increases, the ring topology becomes an unattractive
candidate for the oscillator implemented in a CDR circuit. The most
important disadvantage is its limited signal integrity. Generation of a
robust clock signal at a high frequency using a ring oscillator is diffi-
cult. The maximum oscillation frequency achieved from a ring oscillator
2. Architecture
Because of the marginal performance of the technology used
in this work, similar to the circuit described in chapter 5, the clock
frequency is chosen to be half of the data rate. However, the previous
circuit suffers from a limited capture range because it lacks a means of
frequency detection.
Various techniques for performing frequency detection without a
reference clock have been introduced. However, such techniques rely on a
full-rate clock to obtain the frequency error signal.
In this work, a new approach to performing phase and frequency de-
tection using a half-rate clock is described. This technique both achieves
a high speed and automatically retimes the data.
Shown in Fig. 6.1, the CDR consists of a phase and frequency detector
(PFD), a voltage-controlled oscillator (VCO), a charge pump, and a low-
pass filter (LPF). The PFD compares the phase and the frequency of the
input data to that of a half-rate clock, providing two binary error signals
for phase and frequency. These error signals are fed back to the VCO
through the charge pump and the low-pass filter. After phase lock is
achieved, the phase of the output clock is within a small offset from
the phase of the input data. This guarantees that the clock frequency is
equal to one half of the input bit rate. The PFD is designed such that, in
addition to providing information about the phase error, it retimes the
data as well. Consequently, the CDR exhibits no systematic offset, i.e.,
inherent skews between clock and data edges due to their nonidentical
paths through the loop do not degrade the quality of detection. The
VCO provides multiple phases over the full tuning range.
3. Building Blocks
3.1. VCO
Shown in Fig. 6.2(a), the VCO consists of a four-stage differential ring
oscillator with LC-tuned loads, providing a tuning range wide enough to
encompass process and temperature variations. The number of stages is
chosen such that multiple clock phases with 45° of spacing required in
the PFD can be generated. The loop must have negative feedback at
low frequencies in order to provide multiple phases; otherwise, the four
signals will be in phase.
Figure 6.2(b) shows the implementation of each stage. The loads are
formed using spiral inductors and MOS varactors. In order to determine
the frequency of oscillation, we recognize that each stage in the ring
must provide 45° of phase shift for oscillation to occur. As shown in
Fig. 6.2(c), the load can be modeled by a parallel LC tank along with a
parallel resistor The major contributor to this resistive loss in the
tank is the limited Q of the inductor. Therefore, can be approximated
as Setting the phase shift of the parallel tank to 45°, we arrive at
the following equation.
A 10-Gb/s CMOS CDR Circuit with Wide Capture Range 99
As shown in [50], the model of Fig. 6.3 can be used to predict the self-
resonance frequency of the tank. Theoretically, it can be shown that the
effective capacitance of the inductor equals In reality,
this value is closer to
The VCO occupies a large chip area as a result
of having eight spiral inductors. Therefore, the metal lines carrying the
multi-phase clock signals are very long. These interconnects are laid out
using wide traces of the top metal layer in order to reduce the resis-
tance of the wire. This results in a large routing capacitance, since the
fringe capacitance of the top metal layer in a CMOS technology
is several times higher than its bottom-plate capacitance. If the buffers
following the VCO are placed before the interconnects, the parasitics will
introduce a large time constant at the buffer outputs, drastically reduc-
ing the voltage swing of the high-speed signal. To alleviate this problem,
the buffers are placed after the interconnects so that the parasitics can
A difference between the data rate and twice the clock frequency will
result in a beat frequency, formed as an alternating low-speed signal
at the output of the PD. The average frequency of this signal equals
the difference between the bit rate and twice the clock frequency. This signal,
however, is not sufficient to determine the polarity of the frequency error.
We therefore add a second PD to the circuit, whose structure is iden-
tical to the first PD. The only difference is that the in-phase and the
quadrature clock signals applied to this block lead their counterparts in
the other phase detector by 45°.
Figure 6.7 depicts the output of both PDs for two cases when clock
frequency is less or greater than half the data rate. From these wave-
forms, the following observations can be made:
If clock is slow, lags Therefore, if is sampled by the
rising and falling edges of the results are negative and positive,
respectively.
If clock is fast, leads Therefore, if is sampled by
the rising and falling edges of the results are the inverse of the
previous case.
We conclude that the modified DETFF can be used to extract the
frequency error signal from and
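The polarity extraction can be illustrated with an idealized Python model: the two PD outputs are replaced by ±1 square waves in quadrature at the beat frequency, and the modified DETFF is modeled as sampling one of them on both edges of the other while inverting the falling-edge samples. The function names, the sign convention for the frequency error, and the resulting output polarity are assumptions of this sketch and need not match those of Fig. 6.7 and Fig. 6.8.

```python
import math


def beat_pair(delta_f, t):
    """Idealized quadrature beat signals (+1/-1) of the two phase detectors.

    delta_f is the data rate minus twice the clock frequency (hypothetical sign
    convention); the second signal lags the first when delta_f > 0.
    """
    theta = 2 * math.pi * delta_f * t
    v1 = 1 if math.cos(theta) >= 0 else -1
    v2 = 1 if math.sin(theta) >= 0 else -1
    return v1, v2


def detff_frequency_error(delta_f, duration, dt):
    """Average output of a modified double-edge-triggered flipflop.

    V1 is sampled on every edge of V2; falling-edge samples are inverted so
    that both edge types report the same polarity.
    """
    decisions = []
    previous_v2 = None
    for k in range(round(duration / dt)):
        v1, v2 = beat_pair(delta_f, k * dt)
        if previous_v2 is not None and v2 != previous_v2:
            rising = v2 > previous_v2
            decisions.append(v1 if rising else -v1)
        previous_v2 = v2
    return sum(decisions) / len(decisions) if decisions else 0.0


# Slow clock (delta_f > 0) and fast clock (delta_f < 0) give opposite constant outputs.
print(detff_frequency_error(1e6, 10e-6, 1e-9), detff_frequency_error(-1e6, 10e-6, 1e-9))
```

Because both edges of the beat signal yield the same polarity, the output is a steady binary frequency-error signal rather than an alternating beat.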
Figure 6.8 shows the PFD structure, where and
lead by 45°, 90°, and 135°, respectively. The voltages and
are used as the phase error and frequency error signals.
bandwidth the buffer stages employ inductive peaking [43]. The value of
the inductors is chosen so as to avoid peaking in the passband. Since the
quality factor of the inductors is not critical here, the spiral structures
have a line width of only to achieve a high self-resonance frequency.
The value of the inductors ranges from 1.5 nH to 3.5 nH.
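To illustrate the constraint on the peaking inductors, the Python sketch below assumes the classic shunt-peaking load: a resistor R in series with an inductor L, in parallel with the node capacitance C. The text does not specify the topology beyond "inductive peaking", and the component values are hypothetical. Writing m = R²C/L, the response is maximally flat at m = 1 + sqrt(2) and shows no peaking for larger m, which bounds L at R²C/(1 + sqrt(2)).

```python
import math

import numpy as np


def shunt_peaked_gain(r, l, c, omega):
    """|Z(j*omega)| / R for a load of R in series with L, in parallel with C."""
    s = 1j * omega
    z = (r + s * l) / (1 + s * r * c + s**2 * l * c)
    return np.abs(z) / r


def peaking_db(r, l, c, points=20_001):
    """Maximum gain above the low-frequency value, in dB, over a sweep around 1/RC."""
    omega = np.logspace(-3, 2, points) / (r * c)
    return 20 * math.log10(shunt_peaked_gain(r, l, c, omega).max())


def max_l_without_peaking(r, c):
    """Largest inductance that keeps the response monotonic: L = R^2 C / (1 + sqrt(2))."""
    return r**2 * c / (1 + math.sqrt(2))


r_load, c_node = 200.0, 150e-15      # hypothetical buffer load and node capacitance
l_max = max_l_without_peaking(r_load, c_node)
print(f"L_max = {l_max * 1e9:.2f} nH, peaking = {peaking_db(r_load, l_max, c_node):.2e} dB")
print(f"L = R^2 C / sqrt(2) gives {peaking_db(r_load, r_load**2 * c_node / math.sqrt(2), c_node):.2f} dB")
```

Increasing L beyond the bound extends the bandwidth further but at the cost of passband peaking, roughly 1.5 dB at m = sqrt(2).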
4. Loop Characterization
Figure 6.12(a) depicts a linear small-signal model of the CDR circuit,
in the vicinity of the lock point. In chapter 5, the 3-dB bandwidth of
the transfer function from the input of the phase detector to the VCO
output and the amount of jitter peaking in this system were calculated.
That calculation assumes that the loop filter consists only of a series
resistor and capacitor (Fig. 6.12(b)). This simple model can also be used for
determination of the 3-dB bandwidth of the closed-loop VCO’s phase
noise characteristic Shown in Fig. 6.12(c), this bandwidth can be
approximated if the loop is heavily overdamped and the jitter transfer
function has no peaking.
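The bandwidth and peaking of this simple model can also be evaluated numerically. The Python sketch below uses the closed-loop jitter transfer of a loop with a series R-C filter, H(s) = K(RCs + 1)/(Cs² + KRCs + K), where K is a hypothetical lumped gain combining the phase detector, charge pump, and VCO; the damping factor is ζ = (R/2)·sqrt(KC). In the heavily overdamped case the response reduces approximately to a single pole at ω ≈ K·R, which the example compares with the exact numerical result. This approximation is our own reasoning for this model, not necessarily the expression used in the text, and all values are in normalized units.

```python
import numpy as np


def jitter_transfer(k, r, c, omega):
    """H(j*omega) = K(RCs + 1) / (Cs^2 + KRCs + K) for a series R-C loop filter."""
    s = 1j * omega
    return k * (r * c * s + 1) / (c * s**2 + k * r * c * s + k)


def bandwidth_and_peaking(k, r, c, points=200_001):
    """Numerical -3-dB bandwidth (rad/s) and jitter peaking (dB) of the loop."""
    wn = np.sqrt(k / c)
    omega = np.logspace(np.log10(wn) - 4, np.log10(wn) + 4, points)
    mag = np.abs(jitter_transfer(k, r, c, omega))
    w_3db = omega[np.nonzero(mag < 1 / np.sqrt(2))[0][0]]
    peaking_db = 20 * np.log10(mag.max())
    return w_3db, peaking_db


# Hypothetical normalized loop: K = 1e9, C = 1, R chosen for a damping factor of 10.
k, c = 1e9, 1.0
r = 2 * 10 / np.sqrt(k * c)
w_3db, peaking = bandwidth_and_peaking(k, r, c)
print(f"zeta = {r / 2 * np.sqrt(k * c):.1f}")
print(f"numerical w_3dB = {w_3db:.4g} rad/s, overdamped estimate K*R = {k * r:.4g} rad/s")
print(f"jitter peaking = {peaking:.3f} dB")
```

The closed-loop transfer of the VCO phase noise is the complementary high-pass function 1 - H(s), whose corner also lies near K·R in the overdamped case. Note that the zero of this loop always leaves a small amount of peaking, even when ζ is large.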
Thermal noise enters the system at the input of the phase detector
and at the control input of the VCO. If the transfer functions from these
inputs to the output are represented as and respectively, then the
power spectral density of the output noise can be given as:
In this equation:
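The output-noise expression can be evaluated numerically with a minimal sketch that assumes the two thermal-noise sources are uncorrelated. Under that assumption their contributions add in power: S_out = |H_PD|² S_PD + |H_ctrl|² S_ctrl. Here H_PD = G/(1 + G) carries input-referred phase noise from the PD input, and H_ctrl = (K_VCO/s)/(1 + G) carries voltage noise from the VCO control node. The names are hypothetical, because the book's symbols are missing here. G(s) uses the same series R-C loop filter as the model above.

```python
def output_phase_noise_psd(w: float, s_pd: float, s_ctrl: float,
                           kpd: float, r: float, c: float, kvco: float) -> float:
    """Output phase-noise PSD (rad^2/Hz) at angular frequency w (rad/s).

    s_pd:   input-referred phase-noise PSD at the PD input (rad^2/Hz)
    s_ctrl: voltage-noise PSD at the VCO control node (V^2/Hz)
    The two sources are assumed uncorrelated, so their powers add.
    """
    s = 1j * w
    g = kpd * (r + 1 / (s * c)) * kvco / s   # open-loop gain
    h_pd = g / (1 + g)                       # low-pass: PD input -> output
    h_ctrl = (kvco / s) / (1 + g)            # band-pass: control node -> output
    return abs(h_pd) ** 2 * s_pd + abs(h_ctrl) ** 2 * s_ctrl
```

At low offsets, the loop passes PD-input noise to the output almost unchanged. Far above the loop bandwidth, control-node noise reaches the output through the free-running VCO gain K_VCO/ω.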
5. Experimental Results
The CDR circuit has been fabricated in a CMOS process.
Figure 6.13 shows a photograph of the chip, which occupies an area of
purity at the lower bound of its tuning range (-102.35 dBc/Hz at 1-MHz
offset). The open-loop VCO phase noise at 5 GHz is -86 dBc/Hz. The
tuning characteristic of the VCO varies by 1% over process.
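Phase-noise figures such as these are often converted to rms time jitter with σ_t = √(2∫L(f)df)/(2πf₀), where the factor 2 accounts for both sidebands. The sketch below applies that conversion numerically. The pure 1/f² profile, the reference point, and the integration band are assumptions for illustration, not measured data. A locked CDR's phase noise flattens inside the loop bandwidth, so a pure 1/f² profile likely overestimates the jitter.

```python
import math

def rms_jitter_s(l_dbc_hz, f_carrier_hz: float, f_lo_hz: float, f_hi_hz: float,
                 points: int = 20_000) -> float:
    """Convert SSB phase noise L(f) [dBc/Hz] to rms time jitter [s].

    rms phase = sqrt(2 * integral of L(f) df), integrated with the trapezoidal
    rule on a log-spaced grid because L(f) spans decades of offset frequency.
    """
    step = (f_hi_hz / f_lo_hz) ** (1.0 / (points - 1))
    freqs = [f_lo_hz * step**k for k in range(points)]
    lin = [10.0 ** (l_dbc_hz(f) / 10.0) for f in freqs]
    area = sum(0.5 * (lin[k] + lin[k + 1]) * (freqs[k + 1] - freqs[k])
               for k in range(points - 1))
    phase_rms_rad = math.sqrt(2.0 * area)
    return phase_rms_rad / (2.0 * math.pi * f_carrier_hz)

def one_over_f2_profile(l_ref_dbc: float = -107.0, f_ref_hz: float = 1e6):
    """Hypothetical profile falling 20 dB/decade through l_ref_dbc at f_ref_hz."""
    return lambda f: l_ref_dbc - 20.0 * math.log10(f / f_ref_hz)
```

For example, `rms_jitter_s(one_over_f2_profile(), 5e9, 50e3, 80e6)` integrates a hypothetical 1/f² profile around a 5-GHz clock over an assumed 50-kHz to 80-MHz band.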
Figure 6.15(a) shows the spectrum of the clock in response to a
9.95328-Gb/s data sequence of length The phase noise at 1-MHz
offset is approximately equal to -107 dBc/Hz. Figure 6.15(b) depicts the
recovered clock in the time domain. The jitter performance of the CDR
circuit is characterized by the Anritsu MP1777 jitter analyzer. A ran-
the jitter on the clock signal is not limiting the circuit’s performance,
as verified by the jitter tolerance experiment at a lower rate, and the
input does not impose a severe limitation on the bandwidth, the output
buffer probably results in the high BER. Improvement of the output
6. Conclusion
A 10-Gb/s clock and data recovery circuit designed in CMOS
technology performs frequency acquisition, phase locking, and data
regeneration. Achieving an rms jitter of 0.8 ps, this circuit is the first
CMOS CDR circuit to meet the jitter generation requirements defined
by SONET. Its power consumption is much lower than that of similar
circuits fabricated in bipolar or GaAs processes.
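The 0.8-ps figure can be put in perspective by expressing it in unit intervals (UI), where one UI lasts 1/(bit rate). The sketch below uses the 9.95328-Gb/s rate quoted in the measurements. The limit of 0.01 UI rms is the commonly cited SONET jitter-generation value and is treated here as an assumption to verify against the governing standard.

```python
def jitter_ui(jitter_rms_s: float, bit_rate_bps: float) -> float:
    """Express rms jitter (seconds) in unit intervals; one UI lasts 1/bit_rate."""
    return jitter_rms_s * bit_rate_bps

SONET_RMS_LIMIT_UI = 0.01  # assumed jitter-generation limit, rms UI

def meets_rms_limit(jitter_rms_s: float, bit_rate_bps: float,
                    limit_ui: float = SONET_RMS_LIMIT_UI) -> bool:
    return jitter_ui(jitter_rms_s, bit_rate_bps) <= limit_ui

if __name__ == "__main__":
    ui = jitter_ui(0.8e-12, 9.95328e9)
    # prints "0.0080 UI rms, meets limit: True"
    print(f"{ui:.4f} UI rms, meets limit: {meets_rms_limit(0.8e-12, 9.95328e9)}")
```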
Chapter 7
CONCLUSION
allow designers to increase the maximum operating rate of the CDR
circuit by 60 to 80 percent in any given technology. This feature is
attractive because the speed capability of fabrication processes always
lags the demand for higher bit rates.
Different types of VCOs and phase detectors were used in the CDR
circuits to demonstrate their performance at speeds as high as 10 Gb/s.
LC oscillators benefit from larger swings, lower phase noise, and more
accurate prediction of the resonance frequency. Their drawbacks
are their relatively large area and narrow tuning range. However, as
their frequency of operation increases, the size of the integrated spiral
inductors needed to form the LC tank decreases. At the same time, the
relatively precise oscillation frequency of LC oscillators relaxes the
requirement for a wide tuning range. The design and optimization of
LC oscillators have advanced considerably in the last few years as a
result of research performed at UCLA and many other institutions. These
circuits are likely to be found in most high-speed optical transceivers
in the near future.
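The shrinking inductor size follows from the resonance condition f = 1/(2π√(LC)). For a fixed tank capacitance, the required inductance falls as 1/f², so each doubling of the oscillation frequency cuts it by a factor of four. The sketch assumes a hypothetical 500-fF tank capacitance held constant. In practice, the tank capacitance also changes with the design.

```python
import math

def tank_inductance(f_osc_hz: float, c_tank_f: float) -> float:
    """Inductance that resonates with c_tank_f at f_osc_hz: f = 1/(2*pi*sqrt(L*C))."""
    w = 2 * math.pi * f_osc_hz
    return 1.0 / (w * w * c_tank_f)

if __name__ == "__main__":
    c_tank = 500e-15  # hypothetical total tank capacitance (500 fF)
    for f in (2.5e9, 5e9, 10e9):
        print(f"{f / 1e9:4.1f} GHz -> {tank_inductance(f, c_tank) * 1e9:.2f} nH")
```

With this capacitance, the required inductance drops from roughly 8.1 nH at 2.5 GHz to about 2.0 nH at 5 GHz and 0.51 nH at 10 GHz.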
Both linear and binary phase detectors are attractive for implementation
in a CDR circuit. The performance of linear phase detectors can be
more easily modeled and predicted. They also exhibit lower charge pump
activity at the lock point because, unlike the output of binary phase
detectors, the output of linear phase detectors goes to zero in phase
lock. On the other hand, binary phase detectors are less susceptible to
peaking in their jitter transfer characteristic because they have a
single-pole-like jitter transfer characteristic [51].
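The difference in charge-pump activity at lock can be illustrated with idealized detector characteristics. The linear model's output is proportional to the phase error, and the binary model reports only the sign. Both models are simplifications; a real Hogge detector, for example, still produces pulses that cancel on average. The residual jitter level is hypothetical. Mean absolute output serves as a rough proxy for how strongly the charge pump is driven.

```python
import math
import random

def linear_pd(phase_error: float) -> float:
    """Idealized linear PD: output proportional to phase error (1.0 at pi rad)."""
    return phase_error / math.pi

def binary_pd(phase_error: float) -> float:
    """Idealized binary (bang-bang) PD: only the sign of the phase error is reported."""
    return 1.0 if phase_error > 0 else -1.0

def charge_pump_activity(pd, phase_errors) -> float:
    """Mean |PD output| over a set of phase errors."""
    return sum(abs(pd(e)) for e in phase_errors) / len(phase_errors)

if __name__ == "__main__":
    rng = random.Random(1)
    # hypothetical residual phase jitter around lock: 0.01 rad rms
    errors = [rng.gauss(0.0, 0.01) for _ in range(10_000)]
    print("linear:", charge_pump_activity(linear_pd, errors))
    print("binary:", charge_pump_activity(binary_pd, errors))
```

The binary detector drives the charge pump at full strength on every decision, no matter how small the error. The linear detector's output shrinks with the residual phase error.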
One major advantage of binary phase detectors over linear phase
detectors is that they can be extended to perform referenceless frequency
detection on top of phase detection. This is because binary topologies
can provide a strong beat frequency in the presence of a frequency
mismatch between clock and data. With this in mind, a binary system is
more suitable where unaided frequency acquisition must be performed,
whereas systems that rely on an external reference signal for frequency
acquisition can incorporate a linear phase detector. The two
implementations described in this book illustrate this concept.
The CDR circuit remains the most critical block of the optical
receiver. In the years to come, we will see new techniques targeting
improved performance, higher data rates, higher integration, lower power
consumption, and lower cost.
References
[1] webopedia.internet.com/networks/networking_standards/SONET.html
[13] F. Herzel, B. Razavi, “A Study of Oscillator Jitter due to Supply and Substrate
Noise,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal
Processing, vol. 46, pp. 56-62, Jan. 1999.
[14] B. Razavi, RF Microelectronics, Upper Saddle River, NJ: Prentice Hall, 1998.
[15] B. Razavi, Design of Analog CMOS Integrated Circuits, New York, NY: McGraw
Hill, 2000.
[16] B. Razavi, ed. Monolithic Phase-Locked Loops and Clock Recovery Circuits, Pis-
cataway, NJ: IEEE Press, 1996.
[17] T. C. Weigandt, B. Kim, P. R. Gray, “Analysis of Timing Jitter in CMOS Ring
Oscillators,” Proc. IEEE ISCAS, vol. 4, pp. 27-30, June 1994.
[18] W. B. Kuhn, N. K. Yanduru, “Spiral Inductor Substrate Loss Modeling in Sili-
con RF IC’s,” Microwave Journal, pp. 66-81, March 1999.
[19] C. M. Hung, L. Shi, I. Lagnado, K. K. O., “A 25.9 GHz Voltage-Controlled Os-
cillator Fabricated in a CMOS Process,” Digest of Symposium on VLSI Circuits,
pp. 100-101, June 2000.
[20] D. B. Leeson, “A Simple Model of Feedback Oscillator Noise Spectrum,” Proc.
of IEEE, vol. 54, pp. 329-330, 1966.
[21] J. J. Rael, A. A. Abidi, “Physical Processes of Phase Noise in Differential LC
Oscillators,” Proceedings of the Custom Integrated Circuits Conference, pp. 569-
572, May 2000.
[22] T.-P. Liu, “A 6.5 GHz Monolithic CMOS Voltage-Controlled Oscillator,” ISSCC
Digest of Technical Papers, pp. 404-405, Feb. 1999.
[23] A. Rofougaran, J. Rael, M. Rofougaran, A. Abidi, “A 900 MHz CMOS LC-
Oscillator with Quadrature Outputs,” ISSCC Digest of Technical Papers, pp.
392-393, Feb. 1996.
[24] B. Razavi, “A 1.8 GHz CMOS Voltage-Controlled Oscillator,” ISSCC Digest of
Technical Papers, pp. 388-389, Feb. 1997.
[25] C. Lam, B. Razavi, “A 2.6 GHz/5.2 GHz CMOS Voltage-Controlled Oscillator,”
ISSCC Digest of Technical Papers, pp. 402-403, Feb. 1999.
[26] J. J. Kim, B. Kim, “A Low-Phase-Noise CMOS LC Oscillator with a Ring
Structure,” ISSCC Digest of Technical Papers, pp. 430-431, Feb. 2000.
[27] T.-P. Liu, “1.5-V 10-12.5 GHz Integrated CMOS Oscillators,” Digest of Sym-
posium on VLSI Circuits, pp. 55-56, June 1999.
[28] B. Kleveland, C. H. Diaz, D. Dieter, L. Madden, T. H. Lee, S. S. Wong, “Mono-
lithic CMOS Distributed Amplifier and Oscillator,” ISSCC Digest of Technical
Papers, pp. 70-71, Feb. 1999.
[29] H. Wu, A. Hajimiri, “A 10 GHz CMOS Distributed Voltage Controlled Oscil-
lator,” Proceedings of the Custom Integrated Circuits Conference, pp. 581-584,
May 2000.
[35] S. B. Anand, B. Razavi, “A 2.5-Gb/s Clock Recovery Circuit for NRZ Data
in CMOS Technology,” Proceedings of the Custom Integrated Circuits
Conference, pp. 379-382, May 2000.
[36] J. C. Scheytt, G. Hanke, U. Langmann, “A 0.155, 0.622, and 2.488 Gb/s Auto-
matic Bit Rate Selecting Clock and Data Recovery IC for Bit Rate Transparent
SDH Systems,” ISSCC Digest of Technical Papers, pp. 348-349, Feb. 1999.
[37] G. Gutierrez, S. Kong, B. Coy, “2.488 Gb/s Silicon Bipolar Clock and Data Re-
covery IC for SONET (OC-48),” Proceedings of the Custom Integrated Circuits
Conference, pp. 575-578, May 1998.
[38] C. F. Schaeffer, “The Zero-Beat Method of Frequency Discrimination,”
Proceedings of the IRE, vol. 30, pp. 365-367, August 1942.
[46] K. Nakamura, et al., “A 6 Gb/s CMOS Phase Detecting DEMUX Module Using
Half-Frequency Clock,” Digest of Symposium on VLSI Circuits, pp. 196-197,
June 1998.
[47] E. Mullner, “A 20 Gbit/s Parallel Phase Detector and Demultiplexer Circuit in
a Production Silicon Bipolar Technology with ” Proceedings of the
Bipolar/BiCMOS Circuits and Technology Meeting, pp. 43-45, Sept. 1996.
[48] B. Razavi, Y. Ota, R. G. Swarz, “Design Techniques for Low-Voltage High-
Speed Digital Bipolar Circuits,” IEEE Journal of Solid-State Circuits, vol. 29,
pp. 332-339, March 1994.
[49] J. Savoj, B. Razavi, “A 10-Gb/s CMOS Clock and Data Recovery Circuit with
Frequency Detection,” ISSCC Digest of Technical Papers, pp. 78-79, Feb. 2001.
[50] A. Zolfaghari, A. Chan, and B. Razavi, “Stacked Inductors and 1-to-2
Transformers in CMOS Technology,” Proceedings of the Custom Integrated Circuits
Conference, pp. 345-348, May 2000.
[51] Y. M. Greshishchev, et al., “A Fully Integrated SiGe Receiver IC for 10-Gb/s
Data Rate,” IEEE Journal of Solid-State Circuits, vol. 35, pp. 1949-1957, Dec.
2000.
[52] T. O. Anderson, W. J. Hurd, and W. C. Lindsey, “U.S. pat. no. 3,626,298;
Transition Tracking Bit Synchronization System,” Dec. 1971.
[53] G. Stix, “The Triumph of the Light,” Scientific American, Jan. 2001.
Index
  analyzer, 90
  bimodal, 29, 80
  generation, 25, 96
  pattern-dependent, 103
  peak-to-peak, 90
  thermal, 32
  tolerance, 26
  transfer, 25, 88, 91
Laser diode, 11
Latch, 84
  current-steering, 7
Limiter, 16
Logic
  current-steering, 87
Loss
  Displacement, 32
  Eddy, 32
  substrate, 33
Mapper, 6
Multiplexer, 7, 85
Noise, 22
  input-referred, 14, 68
  switching, 36
  thermal, 35, 108
Optical
  amplifier, 4
  carrier, 2
  communications, 2
  point-to-point network, 3
  ring network, 2
PLL, 5, 24, 53
  bandwidth, 25
  characterization, 108
  dual-loop, 54
  filter, 88
  jitter, 41
Phase detector, 42, 83
  Alexander, 49
  Hogge, 46
  Pottbacker, 51
  bang-bang, 50
  binary, 45, 48
  characteristic, 86
  gain, 86
  linear, 45, 83
  pattern dependency, 85
  triwave, 47
Phase/frequency detector, 43, 97, 102
Photo detector, 11
Power, 26
Quadricorrelator, 55
SNR, 21, 63
SONET, 2
Serialize, 5
Silicon bipolar, 6
Supply
  scaling, 27
TIA, 11, 13
  common-gate, 13
  noise, 14
Transceivers, 2
VCO, 8, 24, 29, 77, 80, 97–98
  LC, 32, 95
  coarse control, 82
  distributed, 41
  fine control, 82
  gain, 82
  half-quadrature, 102
  multi-phase, 36, 98
  phase noise, 34, 110
  quadrature, 36
  ring, 30, 80, 95
  sensitivity, 82
  tuning, 32, 110
Varactor, 34
Wave-division multiplexing, 2, 7
XOR, 23, 43, 85
  symmetric, 85