HIGH-SPEED CMOS CIRCUITS

FOR OPTICAL RECEIVERS


Jafar Savoj
Transpectrum Technologies, Inc.

Behzad Razavi
University of California, Los Angeles

KLUWER ACADEMIC PUBLISHERS


NEW YORK, BOSTON, DORDRECHT, LONDON, MOSCOW
eBook ISBN: 0-306-47576-6
Print ISBN: 0-7923-7388-X

©2002 Kluwer Academic Publishers
New York, Boston, Dordrecht, London, Moscow

Print ©2001 Kluwer Academic Publishers
Dordrecht

All rights reserved

No part of this eBook may be reproduced or transmitted in any form or by any means, electronic,
mechanical, recording, or otherwise, without written consent from the Publisher

Created in the United States of America

Visit Kluwer Online at: http://kluweronline.com
and Kluwer's eBookstore at: http://ebooks.kluweronline.com

Contents

List of Figures
List of Tables
Preface
1. INTRODUCTION
1.1 Overview of the Fiber Optic Network
1.2 Overview of Fiber Optic Transceivers
1.3 Overview of Topics
2. TIAS AND LIMITERS
2.1 TIAs
2.2 Limiters
3. CLOCK AND DATA RECOVERY ARCHITECTURES
3.1 Open-Loop CDR Architectures
3.2 Phase-Locking CDR Architectures
3.2.1 Full-Rate and Half-Rate Architectures
3.2.2 Oscillators
3.2.2.1 General Theory
3.2.2.2 Ring Oscillators
3.2.2.3 LC Oscillators
3.2.2.4 PLL Jitter Calculation
3.2.3 Phase Detectors
3.2.3.1 Linear Phase Detectors
3.2.3.2 Binary Phase Detectors
3.2.4 Frequency Detectors
3.2.4.1 Referenced Frequency Detectors
3.2.4.2 Referenceless Frequency Detectors
3.2.5 Decision Circuits
4. A CMOS INTERFACE FOR DETECTION OF 1.2-GB/S RZ DATA
4.1 Introduction
4.2 Matched Filtering
4.3 Architecture
4.4 Building Blocks
4.4.1 Low-Noise Wideband Amplifier
4.4.2 Integrate-and-Dump Circuit
4.4.3 Demultiplexer
4.4.4 Clock Buffer
4.5 Experimental Results
4.6 Conclusion
5. A 10-GB/S LINEAR HALF-RATE CMOS CDR CIRCUIT
5.1 Architecture
5.2 Building Blocks
5.2.1 VCO
5.2.2 Phase Detector
5.2.3 Charge Pump and Loop Filter
5.3 Experimental Results
5.4 Conclusion
6. A 10-GB/S CMOS CDR CIRCUIT WITH WIDE CAPTURE RANGE
6.1 Introduction
6.2 Architecture
6.3 Building Blocks
6.3.1 VCO
6.3.2 Phase and Frequency Detector
6.3.3 Charge Pump
6.3.4 Output Buffers
6.4 Loop Characterization
6.5 Experimental Results
6.6 Conclusion
7. CONCLUSION
REFERENCES
Index

List of Figures

1.1 Volume of data transported over the Internet.
1.2 SONET architectures.
1.3 Light propagation in single-mode and multi-mode fibers.
1.4 SONET OC-192 interfaces.
1.5 Role of the framer and the mapper in processing the data.
1.6 Fiber optic transceiver.
1.7 (a) Four-to-one and (b) two-to-one multiplexer.
1.8 Current-steering (a) multiplexer and (b) latch.
1.9 The clock multiplying unit.
2.1 Common-gate TIA.
2.2 Feedback TIA and its realizations.
2.3 Simple limiter.
2.4 (a) Cherry-Hooper amplifier, (b) Gilbert gain cell.
2.5 (a) Inductive peaking, (b) simple inductor model, (c) more complete inductor model.
2.6 Instability resulting from feedback through supply line.
3.1 Detector with peak value sampling.
3.2 Edge detection of the random data.
3.3 Edge detection using an XOR gate.
3.4 Spectral line clock and data recovery.
3.5 Generic phase-locking CDR circuit.
3.6 Jitter transfer mask.
3.7 Jitter tolerance mask.
3.8 (a) Full-rate and (b) half-rate data recovery.
3.9 Effect of non-ideal duty cycle.
3.10 Negative feedback system.
3.11 Three-stage ring oscillator.
3.12 Delay interpolation.
3.13 Substrate loss vs. sheet resistance.
3.14 (a) Decaying impulse response. (b) Oscillatory impulse response of the tank.
3.15 Compensation of the tank loss in an LC oscillator.
3.16 (a) pn junction, (b) MOS varactor.
3.17 Block diagram of a quadrature oscillator.
3.18 The quadrature LC oscillator.
3.19 Modified tuning mechanism for a quadrature oscillator.
3.20 Tuning a quadrature oscillator by changing the coupling coefficient.
3.21 Doubling the oscillation frequency by means of a quadrature oscillator.
3.22 Multi-phase coupled oscillator.
3.23 Ring oscillator incorporating common-source stages with inductive loads.
3.24 XOR gate operating with periodic data.
3.25 Beat frequency at XOR output for inputs with different frequencies.
3.26 (a) Phase/frequency detector. Circuit response with (b) (c) A leading B.
3.27 Hogge phase detector.
3.28 Problem of triwave.
3.29 Modified Hogge phase detector.
3.30 (a) Alexander phase detector. Operation of the circuit with (b) late clock and (c) early clock.
3.31 (a) CDR circuit using a D flipflop phase detector, (b) PD characteristic, (c) addition of skews in and
3.32 (a) Pottbacker phase/frequency detector. Samples generated by for (b) early clock and (c) late clock.
3.33 Linearized early-late detector.
3.34 Frequency acquisition using a similar VCO.
3.35 Dual loop frequency acquisition.
3.36 Quadricorrelator.
3.37 Quadricorrelator operating on random data.
3.38 Phasor diagram of the clock and data signals.
3.39 Characteristic of the frequency detector in Fig. 3.32(a).
3.40 CDR architecture with wide capture range.
3.41 Detection with integration over one bit prior to sampling.
4.1 Role of the interface circuit and pseudo-differential input signals.
4.2 Output SNR for a system (a) without and (b) with a matched filter.
4.3 Matched filter for rectangular pulse.
4.4 Single matched filter.
4.5 Interleaved matched filters.
4.6 Interface architecture.
4.7 (a) Seven-stage amplifier, (b) the first common-gate stage, (c) the following common-source stages.
4.8 Amplifier’s transfer function.
4.9 (a) Overall input-referred noise and (b) output eye of the amplifier.
4.10 Stacked inductor.
4.11 (a) High-speed integrate-and-dump circuit, (b) corresponding waveforms.
4.12 (a) Addition of hold phase, (b) corresponding waveforms.
4.13 Demultiplexer.
4.14 Clock buffers.
4.15 Die photograph.
4.16 Eye diagram of the output.
4.17 Addition of matched filtering to optical receivers.
5.1 Generic CDR architecture.
5.2 Half-rate CDR architecture.
5.3 Effect of non-ideal duty cycle.
5.4 (a) Three-stage ring oscillator, (b) implementation of each stage, (c) transistor-level schematic.
5.5 Small-signal (a) gain and (b) phase response of each delay stage.
5.6 VCO gain partitioning: (a) fine control and (b) coarse control.
5.7 (a) Phase detector, (b) operation of the circuit.
5.8 Symmetric XOR gate.
5.9 Determination of PD gain.
5.10 Charge pump and loop filter.
5.11 Lock acquisition.
5.12 Chip photograph.
5.13 (a) Spectrum of the recovered clock, (b) recovered clock in the time domain.
5.14 Measured jitter transfer characteristic.
5.15 (a) Recovered demultiplexed data, (b) recovered full-rate data.
6.1 CDR architecture.
6.2 (a) Four-stage LC-tuned ring oscillator, (b) implementation of each stage, (c) simple model of the load.
6.3 Distributed inductor model.
6.4 Signal arrangements (a) minimizing differential capacitance, (b) equalizing the length of the traces.
6.5 In-phase and quadrature samples of positive and negative data edges with early and late clock signals.
6.6 Phase Detector.
6.7 Timing diagram in the PFD for slow and fast clock signals.
6.8 Phase and Frequency Detector.
6.9 Modified multiplexer.
6.10 Charge Pump.
6.11 Output Buffer.
6.12 (a) Linearized small-signal model of the loop, (b) simple loop filter, (c) VCO noise shaping.
6.13 Chip photograph.
6.14 (a) VCO tuning range, (b) phase noise over tuning range.
6.15 (a) Spectrum of the recovered clock, (b) recovered clock in the time domain.
6.16 Measured jitter transfer characteristic.
6.17 Measured jitter tolerance characteristic at 5 Gb/s.
6.18 Recovered clock and data.
List of Tables

4.1 Simulated gain, power, and noise distribution.

Preface

With the exponential growth of the number of Internet nodes, the
volume of the data transported on the backbone has increased at the
same pace. The load of the global Internet backbone will soon
increase to tens of terabits per second. This indicates that the backbone
bandwidth requirements will increase by a factor of 50 to 100 every seven
years.
Transportation of such high volumes of data requires suitable media
with low loss and high bandwidth. Among the available transmission
media, optical fibers achieve the best performance in terms of loss and
bandwidth.
High-speed data can be transported over hundreds of kilometers of
single-mode fiber without significant loss in signal integrity. These fibers
continue to benefit from falling cost and improving performance.
Meanwhile, the electronic interfaces used in an optical network are
not capable of exploiting the ultimate bandwidth of the fiber, limiting
the throughput of the network. Different solutions at both the system
and the circuit levels have been proposed to increase the data rate of
the backbone.
System-level solutions are based on wavelength-division multiplexing
(WDM), which uses different colors of light to transmit several
sequences simultaneously. In parallel, a great deal of effort has been
put into increasing the operating rate of the electronic transceivers
using highly developed fabrication processes and novel circuit
techniques.
The design of the clock and data recovery (CDR) circuit is the most
challenging part of building a high-speed optical transceiver because of
the complexity of this block. In this book, the design and experimental
results of two CDR circuits are described. Both circuits achieve a
high operating speed by employing the concept of “half rate”, meaning
that the clock frequency is half the data rate. Furthermore, broadband
circuit techniques including wideband amplification and high-speed
matched filtering are described in this book.
The two CDR circuits use the two major techniques for phase
detection, namely linear and binary. The design of the linear phase
detector is based on a new technique whose simplicity allows high speed
and low power consumption. The new binary phase/frequency
detector provides a wide capture range and a phase error signal that
is only revalidated at data transitions. Furthermore, the design of the
CDR circuits involves the two major types of voltage-controlled
oscillators, namely ring and LC-tuned. The ring oscillator described
in this work achieves a wide tuning range and low power consumption.
The LC oscillator benefits from a new topology that provides multiple
phases with low jitter.
Chapter 1

INTRODUCTION

The volume of the data transported over the Internet backbone has
increased with the exponential growth of the number of Internet users.
As shown in Fig. 1.1, the load on the global Internet backbone will be
as high as 11 Tb/s by the year 2005. This means that the bandwidth
requirements will increase by a factor of 50 to 100 every seven years.
Among the available transmission media, optical fibers achieve the
highest bandwidth and the lowest loss. These characteristics make them
an attractive medium for transmission of data over long distances.
Despite the unique transmission capabilities of optical fibers, the data
needs to be regenerated after a few tens of miles. Data gets distorted as
it travels through the fiber, mostly because of fiber dispersion. This
distortion leads to the closure of the data eye. The signal amplitude is
also reduced by the loss along the fiber.
Restoration of the original data at the receiver side with an acceptable
bit-error rate (BER) can only be performed if the signal sustains the
required signal-to-noise ratio (SNR). The data must be regenerated
midway to prevent degradation of its SNR. Extensive research has been
performed on techniques for regeneration of the data in the
optical domain. However, most of these techniques are still under study
and the majority of commercial systems employ electronic interfaces
for regeneration of the data. As a result, the optical pulses are first
converted into electric current, regenerated and processed in the
electrical domain, and then converted back into optical pulses.
The complexity of the procedure that takes place in the regenerator
introduces latency. Furthermore, the maximum data rate is determined
by the speed of the electronic interface. The operating speed of the
backbone can be increased either by designing faster electronic
interfaces to handle a higher data rate, or by using a number of parallel
regenerators and wavelength-division multiplexing (WDM) to combine a
number of high-speed optical data streams on one fiber channel.
Throughout this book, various approaches for increasing the operating
rate of the regenerators are addressed. These approaches introduce
innovations at both the system and circuit levels. Special attention has
been paid to reducing the complexity of the circuits, so that a number
of transceivers can be placed on one chip if parallelism is used.
In this work, we have targeted a data rate of 10 Gb/s. With the
operating rates increasing to 10 Gb/s, and costs staying the same, new
applications can be introduced that will become more attractive as the
cost of transport per bit decreases.
The majority of the backbone optical communication systems are
based on the SONET standard. Short for Synchronous Optical Network,
it was proposed by Bellcore in the mid-1980s and is now an ANSI standard.
SONET defines a hierarchy that allows data streams of different rates
to be multiplexed. SONET recommends optical carrier (OC) levels that
are integer multiples of 51.84 Mb/s. This standard has allowed different
communication carriers to interconnect their existing fiber optic systems
[1].
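
As a quick illustration of the OC hierarchy, the sketch below computes
the line rate of a few common OC levels as integer multiples of the OC-1
base rate. The base rate of 51.84 Mb/s is the SONET OC-1 rate; the
function name oc_rate_mbps is a hypothetical helper introduced here only
for illustration.

```python
# Assumption: OC-1 base rate of 51.84 Mb/s; all higher levels are
# integer multiples of it.
OC1_RATE_MBPS = 51.84


def oc_rate_mbps(level: int) -> float:
    """Return the line rate of SONET OC-<level> in Mb/s."""
    if level < 1:
        raise ValueError("OC level must be a positive integer")
    return level * OC1_RATE_MBPS


if __name__ == "__main__":
    for level in (1, 3, 12, 48, 192):
        print(f"OC-{level}: {oc_rate_mbps(level):.2f} Mb/s")
```

Under this assumption OC-192 evaluates to roughly 9.95 Gb/s, which is
the rate loosely referred to as 10 Gb/s.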
The SONET OC-192 standard has been specified for 10 Gb/s optical
communication. SONET recommends two types of architectures for use
in metropolitan and long-haul areas: ring and point-to-point (Fig. 1.2)
[2]. An OC-192 ring replaces multiple pre-existing OC-48 rings operating
at lower speeds. Furthermore, it allows a larger number of nodes to be
placed on the ring and provides the capability to process more added
and dropped traffic at each node. For this reason, the complexity of the
network is drastically reduced.
Point-to-point architecture allows flexible routing between different
nodes and point-to-point services require connections on a per-customer
basis.
Other services provided by the OC-192 standard include video
conferencing and ATM-based services like LAN interconnections.

1. Overview of the Fiber Optic Network


The fiber used in the construction of a network is either single-mode
or multi-mode. Single-mode fiber is mostly used with a coherent light
source that produces a pure spectrum. Multi-mode fiber, on the other
hand, is used with optical sources that are not coherent or not spectrally
pure [3].
As shown in Fig. 1.3, single-mode fibers are designed to have a very
small core that limits the modes of propagation, and an index of
refraction profile that allows light to remain in the core. In a multi-mode
fiber, light that enters the fiber core at one end continuously bounces
off the interface of the core and cladding until it exits the fiber at the
other end. This effect can result in the dispersion of the pulses entering
the fiber. The light that travels directly through the core of the fiber
reaches the other end faster than the light that continuously bounces off
the interface. Because of this effect, the pulse gets wider as it travels
through the fiber. This pulse spreading limits the maximum length of a
fiber link formed using multi-mode fibers.
Dispersion is usually not a limiting factor with single-mode fibers.
For these fibers, the length is limited by the attenuation of the signal.
Therefore, long hauls are formed using single-mode fibers.
Figure 1.4 depicts the interfaces for the OC-192 optical systems [2].
The main optical path consists of the fiber and the optical line
amplifiers. Both the transmitter and the receiver equipment employ optical
amplifiers. On the transmitting side, the transmitter is followed by a
booster amplifier. On the receiving side, a preamplifier and an optical
filter process the optical pulse before it reaches the receiver. The
parameters should be chosen to provide an overall BER of better than
The data transmitted over the fiber is encoded in nonreturn-to-zero
(NRZ) format. Therefore, the data stream does not carry any
information about the clock signal, and its spectrum contains no
component at the frequency equal to the data rate. The only measure of
the clock signal that can be derived from the data sequence is the minimum
spacing between consecutive zero crossings of the data. This measure
can be extracted through nonlinear circuit techniques such as edge
detection, which detects the timing information contained in the
transitions between nonidentical adjacent bits. As a result, the
edge-detected signal contains a tone at the data rate.
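
This behavior can be checked numerically. The sketch below assumes an
ideal rectangular NRZ waveform, an oversampling factor of 16, and edge
detection modeled as an XOR of the data with a copy of itself delayed by
half a bit period. It compares the spectral component at the bit rate
before and after edge detection. All function names and parameter values
are illustrative assumptions, not taken from the designs in this book.

```python
import numpy as np

SAMPLES_PER_BIT = 16   # assumed oversampling factor
NUM_BITS = 1024        # assumed record length in bits


def nrz_waveform(bits, samples_per_bit):
    """Ideal rectangular NRZ waveform with levels -1 and +1."""
    return np.repeat(2 * np.asarray(bits) - 1, samples_per_bit).astype(float)


def edge_detect(wave, delay):
    """XOR of the data with a (circularly) delayed copy: 1 where they differ."""
    return (wave != np.roll(wave, delay)).astype(float)


def tone_magnitude(signal, cycles):
    """Normalized DFT magnitude at `cycles` cycles per record."""
    spectrum = np.fft.rfft(signal - signal.mean())
    return np.abs(spectrum[cycles]) / len(signal)


rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=NUM_BITS)
nrz = nrz_waveform(bits, SAMPLES_PER_BIT)
edges = edge_detect(nrz, SAMPLES_PER_BIT // 2)

# The bit rate corresponds to NUM_BITS cycles over the whole record.
nrz_tone = tone_magnitude(nrz, NUM_BITS)
edge_tone = tone_magnitude(edges, NUM_BITS)

if __name__ == "__main__":
    print(f"NRZ component at the bit rate:           {nrz_tone:.3e}")
    print(f"Edge-detected component at the bit rate: {edge_tone:.3e}")
```

In this idealized model the NRZ component at the bit rate vanishes to
within numerical precision, whereas the edge-detected signal shows a
clear tone whose strength scales with the number of data transitions.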
An NRZ stream can contain long sequences of ones and zeros with
no transitions in between. If the number of transitions is too low,
synchronization at the receiver end becomes very difficult. For example,
if the receiver contains a phase-locked loop (PLL), the frequency of the
oscillator can drift during these long sequences of identical bits such that
the recovery of the data would no longer be possible.
To overcome this difficulty, high-speed communication systems encode
the data such that the maximum length of a continuous sequence of
ones or zeros is limited. A widely-accepted technique is the 8B/10B
encoding [4] that has been used for some of the systems operating at
2.5 Gb/s. It generates an encoded stream at 3.125 Gb/s. Using this
coding technique, an eight-bit data byte is converted into ten bits. As
a result, the minimum and the maximum number of consecutive zeros
or ones is one and five, respectively. In addition to providing a higher
transition density, this type of encoding limits the low-frequency content
of the data stream such that the sequence has no dc component on
average. As a result the optical modules can be ac coupled. Finally, this
encoding scheme detects many signaling errors.
A new encoding scheme is the 64B/66B encoding [5], in which two
additional bits are added to every 64 bits. If this encoding scheme is
applied to a 10-Gb/s stream, it will result in a high-speed sequence of
approximately 10.3 Gb/s.
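
The line rates quoted above follow directly from the coding overhead. A
minimal sketch of the arithmetic is given below; the function name
coded_line_rate is a hypothetical helper used only for illustration.

```python
def coded_line_rate(payload_rate_gbps: float, payload_bits: int, coded_bits: int) -> float:
    """Line rate after a block code that maps payload_bits to coded_bits."""
    if payload_bits <= 0 or coded_bits < payload_bits:
        raise ValueError("expected 0 < payload_bits <= coded_bits")
    return payload_rate_gbps * coded_bits / payload_bits


rate_8b10b = coded_line_rate(2.5, 8, 10)      # 8B/10B applied to 2.5 Gb/s
rate_64b66b = coded_line_rate(10.0, 64, 66)   # 64B/66B applied to 10 Gb/s

if __name__ == "__main__":
    print(f"8B/10B:  {rate_8b10b} Gb/s")
    print(f"64B/66B: {rate_64b66b} Gb/s")
```

The 25% overhead of 8B/10B is the price paid for its bounded run length
and dc balance, whereas 64B/66B adds only about 3% overhead.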

2. Overview of Fiber Optic Transceivers


In the fiber optic system, the receive and transmit modules contain
electronic blocks, each of which consists of several analog and digital
integrated circuits (ICs). The analog circuits detect, retime, serialize,
and deserialize the data. Once the analog circuits have lowered the data
rate, the digital circuits process the data according to the standard’s
recommendations. The digital ICs are generally referred to as Framers
and Mappers and are controlled by a microprocessor (Fig. 1.5).

The available commercial systems use digital ICs fabricated in a
mainstream CMOS process and high-speed analog front-end ICs fabricated
in a silicon bipolar or III-V process.
The processes used for the implementation of the analog circuits
should have a transit frequency (f_T) that is several times higher than
the data rate. Recent developments in CMOS technology have
introduced processes with a minimum feature size smaller than Since
the f_T of the technology is inversely proportional to the square of the
minimum feature size, the scaling has yielded processes with an f_T of
a few tens of gigahertz.
This has provided the capability of integrating the analog and the
digital sections on the same chip. Some of the advantages of this
integration are the following:
Cost: A significant portion of the cost of electronic systems comes
from packaging, circuit board design, and chip fabrication. Integration
of multiple chips into a single one reduces the number of packages and
decreases the circuit board area. Furthermore, CMOS processes have
lower fabrication cost compared to other processes because they require
fewer masks.
Power Dissipation: If multiple chips are integrated into a single
circuit, the power-hungry output buffers driving the terminations
can be eliminated. Also, CMOS devices display the desired performance
at a smaller current density compared to other processes.
Time to market: The turn-around time of CMOS processes is
shorter than that of other processes. There are numerous foundries
providing a digital CMOS process. As a result, institutions employing
this technology benefit from a sound backup foundry, and a shorter
pre-process waiting period. Due to the huge momentum of the digital
market, the CMOS process develops faster than other processes. The
migration of to CMOS process has taken place in only
two years.
Newly developed SiGe BiCMOS processes are another good
alternative for development of high-speed integrated circuits. These
processes offer very fast bipolar devices suitable for building analog
front-ends and dense CMOS devices for the digital portion. A modified
BiCMOS process that does not have the trench isolation [6] has a
fabrication cost and turn-around time that are not much different from
those of a pure CMOS process.
The number of fabrication masks for this BiCMOS process is slightly
higher than that of a CMOS process. The drawbacks of the BiCMOS
process are the small number of supplying foundries and the fact that
the scaling of their CMOS devices is usually not as aggressive as that
of the fastest available CMOS processes. The digital circuits fabricated
using these BiCMOS processes cannot operate as fast as those circuits
fabricated in a pure CMOS process.
Benefiting from such capabilities, the CMOS technology is a perfect
solution for implementation of systems that employ parallelism. A number
of transceivers are placed on one chip to handle the incoming
high-speed sequences. These signals are carried over either a bundle of
fibers or a single fiber that uses wavelength-division multiplexing
(WDM). The complexity and the power dissipation of the transceivers are
critical as they determine the number of transceivers that can be placed
on one chip.
Figure 1.6 depicts a fiber optic transceiver consisting of a transmitter
and a receiver. In the transmitter, parallel sequences of data at lower
rates are combined in a multiplexer to generate a single high-speed
serial signal. Multiplexing of data is performed in multiple steps, with a
gradual increase in the rate of the merging sequences. The multiplexer
therefore operates with a number of clock signals whose frequency
doubles as the multiplexing advances to the next level. Figure 1.7 depicts
a conceptual topology of 4-to-1 and 2-to-1 multiplexers.
Shown in Fig. 1.7(b), multiplexing at any level is performed by a
combination of five latches and a multiplexer. The four latches
tend to retime the data. The fifth latch, skews the data in one of the
signal paths by one half of a clock period. As a result, the multiplexer
samples both of the sequences starting from the middle of the data
period.
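
The multi-step multiplexing described above can be sketched at a purely
behavioral level. The model below abstracts away the latches and all
timing: each 2-to-1 stage simply alternates between its two inputs, which
is what selection on the two phases of the clock accomplishes once the
second input has been skewed by half a clock period. The function names
are hypothetical and the code is only an illustration of the data
ordering, not of the circuit.

```python
def mux2to1(stream_a, stream_b):
    """Interleave two equal-length streams: a0, b0, a1, b1, ...

    Behavioral stand-in for a 2-to-1 multiplexer whose select clock
    runs at half the output data rate.
    """
    if len(stream_a) != len(stream_b):
        raise ValueError("input streams must have equal length")
    out = []
    for a, b in zip(stream_a, stream_b):
        out.append(a)  # first clock phase: input A is selected
        out.append(b)  # second clock phase: input B (skewed by half a period)
    return out


def mux4to1(d0, d1, d2, d3):
    """Two-level 4-to-1 multiplexer built from 2-to-1 stages.

    For an output rate R, the first level is clocked at R/4 and the
    second level at R/2, so the clock frequency doubles per level.
    """
    even = mux2to1(d0, d2)
    odd = mux2to1(d1, d3)
    return mux2to1(even, odd)


if __name__ == "__main__":
    print(mux4to1([0, 4], [1, 5], [2, 6], [3, 7]))
```

Pairing the inputs as (d0, d2) and (d1, d3) in the first level is what
restores the natural order d0, d1, d2, d3 at the serial output.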
Figure 1.8 depicts the structure of the current-steering latch and
multiplexer used for high-speed applications. This structure allows for a
reduced voltage swing that is well defined. Reduction of the output
swing increases the operating rate of the circuit. Furthermore,
simulations indicate that switching of the current can be performed at a
speed higher than the switching of the voltage. As a result, these
architectures provide the maximum operation speed for a given technology.
The structure of the latch and the multiplexer is quite similar except
for the second input of the multiplexer that is replaced by a cross-coupled
pair in the latch.
The incoming data sequences experience phase shifts with respect to
the transmitter clock signal. To cancel these time delays, these signals
are written into a FIFO and passed to its output. The depth of the
FIFO is typically chosen to be between 4 and 6. The FIFO is driven by
the transmitter clock signal.
The clock multiplying unit (CMU) generates the clock signals used in
the multiplexer. This circuit operates based on phase locking of the
internal voltage-controlled oscillator (VCO) to an external reference.
As shown in Fig. 1.9, the oscillator should produce a clock signal at a
frequency equal to the data rate. If the transmitter VCO oscillates at
10 GHz and the crystal reference generator produces a signal at
156.25 MHz, the transmitter PLL should incorporate a 64:1 divider. This
frequency division is performed in 6 steps, each step reducing the
frequency by a factor of two. As a result, a group of clock signals with
frequencies ranging from 10 GHz down to 156.25 MHz is provided.
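
The divider arithmetic can be sketched as follows. The helper
divider_chain is a hypothetical function introduced for illustration; it
assumes the division ratio is an exact power of two and works with
integer frequencies in hertz to avoid rounding.

```python
def divider_chain(vco_hz: int, ref_hz: int) -> list:
    """Clock frequencies available along a cascade of divide-by-two stages.

    Returns the list [vco, vco/2, ..., ref]. Raises ValueError if the
    ratio vco/ref is not an exact power of two.
    """
    ratio, remainder = divmod(vco_hz, ref_hz)
    if remainder != 0 or ratio < 1 or ratio & (ratio - 1) != 0:
        raise ValueError("division ratio must be an exact power of two")
    stages = ratio.bit_length() - 1  # log2(ratio)
    return [vco_hz // (2 ** k) for k in range(stages + 1)]


clocks = divider_chain(10_000_000_000, 156_250_000)

if __name__ == "__main__":
    print(f"{len(clocks) - 1} divide-by-two stages")
    for f in clocks:
        print(f"{f / 1e6:.2f} MHz")
```

For the 10-GHz VCO and the 156.25-MHz reference, the sketch yields six
stages and seven clock frequencies, the intermediate ones being those
needed by the successive multiplexing levels.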
Frequency division is performed by placing two latches in a negative
feedback loop. At high frequencies, the frequency divider can oscillate at
the natural frequency defined by the ring oscillator consisting of the two
latches. This undesirable oscillation is alleviated if the divider is driven
by a strong external tone. In reality, the amplitude of the signal required
for the switching of the latches in the frequency divider is reduced as
the frequency of the input clock approaches twice the self-resonance
frequency of the divider. In that region, the divider operates in an
injection-locked mode rather than a static mode.
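
A behavioral sketch of the static mode of operation is given below. It
assumes two ideal level-sensitive latches, the master being transparent
while the clock is low and the slave while it is high (an assumed
polarity), with the slave output inverted and fed back to the master.
The model has no delays and therefore does not capture the
self-oscillation or the injection-locked behavior mentioned above. All
names are hypothetical.

```python
def divide_by_two(clock_samples):
    """Static divide-by-two: two latches in a negative feedback loop."""
    master = 0
    slave = 0
    out = []
    for clk in clock_samples:
        if clk == 0:
            master = 1 - slave   # master transparent, inverted feedback
        else:
            slave = master       # slave transparent
        out.append(slave)
    return out


def rising_edges(samples):
    """Count 0-to-1 transitions in a sampled waveform."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev == 0 and cur == 1)


def divider_cascade(clock_samples, stages):
    """Cascade of divide-by-two stages, each clocked by the previous output."""
    signal = list(clock_samples)
    for _ in range(stages):
        signal = divide_by_two(signal)
    return signal


if __name__ == "__main__":
    clock = [0, 1] * 64                      # 64 input clock cycles
    divided = divider_cascade(clock, 6)      # six stages: divide by 64
    print(rising_edges(clock), rising_edges(divided))
```

Each stage toggles its output once per input clock cycle, so the output
period is twice the input period; six cascaded stages realize the 64:1
division discussed earlier.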
The CMU takes advantage of the fact that it operates with a periodic
clock signal rather than a random data sequence. This simplifies the
design of the phase/frequency detector and the charge pump used inside
the loop. The loop filter should be designed to guarantee the stability
of the loop and suppress the jitter introduced by the phase noise of the
oscillator.
It may seem that the multiplexer that produces a maximum data rate
of 10 Gb/s requires clock signals of frequencies not exceeding 5 GHz,
considering that the data rate at the output of the multiplexer is twice
the frequency of the clock signal that drives the circuit. However, it
is critical to retime the output of the last multiplexer with a flipflop
operating at a frequency equal to the data rate for two reasons. First, in
the presence of mismatches the multiplexer will exhibit different delays
from each of its inputs to the output. This inherent mismatch increases
the static ISI on the output eye diagram of the multiplexer. Second, the
two input data sequences experience timing mismatch on their path to
the multiplexer, resulting in static bimodal jitter.
The high-speed signal is ultimately modulated onto the intensity of the
light emitted by the laser diode. The laser driver isolates the multiplexer
from the diode and boosts the signal level to the operational range of
the diode.
In the receiver, the received light is transformed into an electric
current by the photo detector. The detector is followed by a low-noise
high-bandwidth transimpedance amplifier that converts the current into a
voltage with a swing large enough for the subsequent blocks.
The sensitivity of the photo detector can be increased by widening its
light reception window. However, this increases the parasitic capacitance
of the photo detector, complicating the design of the high-bandwidth
receiver. This means that the receiver sensitivity trades off with its
bandwidth. The limiting amplifier increases the voltage swings, while
isolating the subsequent synchronous stages from the transimpedance
amplifier. The clock feedthrough of the synchronous circuits to the
sensitive transimpedance amplifier can heavily corrupt the data signal.
The core of the receiver is the clock and data recovery (CDR) circuit.
In this block, a clock signal is generated such that its rising/falling edges
fall in the middle of the data eye. This means that if the clock signal
is used to retime the data, the sampling occurs at the optimum point,
improving the SNR of the receiver. CDR circuits for NRZ data can be
categorized into two main groups: open-loop CDRs with high-Q filtering,
and CDRs employing phase-locked loops (PLLs). We provide a short
overview of the first technique in Chapter 3. However, the emphasis of
the work presented in this book is on systems that utilize phase locking.
The retimed high-speed data is subsequently split among parallel
sequences of lower speed by the demultiplexer. Similar to multiplexing,
demultiplexing is performed in multiple steps using clock signals of
different frequencies. The frequency divider generates these signals.
The design of the CDR circuit is the most complicated part of
implementing an optical transceiver. It entails many challenges that will
be addressed in the following chapters.

3. Overview of Topics
Chapter 2 describes the high-speed front-end circuits of the optical
receivers, covering the design of transimpedance amplifiers and voltage
limiters. Chapter 3 provides an overview of existing CDR architectures,
and describes the implementation of their building blocks. Chapter 4
describes the techniques for optimizing detection in a high-speed receiver,
focusing on wideband amplification and matched filtering. An interface
built in a CMOS process utilizing these techniques is introduced.
The remainder of this book concentrates on designing high-speed CDR
circuits operating at 10 Gb/s. Chapter 5 describes a CDR circuit
incorporating a half-rate linear phase detector and a ring oscillator.
Chapter 6 covers the design of a CDR circuit that uses a half-rate binary
phase/frequency detector and a multi-phase LC oscillator. Chapter 7
concludes this book.
Chapter 2

TIAS AND LIMITERS

1. TIAs
Transimpedance amplifiers (TIAs) play a critical role in optical receivers.
Trade-offs between noise, speed, gain, and supply voltage present many
challenges in TIA design. As TIAs experience a tighter performance
envelope with technology scaling at the device level and speed scaling at
the system level, it becomes necessary to design the cascade of the TIA,
the limiter, and the decision circuit concurrently. The TIA bandwidth
is typically chosen to be equal to 0.7 times the bit rate, a reasonable
compromise between the total integrated noise and the intersymbol
interference (ISI) resulting from limited bandwidth.
Shown in Fig. 2.1, the common-gate (or common-base) topology is
a candidate for TIAs as it provides a relatively low input impedance,
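a broad band, and a well-behaved time response. However, its input-referred
noise current, $\overline{I_{n,in}^2}$, is relatively high. This is because
$\overline{I_{n,in}^2}$ per unit bandwidth at low frequencies is given by

$$\overline{I_{n,in}^2} = 4kT\left(\gamma g_{m2} + \frac{1}{R_D}\right) \qquad (2.1)$$

where $\gamma$ denotes the excess noise coefficient of the bias current source
$M_2$ (its value depends on the technology), $g_{m2}$ is the transconductance
of $M_2$, and $R_D$ is the load resistor. Interestingly, the noise currents of
$M_2$ and $R_D$ are directly referred to the input with a unity factor.
Furthermore, for a given supply voltage, $g_{m2}$ and $1/R_D$ trade with each
other because the minimum drain-source voltage of $M_2$ plus the voltage drop
across $R_D$ cannot exceed $V_{DD}$. In other words, $g_{m2}$ and $1/R_D$ are
inevitably large. It can also be shown that the noise contributed by the
common-gate input device $M_1$ and by $R_D$ rises as the frequency increases
and the photodiode capacitance, $C_D$, shunts the input.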
A TIA configuration that achieves more relaxed noise-headroom trade-offs
is the shunt-shunt feedback topology. Shown in Fig. 2.2(a) as a feedback
resistor $R_F$ around a voltage amplifier $A_1$, the circuit exhibits a
–3-dB bandwidth of $A_1/(2\pi R_F C_D)$, where $C_D$ is the photodiode
capacitance (if the poles of $A_1$ are neglected), and an input-referred
noise current per unit bandwidth equal to

$$\overline{I_{n,in}^2} = \frac{4kT}{R_F} + \frac{\overline{V_{n,A1}^2}}{R_F^2} \qquad (2.2)$$

where $\overline{V_{n,A1}^2}$ denotes the input-referred noise voltage of
$A_1$ (see Note 1). The key point here is that $R_F$ does not carry
significant dc current and can therefore be maximized so as to reduce both
terms in (2.2). This is in contrast to the behavior represented by (2.1).
Actual implementations of the feedback TIA suffer from voltage head-
room, stability, and overshoot problems. Considering the example shown
in Fig. 2.2(b), we recognize that significantly constrains
the dc drop across thereby limiting the open-loop gain and raising
the noise contributed by and Furthermore, the three poles at
the input node, the drain of and the output node degrade the phase
margin and hence the step response. Figure 2.2(c) suggests a modifica-
tion that isolates the feedback path from the input capacitance of the
subsequent stage. Finally, Fig. 2.2(d) eliminates the source follower
from the feedback loop to allow a greater drop across [7].
It is possible to choose the pole at node X in Figs. 2.2(c) or (d) so as
to increase the bandwidth of the TIA. In fact, if the magnitude of this
pole is equal to $2A_1/(R_F C_D)$ (with $R_F$ the feedback resistor and
$C_D$ the photodiode capacitance), the TIA exhibits a slightly underdamped
step response but a bandwidth of $\sqrt{2}A_1/(2\pi R_F C_D)$, i.e., 40%
greater than that for an ideal core amplifier.

2. Limiters
The voltage swing produced by TIAs at the minimum light level is
usually inadequate to drive the CDR circuit, necessitating further am-
plification. Used to boost the binary swings, limiters typically consist of
a cascade of differential pairs with enough bandwidth and a relatively
linear phase response so as to amplify the signal with negligible ISI.
The high small-signal gain requires low-frequency negative feedback to
prevent the offset voltages of the differential pairs from saturating the
latter stages.
Interestingly, limiter design must cope with difficulties at both the
low corner and the high corner of the passband. Consider the limiter
topology shown in Fig. 2.3, where the feedback network suppresses the
offset of the last three stages. Since some optical standards require
that the low end of the passband fall around a few tens of kilohertz,
the values of the feedback resistor $R_1$ and capacitor $C_1$ must be very
large. More specifically, with a small-signal gain of $A$ per stage, the
low corner frequency is given approximately by $A^3/(2\pi R_1 C_1)$,
demanding an $R_1 C_1$ product on the order of 1 ms if $A$ is around 5.
For this reason, the capacitors are usually placed off chip,
raising the number of package pins and also the possibility of crosstalk
from other bond wires. New circuit topologies may resolve this issue.
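As a rough numerical check of the 1-ms figure, the following sketch evaluates the required time constant. It assumes that the low corner frequency is approximately the gain of the stages enclosed by the feedback loop divided by $2\pi RC$; the function name and the 20-kHz corner are illustrative choices, not values taken from the text.

```python
import math

def required_rc(gain_per_stage, stages_in_loop, corner_hz):
    """RC product (seconds) that places the low corner at corner_hz.

    Assumes corner ~ loop_gain / (2*pi*R*C) with loop_gain = A**N.
    """
    loop_gain = gain_per_stage ** stages_in_loop
    return loop_gain / (2.0 * math.pi * corner_hz)

# Hypothetical example: A = 5, three stages inside the loop, 20-kHz corner.
rc = required_rc(5.0, 3, 20e3)
print(f"required RC = {rc * 1e3:.2f} ms")
```

With these assumed numbers the product comes out close to 1 ms, which illustrates why on-chip capacitors are impractical for this function.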
At the upper end of the passband, high-speed amplification techniques
must provide a well-behaved magnitude and phase response for both
small and large signals. Shown in Fig. 2.4, configurations such as the
Cherry-Hooper amplifier [8, 9] and the Gilbert gain cell [10] have been
used but their utility becomes more limited as the supply voltage falls.
In particular, the voltage drops across and in Fig. 2.4(a) and
the cascode in Fig. 2.4(b) both constrain the voltage headroom and
mandate level-shift circuits between the stages.
An attractive solution for low-voltage broadband amplifiers is induc-
tive peaking. Owing to the extensive work on monolithic inductors in
RF design, this method can now be realized with accurate modeling and
prediction of the performance in optical communication circuits as well.
Interestingly, inductor quality factors (Q’s) as low as 3 to 4 prove adequate
for increasing the bandwidth, allowing the use of simple, compact
spiral structures.
Figure 2.5(a) shows a limiting stage incorporating inductive peaking.
It can be shown that an ideal inductor increases the bandwidth by
approximately 82% if a 7.5% overshoot in the step response is acceptable.
With the finite Q and parasitic capacitance of the inductors included,
the enhancement is around 50%, still quite a significant factor.
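The bandwidth extension from an ideal inductor can be explored numerically. The sketch below assumes the simplest shunt-peaked load, a resistor R in series with an ideal inductor L, shunted by a capacitance C, so that $Z(s) = (R + sL)/(1 + sRC + s^2LC)$. The normalization $k = L/(R^2C)$ and the function name are assumptions introduced here for illustration.

```python
import math

def shunt_peaked_bandwidth(k):
    """-3-dB bandwidth, normalized to 1/(RC), of a load (R + sL) shunted by C.

    k = L / (R**2 * C) is a hypothetical normalization; k = 0 means no inductor.
    """
    def mag_sq(x):  # |Z(jx)/R|^2 with x = omega*R*C
        return (1.0 + (k * x) ** 2) / ((1.0 - k * x * x) ** 2 + x * x)

    lo, hi = 0.0, 10.0      # mag_sq(lo) = 1 > 0.5 > mag_sq(hi) for 0 <= k <= 1
    for _ in range(60):     # bisection on mag_sq(x) = 0.5
        mid = 0.5 * (lo + hi)
        if mag_sq(mid) > 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for k in (0.0, 0.3, 0.5):
    print(f"k = {k:.1f}: bandwidth = {shunt_peaked_bandwidth(k):.3f} x 1/(RC)")
```

For k = 0.5 the normalized bandwidth is about 1.80, i.e., an improvement of roughly 80% over the inductorless case, of the same order as the figure quoted for an ideal inductor.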
An interesting difficulty in modeling the inductors in the above circuit
arises from the narrowband nature of the definition of the Q, an issue
rarely encountered in RF design. Figure 2.5(b) depicts a rough model
where yields the correct Q at about 3/4 of the –3-dB band-
width. The approximation is reasonable because the inductor manifests
itself only near the high end of the band. Alternatively, a more complete
model such as that in Fig. 2.5(c) can be used. Here, denotes the
effective series resistance, and represent the resistance seen by
the electric coupling to the substrate, models the resistance seen by
the magnetic coupling to the substrate, and the capacitors approximate


the parasitic capacitances. While the values of some of the components
in this model do vary with frequency, the overall model can be fitted to
measured data over a broader range than the parallel tank of Fig. 2.5(b)
can.
The high gain provided by several stages in a limiter may lead to
oscillation or at least considerable peaking and ISI. Illustrated in Fig.
2.6, this phenomenon occurs if the mismatches in the differential stages

create both substantial current switching from the supply and a finite
supply rejection, allowing a component to travel from the output stage
through the supply and back to the input stage. With a finite bond
wire inductance, the gain around the loop may exceed unity, leading
to high-frequency oscillation. The issue of course becomes much more
severe if a single-ended TIA shares the same supply lines with the lim-
iter. For this reason, separate supply lines, careful bypassing, symmetric
layout, and accurate package modeling are essential.

Notes
1 The input-referred noise current of the amplifier $A_1$ is neglected for simplicity.
Chapter 3

CLOCK AND DATA RECOVERY ARCHITECTURES

If a single pulse is to be detected in the presence of additive noise and
intersymbol interference (ISI), the signal-to-noise ratio (SNR) depends
on the choice of the sampling instant. If sampling is synchronized
such that the peak value of the pulse is sensed, the output SNR is high
(Fig. 3.1).

Synchronized sampling requires two conditions to be simultaneously
satisfied. First, a clock signal should be generated such that its frequency
is equal to the data rate. Second, the clock signal should sample the
data at its peak point. Satisfaction of these two conditions is commonly
referred to as the task of clock and data recovery.
The clock and data recovery (CDR) architectures are categorized in
two major groups, open-loop CDRs and phase-locking CDRs. We briefly
describe the former in this chapter. However, the focus of this book is
on the latter.

1. Open-Loop CDR Architectures


The spectrum of an NRZ sequence does not carry a frequency tone
at the data rate. However, the information about the frequency of the
data can be extracted from the spacing between its transitions. These
transitions appear as the rising and falling edges of the data signal.
If a high-speed data sequence is passed through a differentiator, the
resulting signal will carry positive and negative pulses for rising and
falling edges of the data signal, respectively. This differentiated signal
does not provide a strong spectral line at the frequency of the data
because the polarity of these pulses is random.
As shown in Fig. 3.2, the randomness of these pulses can be cancelled
by passing this signal through a rectifier. The resulting signal
can be decomposed into a periodic waveform with a fundamental
frequency equal to the data rate, and a random, transition-dependent
signal with a zero dc average value [11].
Differentiation and rectification of a random sequence with finite rise
and fall times is equivalent to edge detection of the signal. Edge detection
is performed by a logical XOR operation.
Figure 3.3 depicts the structure of an edge detector that consists of
an XOR gate operating on the data and its delayed replica. Theoretical
derivations indicate that the highest degree of harmonic suppression can
be achieved if these two waveforms are spaced within half a bit period
from each other.
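The effect of edge detection on the spectrum can be illustrated with a small behavioral sketch. It generates a pseudo-random NRZ waveform, XORs it with a replica delayed by half a bit, and compares the Fourier component at the bit rate before and after. The PRBS7 generator, the oversampling factor of 16, and all function names are assumptions made for this illustration.

```python
import cmath

def prbs7(n_bits, state=0x7F):
    """Pseudo-random bits from a 7-bit LFSR (polynomial x^7 + x^6 + 1)."""
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits

def edge_detect(bits, spb):
    """XOR an oversampled NRZ waveform with a half-bit-delayed replica."""
    wave = [b for b in bits for _ in range(spb)]
    delay = spb // 2
    delayed = [wave[0]] * delay + wave[:-delay]
    return wave, [a ^ b for a, b in zip(wave, delayed)]

def tone_at_bit_rate(samples, spb):
    """Magnitude of the Fourier component at the bit rate, per sample."""
    acc = sum(s * cmath.exp(-2j * cmath.pi * k / spb)
              for k, s in enumerate(samples))
    return abs(acc) / len(samples)

SPB = 16  # samples per bit (hypothetical oversampling factor)
nrz, edges = edge_detect(prbs7(127), SPB)
tone_nrz = tone_at_bit_rate(nrz, SPB)
tone_edges = tone_at_bit_rate(edges, SPB)
print("NRZ tone at bit rate:          ", tone_nrz)
print("edge-detected tone at bit rate:", tone_edges)
```

The NRZ waveform shows essentially no component at the bit rate, whereas the edge-detected waveform carries a clear tone there, which is what the subsequent band-pass filter extracts.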

The clock signal can be recovered from the edge-detected waveform by
passing the signal through a band-pass filter tuned to the clock frequency.
Shown in Fig. 3.4, the recovered clock signal is fed to a phase aligner
to ensure that the output clock signal samples the data at its optimum
point in the decision circuit.

In order to reduce jitter on the recovered clock signal, the filter should
have a very high selectivity to suppress the unwanted data-dependent
signal that results in amplitude and phase modulation.
Integration of highly selective band-pass filters operating at very high
frequencies is not practical using available fabrication processes. This
limitation calls for the use of external components such as SAW filters.
These filters, however, suffer from high loss and a relatively low speed
of operation that limits their applicability to 10-Gb/s operation.

2. Phase-Locking CDR Architectures


A second approach to clock and data recovery is by synchronizing the
random data to a clock signal generated by a voltage-controlled oscillator
(VCO) in a phase-locked loop (PLL). The idea is that during each data
transition, the location of the data transition with respect to the clock
edge is detected. If the data leads the clock, the clock is sped up. If the
data lags the clock, the clock is slowed down. If the zero crossings of
the data and the clock coincide, the clock frequency is kept constant to
ensure phase lock.
Figure 3.5 shows a generic CDR circuit. The VCO generates a clock
signal. The phase and the frequency of this signal are compared to those of
the incoming data in the phase detector, generating an error signal that is
passed through the charge pump and the low-pass filter to set the voltage
required by the VCO to oscillate at the frequency of interest. Phase
locking of the clock to the data means that their phases differ by
a small constant offset. Consequently, the derivatives of their phases,
i.e., their frequencies, are identical.
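The locking behavior described above can be sketched with a discrete-time behavioral model. The model below assumes a linear phase detector and a proportional-plus-integral correction standing in for the charge pump and loop filter; phases are expressed in unit intervals (UI), and the gains `kp` and `ki` are hypothetical values chosen only to give a stable loop.

```python
def simulate_cdr(freq_offset_ui, kp=0.1, ki=0.01, n_bits=2000, phase0_ui=0.3):
    """Behavioral phase-locking loop; returns (final phase error, control value).

    freq_offset_ui: data phase advance per bit relative to the free-running clock.
    """
    data_phase = 0.0
    clk_phase = phase0_ui      # initial phase misalignment
    control = 0.0              # integral path: models the VCO control voltage
    error = 0.0
    for _ in range(n_bits):
        data_phase += freq_offset_ui
        error = data_phase - clk_phase     # phase detector output
        control += ki * error              # integrate the error
        clk_phase += kp * error + control  # speed up or slow down the clock
    return error, control

err, ctrl = simulate_cdr(1e-3)
print(f"final phase error = {err:.2e} UI, control = {ctrl:.6f} UI/bit")
```

In this sketch the phase error settles to zero while the control value converges to the frequency offset, mirroring the statement that a locked loop forces the clock and data frequencies to be identical.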
The generated clock signal is also used to retime the data in the
decision circuit. As the incoming data is regenerated in this block, its
additive noise and ISI are suppressed while the amplitude is significantly
magnified.
Some of the design issues of the CDR circuits are mentioned in the
following:
Speed: The throughput of a high-speed receiver is determined by the
maximum operating rate of the CDR circuit. The CDR circuit consists
of blocks such as phase detectors and digital latches utilizing positive
feedback for regeneration at high speed. As the data rate increases, the
regeneration time of the latches becomes comparable to the data period,
thus limiting the maximum operating rate of a latch [12]. Another crit-
ical issue is sustaining the integrity of the clock signal generated by the
VCO, which operates at very high frequencies. Therefore, the available
low-cost fabrication processes such as CMOS technology can
marginally handle data rates as high as 10 Gb/s. This limitation can be
overcome by innovations at both the architecture and the circuit levels.
Later in this book, a number of approaches for increasing the speed of
the system are described.
Jitter: Jitter can be interpreted as the random perturbations of the
zero crossings of a signal with respect to a reference point. Jitter can
be measured as the rms value of the difference of the interval between
the two consecutive zero crossings of the signal and a constant time
period (cycle jitter), or the rms value of the difference of two consecutive
samples of such intervals (cycle-to-cycle jitter). These two values are
shown to have a close dependence [13].
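The two measures can be computed directly from a list of crossing times. The following sketch follows the definitions given above; the function names and the sample crossing times are illustrative only.

```python
import math

def rms(values):
    """Root-mean-square value of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def jitter_metrics(crossings, nominal_period):
    """Return (cycle jitter, cycle-to-cycle jitter) from crossing times."""
    periods = [b - a for a, b in zip(crossings, crossings[1:])]
    cycle = rms([p - nominal_period for p in periods])
    cycle_to_cycle = rms([q - p for p, q in zip(periods, periods[1:])])
    return cycle, cycle_to_cycle

# Hypothetical crossing times in picoseconds, nominal period 100 ps.
crossings_ps = [0.0, 100.0, 210.0, 300.0, 400.0]
cj, ccj = jitter_metrics(crossings_ps, 100.0)
print(f"cycle jitter = {cj:.2f} ps rms, cycle-to-cycle jitter = {ccj:.2f} ps rms")
```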
To define a measure for the purity of the signals in a transceiver, the
SONET standard specifies three measures for the maximum allowable
jitter in the system [2]:
Jitter generation is a specification for the maximum allowable jitter
generated by the system, mainly because of the electronic noise of the
VCO, and the ripple on its control line. This specification concerns the
closed-loop behavior of the system, since the PLL provides some degree
of VCO noise cancellation. The SONET OC-192 standard specifies 10 ps
as the maximum peak-to-peak jitter on the clock and data signals in
the phase-locked condition. Other standards define a limit for the rms
jitter as well. However, the rms jitter requirements for the OC-192 are
smaller than the precision of most measuring equipment. Therefore, no
rms requirement is defined for the OC-192 standard.
Jitter transfer deals with the closed-loop transfer function of the
phase-locked system (Fig. 3.6). It is a measure for the suppression of the
input jitter through the CDR circuit. The system requirements define
the 3-dB bandwidth and the peaking in the transfer function. The loop
bandwidth is chosen as a tradeoff between external jitter suppression on
one hand, and internal jitter suppression, capture range, and acquisition
time on the other. The loop acts as a low-pass filter for the external
jitter entering the system. Input jitter with a frequency higher than the
loop bandwidth is significantly suppressed in this system. Therefore,
reduction of the loop bandwidth highly suppresses the jitter that enters
the system on the input. At the same time, the loop acts as a high-pass
filter for the jitter generated by the VCO. The suppression of the open-loop
VCO jitter is significant within a frequency offset close to the loop
bandwidth with respect to the center frequency. Therefore, a wider loop
bandwidth removes a larger portion of the VCO jitter.
The capture range of an unaided CDR circuit is close to the loop
bandwidth. However, this limitation can be overcome by means of a
frequency acquisition scheme.
The acquisition time becomes shorter for a larger loop bandwidth.
Since the nominal loop bandwidth is defined by the standard, an adap-
tive bandwidth mechanism can be employed to speed up the acquisition.
During startup, the loop bandwidth is increased to provide faster
acquisition. As the circuit acquires lock, the bandwidth is set
back to the nominal value, suppressing the jitter entering the system.
A fiber link consists of many cascaded regenerators. Jitter can
accumulate along the link because the small jitter peaking of each
regenerator adds up to a large total. To alleviate this difficulty, the
jitter peaking of each regenerator should be kept below 0.1 dB.
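To see how the 0.1-dB limit constrains the loop, the sketch below evaluates the jitter peaking of a second-order loop with one zero, $H(s) = (2\zeta\omega_n s + \omega_n^2)/(s^2 + 2\zeta\omega_n s + \omega_n^2)$. This transfer function is an assumed model (typical of a charge-pump loop with a series RC filter), not one given in the text, and the accumulation over a link is approximated by adding the peaking of identical regenerators in dB.

```python
import math

def jitter_peaking_db(zeta, points=4000):
    """Peak of |H(jw)| in dB for the assumed second-order jitter transfer."""
    peak = 0.0
    for i in range(points + 1):
        x = 10.0 ** (-3.0 + 6.0 * i / points)   # x = w/wn, swept 1e-3 .. 1e3
        a = (2.0 * zeta * x) ** 2
        mag = math.sqrt((1.0 + a) / ((1.0 - x * x) ** 2 + a))
        peak = max(peak, mag)
    return 20.0 * math.log10(peak)

def cascaded_peaking_db(zeta, n_regenerators):
    """Worst-case peaking of n identical regenerators in cascade."""
    return n_regenerators * jitter_peaking_db(zeta)

for zeta in (1.0, 5.0):
    print(f"zeta = {zeta}: peaking = {jitter_peaking_db(zeta):.3f} dB, "
          f"100 regenerators = {cascaded_peaking_db(zeta, 100):.1f} dB")
```

Under this model a damping factor of 1 gives more than 1 dB of peaking, while a heavily overdamped loop (damping factor of 5) stays below 0.1 dB, which suggests why CDR loops for regenerators are designed with large damping.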
Jitter tolerance is a measure of the ability of the CDR circuit to track
a jittered input data signal. Jitter on the input signal can be considered
as phase modulation. The CDR must provide a clock signal that tracks
this phase modulation in order to accurately retime jittered data. The
jitter tolerance is defined as a mask that relates the maximum amount
of phase modulation that can be corrected by the loop to the frequency
offset with respect to the data rate (Fig. 3.7).
Power Dissipation: Until recently, low power dissipation was not
considered a critical requirement for optical transceivers. One reason was
that in contrast to handheld wireless transceivers, optical transceivers do
not run from a battery. Another reason was that high-quality transceivers
could only be integrated in power-hungry III-V processes.
Development of bipolar technologies along with introduction of deep
sub-micron CMOS processes has allowed circuit designers to build sys-
tems with significantly reduced power consumption. This aspect be-
comes more attractive when a number of transceivers are placed on a
single chip in order to increase the operating rate of the system. The
power dissipation and the integrability of the circuit in VLSI technolo-
gies determine the number of transceivers that can be placed on one
chip. Lower power dissipation also eases packaging and eliminates heat
sinking issues.
Supply Scaling: Supply scaling has been a distinguishing feature of
the trend towards deep sub-micron scaling in CMOS processes. While
resulting in reduced power consumption, supply scaling limits the
choice of circuit topologies for high-speed applications. In a
CMOS technology running from a 1.8-V supply, circuit structures that
require stacking of more than three devices must be discarded.
Supply scaling also leads to a larger VCO gain for high-frequency
oscillators. This is because a smaller voltage range must sweep the VCO
frequency across the range of interest. If the VCO is used in a closed-
loop application such as a CDR circuit, the higher VCO gain translates
the noise on the control line of the VCO into a larger output jitter. In
chapter 5 we will describe a technique to reduce the VCO gain in a
CMOS process.
Fabrication Technology: As CMOS technology continues to ben-
efit from both scaling and enormous momentum of the digital market,
many high-speed integrated circuits that were once considered the exclusive
domain of III-V or silicon bipolar technologies are likely to appear
as CMOS implementations. However, issues such as technology devel-
opment costs, computer-aided design (CAD) infrastructure, and fabri-
cation turnaround time make it desirable to use a single mainstream
digital CMOS process for all IC products.

2.1. Full-Rate and Half-Rate Architectures


Phase-locking CDR architectures can be categorized into two major
groups, full-rate and half-rate. In a full-rate circuit the location of the
data transition is compared to the falling (or rising) edge of the clock.
This comparison can be performed if the clock frequency equals the data
rate (Fig. 3.8(a)). As a result, retiming of the data can be performed
using flipflops that operate either on rising edge or falling edge of the
clock signal.

In a half-rate circuit, the location of the data transitions is compared
to that of both the rising and falling edges of the clock (Fig. 3.8(b)). This
results in a clock frequency equal to one half of the data rate. In this
scenario, retiming of the data should be performed using flipflops that
operate on both the rising and falling edges of the clock.
The biggest advantage of a half-rate approach is the reduction of the
clocking frequency by a factor of two. Simulations indicate that circuits
operating at a lower speed consume less power. In fact, as the speed
of operation reaches the maximum operating frequency of a particular
technology, the required power consumption grows exponentially.
Another advantage of half-rate architectures is the reduction in com-
plexity if the CDR circuit is followed by a demultiplexer, i.e., if the
circuit is not required to generate a full-rate output. Since the half-rate
clock samples every other bit on its rising or falling edges, the first level
of demultiplexing is automatically performed. This technique also saves
on hardware and power consumption since a frequency divider, which
reduces the clock frequency by a factor of two, can be eliminated.
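The inherent 1-to-2 demultiplexing can be illustrated with a purely behavioral sketch. It assumes ideal sampling, with the rising edge capturing the even-numbered bits and the falling edge the odd-numbered bits; the function names are hypothetical.

```python
def half_rate_sample(bits):
    """Split a bit stream into the streams seen on rising and falling edges."""
    rising = bits[0::2]    # bits captured on rising clock edges
    falling = bits[1::2]   # bits captured on falling clock edges
    return rising, falling

def remultiplex(rising, falling):
    """Interleave the two half-rate streams back into a full-rate stream."""
    out = []
    for i in range(len(rising) + len(falling)):
        source = rising if i % 2 == 0 else falling
        out.append(source[i // 2])
    return out

data = [1, 0, 1, 1, 0, 0, 1, 0]
rising, falling = half_rate_sample(data)
print("rising-edge stream: ", rising)
print("falling-edge stream:", falling)
```

Each output stream runs at half the input rate, so the first demultiplexing level comes at no extra hardware cost, whereas a full-rate output requires the remultiplexing step shown in `remultiplex`.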
A major concern in employing a half-rate clock, however, is the duty
cycle mismatch if the system is designed to generate a full-rate output in
the receiver or the transmitter. If the duty cycle deviates from 50%, the
spacing between the rising and falling edges of the clock signal differs
from half the clock period, so the width of the data eye sampled by the
rising edge is different from that sampled by the falling edge, resulting
in bimodal jitter (Fig. 3.9).

The focus of the work presented in this book is the design of systems
employing half-rate architectures. Although the CMOS technology
used here performs marginally in a full-rate system, the resulting
reduction of power consumption makes the half-rate approach a strong
candidate. Furthermore, since the scaling of CMOS processes cannot
keep up with the growing demand for systems operating at higher data
rates, half-rate approaches are becoming more attractive for high-speed
design in the near future.
In chapters 5 and 6 we describe two half-rate CDR circuits. How-
ever, in the remainder of this chapter after a general review of the CDR
circuit’s building blocks, some of the existing full-rate architectures are
addressed by describing their phase and frequency detectors.

2.2. Oscillators
As an integral part of phase-locked loops, oscillators are used for clock
generation in these systems. The design of the VCO directly impacts the
jitter performance and the reproducibility of the CDR circuit. While LC
topologies achieve a potentially lower jitter and higher center frequency,
their limited tuning range makes it difficult to obtain a target frequency
without design and fabrication iterations.

2.2.1 General Theory


An oscillator can be modeled as an amplifier in a unity-gain negative
feedback system (Fig. 3.10) where:

$$\frac{V_{out}}{V_{in}}(s) = \frac{H(s)}{1 + H(s)} \qquad (3.1)$$

If the amplifier itself experiences so much phase shift at high frequencies
that the overall feedback becomes positive, an oscillation may occur. If
this happens at a frequency of $\omega_{osc}$, the circuit amplifies its own
noise components at $\omega_{osc}$ indefinitely [14].
The above statement can be formalized as follows: oscillation at the
frequency $\omega_{osc}$ can occur if these two conditions are simultaneously
satisfied:

$$|H(j\omega_{osc})| \ge 1 \qquad (3.2)$$

$$\angle H(j\omega_{osc}) = 180^\circ \qquad (3.3)$$

2.2.2 Ring Oscillators


A ring oscillator is formed by placing a number of gain stages in a loop.
The poles of these stages should introduce a 180° phase shift required
for oscillation [15].
Oscillation cannot occur if a single stage is placed in a unity-gain loop.
This is because a maximum frequency-dependent phase shift of 90° can
be provided by the single pole of the open-loop circuit. If two stages are
used in a loop, the loop contains two poles. Since each stage introduces
90° of frequency-dependent phase shift, the overall phase shift can reach
180°, but at a frequency of infinity. However, the loop gain vanishes at
very high frequencies. Therefore, the requirements of oscillation are not
simultaneously satisfied.
To achieve a greater phase shift around the loop, a third inverting
stage can be added to the loop. The phase shift can therefore reach
180° where the loop gain is still greater than or equal to unity.
As described in [15], if the transfer function of each stage is denoted
as $-A_0/(1 + s/\omega_0)$, then the loop gain is given as:

$$H(s) = -\frac{A_0^3}{(1 + s/\omega_0)^3} \qquad (3.4)$$

In order to obtain the frequency of oscillation and the minimum
required gain per stage, we let the frequency-dependent phase shift and
the magnitude of the loop gain at the oscillation frequency equal 180°
and unity, respectively. It follows that $\omega_{osc} = \sqrt{3}\,\omega_0$
and $A_0 = 2$, where $\omega_0$ is the 3-dB bandwidth and $A_0$ is the dc
gain of each stage. These
values are used for choosing the proper scaling for the active and passive
elements forming the stages. The dc gain of each stage should be reduced
to the minimum value required for oscillation in order to reduce the
output jitter.
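The same two conditions can be evaluated for a ring with any number of identical single-pole inverting stages. The sketch below is a generalization added for illustration: each stage must contribute 180°/N of frequency-dependent phase shift, and the per-stage gain magnitude at that frequency must be unity.

```python
import math

def ring_oscillation_condition(n_stages):
    """Return (w_osc / w0, minimum dc gain per stage) for an N-stage ring.

    Assumes identical stages with transfer function -A0 / (1 + s/w0).
    """
    if n_stages < 3:
        raise ValueError("fewer than three single-pole stages cannot oscillate")
    phase_per_stage = math.pi / n_stages       # frequency-dependent phase shift
    w_ratio = math.tan(phase_per_stage)        # arctan(w_osc / w0) = pi / N
    a0_min = 1.0 / math.cos(phase_per_stage)   # equals sqrt(1 + w_ratio**2)
    return w_ratio, a0_min

for n in (3, 4, 5):
    w_ratio, a0_min = ring_oscillation_condition(n)
    print(f"N = {n}: w_osc = {w_ratio:.3f} w0, minimum A0 = {a0_min:.3f}")
```

For three stages this reproduces an oscillation frequency of $\sqrt{3}$ times the stage bandwidth and a minimum dc gain of 2 per stage.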
In practice, as the oscillation amplitude grows, the stages in the sig-
nal path experience nonlinearity and eventually saturation, causing the
signal amplitude to be clipped at two boundaries. If the small-signal
loop gain is greater than unity, the time allocation between the linear
and nonlinear modes of operation in the circuit is such that the average
loop gain is still equal to unity.
If the time spacing between the zero crossings of the input and the
output signals of each stage is $T_D$, the consecutive nodal voltages track
each other within a time period of $T_D$, yielding a period of $6T_D$ for
three stages.
The oscillation frequency can be derived using both the small-signal
and the large-signal circuit analysis. Using equation (3.4), the
small-signal frequency can be given as $\sqrt{3}\,\omega_0/(2\pi)$ (with
$\omega_0$ the 3-dB bandwidth of each stage), while the large-signal value
is $1/(6T_D)$. These two values are not equal. The small-signal frequency
depends on the output time constant of each stage, primarily given by
the resistance and the capacitance of each stage. $T_D$ results from the
large-signal slew rate of each stage that is related to its nonlinear current
drive and capacitances. As a result, the oscillation begins with the
small-signal frequency, but as the amplitude grows and the circuit becomes
nonlinear, the frequency shifts to the large-signal frequency which is a
larger value.
The 3-dB bandwidth of each stage is mostly determined by the load
resistance and total parasitic capacitance at the output node. To in-
crease the oscillation frequency of the ring oscillator, the load resistance
should be reduced. However, in order to keep the output voltage swing
constant, the bias current and hence the power consumption of the cir-
cuit should be increased.
Tuning. In a ring oscillator that consists of n stages, the frequency of
oscillation is $1/(2nT_D)$, where $T_D$ is the delay through each stage.
In order to change the frequency of oscillation, either the effective
number of stages or the delay of each stage must be altered. Figure 3.12
depicts a conceptual illustration of this technique. This approach, mostly
referred to as “delay interpolation”, consists of a fast path and a slow
path in parallel [16].
The total delay is adjusted by increasing the gain of one path and
decreasing that of the other, differentially. The total delay is hence a
weighted sum of the delays of the two paths.
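A minimal numerical sketch of delay interpolation follows. It assumes the stage delay is a linear weighted sum of the fast-path and slow-path delays and that an n-stage ring oscillates at $1/(2nT_D)$; the delay values and function names are hypothetical.

```python
def interpolated_delay(t_fast, t_slow, weight_slow):
    """Stage delay as a weighted sum of the fast-path and slow-path delays."""
    if not 0.0 <= weight_slow <= 1.0:
        raise ValueError("weight_slow must lie between 0 and 1")
    return (1.0 - weight_slow) * t_fast + weight_slow * t_slow

def ring_frequency(n_stages, stage_delay):
    """Oscillation frequency of an n-stage ring, f = 1 / (2 * n * T_D)."""
    return 1.0 / (2.0 * n_stages * stage_delay)

# Hypothetical three-stage ring with 20-ps fast and 40-ps slow paths.
for w in (0.0, 0.5, 1.0):
    t_d = interpolated_delay(20e-12, 40e-12, w)
    print(f"weight = {w:.1f}: T_D = {t_d * 1e12:.0f} ps, "
          f"f = {ring_frequency(3, t_d) / 1e9:.2f} GHz")
```

Steering the weight from the fast path to the slow path sweeps the frequency continuously between the two extremes, which is the tuning mechanism exploited by the control voltage.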

Timing Jitter. In the CDR circuit, the main source of timing jitter
is the inherent thermal and shot noise of the active and passive devices
that make up each delay stage of the VCO. 1/f noise is usually not of
practical importance since it is rejected by the loop filter. Therefore,
minimizing the impacts of thermal and shot noise in the basic delay
stage becomes the key to attaining low timing jitter.
It can be shown that the thermal jitter improves with the square root
of power consumption [17]. To design for low jitter, the overdrive voltage
of the devices used in the delay stage should be maximized. For a fixed
delay and fixed current, the small-signal gain of each stage should also
be minimized. However, this gain must be large enough for oscillation
to occur.

2.2.3 LC Oscillators
Monolithic LC oscillators are formed by a resonant tank that consists
of a spiral inductor (L) and a variable capacitor (C) that resonate at a
frequency of $f_{osc} = 1/(2\pi\sqrt{LC})$.
The inductors and capacitors suffer from having a resistive component.
This component is mostly dominated by the resistance of the
metal wire used in the inductor and by eddy-current and displacement loss
through the substrate. Figure 3.13 depicts the substrate loss versus its
sheet resistance. The eddy-current loss decreases as the sheet resistance
of the substrate increases. As a result, in bulk processes with high sheet
resistance the loss is dominated by displacement. The displacement loss is
maximized when the substrate’s resistive impedance becomes comparable
with the oxide’s capacitive impedance [18].
The tank can therefore be modeled as a parallel combination of an
inductor, a capacitor, and a resistor (Fig. 3.14(a)). Because
of this resistive element, the tank cannot sustain oscillation indefinitely
if it is stimulated by a current impulse. However, if a negative
resistance is placed in parallel with the tank, the combination can oscillate
(Fig. 3.14(b)). This is the main idea behind the operation of
cross-coupled LC oscillators.
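It can be shown that the resistance seen between the two drains of a
cross-coupled differential pair equals $-2/g_m$, where $g_m$ is the
transconductance of each transistor. Therefore, for the oscillation to
occur, it is necessary that the magnitude of this negative resistance not
exceed the equivalent parallel resistance of the tank (Fig. 3.15).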

The resulting LC oscillators can achieve a very high center frequency.
This frequency can be as high as 25.9 GHz in a CMOS process [19]. This
is in contrast to ring oscillators, where the frequency is limited to the
minimum delay per stage and the minimum required number of stages.
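The resonance frequency and the start-up requirement can be estimated with a short calculation. The sketch assumes an ideal parallel tank whose loss is represented by a single resistance seen between the two drains, and a cross-coupled pair presenting $-2/g_m$; the component values and function names are hypothetical.

```python
import math

def lc_resonance_hz(l_henry, c_farad):
    """Resonance frequency of an LC tank, f = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))

def min_gm_for_startup(r_parallel_ohm):
    """Smallest transconductance per transistor such that 2/gm <= Rp."""
    return 2.0 / r_parallel_ohm

f0 = lc_resonance_hz(1e-9, 1e-12)   # 1 nH and 1 pF (illustrative values)
gm = min_gm_for_startup(500.0)      # 500-ohm equivalent parallel resistance
print(f"f0 = {f0 / 1e9:.2f} GHz, minimum gm = {gm * 1e3:.1f} mS")
```

In practice the transconductance is chosen with some margin above this minimum so that oscillation starts reliably across process and temperature variations.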

Tuning. Only the inductor and capacitor values can be varied to tune
the frequency of an LC oscillator. Other parameters such as bias cur-
rents and transistor transconductances have a negligible effect on the
oscillation frequency. Since it is difficult to vary the value of the mono-
lithic inductor, the tank’s capacitance can be changed for tuning. The
amount of tuning achieved is reduced as the supply voltage gets lower.
Also, to maximize the tuning range, constant capacitances in the tank
must be minimized.
The variable capacitor can be formed using either pn junctions or
MOS varactors. The former is formed by a p+ diffusion in an N-well
(Fig. 3.16(a)), whereas the latter is formed by placing an NMOS
device in an N-well (Fig. 3.16(b)).
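The impact of constant capacitance on the tuning range can be quantified with a small sketch. It assumes an ideal tank in which the total capacitance is the sum of a fixed part and a varactor that swings between a minimum and a maximum value; all component values are hypothetical.

```python
import math

def tuning_range_hz(l_henry, c_fixed, c_var_min, c_var_max):
    """Return (f_min, f_max) of an LC tank with fixed plus variable capacitance."""
    def resonance(c_total):
        return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_total))
    return resonance(c_fixed + c_var_max), resonance(c_fixed + c_var_min)

# Varactor alone (0.5 pF to 2 pF) versus the same varactor plus 1 pF of
# fixed capacitance, both with a 1-nH inductor.
for c_fixed in (0.0, 1e-12):
    f_min, f_max = tuning_range_hz(1e-9, c_fixed, 0.5e-12, 2e-12)
    print(f"C_fixed = {c_fixed * 1e12:.1f} pF: "
          f"{f_min / 1e9:.2f} to {f_max / 1e9:.2f} GHz "
          f"(ratio {f_max / f_min:.2f})")
```

With no fixed capacitance the 4:1 varactor ratio yields a 2:1 frequency range, whereas 1 pF of fixed capacitance reduces the ratio to about 1.41, which is why constant capacitances in the tank must be minimized.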

Phase noise. The phase noise is defined as a small random excess
phase, representing variations in the period of a sinusoidal signal. The
phase noise of LC oscillators usually depends on the quality of the tank
(Q). The higher the Q, the sharper the resonance and the lower the
phase noise skirts.
Oscillation phase noise is generated primarily through two mechanisms,
distinguished by the path into which the noise is injected. Noise
in the signal path is shaped by the oscillator and generates the phase
noise skirts, whereas the noise in the control path is translated to the
region around the carrier. When a VCO is placed inside a CDR circuit,
its phase noise characteristic is shaped by the loop. Phase noise within
the loop bandwidth is suppressed, while the out-of-band noise remains
unattenuated.
The phase noise of the oscillator due to thermal noise was formulated
in [20]. Recently, research performed at UCLA led to a clearer representation
of the formula, based on the physical parameters of the oscillator
[21].
Thermal noise can be injected into the signal path from either the
tank, the tail current source, or the cross-coupled differential pair. The
current noise injected by the resistor, representing the loss of the tank,
directly contributes to the output phase noise. The noise of the current
source affects the oscillator through more complicated mechanisms. As
a result of the switching of the cross-coupled differential pair in the
oscillator, noise is up or down converted in frequency. Low-frequency
noise of the current source is up converted to the vicinity of the oscillator
carrier frequency. Also, noise of the current source at twice the carrier
frequency is down converted to the oscillator output frequency.
The up and down conversions affect the oscillator phase noise through
different mechanisms. Down conversion directly contributes to phase
noise that is proportional to the noise factor of the devices. Up conver-
sion on the other hand manifests itself as perturbations on the amplitude
of the output signal. If the oscillator uses varactors as a means of tuning,
amplitude variations can modulate the varactor capacitance and result
in phase noise. Known as AM-to-FM modulation, this effect becomes
more significant when the gain of the VCO is large, meaning that a change
in the voltage across the varactor is translated into a larger amount of
capacitance modulation.
Noise of the differential pair is sampled during a time window when the
switching occurs. The phase noise introduced by the pair is a constant,
given by the noise bandwidth product of the devices. This indicates that
the phase noise introduced by the pair is independent of the device sizes.

Multi-Phase LC Oscillators. The architecture of a clock and data
recovery circuit determines the number of clock phases required for the
implementation of the system. This number is primarily determined
by the ratio of the clock frequency to the data rate. Furthermore, systems
incorporating referenceless frequency detection require a higher number
of clock phases. As an example, a minimum of two data samples must
be obtained to derive the phase relationship between the clock and the
data signals in a binary phase detector - one from the data transition
instant and one from the previous bit. A single phase of a full-rate clock
is sufficient to obtain these two samples in two flipflops operating on
the opposite edges of the clock. However, if the circuit is designed to
operate with a half-rate clock signal, two quadrature phases of the clock
are required to obtain the same samples.
The requirement for using several phases of the clock signal necessitates
the use of a VCO capable of producing multiple equally-spaced
phases. In a ring oscillator, consisting of a number of stages in a loop,
multiple phases can be taken from the outputs of the different stages.
The number of stages should be chosen to be equal to the number of
required phases or an integer multiple of it. As the frequency of oscillation
increases, however, the number of stages in the loop must be reduced,
so fewer phases are available. Ring oscillators therefore fail to operate
reliably in modern high-speed systems.
The design of LC oscillators capable of generating quadrature phases
has gained considerable momentum in recent years. The signal generated
by these oscillators is relatively pure, and their phase noise is only a few
dB worse than that of a stand-alone LC oscillator at a similar
frequency offset.
A quadrature oscillator consists of two LC oscillators. The output
of each oscillator is coupled to the input of the other one with a given
coupling coefficient (k). As shown in Fig. 3.17, each VCO can be modeled
as a unity-gain feedback system with an open-loop gain of H. If both
of the coupled oscillators resonate at the same frequency, the output
phasors of the two oscillators (X and Y) must satisfy the loop equations
given in [22]. The combination of these two equations indicates that
$X^2 + Y^2 = 0$, i.e., $X = \pm jY$.
The two signals are therefore 90° apart from each other.
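One way to see this result is the following sketch. It assumes that the coupled signal is added at the input of each loop, that the two coupling paths have opposite signs ($+k$ and $-k$), and that $H(j\omega)$ is the open-loop gain at the common oscillation frequency. The sign convention is an assumption and may differ from that of [22].

```latex
\begin{align}
(X + kY)\,H(j\omega) &= X, \\
(Y - kX)\,H(j\omega) &= Y.
\end{align}
% Eliminating H(j\omega) by cross-multiplication:
\begin{align}
X\,(Y - kX) &= Y\,(X + kY) \\
-k\,X^2 &= k\,Y^2 \\
X^2 + Y^2 &= 0 \quad\Longrightarrow\quad X = \pm jY .
\end{align}
```

For any nonzero $k$ the two phasors have equal magnitude and differ in phase by 90°, regardless of the exact shape of $H(j\omega)$.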
Figure 3.18 depicts a quadrature oscillator, implemented based on the
above analysis [23]. This structure consists of two LC oscillators and two
differential pairs coupling the output of each oscillator to the input of the
other one. In this oscillator, tuning is performed by means of a control
voltage. As the control voltage varies, the bias current of the
oscillators changes. Variation of the bias
current changes the junction capacitance and the oscillation frequency
of the VCO.
This circuit can be modified to avoid the large variations of the
bias current that tuning otherwise requires. As shown in Fig. 3.19, a
tail current source can be added to the circuit to maintain a constant
bias current. The common-mode voltage and hence the oscillation frequency
can be varied by changing the on-resistance of a transistor connected
between node P and the supply [24]. As the control voltage increases,
this transistor enters saturation and the voltage at node
P experiences a sudden change, resulting in nonlinearity in the VCO
characteristic. A second transistor, driven by a source-followed version
of the control voltage, can be added to the circuit to provide an effective
resistance between P and the supply that varies smoothly with the
control voltage.
A different tuning mechanism for quadrature oscillators is to change
the coupling coefficient between the two oscillators [22]. Figure 3.20
depicts the structure of this oscillator. The output phasor at node A
is determined by vector addition of the phasor of the stand-alone oscillator
and the phasor of the coupling differential pair. As the coupling coefficient
increases, the magnitude of the coupling phasor, which is 90° away
from the stand-alone phasor, increases. The angle
between the sum phasor and the stand-alone phasor therefore increases, meaning
that the quadrature oscillator oscillates at a larger frequency offset with
respect to the stand-alone oscillation frequency. As a result, the amount
of coupling can be changed to cover a very wide tuning range. The coupling
cannot be reduced indefinitely because the two oscillators will lose
synchronization if the coupling is too small. Phase noise sets a limit
on the maximum amount of coupling: as the frequency of oscillation
deviates from the resonance frequency of the stand-alone oscillator, the
Q of the tank at the frequency of oscillation drops and the phase noise
is degraded.
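The dependence of the oscillation frequency on the coupling can be estimated with a few lines of code. The sketch below is not taken from [22]. It assumes a coupling phasor exactly 90° away from the stand-alone phasor with relative magnitude `coupling`, and a parallel RLC tank whose phase near resonance is approximately -2Q(f - f0)/f0 radians. The function name and the numbers are hypothetical.

```python
import math


def quadrature_offset(f0_hz, q_tank, coupling):
    """Estimate the angle of the sum phasor and the resulting frequency shift.

    Assumptions (illustrative, not from the text):
      * the coupling phasor is `coupling` times the stand-alone phasor and
        is 90 degrees away from it;
      * the tank phase near resonance is about -2*Q*(f - f0)/f0 radians,
        so the tank must operate off resonance to absorb the extra angle.
    """
    # Angle between the sum phasor and the stand-alone phasor.
    angle_rad = math.atan(coupling)
    # Frequency offset at which the linearized tank phase cancels that angle.
    delta_f_hz = f0_hz * math.tan(angle_rad) / (2.0 * q_tank)
    return angle_rad, delta_f_hz


if __name__ == "__main__":
    for k in (0.1, 0.2, 0.4):
        angle, df = quadrature_offset(5e9, 10.0, k)
        print(f"k={k:.1f}  angle={math.degrees(angle):5.1f} deg  "
              f"offset={df / 1e6:7.1f} MHz")
```

Under these assumptions the offset grows linearly with the coupling and inversely with the tank Q, which is consistent with the trade-off described above: a wider tuning range moves the tank further from resonance.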
The quadrature oscillator can also be used to generate a differential
signal at twice the frequency [25]. As shown in Fig. 3.21, the fully differential
topology allows for the possibility of sensing the common-source
nodes as the output at twice the frequency. The common-mode node
must be followed by proper buffering stages to ensure reasonable swings.
Another implementation of coupled oscillators is the circuit of [26].
Shown in Fig. 3.22, the circuit consists of a number of cross-coupled
oscillators that are placed in a ring. The idea is to improve phase noise
by providing a higher amount of noise filtering through several high-Q
tanks. If n oscillators are cascaded in a loop, the output noise filtering
goes up by a factor of $n^2$. However, since the number of noise sources
increases proportionally with n, it can be assumed that the output noise
power density reduces by a factor of n. Meanwhile, the signal at the
oscillation frequency is amplified by a factor of n and its power scales
up by a factor of $n^2$. As a result, it can be assumed that the phase
noise of this oscillator is improved compared to a stand-alone
oscillator. The penalty for this improved phase noise performance
is the higher power consumption and the larger chip area. In fact, if the
power of a single oscillator is increased by n, a similar performance can
be achieved from the oscillator. However, this circuit produces multiple
phases, allowing the oscillator to be used as the clock generator for
high-speed systems, replacing a ring oscillator while offering superior
performance. Furthermore, as the number of LC oscillators in the loop
increases, the effects of mismatches on the performance of the oscillator
become less pronounced because the oscillation frequency is determined
by the average characteristics of the oscillators in the loop.
Multi-stage LC oscillators can be modified to produce only differential
phases. In the circuit of Fig. 3.23 [27], four single-ended common-source
amplifiers with inductive loads are placed in a loop. The inductor res-
onates with the parasitic capacitances at the output node and the stage
sustains a phase shift of 180° between its input and output. Since the
number of stages is even, the overall phase shift around the loop is zero
and therefore the circuit oscillates at a frequency where the tank has
the maximum Q. The two pairs of differential signals taken from this
oscillator are summed together to reduce the phase noise by 3 dB.
As the operation frequency of the oscillators approaches the $f_T$ of a CMOS
process, the design of oscillators satisfying the requirements for
tuning range and phase noise becomes very difficult. The quality of the
inductors degrades as they operate at speeds close to their self-resonance
frequency. Variable capacitors added at the output of the oscillator to
provide tuning deteriorate the integrity of the output signal.
An alternative solution for implementation of oscillators at very high
speeds is the distributed oscillator. The oscillator is formed by connecting
the output of a distributed amplifier to its input. The design of these
oscillators has advanced significantly in recent years [28, 29].

2.2.4 PLL Jitter Calculation


In design or measurement, it is often necessary to predict the output
jitter of a PLL if the electronic noise in the VCO is the dominant source.
We describe a simple approach that estimates the closed-loop jitter with
reasonable accuracy.
Using simulations or measurements, we first compute the relative
phase noise of the free-running VCO due to the sources of white noise.
The cycle-to-cycle jitter is then calculated from the phase noise with the
aid of the following equation:

where denotes the oscillation frequency and represents the


relative phase noise power at an offset frequency of [13].
In the next step, we relate the jitter of the PLL to that of the free-
running VCO. It has been shown that the closed-loop jitter can be viewed
as if the VCO jitter rises with the square-root of time and saturates at
a time equal to the inverse of the loop bandwidth [30]. If the loop
bandwidth is hertz, then the VCO produces a total of
cycles in seconds. Thus, the total accumulated jitter due to the
VCO is equal to
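The two-step estimate described above can be sketched in code. Because the equations are not legible in this copy, the relations used below are assumptions: the cycle-to-cycle jitter is taken as sigma_cc^2 = (4*pi / w0^3) * S_phi(dw) * dw^2 for white noise sources, and the jitter is assumed to grow with the square root of the number of cycles until a time equal to the inverse of the loop bandwidth. The function names and the example numbers are hypothetical.

```python
import math


def cycle_to_cycle_jitter(f0_hz, offset_hz, phase_noise_dbc_hz):
    """RMS cycle-to-cycle jitter in seconds from a white-noise phase noise point.

    Assumed relation: sigma_cc^2 = (4*pi / w0^3) * S_phi(dw) * dw^2,
    with S_phi given as relative phase noise power per hertz.
    """
    w0 = 2.0 * math.pi * f0_hz
    dw = 2.0 * math.pi * offset_hz
    s_phi = 10.0 ** (phase_noise_dbc_hz / 10.0)  # dBc/Hz converted to 1/Hz
    return math.sqrt(4.0 * math.pi / w0 ** 3 * s_phi * dw ** 2)


def closed_loop_jitter(sigma_cc_s, f0_hz, loop_bw_hz):
    """Accumulated VCO jitter once the loop stops further accumulation.

    Assumption: jitter rises as sqrt(number of cycles) and saturates after
    1/loop_bw_hz seconds; other definitions of the saturation time (for
    example with a 2*pi factor) would scale the result accordingly.
    """
    n_cycles = f0_hz / loop_bw_hz
    return sigma_cc_s * math.sqrt(n_cycles)


if __name__ == "__main__":
    sigma_cc = cycle_to_cycle_jitter(5e9, 1e6, -100.0)
    sigma_pll = closed_loop_jitter(sigma_cc, 5e9, 5e6)
    print(f"cycle-to-cycle jitter: {sigma_cc * 1e15:.1f} fs rms")
    print(f"closed-loop jitter:    {sigma_pll * 1e12:.2f} ps rms")
```

The square-root accumulation reflects the assumption of uncorrelated cycle-to-cycle errors. Flicker noise and noise from the other loop components are ignored.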

2.3. Phase Detectors


In a CDR circuit the phase detector is the key element for providing
the phase lock between the VCO clock signal and the input data
sequence.
The task of the phase detector is to provide information about the
spacing between the zero crossings of the data and the clock. This
information is used to set the control voltage of the VCO at the value
required for the VCO to oscillate at the frequency of interest. When
phase lock is achieved, this voltage stays constant and the phase detector
output should not disturb it.
A commonly-used type of phase detector operating with periodic data
is an XOR gate. As shown in Fig. 3.24, if two sequences with a phase
difference of $\Delta\phi$ are applied to the inputs of the XOR gate, the output
will carry pulses as wide as $\Delta\phi$.
The dc value of the resulting signal is linearly proportional to the
difference between the phases of the two input signals:

$$\overline{V_{out}} = K_{PD}\,\Delta\phi$$

where $K_{PD}$ is the gain of the phase detector, and $\Delta\phi$ is the input phase
difference.
Although this simple approach proves to be useful for applications
where the two inputs have identical frequencies and different phases,
it falls short in providing frequency error information as the two input
frequencies move apart from each other.
The reason is that if the two frequencies are not equal, the detector
generates a beat frequency with an average value of zero (Fig. 3.25).
The beat signal can still provide useful information about the phase
and frequency difference if the two frequencies are only slightly different. To
improve the capture range of the phase detector, modern phase-locked
systems incorporate additional means of frequency acquisition.
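The behavior of the XOR gate as a phase detector can be checked numerically. The following is a minimal sketch with ideal 50% duty-cycle square waves and logic levels of 0 and 1, so the "zero average" of the beat signal appears here as the mid-level 0.5. All names and parameters are illustrative.

```python
import math


def square(t, freq_hz, phase_rad=0.0):
    """Return 1 or 0 for an ideal 50% duty-cycle square wave at time t."""
    return 1 if math.sin(2.0 * math.pi * freq_hz * t - phase_rad) >= 0.0 else 0


def xor_pd_average(freq_a, freq_b, phase_rad, periods=200, steps_per_period=400):
    """Average XOR output (between 0 and 1) over `periods` periods of input A."""
    n = periods * steps_per_period
    dt = 1.0 / (freq_a * steps_per_period)
    total = 0
    for i in range(n):
        t = (i + 0.5) * dt  # sample between grid points to avoid edge ties
        total += square(t, freq_a) ^ square(t, freq_b, phase_rad)
    return total / n


if __name__ == "__main__":
    # Equal frequencies: the average rises linearly with the phase difference.
    for phi in (math.pi / 4, math.pi / 2, 3 * math.pi / 4):
        print(f"phase={phi:.3f} rad  average={xor_pd_average(1.0, 1.0, phi):.3f}")
    # Unequal frequencies: the output is a beat whose average sits at mid-level.
    print(f"5% frequency error: average={xor_pd_average(1.0, 1.05, 0.3):.3f}")
```

With equal frequencies the average equals the phase difference divided by pi for phase differences between 0 and pi, which corresponds to a constant detector gain. With a frequency error the average no longer depends on the initial phase.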
A circuit that can detect both phase and frequency difference proves
extremely useful because it significantly increases the acquisition range
and lock speed of PLLs. The sequential phase/frequency detector (PFD)
provides a large capture range for periodic waveforms [31]. Figure
3.26(a) shows the implementation of this circuit and the corresponding
waveforms when the two inputs have different frequencies and phases.
If the frequency of input A, $\omega_A$, is greater than that of input B,
$\omega_B$, then the PFD
produces positive pulses at $Q_A$ while $Q_B$ remains zero (Fig. 3.26(b)).
Conversely, if $\omega_A < \omega_B$, positive pulses appear at $Q_B$ while $Q_A = 0$. If
$\omega_A = \omega_B$, then the circuit generates pulses at either $Q_A$ or $Q_B$ with a
width equal to the phase difference between the two inputs (Fig. 3.26(c)).
Thus the average value of $Q_A - Q_B$ is an indication of the frequency or
phase difference between A and B.
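The operation of the sequential PFD can be illustrated with a small event-driven model. This is a sketch of an ideal three-state detector, assuming an instantaneous reset so that the two outputs are never high at the same time. The function name and the output labels are illustrative.

```python
def pfd_run(edges_a, edges_b):
    """Simulate an ideal three-state PFD driven by rising-edge time lists.

    Returns (high_a, high_b): the total time each output spends high.
    The reset is assumed to be instantaneous.
    """
    events = sorted([(t, "A") for t in edges_a] + [(t, "B") for t in edges_b])
    qa = qb = False
    t_prev = 0.0
    high_a = high_b = 0.0
    for t, src in events:
        # Accumulate the time spent in the current state.
        if qa:
            high_a += t - t_prev
        if qb:
            high_b += t - t_prev
        t_prev = t
        if src == "A":
            if qb:
                qb = False   # an A edge resets a pending B pulse
            else:
                qa = True    # otherwise it raises (or keeps) the A output
        else:
            if qa:
                qa = False   # a B edge resets a pending A pulse
            else:
                qb = True
    return high_a, high_b


if __name__ == "__main__":
    fast = [0.8 * i for i in range(1, 126)]      # period 0.8
    slow = [1.0 * i for i in range(1, 101)]      # period 1.0
    print("A faster than B:", pfd_run(fast, slow))
    print("A slower than B:", pfd_run(slow, fast))
    lead = [float(i) for i in range(1, 101)]
    lag = [i + 0.1 for i in range(1, 101)]
    print("same frequency, A leads by 0.1:", pfd_run(lead, lag))
```

In the first two cases only one output is ever active, which indicates the sign of the frequency error. In the third case the active output carries pulses whose width equals the time offset between the inputs.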
The sequential phase/frequency detector is a major block used for
phase detection in frequency synthesizers and clock generators. Its com-
pact and power-efficient structure makes it attractive for low-power ap-
plications. However, this circuit cannot be used to provide phase error
information for random data, because in contrast to periodic data, a zero
crossing at the end of each bit is not guaranteed. Consecutive ones and
zeros are very likely to appear in a random sequence and automatically
reduce the transition density of the signal.
Binary data is usually transmitted in the “nonreturn-to-zero” (NRZ)
format. Each bit has a duration of T and is equally likely to be one
or zero. NRZ data has two properties that make the task of clock and
data recovery difficult. First, the data may exhibit long sequences of
consecutive ones and zeros. This means that in the absence of data
transitions, the CDR circuit should not only continue to produce the
clock, but also incur negligible drift in the clock frequency.
Second, the spectrum of the NRZ data has nulls at frequencies that are
integer multiples of the bit rate. Due to the lack of a spectral component
at the bit rate in the NRZ format, a clock recovery circuit may lock to
spurious signals or simply not lock at all. As we mentioned previously,
the NRZ data usually undergoes a nonlinear operation at the front end
of the circuit so as to create a frequency component at the bit rate.
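The absence of a bit-rate component in NRZ data, and its creation by a nonlinear operation, can be demonstrated with a single-bin Fourier sum. The sketch below assumes ideal rectangular NRZ pulses and models the nonlinear operation as an idealized edge detector that emits a half-bit-wide pulse after every transition. The helper names and parameters are hypothetical.

```python
import cmath
import random


def bit_rate_tone(samples, samples_per_bit):
    """Magnitude of the Fourier component at the bit rate, per sample."""
    acc = 0j
    for n, x in enumerate(samples):
        acc += x * cmath.exp(-2j * cmath.pi * n / samples_per_bit)
    return abs(acc) / len(samples)


def nrz_waveform(bits, samples_per_bit):
    """Ideal rectangular NRZ waveform with levels +1 and -1."""
    return [1.0 if b else -1.0 for b in bits for _ in range(samples_per_bit)]


def edge_detect(bits, samples_per_bit):
    """Idealized edge detector: a half-bit pulse follows every transition."""
    half = samples_per_bit // 2
    out = []
    prev = bits[0]
    for b in bits:
        pulse = 1.0 if b != prev else 0.0
        out.extend([pulse] * half + [0.0] * (samples_per_bit - half))
        prev = b
    return out


if __name__ == "__main__":
    random.seed(1)
    bits = [random.getrandbits(1) for _ in range(512)]
    print("NRZ tone at bit rate:          ", bit_rate_tone(nrz_waveform(bits, 8), 8))
    print("edge-detected tone at bit rate:", bit_rate_tone(edge_detect(bits, 8), 8))
```

The NRZ waveform yields a vanishing component because each bit is constant over exactly one period of the bit-rate tone, whereas the edge pulses all occupy the same position within the bit and therefore add coherently.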
The phase detectors that operate with random data are categorized
in two groups, linear and binary. In a linear phase detector, similar to
the XOR gate for the periodic signal, the phase error signal has a linear
relationship with the phase difference, falling to zero in the phase-lock
condition. In a binary phase detector, a binary (early or late) signal is
generated in response to arbitrarily small phase differences between the
clock and data.

2.3.1 Linear Phase Detectors


In a linear PD, like the one addressed in [32], phase error information
is produced by taking the difference of two pulses, both of which are
generated at every data transition. The width of one of the pulses is
linearly proportional to the phase difference between the clock and data,
whereas the other one has a constant pulsewidth. By using a differential
error signal, pattern dependency of phase error is cancelled since both
pulses are present only when a data transition occurs.
Figure 3.27 depicts the implementation of the Hogge phase detector.
The input NRZ data ripples through two D flipflops. One of the flipflops
samples its input on the rising edge of the clock and the other one
samples it on the falling edge. As shown in the waveforms, if the input
data and the two flipflop outputs are applied to the two XOR gates, the
resulting signals will have the property of linear phase detectors. One will carry
a pulse for every transition of the data with a width proportional to the
phase difference between the clock and the data. The other one will have
pulses as wide as half the clock period.
An important feature of the Hogge phase detector is the automatic
retiming of the incoming sequence. In the locked condition, the sampling
edges of the clock appear in the middle of a bit, meaning that
the clock samples the bit at its optimum point.
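The two pulse widths can be reproduced with a discrete-time model of the detector. The sketch below assumes ideal flipflops with zero clock-to-Q delay, rising clock edges at multiples of `steps_per_bit`, and data edges that precede each rising edge by `offset` steps. The signal and function names are illustrative and do not follow the labels of Fig. 3.27.

```python
def hogge_pulse_widths(bits, steps_per_bit, offset):
    """Average Error and Reference pulse widths (in steps) per data transition.

    Idealized model: zero clock-to-Q delay, rising clock edges at multiples
    of steps_per_bit, falling edges half a period later, and every data edge
    arriving `offset` steps before a rising edge (0 < offset < steps_per_bit).
    """
    half = steps_per_bit // 2
    data = list(bits) + [bits[-1]] * 2       # pad so the last pulses complete
    a = b = data[0]                          # outputs of the two flipflops
    error = reference = 0
    for n in range(steps_per_bit * (len(bits) + 1)):
        d = data[(n + offset) // steps_per_bit]
        phase = n % steps_per_bit
        if phase == 0:
            a = d                            # first flipflop: rising edge
        elif phase == half:
            b = a                            # second flipflop: falling edge
        error += d ^ a                       # XOR of data and first output
        reference += a ^ b                   # XOR of the two flipflop outputs
    transitions = sum(x != y for x, y in zip(bits, bits[1:]))
    return error / transitions, reference / transitions


if __name__ == "__main__":
    pattern = [0, 1, 1, 0, 1, 0, 0, 0, 1, 1]
    for offset in (5, 8, 11):
        err, ref = hogge_pulse_widths(pattern, 16, offset)
        print(f"offset={offset:2d} steps  error={err:.1f}  reference={ref:.1f}")
```

The Error width tracks the data-to-clock spacing while the Reference width stays at half a clock period, so their difference vanishes when the rising edge sits in the middle of the bit (an offset of 8 steps in this example).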
There is an important issue in the design of the Hogge linear phase
detector. Among the three signals applied to the XORs, two of them
(the flipflop outputs) contain a clock-to-Q delay with respect to the clock
edge. However, the input data does not contain this delay. This systematic
difference in the timing of these signals can cause a phase offset that could
result in degradation of the quality of signal detection if its value is large
compared to the data period. In practice, an additional delay element is
placed on the path of the data signal to the input of the XOR such that
a delay equal to the clock-to-Q delay of the latches is introduced on the
signal path.
Another problem of the Hogge phase detector is that the retiming
delay through the flipflops leads to a half-period skew between the pulses at
Error and those at Reference. Consequently, even in lock, a charge
pump and loop filter driven by Error and Reference produce a positive
ramp while Error is high and a negative ramp while Reference
is high. The control line of the VCO therefore experiences a triwave
with a positive net area, disturbing the VCO on every data transition
(Fig. 3.28).

The Hogge phase detector can be modified to alleviate the residual
error caused by the triwaves. As shown in Fig. 3.29, one extra latch and
two additional XOR gates are added to the original Hogge PD [41].
The latches are driven by alternating phases of the clock signal and
two cascaded latches form a flipflop. The outputs of the XOR gates
control the charge pump. Any pulse generated at the first output
starts with a data edge and ends with a clock edge. Therefore it carries
information about the phase difference between the data and clock. The
other three outputs provide pulses as wide as one half of a clock period for
every data transition as the signal ripples through the chain of latches.
The four outputs control the charge pump in the following way:
two of them control the first and second up-ramps, and the other two
control the first and second down-ramps. This indicates that each phase
measurement persists for two clock periods and the charge pump activities
provided by the four outputs cancel each other such that the triwave
transient has a net area of zero. This effect significantly reduces the
pattern-dependent jitter at the output of the CDR circuit.

2.3.2 Binary Phase Detectors


In a binary phase detector, a binary error signal is generated in re-
sponse to small phase differences between the clock and the data. This
binary error signal determines whether the clock phase is “early” or
“late” with respect to the data phase.
An inherent characteristic of a binary phase detector is the continuous
generation of early and late pulses at its output, while the clock edge
repeatedly moves back and forth around the zero crossings of the data.
This is in contrast to linear phase detectors because the output of the
latter goes to zero in phase lock. This characteristic of binary phase
detectors can inherently lead to a higher charge pump activity, possibly
increasing the clock jitter.
One of the most commonly-used binary phase detectors is the circuit
presented by Alexander [33], in which the zero crossings of the data are
measured as early or late events when compared with the transitions of
the clock signal. Similar to the Hogge phase detector, the structure of
the Alexander phase detector allows for automatic retiming of the data.
During any particular clock interval, this phase detector provides
three binary samples of the data signal: the previous bit (A), a sample
of the current bit at the zero crossing (B), and the current bit (C)
(Fig. 3.30(a)). Figures 3.30(b),(c) depict the value of these samples for
the early and late clocks, respectively. The retimed data can be taken
from A or C. The output is usually taken from A so as to get an additional
retiming of the data pulse and further improve the data eye.
The location of the clock edge with respect to the data edge can be
determined based on the following rules.
If $A = B \neq C$, the clock is early.
If $A \neq B = C$, the clock is late.
If $A = B = C$, no data transition has occurred.
Using the above observations, the three samples can be used to produce
a phase error in a CDR circuit. The Early signal can be formed as
$B \oplus C$ and the Late signal is generated as $A \oplus B$. The desired phase error can
be obtained by subtracting the Early signal from the Late signal.
To improve the performance of the phase detector, the three samples
A, B, and C should all be available at the same time. For this reason,
sample B is regenerated by the clock edge that produces the two samples
A and C.
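The decision logic can be written out in a few lines. The sketch below assumes the sample definitions given above (A is the previous bit, B is the sample at the data zero crossing, C is the current bit) and infers the Early and Late expressions from them: an early clock takes sample B before the transition, so B still equals A. The function name and the sign convention of the result are illustrative.

```python
def alexander_decision(a, b, c):
    """Return +1 for a late clock, -1 for an early clock, 0 for no information.

    a: previous bit, b: sample at the data zero crossing, c: current bit.
    Early = B xor C (B still equals the previous bit),
    Late  = A xor B (B already equals the current bit).
    """
    early = b ^ c
    late = a ^ b
    return late - early   # Late minus Early, as described in the text


if __name__ == "__main__":
    cases = {
        "early clock": (0, 0, 1),
        "late clock": (0, 1, 1),
        "no transition": (1, 1, 1),
        "invalid (A = C, B differs)": (0, 1, 0),
    }
    for name, (a, b, c) in cases.items():
        print(f"{name:28s} A,B,C={a},{b},{c} -> {alexander_decision(a, b, c):+d}")
```

Note that the combination in which A equals C but differs from B asserts both Early and Late, so the difference is zero and the loop is not disturbed.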
A simple master-slave D flipflop can serve as an NRZ phase detector
if its clock input is driven by the data stream and its D input senses the
VCO output [Fig. 3.31(a)]. Called a “bang-bang” phase detector, this
topology exhibits a very nonlinear characteristic [Fig. 3.31(b)], applying
large swings to the loop filter and possibly introducing substantial ripple
on the oscillator control line.
A critical drawback of this CDR architecture at high speeds results
from timing skews. Since typical flipflops suffer from unequal
data-to-output and clock-to-output delays, the loop locks such that
the recovered clock and the input data sustain a finite, systematic phase
offset, compensating for the delay difference. As illustrated in Fig. 3.31(c),
the skews add, resulting in a significant deviation of the
clock edge from the middle of the data bits.
Another implementation of a full-rate binary system is the circuit
presented by Pottbacker [34]. This circuit is a digital implementation of
the quadricorrelator, providing a capture range of 15%. The operation
of the circuit is based on the bang-bang concept described above, with
the difference that the PD operates on both the rising and falling edges
of the clock. Similar to the previous circuit, its drawback is the lack of
inherent data regeneration in its structure. As shown in Fig. 3.32(a), it
consists of two phase detectors and a frequency detector. Each phase
detector is formed using a double-edge-triggered flipflop (DETFF) with the
clock signal applied to its input and the data signal applied to its trigger.
Figures 3.32(b) and (c) depict the waveforms for the two cases when
the clock is early or late. Utilizing both the rising and falling edges
increases the correction rate of the CDR circuit by a factor of two. This
can eventually result in a smaller output jitter, because the VCO phase
will be corrected at a higher rate.
Since the bang-bang nature of this phase detector creates significant
ripple on the control line in the locked condition and hence produces
large jitter at the VCO output, the latches forming the flipflops can be
replaced by sample-and-hold circuits to modify the binary characteristic
of the phase detector into a more linear behavior. In [35], the phase
detector is formed as a master-slave sample-and-hold circuit (Fig. 3.33).
The rising data transitions sample the instantaneous value of the VCO
output. The circuit thus generates an output that is linearly propor-
tional to the phase difference in the vicinity of the lock point.

2.4. Frequency Detectors


The fiber optic standards require operation at an exact data rate.
Therefore, the oscillators should be guaranteed to oscillate at an exact
frequency which equals the data rate or an integer fraction of it. The
oscillators are designed with a large tuning range to account for the
process and temperature variations.
On the other hand, the CDR circuits provide a very narrow capture
range. This range is primarily determined by two factors: loop bandwidth
and phase detector topology. The loop bandwidth of the CDR
circuit is defined by the standard and does not exceed a few megahertz.
The linear phase detectors usually have a capture range of a fraction
of one percent of the incoming data rate. This value can be as high as
a few percent for a bang-bang phase detector. Therefore, the capture
range of the CDR circuits is much smaller than the tuning range of the
oscillator. For this reason, the CDR circuit is unlikely to acquire lock to
the data when the circuit turns on and the VCO starts to oscillate at a
frequency that is very different from the data rate.
This limitation calls for an aided acquisition mechanism. Various
frequency detection schemes have been introduced that operate with or
without a reference signal. The idea is that as the circuit turns on, the
frequency detector pushes the VCO frequency to a value close to the
data rate. When the difference between the oscillation frequency and
the data rate is small enough to fall in the capture range of the phase
detector, the frequency detector is disabled and the phase detector takes
over. Eventually in the phase-lock condition, the phases of clock and
data signals are within a constant offset from each other, ensuring that
the clock frequency equals the data rate.
We describe a number of mechanisms for referenced and referenceless
frequency acquisition. Similar to the phase detectors, the frequency de-
tectors can operate with a full-rate or half-rate clock. We briefly review
a number of full-rate frequency detection schemes in this section. In
chapter 6, a new approach for half-rate frequency detection is described.

2.4.1 Referenced Frequency Detectors


A high-speed transceiver uses a reference clock in the transmitter to
multiplex the low-speed sequences into a single high-speed signal. This
reference clock can be used for frequency acquisition in the receiver that
is built on the same chip. This signal is used in a frequency-locked
loop (FLL) that brings the VCO frequency close to the data rate. Two
approaches utilizing this concept are described in this section. The first
one captures lock while the frequency locking circuit is still running,
whereas in the second one, the FLL is deactivated before the CDR takes
over.
In the circuit described in [36], an additional reference PLL is used
for aided acquisition. As shown in Fig. 3.34, the reference PLL locks to
a frequency that is N times higher than the frequency of the reference
clock. Since both the reference and the fed-back VCO signals are periodic, the
phase/frequency detector of the reference PLL can be implemented as a
simple block with a wide capture range. The two oscillators used in the
reference PLL and the CDR circuit should be identical such that they
produce the same output frequency for identical control voltages applied
to their inputs.
The control voltage of the VCO in the CDR circuit is decomposed into
coarse and fine voltages. The control voltage generated by the reference
PLL is routed to the CDR circuit as the coarse control voltage of the
VCO. The fine control voltage is generated by the CDR circuit based
on the comparison of the phase of the data to that of the clock signal.
Because of this decomposition, the VCO gain from the fine control to the
output can be very small. A smaller VCO gain translates the ripple on
the control line into a smaller amount of output jitter. Meanwhile, the
coarse control guarantees phase lock over a very wide frequency range.
The two oscillators used in this circuit should be spaced far apart from
each other. Otherwise, injection locking of the two VCOs can result in
false lock. Theoretically, identical oscillators provide equal oscillation
frequencies for identical control voltages. However, mismatches
between the two VCOs can be significant precisely because they must be
placed far apart from each other. For this reason, the CDR circuit should
use a means of narrowband frequency detection to achieve phase lock for
small frequency mismatches.
On the other hand, the circuit described in [37] consists of a single
VCO, a phase detector, and a frequency detector. The phase detector
and the frequency detector are connected to the loop filter through a
multiplexer (Fig. 3.35). When the circuit turns on, the multiplexer ac-
tivates Loop I and the circuit locks to the reference clock. Then the
multiplexer switches to the other mode and Loop II is activated. As the
loop locks to the random data, the frequency detector is turned off, re-
ducing the power consumption. The operating mode of the multiplexer
is determined by a lock detector that measures the frequency difference
between the reference clock and VCO frequency. It can be formed as
a counter that counts the number of pulses on one of the signals when
clocked by the other one.
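A counter-based lock detector of this kind can be sketched as follows. The code is a behavioral illustration only: it assumes that VCO cycles are counted during a window of reference cycles and that the result is compared with the expected ratio. The function name, the window length, and the tolerance are hypothetical and are not taken from [37].

```python
def frequency_locked(f_vco_hz, f_ref_hz, n_ratio,
                     window_ref_cycles=1024, tolerance_counts=2):
    """Decide whether the VCO is close enough to n_ratio times the reference.

    VCO cycles are counted during `window_ref_cycles` reference periods and
    compared with the ideal count. A longer window resolves a smaller
    frequency error at the cost of a slower decision.
    """
    window_s = window_ref_cycles / f_ref_hz
    counted = int(f_vco_hz * window_s)          # whole VCO cycles in the window
    expected = n_ratio * window_ref_cycles
    return abs(counted - expected) <= tolerance_counts


if __name__ == "__main__":
    f_ref = 156.25e6
    for f_vco in (10.0e9, 10.001e9, 10.1e9):
        state = frequency_locked(f_vco, f_ref, 64)
        print(f"f_vco={f_vco / 1e9:.3f} GHz  locked={state}")
```

With these example values a tolerance of two counts in a window of 65536 expected counts corresponds to a frequency error of roughly 30 ppm, which shows how the window length sets the resolution of the detector.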

2.4.2 Referenceless Frequency Detectors


Referenceless frequency detectors become attractive for various rea-
sons. Elimination of an external reference signal helps further integration
of the system. An external clock signal can degrade the performance of
the circuit mostly through substrate coupling. Furthermore, implemen-
tation of referenced acquisition schemes requires more circuitry and a
larger chip area.
In this section, a number of referenceless schemes are discussed. These
techniques are all based on the concept of the quadricorrelator that was
originally described in [38]. Figure 3.36 depicts a simple quadricorrelator.
The incoming passband signal is multiplied by the in-phase and
quadrature signals produced by the oscillator to generate the corresponding
in-phase and quadrature baseband components. The
mixers produce the sum and the difference frequency products between
the input signal and the local oscillator. Low-pass filters following the
mixers suppress the sum and pass the difference frequency. The in-phase
baseband component is differentiated and multiplied by the quadrature
baseband component.
If the input is a tone, the output of the quadricorrelator consists of
a dc component proportional to the frequency difference between the
input and the oscillator signals, and a ripple component at double the
frequency difference. The dc component can be used as the error signal
in the frequency tracking loop.
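For a tone input, this behavior follows from a short calculation. The sketch below assumes unit amplitudes, ideal low-pass filters that remove the sum components completely, and a frequency difference $\Delta\omega$ with an arbitrary phase $\theta$. The symbols $I(t)$ and $Q(t)$ for the baseband components are illustrative, and the sign of the result depends on which component is differentiated.

```latex
\begin{align}
I(t) &= \cos(\Delta\omega\, t + \theta), \qquad
Q(t) = \sin(\Delta\omega\, t + \theta), \\
\frac{dI}{dt}\, Q(t)
  &= -\Delta\omega\, \sin^2(\Delta\omega\, t + \theta) \\
  &= -\frac{\Delta\omega}{2}
     + \frac{\Delta\omega}{2}\cos\!\left(2\Delta\omega\, t + 2\theta\right).
\end{align}
```

The first term is the dc component proportional to the frequency difference, and the second term is the ripple at twice the frequency difference.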
The quadricorrelator, however, has a limited capture range [39]. The
frequency difference between an incoming signal and the oscillator must
fall within the passband of the filter. A signal that is well outside of the
passband will be significantly suppressed by the filter.
Although this approach primarily deals with periodic incoming signals,
it can be modified to work with random data as well. In [40] a
quadricorrelator operating with random data is introduced. This circuit
achieves a capture range of 12%. Since the operation of the quadricorrelator
depends on the existence of a tone at the data rate and the
random data does not contain a spectral component at this frequency,
edge detection should be performed.
In this circuit, edge detection is performed by differentiating and rectifying
the data. The circuit uses stacking to integrate the tasks of
differentiation, rectification, mixing, and low-pass filtering in one block.
This results in substantial power reduction since the circuits reuse
the same bias current. Stacking also yields a smaller chip area since
routing between various stages can be eliminated.
The quadricorrelator can also be implemented using digital elements
to produce a binary error signal. Two examples are the rotational
frequency detector [41] and the circuit introduced by Pottbacker [34].
The operation of the rotational frequency detector can be described
as follows: In the presence of a frequency difference between the data
and the clock, the phase relationship will change with time at a rate
proportional to the frequency difference, producing a beat frequency. A
circular phasor diagram of the oscillator signal can be used to express the
concept (Fig. 3.38). The diagram is split in four quadrants, A, B, C, and
D. For simplicity the phasor for the clock is assumed to be constant,
serving as a reference, and the phasor for the data moves around the
circle. The direction of this rotation determines whether the data rate
is faster or slower than the clock frequency.

When the clock frequency is lower than the data rate, the data phasor
rotates counterclockwise. The direction of rotation can be distinguished
by marking the two consecutive quadrants where the phasor is detected.
For example, as the phasor moves from B to C, the clock is found to be
fast. A transition from the C to B quadrant denotes a slow clock.
In the Pottbacker frequency detector, shown in Fig. 3.32, two beat
frequencies equal to the difference of clock frequency and data rate are
generated at the outputs and one of them leading the other
one. The direction of the frequency difference can be determined from
the relative spacing of these two signals. If leads clock is
slow and if lags clock is fast. The relative spacing of the two
signals is extracted using a DETFF in which is sampling
CDR loops employing frequency detectors that operate with random
data exhibit only a moderate capture range, not exceeding of
the center frequency. This limitation can be explained with the aid
of the characteristic plotted in Fig. 3.39 for the frequency detector of
Fig. 3.32(a). We note that for a large difference between the data rate
and the VCO frequency, the average output is close to zero, carrying
little information. Figure 3.40(a) shows a part of a dual-loop architecture
that substantially increases the capture range [42]. The frequency
detector used here is similar to that in Fig. 3.32(a). Here, a counter
controlling the capacitor array sets the VCO frequency to the lowest
value. Under this condition, is very negative and is close to
zero. Thus, the two comparators generate logical zeros, the output of
the OR gate remains low, and the counter continues to (slowly) count up
until drops below This is an indication that has reached a
reliable level. Now the two flipflops begin to save each state before the
next count is carried out. The counter still continues to count until
crosses zero and jumps from negative to positive. The two flipflops
then record this change, disabling the counter and enabling the CDR
loop (not shown in Fig. 3.40).
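The coarse acquisition sequence can be sketched behaviorally. The model below is a simplification and not the circuit of [42]: it collapses the two comparator inputs into a single detector output, and the detector characteristic `fd_average`, the threshold, and all numerical values are hypothetical. It only illustrates the idea of counting up until the detector output first becomes reliable and then changes sign.

```python
import math

def fd_average(delta_f, width=500e6):
    """Hypothetical detector average: sign follows delta_f, fades for large |delta_f|."""
    x = delta_f / width
    return x * math.exp(-x * x)

def coarse_acquire(f_data, f_min, step, n_codes, threshold=0.2):
    """Step the capacitor-array code upward until the detector output crosses zero."""
    armed = False  # True once the output has been reliably negative
    prev = 0.0
    for code in range(n_codes):
        f_vco = f_min + code * step          # VCO frequency for this counter value
        v = fd_average(f_vco - f_data)
        if not armed:
            armed = v < -threshold           # wait for a reliable (large) level
        elif prev < 0.0 <= v:
            return code, f_vco               # stop the counter; the CDR loop takes over
        prev = v
    return None

print(coarse_acquire(f_data=10e9, f_min=8e9, step=50e6, n_codes=128))
```

With this model the counter stops within one frequency step of the data rate, which is the condition the fine CDR loop needs in order to lock.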

2.5. Decision Circuits


The synchronized sampling of the peak value of a pulse results in a
high SNR. However, we can take advantage of the random nature of the
noise by performing averaging in one bit period. Figure 3.41 shows
an example where the input pulse is integrated from 0 to T and the
sampling occurs at t = T. The noise components that vary significantly
in a period of T tend to average out [14].
This idea leads to the concept of matched filters. For a pulse that is
corrupted by additive white noise, there exists an optimum filter that
maximizes the SNR at the sampling instant. Matched filters are extensively
used in low-speed communication systems. For practical reasons,
they have not been previously implemented in high-speed systems such
as optical receivers.
Before getting into the design of high-speed CDR circuits, we describe
an approach for high-speed matched filtering in Chapter 4. Then in
Chapters 5 and 6 we describe two implementations of 10-Gb/s CMOS
CDR circuits, the former using a ring oscillator and a linear half-rate
phase detector and the latter benefiting from a multi-phase LC oscillator
and a binary half-rate phase/frequency detector.

Chapter 4

A CMOS INTERFACE FOR DETECTION OF 1.2-GB/S RZ DATA

This chapter describes the design of an interface for cryogenic radar
systems. This interface circuit incorporates an amplifier with 2-GHz
bandwidth, interleaved matched filters, and a 1:2 demultiplexer. Fabri-
cated in a CMOS technology, the interface achieves a sensitivity
of while consuming 142 mW from a 3.3-V supply, and occupy-
ing an area of The low sensitivity, wide bandwidth,
and small input-referred noise requirements of this circuit are similar to
those of a fiber optic receiver. The solutions provided in this chapter
become useful for the implementation of high-speed front-end circuits in
an optical communication system.

1. Introduction
This interface is used in a cryogenic radar system (Fig. 4.1). The
received radar signal is converted to a digital bit stream by means of a
Josephson junction analog-to-digital converter. The resulting output is a
pseudo-differential return-to-zero (RZ) signal with a bit rate of 1.2 Gb/s
and an amplitude of The interface must convert this serial data
into 8 parallel streams each having a peak-to-peak amplitude of 1 V.
The principal challenge in this design is the combination of high speed
and low signal levels in a moderate technology such as CMOS
process. To appreciate this challenge, some of the important issues in the
design of the interface, which consists of a pre-amplifier and a decision
circuit, are described.
For bandwidth calculations, an RZ signal can be roughly considered
as a nonreturn-to-zero (NRZ) signal with twice the bit rate. In order
to suppress intersymbol interference (ISI) in a broadband system,
the bandwidth should be at least equal to 70% of the bit rate, about
1.7 GHz in this system. Increasing the bandwidth beyond this point
reduces the intersymbol interference but at the cost of higher power dis-
sipation and, more importantly, higher total integrated thermal noise.
In other words, there exists a trade-off between signal integrity and sen-
sitivity. The bandwidth is chosen to be approximately equal to 2 GHz
as a compromise between these two factors.
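The bandwidth figure follows from simple arithmetic, reproduced here as a quick check. The variable names are illustrative only.

```python
bit_rate = 1.2e9                      # RZ bit rate, bit/s
equivalent_nrz_rate = 2 * bit_rate    # RZ treated as NRZ at twice the bit rate
min_bandwidth = 0.7 * equivalent_nrz_rate
chosen_bandwidth = 2.0e9              # value selected in the text

print(f"minimum bandwidth: {min_bandwidth / 1e9:.2f} GHz")
print(f"chosen / minimum:  {chosen_bandwidth / min_bandwidth:.2f}")
```

The chosen 2-GHz bandwidth therefore leaves a margin of roughly 20% over the 70% rule.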
To achieve acceptable ISI, the amplifier should have a transfer function
with small ripples in magnitude and a linear phase across this bandwidth.
To overcome the offset of the decision circuit and provide enough
overdrive voltage, the amplifier must exhibit sufficient gain, on the order
of 40 dB.
Finally, to obtain a bit error rate (BER) of the input-referred
noise density must be lower than This is an important
concern because the bandwidth requirement does not allow a large gain
in the first stage of this amplifier, making the noise of the following
stages significant.
Simulations indicate that it is difficult to simultaneously satisfy all
of these requirements in a CMOS technology, even if power dis-
sipation is not critical. As a result, a means of relaxing the trade-offs
between these parameters must be introduced.

2. Matched Filtering
First, we assume that a stream of rectangular pulses with amplitude A
and period experiences additive noise and subsequently goes through
a low-pass system with bandwidth (Fig. 4.2(a)).
The signal-to-noise ratio (SNR) at the output of this system is equal
to The value of the SNR ultimately determines the BER,
based on the type of modulation. Also, is the noise bandwidth rather
than the 3-dB bandwidth of the system.
Next, the low-pass system is replaced by a matched filter, a circuit
whose impulse response is similar to the input pulse shape but reversed
in time and shifted by (Fig. 4.2(b)). In this case, the output SNR
is given by [14]. To realize the advantage of matched
filtering, the bandwidth of the first system is assumed to be roughly
equal to the bit rate. This indicates that the SNR of the second system
is twice that of the first system, which is an improvement of 3 dB. In
reality, the noise bandwidth is typically higher than the bit rate and the
improvement can be as high as a factor of that is, 5 dB.
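The comparison can be made concrete with the standard textbook expressions for a rectangular pulse in white noise. The symbols are assumptions, since they are not legible in this copy: A is the pulse amplitude, T the pulse period, B the noise bandwidth of the low-pass system, and N0 the one-sided noise spectral density. Under these assumptions the low-pass SNR is A²/(N0·B) and the matched-filter SNR is 2A²T/N0, so the improvement is 2BT.

```python
import math

def snr_lowpass(A, N0, B):
    """SNR after a low-pass system of noise bandwidth B (assumed form A^2 / (N0*B))."""
    return A ** 2 / (N0 * B)

def snr_matched(A, N0, T):
    """SNR after a matched filter for a rectangular pulse (assumed form 2*A^2*T / N0)."""
    return 2 * A ** 2 * T / N0

def improvement_db(B, T):
    """Matched-filter advantage over the low-pass system, in dB."""
    return 10 * math.log10(2 * B * T)

T = 1 / 1.2e9
print(f"B = bit rate:       {improvement_db(1.0 / T, T):.2f} dB")
print(f"B = 1.5 x bit rate: {improvement_db(1.5 / T, T):.2f} dB")
```

A noise bandwidth equal to the bit rate gives a factor of two (about 3 dB), and a noise bandwidth 1.5 times the bit rate gives a factor of three (about 4.8 dB), consistent with the range quoted above.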
Matched filters are used extensively in low-speed communication sys-
tems, but this improvement makes them attractive for the gigahertz
range as well. In the first step, the filter can be reduced to a more
familiar form.
For a square pulse, the matched filter can be implemented as an
integrate-and-dump operation. As shown in Fig. 4.3, additive white
noise in the system corrupts the input data. If the decision circuit sam-
ples the output of the integrator at the end of the bit period, this sample
carries information about the signal not only at the sampling instant but
also for the entire period. From another point of view, integration filters
out high frequency components of noise, and the final level is somewhat
cleaner than any single point on the original waveform.
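The averaging effect of integration can be demonstrated with a small Monte Carlo experiment. This is a discrete-time sketch with independent Gaussian noise samples, not a model of the actual circuit; in this idealized setting the SNR gain equals the number of samples per bit, which is a different comparison from the 3-dB figure obtained against a band-limited system. The function name `compare` and all parameters are illustrative.

```python
import random
import statistics

def compare(n_samples=16, trials=2000, amplitude=1.0, sigma=1.0, seed=0):
    """Return (SNR of a single end-of-bit sample, SNR of the integrated value)."""
    rng = random.Random(seed)
    single, integrated = [], []
    for _ in range(trials):
        noisy = [amplitude + rng.gauss(0.0, sigma) for _ in range(n_samples)]
        single.append(noisy[-1])                    # sample at the end of the bit
        integrated.append(sum(noisy) / n_samples)   # normalized integrator output
    snr_single = amplitude ** 2 / statistics.pvariance(single)
    snr_integrated = amplitude ** 2 / statistics.pvariance(integrated)
    return snr_single, snr_integrated

s, i = compare()
print(f"SNR gain from integration: {i / s:.1f}x")
```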
At the end of the integration mode, the integrator must be quickly
reset and integration of the next bit must begin. But this is difficult for
two reasons. First, the final value must be held constant until the deci-
sion circuit has reliably sampled it. Second, the dump operation cannot
be arbitrarily fast because larger devices required for quick discharge
also add substantial capacitance to the circuit and lower the gain of the
integrator.
Fortunately, an idle zero exists between every two bits in an RZ se-
quence. This suggests that both hold and dump can be performed during
this idle time (Fig. 4.4). However, the partitioning of this small period
between these two operations is quite difficult, requiring additional clock
edges that are sensitive to process and temperature. Furthermore, a total
time of 416 ps is not sufficient for both.
With these issues in mind, we can consider interleaving two matched
filters to relax the timing constraints. Now each integrate-and-dump
circuit operates in three phases: integrate, hold, and dump. As shown
in Fig. 4.5, when one integrator enters the reset mode, the other begins
to integrate. This allows one bit period for each of these operations.
Furthermore, only quadrature phases of a clock with a frequency equal
to half the bit rate are needed.
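The timing numbers involved can be verified with a few lines of arithmetic. The sketch assumes RZ pulses with a 50% duty cycle, so that the idle zero occupies half of each bit period.

```python
bit_rate = 1.2e9
bit_period = 1 / bit_rate
idle_time = bit_period / 2                    # idle zero in each RZ bit (50% duty assumed)
clock_freq = bit_rate / 2                     # half-rate clock used for interleaving
quadrature_spacing = (1 / clock_freq) / 4     # spacing between adjacent quadrature edges
time_per_filter = 2 * bit_period              # each filter handles every other bit

for name, value in [("bit period", bit_period),
                    ("idle time", idle_time),
                    ("quadrature edge spacing", quadrature_spacing),
                    ("time available per filter", time_per_filter)]:
    print(f"{name}: {value * 1e12:.0f} ps")
print(f"clock frequency: {clock_freq / 1e6:.0f} MHz")
```

Without interleaving, hold and dump must share the idle time of about 416 ps; with interleaving, each filter has two bit periods for its three phases, and all required edges come from the quadrature phases of a single 600-MHz clock.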

3. Architecture
Figure 4.6 depicts the interface architecture [43]. The input signal is
applied to a low-noise amplifier consisting of seven stages, generating an
output swing of approximately 200 mV. The signal is then processed by
two matched filters and subsequently sampled by two decision circuits.
The decision circuit is implemented as a master-slave D flipflop and it
extracts the zeros and ones while converting the RZ data into an NRZ
stream. This signal is passed to the buffers which are formed as open-
drain differential pairs connected to the output pads.
External quadrature clock signals at a frequency equal to half the bit
rate of the input signal are used to maintain synchronization and provide
the control commands in the circuit. Since the required clock frequency
is half the bit rate, the quadrature phases can be easily generated by a
divide-by-two circuit, but for this circuit they are provided externally.
When one matched filter resets, the other integrates. Each decision
circuit begins to sample at the end of the integration mode, producing
a logical level at the output.

4. Building Blocks
4.1. Low-Noise Wideband Amplifier
The wideband amplifier in this interface must boost the signal level
with minimal ISI. This amplifier consists of seven stages: the first is a
common-gate topology and the following six are common-source stages
(Fig. 4.7(a)-(c)).

In order to achieve an overall bandwidth of 2 GHz, each stage must
achieve a similar bandwidth. This is accomplished by incorporating
inductive peaking in all stages. In addition, the common-source dif-
ferential pairs also use capacitive/resistive degeneration to increase the
bandwidth and reduce the ripple in the passband.
The common-gate input stage provides two useful properties. First,
the input impedance can be set to over a relatively wide frequency
band by proper choice of device dimensions and bias currents. Second,
the noise figure of the stage is relatively independent of frequency and
does not require tuning techniques.
Figure 4.8 shows the simulation results for the amplifier frequency
response. The 3-dB bandwidth is about 2 GHz. The fabricated prototype
of the amplifier exhibits a gain of 43 dB across a bandwidth of
2.1 GHz.
The simulated results for the noise and the gain of the stages at 2 GHz
are presented in Table 4.1. The voltage gain of the stages adds up to
45.7 dB. The noise of these stages adds up to an overall input-referred
noise voltage of
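The way per-stage gain and noise combine into an overall input-referred noise can be sketched as follows. The per-stage numbers below are hypothetical and are not the values of Table 4.1; the noise unit (nV/√Hz) is likewise an assumption. Each stage's noise is referred to its own input and then divided by the voltage gain preceding it.

```python
import math

def input_referred_noise(stages):
    """stages: list of (gain_db, noise) with noise referred to each stage's own input.

    Returns (overall input-referred noise, total gain in dB).
    """
    total_sq = 0.0
    gain_before = 1.0  # linear voltage gain preceding the current stage
    for gain_db, noise in stages:
        total_sq += (noise / gain_before) ** 2
        gain_before *= 10 ** (gain_db / 20)
    return math.sqrt(total_sq), 20 * math.log10(gain_before)

# Hypothetical values: one low-gain input stage followed by six identical stages.
stages = [(4.0, 3.0)] + [(7.0, 4.0)] * 6
noise, gain_db = input_referred_noise(stages)
print(f"total gain: {gain_db:.1f} dB, input-referred noise: {noise:.2f} nV/sqrt(Hz)")
```

Because the first stage has little gain, the second and third stages still contribute noticeably to the total, which is the concern raised in the introduction of this chapter.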

The simulated input-referred noise voltage across the band is shown
in Fig. 4.9(a). The noise varies from 3.2 to This is an optimistic
estimate of the noise because SPICE assumes an excess noise
coefficient of 2/3 whereas for short-channel devices, it is considerably larger.
The output of the amplifier with RZ data is presented in Fig. 4.9(b).
The major contributor to the output ISI is the kickback noise caused by
the switches in the matched filter. The eye opening is about 70% and
the jitter is about 40 ps. The signal slews with a slope of 1 V/ns.
In order to integrate the inductors in a reasonable area, a stacked
structure consisting of metal 2 and metal 3 has been used (Fig. 4.10).
Since in this circuit the self-resonance frequency is more critical than
the quality factor, the line width is only The values range from
11 nH to 17 nH.

4.2. Integrate-and-Dump Circuit


The speed and input-referred offset issues require a simple and com-
pact topology for both the integrator and its reset mechanism. In the
circuit of Fig. 4.11 (a), and convert the input voltage to current
and the result flows through the total capacitance at nodes X and Y.
Simulations indicate that the parasitic capacitance already available at
nodes X and Y is sufficient to allow fast integration with considerable
voltage swings.
The dimensions of the input transistors are chosen as a compromise
between input-referred offset (around 10 mV) and speed. Each matched
filter is biased at a total current of 1 mA and provides a voltage gain of
approximately two at the end of the integrate phase.
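For an ideal transconductor driving a capacitor, the voltage gain at the end of the integrate phase is gm·T/C for an input that stays constant over the integration time T. The sketch below evaluates this expression; the transconductance and node capacitance are hypothetical values chosen only to reproduce a gain of about two, and they are not taken from the design.

```python
def integrator_gain(gm, c_node, t_int):
    """Voltage gain of an ideal Gm-C integrator for a constant input over t_int."""
    return gm * t_int / c_node

bit_period = 1 / 1.2e9   # integration time: one bit period
gm = 0.6e-3              # hypothetical transconductance, A/V
c_node = 250e-15         # hypothetical total capacitance at each output node, F

print(f"gain at end of integrate phase: {integrator_gain(gm, c_node, bit_period):.2f}")
```

The expression also shows the trade-off mentioned for the reset switch: any capacitance it adds to the output nodes lowers the gain in direct proportion.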
As shown in Fig. 4.11(b), the RZ bit stream amplified by the pre-
amplifier is applied to the input and is integrated for one bit period.
Switch which is an NMOS device, resets the integrator at the end
of the integration phase. The dimensions of are chosen as a trade-
off between faster reset and less parasitic capacitance at the output.
However, common-mode feedback and a hold phase should be added to
this circuit.
Shown in Fig. 4.12(a), the common-mode feedback consists of two
relatively large resistors, and which are implemented by small
PMOS devices. The hold mode is controlled by switch which dis-
ables the differential pair at the end of the integration mode and freezes
the output voltage. In reality, the output does experience a small degra-
dation (Fig. 4.12(b)). The dip seen in the hold mode results from the
relatively large capacitance from the source of and to ground.
Matched filtering along with interleaving provides a hold period, mak-
ing the sampling less susceptible to jitter. Proper choice of device di-
mensions leads to a small input offset for the matched filter. However,
the devices cannot be very large because of the limitations on speed and
loading on the previous stage. The output voltage of the matched fil-
ter drops during the hold mode. To alleviate this problem, the decision
circuit starts to sample the output of the matched filter at the start
of the hold mode rather than at the end of it. The non-square wave-
form of the input slightly degrades the improvement in SNR because the
corresponding matched filter is not exactly an integrator.

4.3. Demultiplexer
The matched filter is followed by a master-slave D flipflop (Fig. 4.13).
To achieve short set-up and hold times, the flipflops use current steering
with 2 V of differential swing. Each latch consists of a pre-amplifier that
senses the amplitude during half the period and a regenerative circuit
that boosts the level of the signal. Each latch uses a bias current of
1 mA and load resistors of

4.4. Clock Buffer


The external clock is applied to the system in the form of sinusoids
with small amplitudes. The clock buffer (Fig. 4.14) converts this signal
to the sharp, rail-to-rail edges required for the switches in the matched
filters. The inverters are sized to drive the load capacitance with short
rise and fall times.
Since the I and Q clock signals experience different loading, these in-
verters are large enough to maintain reasonable matching in the interface
environment. In this design, rail-to-rail swings are preferred because of
their capability to switch nodes with different dc levels.
Resistor serves as a termination. This resistor also equalizes the
mismatch between the common-mode levels and also the phase of the
signals applied to the input of the two gates.

5. Experimental Results
The interface has been fabricated in a CMOS technology. Fig-
ure 4.15 depicts the die photograph. The circuit occupies an area of
The circuit was tested with a 3.3-V supply in a
chip-on-board assembly.

The eye diagram of the output of one of the channels is depicted
in Fig. 4.16. With 1.2-Gb/s RZ data, both channels have a
relatively open eye diagram. The measurement of the BER at these
frequencies has been quite difficult because the input is RZ and the
output is NRZ.
The amplifier achieves a gain of 43 dB over a bandwidth of 2.1 GHz. A
total power of 142 mW is dissipated in this interface, of which 80 mW is
consumed by the pre-amplifier and 4 mW is consumed by each matched
filter. The power penalty for the matched filters is therefore
very small.
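The power figures quoted above can be put in proportion with a short calculation. Only the numbers stated in the text are used; the remainder is simply the difference and is not broken down further here.

```python
total_mw = 142.0
preamp_mw = 80.0
matched_filter_mw = 4.0   # per matched filter
n_filters = 2             # two interleaved matched filters

filters_mw = n_filters * matched_filter_mw
fraction = filters_mw / total_mw
remaining_mw = total_mw - preamp_mw - filters_mw

print(f"matched filters: {filters_mw:.0f} mW ({100 * fraction:.1f}% of total)")
print(f"remaining blocks: {remaining_mw:.0f} mW")
```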

6. Conclusion
The concepts introduced in this work can be applied to other appli-
cations as well. For example, in a fiber optic receiver (Fig. 4.17), the
current generated by the photo detector is amplified by a transimpedance
amplifier. We can then interpose interleaved matched filters between the
amplifier and the decision circuits to improve the SNR. Since the power
consumption and complexity of the matched filters can be quite low, the
boost in performance is obtained at minimal cost.
A CMOS interface for detection of 1.2-Gb/s RZ data incorporates
wideband amplification, matched filtering, and demultiplexing. Low-
noise amplifiers with bandwidths exceeding 2 GHz can be implemented
in CMOS process using inductors. These inductors can be integrated
without any process modifications. Matched filters improve the
overall SNR by approximately 3 dB. Matched filtering can be performed
on high-speed data using a simple and compact topology.
Chapter 5

A 10-GB/S LINEAR HALF-RATE CMOS CDR CIRCUIT

This chapter describes the design and experimental results of a 10-Gb/s
CMOS phase-locked clock and data recovery circuit. The circuit
incorporates an interpolating voltage-controlled oscillator and a half-rate
phase detector. The phase detector provides a linear characteristic while
retiming and demultiplexing the data with no systematic phase offset.
Fabricated in a CMOS technology in an area of
the circuit exhibits an rms jitter of 1 ps, and a peak-to-peak jitter of
14.5 ps in the recovered clock and a bit error rate of with
random data input of length The power dissipation is 72 mW
from a 2.5-V supply.
The next section describes the CDR architecture and its design issues.
The following sections present the design of the building blocks and the
description of the experimental results.

1. Architecture
The choice of the CDR architecture is primarily determined by the
speed and supply voltage limitations of the technology as well as the
power dissipation and jitter requirements of the system.
In a generic CDR circuit, shown in Fig. 5.1, the phase detector com-
pares the phase of the incoming data to the phase of the clock generated
by the voltage-controlled oscillator (VCO), producing an error that is
proportional to the phase difference between its two inputs. The error is
then applied to a charge pump and a low-pass filter so as to generate the
oscillator control voltage. The clock signal also drives a decision circuit,
thereby retiming the data and reducing its jitter.
If attempted in a CMOS technology, the architecture of
Fig. 5.1 poses severe difficulties for 10-Gb/s operation. Although exploiting
aggressive device scaling, the CMOS process used in this work
provides marginal performance for such speeds. For example, even sim-
ple digital latches or three-stage ring oscillators fail to operate reliably
at these rates. These issues make it desirable to employ a “half-rate”
CDR architecture, where the VCO runs at a frequency equal to half of
the input data rate. The concept of half-rate clock has been used in
[44]-[47]. However, [44] and [45] incorporate a bang-bang phase detector
(PD), possibly creating large ripple on the control line of the oscilla-
tor and hence high jitter. The circuit reported in [46] inherently has a
smaller output jitter as a result of using a linear phase detector, but it
fails to operate at speeds above 6 Gb/s in CMOS technology.
The circuit of [47] benefits from a new linear phase detection scheme,
but it may not operate properly with certain data patterns.
Another critical issue in the architecture of Fig. 5.1 relates to the
inherently unequal propagation delays for the two inputs of the phase
detector: Most phase detectors that operate properly with random data
(e.g., a D flipflop) are asymmetric with respect to the data and clock
inputs, thereby introducing a systematic skew between the two in phase-
lock condition. Since it is difficult to replicate this skew in the decision
circuit, the generic CDR architecture suffers from a limited phase margin
- unless the raw speed of the technology is much higher than the data
rate.
The problem of the skew demands that phase detection and data
regeneration occur in the same circuit such that the clock still samples
the data at the midpoint of each bit even in the presence of a finite skew.
For example the Hogge PD [32] automatically sets the clock phase to the
optimum point in the data eye (but it fails to operate properly with a
half-rate clock).
The above considerations lead to the CDR architecture shown in
Fig. 5.2. Here, a half-rate phase detector produces an error proportional
to the phase difference between the 10-Gb/s data stream and the 5-GHz
output of the VCO. Furthermore, the PD automatically retimes and de-
multiplexes the data, generating two 5-Gb/s sequences and
Although the focus of this work is point-to-point communications, a full-rate
retimed output, is also generated to provide flexibility in
testing and exercise the ultimate speed of the technology. The VCO
has both fine and coarse control lines, the latter allowing inclusion of a
frequency-locked loop in future implementations.
In this chapter, a new approach to performing linear phase detec-
tion using a half-rate clock is described. Owing to its simplicity, this
technique achieves both a high speed and low power dissipation while
minimizing the ripple on the oscillator control voltage.
It is interesting to note that half-rate architectures do suffer from one
drawback: the deviation of the clock duty cycle from 50% translates to
bimodal jitter. As depicted in Fig. 5.3, since both clock edges sample the
data waveform, the clock duty cycle distortion pushes both edges away
from the midpoint of the bits. Typical duty cycle correction techniques
used at lower speeds are difficult to apply here as they suffer from significant
dynamic mismatches themselves. Thus special attention is paid
to the symmetry in the layout to minimize bimodal jitter.
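The size of this effect can be estimated with a simple calculation. The sketch assumes that the loop settles so that the two sampling errors are equal and opposite, which is an idealization; the function name and the 52% duty cycle are illustrative.

```python
def sampling_offsets(bit_rate, duty_cycle):
    """Offsets (s) of the rising- and falling-edge sampling instants from bit center."""
    t_bit = 1 / bit_rate
    t_clk = 2 * t_bit                       # half-rate clock period
    skew = (duty_cycle - 0.5) * t_clk       # excess spacing between the two edges
    # Assumption: the loop centers the average, splitting the skew between both edges.
    return -skew / 2, skew / 2

rise, fall = sampling_offsets(10e9, 0.52)
print(f"rising edge offset:  {rise * 1e12:+.1f} ps")
print(f"falling edge offset: {fall * 1e12:+.1f} ps")
print(f"bimodal spacing:     {(fall - rise) * 1e12:.1f} ps")
```

At 10 Gb/s, each percent of duty cycle error corresponds to 2 ps of spacing between the two sampling populations, a noticeable fraction of the 100-ps bit period.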
Another important aspect of CDR design is the leakage of data tran-
sitions to the oscillator. In Fig. 5.2, such leakage arises from (1) capac-
itive feedthrough from to CK in the phase detector, (2) capacitive
feedthrough from and to CK through the multiplexer, and
(3) coupling of to the oscillator through the substrate. To mini-
mize these effects, the VCO is followed by an isolation buffer and all of
the building blocks incorporate fully differential topologies.

2. Building Blocks
2.1. VCO
The design of the VCO directly impacts the jitter performance and
the reproducibility of the CDR circuit. While LC topologies achieve
a potentially lower jitter, their limited tuning range makes it difficult
to obtain a target frequency without design and fabrication iterations.
Since the circuit reported here was our first design in technology,
a ring oscillator was chosen so as to provide a tuning range wide enough
to encompass process and temperature variations.
A three-stage differential ring oscillator [Fig. 5.4(a)] driving a buffer
operates no faster than 7 GHz in CMOS technology. The half-
rate CDR architecture overcomes this limitation, requiring a frequency
of only 5 GHz.
As shown in Fig. 5.4(b), each stage consists of a fast and a slow path
whose outputs are summed together. By steering the current between
the fast and the slow paths, the amount of delay achieved through each
stage and hence the VCO frequency can be adjusted. All three stages
in the ring are loaded by identical buffers to achieve equal rise and fall
times and hence improve the jitter performance. Figure 5.4(c) shows the
transistor implementation of each delay stage. The fast and slow paths
are formed as differential circuits sharing their output nodes. The tuning
is achieved by reducing the tail current of one and increasing that of the
other differentially. Since the low supply voltage makes it difficult to
stack differential pairs under and the current variation
is performed through mirror arrangements driven by PMOS differential
pairs. Figure 5.5 depicts the small-signal gain and phase response of each
delay stage. While providing a phase shift of 60°, each stage achieves
a gain of 5.5 dB at 5 GHz, yielding robust oscillation at the target
frequency.
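These numbers can be checked against the oscillation conditions for a ring. The sketch assumes that the ring provides a net inversion (180°) in addition to the frequency-dependent phase shift of the three stages, as is usual for a three-stage differential ring; that assumption is not stated explicitly in the text.

```python
n_stages = 3
phase_per_stage_deg = 60.0    # frequency-dependent phase shift at 5 GHz
gain_per_stage_db = 5.5       # small-signal gain at 5 GHz
inversion_deg = 180.0         # assumed net inversion around the ring

total_phase_deg = n_stages * phase_per_stage_deg + inversion_deg
loop_gain_db = n_stages * gain_per_stage_db
oscillates = (total_phase_deg % 360.0 == 0.0) and (loop_gain_db > 0.0)

print(f"total loop phase: {total_phase_deg:.0f} degrees")
print(f"loop gain: {loop_gain_db:.1f} dB")
print(f"oscillation conditions met: {oscillates}")
```

A loop gain well above 0 dB at the frequency where the phase condition is met provides margin against process and temperature variations.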

A critical drawback of supply scaling in deep submicron technologies
is the inevitable increase in the VCO gain for a given tuning range.
To alleviate this difficulty, the control of the VCO is split between a
coarse input and a fine input. The partitioning of the control allows a
reduction of more than one order of magnitude in the VCO sensitivity.
The idea is that the fine control is established by the phase detector and
the coarse control is a provision for adding a frequency detection loop.
The coarse control is provided externally in this prototype. The fine
control provides a gain of 150 MHz/V and the coarse control provides
2.5 GHz/V. The tuning range is 2.7 GHz (Fig. 5.6).
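The benefit of splitting the control can be quantified from the two gains. The 1-mV ripple used below is a hypothetical value chosen only to show the scale of the resulting frequency disturbance.

```python
k_fine = 150e6      # fine-control VCO gain, Hz/V
k_coarse = 2.5e9    # coarse-control VCO gain, Hz/V
ripple_v = 1e-3     # hypothetical ripple on the control line, V

ratio = k_coarse / k_fine
df_fine = k_fine * ripple_v
df_coarse = k_coarse * ripple_v

print(f"sensitivity reduction: {ratio:.1f}x")
print(f"frequency deviation via fine input:   {df_fine / 1e3:.0f} kHz")
print(f"frequency deviation via coarse input: {df_coarse / 1e6:.1f} MHz")
```

The ratio of about 17 is consistent with the reduction of more than one order of magnitude stated above.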

2.2. Phase Detector


For linear phase comparison between data and a half-rate clock, each
transition of the data must produce an “error” pulse whose width is
equal to the phase difference. Furthermore, to avoid a dead zone in
the characteristics, a “reference” pulse must be generated whose area is
subtracted from that of the error pulse, thus creating a net value that
falls to zero in lock.
The above observations lead to the PD topology shown in Fig. 5.7(a).
The circuit consists of four latches and two XOR gates. The data is
applied to the inputs of two sets of cascaded latches, each cascade con-
stituting a flipflop that retimes the data. Since the flipflops are driven
by a half-rate clock, the two output sequences and are the demultiplexed
waveforms of the original input sequence if the clock samples
the data in the middle of the bit period.
The operation of the PD can be described using the waveforms de-
picted in Fig. 5.7(b). The basic unit employed in the circuit is a latch
whose output carries information about the zero crossings of both the
data and the clock signal. The output of each latch tracks its input
for half a clock period and holds the value for the other half, yielding
the waveforms shown in Fig. 5.7(b) for points and The two
waveforms differ because their corresponding latches operate on oppo-
site clock edges. Produced as the Error signal is equal to ZERO
for the portion of time that identical bits of and overlap and equal
to the XOR of two consecutive bits for the rest. In other words, Error
is equal to ONE only if a data transition has occurred.
It may seem that the Error signal uniquely represents the phase dif-
ference, but that would be true only if the data were periodic. The
random nature of the data and the periodic behavior of the clock in fact
make the average value of Error pattern dependent. For this reason,
a reference signal must also be generated whose average conveys this
dependence. The two waveforms and contain the samples of the
data at the rising and falling edges of the clock. Thus, contains
pulses as wide as half the clock period for every data transition, serving
as the reference signal.
While the two XOR operations provide both the Error and the Ref-
erence pulses for every data transition, the pulses in Error are only half
as wide as those in Reference. This means that the amplitude of Error
must be scaled up by a factor of two with respect to Reference so that
the difference between their averages drops to zero when clock transi-
tions are in the middle of the data eye. The phase error with respect to
this point is then linearly proportional to the difference between the two
averages.
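The behavior described above can be reproduced with a simple behavioral model of four latches and two XOR gates. This is an idealized sketch (no delays, no metastability), and the latch polarities, the signal names `x1`, `x2`, `y1`, `y2`, and the function `simulate_pd` are assumptions made for illustration. The clock edge position within the bit is set by `edge_offset`, measured in time steps from the data transition.

```python
import random

def simulate_pd(bits, steps_per_bit, edge_offset):
    """Return (average Error, average Reference) for an ideal half-rate linear PD."""
    x1 = x2 = y1 = y2 = 0
    err_sum = ref_sum = 0
    total = len(bits) * steps_per_bit
    for t in range(total):
        d = bits[t // steps_per_bit]
        # Half-rate clock: one level per bit period, edges edge_offset steps into each bit.
        ck = ((t - edge_offset) // steps_per_bit) % 2 == 0
        if ck:
            x1 = d      # first latch of cascade 1 is transparent
            y2 = x2     # second latch of cascade 2 passes the held sample
        else:
            x2 = d      # first latch of cascade 2 is transparent
            y1 = x1     # second latch of cascade 1 passes the held sample
        err_sum += x1 ^ x2   # Error: from each data transition to the next clock edge
        ref_sum += y1 ^ y2   # Reference: half a clock period per data transition
    return err_sum / total, ref_sum / total

rng = random.Random(0)
bits = [rng.randint(0, 1) for _ in range(4000)]
for offset in (5, 10, 15):
    e, r = simulate_pd(bits, steps_per_bit=20, edge_offset=offset)
    print(f"edge at {offset}/20 of the bit: 2*Error - Reference = {2 * e - r:+.3f}")
```

With the clock edge in the middle of the bit, the weighted difference is close to zero regardless of the data pattern, and it moves linearly in either direction as the edge shifts. Both averages scale with the transition density, which is why the Reference signal removes the pattern dependence.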
In order to generate a full-rate output, the demultiplexed sequences
are combined by a multiplexer that operates on the half-rate clock as
well. This output can also be used for testing purposes in order to obtain
the overall bit error rate (BER) of the receiver.
It is important to note that the XOR gates in Fig. 5.7 must be sym-
metric with respect to their two differential inputs. Otherwise, differ-
ences in propagation delays result in systematic phase offsets. Each of
the XOR gates is implemented as shown in Fig. 5.8 [48]. The circuit
avoids stacking stages while providing perfect symmetry between the
two inputs. The output is single-ended but the single-ended Error and
Reference signals produced by the two XOR gates in the phase detector
are sensed with respect to each other, thus acting as a differential drive
for the charge pump. The operation of the XOR circuit is as follows. If
the two logical inputs are not equal, then one of the input transistors
on the left and one of the input transistors on the right turn on, thus
turning off. If the two inputs are identical, one of the tail currents
flows through Since the average current produced by the Error
XOR gate is half of that generated by the Reference XOR gate, transis-
tor is scaled differently, making the average output voltages equal
for zero phase difference. Channel length modulation of transistor
reduces the precision of current scaling between the two XOR gates.
This effect can be avoided by increasing the length of the device.
The gain of the phase detector is determined by the value of the resis-
tor and the tail current sources The voltage is generated on
chip in order to track the variations over temperature and process. This
voltage equals the output common-mode level of the latches preceding
the XOR gate. It is generated using a differential pair that is a replica
of the preamplifier section of the latch. Current source raises the
common-mode level of the differential signal formed by the Error and
Reference signals, making compatible with the input of the charge
pump.
It is instructive to plot the input/output characteristic of the PD to
ensure linearity and absence of a dead zone. This is accomplished by
obtaining the average values of Error and Reference while the circuit
operates at maximum speed. Figure 5.9 shows the simulated behavior
as the phase difference varies from zero to one bit period. The Reference
average exhibits a notch where the clock samples the metastable points
of the data waveform. The Error and Reference signals cross at a phase
difference approximately 55 ps from the metastable point, indicating
that the systematic offset between the data and the clock is very small.
The linear characteristic of the phase detector results in minimal charge
pump activity and small ripple on the control line in the locked condition.

The choice of the logic family used for the XOR gates and the latches
is determined by speed and switching-noise considerations. While
rail-to-rail CMOS logic achieves relatively high speeds, it requires am-
plifying the data swings generated by the stage preceding the CDR cir-
cuit (typically a limiting amplifier). Furthermore, CMOS logic produces
enormous switching noise in the substrate and on the supplies, disturb-
ing the oscillator considerably. For these reasons, the building blocks
incorporate current-steering logic. The phase detector incorporates an
input buffer with on-chip resistive matching.

2.3. Charge Pump and Loop Filter


Figure 5.10 shows the implementation of the differential charge pump.
The common-mode feedback (CMFB) circuit senses the output CM level
by and providing correction through and Both the
matching and channel-length modulation of in Fig. 5.10 impact
the residual phase error in locked condition. Thus, their lengths and
widths are relatively large to minimize these effects.

The design of the loop filter is based on a linear, time-invariant model
of the loop and is performed in the continuous-time domain. The loop is in
general a nonlinear time-variant system and can only be assumed linear
if the phase error is small. The time-invariant analysis is valid if the
averaging behavior of the loop rather than its single-cycle performance
is of interest, i.e., the loop can be analyzed by continuous-time approx-
imation if the loop bandwidth is small. Under this condition, the state
of the CDR changes by only a small amount on each cycle of the input
signal.
A low-pass jitter transfer function with a given bandwidth and a max-
imum gain in the passband is specified for a SONET system. The closed-
loop transfer function of the CDR has a zero at a frequency lower than
the first closed-loop pole. This results in jitter peaking that can never
be eliminated. But the peaking can be reduced to negligible levels by
overdamping the loop.
As derived in [41], the closed-loop unity-gain bandwidth is approxi-
mated as:

where and are the gains of the VCO and PD, respectively,
and denotes the conversion gain of the charge pump. Equation (5.1)
can be used to determine the value of The amount of the jitter
peaking in the closed-loop transfer function can be approximated as:

Equation (5.2) yields the required value of In order to obtain
greater suppression of high-frequency jitter, a second capacitor is added
in parallel with the series combination of and These components
are added externally to achieve flexibility in defining the closed-loop
characteristics of the circuit.
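The trade-off between loop bandwidth and jitter peaking described above can be explored numerically. The following is a minimal sketch, assuming the standard second-order charge-pump loop model with only the series resistor and capacitor in the filter (the second capacitor is ignored). The function names, the symbols `k_pd`, `g_cp`, `k_vco`, `r`, `c`, and all numerical values are illustrative assumptions, not the values used in this design.

```python
import math

def loop_parameters(k_pd, g_cp, k_vco, r, c):
    """Return (omega_n, zeta) for the assumed second-order model.

    Assumed open-loop gain: K * (r + 1/(s*c)) / s, with K = k_pd * g_cp * k_vco.
    The closed loop is then (K*r*s + K/c) / (s^2 + K*r*s + K/c).
    """
    k = k_pd * g_cp * k_vco
    omega_n = math.sqrt(k / c)          # natural frequency, rad/s
    zeta = 0.5 * r * math.sqrt(k * c)   # damping factor
    return omega_n, zeta

def jitter_transfer_mag(omega, omega_n, zeta):
    """Magnitude of the closed-loop (jitter) transfer function at omega."""
    num = complex(omega_n ** 2, 2 * zeta * omega_n * omega)
    den = complex(omega_n ** 2 - omega ** 2, 2 * zeta * omega_n * omega)
    return abs(num / den)

def peaking_db(omega_n, zeta, points=4000):
    """Jitter peaking in dB, found by scanning six decades around omega_n."""
    peak = 0.0
    for i in range(points + 1):
        omega = omega_n * 10 ** (-3 + 6 * i / points)
        peak = max(peak, jitter_transfer_mag(omega, omega_n, zeta))
    return 20 * math.log10(peak)

if __name__ == "__main__":
    # Hypothetical component values, chosen only to exercise the functions.
    omega_n, zeta = loop_parameters(k_pd=0.05, g_cp=1e-3,
                                    k_vco=2 * math.pi * 500e6,
                                    r=500.0, c=1e-9)
    print("approximate bandwidth (Hz):", 2 * zeta * omega_n / (2 * math.pi))
    for z in (0.7, 2.0, 5.0):
        print("zeta =", z, "peaking (dB) =", peaking_db(1.0, z))
```

In this model the peaking shrinks as the damping factor grows but never reaches 0 dB, which matches the statement that the zero below the first closed-loop pole makes some peaking unavoidable. For a heavily overdamped loop the bandwidth approaches 2*zeta*omega_n (the product of the loop gain and the resistor) and the peaking approaches 1 + 1/(4*zeta^2).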
Another advantage of linear PDs over their bang-bang counterparts
is that their jitter transfer characteristic is independent of the jitter
amplitude. It should also be mentioned that if the CDR is followed by
a demultiplexer, the tight specifications for jitter peaking need not be
satisfied because such specifications are defined for cascaded regenerators
handling full-rate data.
Figure 5.11 depicts the simulated behavior of the CDR circuit at the
transistor level. The voltage across the filter is initialized to a value
relatively close to its value in phase lock. The loop goes through a
transient of 350 ns before it locks. The ripple on the control line in
phase lock is approximately 1 mV.

3. Experimental Results
The CDR circuit has been fabricated in a CMOS process.
Figure 5.12 shows a photograph of the chip, which occupies an area of
ESD protection diodes are included for all pads except the
high-speed ones. Nonetheless, since all of these lines have a termination
to they exhibit some tolerance to ESD. The circuit is tested
in a chip-on-board assembly. In this prototype, the width of the poly
resistors was not sufficient to guarantee the nominal sheet resistance.
As a result, the fabricated resistor values deviated from their nominal
value by 30%, and the VCO center frequency was proportionally lower
than the simulated value at the nominal supply voltage (1.8 V). The
supply was increased to 2.5 V to achieve reliable operation at 10 Gb/s.
While such a high supply voltage creates hot-carrier effects in rail-to-rail
CMOS circuits, it is less detrimental in this design because no transistor
in the circuit experiences a gate-source or drain-source voltage of more
than 1 V. This issue is nonetheless resolved in a second design [49] by
proper choice of resistor dimensions. The circuit is brought close to lock
with the aid of the VCO coarse control before phase locking takes over.

Figure 5.13(a) shows the spectrum of the clock in response to a 10-Gb/s
data sequence of length . The effect of the noise shaping of
the loop can be observed in this spectrum. The phase noise at a 1-MHz
offset is approximately equal to -106 dBc/Hz. Figure 5.13(b) depicts
the recovered clock in the time domain. The time-domain measure-
ments using an oscilloscope overestimate the jitter, requiring specialized
equipment, e.g., the Anritsu MP1777 jitter analyzer. The jitter perfor-
mance of the CDR circuit is characterized by this analyzer. A random
sequence of length produces 14.5 ps of peak-to-peak and 1 ps
of rms jitter on the clock signal. These values are respectively reduced
to 4.4 ps and 0.6 ps for a random sequence of length . SONET
OC-192 specifies 10 ps as the maximum peak-to-peak jitter on the clock.
Therefore, the measured results are relatively close to the specifications.

The measured jitter transfer characteristic of the CDR is shown in
Fig. 5.14. The jitter peaking is 1.48 dB and the 3-dB bandwidth is 15
MHz. The loop bandwidth can be reduced to the SONET specifications,
but the jitter analyzer must then generate large jitter, which drives the
loop out of lock. This limitation is removed if a means of frequency
detection is added to the loop (Chapter 6); the circuit is then much less
susceptible to loss of lock due to the jitter generated by the analyzer.
Figure 5.15 depicts the retimed data. The demultiplexed data outputs
are shown in Fig. 5.15(a). The difference between the waveforms results
from systematic differences between the bond wires and traces on the test
board. Figure 5.15(b) depicts the full-rate output. Using this output, the
BER of the system can be measured. With a random sequence of ,
BER is less than However, a random sequence of results
in a BER of This BER can be reduced if the bandwidth of
the output buffer driving the 10-Gb/s data is increased. Furthermore,
if the value of the linear resistors is adjusted to their nominal value,
the increased operating speed of the back-end multiplexer results in an
improved BER (Chapter 6).
The CDR circuit exhibits a capture range of 6 MHz and a tracking
range of 177 MHz. The total power consumed by the circuit excluding
the output buffers is 72 mW from a 2.5-V supply. The VCO, the PD,
and the clock and data buffers consume 20.7 mW, 33.2 mW and 18.1
mW, respectively.

4. Conclusion
CMOS technology holds great promise for optical communication cir-
cuits. The raw speed resulting from aggressive scaling along with high
levels of integration provide a high performance at low cost. A 10-Gb/s
clock and data recovery circuit designed in CMOS technology
performs phase locking, data regeneration, and demultiplexing with 1 ps
of rms jitter.
Chapter 6

A 10-GB/S CMOS CDR CIRCUIT WITH
WIDE CAPTURE RANGE

This chapter describes the design and experimental results of a 10-Gb/s
phase-locked CDR circuit incorporating a multiphase LC oscillator
and a half-rate phase/frequency detector with automatic data retiming.
Fabricated in CMOS technology over an area of
the circuit exhibits a capture range of 1.43 GHz, an rms jitter of 0.8 ps,
and a peak-to-peak jitter of 9.9 ps with a PRBS of length
The power dissipation is 91 mW from a 1.8-V supply. This circuit is
the first 10-Gb/s CMOS CDR circuit to meet the specifications for jitter
generation defined by SONET OC-192. The high integrability and
low power dissipation of this CDR circuit demonstrate the feasibility
of implementing high-performance SONET transceivers operating at
10 Gb/s in a full CMOS process.

1. Introduction
Most CDR circuits employ ring or LC oscillators to
generate a clock signal. Ring oscillators have been predominantly used in
systems operating at lower speeds such as OC-3 and OC-12.
They provide a wide tuning range and differential control that makes the
circuit less susceptible to supply and substrate noise. Furthermore, they
benefit from a compact layout, easing the routing of high-speed signals,
and yielding a smaller area. The output jitter of these oscillators is small
enough to meet the OC-3 and OC-12 standard specifications.
As the data rate increases, the ring topology becomes an unattractive
candidate for the oscillator implemented in a CDR circuit. The most
important disadvantage is its limited signal integrity. Generation of a
robust clock signal at a high frequency using a ring oscillator is diffi-
cult. The maximum oscillation frequency achieved from a ring oscillator
depends on the number of stages and the minimum amount of delay
achieved from each stage. As the number of stages is reduced, the phase
shift introduced by each stage increases. However, achieving a significant
phase shift from a single stage in presence of process and temperature
variations is difficult. Therefore, to achieve reliable operation, at
least three stages should be used in a ring oscillator. A larger
number of stages, in turn, lowers the maximum frequency that can be
achieved from a ring oscillator.
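The dependence of the oscillation frequency on the stage count can be illustrated with the usual first-order ring-oscillator relations. This is a minimal sketch, assuming identical stages, a frequency of 1/(2 N t_d), and a frequency-dependent phase shift of 180°/N per stage (the remaining 180° coming from a signal inversion in the loop). The function names and the 20-ps delay are hypothetical.

```python
def ring_frequency_hz(stages, stage_delay_s):
    """First-order estimate: one period equals two trips around the ring."""
    if stages < 2 or stage_delay_s <= 0:
        raise ValueError("need at least two stages and a positive delay")
    return 1.0 / (2 * stages * stage_delay_s)

def phase_shift_per_stage_deg(stages):
    """Frequency-dependent phase each stage must contribute at oscillation."""
    return 180.0 / stages

if __name__ == "__main__":
    delay = 20e-12  # hypothetical per-stage delay in seconds
    for n in (3, 4, 5, 8):
        print(n, "stages:", ring_frequency_hz(n, delay) / 1e9, "GHz,",
              phase_shift_per_stage_deg(n), "degrees per stage")
```

Fewer stages raise the frequency but demand more phase shift from each stage, which is the robustness problem noted above.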
The oscillation frequency of a ring oscillator also varies significantly
with process because it depends heavily on the RC delay of each
stage. A large portion of the output capacitance of each stage comes
from interconnect capacitances, which are hard to extract because of
their varied geometries. Also, the sheet resistance of the resistors
varies significantly from one wafer to another.
LC oscillators in general provide a more precise center frequency
because the value of the inductor can be estimated with high precision
and the parasitic capacitances contribute only a small percentage of
the output capacitance of each stage, which is dominated by the diode
or the varactor. Furthermore, since the inductor and
the capacitor values determine the oscillation frequency, their product
can be reduced to achieve reliable operation at high frequencies. The
drawbacks of LC oscillators are a narrower tuning range and
single-ended control.
Meeting the specifications for jitter generation defined by SONET re-
quires a VCO that inherently has a small phase noise. Since the loop
bandwidth of the CDR circuit is small, the jitter produced by the os-
cillator accumulates in the loop. Therefore, the inherent jitter of the
oscillator should be as low as possible, requiring the implementation of
the CDR circuit using an LC oscillator with low phase noise.
The circuit introduced in this chapter uses an oscillator that benefits
from the quality of LC tanks to achieve an improved jitter performance.
Furthermore, the oscillator is formed by placing a number of stages in
a loop, providing multiple phases over the tuning range, without the
oscillation frequency being a strong function of the number of stages in
the loop.
The SONET standard recommends operation at an exact data rate.
Therefore the oscillators should be guaranteed to generate a clock signal
at an exact frequency. The tuning range should be wide enough to cover
the frequency variations over process and temperature.
On the other hand, CDR circuits lacking a means of frequency
acquisition provide a very narrow capture range. This range is primarily
determined by two factors, loop bandwidth and phase detector topology,
and is limited to a few megahertz. Therefore, the
CDR circuit cannot acquire lock if the VCO starts at a frequency that
is significantly different from the data rate.
This limitation calls for an aided acquisition mechanism. Various frequency
detection schemes have been introduced that operate with or
without a reference signal. In this chapter, a new frequency acquisition
scheme using a half-rate clock is described. This circuit benefits
from full compatibility with the operation of the phase detector, elimi-
nating the need for a lock detection scheme. The frequency detector is
automatically tri-stated when the circuit gets close to the phase lock.
The next section of the chapter presents the CDR architecture and
design issues. Following that, the implementation of the building blocks,
loop characterization, and the experimental results are described.

2. Architecture
Because of the marginal performance of the technology used
in this work, the clock frequency is chosen to be half of the data rate,
as in the circuit described in Chapter 5. However, the previous
circuit suffers from a limited capture range because it lacks a means of
frequency detection.
Various techniques for performing frequency detection without a refer-
ence clock have been introduced. But such techniques rely on a full-rate
clock to obtain the frequency error signal.
In this work, a new approach to performing phase and frequency de-
tection using a half-rate clock is described. This technique both achieves
a high speed and automatically retimes the data.
Shown in Fig. 6.1, the CDR consists of a phase and frequency detector
(PFD), a voltage-controlled oscillator (VCO), a charge pump, and a low-
pass filter (LPF). The PFD compares the phase and the frequency of the
input data to that of a half-rate clock, providing two binary error signals
for phase and frequency. These error signals are fed back to the VCO
through the charge pump and the low-pass filter. After phase lock is
achieved, the phase of the output clock is within a small offset from
the phase of the input data. This guarantees that the clock frequency is
equal to one half of the input bit rate. The PFD is designed such that, in
addition to providing information about the phase error, it retimes the
data as well. Consequently, the CDR exhibits no systematic offset, i.e.,
inherent skews between clock and data edges due to their nonidentical
paths through the loop do not degrade the quality of detection. The
VCO provides multiple phases over the full tuning range.
In order to minimize sensitivity to common-mode noise, the CDR
circuit incorporates fully differential topologies for all of the building
blocks.

3. Building Blocks
3.1. VCO
Shown in Fig. 6.2(a), the VCO consists of a four-stage differential ring
oscillator with LC-tuned loads, providing a tuning range wide enough to
encompass process and temperature variations. The number of stages is
chosen such that multiple clock phases with 45° of spacing required in
the PFD can be generated. This loop must have a negative feedback at
low frequencies in order to provide multiple phases; otherwise, the four
signals will be in phase.
Figure 6.2(b) shows the implementation of each stage. The loads are
formed using spiral inductors and MOS varactors. In order to determine
the frequency of oscillation, we recognize that each stage in the ring
must provide 45° of phase shift for oscillation to occur. As shown in
Fig. 6.2(c), the load can be modeled by a parallel LC tank along with a
parallel resistor The major contributor to this resistive loss in the
tank is the limited Q of the inductor. Therefore, can be approximated
as Setting the phase shift of the parallel tank to 45°, we arrive at
the following equation.
The oscillation frequency can therefore be written as:

Equation (6.2) suggests that the tuning characteristic of LC-tuned ring
oscillators is identical to that of cross-coupled LC oscillators if the inductor has
a relatively constant Q across the tuning range. Also, as the number
of stages, n, increases, the oscillation frequency becomes less dependent
on n, approaching This is in contrast to ring oscillators using
resistive loads.
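The phase condition described above can be checked numerically. The sketch below is a hedged reconstruction, not the book's own equations: it assumes a parallel RLC tank whose loss resistance is approximated as Q·L·ω, and it assumes that each of the n stages contributes π/n of phase shift (45° for n = 4). Under these assumptions the oscillation frequency works out to (1/√(LC))·√(1 − tan(π/n)/Q). The function names and the component values are illustrative only.

```python
import cmath
import math

def lc_ring_omega(l, c, q, n):
    """Oscillation frequency (rad/s) of an n-stage LC-tuned ring.

    Assumes each stage provides pi/n of phase shift and R_p = q * l * omega.
    """
    shift = math.tan(math.pi / n) / q
    if n < 3 or shift >= 1.0:
        raise ValueError("tank cannot provide the required phase shift")
    return math.sqrt((1.0 - shift) / (l * c))

def tank_phase_rad(omega, l, c, q):
    """Phase of the parallel RLC tank impedance at omega."""
    r_p = q * l * omega
    admittance = 1.0 / r_p + 1.0 / (1j * omega * l) + 1j * omega * c
    return cmath.phase(1.0 / admittance)

if __name__ == "__main__":
    l, c, q = 1e-9, 0.8e-12, 5.0   # hypothetical tank values
    w0 = 1.0 / math.sqrt(l * c)
    for n in (4, 8, 16, 64):
        w = lc_ring_omega(l, c, q, n)
        print(n, "stages:", w / (2 * math.pi * 1e9), "GHz, ratio to 1/sqrt(LC):",
              w / w0, "tank phase (deg):", math.degrees(tank_phase_rad(w, l, c, q)))
```

As n grows, the required phase shift per stage shrinks and the frequency approaches the tank resonance, so the stage count has only a weak influence on the oscillation frequency.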
The dominant portion of the tank’s parallel capacitance is contributed
by the MOS varactor. The varactor capacitance varies by a factor of 2
across the tuning range.
As shown in Fig. 6.2(b), resistor shifts the common-mode level
of down so that the varactor gate-source voltage can assume both
positive and negative values, providing a large tuning range. Each stage
has a tail current source of 4 mA. The bias current is chosen to provide
large voltage swings at the output to drive the following circuit with
smaller phase noise.
Simulation results indicate that the maximum self-resonance frequency
for the inductors is achieved by forming them as the stack of two metal
layers, M3 and M6 [50], despite the fact that the inductors required for
oscillation at 5 GHz are usually small enough to be formed using a single
layer of metal over a relatively small area.
In order to arrive at the exact oscillation frequency in simulations, a
distributed model is used for the inductor. Shown in Fig. 6.3, a lumped
inductor is replaced by a chain of smaller inductors in series with re-
sistors, modeling the loss of the tank. The layer-to-layer capacitance
and the layer-to-substrate capacitance is distributed across
the nodes on the resistive/inductive chain. In this model, and
equal 1/8 of the inductor value and its series resistor. and equal
1/5 of and respectively.

As shown in [50], the model of Fig. 6.3 can be used to predict the self-
resonance frequency of the tank. Theoretically, it can be shown that the
effective capacitance of the inductor equals In reality,
this value is closer to
The VCO occupies a large chip area as a result
of having eight spiral inductors. Therefore, the metal lines carrying the
multi-phase clock signals are very long. These interconnects are laid out
using wide traces of the top metal layer in order to reduce the resis-
tance of the wire. This results in a large routing capacitance, since the
fringe capacitance of the top metal layer in a CMOS technology
is several times higher than its bottom-plate capacitance. If the buffers
following the VCO are placed before the interconnects, the parasitics will
introduce a large time constant at the buffer outputs, drastically reduc-
ing the voltage swing of the high-speed signal. To alleviate this problem,
the buffers are placed after the interconnects so that the parasitics can
be tuned out by the inductors in the VCO. The parasitic capacitances
are precisely calculated so that the resulting oscillation frequency does
not fall out of the range of interest.
One way to reduce the differential capacitance between
adjacent lines is to route the signals such that adjacent lines
carry signals that are close in phase. Figure 6.4(a) depicts the signal
arrangement that minimizes the differential capacitance. The signals
carried over two adjacent lines are only 45° apart in the phase domain.
Figure 6.4(b) depicts an arrangement in which differential signals are
placed close to each other. This orientation maximizes the capacitance
because the two signals sustain a maximum phase difference of 180°.
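The effect of the phase difference on the line-to-line capacitance can be quantified with a simple phasor argument. This is a minimal sketch, assuming two equal-amplitude sinusoids on adjacent lines: the voltage across the coupling capacitor then has an amplitude of 2·sin(θ/2) times the amplitude on one line, so the coupling capacitor loads each line as if it were scaled by that factor. The function name is hypothetical.

```python
import cmath
import math

def coupling_factor(phase_difference_deg):
    """Amplitude of (V1 - V2) relative to the amplitude of V1 alone."""
    theta = math.radians(phase_difference_deg)
    return abs(1 - cmath.exp(1j * theta))

if __name__ == "__main__":
    for angle in (0, 45, 90, 135, 180):
        print(angle, "degrees:", round(coupling_factor(angle), 3))
```

Under this assumption, neighbors that are 45° apart see roughly 0.77 of the coupling capacitance, whereas differential neighbors (180°) see twice its value.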

Although the orientation of Fig. 6.4(a) minimizes the coupling capacitance
between the lines, it results in unequal lengths for the traces
carrying the signals. At 5 GHz, only a few picoseconds of skew can
significantly degrade the performance of the circuit. Therefore, the ori-
entation of Fig. 6.4(b) was adopted for routing the signals. However, in
order to minimize the capacitance, the traces are placed far apart from
each other. The first-order parasitic capacitance models indicate that
the value of the coupling capacitance stays constant until the spacing
reaches a certain limit. Thereafter, the capacitance decreases linearly
as the spacing increases. The spacing of was chosen such that
the parasitic capacitances contribute less than 20% of the total capacitance
seen at the output of each stage. Special attention was paid
to equalizing the length of the traces carrying the high-frequency clock
signal to the phase and frequency detectors.
The metal trace connecting the control line of the VCO to the pad
is shielded by two metal layers that are connected to the VCO supply.
Therefore, the supply noise is capacitively coupled to the VCO control,
so that the control voltage tracks the supply and the modulation of the
varactor capacitance due to supply noise is less pronounced.
The output clock signal can be taken from any of the clock phases.
Also, the sum of the four phases can be routed to the output. The
second solution provides a larger swing at the input of the clock buffer.
However, simulations indicate that at such high frequencies, the resulting
sum is significantly distorted. For this reason, only one of the phases is
passed to the output and additional dummy circuits are introduced, so
that the capacitive loading on all four phases is equal.
Another approach to generating half-quadrature phases is to use
a quadrature oscillator, which generates the 0° and 90° phases, and to
interpolate between these phases. However, this approach is susceptible
to mismatch between the phases. Moreover, the combination of the
quadrature VCO and the interpolators consumes more power for the same
performance than the VCO described here.

3.2. Phase and Frequency Detector


The PFD described in this work consists of two phase detectors (PD)
and a modified double-edge-triggered flipflop (DETFF). The PD is de-
rived from the data transition tracking loop (DTTL) described in [52]
and [11]. In this PD, in-phase and quadrature phases of a half-rate clock
signal sample the data in two double-edge-triggered flipflops. As shown
in Fig. 6.5, four distinct cases can be identified, depending on whether
the clock is early or late and whether the data edge is positive or negative.
For a positive edge, if the clock is early, the quadrature sample is neg-
ative and if the clock is late, the quadrature sample is positive. When
the data edge is negative, the polarity of the quadrature samples is re-
versed. If either of the in-phase or quadrature samples is used to form
the phase-error signal, the reversed polarity of the samples of positive
and negative data edges provides inconsistent phase-error information.
Since the phase error information is only present when a data tran-
sition occurs, the following set of rules can be proposed to obtain the
desired phase error signal.

If the data makes a low-to-high transition, the quadrature sample
goes to the phase-detector output.

If the data makes a high-to-low transition, the inverse of the quadrature
sample goes to the phase-detector output.
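The two rules above can be expressed as a small behavioral model. This is only a sketch: bits are modeled as 0 and 1, the quadrature sample is assumed to capture the old bit when the clock is early and the new bit when it is late, and the output is assumed to hold its previous value when no data transition occurs. The function names are hypothetical and do not describe the circuit implementation.

```python
def quadrature_sample(bit_before, bit_after, clock_is_late):
    """Bit captured by the quadrature clock edge near a data transition.

    A late edge already sees the new bit; an early edge still sees the old one.
    """
    return bit_after if clock_is_late else bit_before

def phase_error(bit_before, bit_after, q_sample, previous_output):
    """Apply the two rules; hold the previous output without a transition."""
    if bit_after == bit_before:
        return previous_output      # no transition, no new information
    if bit_after > bit_before:
        return q_sample             # low-to-high: pass the sample
    return 1 - q_sample             # high-to-low: pass its inverse

if __name__ == "__main__":
    for late in (False, True):
        for before, after in ((0, 1), (1, 0)):
            q = quadrature_sample(before, after, late)
            print("late" if late else "early", before, "->", after,
                  "output:", phase_error(before, after, q, None))
```

With this convention the output is 1 whenever the clock is late and 0 whenever it is early, regardless of the direction of the data edge, which is the consistency the rules are meant to provide.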

Figure 6.6 shows the implementation of the PD according to these
rules. Two latches operating on opposite clock phases and a multiplexer
form a DETFF that samples the data using both the positive and neg-
ative transitions of a half-rate clock. The two signals and are
therefore the in-phase and quadrature samples of data, respectively. A
modified DETFF is used to implement the above rules. The output of
the latch operating on the rising edge of the trigger signal goes to the
multiplexer with no inversion (the first rule), whereas the output of the
latch operating on the falling edge of the trigger signal is inverted before
going to the multiplexer (the second rule).
We therefore use the in-phase sample to clock the quadrature
sample The output of the modified DETFF is the phase error
signal.
This phase detector can operate at a high speed because it uses a half-
rate clock. Since in the locked condition, the rising and falling edges of
the quadrature clock coincide with data transitions, the in-phase clock
transitions sample the data at its optimum point with no systematic
offset, generating a full-rate output stream. Also, since the phase-error
signal is revalidated only at data transitions, the ripple in the phase
error signal is suppressed. The PD is independent of the data transition
density, resulting in substantial reduction of pattern-dependent jitter.
A difference between the data rate and twice the clock frequency will
result in a beat frequency, formed as an alternating low-speed signal
at the output of the PD. The average period of this signal represents
the difference of the bit rate and twice the clock frequency. This signal,
however, is not sufficient to determine the polarity of the frequency error.
We therefore add a second PD to the circuit, whose structure is iden-
tical to the first PD. The only difference is that the in-phase and the
quadrature clock signals applied to this block lead their counterparts in
the other phase detector by 45°.
Figure 6.7 depicts the output of both PDs for the two cases in which the
clock frequency is lower or higher than half the data rate. From these
waveforms, the following observations can be made:
If clock is slow, lags Therefore, if is sampled by the
rising and falling edges of the results are negative and positive,
respectively.
If clock is fast, leads Therefore, if is sampled by
the rising and falling edges of the results are the inverse of the
previous case.
We conclude that the modified DETFF can be used to extract the
frequency error signal from and
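The observations above can be reproduced with an idealized model of the two beat waveforms. This is a sketch under stated assumptions: each PD output is modeled as a square wave of the accumulated phase error, the second PD output is assumed to be offset by a quarter of the beat period (a 45° shift of the half-rate clock corresponds to a quarter of a bit period), and the sign convention for the frequency error is arbitrary. All names and values are hypothetical.

```python
import math

def square(theta):
    """Idealized binary PD output as a function of the phase error."""
    return 1 if math.sin(theta) >= 0.0 else -1

def frequency_error_samples(delta_f_hz, duration_s=5e-6, steps=5000,
                            quarter=math.pi / 2, theta0=0.3):
    """Sample the second beat signal on both edges of the first one.

    Rising-edge samples pass unchanged and falling-edge samples are
    inverted, mimicking the modified double-edge-triggered flipflop.
    """
    outputs = []
    previous = square(theta0)
    for i in range(1, steps + 1):
        theta = theta0 + 2 * math.pi * delta_f_hz * duration_s * i / steps
        current = square(theta)
        if current != previous:
            other = square(theta + quarter)
            outputs.append(other if current > previous else -other)
        previous = current
    return outputs

if __name__ == "__main__":
    print("positive error:", frequency_error_samples(+1e6))
    print("negative error:", frequency_error_samples(-1e6))
```

In this model every sample has the same sign for a given sign of the frequency error, and the sign flips when the error changes polarity, which is the property that allows the modified DETFF to produce a frequency error signal.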
Figure 6.8 shows the PFD structure, where and
lead by 45°, 90°, and 135°, respectively. The voltages and
are used as the phase error and frequency error signals.
As shown in Fig. 6.7, if the PFD is designed such that has a
unipolar output, the difference between and will have positive
and negative unipolar pulses for slow and fast clock signals. The polarity
of these pulses determines the sign of frequency error.
A PFD that generates two signals for phase and frequency error must
be designed such that its frequency error signal falls to zero in phase
lock. As described in [34], the modified DETFF used to produce the
frequency error signal generates unipolar tri-state pulses at the output.
Figure 6.9 depicts how the multiplexer used in the DETFF is modified
for this purpose.

3.3. Charge Pump


Figure 6.10 shows the implementation of the charge pump. Since the
circuit drives the single-ended control of the varactors, it is designed to
provide a single-ended output. In phase lock, the differential frequency
error signal falls to zero. Therefore, is equally split between and
having negligible effect on In order to reduce the ripple at the
output, the charge-pump current is relatively small.
Simulation results indicate that the capture range of the circuit is
limited because the output of the charge pump cannot go from rail to
rail. The current sources, and and the transistor impose a
voltage drop. This drop can be minimized by proper choice of device
dimensions.

3.4. Output Buffers


The output buffer delivers the high-speed clock and data signals to
the output termination. As shown in Fig. 6.11, to achieve a wide
bandwidth the buffer stages employ inductive peaking [43]. The value of
the inductors is chosen so as to avoid peaking in the passband. Since the
quality factor of the inductors is not critical here, the spiral structures
have a line width of only to achieve a high self-resonance frequency.
The value of the inductors ranges from 1.5 nH to 3.5 nH.

4. Loop Characterization
Figure 6.12(a) depicts a linear small-signal model of the CDR circuit
in the vicinity of the lock point. In Chapter 5, the 3-dB bandwidth of
the transfer function from the input of the phase detector to the VCO
output and the value of jitter peaking in this system were calculated.
The assumption is that the loop filter only consists of a series resistor
and capacitor (Fig. 6.12(b)). This simple model can also be used for
determination of the 3-dB bandwidth of the closed-loop VCO’s phase
noise characteristic Shown in Fig. 6.12(c), this bandwidth can be
approximated if the loop is heavily overdamped and the jitter transfer
function has no peaking.
Thermal noise enters the system from the input of the phase detector
and the control of the VCO. If the transfer function from these inputs
to the output is represented as and respectively, then the
power spectral density of the output noise can be given as:

where is the power spectral density of the input thermal noise.


The 3-dB bandwidth is the frequency at which
equals 0.5. Therefore
In this equation:

In a heavily overdamped system The mathematical term for


can be substantially simplified using this approximation:

The assumption of will require that where


represents the 3-dB bandwidth from the input of the phase detector to
the oscillator output.
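The relationship between the two bandwidths in a heavily overdamped loop can be illustrated numerically. The sketch below is not the derivation of this section; it assumes the standard second-order model with a series RC filter, in which the input-to-output transfer function is low-pass and the VCO phase noise is shaped by the complementary high-pass function. The function names and the normalized values are hypothetical.

```python
import math

def lowpass_mag2(omega, omega_n, zeta):
    """Squared magnitude from the phase detector input to the VCO output."""
    a = 2 * zeta * omega_n * omega
    return (omega_n ** 4 + a * a) / ((omega_n ** 2 - omega ** 2) ** 2 + a * a)

def highpass_mag2(omega, omega_n, zeta):
    """Squared magnitude of the shaping applied to the VCO phase noise."""
    a = 2 * zeta * omega_n * omega
    return omega ** 4 / ((omega_n ** 2 - omega ** 2) ** 2 + a * a)

def half_power_frequency(mag2, omega_n, zeta):
    """Frequency where mag2 equals 0.5, found by bisection on a log axis."""
    lo, hi = omega_n * 1e-6, omega_n * 1e6
    rising = mag2(hi, omega_n, zeta) > mag2(lo, omega_n, zeta)
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        above = mag2(mid, omega_n, zeta) > 0.5
        if above == rising:
            hi = mid    # the crossing lies below mid
        else:
            lo = mid    # the crossing lies above mid
    return math.sqrt(lo * hi)

if __name__ == "__main__":
    omega_n = 1.0  # normalized natural frequency
    for zeta in (0.7, 2.0, 5.0, 10.0):
        lp = half_power_frequency(lowpass_mag2, omega_n, zeta)
        hp = half_power_frequency(highpass_mag2, omega_n, zeta)
        print("zeta =", zeta, "low-pass:", lp, "high-pass:", hp,
              "2*zeta*omega_n:", 2 * zeta * omega_n)
```

In this model both 3-dB frequencies converge to 2*zeta*omega_n as the damping factor increases, so for a heavily overdamped loop the bandwidth seen by the VCO phase noise is approximately the same as the jitter transfer bandwidth.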

5. Experimental Results
The CDR circuit has been fabricated in a CMOS process.
Figure 6.13 shows a photograph of the chip, which occupies an area of
ESD protection diodes are included for all pads except
the high-speed ones. The circuit is tested in a chip-on-board assembly,
running from a 1.8-V supply.
Figure 6.14(a) depicts the VCO tuning characteristic. It achieves a
tuning range of 1.2 GHz. The VCO achieves the highest signal
purity at the lower bound of its tuning range (-102.35 dBc/Hz at 1-MHz
offset). The open-loop VCO phase noise at 5 GHz is -86 dBc/Hz. The
tuning characteristic of the VCO varies by 1% over process.
Figure 6.15(a) shows the spectrum of the clock in response to a
9.95328-Gb/s data sequence of length The phase noise at 1-MHz
offset is approximately equal to -107 dBc/Hz. Figure 6.15(b) depicts the
recovered clock in the time domain. The jitter performance of the CDR
circuit is characterized by the Anritsu MP1777 jitter analyzer. A random
sequence of length produces 9.9 ps of peak-to-peak and 0.8 ps
of rms jitter on the clock signal. These values are respectively reduced
to 2.4 ps and 0.4 ps for a random sequence of length SONET
OC-192 specifies 10 ps as the maximum peak-to-peak jitter on the clock.
Therefore, the measured results are within the standard specifications.
The measured jitter transfer characteristic of the CDR is shown in
Fig. 6.16. The jitter peaking is 0.04 dB and the 3-dB bandwidth is 5.2
MHz.
In order to measure the jitter tolerance, a random sequence at 10 Gb/s
was applied to the circuit. The BER was for a PRBS of and
the circuit did not pass the tolerance requirements defined by SONET.
To identify the source of the error, the data rate was reduced to 5 Gb/s.
The circuit still sustained lock while the VCO was oscillating at 5 GHz.
The BER was smaller than and the circuit passed the SONET
mask (Fig. 6.17). The jitter tolerance of the circuit can be limited by
the input buffer, the CDR circuit, or the output buffer. Since
the jitter on the clock signal is not limiting the circuit’s performance,
as verified by the jitter tolerance experiment at a lower rate, and the
input does not impose a severe limitation on the bandwidth, the output
buffer is the probable cause of the high BER. Improving the output
buffer bandwidth can provide the improved performance required to meet
the SONET specification.
Figure 6.18 depicts the full-rate retimed data.

Despite the small loop bandwidth, the frequency detector provides a
capture range of 1.43 GHz, obviating the need for external references.
The total power consumed by the circuit excluding the output buffers
is 91 mW from a 1.8-V supply. The VCO, the PFD, and the clock and
data buffers consume 30.6 mW, 42.2 mW and 18.2 mW, respectively.
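As a quick consistency check, the per-block figures can be summed and converted to supply current. This is only a sketch built from the numbers quoted above; the percentage shares and the supply current are derived here and are not reported measurements, and the names are hypothetical.

```python
# Illustrative power-budget arithmetic using the figures quoted in the text.
# The dictionary keys and helper names are hypothetical.

SUPPLY_V = 1.8
BLOCK_POWER_MW = {
    "VCO": 30.6,
    "PFD": 42.2,
    "clock and data buffers": 18.2,
}


def total_power_mw(blocks: dict) -> float:
    """Sum the power of all blocks in milliwatts."""
    return sum(blocks.values())


def supply_current_ma(power_mw: float, supply_v: float) -> float:
    """Current drawn from the supply, in milliamperes (mW / V = mA)."""
    return power_mw / supply_v


def power_shares(blocks: dict) -> dict:
    """Fraction of the total power consumed by each block."""
    total = total_power_mw(blocks)
    return {name: power / total for name, power in blocks.items()}


total = total_power_mw(BLOCK_POWER_MW)
print(f"total: {total:.1f} mW, "
      f"supply current: {supply_current_ma(total, SUPPLY_V):.1f} mA")
for name, share in power_shares(BLOCK_POWER_MW).items():
    print(f"{name}: {100 * share:.1f}% of total")
```

The three contributions add up to the stated 91 mW, with the PFD as the largest single consumer.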

6. Conclusion
A 10-Gb/s clock and data recovery circuit designed in CMOS
technology performs frequency acquisition, phase locking, and data re-
generation. Achieving an rms jitter of 0.8 ps, this circuit is the first
CMOS CDR circuit to meet the jitter generation requirements defined
by SONET. The power consumption of this circuit is much lower than
that of similar circuits fabricated in bipolar or GaAs processes.
Chapter 7

CONCLUSION

The number of Internet nodes doubles approximately every 100
days, leading to an average bit rate of a few terabits per second on the
backbone. Bandwidth requirements are growing at an extremely
fast pace. Applications such as online virtual reality will require data
rates that are 10,000 times higher than those currently available [53].
With fiber optics being the only communication medium capable of
handling such high data rates, this trend has suddenly created a
widespread demand for high-speed optical and electronic devices,
circuits, and systems.
The new optical revolution has gradually replaced modular,
general-purpose building blocks with end-to-end solutions that benefit from device,
circuit, and architecture codesign. Greater levels of integration on a
single chip enable higher performance and lower cost. Mainstream VLSI
technologies such as CMOS continue to take over the territories thus far
claimed by GaAs and InP devices.
In the past two decades, CMOS technology has rapidly penetrated
the analog integrated circuit design arena, providing low-cost, high-
performance solutions and rising to dominate the market. More than
90% of the analog and mixed-signal products in today’s semiconductor
industry are designed and fabricated in pure CMOS technologies.
Exploitation of the CMOS process for fabrication of the electronic
interface in the optical system allows for integration of high-speed front-
end circuits and low-speed framers and mappers on the same chip. This
integration can reduce the package count, board size, and cost of the
system.
The two widely accepted commercial systems, namely SONET OC-48
and OC-192, operate at 2.5 and 10 Gb/s, respectively. The 2.5-Gb/s
CMOS transceiver has already been introduced by a few companies, and
an extensive amount of research has been performed to improve the
design of these systems. However, implementation of the 10-Gb/s CMOS
transceivers lags the 2.5-Gb/s receivers by a few years because these
systems have only become realizable in relatively advanced technologies
such as the process that has become available in the last two
years.
In this book, the design of the world’s first and second 10-Gb/s CDR
circuits has been described. Targeting the performance requirements
of the SONET OC-192 standard, the second circuit satisfies the jitter
generation specification. The jitter tolerance specification can be satis-
fied by further improvement of the output data buffer bandwidth and
reduction of the circuit’s BER.
The future research in the field of optical transceivers can be diversi-
fied into two major categories. On the one hand, circuit designers can
look into implementation of systems operating at higher rates. The next
optical standard (SONET OC-768) introduces a data rate that is very
close to 40 Gb/s. Device scaling of the CMOS process will soon allow
operation at such speeds. However, the integrity and purity of the sig-
nal are issues that are becoming extremely critical at such high speeds.
Only a few picoseconds of timing jitter can have a detrimental effect on
the performance of the system. Furthermore, connection of the circuit
core to the output environment at these speeds, in the presence of
parasitics, is extremely difficult. New broadband circuit techniques need
to be developed to address these issues and many other challenges that
arise in implementation of 40-Gb/s systems.
On the other hand, research in this field can take a different direction.
Integration of receivers, transmitters, and perhaps digital circuits on the
same chip is indeed a very tough challenge. If the receiver and the
transmitter incorporate separate VCOs running at close frequencies, special
attention needs to be paid to avoid signal coupling and reduce clock
jitter. The digital circuits can heavily pollute the signal environment,
introducing noise through substrate and supplies. In such an environ-
ment, meeting the jitter generation and transfer requirements is critical.
The noise usually manifests itself as peaking in the jitter transfer char-
acteristic.
The goal of the research described here was not only to demonstrate
the capability of the CMOS process for fabrication of broadband circuits
such as 10-Gb/s CDR circuits, but also to provide architectural and cir-
cuit techniques that can be used in any commercial system incorporating
clock and data recovery. The focus of this work was implementing CDR
circuits using the half-rate concept. Utilization of this technique will
allow designers to increase the maximum operating rate of the CDR
circuit by 60 to 80 percent in any given technology. This feature becomes
attractive since the speed capability of the fabrication processes always
lags the demand for higher bit rates.
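The half-rate idea can be illustrated with a purely behavioral sketch. It assumes an ideal clock running at half the bit rate whose rising and falling edges each sample one bit, so that the input is split into two streams at half the rate. The function names are hypothetical, and the model ignores all circuit-level effects such as jitter and setup time.

```python
# Behavioral sketch of half-rate sampling (idealized, hypothetical names).
# A half-rate clock has one rising and one falling edge per two bit periods,
# so sampling on both edges captures every bit.


def clock_frequency_hz(data_rate_bps: float, half_rate: bool = True) -> float:
    """Clock frequency needed for a given data rate."""
    return data_rate_bps / 2 if half_rate else data_rate_bps


def half_rate_sample(bits: list) -> tuple:
    """Split a bit stream into the streams captured on rising and falling edges."""
    rising = bits[0::2]    # bits sampled on rising clock edges
    falling = bits[1::2]   # bits sampled on falling clock edges
    return rising, falling


def remultiplex(rising: list, falling: list) -> list:
    """Interleave the two half-rate streams back into a full-rate stream."""
    out = []
    for i, bit in enumerate(rising):
        out.append(bit)
        if i < len(falling):
            out.append(falling[i])
    return out


data = [1, 0, 1, 1, 0, 0, 1]
rising_stream, falling_stream = half_rate_sample(data)
print("clock for 10 Gb/s:", clock_frequency_hz(10e9) / 1e9, "GHz")
print("rising-edge samples: ", rising_stream)
print("falling-edge samples:", falling_stream)
print("recombined:", remultiplex(rising_stream, falling_stream))
```

In this idealized picture the oscillator and the sampling elements toggle at half the bit rate, which is where the relaxed speed requirement comes from.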
Different types of VCOs and phase detectors were used in the CDR
circuits to demonstrate their performance at speeds as high as 10 Gb/s.
LC oscillators benefit from larger swings, lower phase noise, and higher
accuracy in prediction of the resonance frequency. Their drawbacks
are their relatively large area and narrow tuning range. However, as
their frequency of operation increases, the size of the integrated spiral
inductors needed to form the LC tank shrinks. At the same time, the
relatively precise oscillation frequency of the LC oscillators relaxes the
requirements for a wide tuning range. The design and optimization of
LC oscillators have advanced extensively due to the research performed at
UCLA and many other institutions in the last few years. These circuits
are likely to be found in most high-speed optical transceivers in the very
near future.
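The scaling of the tank inductance with frequency follows from the standard resonance relation f0 = 1/(2π√(LC)). The sketch below assumes an ideal lossless tank and a hypothetical fixed tank capacitance of 0.5 pF, a value chosen only for illustration and not taken from the designs described here. Inductance is used as a rough proxy for spiral size.

```python
import math

# Ideal LC-tank arithmetic. The capacitance value and the function names are
# hypothetical and serve only to illustrate the scaling with frequency.

TANK_CAPACITANCE_F = 0.5e-12


def resonance_frequency_hz(inductance_h: float, capacitance_f: float) -> float:
    """Resonance frequency of an ideal LC tank: 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def required_inductance_h(frequency_hz: float, capacitance_f: float) -> float:
    """Inductance that resonates with the given capacitance at frequency_hz."""
    return 1.0 / ((2.0 * math.pi * frequency_hz) ** 2 * capacitance_f)


for f_ghz in (2.5, 5.0, 10.0):
    l_nh = required_inductance_h(f_ghz * 1e9, TANK_CAPACITANCE_F) * 1e9
    print(f"{f_ghz:4.1f} GHz -> {l_nh:.3f} nH")
```

For a fixed capacitance, doubling the oscillation frequency reduces the required inductance by a factor of four.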
Both the linear and binary phase detectors are attractive for imple-
mentation in a CDR circuit. The performance of the linear phase de-
tectors can be more easily modeled and predicted. They benefit from a
lower charge pump activity at the lock point because unlike the binary
phase detectors, the output of the linear phase detectors goes to zero
in phase lock. On the other hand, the binary phase detectors are less
susceptible to peaking in their jitter transfer characteristic because they
have a single-pole-like jitter transfer characteristic [51].
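Why a single-pole response implies no peaking can be seen from a short numerical sketch. It models the jitter transfer as an ideal first-order low-pass response, which is a simplification assumed here for illustration, and borrows the 5.2-MHz bandwidth from the measurement reported in Chapter 6. The function names are hypothetical.

```python
import math

# Idealized single-pole jitter transfer model (an assumption for illustration).
# |H(f)| = 1 / sqrt(1 + (f / f_3dB)^2) never exceeds unity, so no peaking occurs.

F_3DB_HZ = 5.2e6  # 3-dB jitter transfer bandwidth quoted in the text


def single_pole_gain_db(f_hz: float, f_3db_hz: float = F_3DB_HZ) -> float:
    """Magnitude of a first-order low-pass response in dB."""
    magnitude = 1.0 / math.sqrt(1.0 + (f_hz / f_3db_hz) ** 2)
    return 20.0 * math.log10(magnitude)


def peaking_db(gains_db: list) -> float:
    """Amount by which the response exceeds 0 dB (zero if it never does)."""
    return max(0.0, max(gains_db))


# Logarithmic sweep of jitter frequency from 1 kHz to 1 GHz
frequencies = [10 ** (3 + 0.1 * k) for k in range(61)]
gains = [single_pole_gain_db(f) for f in frequencies]

print(f"gain at the 3-dB frequency: {single_pole_gain_db(F_3DB_HZ):.2f} dB")
print(f"peaking over the sweep: {peaking_db(gains):.2f} dB")
```

A second-order loop response, by contrast, can rise above 0 dB near its natural frequency, which is the peaking that the jitter transfer specification limits.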
One major advantage of binary phase detectors over linear phase
detectors is that they can be extended to perform referenceless frequency
detection in addition to phase detection. This is because binary topologies
are capable of providing a strong beat frequency in the presence of clock
and data frequency mismatch. With this in mind, a binary system is
more suitable where unaided frequency acquisition must be performed.
However, systems that rely on an external reference signal for frequency
acquisition can incorporate a linear phase detector. The two implemen-
tations described in this book illustrate this concept.
The CDR circuit remains the most critical block of the optical
receiver. In the years to come, we will see new techniques targeting
improved performance, higher data rates, higher integration, lower power
consumption, and lower cost.
References

[1] webopedia.internet.com/networks/networking_standards/SONET.html

[2] SONET OC-192 Transport System Generic Criteria, GR-1377-CORE, Issue 5,
Dec. 1998.
[3] Cypress Hotlink, User’s Guide, Cypress Semiconductor, April 1999.

[4] A. X. Widmer, P. A. Franaszek, “A DC-Balanced, Partitioned-Block, 8B/10B
Transmission Code,” IBM Journal of Research and Development, vol. 27, pp.
440-451, Sept. 1983.
[5] R. Walker, B. Amrutur, T. Knotts, “64B/66B Coding Update,” IEEE 802.3ae
Meeting, Albuquerque, March 2000.
[6] H. Kim, J. Bauman, “A 12 GHz 30 dB Modular BiCMOS Limiting Amplifier
for 10 Gb SONET Receiver,” ISSCC Digest of Technical Papers, vol. 43, pp.
160-161, Feb. 2000.
[7] M. Neuhauser, H.-M. Rein, H. Wrenz, “Low-Noise, High-Gain Si Bipolar Pream-
plifiers for 10 Gb/s Optical Fiber Links - Design and Realization,” IEEE Journal
of Solid-State Circuits, vol. 31, pp. 24-29, January 1996.
[8] E. M. Cherry, D. E. Hooper, “The Design of Wideband Transistor Feedback
Amplifiers,” Proc. IEE, vol. 110, pp. 375-389, February 1963.
[9] H.-M. Rein, M. Moller, “Design Considerations for Very High Speed Si Bipolar
ICs Operating up to 50 Gb/s,” IEEE Journal of Solid-State Circuits, vol. 31,
pp. 1076-1090, August 1996.
[10] B. Gilbert, “A New Wideband Amplifier Technique,” IEEE Journal of Solid-
State Circuits, vol. 3, pp. 353-365, December 1968.
[11] A. W. Buchwald, Design of Integrated Fiber-Optic Receivers Using Heterojunc-
tion Bipolar Transistors, Ph.D. Thesis, University of California, Los Angeles,
Jan. 1993.
[12] J. Savoj, B. Razavi, “A 10-Gb/s CMOS Clock and Data Recovery Circuit,”
Digest of Symposium on VLSI Circuits, pp. 136-139, June 2000.
[13] F. Herzel, B. Razavi, “A Study of Oscillator Jitter due to Supply and Substrate
noise,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal
Processing, vol. 46, pp. 56-62, Jan. 1999.
[14] B. Razavi, RF Microelectronics, Upper Saddle River, NJ: Prentice Hall, 1998.

[15] B. Razavi, Design of Analog CMOS Integrated Circuits, New York, NY: McGraw
Hill, 2000.
[16] B. Razavi, ed. Monolithic Phase-Locked Loops and Clock Recovery Circuits, Pis-
cataway, NJ: IEEE Press, 1996.
[17] T. C. Weigandt, B. Kim, P. R. Gray, “Analysis of Timing Jitter in CMOS Ring
Oscillators,” Proc. IEEE ISCAS, vol. 4, pp. 27-30, June 1994.
[18] W. B. Kuhn, N. K. Yanduru, “Spiral Inductor Substrate Loss Modeling in Sili-
con RF IC’s,” Microwave Journal, pp. 66-81, March 1999.
[19] C. M. Hung, L. Shi, I. Lagnado, K. K. O., “A 25.9 GHz Voltage-Controlled Os-
cillator Fabricated in a CMOS Process,” Digest of Symposium on VLSI Circuits,
pp. 100-101, June 2000.
[20] D. B. Leeson, “A Simple Model of Feedback Oscillator Noise Spectrum,” Proc.
of IEEE, vol. 54, pp. 329-330, 1966.
[21] J. J. Rael, A. A. Abidi, “Physical Processes of Phase Noise in Differential LC
Oscillators,” Proceedings of the Custom Integrated Circuits Conference, pp. 569-
572, May 2000.
[22] T.-P. Liu, “A 6.5 GHz Monolithic CMOS Voltage-Controlled Oscillator,” ISSCC
Digest of Technical Papers, pp. 404-405, Feb. 1999.
[23] A. Rofougaran, J. Rael, M. Rofougaran, A. Abidi, “A 900 MHz CMOS LC-
Oscillator with Quadrature Outputs,” ISSCC Digest of Technical Papers, pp.
392-393, Feb. 1996.
[24] B. Razavi, “A 1.8 GHz CMOS Voltage-Controlled Oscillator,” ISSCC Digest of
Technical Papers, pp. 388-389, Feb. 1997.
[25] C. Lam, B. Razavi, “A 2.6 GHz/5.2 GHz CMOS Voltage-Controlled Oscillator,”
ISSCC Digest of Technical Papers, pp. 402-403, Feb. 1999.
[26] J. J. Kim, B. Kim, “A Low-Phase-Noise CMOS LC Oscillator with a Ring
Structure,” ISSCC Digest of Technical Papers, pp. 430-431, Feb. 2000.
[27] T.-P. Liu, “1.5-V 10-12.5 GHz Integrated CMOS Oscillators,” Digest of Sym-
posium on VLSI Circuits, pp. 55-56, June 1999.
[28] B. Kleveland, C. H. Diaz, D. Dieter, L. Madden, T. H. Lee, S. S. Wong, “Mono-
lithic CMOS Distributed Amplifier and Oscillator,” ISSCC Digest of Technical
Papers, pp. 70-71, Feb. 1999.
[29] H. Wu, A. Hajimiri, “A 10 GHz CMOS Distributed Voltage Controlled Oscil-
lator,” Proceedings of the Custom Integrated Circuits Conference, pp. 581-584,
May 2000.
[30] J. A. McNeill, “Jitter in Ring Oscillators,” IEEE Journal of Solid-State Circuits,
vol. 32, pp. 870-879, June 1997.
[31] C. A. Sharpe, “A 3-State Phase Detector Can Improve Your Next PLL Design,”
EDN, pp. 55-59, Sept. 1976.
[32] C. Hogge, “A Self-Correcting Clock Recovery Circuit,” IEEE Journal of Light-
wave Technology, vol. LT-3, pp. 1312-1314, December 1985.

[33] J. D. H. Alexander, “Clock Recovery from Random Binary Data,” Electronics
Letters, vol. 11, pp. 541-542, Oct. 1975.
[34] A. Pottbacker, U. Langmann, H. U. Schreiber, “A Si Bipolar Phase and Fre-
quency Detector IC for Clock Extraction up to 8 Gb/s,” IEEE Journal of Solid-
State Circuits, vol. 27, pp. 1747-1751, December 1992.

[35] S. B. Anand, B. Razavi, “A 2.5-Gb/s Clock Recovery Circuit for NRZ Data
in CMOS Technology,” Proceedings of the Custom Integrated Circuits
Conference, pp. 379-382, May 2000.
[36] J. C. Scheytt, G. Hanke, U. Langmann, “A 0.155, 0.622, and 2.488 Gb/s Auto-
matic Bit Rate Selecting Clock and Data Recovery IC for Bit Rate Transparent
SDH Systems,” ISSCC Digest of Technical Papers, pp. 348-349, Feb. 1999.
[37] G. Gutierrez, S. Kong, B. Coy, “2.488 Gb/s Silicon Bipolar Clock and Data Re-
covery IC for SONET (OC-48),” Proceedings of the Custom Integrated Circuits
Conference, pp. 575-578, May 1998.
[38] C. F. Schaeffer, “The Zero-Beat Method of Frequency Discrimination,” Proceed-
ings IRE, vol. 30, pp. 365-367, August 1942.

[39] F. M. Gardner, “Properties of Frequency Difference Detectors,” IEEE
Transactions on Communications, vol. COM-33, pp. 131-138, Feb. 1985.
[40] B. Razavi, J. Sung, “A 2.5-Gb/s 15-mW BiCMOS Clock Recovery Circuit,”
Digest of Symposium on VLSI Circuits, pp. 83-84, June 1995.
[41] L. M. De Vito, “A Versatile Clock Recovery Architecture and Monolithic Imple-
mentation,” Invited Paper, Monolithic Phase-Locked Loops and Clock Recovery
Circuits, Theory and Design, Edited by B. Razavi, IEEE Press, New York 1996.
[42] S. B. Anand, B. Razavi, “A 2.75-Gb/s CMOS Clock Recovery Circuit with
Broad Capture Range,” ISSCC Digest of Technical Papers, pp. 214-215, Feb.
2001.
[43] J. Savoj, B. Razavi, “A CMOS Interface Circuit for Detection of 1.2 Gb/s RZ
Data,” ISSCC Digest of Technical Papers, pp. 278-279, Feb. 1999.
[44] M. Wurzer, et al., “40-Gb/s Integrated Clock and Data Recovery Circuit in a
Silicon Bipolar Technology,” Proceedings of the Bipolar/BiCMOS Circuits and
Technology Meeting, pp. 136-139, Sept. 1998.
[45] M. Rau, et al., “Clock/Data Recovery PLL Using Half-Frequency Clock,” IEEE
Journal of Solid-State Circuits, vol. 32, pp. 1156-1159, July 1997.
[46] K. Nakamura, et al., “A 6 Gb/s CMOS Phase Detecting DEMUX Module Using
Half-Frequency Clock,” Digest of Symposium on VLSI Circuits, pp. 196-197,
June 1998.
[47] E. Mullner, “A 20 Gbit/s Parallel Phase Detector and Demultiplexer Circuit in
a Production Silicon Bipolar Technology with ” Proceedings of the
Bipolar/BiCMOS Circuits and Technology Meeting, pp. 43-45, Sept. 1996.
[48] B. Razavi, Y. Ota, R. G. Swarz, “Design Techniques for Low-Voltage High-
Speed Digital Bipolar Circuits,” IEEE Journal of Solid-State Circuits, vol. 29,
pp. 332-339, March 1994.
[49] J. Savoj, B. Razavi, “A 10-Gb/s CMOS Clock and Data Recovery Circuit with
Frequency Detection,” ISSCC Digest of Technical Papers, pp. 78-79, Feb. 2001.
[50] A. Zolfaghari, A. Chan, and B. Razavi, “Stacked Inductors and 1-to-2 Trans-
formers in CMOS Technology,” Proceedings of the Custom Integrated Circuits
Conference, pp. 345-348, May 2000.
[51] Y. M. Greshishchev, et al., “A Fully Integrated SiGe Receiver IC for 10-Gb/s
Data Rate,” IEEE Journal of Solid-State Circuits, vol. 35, pp. 1949-1957, Dec.
2000.
[52] T. O. Anderson, W. J. Hurd, and W. C. Lindsey, “U.S. pat. no. 3,626,298;
Transition Tracking Bit Synchronization System,” Dec. 1971.
[53] G. Stix, “The Triumph of the Light,” Scientific American, Jan. 2001.
Index

Acquisition time, 26
Aided acquisition, 53
Amplifier, 67
    common-gate, 67
    common-source, 67
    low-noise, 67
    wideband, 67
BER, 92
Buffer, 73
CDR, 11, 22, 77
    full-rate, 27
    half-rate, 27, 78, 97
    open-loop, 22
    phase-locking, 23
    speed, 24
CMOS process, 6, 92, 115
Capture range, 26, 92
Charge pump, 24, 87, 97, 106
Cherry-Hooper, 16
Clock-multiplying unit, 8
DETFF, 103
Data
    NRZ, 5, 45
    RZ, 61
    binary, 45
    regeneration, 78
    spectrum, 45
    transition density, 85
Decision circuit, 58
Demultiplexer, 71
Deserialize, 5
Duty cycle, 29
ESD, 110
Edge detection, 5, 23
Encoding
    64B/66B, 5
    8B/10B, 5
FIFO, 8
Feedback
    shunt-shunt, 14
Fiber optics
    dispersion, 4
    multi-mode, 3
    single-mode, 3
    transceiver, 7
Filter
    band-pass, 23
    low-pass, 24, 97
    matched, 62
    selectivity, 23
Framer, 6
Frequency detector, 52, 97
    Pottbacker, 57
    referenced, 53
    referenceless, 53, 55
    rotational, 57
Frequency
    division, 9
    self-resonance, 100
Gilbert amplifier, 16
Hot-carrier effect, 90
IC, 5
ISI, 19, 21, 24, 61
Inductive peaking, 17
Inductor, 18
    Q, 18
    distributed-model, 100
    integrated, 18
    stacked, 69, 100
Injection locking, 9
Integrate-and-dump, 63, 69
Integrator, 64
Interface, 61
    electronic, 2
    optical system, 4
Interleaving, 64
Internet
    backbone, 1
Jitter, 25, 32, 69, 77, 110
    RMS, 90
    analyzer, 90
    bimodal, 29, 80
    generation, 25, 96
    pattern-dependent, 103
    peak-to-peak, 90
    thermal, 32
    tolerance, 26
    transfer, 25, 88, 91
Laser diode, 11
Latch, 84
    current-steering, 7
Limiter, 16
Logic
    current-steering, 87
Loss
    Displacement, 32
    Eddy, 32
    substrate, 33
Mapper, 6
Multiplexer, 7, 85
Noise, 22
    input-referred, 14, 68
    switching, 36
    thermal, 35, 108
Optical
    amplifier, 4
    carrier, 2
    communications, 2
    point-to-point network, 3
    ring network, 2
PLL, 5, 24, 53
    bandwidth, 25
    characterization, 108
    dual-loop, 54
    filter, 88
    jitter, 41
Phase detector, 42, 83
    Alexander, 49
    Hogge, 46
    Pottbacker, 51
    bang-bang, 50
    binary, 45, 48
    characteristic, 86
    gain, 86
    linear, 45, 83
    pattern dependency, 85
    triwave, 47
Phase/frequency detector, 43, 97, 102
Photo detector, 11
Power, 26
Quadricorrelator, 55
SNR, 21, 63
SONET, 2
Serialize, 5
Silicon bipolar, 6
Supply
    scaling, 27
TIA, 11, 13
    common-gate, 13
    noise, 14
Transceivers, 2
VCO, 8, 24, 29, 77, 80, 97–98
    LC, 32, 95
    coarse control, 82
    distributed, 41
    fine control, 82
    gain, 82
    half-quadrature, 102
    multi-phase, 36, 98
    phase noise, 34, 110
    quadrature, 36
    ring, 30, 80, 95
    sensitivity, 82
    tuning, 32, 110
Varactor, 34
Wave-division multiplexing, 2, 7
XOR, 23, 43, 85
    symmetric, 85
