You are on page 1of 6

TH E ANALOG M IN D

Behzad Razavi

The Design of an Equalizer—Part Two

A
As explained in “The Design of an
Equalizer—Part One” [1], we wish to
develop an equalizer meeting the fol-
lowing performance:
■■ data format: nonreturn-to-zero
This time-domain characterization
reveals that each bit generates a tail
that interferes with the next bit(s).
We, thus, seek a method of canceling
this effect, recognizing that
a scaling factor of h 1 and is called
the “first tap” or “first coefficient.”
We expect that the tail cancellation
accords the “summing junction”
signal D sum, a greater
■■ data rate = 56 Gb/s a fixed channel exhibits a eye opening than that
■■ channel loss at 28 GHz = 20 dB certain ratio between the In this simplest at the DEF input.
■■ input differential swing  = 800 mVpp amplitudes of, for example, form, a DFE senses It is instructive to
■■ bit error rate (BER) 1 10
-12
the first postcursor and the data, makes examine the DFE wave-
a binary decision,
■■ power consumption = 10 mW the main cursor. Denot- forms in a favorable
delays the result
■■ V DD = 1V. ing this ratio by h 1, we condition, i.e., with little
by a 1-b period Tb ,
The simulations are carried out articulate the necessary and feeds a scaled channel loss. As illus-
with VDD = 1V - 5% in the slow–slow operation as follows: copy of the result trated in Figure 2(b), the
corner of the process and at T = 75c C. the present bit should back to the input. first bit B 1 is delayed and
We surmised that a continuous- be “memorized,” scaled scaled by a factor of h 1 as
time linear equalizer (CTLE) and a by a factor of h 1, and it appears in DF. Upon sub-
decision-feedback equalizer (DFE) subtracted from the next bit so as to traction from the next bit B 2, this
would prove necessary for this pur- remove the interfering tail. This func- scaled replica increases the voltage
pose. The two-stage CTLE designed in tion is afforded by a DFE. swing to -1 - h 1 . Similarly, B 2 boosts
the first part provides a boost factor of In this simplest form, a DFE the amplitude of B 3 to +1 + h 1 . We
about 13 dB at the Nyquist frequency senses the data, makes a binar y conclude that a 1010 data sequence
fNyq = 28 GHz. The circuit delivers an decision, delays the result by a 1-b experiences greater swings as a
eye height of 220 mV and an eye width period Tb, and feeds a scaled copy of result of the DFE action, a highly
of 13.6 ps while drawing 5 mW. In this the result back to the input. Shown desirable effect, as such a sequence
part, we develop a DFE that achieves a in Figure  2(a) is a possible realiza- displays the most severe eye closure
greater eye opening and, hence, allows tion, where the flip-flop (FF) per- in a lossy medium [1].
robust sampling of the data with a low forms both the decision action (also Interestingly, the DFE introduces
BER. The reader is referred to various referred to as “slicing”) and the its own eye closure in the presence
articles written on the subject [2]–[8]. delay function. The return path has of consecutive ones (or zeros), also

General Considerations
Suppose that we apply an impulse
to a lossy channel (Figure 1). Due Main Cursor
Postcursors
to the medium’s limited bandwidth,
the output exhibits a long tail. The
h1
peak value is called the “main cur-
Channel h2
sor,” that at t = Tb is referred to as
the “first postcursor,” and so on.
0 t 0 Tb 2Tb t

Digital Object Identifier 10.1109/MSSC.2021.3126997


Date of current version: 24 January 2022 FIGURE 1: The impulse response of a channel.

IEEE SOLID-STATE CIRCUITS MAGAZINE W I N T E R 2 0 2 2 7


Authorized licensed use limited to: UCLA Library. Downloaded on February 17,2022 at 10:45:39 UTC from IEEE Xplore. Restrictions apply.
called a “run.” Consider the input number of taps depends on the type and within the FF but at the cost of
waveform shown in Figure 2(c), where of the channel and is typically in the greater complexity and signal-rout-
a zero and two ones occur between range of five to 10 [2]. ing difficulties.
t 1 and t 4 . From the first bit, the DFE The principal challenge in DFE de­­ The overall equalizer BER is deter-
generates a feedback value of -h 1 sign relates to the finite delay around mined by the eye opening at the sum-
and subtracts it from the second the loop. If excessive, this delay simply ming junction, node A in Figure 2(a).
bit, yielding D sum = 1 + h 1 . Next, the causes the circuit to fail. As explained Illustrated in Figure  3 is the wave-
circuit generates D F = +h 1 from B 2 in [7], the following timing constraint form at this node, exhibiting a worst-
and subtracts it from B 3, degrading must be met: case peak of TV. Sensing this voltage,
the swing to 1 - h 1 . This is the price the FF must make correct decisions
TCK - Q + TFB + Tsetup # Tb, (1)
paid for dealing with the eye closure in the presence of two nonideali-
due to a 1010 sequence. where TCK - Q , TFB, and Tsetup denote ties: the total root-mean-square (rms)
The channel impulse response de­­ the FF clock-to-Q delay, the feed- noise Vn,rms and the total dc offset
picted in Figure 1 exhibits higher post- back delay, and the FF setup time, VOS, both referred to this interface.
cursors as well. We can then employ respectively. These quantities can The noise and offset contain contri-
additional FFs and feedback taps to be reduced by means of inductive butions by the CTLE, summer, and
cancel those components. The total peaking at the summing junction FF. For BER . 10 -12, we write

c m
1 Q TV - 4VOS 1 10 - 12, (2)
2 Vn, rms
FF1

First Second where Q ($) is the “error function”


Latch Latch (the integral of a Gaussian distribu-
Dsum DM
L1 L2 Dout tion). Note that 4VOS represents the
Din
+ A 4v variance of the offset. This con-

dition is satisfied if the argument of
CK CK Q ($) exceeds seven, i.e., if

TV $ 7Vn, rms + 4VOS .(3)


DF
h1
(a) The noise and offset voltages must
B1 B2 B3 B1 B2 B3 be calculated for a particular CTLE/
+1 +1 DFE cascade, but we typically tar-
Din Din get a TV of at least 150–200 mV to
t1 t2 t3 t t1 t2 t3 t4 t accommodate the FF’s finite sensi-
–1 –1
tivity as well [6]. We return to this
+h1 +h1 point later.
DF DF
t2 t3 t2 t3 t4 Numerous DFE architectures have
t t
–h1 –h1 been reported [2]–[8], and the pros
1 + h1 1 + h1 and cons of some have been described
+1 +1 in [7]. Most do not relax the timing
Dsum Dsum 1 – h1
t2 t3 t2 t3 t4
constraint expressed by (1). We begin
t t
–1 –1 with the full-rate, “direct” topology of
–1 – h1 –1 – h1
Figure 2(a) for its simplicity and con-
(b) (c)
sider others if this approach does not
provide satisfactory performance.
FIGURE 2: (a) The basic DFE topology as well as its effect with a (b) 1010 sequence and
(c) 11 run.
Channel–CTLE Impulse Response
To compute the relative strengths
of the DFE taps, we must examine
Tb
Dsum the impulse response of the chan-
+∆V nel–CTLE cascade. We apply to the
channel a differential pulse having
t1 t2 t3 t a width of 2 ps (% Tb = 17.9 ps) and
–∆V rise and fall times of 0.1 ps. Plotted
in Figure 4 are the output waveforms
of the channel and CTLE. The for-
FIGURE 3: The typical waveform at the summing junction. mer exhibits its first, second, and

8 W I N T E R 2 0 2 2 IEEE SOLID-STATE CIRCUITS MAGAZINE

Authorized licensed use limited to: UCLA Library. Downloaded on February 17,2022 at 10:45:39 UTC from IEEE Xplore. Restrictions apply.
third postcursors at relative levels of t y p i cally chosen in the range of 400 erate gate-source voltage, e.g., around
60%, 41%, and 30%, respectively. The to 500  mV. With complete switch- 700 mV, so that these transistors
latter indicates that the CTLE reduces ing of M 1 – M 4, these swings are eq­­ operate in or near saturation. Third,
these to 22%, –3%, and –6%, respec- ual to 1mA # R D, suggesting the current mirror as
tively. The slightly underdamped R D . 500 X. We must also well as C 1 and C 2 can
response at the CTLE output origi- ensure that the small-signal To compute the be shared among all of
relative strengths
nates from a few decibels of peaking loop gain around M 3 and the DFE’s latches; the
of the DFE taps,
that appear in the cascade frequency M 4 is greater than unity capacitor values are then
we must examine
response (s h o w n in the first arti- so that the circuit properly the impulse chosen according to the
cle in this series [1]). regenerates when CK falls response of the total gate capacitance that
The voltage swings in Figure 4 char- and CK rises. channel–CTLE they must drive.
acterize the CTLE under small-signal The clock path in cascade. When characterizing
conditions. This is necessary because we Figure 5 merits some re­­ the speed of FFs for a DFE
can predict the step or pulse response marks. First, at 56 GHz, environment, we should
from the impulse response only if the CK and CK are close to sinusoids, espe- examine their input sensitivity Vsen,
system remains linear. Also, the worst- cially if they are provided by resonant defined as the minimum difference
case scenario of a 1010 sequence does buffers (stages with LC tank loads). that guarantees correct decisions
lead to relatively small swings at the Compared to square waves, sinusoidal at the desired clock frequency. Spe-
channel output. Nevertheless, as the clocks exhibit longer transition times, cifically, Vsen must be studied in the
channel and CTLE “recover” from a long thus elongating the FF response. Sec- context of the waveforms delivered
run, e.g., between t 1 and t 2 in Figure 3, ond, if nearly rail-to-rail clock swings by the channel–CTLE cascade, e.g., a
dynamic nonlinearities in the CTLE can are available, then C 1 and C 2 need 1111 run followed by a 0101 pattern
cause departure from our foregoing not be much greater than the input (Figure 3). The swing received by the
results. In other words, some iteration capacitance of M 5 and M 6 . This is FF after the long run is small and of
in the DFE tap values may be necessary because we prefer to maintain a mod- opposite value, requiring that the
if the greatest eye opening is desired.

FF Design
The performance of the DFE shown
1.2 Channel Output
in Figure 2 hinges primarily upon
CTLE Output
the FF design. While the simplicity 1
and efficiency of the StrongARM
0.8
Voltage (mV)

latch make it a desirable candidate


here, we recognize from (1) that 0.6
the FF delay must remain below
0.4
Tb = 17.9 ps, a condition that such
a latch cannot fulfill in 28-nm tech- 0.2
nology. We, therefore, resort to cur-
rent-mode logic (CML) and construct 0
the latch illustrated in Figure 5 [9].
0 50 100 150 200
Here, the circuit senses the input Time (ps)
when M 5 is on and regenerates when
M 6 is on. To save voltage headroom,
FIGURE 4: The impulse response observed at the channel and CTLE outputs.
the bias currents of M 5 and M 6 are
defined by a mirror arrangement
rather than by a tail current source.
VDD
The latch design begins with a W1–4 = 5 µm RD = 500 Ω
power budget, e.g., 1 mW. We bias RD RD W5–7 = 2.5 µm RB = 5 kΩ
M 5 and M 6 at a drain current I D of L = 30 nm C1,2 = 75 fF
X Y
0.5 mA, assuming that I D5 rises to +
Din M1 M2 M3 M4 VDD
about 1 mA when CK is high and M 6 –
is off, and vice versa. To carry a peak Din IR 0.5 mA
c1 c2
tail current of 1 mA, M 1 –M 4 must CK M5 M6 CK RB
be wide enough so as to leave suf- M7
ficient headroom for M 5 and M 6 . We RB
then select (W/L) 1- 4 = 5 nm/30 nm.
The voltage swings at X a n d Y a r e FIGURE 5: The CML latch design.

IEEE SOLID-STATE CIRCUITS MAGAZINE W I N T E R 2 0 2 2 9


Authorized licensed use limited to: UCLA Library. Downloaded on February 17,2022 at 10:45:39 UTC from IEEE Xplore. Restrictions apply.
first latch “recover” from the previ- for proper FF behavior. Interesting­ tail current source a reasonable voltage
ous, large overdrive and still respond ­ly, this value is greater than that headroom. The first tap is scaled down
correctly. Illustrated in Figure 6 is the predicted by (2), implying that the by a factor of four according to our
overdrive recovery, where V +sum and summing junction swing impulse response study.
V -sum denote the summer outputs. If is dictated by the FF over- We also add 5-fF capaci-
the first latch begins to sense V +sum drive recovery rather than Some iteration in tors at A and B as an esti-
and V -sum at t = t 1, then the latch out- noise and offset. the DFE tap values mate of layout parasitics.
may be necessary
put voltages in Figure 5 must cross The FF offset is also A number of points
if the greatest
before the circuit enters the regen- of interest. We write the should be borne in mind
eye opening
eration mode. That is, we must have threshold m i s m a t c h is desired. regarding DFE simula-
t 2 - t 1 1 TCK /2. between M 1 and M 2 in tions. First, a proper phase
We simulate the FF in such a sce- Figure  5 as TVTH1, 2 = relationship must be estab-
nario with V0 = 70 mV, arriving at A VTH / WL [11]. With A VTH . 4 mV $ nm, lished between the summer output
the results plotted in Figure 7(a). The we have TVTH1, 2 . 10 mV. The thresh- and FF clocks. As our initial try, we
latch output voltages barely cross old mismatch between M 3 and M 4 place the c ­rossing points of VA and
before regeneration begins. Thus, in has the same standard deviation, but VB near those of CK and CK such
the presence of offsets due to M 1 –M 2 it is divided by g m1, 2 R D - g m1, 2 /g m3, 4 that, when CK goes high, L 1 senses
or M 3 – M 4, the circuit produces an [6], which is about 2 in our design. VA - VB . Second, the feedback pro-
incorrect output. Figure 7(b) depicts The 4v offset of L 1 is, therefore, vided by M 3 and M 4 must be nega-
the outputs for V0 = 100 mV, reveal- around 45 mV. tive. Third, we monitor the eye at
ing more robust sampling. Thus, The input-referred noise of the FF the summing junction, but we must
in addition to overcoming various can be computed using the method de­­ also ensure that the FF output indeed
offsets and noise contributions at scribed in [10] and is about 1.5 mVrms . matches the original data applied to
the summing junction, the swing To this, we must add the CTLE and the channel. That is, an open eye does
must include at least another 100 mV summer noise. The simulated noise not necessarily imply correct opera-
spectrum shown in Figure 8 yields an tion. Fourth, the simulation must
rms value of 2.6 mV integrated up to run long enough (about 20 ns in our
+
Vsum 100 GHz. It follows that Vn, rms . 3 mV example) so that the channel “settles,”
in (3), suggesting that the differential and the summer output represents
V0
eye height, 2TV, must exceed 132 mV. the steady-state behavior. We then

Vsum construct the eye diagram from the
DFE Design last few nanoseconds.
t We implement the DFE architecture Figure 10 plots the differential eye
VX of Figure 2 as shown in Figure  9. diagrams at the CTLE and DFE sum-
The summer senses the CTLE out- mer outputs. The DFE increases the
put voltage, converts it to current vertical opening by about 250 mV.
VY by means of M 1 and M 2, adds the Next, we apply resistive and
result to the output of M 3 and M 4, capacitive degeneration to the sum-
t1 t2 t and allows the sum to flow through mer input stage (Figure 11). Provid-
the load resistors. For M 1 and M 2, we ing some linear equalization, this
FIGURE 6: The latch overdrive recovery. select W = 10 nm so as to give their method leads to the eye shown in

0.9 +
Din VX VX
0.85 – 0.9 0.9
Din VY VY
0.8 0.8
0.8
Voltage (V)

Voltage (V)

Voltage (V)

0.75
0.7 0.7 0.7
0.65 0.6
0.6
0.6
0.55 0.5
0.5
0.5 0.4
4.8 4.85 4.9 4.95 5 4.8 4.85 4.9 4.95 5 4.8 4.85 4.9 4.95 5
Time (ns) Time (ns) Time (ns)
(a) (b)

FIGURE 7: The (a) input and output waveforms of a CML latch in an overdrive recovery test with V0 = 70 mV and (b) latch output for the case
of V0 = 100 mV.

10 W I N T E R 2 0 2 2 IEEE SOLID-STATE CIRCUITS MAGAZINE

Authorized licensed use limited to: UCLA Library. Downloaded on February 17,2022 at 10:45:39 UTC from IEEE Xplore. Restrictions apply.
Figure 12 and slightly improves the
eye opening.
× 10–15 In the last step of our design,
1.4 we ponder higher-order taps. Our
impulse response analysis has yielded
1.2
Noise Spectrum (V2/Hz)

h 2 = -3% and h 3 = -6%. Neglecting the


1 former, we add two more FFs to the DFE
and apply the third tap with positive
0.8
feedback (Figure 13). The resulting
0.6 eye diagram is shown in Figure 14,
0.4 exhibiting vertical and horizontal
eye openings equal to 500 mV and
0.2 15.2 ps, respectively. Checked against
0 the original data, the first FF output
106 107 108 109 1010 1011 indicates correct operation.
Frequency (Hz) The DFE draws about 4.5 mW, meeting
our overall equalizer power budget of
10 mW. It is possible to add compo-
FIGURE 8: The noise spectrum observed at the DFE summing junction.
nents such as inductive peaking, a
gain stage between the summer and
FF1 [6], or a 0.5-unit-interval feed-
back tap. For this particular equal-
VDD izer design and channel response,
these techniques do not significantly
RD RD improve the eye opening. However, for
A other channels and/or with all of the
L1 L2
B layout parasitics included, these con-
M1 M2 cepts and other DFE architectures can
From
CTLE be considered.
CK CK CK CK
ISS1 1 mA Corrections to Previous Articles
M3 M4 In [12], (9) should read:
W1,2 = 10 µm
W3,4 = 2.5 µm N = C E C B (L2 - M 2 )s 4

L = 30 nm ISS2 250 µA + (2C B L + 2C B M - C E M )s 2 + 1, (4)
RD = 500 Ω and (10) should be revised to

D = C BC E (L2 - M 2)s 4 + 2C BC E R L(L + M )s 3


+ (2C BL + 2C BM +C E L)s 2 + R LC E s +1.
FIGURE 9: The DFE design. (5)

0.5 0.5 VDD


0.4 0.4
0.3 0.3 RD RD
0.2 0.2 To
VCTLE (V)

0.1 0.1
Vsum (V)

FF1
0 0
From M1 M2
–0.1 –0.1
CTLE
–0.2 –0.2 100 Ω
–0.3 –0.3
–0.4 –0.4
–0.5 –0.5 0.5 mA 200 fF 0.5 mA
0 5 10 15 20 25 30 35 0 5 10 15 20 25 30 35
Time (ps) Time (ps)
(a) (b)

FIGURE 11: The summer input stage with


FIGURE 10: The output eyes at the (a) CTLE output and (b) DFE summing junction. degeneration.

IEEE SOLID-STATE CIRCUITS MAGAZINE W I N T E R 2 0 2 2 11


Authorized licensed use limited to: UCLA Library. Downloaded on February 17,2022 at 10:45:39 UTC from IEEE Xplore. Restrictions apply.
0.5 A proper phase
0.4
relationship must
be established
0.3
between the
0.2 summer output
0.1 and FF clocks.
Vsum (V)

0
–0.1
–0.2 References
–0.3 [1] B. Razavi, “The design of an equalizer —
Part one,” IEEE Solid State Circuits Mag.,
–0.4 vol. 13, no. 4, pp. 7–160, Fall 2021, doi:
10.1109/MSSC.2021.3111426.
–0.5 [2] J. Im et al., “A 40-to-56 Gb/s PAM-4 receiv-
0 5 10 15 20 25 30 35 er with ten-tap direct decision-feedback
Time (ps) equalization in 16-nm FinFET,” IEEE J. Sol-
id-State Circuits, vol. 52, no. 12, pp. 3486–
3502, Dec. 2017, doi: 10.1109/JSSC.2017.
FIGURE 12: The DFE summer eye diagram in the presence of degeneration. 2749432.
[3] T. Shibasaki et al., “A 56-Gb/s receiver
front-end with a CTLE and 1-tap DFE in
20-nm CMOS,” in Proc. VLSI Circuits Symp.
Dig., Jun. 2014, pp. 1–2, doi: 10.1109/VL-
VDD SIC.2014.6858400.
[4] A. Roshan-Zamir et al., “A 56-Gb/s PAM4
RD RD receiver with low-overhead techniques
for threshold and edge-based DFE FIR-
and IIR-tap adaptation in 65-nm CMOS,”
FF1 FF2 FF3
IEEE J. Solid-State Circuits, vol. 54, no.
3, pp. 672–684, Mar. 2019, doi: 10.1109/
JSSC.2018.2881278.
[5] A. Cevrero et al., “A 100Gb/s 1.1pJ/b
PAM-4 RX with dual-mode 1-tap PAM-
4 / 3-tap NRZ speculative DFE in 14nm
CMOS FinFET,” in Proc. IEEE Int. Solid-
ISS1 250 µA ISS4 60 µA State Circuits Conf. (ISSCC), Feb. 2019,
pp. 112–113, doi: 10.1109/ISSCC.2019.
8662495.
[6] S. Ibrahim and B. Razavi, “Low-power CMOS
equalizer design for 20-Gb/s systems,”
FIGURE 13: The addition of the third tap to the DFE. IEEE J. Solid-State Circuits, vol. 46, no. 6,
pp. 1321–1336, Jun. 2011, doi: 10.1109/
JSSC.2011.2134450.
[7] B. Razavi, “The decision-feedback equal-
izer [A Circuit for All Seasons],” IEEE
0.5 Solid-State Circuits Mag., vol. 9, no. 4, pp.
13–16, Fall 2017, doi: 10.1109/MSSC.2017.
0.4 2745939.
[8] P. Peng, J. Li, L. Chen, and J. Lee, “A 56Gb/s
0.3 PAM-4/NRZ transceiver in 40nm CMOS,”
0.2 in Proc. IEEE Int. Solid-State C ­ ircuits Conf.
(ISSCC), Feb. 2017, pp. 110–111, doi:
0.1 10.1109/ISSCC.2017.7870285.
Vsum (V)

[9] J. Lee and B. Razavi, “A 40-Gb/s clock and


0 data recovery circuit in 0.18um CMOS
–0.1 technology,” IEEE J. Solid-State Circuits,
vol. 38, pp. 2181–2190, Dec. 2003, doi:
–0.2 10.1109/JSSC.2003.818566.
[10] B. Razavi, “The design of a comparator
–0.3 [The Analog Mind],” IEEE Solid-State Cir-
–0.4 cuits Mag., vol. 12, no. 4, pp. 8–14, Fall
2020, doi: 10.1109/MSSC.2020.3021865.
–0.5 [11] M. J. M. Pelgrom, A. C. J. Duinmaijer, and
0 5 10 15 20 25 30 35 A. P. G. Welbers, “Matching properties of
Time (ps) MOS transistors,” IEEE J. Solid-State Cir-
cuits, vol. 24, no. 5, pp. 1433–1440, Oct.
1989, doi: 10.1109/JSSC.1989.572629.
[12] B. Razavi, “The design of broadband I/O
FIGURE 14: The DFE summer eye with the third tap added. circuits [The Analog Mind],” IEEE Solid-
State Circuits Mag., vol. 13, no. 2, pp. 6–12,
Spring 2021, doi: 10.1109/MSSC.2021.
The subsequent results remain all-pass transfer function must 3072299.
[13] B. Razavi, “The design of a low-voltage
unchanged. Richard Schreier has contain poles that mirror or cancel bandgap reference [The Analog Mind],”
pointed out that the T-coil design its zeros. IEEE Solid-State Circuits Mag., vol. 13, no.
3, pp. 6–13, Summer 2021, doi: 10.1109/
equations can also be derived by In [13], the gates of Mc and Md in MSSC.2021.3088963.
decomposing N (s) into t wo qua- Figure 9 should be tied to the drains

dratics and using the fact that an of Ma and Mb, respectively.

12 W I N T E R 2 0 2 2 IEEE SOLID-STATE CIRCUITS MAGAZINE

Authorized licensed use limited to: UCLA Library. Downloaded on February 17,2022 at 10:45:39 UTC from IEEE Xplore. Restrictions apply.

You might also like