Abstract— A 5-13.5 Gbps multi-standard I/O link receiver is presented in this paper. An inductor-free CTLE, whose gain and bandwidth are highly adjustable, is achieved by using a second-order negative capacitance circuit. A high jitter tolerance clock and data recovery (HJTOL-CDR) is proposed for Spread Spectrum Clock applications. In this work, the JTOL is improved in two ways: first, a partial-noise-shaping-based digital loop filter (PNS-DLF) is implemented to reduce the output jitter caused by the truncation error in the integral path; second, the proposed CDR logic is fully custom designed to operate at a quarter-rate clock of 5 GHz. Moreover, the CDR bandwidth can be tuned to satisfy various data rates and jitter masks. Post-layout simulation shows that the proposed CTLE provides a wide gain tuning range with a boost gain beyond 29 dB at 10 GHz, and that the proposed CDR can tolerate up to 31-kppm frequency offset. The proposed receiver is fabricated in 40-nm CMOS with an active area of 0.06 mm² and 65 mW power consumption at 20 Gbps from a 1.2-V supply. Measurement results show that, when receiving PRBS31 data at 10 Gbps with 5-kppm frequency offset across a channel with 14-dB loss, the JTOL is 0.39 UIpp at 10 MHz and at least 0.34 UIpp at high frequency, which shows that the proposed receiver can meet stringent standards such as PCIe 3.0/4.0, USB 3.2, and DisplayPort 1.4.

Index Terms— Multi-standard receiver, clock and data recovery, continuous time linear equalizer, jitter tolerance, digital loop filter, spread spectrum clock.

I. INTRODUCTION

applications [1], [3]. Considering the system cost, the receiver should support the major industrial standards [4], such as USB 3.2, PCIe 3.0/4.0, DisplayPort 1.4, SATA, and JESD204B.

Such a multi-standard receiver design has many technical challenges [5], [8], such as various data rates, a high jitter corner frequency, and a large jitter tolerance (JTOL) amplitude. As shown in Fig. 1(a), the standards provide reference transfer functions for the continuous time linear equalizer (CTLE); therefore, the proposed multi-standard CTLE should have wide gain tuning ranges in addition to wide bandwidth and high boost gain [9], [11]. We have slightly improved the active feedback topology [21], [37] and the negative capacitance (NC) circuit for wide CTLE gain control without using inductors. Moreover, the peak gain frequency can be changed from 3.7 GHz to 11.4 GHz by changing the tail current for optimum loss compensation. Additionally, a high jitter tolerance clock and data recovery (HJTOL-CDR) is tailored to SSC applications for frequency offset tracking. Increasing the CDR loop gain is one way to expand the CDR bandwidth and thereby improve jitter tolerance [12], [13]. However, too large a loop gain decreases the phase margin, causing jitter peaking [14], [15]. Another way is to improve the speed of the CDR logic [16], but this sharply increases the circuit design difficulty, especially for SSC applications. Consequently, many designs [17], [26] adopted subsampling/demultiplexing
Authorized licensed use limited to: University College London. Downloaded on May 23,2020 at 19:02:58 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
SHU et al.: 5–13.5 Gb/s MULTISTANDARD RECEIVER WITH HIGH JITTER TOLERANCE DIGITAL CDR IN 40-nm CMOS PROCESS 3
$$LG(s) = \frac{K_1 K_p f_s}{s} \cdot e^{-N_{del} \cdot s / f_s} \qquad (4)$$
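As a rough numerical illustration of (4), the phase margin can be evaluated in closed form: the integrator places the unity-gain crossover at ω_u = K_1 K_p f_s, so the delay term costs N_del · K_1 · K_p radians on top of the 90° already used by the integrator. The Python sketch below assumes that N_del is counted in periods of f_s, as the exponent in (4) implies. The function names and all numerical values (f_s, K_1, K_p, N_del) are hypothetical and are not taken from the fabricated design.

```python
import math


def unity_gain_freq_hz(k1, kp, fs):
    """Crossover of Eq. (4): |LG(jw)| = k1*kp*fs / w = 1  ->  w_u = k1*kp*fs."""
    return k1 * kp * fs / (2.0 * math.pi)


def phase_margin_deg(k1, kp, n_del):
    """Phase margin of Eq. (4).

    The integrator contributes -90 deg. The delay contributes
    -n_del * w_u / fs = -n_del * k1 * kp radians at the crossover.
    """
    return 90.0 - math.degrees(n_del * k1 * kp)


if __name__ == "__main__":
    fs = 5e9   # assumed CDR logic clock in Hz (illustrative)
    k1 = 1.0   # assumed lumped gain (illustrative)
    for kp in (2.0 ** -8, 2.0 ** -7, 2.0 ** -6):
        for n_del in (8, 16, 32):
            fu = unity_gain_freq_hz(k1, kp, fs)
            pm = phase_margin_deg(k1, kp, n_del)
            print(f"Kp=2^{int(math.log2(kp))}  Ndel={n_del:2d}  "
                  f"f_u={fu / 1e6:7.2f} MHz  PM={pm:6.2f} deg")
```

In this simplified model the margin depends only on the product N_del · K_1 · K_p, so doubling the loop gain costs as much phase margin as doubling the latency.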
The effect of loop gain on JTOL is shown in Fig. 4(a, b): increasing the loop gain ($K_d$, $K_p$) creates peaking in the JTOL. Fig. 4(c) plots the calculated JTOL as a function of loop latency ($N_{del}$), which explains why minimizing the loop latency is critical to avoid jitter peaking [18]. The systematic latency of our CDR (including the DFE) is about 32 UI; it is designed in the C4 domain and reduces 1/3 of the loop latency compared with the design in [19]. Fig. 4(d) illustrates that raising the operating rate of the CDR can increase the jitter tolerance. P. K. Hanumolu et al. define a stability factor $S = K_p / K_i$ that needs to be large enough (>100) to prevent jitter peaking [30]. However, a very small $K_i$ could reduce the frequency tracking range, which may have a negative impact on some SSC applications [15]. Therefore, $K_i$ should be chosen considering both the JTOL and SSC requirements.

C. Frequency Tracking Range

The frequency tracking range of the CDR is an essential specification because it determines the maximum SSC modulation depth. The integrator needs to provide enough high-order bits to guarantee the tracking range for SSC, and to truncate enough low-order bits to obtain a fractional gain, so that the CDR has sufficient phase margin. The frequency
$$
\begin{cases}
S_{err\_int}(f) = S_{\phi int}(f) \left| \dfrac{G_{FG\_int}(s)}{1 + L(s)} \right|^2_{s = j2\pi f} \\[2ex]
G_{FG\_int}(s) = \dfrac{K_d \cdot K_{pi}}{sT_s}\, e^{-sT_s N_{del}} \\[2ex]
L(s) = \dfrac{K_{pd} K_{mv} K_{pi} K_d}{sT_s} \left( K_p + \dfrac{K_i}{sT_s} \right) e^{-sT_s N_{del}}
\end{cases}
\qquad (8)
$$

where $G_{FG\_int}(s)$ is the forward gain function starting from the quantization noise injection point and $L(s)$ is the CDR open-loop gain. To give a more intuitive understanding of the truncation error effect, the variance of the output phase error can be calculated by (9), where the frequency range for noise integration is 0 to 2.5 GHz.

$$\sigma^2_{err\_int} = \int_0^{f_s/2} S_{err\_int}(f)\, df \qquad (9)$$

The blue line with diamonds in Fig. 9(a) illustrates that as the loop bandwidth increases, the RMS jitter $\sigma_{err\_int}$ of the high-speed CDR also rises. To obtain a better JTOL, it is necessary to decrease the PSD of the quantization noise due to the integrator bit truncation, especially for SSC applications.

B. Proposed PNS-DLF With Delta-Sigma Modulator

This section describes how a DSM is introduced to realize the proposed DLF with quantization noise shaping. First, the jitter model of the proposed PNS-DLF is built in Fig. 8; then, the noise PSD and the RMS jitter reduction are calculated and compared with the conventional high-speed CDR without quantization noise shaping. As shown in Fig. 2,

The signal transfer function (STF) of the first-order DSM can be seen as unity gain, while the noise transfer function (NTF) equals $1 - z^{-1}$. As shown in Fig. 8, owing to the noise shaping function of the DSM, the overall residual quantization noise fed into the following phase accumulator consists of two parts: 1) the truncated 4 LSBs $d_i[3:0]$, whose amplitude $Q_{L4}$ is 1/16 of the original truncation error; 2) the shaped quantization error of the DSM, whose amplitude is the original truncation error multiplied by the NTF. Similar to the normal DLF model analyzed in Section II.B, the PSD of the proposed PNS-DLF can be derived as

$$
\begin{cases}
S_{err\_dsm}(f) = \left[ S_{\phi L4}(f) + S_{\phi dsm}(f) \cdot |s \cdot T_s|^2 \right] \times \left| \dfrac{G_{FG\_int}(s)}{1 + L(s)} \right|^2_{s = j2\pi f} \\[2ex]
S_{\phi L4}(f) = \left( \dfrac{1}{16} \right)^2 S_{\phi int}(f) \\[2ex]
S_{\phi dsm}(f) = S_{\phi int}(f)
\end{cases}
\qquad (10)
$$

To estimate the noise reduction quantitatively, a noise rejection ratio $R$ of the output phase error PSD, between the proposed PNS-DLF and the normal DLF, can be defined as

$$R = \frac{S_{err\_dsm}(f)}{S_{err\_int}(f)} = \left( \frac{1}{16} \right)^2 + |s \cdot T_s|^2_{s = j2\pi f} \qquad (11)$$

Fig. 9(a) shows the attenuated RMS jitter of the CDR with the proposed PNS-DLF under the same loop condition. The calculated results of $R$ for different DSM orders are shown in Fig. 9(b). Indicated by the blue line for the 1st-order DSM
scheme, the contribution to the output jitter PSD from the integral path with the proposed PNS-DLF is reduced by about 22 dB compared with the conventional high-speed CDR. We tried decreasing the clock frequency of the phase accumulator to attain better noise filtering, but doing so increases the loop latency. In addition, unless a higher-order filter is used, increasing the oversampling rate and the order of the DSM are not good ways to improve noise shaping. Considering the hardware complexity and power consumption, some compromises are made in the current work. However, the gain of the phase accumulator ($K_d$) can be adjusted to slightly improve noise filtering.

Fig. 9. (a) Calculated contributions to output RMS jitter from the integral path varying with bandwidth; (b) jitter suppression comparison of different order DSM.

Fig. 10. Simulated time-domain phase error and histogram distribution results for designs without PNS-DLF (a, b), and with PNS-DLF (c, d), respectively.
Fig. 11. Comparison of simulated jitter of phase error between the CDR with
and without PNS-DLF for (a) RMS jitter (b) peak to peak jitter, respectively.
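To give a feel for (11), the short Python sketch below evaluates the noise rejection ratio R over frequency for the first-order case. It assumes T_s = 1/f_s with f_s = 5 GHz, inferred from the 0 to 2.5 GHz integration range of (9). The function names are illustrative only, and the linearized expression is used as-is, so the numbers are not expected to reproduce Fig. 9(b) exactly.

```python
import math


def rejection_ratio(f_hz, fs_hz, residual=1.0 / 16.0):
    """R(f) of Eq. (11): residual^2 + |s*Ts|^2 at s = j*2*pi*f, with Ts = 1/fs (assumed)."""
    return residual ** 2 + (2.0 * math.pi * f_hz / fs_hz) ** 2


def corner_frequency(fs_hz, residual=1.0 / 16.0):
    """Frequency at which the shaped term |s*Ts|^2 equals the unshaped residual term."""
    return residual * fs_hz / (2.0 * math.pi)


def to_db(power_ratio):
    """Convert a power (PSD) ratio to decibels."""
    return 10.0 * math.log10(power_ratio)


if __name__ == "__main__":
    fs = 5e9  # assumed sampling rate in Hz (fs/2 = 2.5 GHz)
    print(f"corner frequency: {corner_frequency(fs) / 1e6:.1f} MHz")
    for f in (1e5, 1e6, 1e7, 5e7, 1e8):
        print(f"f = {f / 1e6:7.1f} MHz   R = {to_db(rejection_ratio(f, fs)):6.2f} dB")
```

At low frequency the ratio settles at (1/16)², about −24.1 dB, and the shaped term starts to dominate above f_s/(32π), roughly 50 MHz under the assumed f_s.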
C. Simulation Results Comparison of the CDR

Since the proposed CDR is an all-digital design that can be accurately simulated [35], a Matlab/Simulink model is constructed in this work to evaluate the noise performance improvement of the CDR using the proposed PNS-DLF in comparison with the normal high-speed CDR. Fig. 10(a, c) show the time-domain phase errors, and Fig. 10(b, d) show the histograms and fitted normal distributions, for the CDR without and with the proposed PNS-DLF when receiving 33-kHz, 5000-ppm SSC-modulated 20-Gbps data. A normal high-speed CDR has a phase error of 0.0266 UIrms/0.1689 UIpp, as indicated in Fig. 10(b). In contrast, as illustrated in Fig. 10(d), a high-speed CDR with the proposed PNS-DLF has a phase error of 0.0127 UIrms/0.1065 UIpp for the same loop parameters.

The results in Fig. 10 come from time-accurate behavioral simulations with SSC stress in Simulink, a model that is closer to the actual schematic, whereas the results in Fig. 9 are calculated theoretically from the linearized CDR model. The results in Fig. 10 are therefore not in good agreement with those in Fig. 9. Importantly, however, both illustrate that adopting the proposed method helps to improve JTOL.

When the loop bandwidth is fixed and the SSC modulation depth is varied from 0 to 13 kppm, the RMS jitter and the peak-to-peak jitter of the phase error are as shown in Fig. 11(a) and (b), respectively. When the SSC modulation depth is less than 1 kppm, there is not enough frequency deviation to cause a large truncation error; therefore, the proposed CDR with PNS-DLF cannot show the full advantage of quantization noise shaping. This is one reason why many CDRs without SSC tracking requirements do not address the truncation error of the integral path. When the SSC modulation depth is more than 1 kppm, however, the proposed CDR with PNS-DLF has an advantage in suppressing the quantization error.

Fig. 12 shows the simulated JTOL results of the proposed CDR when receiving 33-kHz, 5,000-ppm SSC-modulated PRBS7 data. A random jitter of 0.02 UIrms is injected into the recovered clock to represent the noise from the PLL and input data. To reduce the simulation time, the CDR only receives one million data bits to confirm that the BER is below $10^{-6}$. Fig. 12(a) and (b) illustrate that the proposed HJTOL-CDR has enough jitter budget to pass multi-standard masks.

Fig. 12. Jitter tolerance of PNS-DLF at different data rate against (a) PCIe 4.0 jitter mask, and (b) USB 3.2 jitter mask.

D. Implementation of High-Gain and Wide-Band CTLE

As shown in Fig. 1(a), multiple I/O link standards ask for different channel loss compensation requirements, which need
that CTLE has reconfigurable features with wide bandwidth, high boost gain, and wide DC gain range [36]. However, in this work, realizing such a CTLE faces two main challenges: 1) using no inductor to save chip area; 2) large parasitic capacitance due to the six samplers needed in the half-rate receiver. To solve these problems, a CTLE structure based on an RC source-degeneration equalizer is implemented with a proposed 2nd-order negative capacitor (NC) circuit, as shown in Fig. 13. First, the CTLE employs the classic current-mode-logic (CML) stages, and a zero is created by inserting a resistor $R_p$ and a capacitor $C_p$ in the feedback loop to maximize the CTLE bandwidth [9]. Additionally, differential Miller capacitors $C_M$ are adopted to reduce the output capacitance of the first stage, and thus the bandwidth is extended. Moreover, a 2nd-order NC circuit is introduced to enhance the CTLE gain and bandwidth further. $Z_{NC}$ is the equivalent impedance of the NC circuit, which can be derived as $-R_{NC} - \frac{1}{sC_{NC}}$ in the high frequency region, where $R_{NC} = -\left(2 + \frac{C_{gs}}{2C_{NC}}\right) / g_{mNC}$. Because $C_{gs} < C_{NC}$, $R_{NC}$ approximately equals $-2 / g_{mNC}$, where $g_{mNC}$ and $C_{gs}$ are the transconductance and the parasitic capacitance of the cross-coupled NMOS transistors. As a result, the NC1 circuit can create a zero ($\omega_z = 1 / (R_{NC1} C_{NC1})$) in the CTLE transfer function, and therefore the low-frequency pole can be canceled. As shown in Fig. 14(a), when NC1 is enabled, the peak gain can be increased by 5.42 dB.

Fig. 13. The schematic of the proposed CTLE circuit, with 2nd-order negative capacitor (NC).

Importantly, the proposed CTLE uses active feedback, and we insert two amplifiers ($g_{m3}$ and $g_{m4}$) in the feedback loop to achieve lower DC gain, as can be seen from (12). In addition, the second-stage NC2 is mainly adopted to improve the DC gain. In the low frequency region, the capacitor $C_{NC2}$ can be regarded as disconnected, so the impedance $Z_{NC2}$ approximately equals $-R_{NC2} - R_{I2}$, where $R_{I2}$ is the resistance of the current source in NC2. After applying standard feedback equations, the DC gain of the overall CTLE can be expressed as (12), where $G_m$ is the equivalent transconductance and $R'_L$ represents the equivalent load resistance. If appropriate values of $R_{L2}$, $R_{I2}$ and $R_{NC2}$ are chosen, a large value of $R'_{L2}$ can be achieved, and the DC gain is thereby increased. As shown in Fig. 14(b), with NC2 enabled, the DC gain is increased by 6.59 dB without sacrificing the peak gain at high frequency. When the swing of the received data is relatively large (> 200 mV), the NC can be turned off to save power, because the CTLE can provide enough high-frequency gain for the channel compensation.

$$
\begin{cases}
\dfrac{V_{out}}{V_{in}} = \dfrac{G_{m1} R'_{L1} \cdot G_{m2} R'_{L2}}{1 + G_{mfb} R'_{L1} \cdot G_{m2} R'_{L2}} \\[2ex]
G_{mfb} = g_{m3}\, g_{m4}\, g_{m5}\, R_3 R_4 \\[1ex]
R'_{L1} = R_{L1} \parallel (-R_{NC1} - R_{I1}) \\[1ex]
R'_{L2} = R_{L2} \parallel (-R_{NC2} - R_{I2})
\end{cases}
\qquad (12)
$$

Adjustable degeneration resistors ($R_{s1}$, $R_{s2}$) and capacitors ($C_{s1}$, $C_{s2}$) are also adopted to further extend the tuning ranges. As shown by the transconductance expression in (13) [37], the DC gain increases as the degeneration resistor is reduced, and the peak gain can be adjusted by changing the value of the degeneration capacitor.

$$
G_{m1} = \frac{g_{m1}}{1 + g_{m1} \left( \dfrac{R_{s1}}{1 + sR_{s1}C_{s1}} \right)}
= \frac{1 + sR_{s1}C_{s1}}{R_{s1} + \dfrac{1 + sR_{s1}C_{s1}}{g_{m1}}}
\approx \frac{1}{R_{s1}} + sC_{s1}, \quad \text{if } 1 + sR_{s1}C_{s1} \ll g_{m1}R_{s1} \qquad (13)
$$

According to the simulation result at the slow-slow (ss) process corner and 125 °C, tuning $R_{s1}$ can provide a wide gain range of 14.67 dB, as depicted in Fig. 14(c). When the receiver needs to satisfy DisplayPort 1.4 (high peak gain), NC1 and NC2 can be enabled simultaneously to enhance the peak gain and the DC gain. Fig. 14(d) shows that increasing $C_{s1}$ can raise the peak gain by up to 4.34 dB. Additionally, $R_{s2}$ can provide about 8 dB of coarse tuning range for the DC gain, the simulated maximum boost gain is above 29 dB, and the peak gain frequency can be changed from 3.7 GHz to 11.4 GHz by controlling the tail current. The maximum GBW is around 12.6 GHz. The load capacitance at the CTLE output is more than 500 fF, including the parasitic capacitance extracted from the layout, and this load capacitance is used for the simulation results. As shown in Fig. 15, although the incoming data are attenuated by the imperfect channel, the equalized data show wide eye widths. In conclusion, the small-signal analysis and post-layout simulation results indicate that the proposed CTLE can compensate for various channel losses,
Fig. 14. Frequency response curves of the CTLE with and without (a) the NC1 circuit (NC2 off) and (b) the NC2 circuit (NC1 on), and as a function of (c) Rs1 and (d) Cs1.
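The tuning trends in Fig. 14(c) and (d) can be related to the degenerated transconductance of (13). The Python sketch below evaluates the exact form of (13) and its approximation 1/R_s1 + sC_s1. It is only a sketch under assumed values: g_m1, R_s1 and C_s1 are illustrative numbers rather than the device sizes of the fabricated CTLE, the function names are hypothetical, and the load, feedback and NC stages of (12) are deliberately left out.

```python
import math


def gm_degenerated(f_hz, gm1, rs1, cs1):
    """Exact G_m1(s) of Eq. (13): gm1 / (1 + gm1 * Rs1 / (1 + s*Rs1*Cs1))."""
    s = 2j * math.pi * f_hz
    z_deg = rs1 / (1.0 + s * rs1 * cs1)  # Rs1 in parallel with Cs1
    return gm1 / (1.0 + gm1 * z_deg)


def gm_approx(f_hz, rs1, cs1):
    """Approximation 1/Rs1 + s*Cs1, valid while |1 + s*Rs1*Cs1| << gm1*Rs1."""
    s = 2j * math.pi * f_hz
    return 1.0 / rs1 + s * cs1


def boost_db(gm1, rs1):
    """Ratio of the high-frequency limit (gm1) to the DC value gm1 / (1 + gm1*Rs1)."""
    return 20.0 * math.log10(1.0 + gm1 * rs1)


if __name__ == "__main__":
    gm1 = 20e-3   # assumed transconductance in siemens (illustrative)
    cs1 = 200e-15 # assumed degeneration capacitance in farads (illustrative)
    for rs1 in (200.0, 500.0, 1000.0):  # assumed degeneration resistances in ohms
        dc = abs(gm_degenerated(0.0, gm1, rs1, cs1))
        print(f"Rs1 = {rs1:6.0f} ohm   DC Gm = {dc * 1e3:5.2f} mS   "
              f"boost = {boost_db(gm1, rs1):5.2f} dB")
```

Under these assumptions a smaller R_s1 raises the DC transconductance and lowers the available boost, while C_s1 sets where the zero at 1/(R_s1 C_s1) begins to lift the response.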
TABLE I
SUMMARY AND COMPARISON TO DIGITAL-CDR BASED RX
[27] M. Verbeke et al., “A 1.8-pJ/b, 12.5–25-Gb/s wide range all-digital clock and data recovery circuit,” IEEE J. Solid-State Circuits, vol. 53, no. 2, pp. 470–483, Feb. 2018.
[28] J. L. Sonntag and J. Stonick, “A digital clock and data recovery architecture for multi-Gigabit/s binary links,” IEEE J. Solid-State Circuits, vol. 41, no. 8, pp. 1867–1875, Aug. 2006.
[29] T. Lee, Y.-H. Kim, J. Sim, J.-S. Park, and L.-S. Kim, “A 5-Gb/s 2.67-mW/Gb/s digital clock and data recovery with hybrid dithering using a time-dithered delta–sigma modulator,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 24, no. 4, pp. 1450–1459, Apr. 2016.
[30] P. K. Hanumolu, G.-Y. Wei, and U.-K. Moon, “A wide-tracking range clock and data recovery circuit,” IEEE J. Solid-State Circuits, vol. 43, no. 2, pp. 425–439, Feb. 2008.
[31] W.-C. Chen et al., “A 2.5-8Gb/s transceiver with 5-tap DFE and second order CDR against 28-inch channel and 5000ppm SSC in 40 nm CMOS technology,” in Proc. IEEE Custom Integr. Circuits Conf., Sep. 2010, pp. 1–4.
[32] J. Liang, A. Sheikholeslami, H. Tamura, and H. Yamaguchi, “On-chip jitter measurement using jitter injection in a 28 Gb/s PI-based CDR,” IEEE J. Solid-State Circuits, vol. 53, no. 3, pp. 750–761, Mar. 2018.
[33] H. Won et al., “A 0.87 W transceiver IC for 100 gigabit Ethernet in 40 nm CMOS,” IEEE J. Solid-State Circuits, vol. 50, no. 2, pp. 399–413, Feb. 2015.
[34] H. Song, D.-S. Kim, D.-H. Oh, S. Kim, and D.-K. Jeong, “A 1.0–4.0-Gb/s all-digital CDR with 1.0-ps period resolution DCO and adaptive proportional gain control,” IEEE J. Solid-State Circuits, vol. 46, no. 2, pp. 424–434, Feb. 2011.
[35] Y.-C. Huang, P.-Y. Wang, and S.-I. Liu, “An all-digital jitter tolerance measurement technique for CDR circuits,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 59, no. 3, pp. 148–152, Mar. 2012.
[36] S. Choi et al., “A 0.65-to-10.5 Gb/s reference-less CDR with asynchronous baud-rate sampling for frequency acquisition and adaptive equalization,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 63, no. 2, pp. 276–287, Feb. 2016.
[37] Y. Tomita, M. Kibune, J. Ogawa, W. W. Walker, H. Tamura, and T. Kuroda, “A 10-Gb/s receiver with series equalizer and on-chip ISI monitor in 0.11-μm CMOS,” IEEE J. Solid-State Circuits, vol. 40, no. 4, pp. 986–993, Apr. 2005.

Zhou Shu received the B.Sc. and master’s degrees from the College of Communication Engineering, Chongqing University, Chongqing, China, in 2015 and 2017, respectively, where he is currently pursuing the Ph.D. degree. His research interest includes mixed-signal integrated circuit design for high speed interconnection.

Zhipeng Li received the B.Sc. degree from the College of Electrical Information Engineering, Henan University of Engineering, Henan, China, in 2016. He is currently pursuing the master’s degree with Chongqing University. His research interest focuses on mixed-signal integrated circuit design for biomedical sensor.

Peng Yin received the B.S. degree from the College of Mobile Telecommunications, Chongqing University of Posts and Telecommunications, Chongqing, China, in 2014. He is currently pursuing the Ph.D. degree with School of Microelectronics and Communication Engineering, Chongqing University. His research interest includes digital integrated circuit design.

Jiandong Zang received the B.Sc. and master’s degrees from the University of Electronic Science and Technology of China (UESTC), Chengdu, China, in 2005 and 2008, respectively. He is currently a Researcher with the Science and Technology on Analog Integrated Circuit Laboratory, Chongqing, China. His research interests focus on high-speed digital-to-analog converters and signal processing.

Dongbing Fu received the B.Sc. and master’s degrees from the University of Electronic Science and Technology of China (UESTC), Chengdu, China, in 2000 and 2003, respectively. He is currently a Senior Engineer with the Science and Technology on Analog Integrated Circuit Laboratory, Chongqing, China. His research interests focus on high performance mixed-signal IC designs.

Amine Bermak (Fellow, IEEE) received the M.Eng. and Ph.D. degrees in electronic engineering from Paul Sabatier University, Toulouse, France, in 1994 and 1998, respectively. He was the Founder and the Leader of the Smart Sensory Integrated Systems Research Laboratory, HKUST. He is currently a Full Professor with the College of Science and Engineering, Hamad Bin Khalifa University, Qatar.