
546 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 57, NO. 2, FEBRUARY 2022

A 0.0285-mm² 0.68-pJ/bit Single-Loop Full-Rate Bang-Bang CDR Without Reference and Separate FD Pulling Off an 8.2-Gb/s/μs Acquisition Speed of the PAM-4 Input in 28-nm CMOS

Xiaoteng Zhao, Student Member, IEEE, Yong Chen, Senior Member, IEEE, Pui-In Mak, Fellow, IEEE, and Rui P. Martins, Fellow, IEEE

Manuscript received February 24, 2021; revised June 10, 2021 and July 20, 2021; accepted September 13, 2021. Date of publication September 29, 2021; date of current version January 28, 2022. This article was approved by Associate Editor Daniel Friedman. This work was supported in part by Guangzhou Science and Technology Innovation and Development of Special Funds (GSTIC) under Grant EF002/IME-CY/2019/GSTIC and in part by Macau Science and Technology Development Fund (FDCT) through the SKL Fund under Grant SKL-AMSV(UM)-2020-2022. (Corresponding author: Yong Chen.)

Xiaoteng Zhao, Yong Chen, and Pui-In Mak are with the State Key Laboratory of Analog and Mixed-Signal VLSI, University of Macau, Macau 999078, China, and also with the Department of Electrical and Computer Engineering (DECE), Faculty of Science and Technology, Institute of Microelectronic (IME), University of Macau, Macau 999078, China (e-mail: ychen@um.edu.mo).

Rui P. Martins is with the State Key Laboratory of Analog and Mixed-Signal VLSI, University of Macau, Macau 999078, China, and also with the Department of Electrical and Computer Engineering (DECE), Faculty of Science and Technology, Institute of Microelectronic (IME), University of Macau, Macau 999078, China, on leave from the Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal.

Color versions of one or more figures in this article are available at https://doi.org/10.1109/JSSC.2021.3113773.

Digital Object Identifier 10.1109/JSSC.2021.3113773

0018-9200 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Abstract— This article reports a single-loop full-rate bang-bang clock and data recovery (BBCDR) circuit supporting a four-level pulse amplitude modulation (PAM-4) pattern. We eliminate both the reference and the separate frequency detector (FD) by deliberately adding two fixed strobe points in the bang-bang phase detector (BBPD) curve via a clock-selection scheme. As such, we achieve a wide frequency-capture range with a single-sided FD polarity. The BBPD also incorporates a hybrid control circuit to automate the frequency acquisition over a wide frequency range. Prototyped in 28-nm CMOS, the proposed BBCDR occupies a tiny area of 0.0285 mm² and exhibits a 23-to-29-Gb/s capture range. Its acquisition speed (8.2 Gb/s/μs) and energy efficiency (0.68 pJ/bit) compare favorably with the state of the art.

Index Terms— Acquisition speed, bang-bang clock and data recovery (BBCDR), charge pump (CP), CMOS, four-level pulse amplitude modulation (PAM-4), frequency detector (FD), hybrid control circuit (HCC), jitter tolerance (JTOL), jitter transfer function (JTF), phase detector (PD), strobe point (SP).

I. INTRODUCTION

THERE is a strong trend toward realizing clock and data recovery (CDR) circuits for serializer and de-serializer (SerDes) receivers that support a wide capture range without requiring a reference frequency input [1]–[31]. This approach enables flexible support for a range of applications while avoiding the use of external crystals. The key challenges in designing such a referenceless CDR include its robustness, the speed and range of the frequency acquisition, and its area and energy (<1 pJ/bit) efficiency. Most existing solutions, based on a separate frequency detector (FD) or lock detector [1]–[29], use the input data stream [1]–[8] to sample the multi-phase clocks for the frequency-acquisition function, implying that they are not suitable for multi-level data input, e.g., four-level pulse amplitude modulation (PAM-4). The injection-locking-type CDR also demands large-swing data signals [9]. Although the dual-loop architecture [10]–[16] provides more design freedom, it is hard to guarantee robustness during the state transition between frequency acquisition and phase locking. To surmount this issue, a simpler single-loop CDR topology merges clock-phase-selection (CPS)- [4] and data-phase-selection (DPS)-based [5] FDs. These FDs effectively increase the capture range of the open-loop FD but still require the data as the clock signal. To avoid this, Huang et al. [17] reported an FD-less referenceless CDR based on the linear phase detector (PD) with a deliberate strobe point (SP). Regrettably, it suffers from the following limits: 1) the delay-based offset is sensitive to process, voltage, and temperature (PVT) variations; 2) large off-chip capacitors (2 × 510 pF) are required to complete the control logic; 3) the single-sided capture-range enhancement is bound to a certain frequency-error polarity; and 4) it has a long frequency-acquisition time (116 μs). While the bang-bang phase/FD [18]–[20] uses always-on multi-phase oversampling to improve the acquisition speed to 1.005 Gb/s/μs, it almost doubles the power budget of the clock generation to obtain the extra clock phases, degrading the energy efficiency to 2.25 pJ/bit.

This article extends [30], which reported the single-loop full-rate bang-bang clock and data recovery (BBCDR) for PAM-4 signaling without reference and separate FD. By deliberately adding two fixed SPs in the bang-bang PD (BBPD) curve, compared with the linear counterpart [17], we realize a wide frequency-capture range with a single-sided FD polarity. Also, unlike typical dual-loop CDRs that may suffer from the transition issue between frequency acquisition and phase locking, the proposed BBCDR manifests fast and automatic

Authorized licensed use limited to: Inha University. Downloaded on February 07,2023 at 11:42:23 UTC from IEEE Xplore. Restrictions apply.
ZHAO et al.: 0.0285-mm2 0.68-pJ/bit SINGLE-LOOP FULL-RATE BANG-BANG CDR 547

frequency acquisition in a single loop via a hybrid control circuit (HCC). Prototyped in 28-nm CMOS, the design automatically tracks 23-to-29-Gb/s PAM-4 input data and achieves a faster acquisition speed (8.2 Gb/s/μs), a smaller area (0.0285 mm²), and better energy efficiency (0.68 pJ/bit) than recent CDRs.

The remainder of this article is organized as follows. Section II introduces the proposed frequency-acquisition technique based on deliberate SPs and its control principle. Section III details the circuit design of the proposed BBCDR along with simulation results. Section IV presents the measured results. Finally, Section V draws the conclusions.

II. PROPOSED FREQUENCY-ACQUISITION TECHNIQUE

The full-rate BBPD [Fig. 1(a)] is driven by the complementary clocks CK = CK0 and CKS = CK180. Because the PD output is symmetric about the origin, Early and Late are equally likely in the timing diagram. If the data rate (fDR) deviates substantially from the clock frequency (fCK), the average output current (Iout,av) of the FD is zero for both the open loop and the closed loop, indicating that the typical BBPD cannot capture the frequency variation for a given frequency error (Δf = fDR − fCK). To solve this, we propose a clock-phase switching scheme that dynamically alters the PD/FD characteristic. Specifically, we denote the case CKS = CK180 as zero SP (ZSP) [Fig. 1(b)]. If CKS = CK90, the PD output shows a polarity change at ϕ = −π/2. This negative SP (NSP) makes Early more likely than Late. In this case, Iout,av of the open-loop FD keeps a single-sided negative polarity over a certain range of Δf, which decreases VCONT as desired. Likewise, CKS = CK270 implies a positive SP (PSP) in the PD curve [Fig. 1(b)], and the single-sided positive polarity of the open-loop FD output increases VCONT. Unlike the open-loop case, the closed-loop FD curve exhibits a stable point at Δf = 0. Hence, whether the closed loop experiences the single-sided polarity of PSP or NSP, it can still reach the locking condition.

The above observation leads to the SP-selection scheme for both frequency acquisition and phase locking shown in Fig. 2. Given fDR = fCK, the proposed BBCDR can lock in all cases of NSP, ZSP, and PSP. Otherwise, PSP/NSP increases/decreases VCONT, which corresponds to a "searching" process within a given voltage-controlled oscillator (VCO) band, regardless of the polarity of Δf. If we change the VCO band appropriately until we find the correct one, the BBCDR finally locks at the desired data rate. Once the BBCDR is stably locked, we set the SP to zero for better jitter performance.

Fig. 1. (a) Block diagram of the proposed BBPD in full-rate topology and (b) its operation principle that uses clock-phase switching to realize ZSP, NSP, and PSP.

Fig. 2. State transition between NSP, ZSP, and PSP modes in the proposed BBCDR for frequency acquisition and phase locking.

III. CIRCUIT IMPLEMENTATION

A. Complete Diagram of the Proposed BBCDR

Fig. 3(a) shows the complete diagram of the proposed BBCDR, which implements the principles developed above (Figs. 1 and 2). Recalling Fig. 1(a), the three SP conditions share the same flip-flops (FFs), XOR gates, and charge pump (CP); they differ only in the phase of CKS. For the circuit realization, however, there are two reasons to introduce additional hardware [Fig. 3(a)]. First, it is challenging to design a three-phase clock selector for CK90/CK180/CK270 because of undesired phase deviation, whereas the multiplexer (MUX1) easily works as a two-phase selector for CK90/CK270. Second, the clock-to-Q delay leaves insufficient time margin for the two-stage DFFs when CKS = CK270. Thus, FF9 is inserted between FF8 and FF10 and triggered by CK180 to provide enough time margin. Meanwhile, FF3 is introduced to align the timing in PSP. In summary, (FF1-2, FF4-5), (FF1-2, FF6-7), and (FF1-3, FF8-10) are used for ZSP, NSP, and PSP, respectively. Correspondingly, the XOR/XNOR gates and the CPs are also triplicated to match the outputs of the three groups of FFs.

The proposed BBCDR loop has two operating modes, namely, the frequency-acquisition mode (FAM) and the phase-acquisition mode (PAM). Functionally, the former drives the


loop to the correct frequency, whereas the latter recovers the clock and retimes the data at a specific data rate. In PAM (i.e., ZSP), besides the clock generator, the loop consists of FF1,2, FF4,5, and CPPAM with the corresponding XOR/XNOR gates. Although FF3 and FF6-10 are still working, CP90 and CP270 are turned off by grounding their bias voltages through MUX5,6. Therefore, FAM is inactive, and the loop operates exactly like a conventional BBCDR loop. Similarly, in FAM, the PSP/NSP paths and their corresponding CPs operate in a "one-hot" manner, as shown in Fig. 3(b).

Fig. 3. (a) Complete diagram of the proposed single-loop full-rate BBCDR without reference and separate FD, along with its control summary. (b) Its detailed operation under different modes.

The proposed single-loop full-rate scheme [Fig. 3(a)] uses a double-tail comparator [31] as an FF to shift and retime the PAM-4 input by comparing it with the external reference voltages (VRef,A and VRef,C), and finally outputs the most significant bit (MSB) and least significant bit (LSB) data through the multiplexer-based decoder [32], [33]. The proposed BBPD generates three groups of three successive samples for ZSP (S2, S3, and S4), NSP (S2, S3, and S5), and PSP (S1, S2, and S6). Accordingly, the three CPs use the respective Late/Early signals to charge/discharge the loop filter (LF). Here, CPs based on dual compensation paths alleviate the mismatch between the UP and DN currents caused by channel-length modulation [19], [34].

An LC-VCO operating at 2fBaud, followed by a divider-by-two with quadrature outputs, supplies the required clock phases, where fBaud is the baud rate of the input data (these blocks are analyzed in Section III-C). Fig. 3(a) (bottom left) shows the control logic provided by the HCC. Here, MDSW = 0/1 denotes the FAM/PAM state, and the CKS switch signal (SW) indicates the VCO band-switching state, toggling CKS between CK90 and CK270 through a transmission-gate-based multiplexer [Fig. 3(b)].

Fig. 4 shows the schematic of the HCC, which performs the control algorithm (Figs. 2 and 3) and consists of the VCO hopping control and the mode switch control. The former functions as the PSP/NSP controller in FAM. First, SW toggles only when VCONT rises above VS+ during its uptrend or falls below VS− during its decline, where VS+ and VS− are the upper and lower threshold voltages of the Schmitt trigger, respectively. Whether SW outputs a rising or a falling edge, the following edge detector delivers a pulse on the band switch signal (BANDSW), triggering the counter to increment and upshift the VCO band by one. Meanwhile, the toggled SW switches CKS between CK90 and CK270, reversing the trend of VCONT in the adjacent band. This process continues until the correct VCO band is found.

The mode switch control lets the BBCDR switch between PAM and FAM automatically. Recalling Fig. 1(b), the proposed BBCDR can lock not only in ZSP but also in PSP/NSP. Thus, we declare lock when VCONT remains stable for longer than a preset time (TPRE). To realize this, we use a timer clocked at fBaud/8192 to monitor the locking state. When the timer overflows, it produces a rising edge on OVER, and G3 sets the "mode switch" signal (MDSW) high. At this instant, the loop enters PAM and operates as a normal BBCDR. TPRE should be longer than the search time in any VCO band to prevent incorrect mode switches. Furthermore, we design the three-input AND gate (G1) and comparators (G4 and G5) to restart FAM. First, when Reset is low, the switch (G2) turns on and pulls down BANDSW, so that G1 outputs a low-level MDCLR to reset MDSW,


ensuring that the BBCDR begins in FAM. Second, if the data rate changes after the loop has locked (i.e., while the BBCDR operates in PAM), VCONT may drift out of the allowed range [VREF−, VREF+]. Namely, if VCONT > VREF+ or VCONT < VREF−, MDCLR also returns to zero, forcing the BBCDR to re-enter FAM. Since the counter wraps around to 0 after counting up from 7, the correct band can always be found. [VS−, VS+] is the intrinsic threshold window of the Schmitt trigger, while [VREF−, VREF+] is defined by external reference voltages that set the allowed VCONT range. Mathematically, [VS−, VS+] is a subset of [VREF−, VREF+].

Fig. 4. Block diagram of the HCCs.

Fig. 5. Operating illustration of frequency and phase acquisition starting from the Reset signal.

B. Operating Dynamics

With the hardware described in Section III-A, Fig. 5 exemplifies a typical frequency-acquisition and phase-tracking process that starts from an active Reset signal. Here, we extract the involved building blocks from Figs. 3 and 4, and the sampling relationship follows Fig. 1(b). At the beginning, phase 1 delivers a negative pulse on Reset to the 3-bit counter in the HCC, resetting the VCO to Band 0. Without loss of generality, we assume that the initial SW is low and the target frequency lies in Band 2. Once Reset returns high, the BBCDR enters PSP in FAM, with CK270 selected as CKS by SW. Thereby, the samples


(S1, S6, and S2) feed CP270 to deliver more IUP than IDN. Therefore, the BBCDR searches upward in Band 0 (phase 2). When VCONT climbs to VS+, the loop enters the first transition, which proceeds as follows: 1) VCONT exceeding VS+ toggles the Schmitt trigger, producing a rising edge on SW; 2) a pulse appears on BANDSW to trigger the VCO band counter; and 3) the VCO jumps to Band 1 with VCONT held around VS+. Therefore, in the next step (phase 3), with CK90 activated, the valid samples change to S2, S5, and S3, delivering more IDN than IUP instead. Thus, the BBCDR searches downward in Band 1. When VCONT falls to VS−, the NSP-to-PSP transition occurs in the same way as its reciprocal case. The falling edge of SW upshifts the VCO band and restarts an upward search in Band 2 (phase 4). When we finally find the target frequency in Band 2 with a PSP, the average of VCONT remains stable. When the timer in the HCC detects that VCONT has stayed within [VS−, VS+] for longer than TPRE, it generates a pulse on OVER and switches MDSW to the high level (phase 5). At this moment, the phase of CKS no longer matters since only CPPAM [Fig. 3(b)] is active. The lock then converges to a more stable condition as the SP vanishes (phase 6).

Fig. 6(a) shows the detailed control algorithm; Fig. 5 mainly covers the state transitions from RESET to FAM, SP switching within FAM, and FAM to PAM. Fig. 6(a) also shows the reverse transition paths, in particular the transition from PAM to FAM, which corresponds to the automatic frequency recapture that occurs when the data rate changes. Fig. 6(b) shows a process containing the above state transitions. It consists of two parts. The first part (t0–t1), beginning with an active Reset signal, is similar to that in Fig. 5. In the second part, the input data rate suddenly changes at t1, the loop loses lock, and VCONT drifts out of [VREF−, VREF+] at t2 (here, VREF+ = VS+). Thus, MDCLR goes low, resetting MDSW and forcing the loop to re-enter FAM. The BBCDR executes another complete frequency acquisition from t1 to t3, indicating that the loop can re-lock regardless of the initially locked frequency.

Fig. 6. (a) Complete algorithm of our BBCDR operation and (b) dynamic acquisition process including all the states.

C. Design Considerations

Fig. 7 shows the schematic of the quadrature clock scheme. It consists of a class-B LC VCO operating at 2fBaud and an ac-coupled divider-by-two based on a low-power dynamic latch [35]. The n-type LC VCO uses a 3-bit switched-capacitor bank to provide eight bands covering 23.4–29.1 GHz and consumes 4.2 mW at 28 GHz from a 0.6-V supply. Here, C1 = 22.4 fF, and the two-turn inductor in the LC tank has an inductance of ∼370 pH and a quality factor of 16 at 28 GHz, occupying 150 × 135 μm². Although our clock-selection scheme requires quadrature phases at fBaud, the divider-by-two in our design occupies only 3.3 × 3.4 μm² and consumes ∼0.8 mW at 28 GHz


with a 1-V supply and a 0.45-V common-mode bias voltage (VCKCM). Thus, the extra area and power are <0.1% and <11%, respectively.

The alternative clock scheme is a quadrature LC VCO operating at fBaud, which needs two larger inductors. Moreover, the phase ambiguity of the quadrature LC VCO [36] is unacceptable in the clock-selection scheme because the selected phase determines the SP polarity and thus affects the frequency acquisition.

Fig. 7. Schematic of the quadrature clock scheme.

Fig. 8. (a) Details of the Schmitt trigger and (b) its design consideration.

Fig. 9. Comparison of the open-loop FD feature between the proposed PD and the PD described in [17].

Fig. 10. Comparison of the UP/DN and VCONT signals when the SP is zero, positive, and negative in the pull-in process if the VCO frequency is lower than the input data rate.
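As a quick consistency check on the frequency plan described above, the following Python sketch derives the clock frequencies implied by a full-rate PAM-4 CDR. It assumes 2 bits per PAM-4 symbol, a VCO at 2fBaud divided by two to produce the quadrature phases at fBaud, and a lock-detect timer clocked at fBaud/8192 (Section III-A). The function name `pam4_clock_plan` is hypothetical, and the code is an illustration, not part of the authors' design flow.

```python
def pam4_clock_plan(data_rate_gbps: float, timer_divide: int = 8192) -> dict:
    """Clock frequencies implied by a full-rate PAM-4 CDR (illustrative sketch).

    Assumptions: PAM-4 carries 2 bits/symbol, the LC VCO runs at 2*f_baud,
    a divide-by-two yields the quadrature phases (CK0/90/180/270) at f_baud,
    and the lock-detect timer is clocked at f_baud / timer_divide.
    """
    f_baud_ghz = data_rate_gbps / 2.0            # symbol (baud) rate in GBd
    f_vco_ghz = 2.0 * f_baud_ghz                 # VCO runs at twice the baud rate
    timer_period_ns = timer_divide / f_baud_ghz  # 1 / (f_baud / N), GHz -> ns
    return {
        "f_baud_ghz": f_baud_ghz,
        "f_vco_ghz": f_vco_ghz,
        "timer_period_ns": timer_period_ns,
    }


plan = pam4_clock_plan(28.0)
print(plan["f_baud_ghz"], plan["f_vco_ghz"], round(plan["timer_period_ns"], 1))
```

For a 28-Gb/s input this gives fBaud = 14 GHz, a 28-GHz VCO frequency (the operating point quoted above for the VCO and divider), and a timer period of about 585 ns. Under these assumptions fVCO numerically equals the PAM-4 data rate, so the eight VCO bands spanning 23.4–29.1 GHz map directly onto a 23.4–29.1-Gb/s data-rate window.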
We design the Schmitt trigger [Fig. 8(a)] in the HCC to eliminate improper band hopping. Consider the VCONT signal in Fig. 8(b): although its overall trend is rising, it still fluctuates up and down throughout the process. When VCONT climbs near VS+, the constant-threshold comparator in [17] may output multiple edges on SW and increase the VCO band more than once. The Schmitt trigger ensures that the VCO band jumps only once owing to its hysteresis. Meanwhile, M1 [Fig. 8(a)] functions as a threshold-voltage tuner to provide a proper VCONT range.

D. Simulation Results

Fig. 9 shows the simulated open-loop FD characteristics of the proposed BBCDR and its linear counterpart in [17]. The simulation is conducted with a full-rate structure, a 14-Gb/s non-return-to-zero (NRZ) input, and a 50-μA CP current, with the SPs in [17] set to ±15 ps for comparison. Here, we use pre-layout double-tail-based DFFs along with an ideal delay cell and ideal clock phases.

We observe that the FD curves of the proposed scheme for PSP and NSP are symmetric about the x-axis, implying that the strength of the single-sided polarity is almost the same for the two SP conditions. Compared with the topology in [17], the BBPD maintains the desired single-sided polarity over a wider range (∼4–40 GHz). Moreover, Iout,av exceeds that of its linear counterpart throughout the whole frequency range, indicating a faster acquisition speed for the proposed scheme. We also apply different input patterns to verify the robustness of the clock-selection scheme.
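The single-sided open-loop polarity behind Fig. 9 can be illustrated with an idealized model. The Python sketch below is a hypothetical behavioral model, not the authors' simulation setup: it assumes a noiseless bang-bang PD that sees a data transition in every bit (as with a "0101" pattern) and outputs +1 (Late, UP) when the wrapped phase error lies below the strobe point ϕSP and −1 (Early, DN) otherwise. This sign convention is chosen so that an SP at −π/2 (NSP) favors Early, as stated in Section II. With Δf ≠ 0 and no feedback, the phase error slips uniformly through [−π, π), so the average output is (ϕSP + π)/(2π) − (π − ϕSP)/(2π) = ϕSP/π, whatever the sign of Δf.

```python
import math


def wrap(phi: float) -> float:
    """Wrap a phase (rad) into [-pi, pi)."""
    return (phi + math.pi) % (2.0 * math.pi) - math.pi


def bbpd_output(phi: float, sp: float) -> int:
    """Idealized bang-bang PD with a strobe point (SP) at phase `sp`.

    Assumed sign convention: +1 = Late (UP) below the SP, -1 = Early (DN)
    above it, so sp = -pi/2 (NSP) favors Early.
    """
    return 1 if wrap(phi) < sp else -1


def open_loop_average(sp: float, direction: int, steps: int = 1000) -> float:
    """Mean PD output while the phase error slips through one full cycle.

    `direction` (+1 or -1) stands for the sign of the frequency error; with no
    feedback the phase error sweeps the whole [-pi, pi) range either way.
    """
    total = 0
    for k in range(steps):
        phi = direction * (-math.pi + 2.0 * math.pi * (k + 0.5) / steps)
        total += bbpd_output(phi, sp)
    return total / steps


for name, sp in (("NSP", -math.pi / 2), ("ZSP", 0.0), ("PSP", math.pi / 2)):
    print(name, open_loop_average(sp, +1), open_loop_average(sp, -1))
```

Under these assumptions the averages are −0.5 for NSP and +0.5 for PSP for both directions of phase slip, and 0 for ZSP, mirroring the single-sided negative, single-sided positive, and zero-mean open-loop FD outputs described in Section II. Real FD curves such as those in Fig. 9 also depend on transition density and DFF timing, which this model ignores.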


Fig. 11. Negative single-sided polarity comparison when setting (a) fCK = 9.5 GHz and (b) fCK = 8.5 GHz.

Fig. 12. Positive single-sided polarity comparison when setting (a) fCK = 26 GHz and (b) fCK = 28 GHz.

Fig. 13. Dynamic process under PSP using the same loop parameters as in Fig. 10, with a 10-Gb/s PRBS 2⁷ − 1 input pattern and an initial fCK = 19.9 GHz.

We further compare these two PDs in the closed loop with a behavioral simulation. Using the same loop parameters as in [17], i.e., KVCO = 1 GHz/V, R1 = 1 kΩ, C1 = 1 nF, and C2 = 100 pF, we set the CP current to 500 μA and the SPs in [17] to ±15 ps. For the simulations in Figs. 10–12, we input a "0101" pattern at 10 Gb/s. Fig. 10 compares the UP–DN and VCONT signals in ZSP, PSP, and NSP with an initial fCK of 9 GHz. Because the single-sided polarity is reversed between [17] and this work, we group the non-ZSP conditions by their pull-in direction. Under ZSP, the BBPD shows almost equal intervals of charging and discharging, so the average of VCONT is almost unchanged, corresponding to the closed-loop curve in Fig. 1(b). Also, its UP–DN output changes only once per cycle slip. In contrast, the linear PD generates more pulses per cycle slip; since Δf is within the pull-in range of the linear PD, it shows slightly more charging than discharging, and thus, the corresponding VCONT gradually climbs. In the pull-up case [Fig. 10 (middle)], both PDs output many more UPs than DNs, so the loop charges the LF more than it discharges it. Conversely, the pull-down case [Fig. 10 (bottom)] supplies more DNs to decrease VCONT. In the non-ZSP conditions, VCONT from both PDs follows the same trend, but the BBPD always provides a faster pull-in speed, in agreement with Fig. 9.

Since the frequency acquisition in both the proposed BBCDR and [17] relies on the available single-sided polarity range, Figs. 11 and 12 show the pull-in process. Recalling Fig. 9, the PSP in [17] fails to maintain the negative single-sided polarity at low frequency. With a 10-Gb/s input, Fig. 11(a) and (b) shows the processes with an initial fCK of 9.5 and 8.5 GHz, respectively. VCONT of the BBPD keeps decreasing in both Fig. 11(a) and (b) as desired, while the linear PD fails when fCK = 8.5 GHz. Fig. 12 shows the reciprocal case, where the BBPD operates in PSP and the linear PD works in NSP (see Fig. 9) for an upward search. At an fCK of 26 GHz, both PDs exhibit an increasing VCONT, as shown in Fig. 12(a). However, the linear PD fails to maintain the positive single-sided polarity at an fCK of 28 GHz [Fig. 12(b)], indicating that our BBCDR has a wider frequency-capture range.

A harmonic-locking issue may occur if the tuning range of the VCO is too large [38]; it arises when the loop locks to one of the harmonics rather than the


fundamental of the target frequency [39]. Harmonic locking may also affect the proposed BBPD. Using the loop parameters of Fig. 10, we simulate a dynamic process under PSP (Fig. 13) with a 10-Gb/s pseudorandom binary sequence (PRBS) 2⁷ − 1 input pattern. With the initial fCK set at 19.9 GHz, the loop finally locks at 20 GHz, namely, the second harmonic of the target frequency. However, there are mechanisms to weaken this effect. We sketch four cases in Fig. 14, assuming that the BBCDR maintains the single-sided FD polarity over the frequency range of interest. At t1, the loop locks at f1, and the data rate changes at t2. Since the harmonic frequency also changes with the input data rate, the harmonic-locking issue does not affect the loop as long as f2 > f1 [Fig. 14(a) and (b)]. Yet, when f2 < f1, the loop locks at 2f2, which is not desirable, as shown in Fig. 14(c). To avoid this, we can reset the loop at t2, so that the VCO returns to its lowest band and starts a new frequency-acquisition process [Fig. 14(d)], in which the search for the correct frequency begins afresh.

Fig. 14. Illustration of the harmonic-locking issue in terms of f1, f2, and the Reset signal, where f1 and f2 denote the first and second target frequencies, respectively.

Fig. 15. Dynamic process with a 2.9× frequency variation, using the same loop parameters as in Fig. 10 with a 100-μA CP current.
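The four cases of Fig. 14 can be summarized with a toy model. The Python sketch below assumes, purely for illustration, that the net search moves upward from the starting VCO frequency (band-counter wrap-around is ignored) and that the loop locks at the first harmonic n·f2 it meets, since each harmonic is also a stable lock point. The function name and all frequencies are hypothetical and are not taken from the simulations or measurements.

```python
from typing import Optional


def first_lock_frequency(f_start_ghz: float, f_target_ghz: float,
                         max_harmonic: int = 4) -> Optional[float]:
    """Toy model: clock frequency at which an upward search first locks.

    Illustrative assumptions: the net search moves monotonically upward from
    f_start_ghz (band-counter wrap-around is ignored), and every harmonic
    n * f_target_ghz is a stable lock point, so the first one reached wins.
    """
    for n in range(1, max_harmonic + 1):
        candidate = n * f_target_ghz
        if candidate >= f_start_ghz:
            return candidate
    return None  # no harmonic reachable within max_harmonic


# Hypothetical frequencies (GHz) in the spirit of Fig. 14, not measured data.
print(first_lock_frequency(10.0, 12.0))  # f2 > f1: locks at f2
print(first_lock_frequency(10.0, 8.0))   # f2 < f1, no reset: locks at 2*f2
print(first_lock_frequency(7.0, 8.0))    # after Reset to the lowest band: f2
```

The three calls correspond to Fig. 14(a)/(b) (lock at f2), Fig. 14(c) (lock at 2f2 when no reset is applied), and Fig. 14(d) (Reset returns the VCO below f2, so the fundamental is reached first).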
To assess the frequency-acquisition ability, we develop a 5-bit VCO model and VCO hopping control logic (see Figs. 4 and 6) in MATLAB/Simulink to support wide-range frequency acquisition and to remove the limitation of the real VCO. We conduct a behavioral simulation, shown in Fig. 15, in which the negative pulse of Reset sets the initial fCK to 7.8 GHz. We then input a 22.5-Gb/s PRBS 2⁷ − 1 pattern to obtain a 2.9× frequency variation from the initial fCK to the target frequency. The BBCDR loop searches from Band 0 to Band 21 and finally locks at 22.5 GHz.

To examine the effect of input data randomness, Fig. 16 shows the simulated dynamic processes under PRBS 2⁷ − 1, 2¹⁵ − 1, and 2³¹ − 1. Here, we use the same loop parameters as in Fig. 10 with a 100-μA CP current, and we set the initial and target frequencies to 7.8 and 12 GHz, respectively. With the same Reset, the fastest process is obtained under PRBS 2⁷ − 1 and the slowest under PRBS 2³¹ − 1, which verifies the robustness of the proposed BBCDR with respect to input randomness.
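The VCO-hopping behavior of the HCC (Figs. 4–6) can be approximated at the band level. The Python sketch below is an idealized model under stated assumptions, not the authors' MATLAB/Simulink model: within each band, the single-sided FD polarity is assumed to sweep VCONT fully between VS− and VS+, so the loop locks if the target lies inside that band's tuning range; otherwise the Schmitt trigger toggles SW, the 3-bit counter upshifts the band (wrapping from 7 to 0), and the SP flips between PSP and NSP. The `BandPlan` values are hypothetical, chosen only to span 23.4–29.1 GHz with slight band overlap, and the TPRE lock timer is not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BandPlan:
    """Hypothetical band map: each band tunes over band_span_ghz as VCONT
    moves between the Schmitt-trigger thresholds VS- and VS+."""
    f_base_ghz: float      # lowest frequency of Band 0 (assumed)
    band_step_ghz: float   # spacing between adjacent bands (assumed)
    band_span_ghz: float   # tuning span of one band (assumed, > step)
    n_bands: int = 8       # 3-bit band counter

    def band_range(self, band: int) -> Tuple[float, float]:
        lo = self.f_base_ghz + band * self.band_step_ghz
        return lo, lo + self.band_span_ghz


def hcc_search(plan: BandPlan, f_target_ghz: float, start_band: int = 0,
               sw_low: bool = True, max_hops: int = 16
               ) -> Optional[Tuple[int, str, int]]:
    """Idealized VCO-hopping search; returns (lock band, SP at lock, hops).

    If f_target lies in the current band, the sweep crosses it and the loop
    locks. Otherwise SW toggles, the band counter increments (wrapping to 0),
    and the SP flips between PSP and NSP.
    """
    band = start_band % plan.n_bands
    sp = "PSP" if sw_low else "NSP"   # SW low selects CK270, i.e., PSP
    for hops in range(max_hops + 1):
        lo, hi = plan.band_range(band)
        if lo <= f_target_ghz <= hi:
            return band, sp, hops
        band = (band + 1) % plan.n_bands
        sp = "NSP" if sp == "PSP" else "PSP"
    return None  # target outside every band


# Hypothetical band map spanning 23.4-29.1 GHz with slight band overlap.
plan = BandPlan(f_base_ghz=23.4, band_step_ghz=0.7, band_span_ghz=0.8)
print(hcc_search(plan, 25.0))                 # Reset to Band 0 with SW low
print(hcc_search(plan, 24.5, start_band=6))   # counter wraps 7 -> 0 -> 1
```

Starting from Band 0 with SW low, a 25.0-GHz target in this hypothetical map is found in Band 2 after two hops and locks in PSP, matching the narrative of Section III-B; the second call shows the cyclical counter wrapping around. Because the SP flips on every hop, the SP at lock depends only on the initial SW level and the number of hops.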


Fig. 16. Dynamic processes under the PRBS 2⁷ − 1, 2¹⁵ − 1, and 2³¹ − 1 patterns.

Fig. 17. (a) Chip micrograph with area and power breakdown. (b) Post-layout-simulated PAM-4 eye diagram of sub-block (E).

IV. MEASUREMENT RESULTS

Fig. 17(a) shows the chip photograph along with the area and power breakdown of the proposed BBCDR fabricated in 28-nm CMOS. The VCO operates from a 0.6-V supply, and the other sub-blocks are powered by a 1.2-V supply. Occupying a 0.0285-mm² core area, the prototype dissipates 19.16 mW at 28 Gb/s, excluding the 2⁷ − 1 PRBS pattern generator (E) and the test buffers (F1,2). The additional hardware for frequency acquisition occupies only 0.0006 mm² (2.1%) and consumes ∼8 mW (41.8%). In this design, we adopt a testing platform similar to that in [33] and use the bit error ratio tester (BERT) to provide the half-baud-rate clock for an on-chip PAM-4 generator with a 2⁷ − 1 PRBS. Fig. 17(b) shows the 28-Gb/s eye diagram of the PAM-4 input obtained from a 1-μs post-layout simulation, including the effect of the on-chip metal traces and load capacitors. The peak-to-peak swing of the PAM-4 input is 1.43 Vppd, and the horizontal eye opening is 61.38 ps.

A. Performance Assessment

To assess the settling process, we first examine the case starting from the Reset signal. As shown in Fig. 18(a), the BBCDR is initially locked at 26 Gb/s in VCO Band 4 (where Band 0 is the lowest and Band 7 the highest). At t0, Reset is asserted, and the loop loses lock. Then, Reset goes high at t1, activating FAM, and VCONT wanders between VS+ and VS− to search for the desired frequency. After 2.052 μs, with Band 4 found again, the loop locks at t2. When the timer overflows at t3, MDSW goes high and PAM takes over, further suppressing the voltage ripple on VCONT. Fig. 18(a) (bottom) shows the time-domain variation of fVCO. To further illustrate the acquisition behavior, Fig. 18(b) shows the corresponding frequency route within the overall VCO tuning range. At t0, the frequency drops sharply from Band 4 to Band 0 because of the active-low Reset signal. When VCONT climbs to VS+ for the first time, the VCO band does not jump to Band 1 because Reset remains active until t1. Then, fVCO moves up in a zigzag manner through each VCO band. Finally, at t2, the loop reaches the target frequency again.

Fig. 19 shows the second settling process (Dynamic II). It begins with NSP at t0 and locks in PSP in Band 6 at t2, indicating that the proposed BBCDR can lock regardless of the initial or final SP polarity. Comparing it with Dynamic I in Fig. 18, we conclude that the initial SW level determines the initial SP, while the number of bands jumped before locking determines the final SP polarity.

Fig. 20(a) shows the third dynamic process, which jumps from one band to another without a Reset signal, verifying the frequency recapture [Fig. 6(b)]. Initially, the BBCDR is locked at 27.8 Gb/s in Band 6 until the data rate suddenly drops to 25.2 Gb/s at t0. Consequently, the loop loses lock, and VCONT drifts up gradually, reaching VREF+ at t1. At this moment, gate G1 (Fig. 4) resets MDCLR to zero, producing a low-level MDSW, which indicates that the BBCDR switches to FAM and searches for the new frequency. Since the target frequency is not found in Bands 6 and 7, a new round of acquisition starts at t2, leading to re-locking at t3 and entry into PAM at t4. Fig. 20(b) shows the frequency-domain search process. With the cyclical counter in the HCC (Fig. 4), the proposed BBCDR can always find the target frequency regardless of the polarity of the data-rate difference.

Fig. 21 shows the acquisition speed versus the CP bias in FAM. At the highlighted point in Band 7, the initial data rate is 23.45 Gb/s, the target data rate is 28.6 Gb/s, and the CP bias (VCP_FAM) is 0.75 V. Because the unbalanced CP current changes with the real-time frequency difference, the frequency-acquisition speed reported here is an average value. In each band, the acquisition time becomes shorter as VCP_FAM rises, owing to the larger loop bandwidth. Furthermore, all the results are obtained with a fixed 0.32-V VCP_PAM. This implies that the

Authorized licensed use limited to: Inha University. Downloaded on February 07,2023 at 11:42:23 UTC from IEEE Xplore. Restrictions apply.
ZHAO et al.: 0.0285-mm2 0.68-pJ/bit SINGLE-LOOP FULL-RATE BANG-BANG CDR 555

Fig. 18. (a) Measured dynamic process which starts from the Reset state, ends in NSP, and locks in Band 4. (b) Corresponding frequency traveling route
along with the overall VCO tuning range.

Fig. 19. (a) Measured dynamic process which starts from the Reset state, ends in PSP, and locks in Band 6. (b) Corresponding frequency traveling route
along with the overall VCO tuning range.

acceleration of locking consumes no extra power in the PAM where the voltage bias difference is 0.4 V between FAM and
and exhibits robustness for the mode switch between the FAM PAM. The VCONT variance under PAM is also less than half
and the PAM when varying the CP voltage bias. of that in FAM [Fig. 22(d)]. These results prove a better jitter
Fig. 22 shows the VCONT comparison between FAM and performance when disabling the deliberate SP.
PAM in two cases. The first is at 24.16 Gb/s in Band 1 with Entering the PAM, we further measure the other jitter-
0.61-V VCP_FAM and 0.39-V VCP_PAM . The time-domain wave- related performance of the BBCDR. Fig. 23(a) shows the
form [Fig. 22(a)] intuitively suggests that VCONT under PAM retimed MSB/LSB data and recovered clock at 28 Gb/s. The
is steadier than that in FAM. We perform a statistical analysis histogram of the recovered clock shows a standard devia-
in both FAM and PAM and obtain the VCONT histogram tion of 789.3 fs, including the 270.6-fs standard deviation
[Fig. 22(b)]. VCONT under PAM concentrates more on its jitter of the data-homologous trigger signal [Fig. 23(b)].
median with less variance of 2.19 × 10−5 V2 . We extract the Fig. 23(c) and (d) shows the spectrum and phase noise of the
other case from a dynamic at 24.8 Gb/s in Band 2 [Fig. 22(c)], 14-GHz recovered clock, respectively. The integrated jitter

Authorized licensed use limited to: Inha University. Downloaded on February 07,2023 at 11:42:23 UTC from IEEE Xplore. Restrictions apply.
556 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 57, NO. 2, FEBRUARY 2022

Fig. 20. (a) Measured dynamic process which starts from Band 6 in locking state, ends in PSP, and re-locks in Band 3. (b) Corresponding frequency traveling
route along with the overall VCO tuning range.

Fig. 21. Acquisition speed versus the CP bias (VCP_FAM ) in each band.

Fig. 23. (a) Retimed data and recovered clock eyes. (b) Histogram of the
recovered clock along with the data-homologous trigger signal. (c) Recovered
clock spectrum. (d) Phase noise of the recovered clock. All the results are
obtained by supplying a 28-Gb/s PAM-4 input pattern.

(Keysight M8040A) assesses JTF, exhibiting a bandwidth of


∼12 MHz [Fig. 24(a)]. Here, we input the modulated jitter
for the JTOL assessment before the frequency acquisition.
Fig. 24(b) shows the measured JTOL curve, which achieves
0.41- and 0.4-UIPP out-of-band tolerance for MSB and LSB
data with 27 − 1 PRBS PAM-4 input, respectively.

B. Robustness Verification
Fig. 22. Comparison of the steady VCONT in (a) Band 1 and (b) Band 2 and
the corresponding histogram in (c) Band 1 and (d) Band 2. To verify the long-term robustness, such as the slow fre-
quency drift, we measure the spectrum and spectrogram at
14 GHz using the N9040B signal analyzer. From Fig. 25,
is 486.8 fsrms from 100 Hz to 1 GHz, including the data- we carried out the measurement continuously for more than
dependent spurs. 8 min, selecting the maximum-hold mode for the spectrum
We also measure the jitter specifications under PAM, i.e., collection. During the total test time, the spectrum peak only
the jitter transfer function (JTF) and the jitter tolerance appears at 14 GHz as desired, indicating that there is no
(JTOL) [37]. A 0.2-UIpp jitter injection from the BERT frequency drift.
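Integrated jitter such as the 486.8 fs rms quoted above (100 Hz to 1 GHz) is conventionally obtained by integrating the single-sideband phase noise L(f) of the clock and converting the phase variance to time, σt = sqrt(2∫L(f)df)/(2πf0). The following is a minimal Python sketch of that standard conversion, assuming L(f) is tabulated in dBc/Hz and integrated with the trapezoidal rule on a linear scale. The function name and the flat −120-dBc/Hz profile in the usage line are hypothetical illustration values, not the measured data of Fig. 23(d).

```python
import math


def integrated_rms_jitter(freqs_hz, pn_dbc_hz, f0_hz):
    """Return RMS time jitter in seconds from SSB phase noise L(f) in dBc/Hz.

    sigma_t = sqrt(2 * integral(L(f) df)) / (2 * pi * f0), where the integral is
    taken over the listed offsets with the trapezoidal rule on a linear scale.
    """
    if len(freqs_hz) != len(pn_dbc_hz) or len(freqs_hz) < 2:
        raise ValueError("need matching offset/phase-noise lists with >= 2 points")
    area = 0.0  # integral of L(f) over one sideband, in rad^2
    for k in range(len(freqs_hz) - 1):
        l_lo = 10.0 ** (pn_dbc_hz[k] / 10.0)  # dBc/Hz -> linear 1/Hz
        l_hi = 10.0 ** (pn_dbc_hz[k + 1] / 10.0)
        area += 0.5 * (l_lo + l_hi) * (freqs_hz[k + 1] - freqs_hz[k])
    sigma_phi = math.sqrt(2.0 * area)  # factor 2 accounts for both sidebands
    return sigma_phi / (2.0 * math.pi * f0_hz)


# Hypothetical flat -120 dBc/Hz profile over 100 Hz to 1 GHz, 14-GHz carrier
sigma_t = integrated_rms_jitter([100.0, 1e9], [-120.0, -120.0], 14e9)
print(f"{sigma_t * 1e15:.1f} fs rms")  # -> 508.4 fs rms
```

The linear-scale trapezoidal rule is exact for the flat profile used here; for a real phase-noise plot with steep slopes, a dense frequency grid (or log-log interpolation between points) would be needed to avoid overestimating the integral.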


Fig. 24. Measured (a) JTF and (b) JTOL with a 28-Gb/s PAM-4 input.

Fig. 25. Spectrum and spectrogram of the clock locking at 14 GHz.

To further assess the impact of the jittery input data on the frequency acquisition, we supplement the measurements (Figs. 26–30), of which Figs. 26 and 27 show the jitter injection method along with the loop behavior at low fjitter with a small Ajitter. Here, fjitter and Ajitter denote the jitter frequency in Hz and the peak-to-peak jitter amplitude in UIPP, respectively.

Utilizing the on-chip 2⁷ − 1 PRBS generator, we set the data rate and jitter of the PAM-4 input data through its input clock, supplied by the off-chip Keysight M8040A. As shown in Fig. 26(a), we modulate a 100-kHz, 0.2-UIPP sinusoidal jitter on the half-baud-rate input clock, which exhibits a 5.17-ps standard deviation. The same jitter also appears on the PAM-4 input. Fig. 26(b) and (c) show the corresponding dynamic process and the spectrum of the recovered clock, respectively. Since the BBCDR tracks the in-band noise well, the spectrum [Fig. 26(c)] shows spurs spaced by 100 kHz around the center frequency of 13.8 GHz. Fig. 27 shows the case under a 1-MHz fjitter with a 0.4-UIPP Ajitter. Here, the spurs [Fig. 27(c)] are spaced by 1 MHz instead. In Fig. 27(a), the standard deviation doubles compared with the case in Fig. 26. In addition, the steady VCONT [Fig. 27(b)] exhibits a larger ripple due to the increased jitter amplitude.

Fig. 28 shows the measurements under higher fjitter with a considerable Ajitter. With a 5-MHz, 0.5-UIPP jitter injection, Fig. 28(a) shows the corresponding dynamic process. Recalling the JTF in Fig. 24(a), the bandwidth of the proposed BBCDR is ∼12 MHz under a 0.2-UIPP jitter injection. Therefore, the 5-MHz jitter is well tracked by the loop, and we can observe a sinusoidal ripple at fjitter on VCONT under the PAM [Fig. 28(a)] (this phenomenon is detailed later). Fig. 28(c)–(f) show the dynamic processes and spectra with a 100-MHz fjitter, which is outside the loop bandwidth and is thus attenuated by the loop. As a result, the spurs in the spectra and the ripples on VCONT are smaller than those in the lower-fjitter cases (see Figs. 26 and 27).

Ajitter in Fig. 28(e) and (f) slightly exceeds the JTOL at 100 MHz. Thus, the dynamics show a two-phase process. In Phase I, the loop would lock at t1 if there were no jitter injection; however, the 0.42-UIPP jitter injection unlocks the loop. At t2, when VCONT exceeds VREF+, the loop returns to the NSP to start another frequency-acquisition process in Phase II and finally locks at t3. The spectrum [Fig. 28(f)] exhibits severely degraded phase noise, showing that the loop is on the verge of losing lock. These phenomena imply that the loop operates at the critical point between normal and abnormal operation.

For a quantitative analysis, we derive the ripple on VCONT due to the sinusoidal input jitter. Assume that the input clock includes only the desired carrier signal and the sinusoidal jitter. The excess phase of the input clock becomes

$$SJ_{\varphi}(t) = A_P \sin\left(2\pi f_{\mathrm{jitter}} t + \phi_0\right) \tag{1}$$

where AP denotes the amplitude of the sinusoidal jitter in rad and is equal to πAjitter, and ϕ0 is the initial phase. Without loss of generality, we set ϕ0 to 0 in the following derivation. Since frequency is the derivative of phase, the excess phase SJφ leads to a frequency fluctuation (fFLUC):

$$f_{\mathrm{FLUC}} = \frac{1}{2\pi}\,\frac{d\,SJ_{\varphi}(t)}{dt} = \pi f_{\mathrm{jitter}} A_{\mathrm{jitter}} \cos\left(2\pi f_{\mathrm{jitter}} t\right) \tag{2}$$

For a constant AP, (2) indicates that the higher the injected fjitter, the larger the fFLUC. Considering the gain of the VCO (KVCO), the fluctuation on VCONT is

$$V_{\mathrm{CONT,FLUC}} = \frac{f_{\mathrm{FLUC}}}{K_{\mathrm{VCO}}} = \frac{\pi f_{\mathrm{jitter}} A_{\mathrm{jitter}} \cos\left(2\pi f_{\mathrm{jitter}} t\right)}{K_{\mathrm{VCO}}} \tag{3}$$

Therefore, the amplitude of the VCONT fluctuation is

$$A_{\mathrm{VC,FLUC}} = \frac{\pi f_{\mathrm{jitter}} A_{\mathrm{jitter}}}{K_{\mathrm{VCO}}} \tag{4}$$

The measurements (Figs. 29 and 30) verify the above derivation. With a KVCO of ∼0.6 GHz/V, the calculated AVC,FLUC is 52.4 mV based on (4). From Fig. 29(c), we measure the zoomed-in VCONT with the active probe of a Keysight real-time oscilloscope (DSAV334A), which shows a 59.2-mV AVC,FLUC that includes the impact of the sinusoidal jitter, random jitter, and other contributions. In Fig. 30, we increase fjitter tenfold compared with Fig. 29, while Ajitter decreases from 1000 to 100 UIPP. Based on (4), AVC,FLUC remains unchanged. The measurements [Fig. 30(c)] show a 59.5-mV AVC,FLUC, almost the same as in Fig. 29, as expected.

Table I benchmarks our work against prior art [5], [17], [19], [22], [23]. Regarding power, Huang et al. [17] used current-mode-logic circuits to tune the delay on the high-speed data path and suffered from a 1.8-V VDD in 180-nm CMOS. Park et al. [19] needed a 4× oversampling ratio, thus consuming 18 mW in both the high-speed clock path and the bang-bang phase/FD at 10 Gb/s. In [23], the synthesized digital core and VCO consume 34.08 and 14.47 mW, respectively, in 28-nm CMOS at 32 Gb/s, whereas in [5], the FD logic and delay line alone consume 11 and 6.6 mW, respectively. Shu et al. [22] employed three loops in the overall CDR: 1) a multiplying delay-locked loop with a digitally controlled delay line; 2) a frequency-locking loop; and 3) a phase-locked loop, which together consume a total power of 15.3 mW. In the proposed design, the VCO core and the divider-by-two [35] draw 4.2 and 0.8 mW at 28 Gb/s, respectively. The balanced-threshold phase detection scheme for the PAM-4 signal [32], [33] reduces the number of high-speed DFFs by ∼2/3. Furthermore, the multiplexer-based LSB decoder [32], [33] draws only tens of nanowatts. Although we add an HCC to facilitate the frequency and phase acquisition, it handles only low-speed signals at several MHz and consumes 1.9 mW in 0.0001 mm². The inductor in the LC-VCO and the second-order LF occupy most of the core area. As a result, this work shows better energy (>3.3×) and area (>1.6×) efficiencies for recovering the clock from PAM-4 data with a 3× oversampling ratio.

Fig. 26. (a) Measured jitter histogram of the data-homologous input clock, (b) dynamic process, and (c) recovered clock under 0.2-UIPP, 100-kHz sinusoidal jitter injection at 27.6 Gb/s in Band 6.

Fig. 27. (a) Measured jitter histogram of the data-homologous input clock, (b) dynamic process, and (c) recovered clock under 0.4-UIPP, 1-MHz sinusoidal jitter injection at 27.6 Gb/s in Band 6.

Fig. 28. Measured dynamic processes and spectra of the recovered clock at 27.6 Gb/s in Band 6. (a) and (b) fjitter = 5 MHz and Ajitter = 0.5 UIPP. (c) and (d) fjitter = 100 MHz and Ajitter = 0.25 UIPP. (e) and (f) fjitter = 100 MHz and Ajitter = 0.42 UIPP.
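Several efficiency figures in this section follow from simple ratios of numbers already given in the text: the energy per bit is the total power divided by the data rate (1 mW per Gb/s equals 1 pJ/bit), the frequency-acquisition overhead is a share of the core area and power, and the average acquisition time is the data-rate jump divided by the average acquisition speed. The Python sketch below reproduces these numbers with hypothetical helper names. It assumes the "∼8 mW" overhead is exactly 8 mW and treats acquisition speed as an average data-rate change per unit time, as stated for Fig. 21.

```python
def energy_per_bit_pj(power_mw: float, data_rate_gbps: float) -> float:
    """Energy efficiency in pJ/bit: 1 mW / 1 Gb/s = 1e-3 J/s / 1e9 bit/s = 1 pJ/bit."""
    return power_mw / data_rate_gbps


def share_percent(part: float, total: float) -> float:
    """Percentage share of a sub-block in a total (area or power)."""
    return 100.0 * part / total


def acquisition_time_us(rate_jump_gbps: float, speed_gbps_per_us: float) -> float:
    """Average acquisition time implied by an average acquisition speed."""
    return rate_jump_gbps / speed_gbps_per_us


# Values quoted in the text (28-Gb/s operation, 28-nm prototype)
TOTAL_POWER_MW = 19.16
DATA_RATE_GBPS = 28.0
CORE_AREA_MM2 = 0.0285
FA_AREA_MM2 = 0.0006  # frequency-acquisition hardware area
FA_POWER_MW = 8.0     # "~8 mW" in the text, assumed exact here

print(f"energy efficiency: {energy_per_bit_pj(TOTAL_POWER_MW, DATA_RATE_GBPS):.2f} pJ/bit")
print(f"FA area share:     {share_percent(FA_AREA_MM2, CORE_AREA_MM2):.1f} %")
print(f"FA power share:    {share_percent(FA_POWER_MW, TOTAL_POWER_MW):.1f} %")
# 5.15-Gb/s jump (23.45 -> 28.6 Gb/s) at the reported average of 8.2 Gb/s/us
print(f"acquisition time:  {acquisition_time_us(28.6 - 23.45, 8.2):.2f} us")
```

The script prints 0.68 pJ/bit, 2.1 %, 41.8 % and 0.63 us. The last value is only the average implied by the quoted speed, not a separately reported acquisition time.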


Fig. 29. Measured (a) spectrum, (b) dynamic process, and (c) zoomed-in VCONT after locking under 1000-UIPP, 10-kHz sinusoidal jitter injection at 27.6 Gb/s in Band 6.

Fig. 30. Measured (a) spectrum, (b) dynamic process, and (c) zoomed-in VCONT after locking under 100-UIPP, 100-kHz sinusoidal jitter injection at 27.6 Gb/s in Band 6.
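Equations (1)–(4) can be cross-checked numerically: differentiate the sinusoidal excess phase of (1), convert it to a frequency deviation as in (2), scale by 1/KVCO as in (3), and compare the peak with the closed form (4). The Python sketch below does this for the operating points of Figs. 29 and 30. The function names are hypothetical, KVCO is taken as exactly 0.6 GHz/V (the text gives ∼0.6 GHz/V), and the model contains only the sinusoidal jitter, whereas the measured 59.2 and 59.5 mV also include random jitter and other effects, as noted in the text.

```python
import math


def sj_phase_rad(t: float, f_jitter: float, a_jitter_uipp: float, phi0: float = 0.0) -> float:
    """Excess phase of Eq. (1): A_P * sin(2*pi*f_jitter*t + phi0), with A_P = pi * A_jitter."""
    a_p = math.pi * a_jitter_uipp  # peak amplitude in rad: (A_jitter / 2) UI * 2*pi rad/UI
    return a_p * math.sin(2.0 * math.pi * f_jitter * t + phi0)


def ripple_closed_form(f_jitter: float, a_jitter_uipp: float, kvco_hz_per_v: float) -> float:
    """Eq. (4): A_VC,FLUC = pi * f_jitter * A_jitter / K_VCO, in volts."""
    return math.pi * f_jitter * a_jitter_uipp / kvco_hz_per_v


def ripple_numeric(f_jitter: float, a_jitter_uipp: float, kvco_hz_per_v: float,
                   n: int = 2000) -> float:
    """Eqs. (2)-(3) evaluated numerically over one jitter period; returns peak |V_CONT,FLUC|."""
    dt = 1.0 / (f_jitter * n)
    peak = 0.0
    for k in range(n):
        t = k * dt
        # Central difference approximates d(SJ_phi)/dt in rad/s
        dphi_dt = (sj_phase_rad(t + dt / 2, f_jitter, a_jitter_uipp)
                   - sj_phase_rad(t - dt / 2, f_jitter, a_jitter_uipp)) / dt
        f_fluc = dphi_dt / (2.0 * math.pi)             # Eq. (2): frequency deviation in Hz
        peak = max(peak, abs(f_fluc / kvco_hz_per_v))  # Eq. (3): control-voltage deviation in V
    return peak


KVCO = 0.6e9  # Hz/V, approximate value quoted in the text

for f_j, a_j in [(10e3, 1000.0), (100e3, 100.0)]:  # Fig. 29 and Fig. 30 conditions
    closed = ripple_closed_form(f_j, a_j, KVCO)
    numeric = ripple_numeric(f_j, a_j, KVCO)
    print(f"fjitter={f_j / 1e3:.0f} kHz, Ajitter={a_j:.0f} UIpp: "
          f"{closed * 1e3:.1f} mV (closed form), {numeric * 1e3:.1f} mV (numeric)")
```

Both operating points give about 52.4 mV with either method, consistent with the value calculated from (4) in the text; they coincide because (4) depends only on the product fjitter·Ajitter, which is 10⁷ Hz·UIPP in both cases.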
TABLE I
PERFORMANCE SUMMARY AND COMPARISON WITH STATE OF THE ART

In view of the acquisition speed, Huang et al. [17] used a 5-bit VCO to search the overall frequency range by jumping through the 31 SPs. Their design cannot use an overly large SP; otherwise, it loses lock. Thus, its single-sided polarity is weaker than that in the proposed design (Figs. 9–12), leading to a longer acquisition time. Because the frequency detection logic in [22] and [23] requires accumulation of the Fast/Slow signals, these designs also take a longer acquisition time. In this work, the strong single-sided polarity and a 3-bit VCO contribute to a fast acquisition speed of 8.2 Gb/s/μs for a data-rate jump of 5.15 Gb/s, which is at least 8.1× faster than recent art.

V. CONCLUSION

This article presented a full-rate PAM-4 BBCDR without the reference and the separate FD. Combining the deliberately inserted SPs and the HCC, the proposed BBCDR realized fast and robust frequency acquisition in a true single loop. Designed in 28-nm CMOS, the prototype measured a 23-to-29-Gb/s data rate with an acquisition speed of 8.2 Gb/s/μs. Occupying a core area of 0.0285 mm², it achieved a 0.68-pJ/bit energy efficiency at 28 Gb/s and a 0.4-UIPP JTOL at a 200-MHz jitter frequency, comparing favorably with recent CDRs that support only NRZ data streams.

REFERENCES

[1] S. B. Anand and B. Razavi, “A 2.75 Gb/s CMOS clock recovery circuit with broad capture range,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2001, pp. 214–215.
[2] J. Kenney et al., “A 6.5 Mb/s to 11.3 Gb/s continuous-rate clock and data recovery,” in Proc. IEEE Custom Integr. Circuits Conf. (CICC), San Jose, CA, USA, Sep. 2014, pp. 1–4.
[3] J. Lee and K.-C. Wu, “A 20-Gb/s full-rate linear clock and data recovery circuit with automatic frequency acquisition,” IEEE J. Solid-State Circuits, vol. 44, no. 12, pp. 3590–3602, Dec. 2009.
[4] M. S. Jalali, R. Shivnaraine, A. Sheikholeslami, M. Kibune, and H. Tamura, “An 8 mW frequency detector for 10 Gb/s half-rate CDR using clock phase selection,” in Proc. IEEE Custom Integr. Circuits Conf. (CICC), San Jose, CA, USA, Sep. 2013, pp. 1–4.
[5] M. S. Jalali, A. Sheikholeslami, M. Kibune, and H. Tamura, “A reference-less single-loop half-rate binary CDR,” IEEE J. Solid-State Circuits, vol. 50, no. 9, pp. 2037–2047, Sep. 2015.
[6] D. Dalton et al., “A 12.5-Mb/s to 2.7-Gb/s continuous-rate CDR with automatic frequency acquisition and data-rate readback,” IEEE J. Solid-State Circuits, vol. 40, no. 12, pp. 2713–2725, Dec. 2005.
[7] N. Kocaman, S. Fallahi, M. Kargar, M. Khanpour, and A. Momtaz, “An 8.5–11.5-Gbps SONET transceiver with referenceless frequency acquisition,” IEEE J. Solid-State Circuits, vol. 48, no. 8, pp. 1875–1884, Aug. 2013.
[8] J. Jin, X. Jin, J. Jung, K. Kwon, J. Kim, and J.-H. Chun, “A 0.75–3.0-Gb/s dual-mode temperature-tolerant referenceless CDR with a deadzone-compensated frequency detector,” IEEE J. Solid-State Circuits, vol. 53, no. 10, pp. 2994–3003, Oct. 2018.
[9] T. Masuda et al., “A 12 Gb/s 0.9 mW/Gb/s wide-bandwidth injection-type CDR in 28 nm CMOS with reference-free frequency capture,” IEEE J. Solid-State Circuits, vol. 51, no. 12, pp. 3204–3215, Dec. 2016.
[10] C.-L. Hsieh and S.-I. Liu, “A 1–16-Gb/s wide-range clock/data recovery circuit with a bidirectional frequency detector,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 58, no. 8, pp. 487–491, Aug. 2011.
[11] R. Inti, W. Yin, A. Elshazly, N. Sasidhar, and P. K. Hanumolu, “A 0.5-to-2.5 Gb/s reference-less half-rate digital CDR with unlimited frequency acquisition range and improved input duty-cycle error tolerance,” IEEE J. Solid-State Circuits, vol. 46, no. 12, pp. 3150–3162, Dec. 2011.
[12] Y. Tsunoda et al., “A 24-to-35Gb/s ×4 VCSEL driver IC with multi-rate referenceless CDR in 0.13 μm SiGe BiCMOS,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, San Francisco, CA, USA, Feb. 2015, pp. 414–415.
[13] J.-H. Yoon, S.-W. Kwon, and H.-M. Bae, “A DC-to-12.5 Gb/s 9.76 mW/Gb/s all-rate CDR with a single LC VCO in 90 nm CMOS,” IEEE J. Solid-State Circuits, vol. 52, no. 3, pp. 856–866, Mar. 2017.
[14] S. Choi et al., “A 0.65-to-10.5 Gb/s reference-less CDR with asynchronous baud-rate sampling for frequency acquisition and adaptive equalization,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 63, no. 2, pp. 276–287, Feb. 2016.
[15] Y.-L. Lee, S.-J. Chang, Y.-C. Chen, and Y.-P. Cheng, “An unbounded frequency detection mechanism for continuous-rate CDR circuits,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 64, no. 5, pp. 500–504, May 2017.
[16] K.-S. Son, T.-J. An, Y.-H. Moon, and J.-K. Kang, “A 0.42–3.45 Gb/s referenceless clock and data recovery circuit with counter-based unrestricted frequency acquisition,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 67, no. 6, pp. 974–978, Jun. 2020.
[17] S. Huang, J. Cao, and M. M. Green, “An 8.2 Gb/s-to-10.3 Gb/s full-rate linear referenceless CDR without frequency detector in 0.18 μm CMOS,” IEEE J. Solid-State Circuits, vol. 50, no. 9, pp. 2048–2060, Sep. 2015.
[18] K. Park, W. Bae, and D.-K. Jeong, “A 27.1 mW, 7.5-to-11.1 Gb/s single-loop referenceless CDR with direct up/dn control,” in Proc. IEEE Custom Integr. Circuits Conf. (CICC), Austin, TX, USA, Apr. 2017, pp. 1–4.
[19] K. Park, W. Bae, J. Lee, J. Hwang, and D.-K. Jeong, “A 6.7–11.2 Gb/s, 2.25 pJ/bit, single-loop referenceless CDR with multi-phase, oversampling PFD in 65-nm CMOS,” IEEE J. Solid-State Circuits, vol. 53, no. 10, pp. 2982–2993, Oct. 2018.
[20] K. Park et al., “A 4–20-Gb/s 1.87-pJ/b continuous-rate digital CDR circuit with unlimited frequency acquisition capability in 65-nm CMOS,” IEEE J. Solid-State Circuits, vol. 56, no. 5, pp. 1597–1607, May 2021.
[21] M. H. Perrott et al., “A 2.5-Gb/s multi-rate 0.25-μm CMOS clock and data recovery circuit utilizing a hybrid analog/digital loop filter and all-digital referenceless frequency acquisition,” IEEE J. Solid-State Circuits, vol. 41, no. 12, pp. 2930–2944, Dec. 2006.
[22] G. Shu et al., “A 4-to-10.5 Gb/s continuous-rate digital clock and data recovery with automatic frequency acquisition,” IEEE J. Solid-State Circuits, vol. 51, no. 2, pp. 428–439, Feb. 2016.
[23] W. Rahman et al., “A 22.5-to-32-Gb/s 3.2-pJ/b referenceless baud-rate digital CDR with DFE and CTLE in 28-nm CMOS,” IEEE J. Solid-State Circuits, vol. 52, no. 12, pp. 3517–3531, Dec. 2017.
[24] S.-K. Lee, Y.-S. Kim, H. Ha, Y. Seo, H.-J. Park, and J.-Y. Sim, “A 650 Mb/s-to-8 Gb/s referenceless CDR circuit with automatic acquisition of data rate,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, San Francisco, CA, USA, Feb. 2009, pp. 184–185.
[25] F.-T. Chen et al., “A 10-Gb/s low jitter single-loop clock and data recovery circuit with rotational phase frequency detector,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 61, no. 11, pp. 3278–3287, Nov. 2014.
[26] K. Park, M. Shim, H.-G. Ko, and D.-K. Jeong, “A 6.4-to-32 Gb/s 0.96 pJ/b referenceless CDR employing ML-inspired stochastic phase-frequency detection technique in 40 nm CMOS,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, San Francisco, CA, USA, Feb. 2020, pp. 124–126.
[27] R.-J. Yang, K.-H. Chao, S.-C. Hwu, C.-K. Liang, and S.-I. Liu, “A 155.52 Mbps–3.125 Gbps continuous-rate clock and data recovery circuit,” IEEE J. Solid-State Circuits, vol. 41, no. 6, pp. 1380–1390, Jun. 2006.
[28] S. H. Lin and S. I. Liu, “Full-rate bang-bang phase/frequency detectors for unilateral continuous-rate CDRs,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 55, no. 12, pp. 1214–1218, Dec. 2008.
[29] R. Shivnaraine, M. S. Jalali, A. Sheikholeslami, M. Kibune, and H. Tamura, “An 8–11 Gb/s reference-less bang-bang CDR enabled by ‘phase reset,’” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 61, no. 7, pp. 2129–2138, Jul. 2014.
[30] X. Zhao, Y. Chen, P.-I. Mak, and R. P. Martins, “A 0.0285 mm² 0.68 pJ/bit single-loop full-rate bang-bang CDR without reference and separate frequency detector achieving an 8.2 (Gb/s)/μs acquisition speed of PAM-4 data in 28 nm CMOS,” in Proc. IEEE Custom Integr. Circuits Conf. (CICC), Boston, MA, USA, Mar. 2020, pp. 1–4.
[31] D. Schinkel, E. Mensink, E. Klumperink, E. Van Tuijl, and B. Nauta, “A double-tail latch-type voltage sense amplifier with 18 ps setup+hold time,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, San Francisco, CA, USA, Feb. 2007, pp. 314–315.
[32] X. Zhao, Y. Chen, P.-I. Mak, and R. P. Martins, “A 0.14-to-0.29-pJ/bit 14-GBaud/s trimodal (NRZ/PAM-4/PAM-8) half-rate bang-bang clock and data recovery circuit (BBCDR) in 28-nm CMOS,” in Proc. IEEE Asia Pacific Conf. Circuits Syst. (APCCAS), Bangkok, Thailand, Nov. 2019, pp. 229–232.
[33] X. Zhao et al., “A 0.14-to-0.29-pJ/bit 14-GBaud/s trimodal (NRZ/PAM-4/PAM-8) half-rate bang-bang clock and data recovery circuit (BBCDR) in 28-nm CMOS,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 68, no. 1, pp. 89–102, Jan. 2021.
[34] M.-S. Hwang et al., “Reduction of pump current mismatch in charge-pump PLL,” IET Electron. Lett., vol. 45, no. 3, pp. 135–136, Jan. 2009.
[35] Y. Chen et al., “A 6.5×7 μm² 0.98-to-1.5 mW nonself-oscillation-mode frequency divider-by-2 achieving a single-band untuned locking range of 166.6% (4–44 GHz),” IEEE Solid-State Circuits Lett., vol. 2, no. 5, pp. 37–40, May 2019.
[36] S. Li, I. Kipnis, and M. Ismail, “A 10-GHz CMOS quadrature LC-VCO for multirate optical applications,” IEEE J. Solid-State Circuits, vol. 38, no. 10, pp. 1626–1634, Oct. 2003.
[37] X. Ge, Y. Chen, X. Zhao, P.-I. Mak, and R. P. Martins, “Analysis and verification of jitter in bang-bang clock and data recovery circuit with a second-order loop filter,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 27, no. 10, pp. 2223–2236, Oct. 2019.
[38] S. Lee et al., “250 Mbps–5 Gbps wide-range CDR with digital Vernier phase shifting and dual-mode control in 0.13 μm CMOS,” IEEE J. Solid-State Circuits, vol. 46, no. 11, pp. 2560–2570, Nov. 2011.
[39] S. Park et al., “A 0.1–1.5-GHz wide harmonic-locking-free delay-locked loop using an exponential DAC,” IEEE Microw. Wireless Compon. Lett., vol. 29, no. 8, pp. 548–550, Aug. 2019.


Xiaoteng Zhao (Student Member, IEEE) received the B.Eng. degree in integrated circuit design and integration system from Xidian University, Xi’an, Shaanxi, China, in 2014, and the M.Eng. degree in integrated circuit engineering from the University of Chinese Academy of Sciences, Beijing, China, in 2017. He is currently pursuing the Ph.D. degree with the State Key Laboratory of Analog and Mixed-Signal VLSI (AMSV), University of Macau, Macau, China.
His research interests are focused on high-speed wireline backplane transceiver design.

Yong Chen (Senior Member, IEEE) received the B.Eng. degree in electronic and information engineering from the Communication University of China (CUC), Beijing, China, in 2005, and the Ph.D. degree in microelectronics and solid-state electronics from the Institute of Microelectronics, Chinese Academy of Sciences (IMECAS), Beijing, in 2010.
From 2010 to 2013, he worked as a Post-Doctoral Researcher at the Institute of Microelectronics, Tsinghua University, Beijing. From 2013 to 2016, he was a Research Fellow at VIRTUS/EEE, Nanyang Technological University, Singapore. He has been an Assistant Professor with the State Key Laboratory of Analog and Mixed-Signal VLSI (AMSV), University of Macau, Macao, China, since March 2016. His research interests include integrated circuit designs involving analog/mixed-signal/RF/millimeter-wave/sub-terahertz/wireline.
Dr. Chen was a recipient of the “Haixi” (three places across the Straits) postgraduate integrated circuit design competition (Second Prize) in 2009. He was a co-recipient of the Best Paper Award at the IEEE Asia Pacific Conference on Circuits and Systems (APCCAS) in 2019, the Best Student Paper Award (Third Place) at the IEEE Radio Frequency Integrated Circuits (RFIC) Symposium in 2021, and the Macao Science and Technology Invention Award (First Prize) in 2020. His team reported three chip inventions at the IEEE International Solid-State Circuits Conference (ISSCC; Chip Olympics): mm-wave PLL 2019, VCO 2019, and radio frequency VCO 2021. He has been serving as an Associate Editor for IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (TVLSI) SYSTEMS since 2019, IEEE ACCESS since 2019, and IET Electronics Letters (EL) since 2020, an Editor of International Journal of Circuit Theory and Applications (IJCTA) since 2020, and a Guest Editor of IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS since 2021. He serves as the Vice-Chair from 2019 to 2021 and the Chair for the term 2021–2023 of the IEEE Macau CAS Chapter. He was the Tutorial Chair of ICCS in 2020, a Conference Local Organization Committee Member of Asian Solid-State Circuits Conference (A-SSCC) in 2019, a member of the IEEE Circuits and Systems Society, Circuits and Systems for Communications (CASCOM) Technical Committee for the term 2020–2021, a member of the Technical Program Committee (TPC) of A-SSCC in 2021, APCCAS for the term 2019–2021, ICTA for the term 2020–2021, NorCAS for the term 2020–2021, ICECS in 2021, and ICSICT in 2020, a Review Committee Member of ISCAS in 2021, and the TPC Co-Chair of ICCS in 2021.

Pui-In Mak (Fellow, IEEE) received the Ph.D. degree from the University of Macau (UM), Macao, China, in 2006.
He is currently a Full Professor at the Department of Electrical and Computer Engineering, Faculty of Science and Technology, and the Associate Director (Research) at the UM Institute of Microelectronics and State Key Laboratory of Analog and Mixed-Signal VLSI, UM. His research interests are on analog and radio frequency (RF) circuits and systems for wireless and multidisciplinary innovations.
Dr. Mak has been a fellow of U.K. Institution of Engineering and Technology (IET) for contributions to engineering research, education, and services since 2018, the IEEE for contributions to radio frequency and analog circuits since 2019, and U.K. Royal Society of Chemistry since 2020. His involvements with IEEE are as follows: an Editorial Board Member of IEEE Press from 2014 to 2016, a member of Board of Governors of the IEEE Circuits and Systems Society from 2009 to 2011, a Senior Editor of IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS from 2014 to 2015, and an Associate Editor of IEEE JOURNAL OF SOLID-STATE CIRCUITS since 2018, IEEE SOLID-STATE CIRCUITS LETTERS since 2017, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS from 2010 to 2011 and from 2014 to 2015, and IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS from 2010 to 2013. He is/was the TPC Vice Co-Chair of ASP-DAC in 2016, a TPC Member of Asian Solid-State Circuits Conference (A-SSCC) from 2013 to 2016 and in 2019, European Solid-State Circuits Conference (ESSCIRC) from 2016 to 2017, and International Solid-State Circuits Conference (ISSCC) from 2017 to 2019. He is/was a Distinguished Lecturer of the IEEE Circuits and Systems Society from 2014 to 2015 and the IEEE Solid-State Circuits Society from 2017 to 2018. He was the Chairman of the Distinguished Lecturer Program of IEEE Circuits and Systems Society from 2018 to 2019. He (co)-received the DAC/ISSCC Student Paper Award in 2005, the CASS Outstanding Young Author Award in 2010, the National Scientific and Technological Progress Award in 2011, the Best Associate Editor of IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS from 2012 to 2013, the A-SSCC Distinguished Design Award in 2015, and the ISSCC Silkroad Award in 2016. In 2005, he was decorated with the Honorary Title of Value for scientific merits by the Macau Government. He has been inducted as an Overseas Expert of the Chinese Academy of Sciences since 2018.

Rui P. Martins (Fellow, IEEE) was born in April 1957. He received the bachelor’s, master’s, Ph.D., and the Habilitation for Full-Professor degrees in electrical engineering and computer science from the Department of Electrical and Computer Engineering (DECE), Instituto Superior Técnico (IST), University of Lisbon, Lisbon, Portugal, in 1980, 1985, 1992, and 2001, respectively.
He has been with the DECE/IST, University of Lisbon, since October 1980. Since October 1992, he has been on leave from the University of Lisbon and with the DECE, Faculty of Science and Technology (FST), University of Macau (UM), Macao, China, where he has been a Chair-Professor since August 2013. In FST, he was Dean from 1994 to 1997, and he has been a Vice-Rector of UM since September 1997. From September 2008 to August 2018, he was a Vice-Rector of Research, and from September 2018 to August 2023, he was a Vice-Rector of Global Affairs. In 2003, he created the Analog and Mixed-Signal VLSI Research Laboratory of UM, elevated in January 2011 to the State Key Laboratory (SKLAB) of China (the first in Engineering in Macao), being its Founding Director. He was the Founding Chair of UMTEC (UM company) from January 2009 to March 2019, supporting the incubation and creation in 2018 of Digifluidic, the first UM spin-off, whose CEO is a SKLAB Ph.D. graduate. He was a Co-Founder of Chipidea Microelectronics [later Synopsys-Macao], Macao, from 2001 to 2002. Within the scope of his teaching and research activities, he has taught 21 bachelor and master courses and, in UM, has supervised (or cosupervised) 47 theses, Ph.D. (26) and masters (21). He has authored or coauthored eight books and 12 book chapters; 48 patents, USA (38), Taiwan (3), and China (7); 628 papers, in scientific journals (263) and conference proceedings (365); and other 69 academic works, in a total of 765 publications.
Dr. Martins was a member of the IEEE CASS Fellow Evaluation Committee (2013, 2014, 2018—Chair and 2019, 2021, and 2022—Vice-Chair), the IEEE Nominating Committee of Division I Director (CASS/EDS/SSCS) (2014), and the IEEE CASS Nominations Committee from 2016 to 2017. He received two Macao Government decorations: the Medal of Professional Merit (Portuguese 1999) and the Honorary Title of Value (Chinese 2001). In July 2010, he was elected, unanimously, as a Corresponding Member of the Lisbon Academy of Sciences, being the only Portuguese Academician working and living in Asia. He was the Founding Chair of the IEEE Macau Section from 2003 to 2005 and the IEEE Macau Joint-Chapter on Circuits and Systems (CAS)/Communications (COM) from 2005 to 2008 [2009 World Chapter of the Year of IEEE CAS Society (CASS)], the General Chair of the IEEE Asia-Pacific Conference on CAS (APCCAS 2008), the Vice-President (VP) of Region 10 (Asia, Australia, and Pacific) from 2009 to 2011, and VP-World Regional Activities and Membership of IEEE CASS from 2012 to 2013. He was an Associate Editor of IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS from 2010 to 2013, nominated Best Associate Editor for the term 2012–2013. He was the General Chair of the ACM/IEEE Asia South Pacific Design Automation Conference (ASP-DAC 2016), receiving the IEEE Council on Electronic Design Automation (CEDA) Outstanding Service Award in 2016, and the General Chair of the IEEE Asian Solid-State Circuits Conference (A-SSCC 2019). He was the Vice-President from 2005 to 2014 and the President from 2014 to 2017 of the Association of Portuguese Speaking Universities (AULP).
