I. INTRODUCTION
Authorized licensed use limited to: University of Science & Technology of China. Downloaded on May 30,2022 at 01:26:27 UTC from IEEE Xplore. Restrictions apply.
1200 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 57, NO. 4, APRIL 2022
POON et al.: 1.24-pJ/b 112-Gb/s (870 Gb/s/mm) TRANSCEIVER FOR IN-PACKAGE LINKS IN 7-nm FinFET 1201
parallel clock), as shown in Fig. 9(a). The error samples are shown in Fig. 9(b).

D. Quadrature Clock Generation

As shown in Fig. 4, quadrature clocks are generated per lane using the ILO-PI-ILO scheme. To achieve good PI linearity, eight clock phases are generated from a single-phase clock using a ring-based ILO (ILO1). The PI drives another identical ILO (ILO2) to generate the four sampling phases needed in the slicers.

The ILO-PI-ILO structure shares the same supply and is tuned using a single quadrature error correction loop (QLL). The quadrature error is sensed at the ILO2 output and the supply voltage is tuned accordingly (typically in the range of 0.4–0.7 V), so that the natural oscillation frequency of the ILOs stays close to the injection frequency, which is optimal for injection locking. Since sensing is done at the ILO2 output (slicer input), <2° phase error is achieved at the RX slicer at 14 GHz. This comes at the expense of phase error at the ILO1 output, which is <5° even though the two ILOs are identical, owing to Monte Carlo (mismatch) variation. The ILO1 outputs drive the RX PI inputs and are the source of the TX I/Q clocks, so the phase error from ILO1 degrades PI linearity and TX output jitter, but only to acceptable levels. A digital timing calibration loop, described in the TX section, minimizes the jitter due to I/Q phase error. The integral non-linearity (INL) of the PI is less than 3.5 LSB including mismatch, which is acceptable for source-synchronous links.

To accommodate the wideband operation of 4–16 GHz while avoiding a large KVCO and false sub-harmonic locking, the ILOs are segmented into three operating ranges (4–6, 6–10, and 10–16 GHz). This also ensures that the adjusted supply of the structure (regulated from 0.88 V) stays within 0.4–0.7 V in each band, avoiding LDO headroom issues.

The ILO-PI-ILO structure provides two major benefits: small area and clock jitter filtering. An ILO has a low-pass noise response, which filters the high-frequency jitter of its input clock (e.g., DCD and RJ from the PI).

During initial coarse frequency tracking, the off-chip control logic searches for the correct band setting of the ILOs and sweeps the on-chip voltage DAC, comparing the injection frequency with the free-running frequency of the ILOs. Once coarse tuning is done, the QLL is enabled to lock the frequency of the ILOs by quadrature error cancellation [11].

Both ILOs are identical to minimize phase error caused by mismatch. The LDO is placed between the two ILOs to reduce frequency mismatch caused by IR drop. In addition, ILO1 and the PI are placed close to each other to save power in delivering the eight-phase clocks.

IV. TRANSMITTER

Fig. 10 shows the TX architecture: a 4:1 MUX as the final serialization stage, followed by a voltage-mode driver that employs a delay-based sub-UI two-tap FFE and supports up to 0.7-V diff-pp swing.

A voltage-mode driver is chosen for its power benefit. The TX impedance calibration is done digitally, based on an externally calibrated resistor, to ensure good return loss (RL). A delay-based sub-UI FFE provides up to 2 dB of pre-cursor equalization. Post-cursor ISI is equalized by the RX CTLE.

Using a 4:1 MUX as the final muxing stage eliminates the need for a high-speed 2-UI clock (28 GHz), enabling the TX clock distribution to operate at a low regulated voltage of 0.7 V for power saving and supply noise rejection. Since the 4:1 MUX uses all edges of the quadrature clocks, any duty-cycle distortion (DCD) or quadrature (I/Q) error translates directly into deterministic jitter. To correct these phase errors, an area-efficient digital calibration loop is proposed.

A. TX Front End

The TX front end comprises a pre-driver followed by the voltage-mode driver. To enable a large fan-out for power reduction, sub-UI de-emphasis is adopted at the pre-driver [6]. A four-stage CMOS pre-driver with an average fan-out of 2 drives the voltage-mode driver at low power while keeping the data-dependent jitter below 1 ps (56 mUI) at 56-Gb/s NRZ.

Fig. 11 shows the voltage-mode driver with a programmable two-tap FFE capable of 2-dB pre-cursor equalization. A three-stage inverter chain provides the open-loop delay (12–17 ps across PVT) in the two-tap sub-UI FFE, eliminating the need for the flip-flops and clock buffers typically associated with tap generation. The drawback of this FFE implementation is
that the delay does not scale with data rate. However, at lower data rates the need for equalization diminishes, justifying this tradeoff for its area and power benefit.

Fig. 12. Voltage-mode driver slice: (a) with programmability and (b) without programmability.
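The coarse frequency search described in the quadrature clock generation subsection (band selection followed by a voltage-DAC sweep that compares the injection frequency with the ILO free-running frequency) can be sketched as follows. This is a behavioral toy under stated assumptions, not the authors' control logic: the linear frequency-versus-supply model inside each band, the 6-bit DAC, and the helper names `dac_to_volts`, `free_running_ghz`, and `coarse_tune` are all hypothetical. Only the three band edges and the 0.4–0.7 V supply range come from the text.

```python
# Behavioral sketch of the off-chip coarse tuning (hypothetical model).
BANDS_GHZ = [(4.0, 6.0), (6.0, 10.0), (10.0, 16.0)]  # bands from the text
V_MIN, V_MAX = 0.4, 0.7   # regulated ILO supply range from the text
DAC_BITS = 6              # hypothetical voltage-DAC resolution
N_CODES = 2 ** DAC_BITS


def dac_to_volts(code: int) -> float:
    """Map a DAC code to the regulated ILO supply (linear, hypothetical)."""
    return V_MIN + code * (V_MAX - V_MIN) / (N_CODES - 1)


def free_running_ghz(band: int, volts: float) -> float:
    """Toy model: the free-running frequency spans the band linearly over 0.4-0.7 V."""
    f_lo, f_hi = BANDS_GHZ[band]
    return f_lo + (f_hi - f_lo) * (volts - V_MIN) / (V_MAX - V_MIN)


def coarse_tune(f_inj_ghz: float) -> tuple[int, int]:
    """Select the band that contains f_inj, then sweep the DAC upward and stop
    at the first code whose free-running frequency reaches f_inj."""
    for band, (f_lo, f_hi) in enumerate(BANDS_GHZ):
        if f_lo <= f_inj_ghz <= f_hi:
            break
    else:
        raise ValueError("injection frequency outside the 4-16 GHz range")
    for code in range(N_CODES):
        if free_running_ghz(band, dac_to_volts(code)) >= f_inj_ghz:
            return band, code
    return band, N_CODES - 1


if __name__ == "__main__":
    band, code = coarse_tune(13.0)
    print(f"band {band}, DAC code {code}, supply {dac_to_volts(code):.3f} V")
```

In this sketch the sweep leaves the ILO within one DAC step of the injection frequency; the remaining error is what the fine QLL loop is meant to remove.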
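The fine QLL step, in which the quadrature error sensed at the ILO2 output steers the shared supply, can be illustrated with a sign-only (bang-bang) update. Everything quantitative here is an assumption for illustration: the linear free-running model of the 10–16 GHz band, the proportionality between detuning and quadrature error, its gain, the 1-mV supply step, and the sign convention. The sketch only shows the loop converging to, and dithering around, the supply that cancels the quadrature error.

```python
# Toy model of the QLL fine loop (all constants hypothetical).
V_MIN, V_MAX = 0.4, 0.7


def free_running_ghz(volts: float) -> float:
    """Assumed linear tuning of the 10-16 GHz band over the 0.4-0.7 V supply."""
    return 10.0 + 20.0 * (volts - V_MIN)


def quadrature_error_deg(f_inj_ghz: float, volts: float,
                         gain_deg_per_ghz: float = 4.0) -> float:
    """Assumed: quadrature error is proportional to injection detuning.
    A positive value means the ILO runs slower than the injection."""
    return gain_deg_per_ghz * (f_inj_ghz - free_running_ghz(volts))


def qll(f_inj_ghz: float, v_start: float, step_v: float = 1e-3,
        iterations: int = 400) -> tuple[float, float]:
    """Bang-bang supply update driven only by the sign of the sensed error."""
    v = v_start
    for _ in range(iterations):
        err = quadrature_error_deg(f_inj_ghz, v)
        v += step_v if err > 0 else -step_v   # raise supply if the ILO is too slow
        v = min(max(v, V_MIN), V_MAX)         # stay inside the LDO range
    return v, quadrature_error_deg(f_inj_ghz, v)


if __name__ == "__main__":
    v, err = qll(14.0, v_start=0.55)
    print(f"supply {v:.3f} V, residual quadrature error {err:+.2f} deg")
```

A sign-only update needs just a comparator at the sensing point; its cost is a steady-state dither of one supply step around the lock point.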
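The low-pass jitter response that the ILO-PI-ILO structure relies on can be illustrated with a first-order phase-tracking model. This is an assumption, not the paper's ILO model: each output edge moves a hypothetical fraction `alpha` of the way toward the corresponding input edge, which acts as a first-order low-pass filter on phase. DCD alternates the sign of the edge error on successive edges, so it sits at the highest frequency and is strongly attenuated, whereas a static phase offset passes through.

```python
import math


def ilo_phase_filter(phase_in_ps, alpha=0.1):
    """First-order low-pass model of an ILO tracking its input phase.
    alpha is a hypothetical per-edge tracking coefficient: a smaller alpha
    corresponds to a narrower effective tracking bandwidth."""
    out, y = [], 0.0
    for x in phase_in_ps:
        y += alpha * (x - y)
        out.append(y)
    return out


if __name__ == "__main__":
    # DCD on the input clock: successive edges alternate +/-1 ps around ideal.
    dcd = [1.0 if n % 2 == 0 else -1.0 for n in range(400)]
    residual = max(abs(v) for v in ilo_phase_filter(dcd)[-20:])
    print(f"DCD in: 1.000 ps, out: {residual:.3f} ps "
          f"({20 * math.log10(1.0 / residual):.1f} dB attenuation)")
    # DCD in: 1.000 ps, out: 0.053 ps (25.6 dB attenuation)
```

For this model the steady-state DCD gain is alpha / (2 - alpha), so the attenuation grows as the tracking coefficient shrinks.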
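The timing figures quoted in the TX overview and front-end description follow from simple unit arithmetic, shown below as a cross-check. The helper names are hypothetical. The inputs come from the text: 112 Gb/s PAM-4 (2 bits per symbol, i.e., 56 GBd), the 2-UI and 4-UI clocks, 1 ps of data-dependent jitter at 56-Gb/s NRZ, and four pre-driver stages with an average fan-out of 2.

```python
def ui_ps(baud_gbd: float) -> float:
    """Unit interval in picoseconds for a symbol rate in GBd."""
    return 1000.0 / baud_gbd


def clock_ghz(baud_gbd: float, ui_per_period: int) -> float:
    """Frequency of a clock whose period spans ui_per_period unit intervals."""
    return baud_gbd / ui_per_period


def ps_to_mui(t_ps: float, baud_gbd: float) -> float:
    """Express a time interval in milli-UI."""
    return 1000.0 * t_ps / ui_ps(baud_gbd)


if __name__ == "__main__":
    baud = 112.0 / 2  # 112 Gb/s PAM-4 carries 2 bits/symbol -> 56 GBd
    print(f"UI = {ui_ps(baud):.2f} ps")
    print(f"2-UI clock = {clock_ghz(baud, 2):.0f} GHz, "
          f"4-UI clock = {clock_ghz(baud, 4):.0f} GHz")
    print(f"1 ps DDJ at 56 Gb/s NRZ = {ps_to_mui(1.0, 56.0):.0f} mUI")
    # Assuming a uniform per-stage fan-out of 2, four stages scale drive by 2**4.
    print(f"cumulative fan-out of 4 stages = {2 ** 4}")
```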
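The relationship between the pre-cursor tap weight and the 2-dB equalization figure can be estimated with the textbook full-UI two-tap FFE under a constant-swing constraint (|c_pre| + c_main = 1, as when a fixed pool of driver slices is split between taps). This is an idealization, not the paper's delay-based circuit: with a sub-UI tap the effective boost also depends on the 12–17 ps delay relative to the UI, which `delay_in_ui` expresses. The 40-slice driver used to quantize the tap weight is hypothetical.

```python
import math


def boost_db(c_pre: float) -> float:
    """Boost of a normalized two-tap FFE (|c_pre| + c_main = 1): the ratio of the
    full transition amplitude (1) to the steady-state amplitude (1 - 2|c_pre|)."""
    return -20.0 * math.log10(1.0 - 2.0 * abs(c_pre))


def pre_tap_for_boost(db: float) -> float:
    """Inverse of boost_db: pre-cursor weight that yields `db` of boost."""
    return (1.0 - 10.0 ** (-db / 20.0)) / 2.0


def quantize_pre_tap(c_pre: float, n_slices: int) -> tuple[int, float]:
    """Round the pre-cursor weight to whole driver slices (hypothetical count)."""
    k = round(c_pre * n_slices)
    return k, boost_db(k / n_slices)


def delay_in_ui(delay_ps: float, baud_gbd: float) -> float:
    """Fixed FFE tap delay expressed in UI; because the delay does not scale,
    it becomes a smaller fraction of the UI as the data rate drops."""
    return delay_ps * baud_gbd / 1000.0


if __name__ == "__main__":
    c = pre_tap_for_boost(2.0)
    k, achieved = quantize_pre_tap(c, n_slices=40)
    print(f"c_pre for 2 dB = {c:.4f}; {k}/40 slices gives {achieved:.2f} dB")
    for baud in (56.0, 28.0):
        lo, hi = delay_in_ui(12.0, baud), delay_in_ui(17.0, baud)
        print(f"{baud:.0f} GBd: tap delay spans {lo:.2f}-{hi:.2f} UI")
```

The last loop illustrates the stated tradeoff: at half the symbol rate, the fixed delay covers only about half as much of a UI, which matters less because less equalization is needed there.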
The two variants of the driver slice are shown in Fig. 12: one includes gating logic to enable or disable the slice, as in Fig. 12(a), while the other omits it, as in Fig. 12(b). The number of driver slices is programmable, providing a ±20% tuning range with 5% resolution to cover resistance variation over PVT. Both variants share the same basic structure: a Hi-R poly resistor and an active device operating in the triode region for pull-up/pull-down. The ratio of the resistance contributed by the Hi-R poly resistor to that of the active device is approximately 1.5:1. The lower bound of this ratio is set by output linearity, since the active device resistance becomes non-linear with a large drain–source voltage drop. A T-coil inductor helps maintain good RL at the driver output, which also carries the ESD diode.

B. Digital Impedance Calibration

An analog impedance calibration loop often occupies a large area because of the large devices needed for low offset and the capacitors needed for stability. An all-digital approach is adopted to circumvent this problem. As shown in Fig. 13, a digital impedance calibration circuit employs a dynamic comparator to compare the pull-up and pull-down resistances of a driver replica (an LSB half-cell replica) against a calibrated resistor (Rtrim). Rtrim is digitally calibrated to the ideal output resistance of the LSB driver (50 Ω × 3 = 150 Ω) and serves as the target resistance for the replica pull-up and pull-down paths. To relax the comparator noise requirement, a large number of samples (nominally 1024) is averaged.

There are three steps in the calibration. First, the dynamic comparator offset is calibrated out. With the positive and negative comparator inputs shorted, any offset results in an uneven number of sampled "1s" and "0s." Based on this result, the comparator offset is corrected by adjusting the capacitive DAC at the drain node of the input devices (Fig. 13). The maximum CDAC step size is 1.8 mV referred to the comparator input, which is less than half of the voltage resolution of Vdrv. Once the comparator offset is corrected, the replica pull-up path is connected in series with the trimmed resistor. If the midpoint voltage Vdrv in Fig. 13 is higher than the reference voltage (0.5 × Vrefp), the replica pull-up slice count is reduced, and vice versa. Finally, a similar approach is taken to calibrate the replica pull-down resistance against Rtrim by putting them in series. The calibration codes are sent to the TX driver once all three loops are done.

The replica path allows background calibration to track voltage drift, which happens slowly, and the all-digital loop allows the calibration to be power cycled or duty cycled to further reduce power consumption.

Fig. 14. TX 4:1 MUX diagram.

C. Digital Timing Calibration

The final 4:1 MUX uses four-phase 4-UI clocks to generate a 1-UI bitstream (Fig. 14). Since only 4-UI clocks are used in
Fig. 16. 4:1 MUX output (Dout ) with training patterns A and B. Perfect I/Q
clock alignment (left). Misalignment between I/Q clocks (right).
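The ±20% slice programmability with 5% resolution described in the TX front-end subsection, together with the approximately 1.5:1 split between the Hi-R poly resistor and the triode device, can be illustrated with the arithmetic below. Assumptions: a nominal driver of 20 identical slices (so that one slice is 5% and 16–24 slices span ±20%), a 50-Ω per-side target, and an ideal choice of slice count. In silicon the count is set by the digital calibration loop rather than computed from a known slice resistance.

```python
R_TARGET = 50.0          # ohms, per-side output impedance target
N_NOMINAL = 20           # hypothetical: one slice = 5% of the driver
N_MIN, N_MAX = 16, 24    # +/-20% around nominal in 5% steps


def pick_slice_count(r_slice_ohm: float) -> tuple[int, float]:
    """Choose how many parallel slices to enable so that r_slice/k is closest
    to the 50-ohm target, within the programmable range."""
    k = min(range(N_MIN, N_MAX + 1), key=lambda n: abs(r_slice_ohm / n - R_TARGET))
    return k, r_slice_ohm / k


def split_slice(r_slice_ohm: float, poly_to_device: float = 1.5) -> tuple[float, float]:
    """Split a slice's resistance into (poly, triode device) using the ~1.5:1 ratio."""
    r_device = r_slice_ohm / (1.0 + poly_to_device)
    return r_slice_ohm - r_device, r_device


if __name__ == "__main__":
    nominal = R_TARGET * N_NOMINAL           # 1000-ohm slices at the nominal corner
    for corner in (-0.20, 0.0, 0.10, 0.25):  # slice resistance variation over PVT
        k, r_out = pick_slice_count(nominal * (1.0 + corner))
        print(f"{corner:+.0%} slice R -> {k} slices, {r_out:.1f} ohm")
    print("poly/device split of a 1000-ohm slice:", split_slice(1000.0))
```

The +25% corner shows the limit of the range: the count saturates at 24 slices and the output settles about 4% above target.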
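The first digital impedance calibration step, comparator offset cancellation with shorted inputs and 1024-sample averaging, can be sketched as a sign-driven search on the CDAC code. Assumptions: the CDAC is modeled as an ideal input-referred subtraction of 1.8 mV per code (the maximum step quoted in the text), the comparator noise is Gaussian with a hypothetical 2-mV rms, and the search stops at the first reversal of the majority decision. The sketch shows that averaging many decisions lets a noisy comparator resolve a residual offset smaller than its rms noise.

```python
import random


def count_ones(residual_mv: float, n: int, sigma_mv: float, rng: random.Random) -> int:
    """Number of '1' decisions with the comparator inputs shorted (0 V input)."""
    return sum(1 for _ in range(n) if residual_mv + rng.gauss(0.0, sigma_mv) > 0.0)


def calibrate_offset(offset_mv: float, step_mv: float = 1.8, n: int = 1024,
                     sigma_mv: float = 2.0, max_code: int = 15,
                     seed: int = 1) -> tuple[int, float]:
    """Step the CDAC code toward balanced 1s/0s; stop when the majority flips."""
    rng = random.Random(seed)
    code, last_dir = 0, 0
    for _ in range(2 * max_code + 1):
        ones = count_ones(offset_mv - code * step_mv, n, sigma_mv, rng)
        direction = 1 if ones > n // 2 else -1   # more 1s: residual still positive
        if last_dir and direction != last_dir:
            break                                # residual is within one step
        code = max(-max_code, min(max_code, code + direction))
        last_dir = direction
    return code, offset_mv - code * step_mv


if __name__ == "__main__":
    code, residual = calibrate_offset(7.3)
    print(f"CDAC code {code}, residual offset {residual:+.2f} mV")
```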
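The pull-up and pull-down calibration steps can be modeled as a resistor divider between the replica and Rtrim, with the comparator checking Vdrv against 0.5 × Vrefp. The pull-up rule (reduce the slice count when Vdrv is high) follows the text. The divider orientation, the mirrored rule for the pull-down path, the unit-slice resistances, the normalized Vrefp, and the stop-on-first-reversal criterion are assumptions for illustration. The comparator is treated as ideal because its offset has already been calibrated out.

```python
from typing import Callable

R_TRIM = 150.0    # ohms: 50 ohm x 3 target for the LSB half-cell replica (from the text)
V_REFP = 1.0      # normalized reference (hypothetical)


def vdrv_pullup(r_pu: float) -> float:
    """Assumed divider: replica pull-up from Vrefp to Vdrv, Rtrim from Vdrv to ground."""
    return V_REFP * R_TRIM / (r_pu + R_TRIM)


def vdrv_pulldown(r_pd: float) -> float:
    """Assumed divider: Rtrim from Vrefp to Vdrv, replica pull-down to ground."""
    return V_REFP * r_pd / (R_TRIM + r_pd)


def calibrate(r_unit: float, vdrv: Callable[[float], float], high_adds_slices: bool,
              k_start: int = 20, k_min: int = 1, k_max: int = 40) -> int:
    """Adjust the number of parallel replica slices (each r_unit ohms) until the
    comparator decision reverses; the result is within one slice of balance."""
    k, last = k_start, 0
    for _ in range(k_max):
        high = vdrv(r_unit / k) > 0.5 * V_REFP
        step = 1 if high == high_adds_slices else -1
        if last and step != last:
            break
        k = max(k_min, min(k_max, k + step))
        last = step
    return k


if __name__ == "__main__":
    k_pu = calibrate(3250.0, vdrv_pullup, high_adds_slices=False)
    k_pd = calibrate(2800.0, vdrv_pulldown, high_adds_slices=True)
    print(f"pull-up: {k_pu} slices -> {3250.0 / k_pu:.1f} ohm")
    print(f"pull-down: {k_pd} slices -> {2800.0 / k_pd:.1f} ohm")
```

Because each loop only needs the sign of one comparison per step, the same comparator and counter can be reused for all three calibration steps, consistent with the all-digital, duty-cyclable approach described in the text.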
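A four-edge model shows why I/Q and duty-cycle errors become deterministic jitter in the 4:1 MUX, and hence why the digital timing calibration is needed. Assumption: the four output transitions per 4-UI period are taken, in order, from the I rising, Q rising, I falling, and Q falling edges, so Q skew shifts the second and fourth transitions and DCD shifts the falling-edge transitions. The <5° ILO1 phase-error bound from the quadrature clock generation subsection is used only as an illustrative input for a 14-GHz (4-UI at 56 GBd) clock.

```python
def deg_to_ps(deg: float, clock_ghz: float) -> float:
    """Convert a phase error in degrees of a clock to picoseconds."""
    return deg / 360.0 * 1000.0 / clock_ghz


def mux_edge_errors(iq_skew_ps=0.0, dcd_i_ps=0.0, dcd_q_ps=0.0):
    """Timing error of each output transition versus the ideal 1-UI grid,
    in order: I rise, Q rise, I fall, Q fall (hypothetical error model)."""
    return [0.0, iq_skew_ps, dcd_i_ps, iq_skew_ps + dcd_q_ps]


def mux_bit_widths(ui_ps: float, errors):
    """Widths of the four serialized symbols within one 4-UI period."""
    edges = [k * ui_ps + e for k, e in enumerate(errors)]
    edges.append(4 * ui_ps + errors[0])  # first edge of the next period
    return [edges[k + 1] - edges[k] for k in range(4)]


def dj_pp(errors) -> float:
    """Peak-to-peak deterministic jitter of the transitions."""
    return max(errors) - min(errors)


if __name__ == "__main__":
    ui = 1000.0 / 56.0                       # 56 GBd -> 17.86 ps
    skew = deg_to_ps(5.0, 14.0)              # 5 deg of a 14-GHz 4-UI clock
    errs = mux_edge_errors(iq_skew_ps=skew)
    widths = mux_bit_widths(ui, errs)
    print(f"I/Q skew {skew:.2f} ps -> DJ {dj_pp(errs):.2f} ps p-p")
    print("symbol widths (ps):", [round(w, 2) for w in widths])
```

In this model, 5° of I/Q error alone produces roughly 1 ps of p-p deterministic jitter (about 56 mUI), comparable to the entire pre-driver DDJ budget quoted earlier.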
Fig. 21. SBR without (top) and with (bottom) on-package inductor. CHout: at receiver bump. capFFin_post_bw: at slicer input.
Fig. 26. PRBS23 BER at 106.25 Gb/s (30 mm). Bathtub with PI code sweep (top) and continuous-time BER (bottom).
Fig. 27. PRBS23 BER at 112 Gb/s (20 mm). Bathtub with PI code sweep (top) and continuous-time BER (bottom).
TABLE I
COMPARISON WITH STATE OF THE ART

Fig. 29. Eye diagram at 112 Gb/s with 20-mm (top) and 30-mm (bottom) channels. Eye opening at BER <1E−7. Left: PAM-4 eye. Right: concatenation of the three eyes. Y-axis: RX slicing-level DAC code (LSB = 1.5 mV).

ones. The eye diagram is formed by piecing together the PDFs for all PI codes.

Table I compares this work with state-of-the-art in-package links. To the best of the authors' knowledge, this work achieves the best shoreline bandwidth density (870 Gb/s/mm) and energy efficiency (1.24 pJ/b).

VIII. CONCLUSION

Progress made in heterogeneous integration in recent years offers a path to drive down cost per function and improves

REFERENCES
[1] M. Thiara, "Die-to-die interconnects for chip disaggregation," Semicond. Eng., Tech. Rep., Nov. 2018.
[2] N. Tracy et al., "112 Gbps electrical interfaces—An OIF update on CEI-112G," presented at the OFC, 2020. [Online]. Available: https://www.oiforum.com/wp-content/uploads/00311c-OIF-112G-OFC-slides_ofc20_presentation.pdf
[3] J. Im et al., "A 40-to-56 Gb/s PAM-4 receiver with ten-tap direct decision-feedback equalization in 16-nm FinFET," IEEE J. Solid-State Circuits, vol. 52, no. 12, pp. 3486–3502, Dec. 2017.
[4] P. Upadhyaya et al., "A fully adaptive 19–58-Gb/s PAM-4 and 9.5–29-Gb/s NRZ wireline transceiver with configurable ADC in 16-nm FinFET," IEEE J. Solid-State Circuits, vol. 54, no. 1, pp. 18–28, Jan. 2019.
[5] J. Im et al., "A 112 Gb/s PAM-4 long-reach wireline transceiver using a 36-way time-interleaved SAR-ADC and inverter-based RX analog front-end in 7 nm FinFET," IEEE J. Solid-State Circuits, vol. 56, no. 1, pp. 7–18, Jan. 2021.
[6] M. Erett et al., "A 126 mW 56 Gb/s NRZ wireline transceiver for synchronous short-reach applications in 16 nm FinFET," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2018, pp. 274–276.
[7] K. Tan et al., "A 112-Gb/s PAM4 transmitter in 16 nm FinFET," in Proc. IEEE Symp. VLSI Circuits, Jun. 2018, pp. 45–46.
[8] R. Yousry et al., "A 1.7 pJ/b 112 Gb/s XSR transceiver for intra-package communication in 7 nm FinFET technology," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2021, pp. 180–182.
[9] R. Shivnaraine et al., "A 26.5625-to-106.25 Gb/s XSR SerDes with 1.55 pJ/b efficiency in 7 nm CMOS," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2021, pp. 181–183.
[10] K. McCollough, S. D. Huss, J. Vandersand, R. Smith, C. Moscone, and Q. O. Farooq, "A 480 Gb/s/mm 1.7 pJ/b short-reach wireline transceiver using single-ended NRZ for die-to-die applications," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2021, pp. 1–3.
[11] M. Raj, S. Saeedi, and A. Emami, "A wideband injection locked quadrature clock generation and distribution technique for an energy-proportional 16–32 Gb/s optical receiver in 28 nm FDSOI CMOS," IEEE J. Solid-State Circuits, vol. 51, no. 10, pp. 2446–2462, Oct. 2016.

Chi Fung Poon received the B.S. degree in electrical engineering from the University of California at Santa Barbara, Santa Barbara, CA, USA, in 2004, and the M.S. degree in electrical engineering from Stanford University, Stanford, CA, USA, in 2006.
He is currently a Senior Staff Design Engineer with the SerDes Technology Group, Xilinx, Inc., San Jose, CA, USA, where he works on high-speed mixed-signal circuits. His current research interests include data converters, clocking, and analog front ends for serial links.

Wenfeng Zhang received the M.S. degree in electrical engineering from Oklahoma State University, Stillwater, OK, USA, in 1994.
From 1994 to 2004, he was a Circuit Designer with Datapath Systems, Los Gatos, CA, USA, and Micron Technology, San Jose, CA, USA. Since 2005, he has been with Xilinx, Inc., San Jose, where he has worked on various projects on DDR memory interfaces (DLL) and high-speed IOs. He works on high-speed, low-power analog front ends and custom digital blocks for transceivers.

Yipeng Wang received the B.S. degree in electrical engineering from Xiamen University, Xiamen, China, in 2010, the M.S. degree in electronics and communication engineering from the University of California at Santa Barbara, Santa Barbara, CA, USA, in 2012, and the Ph.D. degree in electronics and communication engineering from The Hong Kong University of Science and Technology, Hong Kong, in 2016.
He is currently a Staff Mixed-Signal Design Engineer with Xilinx, Inc., Singapore, where he is involved in high-speed wireline transceiver design.

Ying Cao received the B.S. degree in physics from Jilin University, Changchun, China, in 1998, and the M.S. degree in physics from California State University, Long Beach, CA, USA, in 2000.
In 2009, she joined the SerDes Technology Group, Xilinx, Inc., San Jose, CA, USA, where she has been working on high-speed SerDes transceiver circuits. Her current interests include high-speed clocking, transceiver front-end circuits, clock and data recovery (CDR), and phase-locked loops (PLLs).

Asma Laraba received the M.Sc. degree in microelectronics and nanoelectronics from Joseph Fourier University, Grenoble, France, in 2010, and the Ph.D. degree in electrical engineering from the Grenoble Institute of Technology, Grenoble, in 2013. Her Ph.D. thesis research was conducted at the TIMA Laboratory, Grenoble, and focused on DFT and BIST of analog-to-digital converters.
Since joining Xilinx in 2014, she has worked on data converter DFT, RFSoC analog-to-digital converter (ADC) design, and various high-speed SerDes projects. Her current research interests include data converters and the analog front end of wireline links.
Dr. Laraba was a recipient of the 2012 European Test Symposium Best Paper Award.
Daniel Zhaoyin Wu received the B.S. degree in electrophysics from National Chiao Tung University, Hsinchu, Taiwan, in 1991, and the M.S. degree in electrical engineering from National Taiwan University, Taipei, Taiwan, in 1993.
He joined Xilinx, Inc., San Jose, CA, USA, in 2010, and is currently a Principal Engineer with the Wire Engineering Group, working on architecture development and system-level modeling of high-speed SerDes and silicon photonics. Prior to Xilinx, he worked for several companies, including Ansoft Corporation, Altrabroadband, and ITRI/CCL, designing built-in passive components for the RF front ends of cell phones and the analog front ends of high-speed SerDes.

Parag Upadhyaya (Member, IEEE) received the B.S., M.S., and Ph.D. degrees in electrical engineering from Washington State University, Pullman, WA, USA, in 2000, 2005, and 2008, respectively.
From 2001 to 2003, he was with Cypress Semiconductor, Austin, TX, USA, working on the development of high-speed wireline and optical transceivers. Since 2008, he has been with Xilinx, Inc., San Jose, CA, USA, where he is currently the Director of Engineering with the Wired and Wireless Group, leading the development of high-speed transceivers for field-programmable gate array applications. He has authored or coauthored over 72 journal, conference, and book chapter publications. He holds more than 47 U.S. patents. His research interests include high-speed mixed-signal circuits for wireline, wireless, and optical transceivers; high-speed data converters; and phase-locked loops.