ISSCC 2024 / SESSION 18 / HIGH-PERFORMANCE OPTICAL TRANSCEIVERS / 18.3

18.3 An 8b 160GS/s 57GHz Bandwidth Time-Interleaved DAC and Driver-Based Transmitter with Adaptive Calibration for 800Gb/s Coherent Optical Applications in 5nm

F. Ahmad¹, A. Mellati¹, A. Fernandez², A. Iyer³, A. Fan¹, B. Reyes², C. Abidin¹, C. Nani⁴, D. Albano⁴, F. Solis², G. Minoia⁴, G. Hatcher¹, H. Carrer², K. Kota¹, L. Wang¹, M. Bachu³, M. Garampazzi⁴, M. Hassanpourghadi¹, N. Fan¹, P. Prabha¹, R. Nguyen¹, S. Ho⁵, T. Dusatko⁵, T. Wu¹, W. Elsharkasy¹, Z. Sun⁶, S. Jantzi¹, L. Tse³

¹Marvell, Irvine, CA
²Marvell, Cordoba, Argentina
³Marvell, Santa Clara, CA
⁴Marvell, Pavia, Italy
⁵Marvell, Vancouver, Canada
⁶Marvell, Singapore, Singapore

2024 IEEE International Solid-State Circuits Conference (ISSCC) | 979-8-3503-0620-0/24/$31.00 ©2024 IEEE | DOI: 10.1109/ISSCC49657.2024.10454480

The consistent demand for high-speed wireline communication due to high-performance computing and most recently due to machine learning and artificial intelligence has driven the development of transmitters operating beyond 100Gb/s per lane [1-4] and optical modules operating beyond 400Gb/s [4]. High-speed wireline communication requires high-bandwidth, high-sampling-rate ADCs and DACs with low-jitter PLLs as the main analog building blocks. In this paper, we present an energy-efficient optical transmitter fully integrated in an 800G coherent DSP chip using 4 reconfigurable 60-160GS/s 8b DACs with a 57GHz AFE bandwidth, fabricated in a 5nm FinFET process.

A coherent transmitter requires coherent DSP along with 4 DAC channels and a single PLL driving 4 channels as shown in Fig. 18.3.1. The optical transmitter has separate horizontal and vertical optical polarizations, and each optical polarization path requires two DACs for in-phase and quadrature-phase modulation. These four channels require synchronization between them. Since these transmitters are spaced on the chip by several mm, conventional digital synchronization techniques like clock and reset synchronization are not preferred. In this coherent transmitter architecture, a calibration loopback is added. The calibration calculates the loopback delay using autocorrelation or a matched filter, and the loopback delay is equalized in the digital domain for synchronization. The loopback path consists of a variable gain amplifier (VGA) and a low-frequency subsampled ADC. The VGA maintains good dynamic range at the ADC input. A dedicated calibration PLL is added to generate the clock for the calibration ADC. A divided version of the main PLL clock drives the calibration PLL. To achieve the required 160GS/s speed, time-interleaved (TI) DACs are used with TI serializers, which also suffer from synchronization issues. Similar to channel synchronization, the loopback delay is equalized for TI serializer synchronization. A low-power DLL is added to assist timing closure between the serializer and the DAC data input. Both the serializer and clocking circuits run on an AVS supply to achieve performance and save power. To further save power, a current gain stage is used to generate the current needed for the desired output swing on a 100Ω differential load. The addition of the current buffer reduces the required size of the DAC, which reduces the clock buffer size and lowers the power from the clocking circuits. In this design, the power saving on clock buffers due to this architecture is estimated to be 4× vs. a driver-less design where the DAC directly drives the output. However, the smaller DAC and clock buffer sizes increase random mismatches. These effects are compensated by various calibration techniques.

Figure 18.3.2a shows the TI DAC architecture, which consists of 8 unit-DAC instances called DAC slices. The 8 DAC slices are controlled by 8 clock phases of divide-by-8 clocks, each operating at 20GHz. Each slice is activated for one unit interval (UI) and controlled by two clock phases (for example, the 0th and 1st clock phases), as shown in Fig. 18.3.2b. Each DAC slice has 2b thermometer and 6b binary segmentation.

Figure 18.3.2c shows the unit cell schematics of the DAC architecture. All the units of the same slice are controlled by the same clock phases. As shown in Fig. 18.3.2c, the slice is activated by NMOS (MN2_P/MN2_N) and the rising edge of the clock phase clk[0]. The DAC slice is disabled by PMOS (MP1_P/MP1_N) and the falling edge of the clock phase clkb[1]. By design MN1_P turns off after MP1_P turns on. 50% duty cycle clocks are not ideal for this DAC because NMOS and PMOS have different trip points. Across process corners this DAC can act as a return-to-zero (RZ) DAC. Calibration is needed to generate a process-dependent duty-cycle percentage that turns this RZ DAC into a non-return-to-zero (NRZ) DAC. The zero-order hold for an RZ system can be described using eq (1), which can be simplified as (Ts-TOFF)/Ts for a DC input signal to the DAC. Here, TON is the ON time for the slices, Ts is the sampling time, and TOFF is the time for which the slice is off. Since TOFF depends only on the mismatch of the NMOS and PMOS trigger points, it is not sampling-frequency-dependent. For an NRZ system, TOFF is zero and DC power is the same irrespective of sampling frequency. During calibration, the duty cycle is adjusted such that the power for the DC input signal is the same for two sampling periods (Ts and 2Ts). This duty cycle target estimation is common for all the slices, considering that slice-to-slice trigger point mismatch is very small.

eq (1)

Time-interleaved DACs suffer from mismatch between the slices, mainly due to skew in clock phases (timings), gain, and offset. These mismatches cause modulated tones and degrade the spectral performance. In this architecture these mismatches are calibrated out using adaptive filtering techniques. Figure 18.3.3a shows that for any analog system a digital model can be generated using adaptive filtering techniques. This digital model can model both the analog system response and its non-idealities. As shown in Fig. 18.3.3a, in this architecture a 40-tap FIR filter (Wiener filter) is used. A kernel is added for offset adaptation as well. TI-DAC responses and mismatches are modelled in the adaptive filter (AF). Based on this adaptive filter model, feedback is generated to calibrate rising edge mismatches of the clocks and gain and offset mismatch of the TI-DAC. To control the falling edges of the clock, there is one duty cycle distortion (DCD) loop for each clock phase. A fixed desired percentage DCD is maintained by 8 separate DCD loops. An automatic gain-control loop and an offset-correction loop are added to keep good dynamic range and remove any offset in the calibration path as shown in Fig. 18.3.3b.

Figure 18.3.4a shows the driver architecture. The driver receives the signal current from the TI-DACs and multiplies it by 4× to generate the desired output current. Multiple T-coils are used to tune out the parasitic capacitance and improve the bandwidth. The TI-DAC current output sees the sources of transistors MN_sp and MP_sp, which form a low-impedance node (almost 10Ω) and help maintain high bandwidth at the TI-DAC output. Large gate capacitors of MN_d and MP_d are tuned out by T-coils as well. Similarly, the large capacitance of the output nodes is tuned out by output T-coils.

The chip was fabricated in a 5nm FinFET process. Overall measured bandwidth of the DAC and driver is 57GHz as shown in Fig. 18.3.4b. To measure this bandwidth, a transient waveform is captured using a real-time scope. The impulse response is captured using adaptive filtering techniques and the FFT of the impulse response is calculated to get the frequency response. Board and socket de-embedding were done to get the de-embedded DAC and driver bandwidth.

Figure 18.3.5a shows the measured delay of 8 TI-DAC slices inside all four channels (total 32) before and after loopback delay equalization. The x-axis is delay in UIs, while the y-axis is scaled received power. Figure 18.3.5b shows the measured spectrum for a 1GHz sine input signal. The calibration improves the performance by 6 to 10dB based on initial random variation on the chip. Figure 18.3.5c shows the measured PAM eye at 279.2Gb/s. The measured total transmitter jitter including the PLL is 65fs. The measured power of each transmitter channel is 0.25W. Figure 18.3.6 shows the performance comparison table with the state-of-the-art DACs showing the highest reported bandwidth and state-of-the-art <0.9pJ/b FOM. The TX Walden FOM without the PLL achieves <40fJ/c-s with a 30GHz input and 160GS/s as the sampling frequency. The die photograph of 4 channels along with the common PLL is shown in Fig. 18.3.7. Total TX area including the PLL is 3.2mm².

Acknowledgement:
The authors would like to thank the entire development team of this project, System, Digital, Analog, Layout, PD, DFT, and CAD for their dedication and valuable support.

References:
[1] J. Kim et al., “A 224Gb/s DAC-based PAM-4 transmitter with 8-tap FFE in 10nm CMOS”, ISSCC, pp. 126-127, Feb. 2021.
[2] M. Choi et al., “An output-bandwidth-optimized 200Gb/s PAM-4 100Gb/s NRZ transmitter with 5-Tap FFE in 28nm CMOS”, ISSCC, pp. 128-129, Feb. 2021.
[3] E. Chong et al., “112G+ 7-Bit DAC-Based Transmitter in 7-nm FinFET With PAM4/6/8 Modulation”, IEEE SSCL, vol. 5, pp. 21-24, Feb. 2022.
[4] R. L. Nguyen et al., “A Highly Reconfigurable 40-97GS/s DAC and ADC with 40GHz AFE Bandwidth and Sub-35fJ/conv-step for 400Gb/s Coherent Optical Applications in 7nm FinFET”, ISSCC, pp. 136-137, Feb. 2021.
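As a hedged illustration of the RZ zero-order-hold argument in the text, the block below uses the textbook response of a hold pulse of width TON. This form is an assumption and may differ from the authors' eq (1), but it reproduces the quoted DC simplification (Ts-TOFF)/Ts and shows why equal DC power at the two sampling periods Ts and 2Ts forces TOFF to zero.

```latex
% Assumed textbook RZ zero-order hold, with T_ON = T_s - T_OFF
H_{RZ}(f) = \frac{T_{ON}}{T_s}\,
            \frac{\sin(\pi f T_{ON})}{\pi f T_{ON}}\,
            e^{-j\pi f T_{ON}},
\qquad
H_{RZ}(0) = \frac{T_{ON}}{T_s} = \frac{T_s - T_{OFF}}{T_s}.

% DC power at the two calibration sampling periods
% (T_OFF is set by the NMOS/PMOS trip points, so it does not scale with T_s)
P_{DC}(T_s) \propto \left(\frac{T_s - T_{OFF}}{T_s}\right)^{2},
\qquad
P_{DC}(2T_s) \propto \left(\frac{2T_s - T_{OFF}}{2T_s}\right)^{2}.

% For 0 \le T_OFF < T_s both DC gains are positive, so equal power requires
1 - \frac{T_{OFF}}{T_s} = 1 - \frac{T_{OFF}}{2T_s}
\;\Longrightarrow\; T_{OFF} = 0 \quad \text{(NRZ operation)}.
```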

ISSCC 2024 / February 20, 2024 / 2:20 PM

Figure 18.3.1: Coherent transmitter architecture and channels and slice synchronization.

Figure 18.3.2: (a) Time interleaved DAC, (b) clock phases driving the interleaved DAC, and (c) unit cell schematic.
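The channel and slice synchronization of Fig. 18.3.1 relies on estimating each loopback delay with autocorrelation or a matched filter and then equalizing the delays digitally. The following Python sketch shows that idea under simplifying assumptions that are not from the paper: a known ±1 test pattern, a capture at the same sample rate (the actual loopback ADC is subsampled), integer delays, and the hypothetical helper names `estimate_delay` and `equalizing_shifts`.

```python
import random


def estimate_delay(reference, captured, max_lag):
    """Return the lag in samples that maximizes the cross-correlation."""
    span = len(reference) - max_lag  # samples usable at every candidate lag
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Matched-filter style score: correlate the known pattern with the
        # capture shifted by the candidate lag.
        score = sum(reference[i] * captured[i + lag] for i in range(span))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def equalizing_shifts(delays):
    """Extra digital delay per path so that all paths match the slowest one."""
    slowest = max(delays)
    return [slowest - d for d in delays]


# Hypothetical loopback capture: known pattern, delayed, attenuated, noisy.
rng = random.Random(0)
pattern = [rng.choice((-1.0, 1.0)) for _ in range(512)]
true_delay = 7
captured = [0.0] * true_delay + [0.5 * s for s in pattern[:-true_delay]]
captured = [c + rng.gauss(0.0, 0.1) for c in captured]

measured = estimate_delay(pattern, captured, max_lag=16)
# Illustrative delays for four paths; only `measured` comes from the capture.
shifts = equalizing_shifts([3, measured, 5, 7])
```

Aligning every path to the slowest one keeps all added delays non-negative, which is the natural choice when the correction is applied as extra digital latency.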

Figure 18.3.3: (a) Digital modelling of the analog system, (b) TI-DAC calibration using adaptive filters.

Figure 18.3.4: (a) Driver architecture, (b) overall transmitter bandwidth.
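The digital modelling of Fig. 18.3.3a (a 40-tap FIR filter plus an offset kernel) and the bandwidth measurement of Fig. 18.3.4b (frequency response from the transform of the identified impulse response) can be sketched together. The paper does not state which adaptation algorithm is used; this example assumes a plain LMS update, which converges toward the Wiener solution for a stationary input. The "analog system" here is a made-up 5-tap response with a DC offset, and a direct DFT stands in for the FFT so the sketch needs only the standard library.

```python
import cmath
import math
import random


def simulate_system(x, taps, offset):
    """Stand-in for the analog path: FIR response plus a DC offset."""
    return [
        offset + sum(t * x[n - k] for k, t in enumerate(taps) if n - k >= 0)
        for n in range(len(x))
    ]


def lms_identify(x, d, num_taps=40, mu=0.01):
    """Adapt FIR weights and an offset term so the model output tracks d."""
    w = [0.0] * num_taps
    b = 0.0
    for n in range(num_taps - 1, len(x)):
        window = x[n - num_taps + 1:n + 1][::-1]  # x[n], x[n-1], ...
        err = d[n] - (sum(wk * xk for wk, xk in zip(w, window)) + b)
        w = [wk + mu * err * xk for wk, xk in zip(w, window)]
        b += mu * err  # offset kernel: behaves like a tap fed by a constant 1
    return w, b


def response_db(taps, num_points=64):
    """DFT magnitude of the taps from DC up to fs/2, in dB relative to DC."""
    mags = []
    for m in range(num_points):
        omega = math.pi * m / num_points
        h = sum(t * cmath.exp(-1j * omega * k) for k, t in enumerate(taps))
        mags.append(abs(h))
    return [20.0 * math.log10(max(v, 1e-12) / mags[0]) for v in mags]


rng = random.Random(1)
x = [rng.choice((-1.0, 1.0)) for _ in range(5000)]  # known digital input
true_taps = [0.05, 0.6, 0.3, 0.1, -0.05]            # hypothetical response
d = simulate_system(x, true_taps, offset=0.02)      # "captured" output
w, b = lms_identify(x, d)
rolloff_db = response_db(w)
```

After adaptation, `w` holds the estimated impulse response and `b` the estimated offset; the frequency at which `rolloff_db` crosses -3 would be read off as the bandwidth of the modelled path.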

Figure 18.3.5: (a) Measured results, MF response before and after synchronization, (b) sine spectrum before and after calibration, (c) 139.6GBd PAM eye.

Figure 18.3.6: Performance comparison table.
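The headline numbers quoted in the text and captions can be cross-checked with simple arithmetic. The sketch below assumes that the energy-per-bit FOM is the per-channel power divided by the measured bit rate; the paper does not spell out its exact FOM definition, so treat this as a consistency check only.

```python
# Figures quoted in the text and captions.
sample_rate = 160e9      # S/s, aggregate TI-DAC rate
num_slices = 8           # DAC slices, one clock phase each
baud_rate = 139.6e9      # symbols/s, from the Fig. 18.3.5c caption
bit_rate = 279.2e9       # b/s, measured PAM eye
channel_power = 0.25     # W per transmitter channel

phase_clock_hz = sample_rate / num_slices   # clock frequency of each phase
unit_interval_ps = 1e12 / sample_rate       # one UI at 160GS/s
bits_per_symbol = bit_rate / baud_rate      # 2 bits/symbol implies PAM-4
# Assumed FOM definition: channel power divided by bit rate.
energy_per_bit_pj = channel_power / bit_rate * 1e12

print(f"{phase_clock_hz / 1e9:.0f} GHz per phase, UI = {unit_interval_ps:.2f} ps")
print(f"{bits_per_symbol:.1f} b/symbol, {energy_per_bit_pj:.3f} pJ/b")
```

Under that assumption the result is about 0.895pJ/b, in line with the stated <0.9pJ/b, and the 8 phases at 20GHz reproduce the 160GS/s aggregate rate.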
ISSCC 2024 PAPER CONTINUATIONS

Figure 18.3.7: Die micrograph.
