You are on page 1of 21

electronics

Review
Supply-Scalable High-Speed I/O Interfaces
Woorham Bae 1,2
1 Department of Electrical Engineering and Computer Sciences, University of California at Berkeley,
Berkeley, CA 94720, USA; wrbae@eecs.berkeley.edu
2 Ayar Labs, Santa Clara, CA 95054, USA

Received: 25 July 2020; Accepted: 13 August 2020; Published: 15 August 2020 

Abstract: Improving the energy efficiency of computer communication is becoming more and more
important as the world is creating a massive amount of data, while the interface has been a bottleneck
due to the finite bandwidth of electrical wires. Introducing supply voltage scalability is expected
to significantly improve the energy efficiency of communication input/output (I/O) interfaces as
well as make the I/Os efficiently adapt to actual utilization. However, there are many challenges to
be addressed to facilitate the realization of a true sense of supply-scalable I/O. This paper reviews
the motivations, background theories, design considerations, and challenges of scalable I/Os from
the viewpoint of computer architecture down to the transistor level. Thereafter, a survey of the
state-of-the-arts fabricated designs is discussed.

Keywords: computer communication; dynamic voltage and frequency scaling; energy-efficient


computing; high-speed interface; low power

1. Introduction
Nowadays, there are huge demands on a smarter world for better human convenience and
happiness (i.e., manufacturing, smart city, autonomous vehicle, security . . . ). In order to realize those,
it is inevitable that we have to create, replicate, and process tremendous amount of data [1]. For example,
ref. [2] forecasts that the amount of data will increase by more than 3× in five years. Handling those
explosive data is a large burden on computer communications in the current computer architecture [3].
As a result, high-speed input/output (I/O) standards used for computer communications are evolving
very rapidly, and moreover, new I/O standards are also being introduced to address various needs.
In accordance with those trends, multi-standard I/O transceivers (transmitter and receiver) are getting
a huge attention from industry to provide considerable flexibilities to IC products [4–11]. To support
multiple standards with a single transceiver design, it needs to be flexible for a variety of specifications,
especially for a wide range of operating data rates. In addition, even for the cases of dealing
with a single I/O standard, many standards require backward compatibility to their own legacy
generations; for example, PCI Express, USB, serial ATA, etc.; thus, the wide-range operation is also
very important [12,13].
However, implementing such wide-range capability introduces a large burden on the I/O design.
Before deep diving into what the burden is, we briefly review the overall I/O transceiver architecture
and how it works. As shown in the overall block diagram in Figure 1, the I/O transceiver is composed
of three big parts: clock generation, a transmitter (TX), and a receiver (RX) [14]. In the clock generation,
a phase-locked loop (PLL) is generally used to generate a high-frequency clock for the high-speed I/O
from a low-frequency reference clock. These days, a half-rate clock (e.g., 14 GHz for 28 Gb/s NRZ
datastream, also referred to as a double-data rate (DDR) clock) and a quarter-rate clock (i.e., 7 GHz
for 28 Gb/s NRZ, also referred to as a quarter-data rate (QDR) clock) are generally used. The clock
generation circuitry can be shared across multiple transmitter and receiver lines to amortize the area

Electronics 2020, 9, 1315; doi:10.3390/electronics9081315 www.mdpi.com/journal/electronics


Electronics 2020, 9, 1315 2 of 21

and power consumption [15–17]. The high-frequency clock is distributed to the individual transmitters
and receivers with a clock buffer chain. Depending on the macro floorplan, the distribution distance
can be a few hundred µm or even longer; a long chain of buffers needs to be used so that the distribution
Electronics 2020,a9,significant
circuit consumes x FOR PEER REVIEW
amount of power. In addition, the high-speed clock may2 of 21
experience
duty-cycle distortion and phase skew across the chain because of inherent device mismatch and
amortize the area and power consumption [15–17]. The high‐frequency clock is distributed to the
duty-cycle error amplification
individual transmitters and [18,19].
receiversAswith
a result, a duty-cycle
a clock buffer chain.correction
Depending(DCC) on the circuit or a quadrature
macro floorplan,
error correction
the distribution(QEC) circuit
distance canisbefrequently
a few hundred used μmatortheeven transmit
longer; a side to correct
long chain of bufferssuch distortions
needs to be for
used so that the distribution circuit consumes a significant
better eye opening at the transmitter output. Based on the timing information from the corrected amount of power. In addition, the
high‐speed
clock, parallel inputclock mayare
data experience
serialized duty‐cycle
into a distortion
high-speed and bitstream
phase skew(serializer).
across the chain Thebecause of
serialized data
inherent device mismatch and duty‐cycle error amplification [18,19]. As a result, a duty‐cycle
stream is transmitted through the transmission line driven by the TX output driver, whose output
correction (DCC) circuit or a quadrature error correction (QEC) circuit is frequently used at the
impedance is supposed
transmit to match
side to correct with the characteristic
such distortions for better eye openingimpedance at theof the transmission
transmitter output. Based line on
for signal
integrity. To compensate inter-symbol interference (ISI) due to the
the timing information from the corrected clock, parallel input data are serialized into a high‐speedchannel loss at high frequency,
a feed-forward
bitstream equalizer
(serializer).(FFE), which data
The serialized combines
stream is multiple
transmitted adjacent
throughbits the to cancel outline
transmission the ISI effect, is
driven
by the TX output driver, whose output impedance is supposed
frequently implemented at the transmit side. To adapt with various channel conditions and receiver to match with the characteristic
impedance of the transmission line for signal integrity. To compensate inter‐symbol interference
sensitivity, the output voltage swing is configurable. On the receiver side, the received data is usually
(ISI) due to the channel loss at high frequency, a feed‐forward equalizer (FFE), which combines
highly distorted by the ISI due to the limited channel bandwidth. Therefore, a continuous time linear
multiple adjacent bits to cancel out the ISI effect, is frequently implemented at the transmit side. To
equalizer that boosts
adapt with various high-frequency gain is generally
channel conditions and receiver placed at the front-end
sensitivity, the outputofvoltage the receiver.
swing RX is slicers
are usedconfigurable.
to decide whetherOn the receiver side, the received data is usually highly distorted by the ISI due to the timing
the received analog signal is ‘1’ or ‘0’ by sampling the signal at the
limited channel
when the signal-to-noise bandwidth.
ratio (SNR)Therefore, a continuousA
is maximized. time linear equalizer
variable-gain that boosts
amplifier (VGA)high‐frequency
can be preceded
gain is generally placed at the front‐end of the receiver.
by the slices to amplify the input signal for better sensitivity. Since it is important RX slicers are used to decide whether the
to sample the
received analog signal is ‘1’ or ‘0’ by sampling the signal at the timing when the signal‐to‐noise ratio
received signal when the SNR is maximized, finding the optimum sampling timing is one of the utmost
(SNR) is maximized. A variable‐gain amplifier (VGA) can be preceded by the slices to amplify the
tasks ofinput
the receiver.
signal for Thebetterclock and data
sensitivity. Sincerecovery (CDR)
it is important circuitthe
to sample collects
received such timing
signal wheninformation
the SNR is from
edge slicers
maximized, finding the optimum sampling timing is one of the utmost tasks of the receiver.information,
(bang-bang CDR) or error slicers (baud-rate CDR), and based on the collected The
it adjusts theand
clock sampling timing(CDR)
data recovery by usingcircuitacollects
phase such interpolator (PI) [20]. from
timing information A multi-phase generation circuit
edge slicers (bang‐bang
such asCDR) or error slicers
a delay-locked loop(baud‐rate
(DLL) orCDR), and based on the
an injection-locked collected(ILO)
oscillator information,
is widely it adjusts
used to theprovide
samplingnumber
a high enough timing byofusingphasesa phase interpolator
for the PI. On the (PI) other
[20]. Ahand,multi‐phase generation circuit equalizer
a decision-feedback such as a (DFE)
delay‐locked loop (DLL) or an injection‐locked oscillator (ILO) is widely used to provide a high
is used to fully compensate the residual ISI. The basic concept of the DFE is to utilize the previously
enough number of phases for the PI. On the other hand, a decision‐feedback equalizer (DFE) is used
receivedtodata
fullystream
compensate to help the decision
the residual ISI. Theofbasic
the currently
concept of the received
DFE is bit to compensate
to utilize the previously thereceived
ISI. Unlike the
FFE or continuous-time
data stream to help the linear equalizer
decision (CTLE),received
of the currently the DFE bit does not degrade
to compensate the ISI.the SNR,
Unlike thesoFFEnowadays,
or
it has become a very essential
continuous‐time building
linear equalizer blockthe
(CTLE), of the
DFEreceiver.
does not Finally,
degrade the recovered
the SNR, data is de-serialized
so nowadays, it has
become stream.
to the parallel a very essential building block of the receiver. Finally, the recovered data is de‐serialized to
the parallel stream.

Figure 1. Example block diagram of input/output (I/O) transceiver.


Electronics 2020, 9, x FOR PEER REVIEW 3 of 21

Electronics 2020, 9, 1315 3 of 21


Figure 1. Example block diagram of input/output (I/O) transceiver.

As mentioned
As mentionedbefore, before,introducing
introducing wide-range
wide-range operation
operation resultsresults in significant
in significant overhead overhead
on design on
design complexity and an energy/area efficiency penalty as well.
complexity and an energy/area efficiency penalty as well. For example, because it is not easy to cover For example, because it is not easy
to cover
the entirethe entire frequency
frequency range with range withPLL
a single a single
or a PLL
single orvoltage-controlled
a single voltage-controlled oscillatoroscillator
(VCO), multiple(VCO),
multiple PLLs (or VCOs) are frequently implemented together
PLLs (or VCOs) are frequently implemented together and are selected with respect to the operating and are selected with respect to the
operating frequency. In addition, most of the building blocks
frequency. In addition, most of the building blocks needs to be designed to work properly at the needs to be designed to work properly
at the highest
highest rate. Itrate.leadsIt to leads to a significant
a significant impactimpacton the on the energy
energy efficiency efficiency
at a lower at a data
lowerrate.
dataFor rate. For
better
better understanding, Figure 2 shows a simple example of a
understanding, Figure 2 shows a simple example of a CMOS inverter. The delay (τd ) and rise/fall time CMOS inverter. The delay (τ d) and

rise/fall time (approximately


(approximately 2τd ) of a CMOS 2τinverter
d) of a CMOS inverter is proportional to the fan-out of the inverter,
is proportional to the fan-out of the inverter, which is a ratio
which
of is aoutput
the gate ratio of the gate output
capacitance over thecapacitance over the gate
gate input capacitance. input acapacitance.
To retain full rail-to-railToCMOSretain swing
a full
rail-to-rail CMOS swing and to reduce jitter generation, the rise/fall
and to reduce jitter generation, the rise/fall time must be fast enough for given operating frequency. time must be fast enough for
given operating frequency. As a rule of thumb, τd should be no larger than one-fourth of bit period
As a rule of thumb, τd should be no larger than one-fourth of bit period (Tbit ) [21,22]. It implies that
(Tbitfan-out
the ) [21,22]. has It to
implies
be lowthat enoughthe fan-out
to sustain hasa to be low
proper enoughattothe
operation sustain
highest a proper
frequency. operation
On the at the
other
highest frequency. On the other hand, the dynamic switching
hand, the dynamic switching power consumption of a CMOS inverter chain whose final output load is power consumption of a CMOS
inverter chain whose final output load is CL is given as
CL is given as
FO
𝐹𝑂 2
P 𝑃== CL VDD 2 fclk (1)
(1)
FO −
𝐹𝑂 − 11 𝐶𝐿 𝑉𝐷𝐷 𝑓𝑐𝑙𝑘

where FO,
where FO, VVDD , and fclk represent
DD, and fclk
represent the the fan-out
fan-out of of the
the chain,
chain, the the supply
supply voltage,
voltage, and and the
the switching
switching
frequency,
frequency, respectively [23]. That means that if the fan-out is set for the optimum operation at
respectively [23]. That means that if the fan-out is set for the optimum operation at the
the
highest frequency, then in fact, it is overkill and causes unnecessary
highest frequency, then in fact, it is overkill and causes unnecessary power consumption at a lower power consumption at a lower
frequency.
frequency. In In addition,
addition, there there are are also
also overheads
overheads to to make
make circuits
circuits workwork at at aa low
low frequency,
frequency, which which
makes
makes the efficiency at high frequency worse (i.e., capacitors). These observations can
the efficiency at high frequency worse (i.e., capacitors). These observations can be be proven
proven
from
from the
the survey
survey of of wide-range
wide-range and and supply-scalable
supply-scalable transceivers
transceivers shown shown in in Figure
Figure 3, 3, where
where the the energy
energy
efficiencies of transmitters, receivers, and transceivers in [15,16,22,24–35]
efficiencies of transmitters, receivers, and transceivers in [15,16,22,24–35] are marked in the scatter are marked in the scatter plot
with respect to how wide the operating range is (highest data
plot with respect to how wide the operating range is (highest data rate/lowest data rate). Note that rate/lowest data rate). Note that the
energy
the energyefficiencies
efficiencies depicted
depicted in the in Figure
the Figure 3 are3measured
are measured at the athighest
the highestdata data
rate ofrateeach design.
of each It is
design.
observed that the energy efficiency becomes worse as the range
It is observed that the energy efficiency becomes worse as the range widens, which proves the widens, which proves the overhead of
the wide range.
overhead of the wide range.
To
To resolve such
resolve such efficiency
efficiency loss loss duedue toto the
the wide-range
wide-range operation, introducing supply
operation, introducing supply scalability,
scalability,
whose
whose impact has been already well-demonstrated in microprocessors and Internet of Things (IoT)
impact has been already well-demonstrated in microprocessors and Internet of Things (IoT)
applications
applications [36], [36], to to the
the I/O
I/O transceiver
transceiver has has been
been proposed.
proposed. The The basic
basic concept
concept is is to
to lower
lower thethe supply
supply
voltage
voltage when
when the the I/OI/O works
works at at low
low speed
speed where
where it it does
does not not require
require fullfull voltage
voltage due due toto the
the relaxed
relaxed
bandwidth,
bandwidth, in order to reduce the power consumption, which is similar to the concept of dynamic
in order to reduce the power consumption, which is similar to the concept of dynamic
voltage
voltageand and frequency
frequency scaling (DVFS)
scaling of microprocessors
(DVFS) [24,37–40].[24,37–40].
of microprocessors More detailsMore on the details
supply-scalable
on the
I/O interfaces will be discussed in the following
supply-scalable I/O interfaces will be discussed in the following section. section.

Tbit
FO = Cin/Cout
out
out
in

in
Cin Cout
td = a×FO

Figure 2. CMOS inverter delay and rise/fall time as a function of fan-out.


Figure 2. CMOS inverter delay and rise/fall time as a function of fan-out.
Electronics 2020, 9, 1315 4 of 21
Electronics 2020, 9, x FOR PEER REVIEW 4 of 21

100

Energy efficiency (pJ/bit)


10

Trendline

0.1
1 2 4 8 16 32 64 128
Max. rate / Min. rate

: TRX with off-chip VDD scaling : TRX with on-chip VDD scaling
: TX with off-chip VDD scaling : TX with on-chip VDD scaling
: RX with off-chip VDD scaling : RX with on-chip VDD scaling

Figure3.3.Survey
Figure Surveyofofenergy
energyefficiency
efficiencyofofwide-range,
wide‐range,supply-scalable
supply‐scalableI/O
I/Otransceivers
transceiverswith
withrespect
respecttoto
operatingrange.
operating range.

2. Basic Concept of Supply-Scalable I/O


2. Basic Concept of Supply‐Scalable I/O
In addition to the demands on the multi-standard or the backward-compatible I/Os discussed in
In addition to the demands on the multi‐standard or the backward‐compatible I/Os discussed
the previous section, the fact that the link utilization of computer communications is rarely maximum
in the previous section, the fact that the link utilization of computer communications is rarely
is another important motivation of supply-scalable I/O [37–40]. For example, in [39], Google disclosed
maximum is another important motivation of supply‐scalable I/O [37–40]. For example, in [39],
that the servers’ utilization is mostly between 10% and 50% of their maximum utilization levels, where
Google disclosed that the servers’ utilization is mostly between 10% and 50% of their maximum
they exhibit poor efficiency. For example, ref. [40] shows that if a system (servers and network) is 15%
utilization levels, where they exhibit poor efficiency. For example, ref. [40] shows that if a system
utilized while the servers are fully energy-proportional, the network will then consume nearly 50% of
(servers and network) is 15% utilized while the servers are fully energy‐proportional, the network
the overall power, which is a huge waste. As a result, if the communication network is able to adapt the
will then consume nearly 50% of the overall power, which is a huge waste. As a result, if the
power consumption by scaling the supply voltage according to the communication bandwidth required
communication network is able to adapt the power consumption by scaling the supply voltage
by the utilization, a significant energy saving is expected. In addition, recent FinFET technology
according to the communication bandwidth required by the utilization, a significant energy saving
offers a better performance at a lower supply voltage compared to conventional planar MOSFET,
is expected. In addition, recent FinFET technology offers a better performance at a lower supply
which amplifies the benefits of the supply scaling [41]. However, conventional I/O designs typically
voltage compared to conventional planar MOSFET, which amplifies the benefits of the supply
assume fixed-voltage operation. Even for some wide-range designs, the power consumption is usually
scaling [41]. However, conventional I/O designs typically assume fixed‐voltage operation. Even for
dominated by the supply voltage post-fabrication; frequency scaling alone rarely saves much power
some wide‐range designs, the power consumption is usually dominated by the supply voltage
but instead hurts performance [42].
post‐fabrication; frequency scaling alone rarely saves much power but instead hurts performance
To see more deeply how the supply scaling improves the energy efficiency, we review the power
[42].
consumption of I/O interface circuits, which can be divided into three groups [43,44]. The first one is
To see more deeply how the supply scaling improves the energy efficiency, we review the
CMOS dynamic switching power, which is proportional to CV2 f. In the block diagram of Figure 1,
power consumption of I/O interface circuits, which can be divided into three groups [43,44]. The first
the serializer, deserializer, digital circuits, slicers, and some of the clocking circuits fall into this category.
one is CMOS dynamic switching power, which is proportional to CV2f. In the block diagram of
The second thing is the power consumption from the static current, which includes analog circuits
Figure 1, the serializer, deserializer, digital circuits, slicers, and some of the clocking circuits fall into
relying on current
this category. Thebiases—for
second thingexample, amplifiers
is the power and current-mode
consumption from the logic
static(CML)
current, buffers.
whichThe last
includes
one is the signaling power, which includes equalization circuits and TX drivers
analog circuits relying on current biases—for example, amplifiers and current‐mode logic (CML) as well as the power
dissipated
buffers. The at 50-Ω terminations.
last one Those three
is the signaling power, types of power
which consumption
includes equalizationexhibit veryand
circuits different aspectsas
TX drivers
when
well asthethe
operating
power speed changes,
dissipated at 50‐Ωwhich is illustrated
terminations. Thoseon the
threeleft side of
types of power
Figure consumption
4. As we can exhibiteasily
expect from CV 2 f, the dynamic switching power scales linearly as the data rate scales. On the other
very different aspects when the operating speed changes, which is illustrated on the left side of
hand,
Figure the4.others
As wedo cannot scaleexpect
easily with the dataCV
from rate. In fact,
2f, the at high
dynamic frequency,
switching the signaling
power poweras
scales linearly increases
the data
non-linearly
rate scales. On as the
thedata rate
other increases,
hand, because
the others do notextensive equalization
scale with the data and
rate.higher
In fact,transmit
at high swing are
frequency,
the signaling power increases non‐linearly as the data rate increases, because extensive equalization
Electronics 2020, 9, 1315 5 of 21
Electronics 2020, 9, x FOR PEER REVIEW 5 of 21

and higher
required to transmit
overcome swing are required
the channel tosuch
loss at overcome the channel
high frequency loss at As
[25,45]. such high frequency
a result, as shown[25,45].
in the
As a result,
leftmost as Figure
plot of shown4,inthe the leftmost
total powerplot of Figureof4,the
consumption theI/O
total power consumption
transceiver increases linearlyof the I/O
as the
transceiver increases linearly as the data rate increases but becomes non‐linear
data rate increases but becomes non-linear at the high-frequency region. It also consumes considerable at the high‐frequency
region.
static It also
power evenconsumes
at very low considerable
speed. The static power even
corresponding at very
efficiency lowisspeed.
curve Theincorresponding
illustrated the inset plot,
efficiency
where we curve is illustrated
find that in the
the efficiency is inset plot,degraded
severely where weatfind that
a low the efficiency
speed region. For is severely
example,degraded
ref. [40]
shows that a link configured for 2.5 Gb/s consumes as much as 42% of the power for 40 Gb/s, which as
at a low speed region. For example, ref. [40] shows that a link configured for 2.5 Gb/s consumes is
much as
ideally 42% of the
expected to bepower
only for 40 Gb/s, which is ideally expected to be only 6.25%.
6.25%.
If we
If we reduce
reduce thethe supply
supply voltage
voltage atat aa lower
lower speed,
speed, then
then how
how much
much we we can
can reduce
reduce it?
it? Basically,
Basically,
the supply voltage can be reduced until the circuits marginally work at the
the supply voltage can be reduced until the circuits marginally work at the given speed. The free-running given speed. The
free‐running
frequency frequency
of CMOS of CMOS ring
ring oscillators, whichoscillators,
is usually which
used toisevaluate
usually theused to evaluate
speed of CMOSthe speedisof
devices, a
CMOS devices, is a great indicator to see the relation between the speed
great indicator to see the relation between the speed and the voltage. The free-running frequency of a and the voltage. The
free‐running
CMOS frequency
ring oscillator of a CMOSasring oscillator is expressed as
is expressed
𝑓 1 β∙(V − V ,)α (2)
DD TH
f = · , (2)
where N, β, VTH, and C represent the number 2N of stages,CVDD transistor coefficient, threshold voltage of
CMOSN,
where devices,
β, VTH ,and andnodal capacitance
C represent of the of
the number ring, respectively
stages, transistor[19]. α is another
coefficient, coefficient
threshold voltagethat of
characterizes
CMOS devices, a transistor
and nodalperformance,
capacitance which
of the equals two in the conventional
ring, respectively [19]. α is anotherlong‐channel
coefficientdevices
that
but becomes acloser
characterizes to one
transistor for recent short‐channel
performance, which equalsdevices.
two in the(2)conventional
is illustrated long-channel
in Figure 5 for α = 2 and
devices but
1.5. For both cases, we can see a linear‐like relation as long as V is
becomes closer to one for recent short-channel devices. (2) is illustrated in Figure 5 for α = 2 and
DD higher than V TH, so we can1.5.
assume
For boththat
cases,thewe supply
can see voltage can berelation
a linear-like linearlyasscaled
long aswith
VDDthe is data
higher rate
than[24].
VTH The
, soresulting power
we can assume
consumption
that the supplyand efficiency
voltage can bewhen wescaled
linearly applywith
the linear
the datascaling of supply
rate [24]. voltagepower
The resulting are depicted in the
consumption
right‐side plot and inset in Figure 4. The power consumption by the
and efficiency when we apply the linear scaling of supply voltage are depicted in the right-side plot static current scales linearly
withinset
and the frequency
in Figure 4.(PThe = VI), whileconsumption
power the CMOS dynamic power
by the static scalesscales
current cubically (P =with
linearly CV2f) [44].
the On the
frequency
other hand, it is assumed that the signaling power cannot scale
(P = VI), while the CMOS dynamic power scales cubically (P = CV f) [44]. On the other hand, with2 the supply voltage due toittheis
SNR constraint.
assumed that theThe resulting
signaling powerefficiency
cannot curve is much
scale with flatter voltage
the supply than that due without supply
to the SNR scaling,
constraint.
implying
The resultingthat efficiency
the powercurve consumption
is much is almost
flatter thanlinearly proportional
that without supply to scaling,
the dataimplying
rate, which thatis the
the
ideal case assumed in [40]. It also shows that the optimum efficiency
power consumption is almost linearly proportional to the data rate, which is the ideal case assumed may locate at an intermediate
speed
in [40]. because
It also shows of the thatsignaling
the optimumpower, which may
efficiency is practically
locate at anobserved
intermediatein many
speed state‐of‐the‐art
because of the
engineering samples [15,27,32]. In addition, the signaling power
signaling power, which is practically observed in many state-of-the-art engineering samples overhead at a higher data[15,27,32].
rate can
be confirmed from the survey in Figure 6, where the energy efficiencies
In addition, the signaling power overhead at a higher data rate can be confirmed from the survey at the highest rate of various in
scalable I/O designs are scattered with respect to the highest data rate. The
Figure 6, where the energy efficiencies at the highest rate of various scalable I/O designs are scattered efficiency becomes worse
at a higher
with respectdatato the rate abovedata
highest a certain dataefficiency
rate. The rate (approximately
becomes worse 8 Gb/s), implying
at a higher datathe
rateenergy
aboveoverhead
a certain
to drive
data ratesuch high‐speed signaling,
(approximately which isthe
8 Gb/s), implying originated from heavy
energy overhead equalization
to drive circuits and
such high-speed a larger
signaling,
signal swing [44].
which is originated from heavy equalization circuits and a larger signal swing [44].

without VDD scaling with VDD scaling


Frequency

Static Signaling Switching Total Static Signaling Switching Total


1
1.25 1
1.25

4 4

Power efficiency Power efficiency


Normalized power

Normalized power

0.81 0.81
3 3

2 Voltage 2

0.6
0.75
1
0.6
0.75 1

0 0
0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1

0.40.5 0.40.5

0.2
0.25
w/ Linear V-F 0.2
0.25

00 00
00 0.2
0.2
0.4
0.4
0.6
0.6
0.8
0.8
11 00 0.2
0.2
0.4
0.4
0.6
0.6
0.8
0.8
11

Normalized frequency Normalized frequency

Figure 4. Impact of supply scaling on the power consumption and efficiency of I/O across the
Figure 4. Impact of supply scaling on the power consumption and efficiency of I/O across the
operating frequency.
operating frequency.
Electronics 2020, 9, 1315 6 of 21
Electronics 2020, 9, x FOR PEER REVIEW 6 of 21
Electronics 2020, 9, x FOR PEER REVIEW 6 of 21

=2.0
=2.0
=1.5
=1.5

0 VTH 2VTH 3VTH


0 VTH 2VTH 3VTH
VDD
VDD
Figure5.5.Relation
Figure Relationbetween
betweenoperating
operatingspeed
speedand
andsupply
supplyvoltage
voltageof
ofthe
theCMOS
CMOScircuit.
circuit.
Figure 5. Relation between operating speed and supply voltage of the CMOS circuit.

100
100

10
10

1
1
Trendline
Trendline

0.1
0.1
2 4 8 16 32
2 4 8 16 32
Max. data rate (Gb/s)
Max. data rate (Gb/s)
: TRX with off-chip VDD scaling : TRX with on-chip VDD scaling
: TRX with off-chip VDD scaling : TRX with on-chip VDD scaling
: TX with off-chip VDD scaling : TX with on-chip VDD scaling
: TX with off-chip VDD scaling : TX with on-chip VDD scaling
: RX with off-chip VDD scaling : RX with on-chip VDD scaling
: RX with off-chip VDD scaling : RX with on-chip VDD scaling
Figure6.6. Survey
Figure Survey of
of energy
energyefficiency
efficiencyof
ofscalable
scalableI/O
I/Oacross
acrossthe
themaximum
maximumdata
datarate.
rate.
Figure 6. Survey of energy efficiency of scalable I/O across the maximum data rate.
3.3. Design
DesignConsiderations
Considerationsof ofSupply-Scalable
Supply‐ScalableI/O I/O
3. Design Considerations of Supply‐Scalable I/O
3.1. Base Circuit Topology
3.1. Base Circuit Topology
3.1. Base Circuit Topology
Figure 7 shows two major circuit topologies in CMOS technology, CMOS and CML circuits, which
Figure 7 shows two major circuit topologies in CMOS technology, CMOS and CML circuits,
are mostFigure 7 showsused
frequently twoinmajor
modern circuit topologies
analog in CMOSICtechnology,
and mixed-signal CMOS and
designs. Although CML circuits,
current-mode logic
which are most frequently used in modern analog and mixed‐signal IC designs. Although
which are most frequently used in modern analog and mixed‐signal IC designs.
(CML) circuits have been a majority for base circuit topology for high-speed I/O interfaces owing to their Although
current‐mode logic (CML) circuits have been a majority for base circuit topology for high‐speed I/O
current‐mode logic (CML)
high-speed capability [46], circuits
using CMOShave been a majority
circuits as muchfor as base circuit
possible topology for
is preferred for supply-scalable
high‐speed I/O
interfaces owing to their high‐speed capability [46], using CMOS circuits as much as possible is
interfaces
I/O in order to maximize the frequency range and the benefit of the supply scalingas[44,47].
owing to their high‐speed capability [46], using CMOS circuits as much possible It is
is
preferred for supply‐scalable I/O in order to maximize the frequency range and the benefit of the
preferred for supply‐scalable I/O in order to maximize the frequency range
mainly because the power consumption of CML circuits follows the blue line in Figure 4 (static currentand the benefit of the
supply scaling [44,47]. It is mainly because the power consumption of CML circuits follows the blue
supply scaling [44,47]. Itthe
consumption), is CMOS
mainlyexhibits
becausecubic
the power
powerconsumption of CMLbesides
circuitsthefollows
powerthe blue
line in Figure whereas
4 (static current consumption), whereas scaling. In addition,
the CMOS exhibits cubic power scaling,
scaling. In
line in Figure 4 (static current consumption), whereas the CMOS exhibits cubic
the CMOS circuit dissipates less power as long as the fan-out is no less than 2 [23], and it also provides power scaling. In
addition, besides the power scaling, the CMOS circuit dissipates less power as long as the fan‐out is
addition,
a higher swingbesides the the
than power scaling, the CMOS circuit
the dissipates less power as long ascontrollability
the fan‐out is
no less than 2 [23], and CML.
it also On the
provides other hand,
a higher swingCML circuit
than the CML. has a much
On the better
other hand, the CML
no
of lessspeed
its than compared
2 [23], and toit also
the provides
CMOS a higher swing
counterpart whose than
speedtheisCML.
highlyOndeterministic
the other hand, the given
to the CML
circuit has a much better controllability of its speed compared to the CMOS counterpart whose
circuit
process has a much better controllability of its speed compared to the CMOS counterpart whose
speed istechnology and supply
highly deterministic tovoltage.
the givenThe time constant
process technologyof aand
CML circuit
supply is a product
voltage. The timeof the load
constant
speed is highly deterministic to the given process technology and supply voltage. The time constant
of a CML circuit is a product of the load resistance (R) and the load capacitance (CL), so tweaking R is
of a CML circuit is a product of the load resistance (R) and the load capacitance (CL), so tweaking R is
Electronics 2020, 9, 1315 7 of 21
Electronics 2020, 9, x FOR PEER REVIEW 7 of 21

resistance (R) and the load capacitance (CL ), so tweaking R is a powerful knob of controlling the circuit
a powerful knob of controlling the circuit speed in the design phase. The CMOS circuit can be
speed in the design phase. The CMOS circuit can be accelerated by reducing the fan-out; however,
accelerated by reducing the fan‐out; however, a too‐small fanout (i.e., less than 2) is not feasible in a
a too-small fanout (i.e., less than 2) is not feasible in a practical circuit implementation and also results
practical circuit implementation and also results in a non‐linear increase of power consumption (see
in a non-linear increase of power consumption (see (1)). As a result, the CML circuit exhibits a better
(1)). As a result, the CML circuit exhibits a better capability of high‐speed operation, which is a
capability of high-speed operation, which is a dominant issue to achieve the maximum speed of the
dominant issue to achieve the maximum speed of the scalable I/O. It has been one of the main
scalable I/O. It has been one of the main reasons to make high-speed interfaces highly rely on CML
reasons to make high‐speed interfaces highly rely on CML circuits, although their scalability is not
circuits, although their scalability is not as good as CMOS circuits. In order to resolve this issue,
as good as CMOS circuits. In order to resolve this issue, ref. [25] proposed adjusting the current bias
ref. [25] proposed adjusting the current bias of CML circuits in accordance with the supply scaling.
of CML circuits in accordance with the supply scaling. For example, at the minimum data rate and
For example, at the minimum data rate and supply voltage (0.65 V) condition, the current bias is
supply voltage (0.65 V) condition, the current bias is reduced to half of the nominal value, whereas it
reduced to half of the nominal value, whereas it is raised to 1.5× when the data rate and supply voltage
is raised to 1.5× when the data rate and supply voltage (1.05 V) are maximized. Instead of using a
(1.05 V) are maximized. Instead of using a fixed resistance as CML load resistance, a symmetric load
fixed resistance as CML load resistance, a symmetric load whose resistance automatically adapts
whose resistance automatically adapts with the bias current is employed in [25] to scale the circuit
with the bias current is employed in [25] to scale the circuit bandwidth while maintaining the
bandwidth while maintaining the voltage swing. As an alternative approach, a CMOS inverter with
voltage swing. As an alternative approach, a CMOS inverter with resistive feedback can be used to
resistive feedback can be used to hit high bandwidth for fully utilizing CMOS circuits even above the
hit high bandwidth for fully utilizing CMOS circuits even above the CMOS limit, because the
CMOS limit, because the resistive feedback extends the bandwidth of the CMOS circuit at the cost of
resistive feedback extends the bandwidth of the CMOS circuit at the cost of increased power
increased power consumption [48].
consumption [48].

Figure
Figure 7.
7. Comparison
Comparison of
of CMOS
CMOS and
and current‐mode
current-mode logic
logic (CML)
(CML) circuit.
circuit.

3.2. On‐Chip
3.2. On-Chip Supply
Supply Control
Control
From the surveys shown in Figures 3 and 6, we observe there have been many works relying on
off-chip supply
off‐chip supplycontrol
controlrather than
rather on-chip
than control,
on‐chip and generally
control, those works
and generally those exhibit
works aexhibit
better efficiency
a better
than those with on-chip control. It is mainly because they do not include
efficiency than those with on‐chip control. It is mainly because they do not include any power any power overhead of
supply control (either of on-chip and off-chip), so it is important to understand
overhead of supply control (either of on‐chip and off‐chip), so it is important to understand what the what the overhead is.
In addition,
overhead is.ultimately,
In addition,the ultimately,
supply control the circuit
supplymust be integrated
control circuit mustwithbe theintegrated
core circuits. The
with supply
the core
control scheme can be divided into two types: a switching regulator (DC–DC
circuits. The supply control scheme can be divided into two types: a switching regulator (DC–DC converter) and linear
regulator (specifically,
converter) a low-dropout
and linear regulator (LDO)aregulator),
(specifically, low‐dropout which
(LDO) areregulator),
depicted in Figure
which are8.depicted
Generally,in
the switching
Figure regulators
8. Generally, exhibit higher
the switching efficiency
regulators exhibit compared to linearcompared
higher efficiency regulators, tobecause there is a
linear regulators,
non-negligible
because there isvoltage dropout in
a non‐negligible the linear
voltage regulators
dropout [24]. However,
in the linear regulatorsbecause of theirbecause
[24]. However, switching of
behavior,
their the switching
switching behavior,regulators have regulators
the switching a periodic switching rippleswitching
have a periodic on the outputripple voltage
on thewhereas
output
the linear
voltage regulators
whereas the offer
linearpower supply
regulators noise
offer rejection
power supply (PSR).
noiseSince most (PSR).
rejection of the building
Since most blocks of
of the
high-speed
building I/O are
blocks very sensitive
of high‐speed I/Otoare
thevery
supply noise [49,50],
sensitive the linear
to the supply regulators
noise canlinear
[49,50], the be more suitable
regulators
for high-performance
can be more suitable for applications [12,31]. For
high‐performance example, [31]
applications reveals
[12,31]. Forthat the link[31]
example, bit-error-rate
reveals that (BER)
the
degrades when a switching regulator is used. In addition, the supply sensitivity
link bit‐error‐rate (BER) degrades when a switching regulator is used. In addition, the supply increases at a lower
sensitivity increases at a lower supply voltage, so it is more critical in supply‐scalable I/O compared
Electronics 2020, 9, 1315 8 of 21

Electronics 2020, 9, x FOR PEER REVIEW 8 of 21


supply voltage, so it is more critical in supply-scalable I/O compared to fixed-supply I/O. For example,
toElectronics
the fixed‐supply
transition2020,delay I/O. aFor
ofPEER
9, x FOR CMOSexample, the
inverter
REVIEW transition delay
is approximately of a CMOS
expressed as to inverter is approximately
8 of 21
expressed as to
to fixed‐supply I/O. For example, the transition CVdelay
DD of a CMOS inverter is approximately
τd =𝜏 , (3)
expressed as to β(VDD − VTH )α
, (3)

whereττd isisthe
where thetransition
transitiondelay delayof ofthetheCMOS
CMOS inverter[19].
𝜏 inverter [19].
, Bydifferentiating
By differentiating(3) (3)with
withVVDD,, we we(3) cancan
d DD
obtain
obtain the supply
theτdsupply sensitivity
sensitivitydelay of the transition
of theoftransition delay,delay,
whichwhich represents
represents the jitter
the jitter induced induced by supply
where is the transition the CMOS inverter [19]. By differentiating (3) withby VDDsupply
, we can noise
noise of
∆VDD as
of obtain ∆V DD as
the supply sensitivity of the transition delay, which represents the jitter induced by supply
dτd 1 α

noise of ∆VDD as = τd 𝜏 − , , (4)
(4)
dVDD VDD VDD − VTH
whereα αequals
where equals twotwo in long‐channel
in long-channel devices but𝜏becomes
devices but becomes
smaller , smaller in short‐channel devices,
in short-channel devices, as mentioned (4) as
mentioned
in the Section in 2.
the(4)Section 2. (4) isinillustrated
is illustrated Figure in Figure
9 for αbut
= 2.0 9and
for α = 2.0
1.5, whereandwe 1.5,find
where thewe find sensitivity
supply the supply
where α equals two in long‐channel devices becomes smaller in short‐channel devices, as
sensitivity
dramatically dramatically
mentioned increases
in the Section increases
as the 2. supply as the supply
voltage scales
(4) is illustrated in Figure voltage
[19,51].
9 forAs scales
α =a2.0 [19,51].
result,
and for 1.5,the As a result,
casewe
where of find for
usingthe the case of
a switching
supply
using a switching
regulator,
sensitivity thedramatically
switching regulator,
frequency
increasesthe and switching
as CDR
the supply frequency
bandwidth and CDR
voltage should
scales bandwidth
be carefully
[19,51]. chosen
As a result, should becase
to minimize
for the carefully
ofthe
chosen
using a switching regulator, the switching frequency and CDR bandwidth should be carefullyofthe
impact ofto minimize
supply ripple the onimpact
the linkof supply
BER ripple
[22,33]. on
Figure the8b link
shows BER [22,33].
one of the Figure
design 8b shows
considerations one of a
design
linear
chosen considerations
regulator
to minimize regarding of a
the impactlinear
PSR. The regulator regarding
linearripple
of supply regulator on the PSR.
is link
based The
BER linear
on[22,33]. regulator
a negative Figure is
feedbackbased
8b shows on
loop a
onewith negative
of the two
feedback
design
poles, loop
oneconsiderations
is at thewithoutput
twoofpoles, theone
aoflinear is at the
regulator
amplifier (ω output
regarding
1 , gate of
PSR.
of the
pass Theamplifier
linear
transistor) (ω 1, gate
regulator
and is
the of pass
based
other on
istransistor)
a
at negative
the output and
the other
feedback is
loopat the
with output
two (ω
poles, , regulated
one is at the supply
output voltage).
of the Depending
amplifier
(ω2 , regulated supply voltage). Depending on the pole locations, the regulator exhibits different PSR
2 (ω 1 , gate on ofthe
pass pole locations,
transistor) and the
regulator exhibits
the other is at
characteristics. Forthedifferent PSR characteristics.
output if(ωω2,1 regulated
example, is the dominant For
supplypole example, if
(ω2 > Depending
voltage). ω is
ω1 ), the PSR
1 the dominant
onhas pole
thea peaking, (ω
pole locations, >
whereas
2 ω ),
thethe
1 the
PSR has case
regulator
opposite a peaking,
exhibits
exhibits whereas
different
a plain the
PSR opposite
characteristics.
low-pass case
filter exhibits
For example,
characteristic a plainif ω1low‐pass
[19,52]. is the filtermaking
dominant
However, characteristic ω1),[19,52].
pole (ωω2 2> dominant the
PSR
However,has a peaking,
making ω 2whereas
dominant the opposite
requires a case
huge exhibits a
requires a huge capacitance because of low output resistance at the VDD node due to the circuitat
capacitance plain low‐pass
because of filter
low characteristic
output resistance [19,52]. the
load.
However,
ItVcosts
DD node
a huge making
duesilicon ω
to thearea 2 dominant
circuit
and load. requires
frequently It costsa huge
needs a huge capacitance because
siliconcapacitor,
an off-chip area andand of low output
frequently resistance
needs constraints
it also severely at the
an off‐chip
theVsupply
DD node
capacitor, and dueit to
transitionalso the
time. circuit
severely load. It costs
constraints
To summarize, thea supply
there huge silicon
are lots transition
of designarea time.
and frequently
To summarize,
trade-offs (e.g., PSRneeds an off‐chip
there
versus are lots of
efficiency,
capacitor,
design and it also
trade‐offs (e.g., severely
PSR constraints
versus the supply
efficiency, PSR transition
versus time.area),
silicon To summarize,
which needthereto are
be lots of
carefully
PSR versus silicon area), which need to be carefully examined to find the best topology and design
design
examined to trade‐offs (e.g., PSR versus efficiency, PSR versus silicon area), which need to be carefully
parameters forfind the best
a given topology and design parameters for a given application.
application.
examined to find the best topology and design parameters for a given application.
(a) Switching regulator (b) Linear regulator
(a) Switching regulator (b) Linear regulator
VDDH PSR
VDDH VDDH PSR
VDDH 1> 2
1> 2
VREF
VREF PSR
1 PSR freq.
D VDDH 1 freq.
PWM D VDDH
PWM VDD 2> 1
VDD VDD 2> 1
2 VDD PSR
VREF 2 PSR
VREF D
D Circuit
Circuit Circuit
Circuit

freq. freq.

Figure 8.8.Block
Figure8.
Figure Block diagrams
Blockdiagrams of (a)
diagramsof (a) switching
switchingregulator
switching regulator(buck
regulator (buckconverter)
(buck converter)
converter)(b)(b) linear
linear
(b) regulator.
regulator.
linear regulator.

Figure 9. Supply sensitivity of transition delay of CMOS circuit.


Figure9.9. Supply
Figure Supply sensitivity
sensitivity of
of transition
transitiondelay
delayof
ofCMOS
CMOScircuit.
circuit.
3.3. Clock Generation
3.3. Clock Generation
Electronics 2020, 9, 1315 9 of 21

3.3. Clock Generation


As mentioned briefly in the introduction, multiple PLLs (or multiple oscillators in a single PLL) are
Electronics 2020, 9, x FOR PEER REVIEW 9 of 21
frequently implemented in wide-range transceivers to cover the entire frequency range [4,8,10,11,53].
It is mainly Asbecause
mentioned LCbriefly
oscillators
in theare popular for
introduction, high-speed
multiple PLLs (orI/Omultiple
applications because
oscillators of itsPLL)
in a single superior
phasearenoise performance
frequently and high-speed
implemented capability
in wide‐range comparedtoto cover
transceivers ring oscillators,
the entire and they arerange
frequency also less
sensitive to supplyIt voltage
[4,8,10,11,53]. is mainlyvariation.
because LCHowever,
oscillators LC oscillators
are popular occupy a much
for high‐speed larger area,
I/O applications so using
because
of itsLC
multiple superior phase noise
oscillators is an performance and high‐speed
expensive solution. The capability compared
survey results to ring
given oscillators,
in Figures 10and
and 11
validate such trade-off between LC-based PLL and ring-based PLL. Figure 10 illustratesmuch
they are also less sensitive to supply voltage variation. However, LC oscillators occupy a a scatter
plot oflarger area, so using
figure-of-merit multiple
(FoM) LC oscillators
of PLL is an expensive
versus operating frequencysolution. Thedesigns
of PLL survey results
presentedgivenininIEEE
Figures 10 and 11 validate such trade‐off between LC‐based PLL and ring‐based PLL. Figure 10
International Solid-State Circuits Conference (ISSCC) from 2010 to 2019 [19]. The FoM of the PLL is
illustrates a scatter plot of figure‐of‐merit (FoM) of PLL versus operating frequency of PLL designs
defined as
presented in IEEE International Solid‐State CircuitsConferenceσ J 2  PPLL(ISSCC)
! from 2010 to 2019 [19]. The
FoM of the PLL is defined as FoM PLL = 10 log · . (5)
1s 1mW
where σJ and PPLL denote the absolute 𝐹𝑜𝑀jitter and 10 log ∙
power consumption . of a PLL, respectively, which (5) is
widely used to evaluate the PLL jitter performance with equalized power consumption [54]. The trend
shown where σJ and10
in Figure PPLLimplies
denote that
the absolute jitter andPLLs
the LC-based power consumption
generally of abetter
exhibit PLL, respectively, which is and
jitter performance
widely used to evaluate the PLL jitter performance with equalized power consumption [54]. The
higher operating frequency than the ring-based counterpart. On the other hand, we can find that the
trend shown in Figure 10 implies that the LC‐based PLLs generally exhibit better jitter performance
ring-based PLL occupies much less silicon area from Figure 11, where the FoM versus silicon area is
and higher operating frequency than the ring‐based counterpart. On the other hand, we can find that
plotted.
the In addition,PLL
ring‐based theoccupies
tuning range
much oflessthe ring area
silicon is much
fromwider
Figurethan LC [19].
11, where the As
FoMa result,
versus the design
silicon
trade-off
areabetween
is plotted.LC- and ring-based
In addition, the tuningPLLs
rangeinofwide-range I/O wider
the ring is much can bethansummarized
LC [19]. As as follows:
a result, the the
LC PLLdesign trade‐off between LC‐ and ring‐based PLLs in wide‐range I/O can be summarized as follows: LC
offers better jitter performance; however, its range is limited so that no less than two
PLLs the
are LC
required to cover
PLL offers betterajitter
wide range, which
performance; resultsits
however, inrange
significant area
is limited soconsumption.
that no less thanOn twotheLCother
hand,PLLs are required
the ring to cover
PLL achieves a wide
wide range,
range withwhich results
a small in significant
area; however, its areahigher
consumption.
phase noiseOn the
andother
supply
hand, results
sensitivity the ringinPLLworseachieves wide rangeConventionally,
performance. with a small area; however,
using its higher
multiple PLLs has phase noise
been and
a majority
optionsupply sensitivityI/O
for wide-range results in worse [8,10,11,55,56].
transceivers performance. Conventionally,
Recently, thereusing
havemultiple
been some PLLs has been works
ring-based a
majority option for wide‐range I/O transceivers [8,10,11,55,56]. Recently, there have been some
by introducing some circuit techniques to overcome the limit of ring PLLs; for example, a two-stage
ring‐based works by introducing some circuit techniques to overcome the limit of ring PLLs; for
ring oscillator [23], clock multiplying delay-locked loop (MDLL) [31], and multi-phase calibration [57].
example, a two‐stage ring oscillator [23], clock multiplying delay‐locked loop (MDLL) [31], and
For specific applications
multi‐phase calibrationwhere
[57].only discreteapplications
For specific rates are used,wherea single PLL with
only discrete a configurable
rates division
are used, a single
ratio can be also used [16,58].
PLL with a configurable division ratio can be also used [16,58].

0.1 1 10 100
-200

-210

-220
FoM (dB)

-230

-240

-250

-260

10. Comparison
FigureFigure of ring-based
10. Comparison and
of ring‐based andLC-based
LC‐based clock generatorsininterms
clock generators terms
of of figure-of-merit
figure‐of‐merit and and
frequency, based
frequency, on the
based works
on the workspresented
presentedininInternational Solid-StateCircuits
International Solid‐State Circuits Conference
Conference (ISSCC)
(ISSCC)
2010–2019
2010–2019 [19]. [19].
Electronics 2020, 9, 1315 10 of 21
Electronics 2020, 9, x FOR PEER REVIEW 10 of 21

0.001 0.01 0.1 1 10


-200

-210

-220

FoM (dB)
-230

-240

-250

-260

-270

Figure
Figure 11. 11. Comparison
Comparison ofofring-
ring‐and
andLC-based
LC‐based clock
clockgenerators
generatorsin in
terms of figure‐of‐merit
terms and silicon
of figure-of-merit and silicon
area, based on the works presented in ISSCC 2010–2019
area, based on the works presented in ISSCC 2010–2019 [19]. [19].

3.4. TX
3.4. Driver Topology
TX Driver Topology
TX output
TX output driver
driver topologyisisanother
topology another important
importantfactor factorthat impacts
that impacts the the
efficiency of scalable
efficiency IO, IO,
of scalable
since the TX output driver is mainly responsible for the signaling
since the TX output driver is mainly responsible for the signaling power we discussed above, although a power we discussed above,
although a separate supply voltage is usually dedicated for the TX driver to decouple the TX swing
separate supply voltage is usually dedicated for the TX driver to decouple the TX swing from the supply
from the supply scaling [12,25]. The requirements for the TX driver are summarized as a proper
scaling [12,25]. The requirements for the TX driver are summarized as a proper output impedance
output impedance for signal integrity, enough output swing to guarantee sufficient SNR, and
for signal
optional integrity, enough output
FFE equalization swing to
for a high‐loss guarantee
channel. Figuresufficient
12 shows SNR,
threeand optional
popular FFE equalization
topologies of TX
for adriver
high-loss channel. Figure
implementation: 12 shows driver,
a current‐mode three popular
P‐over‐Ntopologies
voltage‐mode of TX driver
driver, andimplementation:
N‐over‐N
a current-mode
voltage‐modedriver, driver P-over-N
[25,59–62].voltage-mode
In the current‐mode driver, and N-over-N
driver, the transistors voltage-mode
operate in the driver [25,59–62].
saturation
In the current-mode
region, the outputdriver,
impedance the transistors
of the driver operate in the dominated
is generally saturation by region, the output
the passive impedance
load resistance R. of the
Asisa generally
driver result, it provides
dominated goodby impedance
the passive matching across a wide
load resistance R. Asrange of output
a result, swing.good
it provides Compared
impedance
to the across
matching CML circuit
a wide that has an
range of output
outputswing
swing. of Compared
IBIASR, the current‐mode
to the CML driver circuithas thata halved
has an swing
outputofswing
IBIASR/2 (single‐ended), because the IBIAS splits into the load resistance and RX termination. For
of IBIAS R, the current-mode driver has a halved swing of IBIAS R/2 (single-ended), because the IBIAS
example, with 50‐Ω termination, 20‐mA current consumption is needed for 500‐mV swing
splits into the load resistance and RX termination. For example, with 50-Ω termination, 20-mA current
(=Swing/25Ω). On the other hand, the voltage‐mode drivers consume less current than the
consumption is needed for 500-mV swing (=Swing/25Ω). On the other hand, the voltage-mode drivers
current‐mode driver, because there are not spilt current paths. Such a difference originates from
consume
whereless the current
terminationthanisthe current-mode
placed relative to the driver,
signal because
path, suchthere arethe
that nottermination
spilt current paths.inSuch a
is placed
difference originates from where the termination is placed relative
parallel with the signal path in the current‐mode driver but in series with the signal path to the signal path, such that the
in the
termination
voltage‐mode is placed
driver.inForparallel with the
that reason, the signal path indriver
voltage‐mode the current-mode
is also referreddriver to the but in series with
source‐series
the signal
terminatedpath (SST)
in thedriver.
voltage-mode
In a P‐over‐N driver. For that reason,
voltage‐mode driver,theanvoltage-mode driver is also referred
inverter‐like PMOS‐over‐NMOS
structure
to the drives the
source-series channel and
terminated the RX
(SST) termination.
driver. In an N‐over‐N
In a P-over-N voltage‐mode
voltage-mode driver,driver, NMOS
an inverter-like
transistors are used for both pull‐up and pull‐down, but they are
PMOS-over-NMOS structure drives the channel and the RX termination. In an N-over-N voltage-mode driven by an opposite polarity of
input voltage. As a result, pull‐up and pull‐down paths are not
driver, NMOS transistors are used for both pull-up and pull-down, but they are driven by an opposite turned on simultaneously, the
current solely flows from VDD,PU (or Vswing) to VCM (or VCM to ground). As a result, assuming matched
polarity of input voltage. As a result, pull-up and pull-down paths are not turned on simultaneously,
impedance for both paths, both the output swing and the common‐mode voltage (VCM) are VDD,PU/2.
the current solely flows from VDD,PU (or Vswing ) to VCM (or VCM to ground). As a result, assuming
For example, it leads to the current consumption of 5 mA for a 500‐mV output swing, which is only
matched
¼ thatimpedance for both paths,
of the current‐mode driver.both the output
The same is also swing
appliedandto thetheN‐over‐N
common-mode voltage‐modevoltage (VCM ) are
driver.
On/2.
VDD,PU theFor example,
other hand, the it leads to the current
impedance matching consumption
of voltage‐mode of 5 mAdriversfor ais500-mV
more complex output than
swing, thewhich
1
is only that
current‐mode
4 of the current-mode
driver, because they driver.
rely on The
the same
active is also
devices applied
rather than to
a the
passive N-over-N
resistor. voltage-mode
Basically,
driver.
the On the other
transistors hand,
should the impedance
operate as a resistormatching
to assure of lowvoltage-mode
output impedance drivers is more
so that they complex
should bethanin the
a linear region, which constraints the gate‐overdrive voltage (V ) to be
current-mode driver, because they rely on the active devices rather than a passive resistor. Basically,
OV higher than the drain‐source
the transistors should operate as a resistor to assure low output impedance so that they should be in a
linear region, which constraints the gate-overdrive voltage (VOV ) to be higher than the drain-source
voltage (VDS ). As a result, the P-over-N is more appropriate for high output swing, whereas the
Electronics 2020, 9, 1315 11 of 21
Electronics 2020, 9, x FOR PEER REVIEW 11 of 21

voltage (VDS). As a result, the P‐over‐N is more appropriate for high output swing, whereas the
N-over-N fits better for low output swing [12]. A passive resistor is usually placed in series with the
N‐over‐N fits better for low output swing [12]. A passive resistor is usually placed in series with the
driver to reduce
driver to reduce VDSVDS
the the forforbetter
betterlinearity
linearity [60,63].
[60,63].

Figure 12. Transmitter


Figure (TX)
12. Transmitter output
(TX) outputdriver
drivertopologies, R0, and
topologies, R0, andVVCMCM denote
denote characteristic
characteristic impedance
impedance of of
channel and common-mode
channel and common‐mode voltage, respectively;
voltage, (a) current-mode
respectively; driver, (b)
(a) current‐mode P-over-N
driver, (b) voltage-mode
P‐over‐N
voltage‐mode
driver, (c) N-over-Ndriver, (c) N‐over‐N
voltage voltage mode driver.
mode driver.

As shown
As shown in Figure12b,c,
in Figure 12b,c,the
the voltage‐mode
voltage-mode drivers
driversare are
driven by CMOS
driven by CMOS inverters, whereaswhereas
inverters, the
current‐mode driver needs to be driven by a CML pre‐driver, so the voltage‐mode
the current-mode driver needs to be driven by a CML pre-driver, so the voltage-mode drivers are drivers are more
moreappropriate
appropriate forfor
supply‐scalable
supply-scalable design, as we
design, as wediscussed
discussed in Section 3.1. 3.1.
in Section However, there there
However, are are
constraints on the supply voltage of such inverters; hence, it limits the scalability. Since the output
constraints on the supply voltage of such inverters; hence, it limits the scalability. Since the output
impedance is set by the transistors, and it is dominated by the VGS of the transistors, the supply
impedance is set by the transistors, and it is dominated by the VGS of the transistors, the supply voltage
voltage can only be adjusted in a range where the output impedance is matched within the
can only be adjusted
configurable rangeinof athe
range
driverwhere the output
segmentation. impedance
In the P‐over‐N, isthematched
VGS of PMOS within andthe configurable
NMOS are
rangeVDD,PU
of the anddriver
VDD,PDsegmentation.
, respectively. InInthe theN‐over‐N,
P-over-N, thetheVGSVof of PMOS
GS pull‐up andand NMOS NMOSs
pull‐down are VDD,PU
are and
VDD,PDVDD, ‐V
respectively. In the N-over-N,
CM and VDD, respectively; therefore, VGS
thethe of pull-up
pull‐up deviceand pull-down
is typically muchNMOSs are Vthe
larger to have -V
DD CM and
same
VDD ,impedance.
respectively; Fortherefore,
both cases,thethe
pull-up
outputdevice is typically
impedance much larger
is a function of thetosupply
have the same impedance.
voltages of the
pre‐driver
For both cases,inverters,
the output so impedance
the controllability on thoseof
is a function supplies is constrained
the supply voltages by the controllability
of the of
pre-driver inverters,
so thethe output driveronstrength.
controllability For theissupply
those supplies voltagebyofthe
constrained thecontrollability
output driver of itself, the voltage‐mode
the output driver strength.
driver
For the is fully
supply constrained
voltage by the output
of the output driverswing, but voltage-mode
itself, the the current‐mode driver
driver is can
fullyscale the supply
constrained by the
until the differential pair and the current source faces a voltage headroom issue.
output swing, but the current-mode driver can scale the supply until the differential pair and the
current
3.5.source
Clockingfaces a voltage headroom issue.
Architecture
Clocking
3.5. Clocking architecture is one of the most important design choices that impacts the performance
Architecture
and characteristic of a generic I/O link. Sometimes, it is not choosable, as it is pre‐defined in
Clocking architecture
specification; nevertheless,is one
it isofstill
thevery
most important
important design choices
to understand thatclocking
various impactsarchitectures
the performance and and
characteristic of a generic
their pros/cons. I/O link.
Figure 13 shows Sometimes,three itpopular
is not choosable,
clocking as it is pre-defined
architectures: in specification;
plesiochronous,
nevertheless, it is still very
source‐synchronous, andimportant
mesochronous. to understand various clocking
In a plesiochronous architectures
link, the TX and the RX andhavetheir
their pros/cons.
own
FigurePLLs, which are
13 shows synchronized
three to different
popular clocking reference clockplesiochronous,
architectures: sources. As a result, there is an intrinsic and
source-synchronous,
frequency offset
mesochronous. In between the TX and the
a plesiochronous link,RX.theAt TX
the receiver
and theside,
RXthe frequency
have offset PLLs,
their own is tracked
whichby are
rotating the
synchronized to different reference clock sources. As a result, there is an intrinsic frequencyaoffset
sampling clock phase, which is controlled by a CDR loop. On the other hand,
source‐synchronous link TX forwards its clock to the RX through a dedicated clock channel so there
between the TX and the RX. At the receiver side, the frequency offset is tracked by rotating the
is no frequency offset. In addition, it is assumed that the latencies of all data channels are
sampling clock phase, which is controlled by a CDR loop. On the other hand, a source-synchronous
synchronized; hence, no per‐lane phase tracking circuit is used, but the forwarded clock is simply
link TX forwards
distributed its clock
to each to the
data lane. TheRXdelay
throughof theadata
dedicated clock
and clock channel
paths so there
are corrected by is no frequency
introducing
offset. In addition,
additional delay to it either
is assumed
TX or RX, that the latencies
to maximize jitter of all data
tolerance as channels are synchronized;
well as eliminate static skew [64]. hence,
no per-lane
Comparedphaseto thetracking circuit islink,
plesiochronous used, thebut the forwarded
source synchronousclock is simply
link has a muchdistributed to each
simpler circuitry at data
lane. the
Thecost
delay of the
of the data andforwarded
additional clock paths clockarechannel.
corrected Asbya introducing additional
result, it significantly delaythe
relaxes to either
designTX or
complexity introduced by the supply scalability. However, it becomes impossible
RX, to maximize jitter tolerance as well as eliminate static skew [64]. Compared to the plesiochronous to eliminate the
skew across data and clock lanes as the data rate increases; thus,
link, the source synchronous link has a much simpler circuitry at the cost of the additional forwarded the source‐synchronous
clockarchitecture
channel. As cannot be a viable
a result, solution for
it significantly today’s
relaxes thehigh‐speed interfaces. The
design complexity mesochronous
introduced by thelinksupply
scalability. However, it becomes impossible to eliminate the skew across data and clock lanes as the
data rate increases; thus, the source-synchronous architecture cannot be a viable solution for today’s
high-speed interfaces. The mesochronous link resolves such issues of the source-synchronous link
Electronics 2020, 9, 1315 12 of 21
Electronics 2020, 9, x FOR PEER REVIEW 12 of 21

with resolves
per-lanesuch
de-skew
issuescircuits. Since there is no frequency
of the source‐synchronous offset, the
link with per‐lane de-skew
de‐skew circuit
circuits. does
Since not
there have to
is no
rotatefrequency
the phaseoffset,
continuously,
the de‐skewso the circuit
circuit doesimplementation is the
not have to rotate stillphase
simpler than the plesiochronous
continuously, so the circuit RX.
implementation is still simpler than the plesiochronous RX.

Figure
Figure 13.13.Clocking
Clocking architectures
architectures ofofthe
theI/O interface.
I/O interface.

Parallelized
Parallelized architecture
architecture with
with a areduced
reducedclock
clock rate
rate but
butwith
witha amulti‐phase
multi-phaseclock (i.e.,(i.e.,
clock half‐rate or or
half-rate
quarter‐rate clocking) has been one of the major innovations enabling today’s ultra‐high‐speed
quarter-rate clocking) has been one of the major innovations enabling today’s ultra-high-speed I/O I/O
with reasonable power consumption [22]. It plays a critical role in the supply‐scalable I/O, because it
with reasonable power consumption [22]. It plays a critical role in the supply-scalable I/O, because it
relaxes the required circuit bandwidth at reduced supply voltages. As a result, many supply‐scalable
relaxes the required circuit bandwidth at reduced supply voltages. As a result, many supply-scalable
I/O designs have adopted highly parallelized architecture (e.g., quarter rate) from early works [22] to
I/O designs have [16,32,34,57].
recent works adopted highly parallelized
However, architecture
the downside (e.g., quarter
of parallelized rate) from
architecture early works
with scalable supply[22] to
recentis works [16,32,34,57]. However, the downside of parallelized architecture with
that the mismatch effect goes worse with the scaled supply voltage [65]. As a result, duty‐cycle scalable supply is
that the mismatch
correction (foreffect goes clocking)
half‐rate worse with or the
the scaled supplyalignment
multi‐phase voltage [65]. As a result,
technique becomesduty-cycle
essentialcorrection
for
(for half-rate clocking)
supply‐scalable I/Oor the multi-phase
design. Specifically,alignment technique
for an example becomesclocking
of quarter‐rate essentialthat
forissupply-scalable
most popular I/O
forSpecifically,
design. scalable I/O,forquadrature
an exampleerror correction clocking
of quarter-rate (QEC) schemes, such
that is most as phase
popular calibration
for scalable I/O,with a
quadrature
quadrature phase detector [47,66] or with asynchronous sampling [15,16,67] or time‐division
error correction (QEC) schemes, such as phase calibration with a quadrature phase detector [47,66] or phase
calibration [57,68],
with asynchronous are widely
sampling employed.
[15,16,67] or time-division phase calibration [57,68], are widely employed.
4. Survey
4. Survey on State‐Of‐The‐Art
on State-Of-The-Art Supply‐Scalable I/O
Supply-Scalable I/O

In this section, we attempt to compare the previously published supply-scalable I/O designs in
terms of various aspects we discussed in the previous sections. Tables 1–3 show the comprehensive
review of supply-scalable transmitters, receivers, and transceivers, respectively. Note that they are
sorted with oldest publication first, and the clock generation (PLL) power is included in the transmitter.
Electronics 2020, 9, 1315 13 of 21

Throughout the surveys, we can find various meaningful trends. First, the data rate tends to increase
continuously. Second, the figure-of-merit (FoM, energy efficiency) gap between the minimum and
maximum data rates has been converging as the process technology scales down. It can be interpreted
as the dynamic switching power occupied the dominant portion of I/O power in older technologies
(e.g., [22,24]); however, it has been significantly reduced owing to the technology scaling, whereas the
others (the static current and the signaling power) have not [41,43]. Third, the quarter-rate clocking
has become the major clocking architecture and tends to exhibit better energy efficiency. Lastly, as we
already discussed in Figures 3 and 6, the works relying on off-chip sources for supply scaling do not
consider the overhead due to the supply scaling circuitry, such as non-zero energy loss, they tend
to exhibit better efficiency. Only five works include the on-chip supply scaling, where the regulator
efficiencies of around 80–90% have been reported.
Specifically, looking at Table 1, we can observe that the voltage-mode driver has become the
mainstream topology for the transmitters for the reasons we discussed in Section 3.4. On the other
hand, more than half of the transmitter works rely on the external clock source to mitigate the design
complexity due to the supply sensitivity of PLL. [12,16,22,31] managed to cover the entire range with a
single PLL; however, [16] uses an LC-PLL followed by a programmable divider but provides only a
few quantized frequencies. [12,22,31] use a ring-PLL to cover the entire range; however, their supply
voltages are separated from the other building blocks, which can also imply that it is difficult to scale
the clocking circuit with the other building blocks because of its sensitivity to the supply voltage.
The energy efficiency achieved from the highest data rate of each design ranges from 0.44 pJ/bit [27] to
43.3 pJ/bit [22], but we have to note that [27] does not include an equalizer, PLL, and on-chip supply
scaling. If we narrow down the scope to the transmitters having at least two of them, the best energy
efficiency becomes 1.97 pJ/bit at 32 Gb/s [16].
Looking into Table 2 where the survey of supply-scalable I/O receivers is presented, we can observe
that mesochronous clocking has been a mainstream architecture [15,24–27], as we discuss in Section 3.5.
The receiver designs that rely on external CDR/deskew calibration tend to exhibit better energy efficiency,
which implies the hardware overhead of robust operation with the on-chip CDR/deskew. For example, [27]
achieves 0.22 pJ/bit at 8 Gb/s from the 0.75-V supply, but [16] exhibits 1.06 pJ/bit at 8 Gb/s from the 0.72-V
supply. Note that it does not mean the overhead is almost 80% of a complete receiver, because there are
many differences in the level of completeness between [16] and [27]; for instance, [16] has much more
extensive equalization and wider eye opening. Therefore, we have to be sure of whether a design includes
such on-chip circuitry or not, while evaluating the performance of a receiver.
Table 3 shows the performance survey of the scalable I/O designs where complete transceivers
have been presented. The energy efficiency ranges from 0.51 to 15 pJ/bit for the lowest data rates and
from 0.66 to 76 pJ/bit for highest data rates. A few notable works are as follows. The pioneer works of
supply-scalable parallel [24] and serial I/Os from Stanford University [22], utilize DC–DC converters to
scale the on-chip supply voltages. They take the control voltages of analog delay-locked loop [24] and
analog ring-PLL [22], which can be a benchmark of operating frequency because they intrinsically track
the operating frequency, as already discussed in the Section 2, as a reference voltage of the DC–DC
converters. As a result, the supply voltages can scale automatically with no off-chip control. In [25],
Intel presents the voltage calibration scheme using a calibration VCO that replicates the critical path
in the transmitter, and it achieves an energy efficiency of 2.7–5.0 pJ/bit in 65-nm CMOS technology,
which is approximately a 10× improvement over [24] and [22]—however, without on-chip supply
scaling. Intel also presents [16], which achieves 32 Gb/s with a complete set of equalizers, on-chip
PLL, and CDR, with an energy efficiency of 3.25–6.41 pJ/bit. [31] incorporates the combination of a
DC–DC converter and LDO regulators to achieve high conversion efficiency (DC–DC) as well as to
protect supply-noise-sensitive circuits from the noisy output of a DC–DC converter, while achieving
3.6–7.24 pJ/bit efficiency with a 3–10 Gb/s range with supply scaling. In addition, [31] combines the
scalable supply technique with burst-mode operation, lowering the minimum effective data rate of
16 Mb/s with 34 pJ/bit.
Electronics 2020, 9, 1315 14 of 21

Table 1. Survey of supply-scalable transmitters. FFE: feed-forward equalizer, FoM: figure-of-merit, LDO: low-dropout, PLL: phase-locked loop.

Min. Max. Min. Max. FoM


Process Signaling # of Supply TX Swing Area FoM (pJ/b)
Rate Rate Equalizer Clocking Supply Supply (pJ/bit)
(nm) Mode PLLs Scaling (Vppd ) (mm2 ) @Min Rate
(Gb/s) (Gb/s) (V) (V) @Max Rate
[24] 350 0.2 0.8 Open drain None Half rate External DC–DC 1.3 3.2 0.1–0.15 N/A N/A 26.875
[22] 250 0.65 5 Current None 1/5 rate 1 DC–DC 0.9 2.5 0.1–0.3 N/A 8.5 43.3
External
[25] 65 5 15 Current 3-tap FFE Half rate External 0.68 1.05 0.1–0.72 0.033 1.5 2.3
source
External
[26] 45 5 25 Voltage None Half rate External 0.75 1.1 0.082–0.36 0.077 N/A N/A
source
Voltage/ Quarter External
[15] 32 2 16 3-tap FFE External 0.6 1.08 0.36–0.5 0.014 0.47 1.56
current rate source
Quarter External
[27] 65 4.8 8 Voltage None External 0.6 0.8 0.1–0.2 0.027 0.34 0.44
rate source
Quarter External
[16] 22 8 32 Voltage 3-tap FFE 1 0.72 1.07 0.1–0.6 N/A 2.19 1.97
rate source
DC–DC
[31] 65 3 10 Current 3-tap FFE Half rate 1 0.7 1.4 N/A N/A 2.7 4.8
+ LDO
Quarter
[12] 65 5 32 Voltage 2-tap FFE 1 LDO 0.85 1.3 0.4–1.3 0.17 3.45 2.74
rate
Quarter External
[34] 65 3 16 Voltage PWM External 0.5 0.9 N/A N/A 1.04 2.42
rate source
Electronics 2020, 9, 1315 15 of 21

Table 2. Survey of supply-scalable receivers.

Min. Max. Eye Min. Max.


Process CDR/Deskew Supply Area FoM (pJ/b) Fom (Pj/Bit)
Rate Rate Equalizer Clocking Opening Supply Supply
(nm) Loop Scaling (mm2 ) @Min Rate @Max Rate
(Gb/s) (Gb/s) (UI) (V) (V)
Mesochronous
[24] 350 0.2 0.8 None Half rate N/A DC–DC 1.3 3.2 N/A N/A 38.75
DLL + PI
Plesiochronous
[22] 250 0.65 5 None 1/5 rate N/A DC–DC 0.9 2.5 N/A 6.5 32.7
PLL
Mesochronous External
[25] 65 5 15 CTLE Half rate N/A 0.68 1.05 0.055 1.2 2.7
External CDR source
Mesochronous External
[26] 40 1.6 6.4 None Half rate N/A 0.75 1.1 0.133 N/A N/A
DLL + PI source
Quarter Mesochronous External
[15] 32 2 16 CTLE 0.5 0.6 1.08 0.02 0.52 1.02
rate External CDR source
Quarter Mesochronous External
[27] 65 4.8 8 CTLE 0.05 0.6 0.75 0.032 0.17 0.22
rate External CDR source
CTLE + Quarter Plesiochronous External
[16] 22 8 32 0.5 0.72 1.07 N/A 1.06 4.45
6-tap DFE rate DLL + PI source
Plesiochronous
[33] 110 0.5 4 None Half rate N/A DC–DC 0.685 0.784 0.56 5.36 0.97
PLL
Electronics 2020, 9, 1315 16 of 21

Table 3. Survey of supply-scalable transceivers. CDR: clock and data recovery.

Min. Max. Channel Min.


Process Supply Max. Area FoM (pJ/b) FoM (pJ/b)
Rate Rate Clocking Clock Rate # of PLLs Loss Equalizer Supply
(nm) Scaling Supply (V) (mm2 ) @Min Rate @Max Rate
(Gb/s) (Gb/s) (dB) (V)
[24] 350 0.2 0.8 Mesochronous Half rate External N/A None DC–DC 1.3 3.2 1.625 N/A 65.625
[22] 250 0.65 5 Plesiochronous 1/5 rate 1 N/A None DC–DC 0.9 2.5 0.63 15 76
TX FFE + External
[25] 65 5 15 Mesochronous Half rate External 10 0.68 1.05 0.088 2.7 5
CTLE source
TX FFE + External
[26] 45 5 25 Mesochronous Half rate External N/A 0.75 1.1 0.21 1.6 2.6
DFE source
Quarter TX FFE + External
[15] 32 2 16 Mesochronous External 11 0.6 1.08 0.039 0.99 2.56
rate CTLE source
Quarter External
[27] 65 4.8 8 Mesochronous External 8.4 CTLE 0.6 0.8 0.057 0.51 0.66
rate source
TX FFE +
Quarter External
[16] 22 8 32 Plesiochronous 1 16 CTLE + 0.72 1.07 0.079 3.25 6.41
rate source
DFE
Source DC-DC
[31] 65 3 10 Half rate 1 N/A TX FFE 0.9 1.3 2.37 3.6 7.24
synchronous + LDO
Quarter TX PWM + External
[34] 65 3 16 N/A (no CDR) External 24 0.5 0.9 0.13 1.65 3.14
rate RX passive source
Electronics 2020, 9, 1315 17 of 21

5. Summary and Conclusions


This paper overviews the supply-scalable I/O interfaces. At first, the motivations for the
supply-scalable I/O are discussed in Sections 1 and 2, from the computer architecture level down to
transistor level. The basic concepts and expected behavior of the supply-scalable I/O are introduced in
Section 2. Throughout Section 3, circuit techniques and critical building blocks to enable supply-scalable
I/O are reviewed. Section 4 presents a comprehensive survey of supply-scalable I/O designs whose
functionality has been verified from fabrication results and discusses the trend and where we stand.
From the survey, we can find that there have been many wonderful efforts to enable supply-scalable
I/O for energy-efficient computing; however, a true sense of complete supply-scalable high-speed I/O,
such that it includes on-chip supply scaling, on-chip PLL, per-lane CDR/deskew, and equalization, has
not yet been presented so far. In addition, the energy efficiency of those supply-scalable I/Os (3–7 pJ/bit)
has not reached that of non-scalable I/Os (<3 pJ/bit) [69]. The focus of this paper is not just introducing
the supply-scalable I/O technology but encouraging prospective researchers to work on this topic.

Funding: This research received no external funding.


Conflicts of Interest: The author declares no conflict of interest.

References
1. Kim, K. Silicon Technologies and Solutions for the Data-Driven World. In Proceedings of the IEEE
International Solid-State Circuits Conference-(ISSCC), San Francisco, CA, USA, 22–26 February 2015; pp. 1–7.
2. Cisco Annual Internet Report (2018–2023) White Paper. CISCO. Available online: https://www.cisco.com/
c/en/us/solutions/collateral/executive-perspectives/annual-internet-report/white-paper-c11-741490.pdf
(accessed on 8 July 2020).
3. Bae, W.; Yoon, K.J. Comprehensive Read Margin and BER Analysis of One Selector-One Memristor Crossbar
Array Considering Thermal Noise of Memristor with Noise-Aware Device Model. IEEE Trans. Nanotechnol.
2020, 19, 553–564. [CrossRef]
4. Frans, Y.; Carey, D.; Erett, M.; Amir-Aslanzadeh, H.; Fang, W.Y.; Turker, D.; Jose, A.P.; Bekele, A.; Im, J.;
Upadhyaya, P.; et al. A 0.5–16.3 Gb/s Fully Adaptive Flexible-Reach Transceiver for FPGA in 20 nm CMOS.
IEEE J. Solid-State Circuits 2015, 50, 1932–1944. [CrossRef]
5. Jalali, M.S.; Taghavi, M.H.; Melaren, A.; Pham, J.; Farzan, K.; DiClemente, D.; Van Ierssel, M.; Song, W.;
Asgaran, S.; Holdenried, C.; et al. A 4-Lane 1.25-to-28.05Gb/s Multi-Standard 6pJ/b 40dB Transceiver in 14nm
FinFET with Independent TX/RX Rate Support. In Proceedings of the IEEE International Solid-State Circuits
Conference-(ISSCC), San Francisco, CA, USA, 11–15 February 2018; pp. 106–107.
6. Upadhyaya, P.; Bekele, A.; Melek, D.T.; Zhao, H.; Im, J.; Cho, J.; Tan, K.H.; McLeod, S.; Chen, S.; Zhang, W.;
et al. A Fully-Adaptive Wideband 0.5-32.75Gb/s FPGA Transceiver in 16nm FinFET CMOS Technology.
In Proceedings of the Symposium on VLSI Circuits, Honolulu, HI, USA, 13–16 June 2016; pp. 1–2.
7. Zhang, B.; Khanoyan, K.; Hatamkhani, H.; Tong, H.; Hu, K.; Fallahi, S.; Abdul-Latif, M.; Vakilian, K.;
Fujimori, I.; Brewster, A. A 28 Gb/s Multistandard Serial Link Transceiver for Backplane Applications in
28 nm CMOS. IEEE J. Solid-State Circuits 2015, 50, 3089–3100. [CrossRef]
8. Upadhyaya, P.; Savoj, J.; An, F.-T.; Bekele, A.; Jose, A.; Xu, B.; Wu, D.; Turker, D.; Aslanzadeh, H.; Hedayati, H.;
et al. A 0.5-to-32.75Gb/s Flexible-Reach Wireline Transceiver in 20nm CMOS. In Proceedings of the IEEE
International Solid-State Circuits Conference-(ISSCC), San Francisco, CA, USA, 22–26 February 2015;
pp. 56–57.
9. Nishi, Y.; Abe, K.; Ribo, J.; Roederer, B.; Gopalan, A.; Benmansour, M.; Ho, A.; Bhoi, A.; Konishi, M.;
Moriizumi, R.; et al. An ASIC-Ready 1.25-6.25Gb/s SerDes in 90nm CMOS with Multi-Standard Compatibility.
In Proceedings of the 2008 IEEE Asian Solid-State Circuits Conference-(A-SSCC), Fukuoka, Japan, 3–5
November 2008; pp. 37–40.
10. Kimura, H.; Aziz, P.M.; Jing, T.; Sinha, A.; Kotagiri, S.P.; Narayan, R.; Gao, H.; Jing, P.; Hom, G.; Liang, A.;
et al. A 28 Gb/s 560 mW Multi-Standard SerDes with Single-Stage Analog Front-End and 14-Tap Decision
Feedback Equalizer in 28 nm CMOS. IEEE J. Solid-State Circuits 2014, 49, 3091–3103. [CrossRef]
Electronics 2020, 9, 1315 18 of 21

11. Kawamoto, T.; Norimatsu, T.; Kogo, K.; Yuki, F.; Nakajima, N.; Tsuge, M.; Usugi, T.; Hokari, T.; Koba, H.;
Komori, T.; et al. Multi-Standard 185fsrms 0.3-to-28Gb/s 40dB Backplane Signal Conditioner with Adaptive
Pattern-Match 36-Tap DFE and Data-Rate-Adjustment PLL in 28nm CMOS. In Proceedings of the IEEE
International Solid-State Circuits Conference-(ISSCC), San Francisco, CA, USA, 22–26 February 2015;
pp. 54–55.
12. Bae, W.; Ju, H.; Park, K.; Han, J.; Jeong, D.-K. A supply-scalable-serializing transmitter with controllable
output swing and equalization for next-generation standards. IEEE Trans. Ind. Electron. 2018, 65, 5979–5989.
[CrossRef]
13. Li, S.; Spagna, F.; Chen, J.; Wang, X.; Tong, L.; Gowder, S.; Jia, W.; Nicholson, R.; Iyer, S.; Song, R.; et al.
A Power and Area Efficient 2.5-16 Gbps Gen4 PCIe PHY in 10nm FinFET CMOS. In Proceedings of the 2018
IEEE Asian Solid-State Circuits Conference-(A-SSCC), Tainan, Taiwan, 5–7 November 2018; pp. 5–8.
14. Alon, E. Mixed-Signal Electrical Interfaces. In Proceedings of the IEEE Custom Integrated Circuits Conference,
Austin, TX, USA, 14–17 April 2019; pp. 1–57.
15. Mansuri, M.; Jaussi, J.E.; Kennedy, J.T.; Hsueh, T.-C.; Shekhar, S.; Balamurugan, G.; O’Mahony, F.; Roberts, C.;
Mooney, R.; Casper, B. A Scalable 0.128–1 Tb/s, 0.8–2.6 pJ/bit, 64-Lane Parallel I/O in 32-nm CMOS. IEEE J.
Solid-State Circuits 2013, 48, 3229–3242. [CrossRef]
16. Musah, T.; Jaussi, J.E.; Balamurugan, G.; Hyvonen, S.; Hsueh, T.C.; Keskin, G.; Shekhar, S.; Kennedy, J.;
Sen, S.; Inti, R.; et al. A 4–32 Gb/s Bidirectional Link with 3-Tap FFE/6-Tap DFE and Collaborative CDR in
22 nm CMOS. IEEE J. Solid-State Circuits 2014, 49, 3079–3090. [CrossRef]
17. Dickson, T.O.; Liu, Y.; Rylov, S.V.; Agrawal, A.; Kim, S.; Hsieh, P.H.; Bulzacchelli, J.F.; Ferriss, M.; Ainspan, H.A.;
Rylyakov, A.; et al. A 1.4 pJ/bit, Power-Scalable 16×12 Gb/s Source-Synchronous I/O with DFE Receiver in
32 nm SOI CMOS Technology. IEEE J. Solid-State Circuits 2015, 50, 1917–1931. [CrossRef]
18. Casper, B.; O’Mahony, F. Clocking Analysis, Implementation and Measurement Techniques for High-Speed
Data Links—A Tutorial. IEEE Trans. Circuits Syst. I Regul. Pap. 2009, 56, 17–39. [CrossRef]
19. Bae, W.; Jeong, D.-K. Analysis and Design of CMOS Clocking Circuit for Low Phase Noise; Institute of Engineering
and Technology: London, UK, 2020.
20. Sidiropoulos, S.; Horowitz, M. A semidigital dual delay-locked loop. IEEE J. Solid-State Circuits 1997, 32,
1683–1692. [CrossRef]
21. Lee, M.J.; Dally, W.J.; Chiang, P. Low-power area-efficient high-speed I/O circuit techniques. IEEE J.
Solid-State·Circuits 2000, 35, 1591–1599. [CrossRef]
22. Kim, J.; Horowitz, M. Adaptive Supply Serial Links with Sub-1-V Operation and Per-Pin Clock Recovery.
IEEE J. Solid-State Circuits 2002, 37, 1403–1413. [CrossRef]
23. Bae, W.; Ju, H.; Park, K.; Cho, S.-Y.; Jeong, D.-K. A 7.6 mW, 414 fs RMS-Jitter 10 GHz Phase-Locked Loop for
a 40 Gb/s Serial Link Transmitter Based on a Two-Stage Ring Oscillator in 65 nm CMOS. IEEE J. Solid-State
Circuits 2016, 51, 2357–2367. [CrossRef]
24. Wei, G.-Y.; Kim, J.; Liu, D.; Sidiropoulos, S.; Horowitz, M. A Variable-Frequency Parallel I/O Interface with
Adaptive Power-Supply Regulation. IEEE J. Solid-State Circuits 2000, 35, 1600–1610.
25. Balamurugan, G.; Kennedy, J.; Banerjee, G.; Jaussi, J.E.; Mansuri, M.; O’Mahony, F.; Casper, B.; Mooney, R. A
Scalable 5–15 Gbps, 14–75 mW Low-Power I/O Transceiver in 65 nm CMOS. IEEE J. Solid-State Circuits 2008,
43, 1010–1019. [CrossRef]
26. Balamurugan, G.; O’Mahony, F.; Mansuri, M.; E Jaussi, J.; Kennedy, J.T.; Casper, B. A 5-to-25Gb/s
1.6-to-3.8mW/(Gb/s) Reconfigurable Transceiver in 45nm CMOS. In Proceedings of the IEEE International
Solid-State Circuits Conference-(ISSCC), San Francisco, CA, USA, 8 February 2010; pp. 372–373.
27. Song, Y.-H.; Bai, R.; Hu, K.; Yang, H.-W.; Chiang, P.Y.; Palermo, S. A 0.47–0.66 pJ/bit, 4.8–8 Gb/s I/O Transceiver
in 65 nm CMOS. IEEE J. Solid-State Circuits 2013, 48, 1276–1289. [CrossRef]
28. Song, Y.-H.; Yang, H.-W.; Li, H.; Chiang, P.Y.; Palermo, S. An 8–16 Gb/s, 0.65–1.05 pJ/b, Voltage-Mode
Transmitter with Analog Impedance Modulation Equalization and Sub-3 ns Power-State Transitioning.
IEEE J. Solid-State Circuits 2014, 49, 2631–2643. [CrossRef]
29. Inti, R.; Shekhar, S.; Balamurugan, G.; Jaussi, J.; Roberts, C.; Hsueh, T.-C.; Casper, B.; Rajesh, I. A 0.5-to-0.75V,
3-to-8 Gbps/lane, 385-to-790 fJ/b, Bi-Directional, Quad-Lane Forwarded-Clock Transceiver in 22 nm CMOS.
In Proceedings of the Symposium on VLSI Circuits, Kyoto, Japan, 16–19 June 2015; pp. C346–C347.
Electronics 2020, 9, 1315 19 of 21

30. Shekhar, S.; Inti, R.; Jaussi, J.; Hsueh, T.-C.; Casper, B. A 1.2–5Gb/s 1.4–2pJ/b serial link in 22 nm CMOS with a
direct data-sequencing blind oversampling CDR. In Proceedings of the Symposium on VLSI Circuits, Kyoto,
Japan, 16–19 June 2015; pp. C350–C351.
31. Shu, G.; Hanumolu, P.K.; Choi, W.-S.; Saxena, S.; Kim, S.-J.; Talegaonkar, M.; Nandwana, R.; Elkholy, A.;
Wei, D.; Nandi, T. A 16Mb/s-to-8Gb/s 14.1-to-5.9pJ/b Source Synchronous Transceiver Using DVFS and Rapid
On/Off in 65nm CMOS. In Proceedings of the IEEE International Solid-State Circuits Conference-(ISSCC),
San Francisco, CA, USA, 31 January–4 February 2016; pp. 398–399.
32. Bae, W.; Ju, H.; Park, K.; Jeong, D.-K. A 6-to-32 Gb/s Voltage-Mode Transmitter with Scalable Supply, Voltage
Swing, and Pre-Emphasis in 65-nm CMOS. In Proceedings of the 2016 IEEE Asian Solid-State Circuits
Conference-(A-SSCC), Toyama, Japan, 7–9 November 2016; pp. 241–244.
33. Byun, S. 0.97 mW/Gb/s, 4 Gb/s CMOS clock and data recovery IC with dynamic voltage scaling. IET Circuits
Devices Syst. 2016, 10, 220–228. [CrossRef]
34. Ramachandran, A.; Anand, T. A 0.5-to-0.9V, 3-to-16Gb/s, 1.6-to-3.1pJ/b Wireline Transceiver Equalizing 27dB
Loss at 10Gb/s with Clock-Domain Encoding Using Integrated Pulse-Width Modulation (iPWM) in 65nm
CMOS. In Proceedings of the IEEE International Solid-State Circuits Conference-(ISSCC), San Francisco, CA,
USA, 11–15 February 2018; pp. 268–269.
35. Shekhar, S.; Inti, R.; Jaussi, J.; Hsueh, T.-C.; Casper, B. A Low-Power Bidirectional Link with a Direct
Data-Sequencing Blind Oversampling CDR. IEEE J. Solid-State Circuits 2019, 54, 1669–1681. [CrossRef]
36. Aiello, O.; Crovetti, P.; Alioto, M. Standard cell-based ultra-compact DACs in 40-nm CMOS. IEEE Access
2019, 7, 126479–126488. [CrossRef]
37. Shang, L.; Peh, L.-S.; Jha, N.K. Dynamic Voltage Scaling with Links for Power Optimization of Interconnection
Networks. In Proceedings of the International Symposium on High-Performance Computer Architecture
(HPCA), Anaheim, CA, USA, 8–12 February 2003; pp. 91–102.
38. Shin, D.; Kim, J. Power-Aware Communication Optimization for Networks-on-Chips with Voltage Scalable
Links. In Proceedings of the IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and
System Synthesis, Stockholm, Sweden, 8–10 September 2004; pp. 170–175.
39. Barroso, L.A.; Hölzle, U. The Case for Energy-Proportional Computing. Computer 2007, 40, 33–37. [CrossRef]
40. Abts, D.; Marty, M.R.; Wells, P.M.; Klausler, P.; Liu, H. Energy Proportional Datacenter Networks.
In Proceedings of the International Symposium on Computer Architecture (ISCA), Saint-Malo, France, 19–23
June 2010; pp. 338–347.
41. Bohr, M.T.; Young, I.A. CMOS Scaling Trends and Beyond. IEEE Mirco 2017, 37, 20–29.
42. Chen, X.; Wei, G.; Peh, L.-S. Design of Low-Power Short-Distance Opto-Electronic Transceiver Front-Ends
with Scalable Supply Voltages and Frequencies. In Proceedings of the 2008 International Symposium on Low
Power Electronics & Design, Bangalore, India, 11–13 August 2008; pp. 277–282.
43. Chang, K.; Wei, J.; Huang, C.; Li, S.; Donnelly, K.; Horowitz, M.; Li, Y.; Sidiropoulos, S. A 0.4–4-Gb/s CMOS
Quad Transceiver Cell Using On-Chip Regulated Dual-Loop PLLs. IEEE J. Solid-State Circuits 2003, 38,
747–754. [CrossRef]
44. Eble, J.C.; Best, S.; Leibowitz, B.; Luo, L.; Palmer, R.; Wilson, J.; Zerbe, J.; Amirkhany, A.; Nguyen, N.
Power-Efficient I/O Design Considerations for High-Bandwidth Applications. In Proceedings of the IEEE
Custom Integrated Circuits Conference, San Jose, CA, USA, 17–20 September 2011; pp. 1–4.
45. Hatamkhani, H.; Yang, C.-K. A Study of the Optimal Data Rate for Minimum Power of I/Os. IEEE Trans.
Circuits Syst. II Express Briefs 2006, 53, 1230–1234. [CrossRef]
46. Zhang, B. Multi-Gbps Serial Backplane Transceiver: From Dilemma to Solution. In Proceedings of the 2015
IEEE Asian Solid-State Circuits Conference-(A-SSCC), Xiamen, China, 9–11 November 2015; pp. 1–87.
47. Chen, S.; Zhou, L.; Zhuang, I.; Im, J.; Melek, D.; Namkoong, J.; Raj, M.; Shin, J.; Frans, Y.; Chang, K.
A 4-to-16GHz Inverter-Based Injection-Locked Quadrature Clock Generator with Phase Interpolators
for Multi-Standard I/Os in 7nm FinFET. In Proceedings of the IEEE International Solid-State Circuits
Conference-(ISSCC), San Francisco, CA, USA, 11–15 February 2018; pp. 390–391.
48. Bae, W. CMOS Inverter as Analog Circuit: An overview. J. Low Power Electron. Appl. 2019, 9, 26. [CrossRef]
49. Frans, Y.; McLeod, S.; Hedayati, H.; Elzeftawi, M.; Namkoong, J.; Lin, W.; Im, J.; Upadhyaya, P.; Chang, K. A
40-to-64 Gb/s NRZ Transmitter with Supply-Regulated Front-End in 16 nm FinFET. IEEE J. Solid-State Circuits
2016, 51, 3167–3177. [CrossRef]
Electronics 2020, 9, 1315 20 of 21

50. Mansuri, M.; Yang, C.-K. A low-power adaptive bandwidth PLL and clock buffer with supply-noise
compensation. IEEE J. Solid-State Circuits 2003, 38, 1804–1812. [CrossRef]
51. Lee, D.; Kim, Y.-H.; Lee, D.; Kim, L.-S. A 0.65-V, 11.2-Gb/s Power Noise Tolerant Source-Synchronous
Injection-Locked Receiver with Direct DTLB DFE. IEEE Trans. Circuits Syst. II Express Briefs 2018, 65,
1564–1568. [CrossRef]
52. Alon, E.; Kim, J.; Pamarti, S.; Chang, K.; Horowitz, M. Replica Compensated Linear Regulators for
Supply-Regulated Phase-Locked Loops. IEEE J. Solid-State Circuits 2006, 41, 413–424. [CrossRef]
53. Savoj, J.; Hsieh, K.C.-H.; An, F.-T.; Gong, J.; Im, J.; Jiang, X.; Jose, A.P.; Kireev, V.; Lim, S.-W.; Roldan, A.;
et al. A Low-Power 0.5–6.6 Gb/s Wireline Transceiver Embedded in Low-Cost 28 nm FPGAs. IEEE J.
Solid-State Circuits 2013, 48, 2582–2594. [CrossRef]
54. Gao, X.; Klumperink, E.A.; Geraedts, P.F.; Nauta, B. Jitter analysis and a benchmarking figure-of-merit for
phase-locked loops. IEEE Trans. Circuits Syst. II Express Briefs 2009, 56, 117–121.
55. Hossain, M.; El-Halwagy, W.; Hossain, A.D. Fractional-N DPLL-Based Low-Power Clocking Architecture for
1–14 Gb/s Multi-Standard Transmitter. IEEE J. Solid-State Circuits 2017, 52, 2647–2662. [CrossRef]
56. Kim, J.; Balankutty, A.; Elshazly, A.; Huang, Y.-Y.; Song, H.; Yu, K.; O’Mahony, F. A 16-to-40Gb/s Quarter-Rate
NRZ/PAM4 Dual-Mode Transmitter in 14nm CMOS. In Proceedings of the IEEE International Solid-State
Circuits Conference-(ISSCC), San Francisco, CA, USA, 22–26 February 2015; pp. 60–61.
57. Choi, W.-S.; Shu, G.; Talegaonkar, M.; Liu, Y.; Wei, D.; Benini, L.; Hanumolu, P.K. A 0.45–0.7 V 1–6 Gb/s
0.29–0.58 pJ/b Source-Synchronous Transceiver Using Near-Threshold Operation. IEEE J. Solid-State Circuits
2018, 53, 884–895. [CrossRef]
58. Hossain, M.; Kaviani, K.; Daly, B.; Shirasgaonkar, M.; Dettloff, W.; Stone, T.; Prabhu, K.; Tsang, B.; Eble, J.;
Zerbe, J. A 6.4/3.2/1.6 Gb/s Low Power Interface with All Digital Clock Multiplier for On-the-Fly Rate
Switching. In Proceedings of the IEEE Custom Integrated Circuits Conference, San Jose, CA, USA, 9–12
September 2012; pp. 1–4.
59. Leibowitz, B.; Palmer, R.; Poulton, J.; Frans, Y.; Li, S.; Wilson, J.; Bucher, M.; Fuller, A.M.; Eyles, J.;
Aleksic, M.; et al. A 4.3 GB/s Mobile Memory Interface with Power-Efficient Bandwidth Scaling. IEEE J.
Solid-State Circuits 2010, 45, 889–898. [CrossRef]
60. Menolfi, C.; Toifl, T.; Buchmann, P.; Kossel, M.; Morf, T.; Weiss, J.; Schmatz, M. A 16Gb/s source-series
terminated transmitter in 65nm CMOS SOI. In Proceedings of the 2007 IEEE International Solid-State Circuits
Conference, Digest of Technical Papers, San Francisco, CA, USA, 11–15 February 2007; pp. 446–614.
61. Inti, R.; Elshazly, A.; Young, B.; Yin, W.; Kossel, M.; Toifl, T.; Hanumolu, P.K. A Highly Digital 0.5-to-4Gb/s
1.9mW/Gb/s Serial Link Transceiver Using Current-Recycling in 90nm CMOS. In Proceedings of the IEEE
International Solid-State Circuits Conference-(ISSCC), San Francisco, CA, USA, 20–24 February 2011;
pp. 152–153.
62. Song, Y.-H.; Palermo, S. A 6-Gbit/s Hybrid Voltage-Mode Transmitter with Current-Mode Equalization in
90-nm CMOS. IEEE Trans. Circuits Syst. II Express Briefs 2012, 59, 491–495. [CrossRef]
63. Chan, K.L.; Tan, K.H.; Frans, Y.; Im, J.; Upadhyaya, P.; Lim, S.W.; Chiang, P.C. A 32.75-Gb/s voltage-mode
transmitter with three-tap FFE in 16-nm CMOS. IEEE J. Solid-State Circuits 2017, 52, 2663–2678. [CrossRef]
64. Lee, K.; Kim, S.; Shin, Y.; Jeong, D.-K.; Lim, G.; Kim, B.; Da Costa, V.; Lee, D. A Jitter-Tolerant 4.5Gb/s
CMOS Interconnect for Digital Display. In Proceedings of the IEEE International Solid-State Circuits
Conference-(ISSCC), San Francisco, CA, USA, 5 February 1998; pp. 310–311.
65. Hu, K.; Bai, R.; Jiang, T.; Ma, C.; Ragab, A.; Palermo, S.; Chiang, P.Y. 0.16-0.25 pJ/bit, 8 Gb/s Near-Threshold
Serial Link Receiver with Super-Harmonic Injection-Locking. IEEE J. Solid-State Circuits 2012, 47, 1842–1853.
[CrossRef]
66. Bae, W.; Jeong, G.-S.; Park, K.; Cho, S.-Y.; Kim, Y.; Jeong, D.-K. A 0.36 pJ/bit, 0.025 mm2 , 12.5 Gb/s
Forwarded-Clock Receiver with a Stuck-Free Delay-Locked Loop and a Half-Bit Delay Line in 65-nm CMOS
Technology. IEEE Trans. Circuits Syst. I Regul. Pap. 2016, 63, 1393–1403. [CrossRef]
67. Baronti, F.; Lunardini, D.; Roncella, R.; Saletti, R. A self-calibrating delay-locked delay line with shunt-capacitor
circuit scheme. IEEE J. Solid-State Circuits 2004, 39, 384–387. [CrossRef]
Electronics 2020, 9, 1315 21 of 21

68. Kim, S.; Ko, H.-G.; Cho, S.-Y.; Lee, J.; Shin, S.; Choo, M.-S.; Chi, H.; Jeong, D.-K. 29.7 A 2.5GHz injection-locked
ADPLL with 197fsrms integrated jitter and −65dBc reference spur using time-division dual calibration.
In Proceedings of the IEEE International Solid-State Circuits Conference-(ISSCC), San Francisco, CA, USA,
5–9 February 2017; pp. 494–495.
69. Daly, D.C.; Fujino, L.C.; Smith, K.C. Through the looking glass-the 2018 edition: Trends in solid-state circuits
from the 65th ISSCC. IEEE Solid-State Circuits Mag. 2018, 10, 30–46. [CrossRef]

© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).

You might also like