You are on page 1of 11

applied

sciences
Article
A Novel Cross-Latch Shift Register Scheme for Low
Power Applications
Po-Yu Kuo 1 , Ming-Hwa Sheu 1 , Chang-Ming Tsai 1 , Ming-Yan Tsai 1 and Jin-Fa Lin 2, *

1 Electronic Engineering, National Yunlin University of Science and Technology, Yunlin County,
Douliu City 64002, Taiwan; kuopy@yuntech.edu.tw (P.-Y.K.); sheumh@yuntech.edu.tw (M.-H.S.);
M10813014@gemail.yuntech.edu.tw (C.-M.T.); M10513211@gemail.yuntech.edu.tw (M.-Y.T.)
2 Information and Communication Engineering, Chaoyang University of Technology, Wufeng District,
Taichung City, Taichung 413310, Taiwan
* Correspondence: jflin@cyut.edu.tw

Abstract: The conventional shift register consists of master and slave (MS) latches with each latch
receiving the data from the previous stage. Therefore, the same data are stored in two latches
separately. It leads to consuming more electrical power and occupying more layout area, which
is not satisfactory to most circuit designers. To solve this issue, a novel cross-latch shift register
(CLSR) scheme is proposed. It significantly reduced the number of transistors needed for a 256-bit
shifter register by 48.33% as compared with the conventional MS latch design. To further verify its
functions, this CLSR was implemented by using TSMC 40 nm CMOS process standard technology.
The simulation results reveal that the proposed CLSR reduced the average power consumption by
36%, cut the leakage power by 60.53%, and eliminated layout area by 34.76% at a supply voltage of
0.9 V with an operating frequency of 250 MHz, as compared with the MS latch.

Keywords: shift register; power consumption; cross-latch scheme; leakage power; layout area



1. Introduction
Citation: Kuo, P.-Y.; Sheu, M.-H.;
The shift register has been commonly used in various digital circuits for decades. This
Tsai, C.-M.; Tsai, M.-Y.; Lin, J.-F. A
circuit block can be applied in conversion between serial and parallel interfaces during data
Novel Cross-Latch Shift Register
transmission, and it can be used as a delay circuit too. The shift register is a key component
Scheme for Low Power Applications.
Appl. Sci. 2021, 11, 129.
in digital circuits and is generally used in active-matrix displays, sensors, memories,
https://dx.doi.org/app11010129
communication receivers, and real-time image processing chips [1–8]. The shrinking feature
size of process technology has enhanced the potential performance of electronic devices,
Received: 22 November 2020 and the capacity of shift registers has increased significantly. Therefore, asking for less
Accepted: 23 December 2020 overhead integrated circuitry layout area [9–12] and less power consumption [9,11,13–26]
Published: 25 December 2020 in the shift registers design then becomes more important with the capacity increasing [27].
The conventional architecture of a shift register consists of N-bit flip-flops connected
Publisher’s Note: MDPI stays neu- in series. Circuit implementation in such architecture is fast and efficient. However,
tral with regard to jurisdictional claims requesting less overhead layout area and less power consumption still cannot be fulfilled.
in published maps and institutional Additionally, in master and slave (MS) latch architecture, the same data are stored in two
affiliations. latches during one clock cycle. Hence, the MS latch tends to exhibit redundant area and
dissipate more electric power. Obviously, if shift registers are implemented by MS latches
without a compensation circuit, the data will be transmitted to multiple outputs at the
same time and may result in a malfunction. Recently, B. D. Yang proposed a shift register
Copyright: © 2020 by the authors. Li-
censee MDPI, Basel, Switzerland. This
that employed a pulse-latched methodology to solve this issue [27]. The concept behind
article is an open access article distributed
this reported shift register is using a sub-shift register with an N-bit latch and using an
under the terms and conditions of the
N + 1-bit clock pulse generator to generate an N + 1 non-overlapping clock in one clock
Creative Commons Attribution (CC BY) cycle to avoid data duplication. Although in this work, the author employed one additional
license (https://creativecommons.org/ latch in each sub-register to store 1-bit temporary data to solve the timing problem between
licenses/by/4.0/). pulsed latches; however, the circuitry complexity was increased as well. The adoption

Appl. Sci. 2021, 11, 129. https://dx.doi.org/10.3390/app11010129 https://www.mdpi.com/journal/applsci


Appl. Sci. 2021, 11, 129 2 of 11

of multiple non-overlap delayed pulsed clock signals limits the circuit operating clock
frequency. Moreover, this shift register uses N sub-shift registers to increase the number
of latches at the same time. In [27], the author proposed a latch controlled by a pulse
generator. This is similar to the behavior of a trigger flip-flop, and the area can be decreased
by sharing a pulse generator so that it is suitable for high-speed application. However,
the pulse width is difficult to control due to the process variations. Based upon the above
discussions, the simultaneous requests for the elimination of the integrated circuit layout
area and the reduction of electrical power consumption still could not be realized in the
reported work. To solve that issue, in this study, a cross-latch scheme combined with a
flip-flop design is proposed to achieve area saving under low operation voltage.
To overcome the current drawbacks of the conventional MS latch design, in this paper,
we propose a novel cross-latch shift register (CLSR) scheme. The proposed CLSR requires
only one latch to perform the same functions as those of the MS latches in a conventional
shift register. Thus, the proposed CLSR consumes less power and occupies less layout area
by reducing the number of transistors.
Details of the proposed configuration are explained in the next section. In Section 2,
the principle of the proposed cross-latch shift register (CLSR) is discussed. Experimental
validation and a mechanism depiction for the proposed CLSR are presented in Section 3.

2. The Proposed Cross-Latch Shift Register (CLSR) Design


Figure 1 shows the process of data transmission for a conventional shift register under
different clock signals (CLK = 1 and CLK = 0). As shown in Figure 1a, for a shift register
with a positive edge triggered flip-flop, when CLK = 0 at the first stage flip-flop (FF0),
the master latch (M0) is opened and receives the input data while the slave latch (S0) is
closed and stores the previous data. At the same time, in the second stage flip-flop (FF1),
the master latch (M1) will track the data stored in the slave latch (S0) of FF0. Likewise,
as shown in Figure 1b, when CLK = 1, the slave latch of each flip-flop will track the data
in the master latch and send it to the output. According to a data transmission analysis,
it can be observed that when CLK = 0, the slave latch (S0) of FF0 and master latch (M1)
of FF1 store the same data. When CLK = 1, each flip-flop holds the same data in the
master and slave latches, respectively. This situation leads to storage of redundant data in
many latches. Implementing and keeping this state may consume extra electrical power
and may occupy more overhead integrated circuitry layout area. Note that in addition to
showing the traditional transmission gate flip-flop (TGFF), Figure 2 also includes three
recently proposed low-power FF designs [28–30]. Figure 2b shows the adaptive coupling
(AC) flip-flop design [28]. Unlike the traditional MS latch-based design, this design uses
a differential latch structure with pass transistor logic to achieve true single-phase clock
(TSPC) operation. To overcome the effects of process variation on the master latch, a pair of
level recovery circuits were inserted into the cross-coupled path. In this design, the clock
drives only 4 transistors, and the total number of transistors is 22. When the FF operates at
lower data-switching activity, lighter clock loads and reduced circuitry in the FF design can
significantly reduce dynamic power consumption. However, there are floating problems in
several nodes inside this FF design, which impose limitations on the applications of the
ACFF design [28]. Figure 2c shows the true single-phase clock flip-flop, named TSPCFF [29].
It is composed of a conventional dynamic TSPC-based FF design with 9 transistors colored
in blue and an additional 9 transistors to ensure its static operations and sufficient output
drive capability. This FF design provides better power performance compared with the
traditional TGFF design. Similar to the ACFF design, it also suffers from the floating
problem of internal nodes. The CSFF (charge sensing flip-flop) design incorporates an
XOR logic in its master latch to compare inputs D/DN with outputs QI/QN, as shown in
Figure 2d [30]. In this architecture, the input data are captured only when a discrepancy
occurs. This gives a performance edge in power consumption over conventional MS
latch-based FF designs when the data switching activity is low. The simulation results
indicate that this design encounters a floating problem of an internal node, as shown in
Appl. Sci. 2021, 11, x FOR PEER REVIEW 8 of 11
Appl. Sci. 2021, 11, x FOR PEER REVIEW 8 of 11

Appl. Sci. 2021, 11, 129 discrepancy occurs.


discrepancy occurs. ThisThis gives
gives aa performance
performance edge edge in in power
power consumption
consumption over over conven-
conven-
3 of 11
tional MS
tional MS latch-based
latch-based FF FF designs
designs whenwhen the the data
data switching
switching activity
activity isis low.
low. The The simulation
simulation
results indicate that this design encounters a floating
results indicate that this design encounters a floating problem of an internal node, problem of an internal node, as as
shown
shown in Figure 3. Referring to the simulation waveforms shown below, if input data D
Figure in 3. Figure
Referring 3. Referring to the simulation
to the simulation waveforms waveforms
shown below,shownifbelow,
input if input
data data D
D change
change
change when
when CK = 1, internal nodes CS and DN are actually in a floating state, which
when CK = 1, CK = 1, internal
internal nodes CSnodes and DN CS are
andactually
DN arein actually in astate,
a floating floating which state, which
adversely
adversely
adversely leads
leadspowerto extra
to extra power consumption
power consumption in following
in following inverter.
inverter. Due
Due to this
to thisthese reason,
reason,
leads to extra consumption in following inverter. Due to this reason, FF
these FF
these FF designs
designs are thus thus excluded from from thethe performance
performance comparison
comparison in in this
this paper.
paper. These
These
designs are thus are excludedexcluded
from the performance comparison in this paper. These circuits
circuits use
circuits use aa true-signal-clock
true-signal-clock phasing phasing scheme to to reduce clockclock load
load andand reduce dynamic dynamic
use a true-signal-clock phasing schemescheme to reduce reduceclock load and reduce reduce dynamic power
power
power consumption.
consumption. However,
However, some internal
some internal floating nodes exist. Figure 3 shows thesim-
sim-
consumption. However, some internal floatingfloating nodesFigure
nodes exist. exist. Figure
3 shows 3 shows the
the simulation
ulation
ulation waveforms
waveforms of
ofFF these
these FF designs.
FF designs. Therefore,
Therefore, in these
in these designs,
designs, the same internal float-
waveforms of these designs. Therefore, in these designs, the the
same same internal
internal float-
floating
ing
ing nodes
nodes dissipate
dissipate additional
additional static power consumption. Although this issue does not
nodes dissipate additional staticstatic
power power consumption.
consumption. Although
Although this issue
this issue does not does not
affect
affect its
affect its function,
function, additional
additional static
static power
power consumption
consumption is is confirmed,
confirmed, especially
especially at at low
low
its function, additional static power consumption is confirmed, especially at low operating
operating
operating speeds or standby mode. Table 1 shows a comparison of the total number of
speeds or speedsstandbyormode. standby mode.
Table 1 showsTablea 1comparison
shows a comparison
of the totalof the total
number of number
transistors of
transistors
transistors
and the layout and the
and areas layout
the layout areas
of theareas
2-bitofof the 2-bit
the 2-bit
shifted shifted
shifted
flip-flop flip-flop
flip-flop
designs, designs,
designs,the
including including
including
transistors the transis-
the transis-
used to
tors used
tors useddifferential
generate to generate
to generateclock differential
differential clock
signalsclock signalsclock
signals
and pulsed and pulsed
and pulsed clock
signalsclock signals [27].
[27]. signals [27].

(a)
(a) (b)
(b)
Figure 1. Data
Data transmission
transmission of
of a conventionalshift
shift register.(a)
(a) WhenCLK
CLK = 0. (b)When
(b)When CLK
CLK == 1.
1.
Figure
Figure 1.
1. Data transmission of aa conventional
conventional shift register.
register. (a) When
When CLK == 0.
0. (b)When CLK = 1.

(a)
(a) (b)
(b)

(c)
(c) (d)
(d)
Figure 2. Previous master and slave (MS) latches based flip-flop (FF) designs. (a) Transmission gate flip-flop (TGFF). (b)
Figure 2. Previous
Previousmaster
masterandandslave
slave(MS)
(MS)latches based
latches flip-flop
based (FF)
flip-flop designs.
(FF) (a) Transmission
designs. (a) Transmission gategate
flip-flop (TGFF).
flip-flop (b)
(TGFF).
Adaptive coupling flip-flop (ACFF) [28]. (c) True single-phase clock flip-flop (TSPCFF) [29]. (d) Charge sensing flip-flop
Adaptive coupling flip-flop (ACFF) [28]. (c) True single-phase clock flip-flop (TSPCFF) [29]. (d) Charge sensing
(b) Adaptive coupling flip-flop (ACFF) [28]. (c) True single-phase clock flip-flop (TSPCFF) [29]. (d) Charge sensing flip-flopflip-flop
(CSFF) [30].
(CSFF)
(CSFF) [30].
[30].
Appl. Sci. 2021, 11, x FOR PEER REVIEW 8 of 11
Appl. Sci. 2021, 11, 129 4 of 11

CLK

Data

0.81V Fig2b_XB
0.13V Fig2b_X

Fig2b_Power

Fig2c_N1
0.36V
Fig2c_N2

Fig2c_Power

Fig2d_CS
0.4V
0.26V 0.17V
Fig2d_DN

Fig2d_Power

Figure 3. Snapshots of post-layout simulation waveforms. ACFF design: top rows. TSPCFF design: next rows. CSFF design:
Figure 3. Snapshots of post-layout simulation waveforms. ACFF design: top rows. TSPCFF design: next rows. CSFF de-
bottom rows.
sign: bottom rows.

Table 1. Transistor-count comparison of 2-bit latch shift register.


Table 1. Transistor-count comparison of 2-bit latch shift register.
TGFF ACFF [28] TSPCFF [29] CSFF [30] CLSR
TGFF ACFF [28] TSPCFF [29] CSFF [30] CLSR
Number of transistors 48 44 36 48 36
Number of transistors
1-bit P/N of transistors
48
12/12
44
11/11
36
8/10
48
11/13 9/9
36
1-bit P/N of transistors
Floating problem 12/12
N 11/11
Y 8/10
Y 11/13
Y N 9/9
Power
Floating (nW)
problem 300.10
N 304.00
Y 700.70
Y 271.10
Y 163.90N
Area (µm2 ) 16.76 15.67 13.16 18.18 13.31
Power (nW) 300.10 304.00 700.70 271.10 163.90
Area (µm2) 16.76 15.67 13.16 18.18 13.31
2.1. Principle of the Proposed Circuit Architecture
To solve the issues of high overhead layout area and high power consumption, the
2.1. Principle of the Proposed Circuit Architecture
proposed CLSR scheme includes cross-latch architecture in optimizing the integrated
To solve
circuitry thearea
layout issues of high overhead
and lowering layout area and
the power consumption. Figure high power
4 shows the consumption,
architectures the
proposed
of the MS CLSR
designscheme
(TGFF)includes cross-latch
and the proposed architecture
cross-latch shift in optimizing
register (CLSR)the integrated
design. The cir-
proposed
cuitry layout CLSR
areascheme consiststhe
and lowering of one cross-latch
power that performs
consumption. Figurethe same functions
4 shows as
the architectures
those of the master latch and slave latch of the conventional shift register.
of the MS design (TGFF) and the proposed cross-latch shift register (CLSR) design. When CLK = 0, The
the slave latch S0 and master latch M1 store the same data. Thus, these
proposed CLSR scheme consists of one cross-latch that performs the same functions astwo latches can be
replaced by one cross-latch in the proposed CLSR design. When CLK = 1, the proposed
those of the master latch and slave latch of the conventional shift register. When CLK = 0,
CLSR uses one latch to hold the same data stored in the master latch and slave latch,
therespectively.
slave latchInS0this
and master latch M1 store the same data. Thus, these two latches can be
CLSR scheme, we replaced the conventional master–slave (MS) latches
replaced
(with theby one cross-latch
drawback of havingintwothe proposed
latches CLSR
hold the samedesign.
data at aWhen CLK
different = 1,
clock the proposed
frequency)
CLSR
withuses one latch to hold the same data stored in the master latch and slave latch, re-
one cross-latch.
spectively. In this CLSR scheme, we replaced the conventional master–slave (MS) latches
(with the drawback of having two latches hold the same data at a different clock fre-
quency) with one cross-latch.
In the following paragraph, we use a 3-bit shift register to explain the difference be-
tween the proposed CLSR scheme and the conventional MS latch architecture. Figure 5
shows the schematic of the proposed cross-latch shift register (CLSR) design. When CLK
= 1, the master and slave latches of each flip-flop (FF0, FF1, FF2) in the conventional MS
latch architecture store the same data. Thus, the two latches can be replaced by one latch
(green box) and still hold the necessary data. Likewise, when CLK = 0, the master and
slave latches of different flip-flops store the same data so that the data can be stored by
the path shown in the blue box. The data are conducted through the latch in the blue box
when CLK = 0, and they are conducted through the latch in the green box when CLK = 1.
To implement this proposed CLSR scheme, an additional inverter is added in the flip-
flop FF0 to ensure that the input data and output Q are at the same level. In the last stage,
flip-flop FF2, an independent latch is adopted to ensure that static operation is maintained
in the output stage. According to the above discussion, when the N-bit shift register is
built, we need to copy FF1 until N-2 bit, and FF0 and FF2 stay in the first and last stages,
Appl. Sci. 2021, 11, 129 5 of 11
respectively. In sum, the CLSR design fulfills the fundamental functions that the MS de-
sign has and wipes out the drawbacks that the MS design bears.
Appl. Sci. 2021, 11, x FOR PEER REVIEW 8 of 11

The proposed CLSR scheme will not store redundant data during data transmission.
Therefore, low power consumption and less integrated circuitry layout area design goals
can be achieved simply because we have reduced the number of transistors dramatically.
To implement this proposed CLSR scheme, an additional inverter is added in the flip-
flop FF0 to ensure that the input data and output Q are at the same level. In the last stage,
flip-flop FF2, an independent latch is adopted to ensure that static operation is maintained
(a) (b)
in the output stage. According to the above discussion, when the N-bit shift register is
Figure
Figure 4.4.Shift
Shiftregister built, we(a)
registerarchitectures:
architectures: need to copy FF1
(a)master–slave
master–slave until
design
design N-2
and
and bit,
(b)the
(b) and FF0 cross-latch
theproposed
proposed and FF2 stay
cross-latch in
shift
shift the first
register
register and design.
(CLSR)
(CLSR) last stages,
design.
respectively. In sum, the CLSR design fulfills the fundamental functions that the MS de-
FF0
sign hasIn the
andfollowing
wipes outparagraph, FF1we use
the drawbacks thatathe
3-bit
MSshift register
design FF2to explain the difference
bears.
between Latch path when CLK=0
the proposed CLSR scheme and the conventional MS latch architecture. Figure 5
CLKI CLKI CLKI
shows the schematic of the proposed cross-latch shift register (CLSR) design. When
CLK = 1, the masterCLKBI and slave latches of each flip-flop CLKBI (FF0, FF1, FF2)CLKBI
in the conventional
CLKI MS latch architecture
CLKBI
Q0
store the same data. Thus,
CLKI CLKBI
Q1
the two latches can be replaced by one
CLKI CLKBI
Data Q2
latch (green box) andCLKBI still hold the necessary data. CLKBI Likewise, when CLK = 0, the master and
CLKBI CLKI CLKI CLKI
slave latches of different flip-flops store the same data so that the data can be stored by the
CLKBI CLKBI CLKBI
path shown in the blue box. The data are conducted through the latch in the blue box when
CLK CLKI= 0, and they are conducted CLKI through the latch in the green CLKIbox when CLK = 1. The
Latch path when CLK=1
proposed CLSR scheme will not store redundant data during data transmission. Therefore,
(a) (b)
low power
Figure consumption
5. Schematic and lessCLSR
of the proposed integrated circuitry layout area design goals can be
architecture.
achieved(a)simply
Figure 4. Shift register architectures: because
master–slave we have
design reduced
and (b) the number
the proposed of transistors
cross-latch dramatically.
shift register (CLSR) design.
2.2. Delay Clock Circuit
FF0 Figure 6bwhen
Latch path shows FF1 FF2
CLK=0the post-layout simulation waveforms of the 256-bit shift register
implemented using CLKI the proposed CLSR scheme CLKI design when operating
CLKI at 0.9 V/1.0
GHz/TT corner. The industry uses a two-letter designation to describe the different cor-
CLKBI CLKBI CLKBI
ners, where the first letter refers to the NMOS device, and the second refers to the PMOS
CLKI CLKBI CLKI CLKBI CLKI CLKBI
Data device. The TTQ0corner is the center corner where Q1 wafers are normally produced Q2 (e.g., typ-
CLKBI ical process parameters).
CLKI CLKBI The proposed
CLKI CLSRCLKBIscheme designCLKI satisfies the realization of

the shift register while reducingCLKBI


CLKBI circuit complexity function and maintaining a fully static
CLKBI

operation.
CLKI
Although the proposed
CLKI
CLSR scheme reduces both
CLKI
the power consumption
Latch path when
andCLK=1
layout area of the circuit, it must bear the challenges caused by data conflicts.
Figure
Figure 6a shows
5. Schematic of theof
a specificCLSR
proposed
situation caused by the rising edge of the clock. As the
architecture.
Figure 5. Schematic the proposed CLSR architecture.
clock driver changes its phase from low to high, it produces a short phase difference be-
tween
2.2. the rising
To implement
Delay Clock edge
this and
Circuit fallingCLSR
proposed edge, scheme,
so that CLKB and CLKI
an additional achieveisequal
inverter added potential.
in the
The previous
flip-flop FF0 todata feedback
ensure that pathinput
the (red)data
is stronger
and than the
output Q current
are at input
the samedata path (green).
level. In the
Figure 6b shows the post-layout simulation waveforms of the 256-bit shift register
Therefore, a data
last stage, flip-flop conflict exists between nodes X and Y. The voltage of this glitch is about
implemented usingFF2,
theanproposed
independent CLSRlatch is adopted
scheme to when
design ensureoperating
that static at
operation
0.9 V/1.0 is
0.167 V. This
maintained issue
in the will stage.
output cause additional
According power consumption,
to the above discussion,andwhenthe reliability of the
the N-bit shift
GHz/TT corner. The industry uses a two-letter designation to describe the different cor-
circuit is
register might
built,bewereduced
need to[31].
copy FF1 until N-2 bit, and FF0 and FF2 stay in the first and
ners, where the first letter refers to the NMOS device, and the second refers to the PMOS
last stages, respectively. In sum, the CLSR design fulfills the fundamental functions that
device. The TT corner is the center corner where wafers are normally produced (e.g., typ-
the MS design has and wipes out the drawbacks that the MS design bears.
ical process parameters). The proposed CLSR scheme design satisfies the realization of
the
2.2. shift
Delayregister while reducing circuit complexity function and maintaining a fully static
Clock Circuit
operation. Although the proposed CLSR scheme reduces both the power consumption
Figure 6b shows the post-layout simulation waveforms of the 256-bit shift register im-
and layout area of the circuit, it must bear the challenges caused by data conflicts.
plemented using the proposed CLSR scheme design when operating at 0.9 V/1.0 GHz/TT
Figure 6a shows a specific situation caused by the rising edge of the clock. As the
corner. The industry uses a two-letter designation to describe the different corners, where
clock driver changes its phase from low to high, it produces a short phase difference be-
the first letter refers to the NMOS device, and the second refers to the PMOS device. The
tween the rising edge and falling edge, so that CLKB and CLKI achieve equal potential.
TT corner is the center corner where wafers are normally produced (e.g., typical process
The previous data feedback path (red) is stronger than the current input data path (green).
parameters). The proposed CLSR scheme design satisfies the realization of the shift reg-
Therefore,
ister whileareducing
data conflict exists
circuit betweenfunction
complexity nodes X and and maintaining
Y. The voltage of this
a fully glitch
static is about
operation.
0.167 V. This issue will cause additional power consumption,
Although the proposed CLSR scheme reduces both the power consumption and layout and the reliability of the
circuit
area ofmight be reduced
the circuit, it must[31].
bear the challenges caused by data conflicts.
Appl. Sci. 2021, 11, x FOR PEER REVIEW 8 of 11
Appl. Sci. 2021, 11, 129 6 of 11

Writing data Weak 0


CLKB FF0 CLKI
FF1
CLKI
Strong CLKB
Last data Strong 1
CLKI CLKB CLKI CLKB
Data=1
0 1 1 0 X Y Q0 0 1 1 0 Q1
Weak
CLKB CLKI CLKB CLKI

CLKB CLKB

CLKB
CLK=0 1 CLKI CLKI CLKI
1 0 0 1

(a)

(b)

VDD VDD
CLK CLKB
I0 I1 CLKI

GND GND
CLK
I2 CLKBI
CLK
(c) (d)
Figure 6.
Figure 6. (a)–(b) Glitchissue
(a,b) Glitch issueof
of the
the proposed
proposed CLSR
CLSR design.
design.(c)–(d) Proposed delay
(c,d) Proposed delay clock
clock circuit.
circuit.

Figure 6a shows a specific situation caused by the rising edge of the clock. As the clock
driver changes its phase from low to high, it produces a short phase difference between the
rising edge and falling edge, so that CLKB and CLKI achieve equal potential. The previous
data feedback path (red) is stronger than the current input data path (green). Therefore, a
FOR PEER REVIEW 8 of 11

Appl. Sci. 2021, 11, 129


To solve this issue, we proposed a circuit that can output CLKBI and CLKI synchro-7 of 11
nously, shown in Figure 6c. The delay circuit I2 uses only two transistors (1pMOS plus
1nMOS). Since I1 and I2 receive a signal from the CLKB node at the same time, the VGS
data conflict exists between nodes X and Y. The voltage of this glitch is about 0.167 V. This
of I1 and I2 are changed at the same time. In other words, I1 and I2 can provide CLKBI
issue will cause additional power consumption, and the reliability of the circuit might be
and CLKI signals reduced
synchronously
[31]. to the proposed CLSR design and help transmission
gates to be started and Toclosed
solveatthis
theissue,
samewetime [32]. The
proposed simulation
a circuit that can waveforms
output CLKBIare andshown
CLKI syn-
in Figure 6d. In the master–slave latch, there is one inverter delay between CLKB and plus
chronously, shown in Figure 6c. The delay circuit I2 uses only two transistors (1pMOS
1nMOS). Since I1 and I2 receive a signal from the CLKB node at the same time, the VGS
CLKI. Thus, a short contention exists and power consumption increases (red circle). By
of I1 and I2 are changed at the same time. In other words, I1 and I2 can provide CLKBI
using the delay clocking
and CLKIcircuit,
signalsthe synchronous
synchronously to theoutput
proposedof CLSR
CLKBI andand
design CLKI
help (blue circle)gates
transmission
can be achieved. Thus, the proposed design not only solves the power problem,
to be started and closed at the same time [32]. The simulation waveforms are but it shown
also in
has better race characteristics
Figure 6d. In than a conventional
the master–slave master–slave-based
latch, there is one inverter delaydesign.
between CLKB and CLKI.
Thus, a short contention exists and power consumption increases (red circle). By using
the delay clocking circuit, the synchronous output of CLKBI and CLKI (blue circle) can be
3. Analysis of Layout Area and Power Consumption of the Proposed CLSR Design
achieved. Thus, the proposed design not only solves the power problem, but it also has
To accomplishbetter
the race
goals of achieving
characteristics thanlow integrated
a conventional circuitry layoutdesign.
master–slave-based area and low
power consumption, the proposed flip-flop is designed with cross-latch architecture.
3. Analysis of Layout Area and Power Consumption of the Proposed CLSR Design
Since the conventionalTo master–slave (MS) latch transmission gate flip-flop (TGFF) is the
accomplish the goals of achieving low integrated circuitry layout area and low
most widely used flip-flop, it is necessary
power consumption, to make
the proposed a comparison
flip-flop is designed withbetween thisarchitecture.
cross-latch proposedSince
the conventional master–slave (MS) latch transmission
CLSR scheme and the traditional MS latch flip-flop. The proposed CLSR scheme was gate flip-flop (TGFF) is the
im-most
widely used flip-flop, it is necessary to make a comparison between this proposed CLSR
plemented in TSMC-40 nm CMOS process standard technology [33]. Figure 7 shows the
scheme and the traditional MS latch flip-flop. The proposed CLSR scheme was imple-
post-simulation results
mentedof the proposed
in TSMC-40 nm CMOS 256-bit
process CLSR
standarddesign under[33].
technology theFigure
process of the
7 shows
1GHz/TT corner frequency at a results
post-simulation supply voltage
of the proposedof 0.9 V. CLSR
256-bit The output QN the
design under is generated by
process of 1GHz/TT
shifting the input data according
corner frequencyto at the clockvoltage
a supply signal. of When theoutput
0.9 V. The clockQN is triggered
is generated positively
by shifting the
input
16 times, the data are data according
shifted by 16-bit, to the
and clock
thesignal.
signal When the clock
of Q15 is triggeredWhen
is generated. positively
the16clock
times, the
data are shifted by 16-bit, and the signal of Q15 is generated. When the clock is triggered
is triggered positively 256 times, the data are shifted by 256-bit, and the signal of Q255 is
positively 256 times, the data are shifted by 256-bit, and the signal of Q255 is generated. It
generated. It can becanseen that
be seen thatafter
after adding
adding a a delay
delay clockclock
circuitcircuit
(Figure(Figure 6c), the above-glitch
6c), the above-mentioned
mentioned glitch issue
issue is clearly
clearlywiped
wiped out.out.

Figure Figure Post layout


7. Post7. layout simulationofof256-bit
simulation 256-bit shift
shiftregister design
register based
design on theon
based proposed CLSR design.
the proposed CLSR de-
sign.

Figure 8 shows the layout comparison between a 256-bit shift register that was con-
structed based on this proposed CLSR scheme and the conventional master–slave latch
architecture. The number of transistors in the conventional MS latches shift register design
is 6144, and the number in the proposed CLSR scheme design is 3174. Compared with the
Appl. Sci. 2021, 11, 129 8 of 11

Figure 8 shows the layout comparison between a 256-bit shift register that was con-
structed based on this proposed CLSR scheme and the conventional master–slave latch
architecture. The number of transistors in the conventional MS latches shift register design
Appl. Sci. 2021, 11, x FOR PEER REVIEW 8 of 11
is 6144, and the number in the proposed CLSR scheme design is 3174. Compared with
the conventional MS latches shift register, the number of transistors in the proposed CLSR
scheme can be reduced by 48.33%. To show the advantages of the proposed design in terms
scheme can be reduced by 48.33%. To show the advantages of the proposed design in
of
Appl. Sci. 2021, 11, x FOR PEER average
REVIEW
terms power
of average consumption,
power we
consumption, we performed
performed simulations
simulations for different
for different conditions. conditions.
8 of 11

scheme can be reduced by 48.33%. To show the advantages of the proposed design in
terms of average power consumption, we performed simulations for different conditions.

Figure 8.8.
Figure 256-bit shift register,
256-bit proposedproposed
shift register, CLSR scheme and the
CLSR conventional
scheme and master–slave latch
the conventional master–slave latch
architecture layout.
architecture layout.
Figure 9a shows the power consumption of the proposed CLSR scheme design in
Figure
different
Figure
9a shows
switching activities
8. 256-bit
the power
(0–100%)
shift register,
consumption
at 0.9
proposed V/250
CLSR MHz.and
scheme
of
Becausethetheproposed
required
the conventional
CLSRoflatch
number
master–slave
scheme design in
transistors
different is decreased
switching
architecture dramatically, the proposed CLSR scheme reduces
layout. activities (0–100%) at 0.9 V/250 MHz. Because the required number the average
power
of consumption
transistors by 36% compared
is decreased with the conventional
dramatically, the proposed MS latch
CLSRdesign.
schemeAddition-
reduces the average
ally, the gates capacitances
Figure 9a showsare thereduced,
power and the dynamic
consumption power
of the dissipation
proposed CLSR is suppressed
scheme design in
power consumption
as well.different
Figure 9b
by 36%
shows aactivities
compared
comparison
with
of leakage
the
powers
conventional
under different
MS latch design. Additionally,
switching (0–100%) at 0.9 V/250 MHz. Because the word lengths
required number of
the gates
between capacitances
MS latch and
transistors are reduced,
CLSR architectures.
is decreased dramatically, and
Thethe the
leakage dynamic
current
proposed power
of anscheme
CLSR MS latch dissipation
shift register
reduces is
the average suppressed as
well.
is mainlyFigure
power 9b shows
contributed a36%
comparison
by thebyinverters
consumption of the
leakage
and is proportional
compared with powers
to the
conventionalnumbers under different
of inverters.
MS latch design. In
Addition- word lengths
the proposed
between theCLSR
ally,MS gates scheme,
latchcapacitances
and CLSR the number of inverters
architectures.
are reduced, and the is reduced
The leakage
dynamic significantly.
powercurrent ofThus,
dissipation anis MSthelatch shift register
suppressed
proposed 256-bit
as well. CLSR9bscheme
Figure shows design saves 60.53%
a comparison leakage current as compared with thelengths
is mainly contributed by the invertersofand leakage powers
is proportional under different
to word
the numbers of inverters. In
conventional
between design.
MS latch and CLSR architectures. The leakage current of an MS latch shift register
the proposed CLSR scheme, the number of inverters is reduced significantly. Thus, the
is mainly contributed by the inverters and is proportional to the numbers of inverters. In
proposed 256-bit CLSR
the proposed CLSRscheme,
scheme thedesign
numbersaves 60.53%
of inverters leakagesignificantly.
is reduced current asThus, compared
the with the
conventional
proposed design.
256-bit CLSR scheme design saves 60.53% leakage current as compared with the
conventional design.

(a) (b)

(a) (b)

(c) (d)
Figure 9. Power consumption performance. (a) Different data switching activity. (b) Leakage power. (c) Different operation
frequency. (d) Different supply voltage.

(c) (d)
Figure 9. Power consumption performance. (a) Different data switching activity. (b) Leakage power. (c) Different operation
Figure 9. Power consumption performance. (a) Different data switching activity. (b) Leakage power. (c) Different operation
frequency. (d) Different supply voltage.
frequency. (d) Different supply voltage.
layout simulation results, we find that when the proposed CLSR scheme operates from
0.4 V to 0.9 V, it can save 31% power consumption on average compared with the tradi-
tional design. Figure 10 shows its optimal power delay product (PDP) performance. The
PDP
Appl. Sci. 2021, was employed as a composite performance index in this work. For each supply volt-
11, 129 9 of 11
age, the CQ delay was scanned to obtain the best PDP number. Both shift register designs
were determined to function properly under process variations. The proposed CLSR
scheme works well in any 9c
Figure kind of simulation
shows the power trials. According
consumption to the above
at different operatingresults, the
frequencies
proposed CLSR (1 scheme
MHz−applied to thethe
1 GHz). From application
results, the of shift registers
proposed CLSR scheme achieved better
achieves per-
an average
power saving of 35% as compared with the conventional MS
formance such as area, power, and PDP, compared with that of the master–slave architec- latch design. Figure 9d shows
the power consumption of the proposed CLSR scheme design under different voltages. To
ture. Moreover, to achieve low power design, recently reported works try to extremely
ensure both designs can operate normally, the operating frequency is set at 25 MHz. From
reduce the number of transistors
the post-layout [28–30].
simulation However,
results, we find that these
whenflip-flops
the proposed have
CLSRfloating
scheme prob-
operates
lems of internal nodes.
from 0.4Thus, theV,reported
V to 0.9 it can saveflip-flops
31% power must be selected
consumption carefully
on average for different
compared with the
applications. traditional design. Figure 10 shows its optimal power delay product (PDP) performance.
Finally, Table 2PDP
The showswas employed
a detailedasperformance
a composite performance
comparison indexofindifferent
this work. 256-bit
For each shift
supply
voltage, the CQ delay was scanned to obtain the best PDP number. Both shift register
registers. The area of the proposed CLSR scheme design is 1932.47 µm2. Compared with
designs were determined to function properly under process variations. The proposed
the conventionalCLSR MS latch
schemeshift register
works well in(2961.88
any kind ofµm 2), the proposed CLSR scheme design
simulation trials. According to the above results,
achieves a 34.76% thereduction
proposed CLSRin the integrated
scheme appliedcircuitry layout area.
to the application of shiftTable 2 also
registers records
achieved better
performance
the power consumption, such aspower,
leakage area, power,
andandPDP PDP,atcompared
0.9 V and with
0.4that
V. of the master–slave
When operatingarchi- at
tecture. Moreover, to achieve low power design, recently reported works try to extremely
0.9 V/250 MHz, the proposed design can save up to 36% power consumption and 16.90%
reduce the number of transistors [28–30]. However, these flip-flops have floating problems
PDP. At the sameoftime, due
internal to theThus,
nodes. largethesaving
reportedof flip-flops
the number mustof betransistors, the proposed
selected carefully for different
CLSR scheme can effectively suppress leakage power consumption [34].
applications.

800
Master-slave CLSR
700

600

500
PDP(fJ)

400

300

200

100

0
0.4 0.5 0.6 0.7 0.8 0.9
Operation voltage(V)
Figure 10. Power delay product performance at different supply voltages.
Figure 10. Power delay product performance at different supply voltages.
Finally, Table 2 shows a detailed performance comparison of different 256-bit shift
registers.
Table 2. Performance The area
comparison ofofshift
the proposed CLSR
registers at scheme
TSMC design is 1932.47 µm2 . Compared with
40 nm.
the conventional MS latch shift register (2961.88 µm2 ), the proposed CLSR scheme design
Master–Slave
achieves a 34.76% reduction Latch circuitry layout area.
in the integrated CLSR Table 2 also records
the
Process power consumption, leakage
40 nm power, and PDP at 0.9 V and 0.4 V.
40 nm When operating at
0.9 V/250 MHz, the proposed design can save up to 36% power consumption and 16.90%
Word length of shifter
PDP. Atregister 256large saving of the number of transistors,
the same time, due to the 256 the proposed
Number of transistors 6144 leakage power consumption
CLSR scheme can effectively suppress 3174
[34].
Layout area (µm2) 2961.88 1932.47
Typical voltage (V) 0.9 0.4 0.9 0.4
Appl. Sci. 2021, 11, 129 10 of 11

Table 2. Performance comparison of shift registers at TSMC 40 nm.

Master–Slave Latch CLSR


Process 40 nm 40 nm
Word length of shifter register 256 256
Number of transistors 6144 3174
Layout area (µm2 ) 2961.88 1932.47
Typical voltage (V) 0.9 0.4 0.9 0.4
Frequency (MHz) 250 25 250 25
Power (µW) 794 26.09 504 14.24
PDP (fJ) 144.43 690.22 120.03 569.61
Leakage power (µW) 35.33 3.36 13.95 2.33

4. Conclusions
This paper proposed a novel cross-latch shift register (CLSR) scheme attempting to
reduce power consumption and to eliminate integrated circuity overhead layout area. By
using one cross-latch with delay clocking circuit typology to replace the master and slave
latches, we reduced the number of transistors by 48.33% as compared with the conventional
MS latch shift register. At the same time, the total gates capacitances were reduced, and the
static and dynamic power dissipation were suppressed as well. The proposed CLSR scheme
was verified by employing TSMC 40 nm CMOS process standard technology. Experimental
comparisons with respect to the conventional MS latch design showed that the proposed
CLSR scheme not only works well in reducing average power consumption and leakage
power but also presents a better performance in circuitry layout area elimination. The
research group suggests that a potential application for this novel CLSR scheme is adopting
it for digital circuitry-related design approaches.

Author Contributions: J.-F.L. and M.-Y.T. proposed the idea and method; C.-M.T. performed the
simulations and experiments; M.-H.S. and P.-Y.K. analyzed the data; P.-Y.K. reviewed the manuscript.
All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: The data presented in this study are available on request from the
corresponding author.
Acknowledgments: This work was supported by the Ministry of Science & Technology, Taiwan
under contract No. 109-2221-E-324-028, No. 109-2221-E-224-050 and a grant (Grant No. 109AS-
11.3.2-ST-a9) from the Council of Agriculture, ROC. The authors would like to acknowledge for
technical support in simulation by Taiwan Semiconductor Research Institute, EDA tool support for
IC implementation.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Ng, T.N.; Schwartz, D.E.; Lavery, L.L.; Whiting, G.L.; Russo, B.; Krusor, B.; Veres, J.; Bröms, P.; Herlogsson, L.; Alam, N. Scalable
printed electronics: An organic decoder addressing ferroelectric non-volatile memory. Sci. Rep. 2012, 2, 585. [CrossRef]
2. Gelinck, G.H.; Huitema, H.E.A.; Van Veenendaal, E.; Cantatore, E.; Schrijnemakers, L.; Van Der Putten, J.B.; Geuns, T.C.;
Beenhakkers, M.; Giesbers, J.B.; Huisman, B.-H. Flexible active-matrix displays and shift registers based on solution-processed
organic transistors. Nat. Mater. 2004, 3, 106–110. [CrossRef] [PubMed]
3. Hatamian, M.; Agazzi, O.E.; Creigh, J.; Samueli, H.; Castellano, A.J.; Kruse, D.; Madisetti, A.; Yousefi, N.; Bult, K.; Pai, P. Design
considerations for gigabit Ethernet 1000Base-T twisted pair transceivers. In Proceedings of the IEEE 1998 Custom Integrated
Circuits Conference (Cat. No. 98CH36143), Santa Clara, CA, USA, 14 May 1998; pp. 335–342.
4. Yamasaki, H.; Shibata, T. A real-time image-feature-extraction and vector-generation VLSI employing arrayed-shift-register
architecture. IEEE J. Solid-State Circuits 2007, 42, 2046–2053. [CrossRef]
5. Kim, H.-S.; Yang, J.-H.; Park, S.-H.; Ryu, S.-T.; Cho, G.-H. A 10-bit column-driver IC with parasitic-insensitive iterative charge-
sharing based capacitor-string interpolation for mobile active-matrix LCDs. IEEE J. Solid-State Circuits 2014, 49, 766–782.
[CrossRef]
Appl. Sci. 2021, 11, 129 11 of 11

6. Chiang, S.-H.W.; Kleinfelder, S. Scaling and design of a 16-mega-pixel CMOS image sensor for electron microscopy. In Proceedings
of the 2009 IEEE Nuclear Science Symposium Conference Record (NSS/MIC), Orlando, FL, USA, 24 October–1 November 2009;
pp. 1249–1256.
7. Xie, Z.; Wu, Z.; Wu, J. Low Voltage Delay Element with Dynamic Biasing Technique for Fully Integrated Cold-Start in Battery-
Assistance DC Energy Harvesting Systems. Appl. Sci. 2020, 10, 6993. [CrossRef]
8. Nomura, S.; Tachibana, F.; Fujita, T.; Teh, C.K.; Usui, H.; Yamane, F.; Miyamoto, Y.; Kumtornkittikul, C.; Hara, H.;
Yamashita, T.; et al. A 9.7 mW AAC-decoding, 620 mW H.264 720p 60fps decoding, 8-core media processor with embedded
forward body-biasing and power-gating circuit in 65 nm CMOS technology. In Proceedings of the IEEE International Solid-State
Circuits Conference-Digest of Technical Papers, San Francisco, CA, USA, 3–7 February 2008; pp. 262–612.
9. Stojanovic, V.; Oklobdzija, V.G. Comparative analysis of master-slave latches and flip-flops for high-performance and low-power
systems. IEEE J. Solid-State Circuits 1999, 34, 536–548. [CrossRef]
10. Schwartz, D.E.; Ng, T.N. Comparison of static and dynamic printed organic shift registers. IEEE Electron Device Lett. 2013, 34,
271–273. [CrossRef]
11. Lin, J.-F.; Sheu, M.-H.; Hwang, Y.-T.; Wong, C.-S.; Tsai, M.-Y. Low-power 19-transistor true single-phase clocking flip-flop design
based on logic structure reduction schemes. IEEE Trans. Very Large Scale Integr. (TVLSI) Syst. 2017, 25, 3033–3044. [CrossRef]
12. Otfinowski, P.; Grybos, P. Low area 4-bit 5MS/s flash-type digitizer for hybrid-pixel detectors—Design study in 180 nm and 40
nm CMOS. Nucl. Instrum. Methods Phys. Res. Sect. A Accel. Spectrom. Detect. Assoc. Equip. 2015, 800, 104–110. [CrossRef]
13. Kawai, N.; Takayama, S.; Masumi, J.; Kikuchi, N.; Itoh, Y.; Ogawa, K.; Ugawa, A.; Suzuki, H.; Tanaka, Y. A fully static topologically-
compressed 21-transistor flip-flop with 75% power saving. IEEE J. Solid-State Circuits 2014, 49, 2526–2533. [CrossRef]
14. Ruiz, G.A.; Granda, M. Efficient low-power register array with transposed access mode. Microelectron. J. 2014, 45, 463–467. [CrossRef]
15. Jamshidi, V.; Fazeli, M. Design of ultra-low power current mode logic gates using magnetic cells. Int. J. Electron. Commun. 2018,
83, 270–279. [CrossRef]
16. Taghizadeh, A.; Koozehkanani, Z.D.; Sobhi, J. A new high-speed low-power and low-offset dynamic comparator with a
current-mode offset compensation technique. Int. J. Electron. Commun. 2017, 81, 163–170. [CrossRef]
17. Kumar, D.; Kumar, M. Comparative analysis of adiabatic logic challenges for low power CMOS circuit designs. Microprocess.
Microsyst. 2018, 60, 107–121. [CrossRef]
18. Murugasami, R.; Ragupathy, U. Design and comparative analysis of D-Flip-flop using conditional pass transistor logic for
high-performance with low-power systems. Microprocess. Microsyst. 2019, 68, 92–101. [CrossRef]
19. Hassanli, K.; Sayedi, S.M.; Dehghani, R.; Jalili, A.; Wikner, J.J. A highly sensitive, low-power, and wide dynamic range CMOS
digital pixel sensor. Sens. Actuators Phys. 2015, 236, 82–91. [CrossRef]
20. Jeong, T.T. Implementation of low power adder design and analysis based on power reduction technique. Microelectron. J. 2008,
39, 1880–1886. [CrossRef]
21. Piguet, C. Low-power and low-voltage CMOS digital design. Microelectron. Eng. 1997, 39, 179–208. [CrossRef]
22. Sudheer, A.; Ravindran, A. Design and Implementation of Embedded Logic Flip-Flop for Low Power Applications. Procedia
Comput. Sci. 2015, 46, 1393–1400. [CrossRef]
23. Mahmoud, M.M.; El-Dib, D.A.; Fahmy, H.A. Low energy pipelined Dual Base (decimal/binary) Multiplier, DBM, design.
Microelectron. J. 2017, 65, 11–20. [CrossRef]
24. Tavana, M.K.; Khameneh, S.A.; Goudarzi, M. Dynamically adaptive register file architecture for energy reduction in embedded
processors. Microprocess. Microsyst. 2015, 39, 49–63. [CrossRef]
25. Brelsford, K.; López, S.A.P.; Fernandez-Gomez, S. Energy efficient computation: A silicon perspective. Integration 2014, 47, 1–11.
[CrossRef]
26. Katreepalli, R.; Haniotakis, T. Power efficient synchronous counter design. Comput. Electr. Eng. 2019, 75, 288–300. [CrossRef]
27. Yang, B.-D. Low-power and area-efficient shift register using pulsed latches. IEEE Trans. Circuits Syst. I Regul. Pap. 2015, 62,
1564–1571. [CrossRef]
28. Teh, C.K.; Fujita, T.; Hara, H.; Hamada, M. A 77% energy-saving 22-transistor single-phase-clocking D-flip-flop with adaptive-
coupling configuration in 40 nm CMOS. In Proceedings of the 2011 IEEE International Solid-State Circuits Conference, San
Francisco, CA, USA, 20–24 February 2011; pp. 338–340.
29. Stas, F.; Bol, D. A 0.4-V 0.66-fJ/Cycle Retentive True-Single-Phase-Clock 18T Flip-Flop in 28-nm Fully-Depleted SOI CMOS. IEEE
Trans. Circuits Syst. I Regul. Pap. 2017, 65, 935–945. [CrossRef]
30. Li, J.; Chang, A.; Kim, T.T.-H. A 0.4-V, 0.138-fJ/Cycle Single-Phase-Clocking Redundant-Transition-Free 24T Flip-Flop with
Change-Sensing Scheme in 40-nm CMOS. IEEE J. Solid-State Circuits 2018, 53, 2806–2817.
31. Kim, Y.; Jung, W.; Lee, I.; Dong, Q.; Henry, M.; Sylvester, D.; Blaauw, D. A static contention-free single-phase-clocked 24t flip-flop
in 45nm for low-power applications. In Proceedings of the 2014 IEEE International Solid-State Circuits Conference Digest of
Technical Papers (ISSCC), San Francisco, CA, USA, 9–13 February 2014; pp. 466–467.
32. Chang, C.H.; Gu, J.M.; Zhang, M. A review of 0.18-/spl mu/m full adder performances for tree structured arithmetic circuits.
IEEE Trans. Very Large Scale Integr. (TVLSI) Syst. 2005, 13, 686–695. [CrossRef]
33. TSMC. 40nm CMOS ASIC Process Digest; Taiwan Semiconductor Manufacturing Company: Taipei, Taiwan, 2009.
34. Roy, K.; Mukhopadhyay, S.; Mahmoodi-Meimand, H. Leakage current mechanisms and leakage reduction techniques in deep-
submicrometer CMOS circuits. Proc. IEEE 2003, 91, 305–327. [CrossRef]

You might also like