
Design of Low-Power and High-Speed PLA and ROM for SOC Applications

Department of Electrical Engineering, National Chung Cheng University
Prof. Jinn-Shyan Wang
Outline

Introduction
Low Power Design Concepts
Circuit Design Guidelines
Layout Guidelines
LP/HS CMOS PLA
LP/HS CMOS ROM
Conclusions
Video IP Phone

[Figure: block diagram of a video IP phone SOC. Labels include: Ethernet; RISC System Controller (CCU32); DLL; Memory; Video IO; Clock Buffer; Camera; NTSC; CRT; CLIW DSP; Control ROM; HAC1, HAC2, ..., HACn; A/D; D/A; Program Memory; Data Memory; POTS; SLIC; Speech IO; PLL; CLK. Callouts: PLA for decoding control lines; PLA for applications with programmability.]
A Typical µ-Controller

[Figure: block diagram of a typical microcontroller. Labels include: Clocking System; PAR & PCH; Resident ROM (2K*8); Timer/Event Counter; Port 0, 1, 2, 3; Bus; ACC; B; IR; RAR; Decoder; TMP1; TMP2; Control & Timing; SRAM (128*8); Conditional Branch Logic; ALU. Callouts: ROM; Standard Cells or PLA; RAM; PLA.]


Typical Standard Cell Library
HS or LP?

High Speed (HS) implies High Power?


A good design usually combines high speed with low power.
Designing for low power from the start usually leads to a design that is also fast.
Block Diagram of ROM

[Figure: ROM block diagram. Labeled blocks: Address; Row Decoder; Core Cell; Column Decoder; Column Mux. & Driver; Clock; Data out.]
Block Diagram of PLA
Differences Between ROM and PLA

ROM: 1 programmable plane


PLA: 2 programmable planes
Similarities Between ROM and PLA

Two-Level Logic Design


Usually implemented with dynamic CMOS circuit
techniques for small area
Regular and Programmable
Design considerations for reducing
power consumption
Where Does the Power Go?
Power Consumption Formula:

Ptotal = Ps + Pd + Psc

Ps : static power (dc & leakage)

Pd : dynamic power

Psc : short circuit power


Low-Power Design Concepts
1. For low power, static power consumption
should be avoided or controlled.
2. Short-circuit current becomes negligible when
internal signal edges are kept sharp.
3. Dynamic power is the most significant
concern!
Low Power Concept (cont.)
Dynamic Power Consumption:

$$P_d = \sum_i \alpha_i \cdot C_i \cdot (V_{DD} \cdot V_{swing}) \cdot f$$

α_i : switching activity of node i
C_i : lumped capacitance of node i
V_DD : supply voltage
V_swing : voltage swing of node i
f : operating frequency

Usually, the supply voltage and the operating frequency are given by the specification.
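The dynamic power formula can be evaluated numerically with a short sketch. The node values below (activity, capacitance, swing) are hypothetical and chosen only to show how a reduced voltage swing lowers the result; they are not taken from the slides.

```python
from dataclasses import dataclass

@dataclass
class Node:
    alpha: float    # switching activity of the node
    cap_f: float    # lumped capacitance in farads
    v_swing: float  # voltage swing of the node in volts

def dynamic_power(nodes, v_dd, f_hz):
    """P_d = sum_i alpha_i * C_i * (V_DD * V_swing) * f, in watts."""
    return sum(n.alpha * n.cap_f * v_dd * n.v_swing for n in nodes) * f_hz

# Hypothetical nodes: same activity and capacitance, different swing.
full_swing = [Node(alpha=0.5, cap_f=100e-15, v_swing=3.3)]
low_swing = [Node(alpha=0.5, cap_f=100e-15, v_swing=1.0)]

p_full = dynamic_power(full_swing, v_dd=3.3, f_hz=100e6)
p_low = dynamic_power(low_swing, v_dd=3.3, f_hz=100e6)
print(f"full swing: {p_full * 1e6:.2f} uW, reduced swing: {p_low * 1e6:.2f} uW")
```

With V_DD and f fixed by the specification, the only remaining knobs are the per-node activity, capacitance, and swing, which is why the later slides target exactly those three.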
Circuit Design for Low Power
Some circuit design guidelines:
Use static circuits instead of dynamic circuits where possible
Use efficient dynamic circuits if necessary
Reduce switching activity
Optimize buffer sizing
Clever circuit techniques
Reduce VDD in non-critical paths (dual-supply approach)
Custom design to reduce the power consumption
Layout Guidelines for Low Power
Some layout guidelines:
Place NMOS and PMOS transistors as close to the output node as possible.
Use low-capacitance layers such as metal2 and metal3 for high-activity nodes
Keep the wires of high-activity nodes short
Use low-capacitance layers for high-capacitance nodes and busses
For large devices, use finger-type and donut layout styles for small junction capacitance.
Design critical circuits in full custom
Criteria and Guides for
Low-Power ROM and PLA
Criteria: Need to use dynamic circuits for small area
Design Methodologies:
Full-custom design
Overall Design Considerations
αi ↓
Vswing ↓
Ci↓
I short-circuit ↓
Evolution of PLA Circuit Structure
AND-OR
→ NAND-NAND: ☺ compact; low speed; clock-delayed dynamic CMOS; requires depletion NMOS transistors
→ INV-NOR-NOR-INV: not compact enough; ☺ high speed; clock-delayed dynamic CMOS; ☺ typical ASIC process; high power consumption
NMOS: old technology
→ Pseudo-NMOS: CMOS technology; high power consumption
→ Clock-delayed, Blair's, Dhong's, and NSYU's PLAs: dynamic CMOS circuits; speed/power performance not good enough
Clock-Delayed CMOS PLA

[Figure: schematic of the clock-delayed CMOS PLA. AND plane with inputs a to e (transistors MN1 to MN5, MP1, MD1), product terms P1 to P8, inter-plane buffers, and OR plane with outputs Z1 to Z4 (transistors MN6 to MN9, MP2, MD2); clocks φ, φ1, and φ1d; nodes 1 to 6, A, and B.]
Critical Path of the Clock-Delayed PLA

[Figure: critical path from input a through the AND plane, the inter-plane buffer, and the OR plane to output Z1, clocked by φ, φ1, and φ1d, with node capacitances C_CK1, C_CK2, C'_IN, C_IN, C_AND, C'_AND, C_Inter, C_OR, C'_OR, and C_L.]
Signal Propagation

[Figure: timing diagram over alternating precharge and evaluation phases for φ, input a, φ1, φ1d, and output Z1 (node 6), marking td, tiacc, tacc, and a region labeled "error".]
Critical Path & Signal Waveforms

[Figure: the critical path of the clock-delayed PLA shown together with its timing diagram (φ, a, φ1, φ1d, Z1), with numbered steps 1 to 5 along the path and td, tiacc, tacc, and "error" marked on the waveforms.]

tcyc = 2 × tiacc
Critical Path & Power Consumption

[Figure: critical path of the clock-delayed PLA annotated with node capacitances C_CK1, C_CK2, C'_IN, C_IN, C_AND, C'_AND, C_Inter, C_OR, C'_OR, and C_L.]

$$P = \left[ \alpha_{CK}(C_{CK1} + C_{CK2}) + \sum_{1}^{I} \alpha_{IN}(C'_{IN} + C_{IN}) + \sum_{1}^{P} \left( \alpha_{AND} C_{AND} + \alpha'_{AND} C'_{AND} + \alpha_{Inter} C_{Inter} \right) + \sum_{1}^{O} \left( \alpha_{OR} C_{OR} + \alpha'_{OR} C'_{OR} + \alpha_{OR} C_{L} \right) \right] \cdot V_{DD}^2 \cdot f$$
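Blair's PLA

[Figure: critical path of Blair's PLA from inputs a and a' to output Z1, clocked by φ, φ1, and φ1d, with transistors MP1, MP2, MD1, MN1, MN2, MN5 and capacitances C_CK1, C_CK2, C'_IN, C_IN, C_AND, C_Inter, C_OR, and C_L.]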
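Dhong's PLA

[Figure: critical path of Dhong's PLA from inputs a and a' to output Z1, clocked by φ, φ1, and φ1d, with transistors MP1, MP2, MD1, MD2, MN1, MN2, MN5 and capacitances C_CK0, C_CK1, C_CK2, C_Share, C'_IN, C_IN, C_AND, C_Inter, C_OR, and C_L.]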
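NSYU's PLA

[Figure: critical path of NSYU's PLA from inputs a and a' to output Out, clocked by φ, φ1, and φ1d, with transistors MP1, MP2, MD1, MN1, MN2, MN5 and capacitances C_CK0, C_CK1, C_CK2, C'_IN, C_IN, C_AND, C_Inter, C_OR, and C_L.]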
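Power Factors

Clock-delayed
  PF_IN:    $\frac{1}{4}(C_{IN} + C'_{IN})$
  PF_AND:   $\frac{2^N - 1}{2^N}(C_{AND} + C'_{AND})$
  PF_Inter: $\frac{2^N - 1}{2^N} C_{Inter}$
  PF_OR:    $\left[1 - \prod_i (1 - p_i)\right](C_{OR} + C'_{OR})$
  PF_OUT:   $\left[1 - \prod_i (1 - p_i)\right] C_L$

Blair's
  PF_IN:    $\frac{1}{4}(C_{IN} + C'_{IN})$
  PF_AND:   $\frac{2^N - 1}{2^N} \cdot \frac{P_{DC}}{V_{DD}^2 f} + \frac{1}{2^N} C_{AND}$
  PF_Inter: $\frac{1}{2^N} C_{Inter}$
  PF_OR:    $\left[1 - \prod_i (1 - p_i)\right] C_{OR}$
  PF_OUT:   $\left[1 - \prod_i (1 - p_i)\right] C_L$

Dhong's
  PF_IN:    $\frac{1}{4} C'_{IN} + \frac{1}{2} C_{IN}$
  PF_AND:   $\frac{2^N - 1}{2^N} C_{AND}$
  PF_Inter: $\frac{2^N - 1}{2^N} C_{Inter}$
  PF_OR:    $\prod_i (1 - p_i) \cdot \frac{2}{3} C_{OR} + \left[1 - \prod_i (1 - p_i)\right] \cdot 2 C_{OR}$
  PF_OUT:   $\prod_i (1 - p_i) C_L$

Modified Wang's
  PF_IN:    $\frac{1}{4} C'_{IN} + \frac{1}{2} C_{IN}$
  PF_AND:   $\frac{2^N - 1}{2^N} C_{AND}$
  PF_Inter: $< \frac{2^N - 1}{2^N} C_{Inter}$
  PF_OR:    $\left[1 - \prod_i (1 - p_i)\right] C_{OR}$
  PF_OUT:   $\left[1 - \prod_i (1 - p_i)\right] C_L$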
Features of the Proposed PLA

Pseudo-footless dynamic logic circuit: Ceff(AND/OR) ↓
AND-type inter-plane buffer: αinter ↓
Circuit style selectable according to the switching probability: αAND ↓ and αOR ↓
Buffer sizing: short-circuit current ↓
Pseudo-footless Dynamic Circuit

[Figure: pseudo-footless dynamic circuit between VDD and GND, with a charge unit and a discharge unit, each clocked by φ.]
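Example
A multiple-input NOR gate:

[Figure: pseudo-footless multiple-input NOR gate between VDD and GND, with clocked transistors M_P, M_Pf, and M_AD, discharge transistors M_D, and node capacitances C0 and C1.]

$$\alpha_{NOR} = \frac{2^k - 1}{2^k}$$

When k is large, C0 << C1.
Ceff ↓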
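The switching activity of the k-input NOR gate can be checked with a small sketch. It assumes, as the slide's formula suggests but does not state, that each input is 1 with probability 1/2 independently of the others, and that a precharged dynamic NOR node discharges whenever at least one input is 1. The function name is illustrative only.

```python
from fractions import Fraction

def nor_discharge_probability(input_probs):
    """Probability that at least one input is 1, assuming independent inputs."""
    all_zero = Fraction(1)
    for p in input_probs:
        all_zero *= 1 - Fraction(p)
    return 1 - all_zero

for k in (2, 4, 8):
    alpha = nor_discharge_probability([Fraction(1, 2)] * k)
    # Matches the closed form (2^k - 1) / 2^k from the slide.
    assert alpha == Fraction(2**k - 1, 2**k)
    print(k, alpha, float(alpha))
```

The activity approaches 1 as k grows, so the power of a wide NOR plane is governed mainly by the capacitance that is actually switched, which is what the pseudo-footless structure reduces.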
AND-Type Inter-plane Buffer
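[Figure: PLA critical path with AND-type inter-plane buffers. Clocks φ, φ1, and φ1d; precharge transistors MP1 and MP2 (marked *); MD1, MD2, and MAD; AND-plane transistors MN1, MN2, MN5; nodes 3, 3', 4, 5, 6, A, and B; capacitances C_AND and C_INT; input In and output Z1.]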

φ = 0 , node 5 = 0
φ = 1 , node 5 = 1 only if all AND-plane inputs are 1
Therefore, αinter ↓
Selectable Circuit Styles
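[Figure: PLA critical path with selectable OR-plane circuit styles. Product terms P3 and P4 drive two OR-plane outputs, Z1 (precharge transistor MP2) and Z3 (precharge transistor MP3, marked *); clocks φ, φ1, and φ1d; capacitances C_CKd, C_CP, C'_IN, C_IN, C_AND, C_Inter, C'_Inter, C_OR, C'_OR, C_Out, C_L, and C'_L.]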

[1] J. S. Wang, C. R. Chang, and C. W. Yeh, "Analysis and design of high-speed and
low-power CMOS PLAs," IEEE JSSC, vol. 36, no. 8, pp. 1250-1262, Aug. 2001.
Circuit Selection Example
Table: Power consumption of the OR-plane circuit

                       Power @ 100 MHz (mW)    Power @ 50 MHz (mW)
PLA style              P_OR,0      P_OR,1      P_OR,0      P_OR,1
New pseudo-footless    0.11        0.11        0.06        0.06
Footless               0.18        0.00        0.09        0.00

$$0.11\,p_{OR,1} + 0.11\,(1 - p_{OR,1}) < 0 \times p_{OR,1} + 0.18\,(1 - p_{OR,1}) \;\Rightarrow\; p_{OR,1} < 0.39$$

If the criterion is met, Pseudo-Footless is chosen.
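The selection criterion can be reproduced with a minimal sketch that weights the two table entries of each style by the output probability. The power values are the 100 MHz figures from the table; the function names are illustrative and not part of the original design flow.

```python
def expected_or_power(p_or0, p_or1, prob_one):
    """Expected OR-plane power: P_OR,1 weighted by prob_one, P_OR,0 by (1 - prob_one)."""
    return p_or1 * prob_one + p_or0 * (1.0 - prob_one)

def choose_style(prob_one, pseudo=(0.11, 0.11), footless=(0.18, 0.00)):
    """Return the lower-power OR-plane style at 100 MHz for a given p_OR,1.

    Each style is given as (P_OR,0, P_OR,1) in mW.
    """
    p_pseudo = expected_or_power(*pseudo, prob_one)
    p_footless = expected_or_power(*footless, prob_one)
    return "pseudo-footless" if p_pseudo < p_footless else "footless"

# Break-even probability: 0.11 = 0.18 * (1 - p)  =>  p = (0.18 - 0.11) / 0.18
threshold = (0.18 - 0.11) / 0.18
print(f"threshold = {threshold:.2f}")
print(choose_style(0.30), choose_style(0.50))
```

Because the pseudo-footless power is the same for both output values, the decision reduces to comparing a constant against a line in p_OR,1, which is where the single threshold comes from.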


Signal Waveforms

[Figure: timing diagram over alternating precharge and evaluation phases for φ, φ1 (with intervals marked CS), φ1d, node 6, and Z1, marking td, tiacc, and tacc.]

CS stands for "Charge Sharing".


A Complete Example

[Figure: complete PLA schematic with clock buffer, input buffers, AND plane (inputs a to e, transistors MN1 to MN5, MN14, MN15, MAD, MD1, MP1), inter-plane buffers, OR plane (outputs Z1 to Z4, transistors MN6 to MN13, MD2, MP2, MP3), and output buffers; clocks φ, φ1, and φ1d; some transistors are marked *. Overbars on complemented literals are not reproduced in the equations below.]

Output functions:
Z1 = abde + abcde + bc + de
Z2 = ace
Z3 = bc + de + cde + bd
Z4 = ace + ce

Input probabilities: pa = pb = pc = pd = pe = 1/2
Product-term probabilities: pP1 = 1/16, pP2 = 1/8, pP3 = 1/32, pP4 = 1/4, pP5 = 1/4, pP6 = 1/4, pP7 = 1/8, pP8 = 1/4
Output probabilities: pZ1 = 0.511, pZ2 = 0.875, pZ3 = 0.369, pZ4 = 0.656

In the example, the OR-plane circuits are determined by switching probability.
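The output probabilities of this example can be reproduced with a short sketch. It rests on two assumptions that the slide does not state explicitly: each product term with k literals is true with probability (1/2)^k, and each output probability is the product of (1 - p) over its product terms, treated as independent. This reading reproduces the four listed values. The 0.39 threshold is the criterion from the circuit selection example, and applying it to these values is likewise an assumption.

```python
from math import prod

# Literal counts of the product terms feeding each output, read from the
# output functions of the example (complement bars do not affect the count).
literal_counts = {
    "Z1": [4, 5, 2, 2],
    "Z2": [3],
    "Z3": [2, 2, 3, 2],
    "Z4": [3, 2],
}

def p_output(counts, p_input=0.5):
    """Product over terms of (1 - p_term), with p_term = p_input ** literals."""
    return prod(1.0 - p_input ** k for k in counts)

THRESHOLD = 0.39  # selection criterion from the circuit selection example

for name, counts in literal_counts.items():
    p = p_output(counts)
    style = "pseudo-footless" if p < THRESHOLD else "footless"
    print(f"{name}: p = {p:.3f} -> {style}")
```

Under these assumptions only Z3 falls below the threshold, which would agree with the schematic marking the Z3 precharge transistor MP3 with an asterisk.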
Test Chip

[Figure: die photo with labeled regions: Input Buffer, AND, Inter-plane Buffer, OR, Output Buffer.]

Process                        0.35-μm 1P4M CMOS
Core Area                      1.216 × 1.330 mm²
Transistor Count               6772
Power Supply                   3.3 V
Power Consumption @ 167 MHz    8.0 mW (measured), 8.03 mW (simulated)
Power Consumption @ 278 MHz    n.a. (measured), 13.34 mW (simulated)
Performance Comparison

PLAs               td (ns)   tacc (ns)   tiacc (ns)   tcyc (ns)   fmax (MHz)   Normalized speed
Clock-delayed      1.7       4.1         3.6          7.2         139          0.81
Blair's            -         4.0         3.5          7.0         143          0.83
Dhong's            2.1       3.5         3.2          6.4         156          0.90
Modified Wang's    2.0       3.2         2.9          5.8         172          1.00
New                1.4       2.6         1.8          3.6         278          1.61
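The cycle time and maximum frequency columns follow from tcyc = 2 × tiacc. The sketch below recomputes them from the third timing column of the table, taking that column to be tiacc and the reference for normalization to be the Modified Wang's PLA; both are inferences from the numbers, not statements in the slides. Results agree with the table to within rounding.

```python
# Third timing column of the table, interpreted as t_iacc in ns.
t_iacc_ns = {
    "Clock-delayed": 3.6,
    "Blair's": 3.5,
    "Dhong's": 3.2,
    "Modified Wang's": 2.9,
    "New": 1.8,
}

def f_max_mhz(t_iacc):
    """f_max = 1 / t_cyc with t_cyc = 2 * t_iacc; ns in, MHz out."""
    return 1e3 / (2.0 * t_iacc)

reference = f_max_mhz(t_iacc_ns["Modified Wang's"])
for name, t in t_iacc_ns.items():
    f = f_max_mhz(t)
    print(f"{name:16s} t_cyc = {2 * t:.1f} ns  f_max = {f:.0f} MHz  "
          f"normalized = {f / reference:.2f}")
```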


Power Comparison

PLAs               P @ 250 MHz (mW)   P @ 100 MHz (mW)   Pnormalized @ 100 MHz   P @ 50 MHz (mW)   Pnormalized @ 50 MHz
Clock-delayed      Fail               0.79               1.80                    0.40              1.25
Blair's            Fail               0.44               1.00                    0.32              1.00
Dhong's            Fail               0.73               1.66                    0.39              1.22
Modified Wang's    Fail               0.69               1.57                    0.35              1.09
New                0.84               0.36               0.82                    0.18              0.56


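Memory Organization (1)

[Figure: memory organization. Address bits A_K to A_{L-1} drive the row decoder and A_0 to A_{K-1} drive the column decoder; the array holds 2^(L-K) × 2^K × M memory cells; a control block clocked by φ and a sense amplifier deliver data D_0 to D_{M-1}.]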
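The address partition in this organization can be sketched as follows: the low K address bits select the column and the remaining L - K bits select the row. The sizes used here (L = 11, K = 4, M = 8) are hypothetical values picked for illustration, and the function name is not from the slides.

```python
def split_address(addr, L, K):
    """Split an L-bit address into (row, column): A_K..A_{L-1} and A_0..A_{K-1}."""
    if not 0 <= addr < (1 << L):
        raise ValueError("address out of range")
    column = addr & ((1 << K) - 1)  # low K bits drive the column decoder
    row = addr >> K                 # high L-K bits drive the row decoder
    return row, column

# Hypothetical sizes: 11 address bits, 4 of them for column selection, 8-bit words.
L, K, M = 11, 4, 8
rows = 1 << (L - K)            # number of word-lines
bits_per_row = (1 << K) * M    # cells on each word-line
print(rows, bits_per_row, rows * bits_per_row)
print(split_address(0b10110010101, L, K))
```

Choosing K trades array aspect ratio against decoder size: a larger K gives fewer, longer word-lines and a wider column multiplexer.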
Memory Organization (2)

[Figure: memory organization with an n-bit address and m-bit data width. Labeled blocks: address buffer; row decoder; word-line driver; cell array of 2^(n-k) words by (m × 2^k) bits with word-lines, bit-lines, bit-line loads, and memory cells; k-bit column decoder; column select; read/write mux; data-line; sense amplifiers; input/output buffers/latches. Memory size: (2^n × m) bits.]
Low-Power NOR-type ROM
Features [2]:
NAND-type decoder (αi ↓)
NHS-PDCMOS logic structure (Vswing ↓)
Bit-line selective precharge (Ceff(bitline) ↓)
NMOS precharge (Vswing ↓)
Clock buffer (short-circuit current ↓)

Goal:
Minimize the power consumption of each part!

[2] C.-R. Chang, J.-S. Wang, and C.-H. Yang, "Low-power and high-speed
ROM modules for ASIC applications," IEEE J. Solid-State Circuits.
Critical Path of Low-Power ROM

NAND-type decoders
NHS-PDCMOS logic for small bit-line swing
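[Figure: critical path of the low-power ROM with selective precharge. Row decoder (address inputs a3 to a10), cell array, column selector and selective precharge driven by the column decoder (address inputs a0 to a2), dynamic inverter, and output latch producing Data; all stages clocked by φ.]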

Cost due to selective precharge is small & no dc current!


Critical Path of High-Speed ROM

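[Figure: critical path of the high-speed ROM. Row decoder with address inputs a3 to a(n-1) and transistor M_AD, followed by the cell array and output stages; all stages clocked by φ.]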

Pseudo-footless NOR-type row decoders


No selective precharge to reduce critical path
Power breakdown @ 3.3V

[Figure: bar chart of power (mW/MHz) for the Low-Power and High-Speed ROMs, broken down into Core, Decoders, CK Gen., Dynamic Inverter, Addr. Buf, and o/p latch.]

High-Speed : NOR-type decoder, w/o selective precharge


Low-Power : NAND-type decoder, w/ selective precharge
Performance Comparison

ROMs                    t_access (ns)   f_max (MHz)   N.S.   Pd @ 100 MHz (mW)   N.P.   Area (mm²)   N.A.
1  Design in [1]        6.70            74.62         0.66   na                  -      0.93         6.64
2  Design in [2]        4.44            112.61        1.00   30.40               1.00   0.14         1.00
3  New high-speed ROM   2.35            212.76        1.89   23.95               0.79   0.14         1.00
4  New low-power ROM    3.80            131.58        1.17   5.29                0.17   0.12         0.86

na ≡ not available
f_max ≡ 1/(2 × t_access)
N.S.: Normalized Speed Index ≡ the ratio of the maximum operating frequency
N.P.: Normalized Power Index ≡ the ratio of the power consumption
N.A.: Normalized Area Index ≡ the ratio of the circuit's area

[1] T. Tsang, "A compilable read-only-memory library for ASIC deep-sub-micron applications," in
Proc. Int. Conf. VLSI Design, 1998, pp. 490-494.
[2] PASSPORT CB35RO122. 0.35 micron, 3.3 V, ROM Compiler with CE, COMPASS Design
Automation, Inc.
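The normalized indices can be recomputed from the raw columns with a short sketch. It assumes that the reference for all three indices is the design in [2], which is what the 1.00 entries in that row suggest. The variable names are illustrative only.

```python
# Rows of the comparison table: (t_access in ns, P_d @ 100 MHz in mW, area in mm^2).
roms = {
    "Design in [2]": (4.44, 30.40, 0.14),
    "New high-speed ROM": (2.35, 23.95, 0.14),
    "New low-power ROM": (3.80, 5.29, 0.12),
}

def f_max_mhz(t_access_ns):
    """f_max = 1 / (2 * t_access); ns in, MHz out."""
    return 1e3 / (2.0 * t_access_ns)

ref_t, ref_p, ref_a = roms["Design in [2]"]
indices = {}
for name, (t, p, a) in roms.items():
    # (normalized speed, normalized power, normalized area) relative to [2]
    indices[name] = (f_max_mhz(t) / f_max_mhz(ref_t), p / ref_p, a / ref_a)
    ns, np_, na = indices[name]
    print(f"{name:20s} f_max = {f_max_mhz(t):6.2f} MHz  "
          f"N.S. = {ns:.2f}  N.P. = {np_:.2f}  N.A. = {na:.2f}")
```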
Test Chip: Chip Features

Process                                  0.35-μm SPQM CMOS
Chip Area                                1.6804 × 1.6804 mm²
Transistor Count                         24598
Power Supply                             3.3 V
Clock Frequency of High-Speed Module     166.7 MHz (meas.), 212.8 MHz (sim.)
Power of High-Speed                      0.2280 mW/MHz (meas.), 0.2395 mW/MHz (sim.)
Clock Frequency of Low-Power Module      133.3 MHz (meas.), 133.3 MHz (sim.)
Power of Low-Power                       0.0609 mW/MHz (meas.), 0.0529 mW/MHz (sim.)
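A power figure in mW/MHz can be read as energy per clock cycle, since 1 mW/MHz equals 1 nJ per cycle. The sketch below converts the measured figures and estimates the power at the measured clock frequency, assuming power scales linearly with frequency, which is what a per-MHz figure implies. The function names are illustrative.

```python
# Measured figures from the test chip table: (power in mW/MHz, clock in MHz).
measured = {
    "high-speed": (0.2280, 166.7),
    "low-power": (0.0609, 133.3),
}

def power_mw(mw_per_mhz, f_mhz):
    """Power at a given clock, assuming it scales linearly with frequency."""
    return mw_per_mhz * f_mhz

def energy_per_cycle_pj(mw_per_mhz):
    """1 mW/MHz is 1 nJ per clock cycle, i.e. 1000 pJ per cycle."""
    return mw_per_mhz * 1e3

for name, (mw_per_mhz, f_mhz) in measured.items():
    print(f"{name}: {power_mw(mw_per_mhz, f_mhz):.1f} mW at {f_mhz} MHz, "
          f"{energy_per_cycle_pj(mw_per_mhz):.1f} pJ per cycle")
```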
Conclusions
Low power and high speed may be
achieved simultaneously by clever circuit
techniques.
Critical blocks should be full-custom
designed.
Minimize power of each part by means of
αi ↓, Vswing ↓, Ci↓, and I short-circuit ↓
to achieve overall power reduction
$$P_d = \sum_i \alpha_i \cdot C_i \cdot (V_{DD} \cdot V_{swing}) \cdot f$$
