
Digital Design

Chapter 5:
Register-Transfer Level
(RTL) Design
Slides to accompany the textbook Digital Design, First Edition,
by Frank Vahid, John Wiley and Sons Publishers, 2007.
http://www.ddvahid.com

Copyright © 2007 Frank Vahid


Instructors of courses requiring Vahid's Digital Design textbook (published by John Wiley and Sons) have permission to modify and use these slides for customary course-related activities, subject to keeping this copyright notice in place and unmodified. These slides may be posted as unanimated pdf versions on publicly-accessible course websites. PowerPoint source (or pdf with animations) may not be posted to publicly-accessible websites, but may be posted for students on internal protected sites or distributed directly to students by other electronic means. Instructors may make printouts of the slides available to students for a reasonable photocopying charge, without incurring royalties. Any other use requires explicit permission. Instructors may obtain PowerPoint source or obtain special use permissions from Wiley – see http://www.ddvahid.com for information.

Digital Design
Copyright © 2006
Frank Vahid
5.1

Introduction
• Chpt 2
– Capture comb. behavior: Equations, truth tables
– Convert to circuit: AND + OR + NOT → Comb. logic
• Chpt 3
– Capture sequential behavior: FSMs
– Convert to circuit: Register + Comb. logic → Controller
• Chpt 4
– Datapath components, simple datapaths
• Chpt 5
– Capture behavior: High-level state machine
– Convert to circuit: Controller + Datapath → Processor
– Known as “RTL” (register-transfer level) design

Levels of digital design abstraction (lowest to highest): transistor level, logic level, register-transfer level (RTL), higher levels

Processors:
• Programmable (microprocessor)
• Custom

Note: Slides with animation are denoted with a small red "a" near the animated items
RTL Design: Capture Behavior, Convert to Circuit
• Recall
– Chapter 2: Combinational Logic Design
• First step: Capture behavior (using equation or truth table)
• Remaining steps: Convert to circuit
– Chapter 3: Sequential Logic Design
• First step: Capture behavior (using FSM)
• Remaining steps: Convert to circuit
• RTL Design (the method for creating custom processors)
– First step: Capture behavior (using high-level state machine, to be introduced)
– Remaining steps: Convert to circuit

5.2

RTL Design Method

RTL Design Method: “Preview” Example
• Soda dispenser
– c: bit input, 1 when coin deposited
– a: 8-bit input having value of deposited coin
– s: 8-bit input having cost of a soda
– d: bit output, processor sets to 1 when total value of deposited coins equals or exceeds cost of a soda

[Figure: soda dispenser processor with inputs c, a, s and output d. Example values shown: s = 50, a = 25, with tot rising to 25 and then to 50.]

How can we precisely describe this processor’s behavior?

Preview Example: Step 1 -- Capture High-Level State Machine
• Declare local register tot
• Init state: Set d=0, tot=0
• Wait state: wait for coin
– If see coin, go to Add state
• Add state: Update total value: tot = tot + a
– Remember, a is present coin’s value
– Go back to Wait state
• In Wait state, if tot >= s, go to Disp(ense) state
• Disp state: Set d=1 (dispense soda)
– Return to Init state

Inputs: c (bit), a (8 bits), s (8 bits)
Outputs: d (bit)
Local registers: tot (8 bits)

High-level state machine:
Init (d=0, tot=0) → Wait
Wait → Add on c
Wait → Wait on c’*(tot<s)
Wait → Disp on c’*(tot<s)’
Add (tot=tot+a) → Wait
Disp (d=1) → Init

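The high-level state machine for the soda dispenser can be tried out in software. The following is a minimal sketch in C, not part of the original slides: the names `Soda`, `soda_reset` and `soda_step` are hypothetical, and output d is modeled as a stored value for simplicity. Each call to `soda_step` stands for one clock cycle.

```c
#include <assert.h>
#include <stdint.h>

/* States of the soda dispenser high-level state machine. */
typedef enum { INIT, WAIT, ADD, DISP } SodaState;

typedef struct {
    SodaState state;
    uint8_t tot;   /* local register: total value of deposited coins */
    int d;         /* output: 1 while dispensing */
} Soda;

/* Assumed power-on condition: start in Init. */
void soda_reset(Soda *m) { m->state = INIT; m->tot = 0; m->d = 0; }

/* One clock cycle: perform the current state's actions, then take
   the transition selected by inputs c, a and s. */
void soda_step(Soda *m, int c, uint8_t a, uint8_t s) {
    assert(m && (c == 0 || c == 1));
    switch (m->state) {
    case INIT: m->d = 0; m->tot = 0; m->state = WAIT; break;
    case WAIT:
        if (c)               m->state = ADD;   /* coin seen */
        else if (m->tot < s) m->state = WAIT;  /* c'*(tot<s) */
        else                 m->state = DISP;  /* c'*(tot<s)' */
        break;
    case ADD:  m->tot = (uint8_t)(m->tot + a); m->state = WAIT; break;
    case DISP: m->d = 1; m->state = INIT; break;
    }
}
```

With s = 50, two coins of value 25 take the machine through Wait, Add, Wait, Add, Wait and then Disp, where d becomes 1.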
Preview Example: Step 2 -- Create Datapath
• Need tot register
• Need 8-bit comparator to compare s and tot
• Need 8-bit adder to perform tot = tot + a
• Wire the components as needed for above
• Create control inputs/outputs, give them names

[Figure: datapath. Inputs s and a (8 bits each); register tot (8 bits) with control inputs tot_ld (ld) and tot_clr (clr); an 8-bit comparator (<) compares tot with s and outputs tot_lt_s; an 8-bit adder computes tot + a and feeds the tot register.]

Preview Example: Step 3 – Connect Datapath to a Controller
• Controller’s inputs
– External input c (coin detected)
– Input from datapath comparator’s output, which we named tot_lt_s
• Controller’s outputs
– External output d (dispense soda)
– Outputs to datapath to load and clear the tot register

[Figure: controller (input c, output d) connected to the datapath (inputs s and a) through tot_ld, tot_clr and tot_lt_s.]

Preview Example: Step 4 – Derive the Controller’s FSM
• Same states and arcs as high-level state machine
• But set/read datapath control signals for all datapath operations and conditions

Inputs: c, tot_lt_s (bit)
Outputs: d, tot_ld, tot_clr (bit)

Controller FSM:
Init (d=0, tot_clr=1) → Wait
Wait → Add on c
Wait → Wait on c’*tot_lt_s
Wait → Disp on c’*tot_lt_s’
Add (tot_ld=1) → Wait
Disp (d=1) → Init

Preview Example: Completing the Design
• Implement the FSM as a state register and logic
– As in Ch3
– Table shown below

Inputs: c, tot_lt_s (bit)
Outputs: d, tot_ld, tot_clr (bit)

State  s1 s0  c  tot_lt_s | n1 n0  d  tot_ld  tot_clr
Init   0  0   0  0        | 0  1   0  0       1
Init   0  0   0  1        | 0  1   0  0       1
Init   0  0   1  0        | 0  1   0  0       1
Init   0  0   1  1        | 0  1   0  0       1
Wait   0  1   0  0        | 1  1   0  0       0
Wait   0  1   0  1        | 0  1   0  0       0
Wait   0  1   1  0        | 1  0   0  0       0
Wait   0  1   1  1        | 1  0   0  0       0
Add    1  0   0  0        | 0  1   0  1       0
Disp   1  1   0  0        | 0  0   1  0       0

(Only the first row is listed for Add and for Disp.)

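The controller's combinational logic can also be written as a small function and checked against the table rows. This is a sketch only: the function name `soda_ctrl` and the struct `CtrlOut` are hypothetical, and the state encoding (Init=00, Wait=01, Add=10, Disp=11) is read off the s1 s0 columns of the table.

```c
#include <assert.h>

/* Outputs of the controller's combinational logic for one table row. */
typedef struct { int n1, n0, d, tot_ld, tot_clr; } CtrlOut;

/* Encoding assumed from the table: Init=00, Wait=01, Add=10, Disp=11. */
CtrlOut soda_ctrl(int s1, int s0, int c, int tot_lt_s) {
    assert(((s1 | s0 | c | tot_lt_s) & ~1) == 0);  /* all inputs are bits */
    CtrlOut o = {0, 0, 0, 0, 0};
    switch ((s1 << 1) | s0) {
    case 0: o.n0 = 1; o.tot_clr = 1; break;        /* Init -> Wait */
    case 1:
        if (c)             { o.n1 = 1; }           /* Wait -> Add (10) */
        else if (tot_lt_s) { o.n0 = 1; }           /* stay in Wait (01) */
        else               { o.n1 = 1; o.n0 = 1; } /* Wait -> Disp (11) */
        break;
    case 2: o.n0 = 1; o.tot_ld = 1; break;         /* Add -> Wait */
    case 3: o.d = 1; break;                        /* Disp -> Init (00) */
    }
    return o;
}
```

For example, `soda_ctrl(0, 1, 0, 0)` corresponds to the Wait row with c=0 and tot_lt_s=0, which selects Disp as the next state.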
Step 1: Create a High-Level State Machine
• Let’s consider each step of the RTL design process in more detail
• Step 1
– Soda dispenser example
– Not an FSM because:
• Multi-bit (data) inputs a and s
• Local register tot
• Data operations tot=0, tot<s, tot=tot+a
– Useful high-level state machine:
• Data types beyond just bits
• Local registers
• Arithmetic equations/expressions

[Figure: the soda dispenser high-level state machine, with inputs c (bit), a (8 bits), s (8 bits), output d (bit) and local register tot (8 bits).]

Example: Laser-Based Distance Measurer
[Figure: a laser pulse travels distance D to the object of interest and back to the sensor in time T (in seconds), so 2D = T sec * 3*10^8 m/sec.]

• Laser-based distance measurement – pulse laser, measure time T to sense reflection
– Laser light travels at speed of light, 3*10^8 m/sec
– Distance is thus D = (T sec * 3*10^8 m/sec) / 2

Example: Laser-Based Distance Measurer
[Figure: laser-based distance measurer block with B (from button), L (to laser), S (from sensor) and 16-bit D (to display).]

• Inputs/outputs
– B: bit input, from button, to begin measurement
– L: bit output, activates laser
– S: bit input, senses laser reflection
– D: 16-bit output, to display computed distance

Example: Laser-Based Distance Measurer
DistanceMeasurer
Inputs: B (bit), S (bit)
Outputs: L (bit), D (16 bits)
Local storage: Dreg (16 bits) (required)

S0 (first state usually initializes the system):
L := '0' // laser off
Dreg := 0 // distance is 0

• Declare inputs, outputs, and local storage
– Dreg required for multi-bit output
• Create initial state, name it S0
– Initialize laser to off (L:='0')
– Initialize displayed distance to 0 (Dreg:=0)

Recall: '0' means single bit, 0 means integer

Example: Laser-Based Distance Measurer
S0: L := '0', Dreg := 0; go to S1
S1: stay on B' (button not pressed), go to next state on B (button pressed)

• Add another state, S1, that waits for a button press
– B' – stay in S1, keep waiting
– B – go to a new state S2

Q: What should S2 do? A: Turn on the laser

Example: Laser-Based Distance Measurer
S0: L := '0', Dreg := 0
S1: stay on B', go to S2 on B
S2: L := '1' // laser on
S3: L := '0' // laser off

• Add a state S2 that turns on the laser (L:='1')
• Then turn off laser (L:='0') in a state S3

Q: What to do next? A: Start timer, wait to sense reflection

Example: Laser-Based Distance Measurer
DistanceMeasurer
Inputs: B (bit), S (bit)
Outputs: L (bit), D (16 bits)
Local storage: Dreg, Dctr (16 bits)

S0: L := '0', Dreg := 0
S1: Dctr := 0 // reset cycle count; stay on B', go to S2 on B
S2: L := '1'
S3: L := '0', Dctr := Dctr + 1 // count cycles; stay on S' (no reflection), go to next state on S (reflection)

• Stay in S3 until sense reflection (S)
• To measure time, count cycles while in S3
– To count, declare local storage Dctr
– Initialize Dctr to 0 in S1. In S2 would have been O.K. too.
• Don't forget to initialize local storage; forgetting is a common mistake
– Increment Dctr each cycle in S3

Example: Laser-Based Distance Measurer
DistanceMeasurer
Inputs: B (bit), S (bit)
Outputs: L (bit), D (16 bits)
Local storage: Dreg, Dctr (16 bits)

S0: L := '0', Dreg := 0
S1: Dctr := 0; stay on B', go to S2 on B
S2: L := '1'
S3: L := '0', Dctr := Dctr+1; stay on S', go to S4 on S
S4: Dreg := Dctr/2 // calculate D; go to S1

• Once reflection detected (S), go to new state S4
– Calculate distance
– Assuming clock frequency is 3x10^8 Hz, Dctr holds number of meters, so Dreg:=Dctr/2
• After S4, go back to S1 to wait for button again

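The five-state distance measurer can be simulated cycle by cycle. The sketch below is an illustration under assumptions, not the slides' own code: the names `Dm`, `dm_reset` and `dm_step` are hypothetical, each call to `dm_step` represents one clock cycle, and the clock is taken to be 3x10^8 Hz so that one counted cycle corresponds to one meter of round-trip travel.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { S0, S1, S2, S3, S4 } DmState;

typedef struct {
    DmState state;
    uint16_t dctr;  /* local storage: cycle counter */
    uint16_t dreg;  /* local storage driving output D */
    int l;          /* output L: 1 = laser on */
} Dm;

void dm_reset(Dm *m) { m->state = S0; m->dctr = 0; m->dreg = 0; m->l = 0; }

/* One clock cycle with button input b and sensor input s. */
void dm_step(Dm *m, int b, int s) {
    assert(m && (b == 0 || b == 1) && (s == 0 || s == 1));
    switch (m->state) {
    case S0: m->l = 0; m->dreg = 0; m->state = S1; break;   /* initialize */
    case S1: m->dctr = 0; if (b) m->state = S2; break;      /* wait for button */
    case S2: m->l = 1; m->state = S3; break;                /* laser on */
    case S3: m->l = 0; m->dctr++; if (s) m->state = S4; break; /* count cycles */
    case S4: m->dreg = m->dctr / 2; m->state = S1; break;   /* calculate D */
    }
}
```

If the machine spends ten cycles in S3 before the reflection is sensed, Dctr holds 10 and Dreg is loaded with 5.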
Step 2: Create a Datapath
• Datapath must
– Implement data storage
– Implement data computations
• Look at high-level state machine, do three substeps
– (a) Make data inputs/outputs be datapath inputs/outputs
– (b) Instantiate declared registers into the datapath (also instantiate a register for each data output)
– (c) Examine every state and transition, and instantiate datapath components and connections to implement any data computations

Instantiate: to introduce a new component into a design.

Step 2 Example: Laser-Based Distance Measurer
Inputs: B, S (1 bit each)
Outputs: L (bit), D (16 bits)
Local Registers: Dctr (16 bits)

High-level state machine:
S0: L=0, D=0
S1: Dctr = 0; stay on B’, go to S2 on B
S2: L=1
S3: L=0, Dctr = Dctr + 1; stay on S’, go to S4 on S
S4: D = Dctr / 2 (calculate D)

(a) Make data inputs/outputs be datapath inputs/outputs
(b) Instantiate declared registers into the datapath (also instantiate a register for each data output)
(c) Examine every state and transition, and instantiate datapath components and connections to implement any data computations

[Figure: datapath with Dctr, a 16-bit up-counter (controls Dctr_clr and Dctr_cnt), and Dreg, a 16-bit register (controls Dreg_clr and Dreg_ld) whose output Q drives the 16-bit output D.]

Step 2 Example: Laser-Based Distance Measurer
(c) (continued) Examine every state and transition, and instantiate datapath components and connections to implement any data computations

[Figure: the same datapath, with the Dctr output Q passing through a right shift by one (>>1), which divides by 2, into the Dreg input I.]

Step 2 Example Showing Mux Use
Local registers: E, F, G, R (16 bits)
T0: R = E + F
T1: R = R + G

[Figure: panels (a) to (d). The final panel uses a single adder whose A and B inputs each come from a 2×1 mux, with select lines add_A_s0 and add_B_s0, and whose output is stored in R.]

• Introduce mux when one component input can come from more than one source

Step 3: Connecting the Datapath to a Controller
• Laser-based distance measurer example
• Easy – just connect all control signals between controller and datapath

[Figure: controller with inputs B (from button) and S (from sensor) and output L (to laser), connected to the datapath by Dreg_clr, Dreg_ld, Dctr_clr and Dctr_cnt; the datapath drives 16-bit D (to display); 300 MHz clock.]

Step 4: Deriving the Controller’s FSM
• FSM has same structure as high-level state machine
– Inputs/outputs all bits now
– Replace data operations by bit operations using datapath

Inputs: B, S
Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt

Transitions: S0 → S1; S1 stays on B’, goes to S2 on B; S2 → S3; S3 stays on S’, goes to S4 on S; S4 → S1

Signal     S0  S1  S2  S3  S4
L          0   0   1   0   0
Dreg_clr   1   0   0   0   0
Dreg_ld    0   0   0   0   1
Dctr_clr   0   1   0   0   0
Dctr_cnt   0   0   0   1   0

S0: laser off, clear D reg
S1: clear count
S2: laser on
S3: laser off, count up
S4: load D reg with Dctr/2, stop counting

Step 4: Deriving the Controller’s FSM
• Using shorthand of outputs not assigned implicitly assigned 0

Inputs: B, S
Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt

S0: L=0, Dreg_clr=1 (laser off, clear D reg)
S1: Dctr_clr=1 (clear count); stay on B’, go to S2 on B
S2: L=1 (laser on)
S3: L=0, Dctr_cnt=1 (laser off, count up); stay on S’, go to S4 on S
S4: Dreg_ld=1, Dctr_cnt=0 (load D reg with Dctr/2, stop counting)

Step 4
• Implement FSM as state register and logic (Ch3) to complete the design

[Figure: the controller and datapath of the laser-based distance measurer connected by Dreg_clr, Dreg_ld, Dctr_clr and Dctr_cnt, shown together with the shorthand controller FSM (states S0 to S4).]

Timer Example: Laser Surgery System (DIY)
• Recall Chpt 3 laser surgery example
– Clock was 10 ns, wanted 30 ns, used 3 states.
– What if wanted 300 ms? Adding 30 million states is not reasonable.
• Use timer
– Controller FSM loads timer, enables, then waits for Q=1

[Figure: (a) laser surgery system with button input b, clk, and output x driving the laser aimed at the patient; (b) controller connected to a 32-bit timer with a 1 microsec time base, load value M = 300,000 (in binary), and signals ld, en, Q; (c) controller FSM; (d) timing diagram with a 10 ns clock in which x stays 1 for 300 ms.]

Inputs: b, Q
Outputs: ld, en, x
Off (x=0, ld=1, en=0): stay on b', go to Strt on b
Strt (x=0, ld=0, en=1): go to On
On (x=1, ld=0, en=1): stay on Q', go to Off on Q

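The numbers on this slide follow from one division. The sketch below, with a hypothetical helper `ticks_for`, shows where the timer load value of 300,000 and the figure of 30 million states come from; durations are expressed in nanoseconds.

```c
#include <assert.h>
#include <stdint.h>

/* Number of whole ticks of period tick_ns that fit in duration_ns. */
uint64_t ticks_for(uint64_t duration_ns, uint64_t tick_ns) {
    assert(tick_ns > 0);
    return duration_ns / tick_ns;
}
```

For a 300 ms pulse (300,000,000 ns), `ticks_for(300000000ULL, 1000ULL)` gives the 300,000 counts loaded into the 1 microsecond timer, while `ticks_for(300000000ULL, 10ULL)` gives the 30,000,000 states a plain FSM would need at a 10 ns clock.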
5.3

RTL Design Examples and Issues


• We’ll use several more examples to illustrate RTL design
• Example: Bus interface
– Master processor can read register from any peripheral
• Each register has unique 4-bit address
• Assume 1 register/periph.
– Sets rd=1, A=address
– Appropriate peripheral places register data on 32-bit D lines
• Periph’s address provided on Faddr inputs (maybe from DIP switches, or another register)

[Figure: master processor connected to peripherals Per0, Per1, ..., Per15 by a bus with rd, 32-bit D and 4-bit A lines. Inside each peripheral, a bus interface (inputs Faddr, 4 bits, and Q, 32 bits) sits between the processor bus and the peripheral's main part.]

RTL Example: Bus Interface
Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits)
Outputs: D (32 bits)
Local register: Q1 (32 bits)

WaitMyAddress (D = “Z”, Q1 = Q): stay on ((A = Faddr) and rd)’, go to SendData on (A = Faddr) and rd
SendData (D = Q1): stay on rd, go to WaitMyAddress on rd’

• Step 1: Create high-level state machine
– State WaitMyAddress
• Output “nothing” (“Z”) on D, store peripheral’s register value Q into local register Q1
• Wait until this peripheral’s address is seen (A=Faddr) and rd=1
– State SendData
• Output Q1 onto D, wait for rd=0 (meaning main processor is done reading the D lines)

RTL Example: Bus Interface
[Figure: timing diagram for the bus interface high-level state machine, showing clk, input rd, the state sequence W, W, SD, W, W, SD, SD, W (W = WaitMyAddress, SD = SendData), and output D taking the values Z, Q1, Z, Q1, Z.]

RTL Example: Bus Interface
• Step 2: Create a datapath
(a) Datapath inputs/outputs
(b) Instantiate declared registers
(c) Instantiate datapath components and connections

[Figure: bus interface datapath. Inputs A and Faddr (4 bits each) feed a 4-bit equality comparator (=) that outputs A_eq_Faddr; input Q (32 bits) loads register Q1 under control of Q1_ld; Q1 drives the 32-bit output D through a driver enabled by D_en.]

RTL Example: Bus Interface
• Step 3: Connect datapath to controller
• Step 4: Derive controller’s FSM

Inputs: rd, A_eq_Faddr (bit)
Outputs: Q1_ld, D_en (bit)

WaitMyAddress (D_en = 0, Q1_ld = 1): stay on (A_eq_Faddr and rd)’, go to SendData on A_eq_Faddr and rd
SendData (D_en = 1, Q1_ld = 0): stay on rd, go to WaitMyAddress on rd’

RTL Example: Video Compression – Sum of Absolute Differences
[Figure: (a) digitized frame 1 and digitized frame 2, 1 Mbyte each; the only difference is the ball moving. (b) digitized frame 1 (1 Mbyte) and the difference of frame 2 from frame 1 (0.01 Mbyte); just send the difference.]

• Video is a series of frames (e.g., 30 per second)
• Most frames similar to previous frame
– Compression idea: just send difference from previous frame

RTL Example: Video Compression – Sum of Absolute Differences
• Need to quickly determine whether two frames are similar enough to just send difference for second frame
– Compare corresponding 16x16 “blocks”
• Treat 16x16 block as 256-byte array
– Compute the absolute value of the difference of each array item
– Sum those differences – if above a threshold, send complete frame for second frame; if below, can use difference method (using another technique, not described)

Each block element is a pixel, assumed to be represented as 1 byte (actually, a color picture might have 3 bytes per pixel, for intensity of red, green, and blue components of pixel)

RTL Example: Video Compression – Sum of Absolute Differences
[Figure: SAD component with inputs A (256-byte array), B (256-byte array) and go, and integer output sad.]

• Want fast sum-of-absolute-differences (SAD) component
– When go=1, sums the absolute differences of element pairs in arrays A and B, outputs that sum

RTL Example: Video Compression – Sum of Absolute Differences
Inputs: A, B (256 byte memory); go (bit)
Outputs: sad (32 bits)
Local registers: sum, sad_reg (32 bits); i (9 bits)

• S0: wait for go
• S1: initialize sum and index
• S2: check if done (i>=256)
• S3: add difference to sum, increment index
• S4: done, write to output sad_reg

High-level state machine:
S0: stay on !go, go to S1 on go
S1: sum = 0, i = 0; go to S2
S2: go to S3 on i<256, go to S4 on (i<256)’
S3: sum=sum+abs(A[i]-B[i]), i=i+1; go to S2
S4: sad_reg = sum; go to S0

RTL Example: Video Compression – Sum of Absolute Differences
• Step 2: Create datapath

[Figure: SAD datapath. Register i (9 bits, controls i_inc and i_clr) drives AB_addr and a <256 comparator that outputs i_lt_256; A_data and B_data (8 bits each) feed a subtractor (–) followed by abs; the 8-bit result is added (+) to register sum (32 bits, controls sum_ld and sum_clr); register sad_reg (32 bits, control sad_reg_ld) loads sum and drives output sad.]

RTL Example: Video Compression – Sum of Absolute Differences
• Step 3: Connect to controller
• Step 4: Replace high-level state machine by FSM

Controller FSM (high-level action → control signals):
S0: stay on go’, go to S1 on go
S1: sum=0 → sum_clr=1; i=0 → i_clr=1
S2: i<256 → i_lt_256, go to S3; (i<256)’ → (i_lt_256)’, go to S4
S3: sum=sum+abs(A[i]-B[i]) → sum_ld=1, AB_rd=1; i=i+1 → i_inc=1
S4: sad_reg=sum → sad_reg_ld=1

RTL Example: Video Compression – Sum of Absolute Differences
• Comparing software and custom circuit SAD
– Circuit: Two states (S2 & S3) for each i, 256 i’s → 512 clock cycles
– Software: Loop (for i = 1 to 256), but for each i, must move memory to local registers, subtract, compute absolute value, add to sum, increment i – say about 6 cycles per array item → 256*6 = 1536 cycles
– Circuit is about 3 times (300%) faster
– Later, we’ll see how to build SAD circuit that is even faster

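The SAD computation and the slide's cycle estimate can be reproduced in a few lines. This is a sketch, not the authors' code: `sad_model` is a hypothetical name, and the cycle counter charges two cycles per element (one in S2, one in S3), ignoring S0, S1, S4 and the final loop test, as the slide's estimate does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SAD_N 256

/* Sum of absolute differences over two 256-byte blocks.
   *cycles receives the estimated clock cycles spent in S2 and S3. */
uint32_t sad_model(const uint8_t a[SAD_N], const uint8_t b[SAD_N],
                   unsigned *cycles) {
    assert(a && b && cycles);
    uint32_t sum = 0;                 /* S1: sum = 0 */
    unsigned i = 0;                   /* S1: i = 0   */
    *cycles = 0;
    while (i < SAD_N) {               /* S2: loop test */
        sum += (uint32_t)abs((int)a[i] - (int)b[i]);  /* S3 */
        i = i + 1;                    /* S3 */
        *cycles += 2;                 /* one cycle in S2, one in S3 */
    }
    return sum;                       /* S4: sad_reg = sum */
}
```

For any pair of blocks the cycle estimate comes out as 512, matching the slide; a software version at about 6 cycles per element would need about 1536.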
RTL Design Pitfalls and Good Practice
• Common pitfall: Assuming register is updated in the state it’s written
– Final value of Q?
– Final state?
– Answers may surprise you
• Value of Q unknown
• Final state is C, not D
– Why?
• State A: R=99 and Q=R happen simultaneously
• State B: R not updated with R+1 until next clock cycle, simultaneously with state register being updated

(a) Local registers: R, Q (8 bits)
A: R=99, Q=R; go to B
B: R=R+1; go to C on R<100, go to D on R>=100

(b) Timing:
State during cycle   A    B    C
R                    ?    99   100
Q                    ?    ?    ?

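The pitfall becomes easy to see when every register's next value is computed from the values held before the clock edge. The following sketch assumes hypothetical names (`Pit`, `pit_clock`) and uses an arbitrary starting value to stand in for the unknown power-on contents of R and Q.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { ST_A, ST_B, ST_C, ST_D } PitState;
typedef struct { PitState state; uint8_t r, q; } Pit;

/* One rising clock edge: every next value is computed from the values
   held before the edge, then all registers update together. */
Pit pit_clock(Pit cur) {
    assert(cur.state <= ST_D);
    Pit nxt = cur;
    switch (cur.state) {
    case ST_A:
        nxt.r = 99;
        nxt.q = cur.r;                 /* reads the OLD R, not 99 */
        nxt.state = ST_B;
        break;
    case ST_B:
        nxt.r = (uint8_t)(cur.r + 1);
        nxt.state = (cur.r < 100) ? ST_C : ST_D;  /* tests R=99, not 100 */
        break;
    default:
        break;                         /* C and D: no actions in this sketch */
    }
    return nxt;
}
```

Starting in state A and applying two clock edges leaves the machine in state C with R = 100, while Q still holds whatever R contained before state A.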
RTL Design Pitfalls and Good Practice
• Solutions
– Read register in following state (Q=R)
– Insert extra state so that conditions use updated value
– Other solutions are possible, depends on the example

(a) Local registers: R, Q (8 bits)
A: R=99; go to B
B: R=R+1, Q=R; go to B2
B2: go to C on R<100, go to D on R>=100

(b) Timing:
State during cycle   A    B    B2    D
R                    ?    99   100   100
Q                    ?    ?    99    99

RTL Design Pitfalls and Good Practice
• Common pitfall: Reading outputs
– Outputs can only be written
– Solution: Introduce additional register, which can be written and read

(a) Pitfall
Inputs: A, B (8 bits)
Outputs: P (8 bits)
S: P=A
T: P=P+B

(b) Solution
Inputs: A, B (8 bits)
Outputs: P (8 bits)
Local register: R (8 bits)
S: R=A, P=A
T: P=R+B

RTL Design Pitfalls and Good Practice
• Good practice: Register all data outputs
– In fig (a), output P would show spurious values as addition computes
• Furthermore, longest register-to-register path, which determines clock period, is not known until that output is connected to another component
– In fig (b), spurious outputs reduced, and longest register-to-register path is clear

[Figure: (a) an adder of B and R drives output P directly; (b) the adder output is stored in register Preg, which drives P.]

Control vs. Data Dominated RTL Design
• Designs often categorized as control-dominated or data-
dominated
– Control-dominated design – Controller contains most of the
complexity
– Data-dominated design – Datapath contains most of the complexity
– General, descriptive terms – no hard rule that separates the two
types of designs
– Laser-based distance measurer – control dominated
– Bus interface– mix of control and data
– Now let’s do a data dominated design

Data Dominated RTL Design Example: FIR Filter
• Filter concept
– Suppose X is data from a temperature sensor, and particular input sequence is 180, 180, 181, 240, 180, 181 (one per clock cycle)
– That 240 is probably wrong!
• Could be electrical noise
– Filter should remove such noise in its output Y
– Simple filter: Output average of last N values
• Small N: less filtering
• Large N: more filtering, but less sharp output

[Figure: digital filter with 12-bit input X, 12-bit output Y, and clk.]

Data Dominated RTL Design Example: FIR Filter
• FIR filter
– “Finite Impulse Response”
– Simply a configurable weighted sum of past input values
– y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
• Above known as “3 tap”
• Tens of taps more common
• Very general filter – User sets the constants (c0, c1, c2) to define specific filter
– RTL design
• Step 1: Create high-level state machine
– But there really is none! Data dominated indeed.
• Go straight to step 2

Data Dominated RTL Design Example: FIR Filter
• Step 2: Create datapath
– Begin by creating chain of xt registers to hold past values of X

y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)

[Figure: 3-tap FIR filter. 12-bit X feeds a chain of 12-bit registers xt0, xt1, xt2 holding x(t), x(t-1), x(t-2), all clocked by clk. Suppose sequence is: 180, 181, 240; the values shift along the chain until xt0 = 240, xt1 = 181, xt2 = 180.]

Data Dominated RTL Design Example: FIR Filter
• Step 2: Create datapath (cont.)
– Instantiate registers for c0, c1, c2
– Instantiate multipliers to compute c*x values

y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)

[Figure: 3-tap FIR filter with registers c0, c1, c2 beside xt0, xt1, xt2, and one multiplier (*) per tap.]

Data Dominated RTL Design Example: FIR Filter
• Step 2: Create datapath (cont.)
– Instantiate adders

y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)

[Figure: 3-tap FIR filter in which the three multiplier (*) outputs are summed by two adders (+) to produce Y.]

Data Dominated RTL Design Example: FIR Filter
• Step 2: Create datapath (cont.)
– Add circuitry to allow loading of particular c register

y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)

[Figure: 3-tap FIR filter with a 2x4 decoder (address inputs Ca1 and Ca0, enable e driven by CL, outputs 3 to 0) selecting which of the registers c0, c1, c2 loads the value on input C; the adder output is stored in register yreg, which drives Y.]

Data Dominated RTL Design Example: FIR Filter
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
• Step 3 & 4: Connect to controller, Create FSM
– No controller needed
– Extreme data-dominated example
– (Example of an extreme control-dominated design – an FSM, with no datapath)
• Comparing the FIR circuit to a software implementation
– Circuit
• Assume adder has 2-gate delay, multiplier has 20-gate delay
• Longest path goes through one multiplier and two adders
– 20 + 2 + 2 = 24-gate delay
• 100-tap filter, following design on previous slide, would have about a 34-gate delay: 1 multiplier and 7 adders on longest path
– Software
• 100-tap filter: 100 multiplications, 100 additions. Say 2 instructions per multiplication, 2 per addition. Say 10-gate delay per instruction.
• (100*2 + 100*2)*10 = 4000 gate delays
– Circuit is more than 100 times faster (10,000% faster). Wow.

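For comparison with the datapath, the 3-tap filter equation can be modeled in software. The sketch below uses hypothetical names (`Fir3`, `fir3_init`, `fir3_clock`) and follows the equation y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2) directly; it does not try to reproduce the exact cycle at which yreg updates in the circuit, and it uses 32-bit integers rather than the 12-bit signals of the slides.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int32_t c[3];   /* constants c0, c1, c2 */
    int32_t xt[3];  /* xt0, xt1, xt2 hold x(t), x(t-1), x(t-2) */
    int32_t yreg;   /* registered output Y */
} Fir3;

void fir3_init(Fir3 *f, int32_t c0, int32_t c1, int32_t c2) {
    assert(f);
    f->c[0] = c0; f->c[1] = c1; f->c[2] = c2;
    f->xt[0] = f->xt[1] = f->xt[2] = 0;
    f->yreg = 0;
}

/* Accept one new sample: shift the xt chain, then form the weighted sum. */
int32_t fir3_clock(Fir3 *f, int32_t x) {
    assert(f);
    f->xt[2] = f->xt[1];
    f->xt[1] = f->xt[0];
    f->xt[0] = x;
    f->yreg = f->c[0] * f->xt[0] + f->c[1] * f->xt[1] + f->c[2] * f->xt[2];
    return f->yreg;
}
```

Feeding the sequence 180, 181, 240 leaves xt0 = 240, xt1 = 181, xt2 = 180, as in the register-chain slide. Note that the software needs three multiplications and two additions per sample in sequence, whereas the circuit performs them concurrently.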
5.4

Determining Clock Frequency


• Designers of digital circuits often want fastest performance
– Means want high clock frequency
• Frequency limited by longest register-to-register delay
– Known as critical path
– If clock is any faster, incorrect data may be stored into register
– Longest path in the figure is 2 ns
• Ignoring wire delays, and register setup and hold times, for simplicity

[Figure: registers a and b feed a component with 2 ns delay, whose output is stored in register c; the registers are clocked by clk.]

Critical Path
• Example shows four paths
– a to c through +: 2 ns
– a to d through + and *: 7 ns
– b to d through + and *: 7 ns
– b to d through *: 5 ns
• Longest path is thus 7 ns
• Fastest frequency
– 1 / 7 ns = 142 MHz

[Figure: registers a and b feed an adder (+, 2 ns delay); the adder output goes to register c and to a multiplier (*, 5 ns delay), whose other input is b and whose output goes to register d. Max(2, 7, 7, 5) = 7 ns.]

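The critical-path calculation is a maximum followed by a reciprocal. The sketch below uses hypothetical helper names (`critical_path_ns`, `max_freq_mhz`) and, like the slide, ignores wire delays and register setup and hold times.

```c
#include <assert.h>
#include <stddef.h>

/* Longest register-to-register delay among n paths, in ns. */
double critical_path_ns(const double *delays_ns, size_t n) {
    assert(delays_ns && n > 0);
    double worst = delays_ns[0];
    for (size_t i = 1; i < n; i++)
        if (delays_ns[i] > worst) worst = delays_ns[i];
    return worst;
}

/* Highest clock frequency in MHz for a critical path given in ns:
   f = 1 / period, and 1 / (1 ns) = 1000 MHz. */
double max_freq_mhz(double path_ns) {
    assert(path_ns > 0.0);
    return 1000.0 / path_ns;
}
```

With the four path delays 2, 7, 7 and 5 ns, the critical path is 7 ns and the maximum frequency is 1000/7, a little under 143 MHz, which the slide gives as 142 MHz.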
Critical Path Considering Wire Delays
• Real wires have delay too
– Must include in critical path
• Example shows two paths
– Each is 0.5 + 2 + 0.5 = 3 ns
• Trend
– 1980s/1990s: Wire delays were tiny compared to logic delays
– But wire delays not shrinking as fast as logic delays
• Wire delays may even be greater than logic delays!
• Must also consider register setup and hold times, also add to path
• Then add some time to the computed path, just to be safe
– e.g., if path is 3 ns, say 4 ns instead

[Figure: registers a and b reach the adder (+, 2 ns) over wires of 0.5 ns each; a further 0.5 ns wire leads from the adder to register c; each path totals 3 ns.]

A Circuit May Have Numerous Paths
• Paths can exist
– In the datapath
– In the controller
– Between the controller and datapath
– May be hundreds or thousands of paths
• Timing analysis tools that evaluate all possible paths automatically are very helpful

[Figure: soda dispenser controller (state register s1 s0 clocked by clk; combinational logic with inputs c and tot_lt_s and outputs d, tot_ld, tot_clr, n1, n0) and datapath (tot register, 8-bit comparator, 8-bit adder), with example paths labeled (a), (b) and (c).]

5.5

Behavioral Level Design: C to Gates


C code:
int SAD (byte A[256], byte B[256]) // not quite C syntax
{
   uint sum; short uint i;
   sum = 0;
   i = 0;
   while (i < 256) {
      sum = sum + abs(A[i] - B[i]);
      i = i + 1;
   }
   return sum;
}

High-level state machine:
S0: stay on !go, go to S1 on go
S1: sum = 0, i = 0
S2: go to S3 on i<256, go to S4 on (i<256)’
S3: sum=sum+abs(A[i]-B[i]), i=i+1; go to S2
S4: sad_reg = sum

• Earlier sum-of-absolute-differences example
– Started with high-level state machine
– C code is an even better starting point -- easier to understand

Behavioral-Level Design: Start with C (or Similar Language)
• Replace first step of RTL design method by two steps
– Capture in C, then convert C to high-level state machine
– How to convert from C to high-level state machine?

Step 1A: Capture in C
Step 1B: Convert to high-level state machine

Converting from C to High-Level State Machine
• Convert each C construct to equivalent states and transitions
• Assignment statement
– Becomes one state with assignment

target = expression;
becomes: one state with action target=expression

• If-then statement
– Becomes state with condition check, transitioning to “then” statements if condition true, otherwise to ending state
• “then” statements would also be converted to states

if (cond) {
   // then stmts
}
becomes: a condition-check state that goes to (then stmts) on cond and to (end) on !cond; (then stmts) goes to (end)

Converting from C to High-Level State Machine
• If-then-else
– Becomes state with condition check, transitioning to “then” statements if condition true, or to “else” statements if condition false

if (cond) {
   // then stmts
}
else {
   // else stmts
}
becomes: a condition-check state that goes to (then stmts) on cond and to (else stmts) on !cond; both go to (end)

• While loop statement
– Becomes state with condition check, transitioning to while loop’s statements if true, then transitioning back to condition check

while (cond) {
   // while stmts
}
becomes: a condition-check state that goes to (while stmts) on cond and to (end) on !cond; (while stmts) goes back to the condition check

Simple Example of Converting from C to High-
Level State Machine
Inputs: uint X, Y
Outputs: uint Max !(X>Y) !(X>Y)

X>Y X>Y
if (X > Y) {
Max = X; (then stmts) (else stmts) Max=X Max=Y
}
else {
Max = Y;
(end) (end)
}
a a

(a) (b) (c)

• Simple example: Computing the maximum of two numbers


– Convert if-then-else statement to states (b)
– Then convert assignment statements to states (c)
Example: Converting Sum-of-Absolute-Differences C
code to High-Level State Machine
• Convert each construct to states
– Simplify when possible, e.g., merge states
• From high-level state machine, follow RTL design method to create circuit
• Thus, can convert C to gates using straightforward automatable process
– Not all C constructs can be efficiently converted
– Use C subset if intended for circuit
– Can use languages other than C, of course

(a) C code:

Inputs: byte A[256], B[256]
        bit go;
Output: int sad
main()
{
   uint sum; short uint i;
   while (1) {
      while (!go);
      sum = 0;
      i = 0;
      while (i < 256) {
         sum = sum + abs(A[i] - B[i]);
         i = i + 1;
      }
      sad = sum;
   }
}

[Figure: (b) through (g) show the step-by-step conversion. The wait loop "while (!go);" becomes a state that loops on !go and exits on go; the assignments become the state "sum=0, i=0"; the inner while loop becomes a condition-check state with a transition on i<256 to the state "sum=sum+abs, i=i+1" and a transition on !(i<256) to the state "sad = sum".]
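The final state machine can be exercised in software before building the circuit. The sketch below is an illustration under stated assumptions, not the textbook's code: the state names and the function sad_hlsm are hypothetical, the wait on go and the outer while (1) loop are left out so that one call models one pass, and each loop iteration stands for one clock cycle.

```c
#include <assert.h>
#include <stdlib.h>

#define N 256u

/* Hypothetical names for the states after the wait-for-go state. */
typedef enum { INIT, LOOP_CHECK, LOOP_BODY, WRITE_SAD, DONE } SadState;

/* One pass of the sum-of-absolute-differences machine, starting at the
   point where go has already been seen as 1. */
unsigned sad_hlsm(const unsigned char A[N], const unsigned char B[N])
{
    SadState state = INIT;
    unsigned sum = 0, sad = 0, i = 0;

    while (state != DONE) {
        switch (state) {
        case INIT:          /* merged state: sum=0, i=0 */
            sum = 0;
            i = 0;
            state = LOOP_CHECK;
            break;
        case LOOP_CHECK:    /* condition check for while (i < 256) */
            state = (i < N) ? LOOP_BODY : WRITE_SAD;
            break;
        case LOOP_BODY:     /* merged state: accumulate and increment */
            assert(i < N);
            sum = sum + (unsigned)abs(A[i] - B[i]);
            i = i + 1;
            state = LOOP_CHECK;
            break;
        case WRITE_SAD:     /* sad = sum */
            sad = sum;
            state = DONE;
            break;
        case DONE:
            break;
        }
    }
    return sad;
}
```

Comparing the result against a plain C loop over the same arrays is a quick way to confirm that merging states did not change the behavior.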
4.10

Register Files
• MxN register file component provides efficient access to M N-bit-wide registers
– If we have many registers but only need to access one or two at a time, a register file is more efficient
– Ex: Above-mirror display (earlier example), but this time having 16 32-bit registers
• Too many wires, and big mux is too slow
[Figure: above-mirror display datapath redrawn with 16 separate 32-bit registers (reg0 to reg15), each with its own load input, feeding a 16×1 mux; annotations: too much fanout, congestion, huge mux]
Register File
• Instead, want component that has one data input and one data output, and allows us to specify which internal register to write and which to read
[Figure: 16×32 register file block symbol with inputs W_data (32 bits), W_addr (4 bits), W_en, R_addr (4 bits), R_en and output R_data (32 bits)]
Register File Timing Diagram
• Can write one register and read one register each clock cycle
– May be same register

[Figure: timing diagram for a 4x32 register file (W_data and R_data 32 bits, W_addr and R_addr 2 bits, plus W_en and R_en) over clock cycles 1 to 6]

Cycle:    1     2     3     4     5     6
W_data:   9     22    X     X     177   555
W_addr:   3     1     X     X     2     3
R_addr:   X     X     3     X     1     3
R_data (values in time order): Z, Z, Z, 9, Z, 22, 9, 555

Register contents over time (? = unknown):
0:   ?    ?    ?    ?    ?    ?     ?
1:   ?    ?    22   22   22   22    22
2:   ?    ?    ?    ?    ?    177   177
3:   ?    9    9    9    9    9     555
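A small software model makes the read and write ports concrete. This is a sketch under assumptions rather than the component's definition: the names RegFile, rf_clock_edge, and rf_read are hypothetical, the write is modeled as taking effect at a clock edge, and a disabled read port (output Z in the diagram) is modeled by returning 0 and leaving the output untouched.

```c
#include <assert.h>
#include <stdint.h>

#define RF_WORDS 4u   /* 4x32 register file, as in the timing diagram */

typedef struct {
    uint32_t reg[RF_WORDS];
} RegFile;

/* Write port: models what happens at one rising clock edge. */
void rf_clock_edge(RegFile *rf, unsigned W_addr, uint32_t W_data, int W_en)
{
    assert(W_addr < RF_WORDS);
    if (W_en)
        rf->reg[W_addr] = W_data;
}

/* Read port: returns 1 and stores the addressed register in *R_data when
   R_en is 1; returns 0 (output would be Z) when R_en is 0. */
int rf_read(const RegFile *rf, unsigned R_addr, int R_en, uint32_t *R_data)
{
    assert(R_addr < RF_WORDS);
    if (!R_en)
        return 0;
    *R_data = rf->reg[R_addr];
    return 1;
}
```

Writing 9 to register 3 and 22 to register 1, then reading register 3, yields 9, which matches the values shown in the diagram.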

5.6

Memory Components
• Register-transfer level design instantiates datapath components to create datapath, controlled by a controller
– A few more components are often used outside the controller and datapath
• MxN memory
– M words, N bits wide each
• Several varieties of memory, which we now introduce
[Figure: M×N memory drawn as M words, each N bits wide]
Random Access Memory (RAM)
• RAM – Readable and writable memory
– “Random access memory”
• Strange name – Created several decades ago to contrast with sequentially-accessed storage like tape drives
– Logically same as register file – Memory with address inputs, data inputs/outputs, and control
• RAM usually just one port; register file usually two or more
– RAM vs. register file
• RAM typically larger than roughly 512 or 1024 words
• RAM typically stores bits using a bit storage approach that is more efficient than a flip flop
• RAM typically implemented on a chip in a square rather than rectangular shape – keeps longest wires (hence delay) short
[Figure: 16×32 register file from Chpt. 4 (W_data, W_addr, W_en, R_data, R_addr, R_en) shown next to the RAM block symbol for a 1024 × 32 RAM (32-bit data, 10-bit addr, rw, en)]
RAM Internal Structure
[Figure: internal structure of a 1024x32 RAM (32-bit data, 10-bit addr, rw, en). Let A = log2 M. An AxM decoder with inputs addr0 to addr(A-1) and enable e drives the word enable lines d0 to d(M-1), one per word. Each word is a row of bit storage blocks (aka “cells”) connected to write lines wdata(N-1) to wdata0 and read lines rdata(N-1) to rdata0; rw goes to all cells.]
• Similar internal structure as register file
– Decoder enables appropriate word based on address inputs
– rw controls whether cell is written or read
– Let’s see what’s inside each RAM cell
Static RAM (SRAM)
[Figure: RAM internal structure as on the previous slide, with a detail of the SRAM cell: the stored value d and its complement d’ loop around two inverters, and the word enable line connects the loop to the data and data’ lines. With word enable 0, the cell holds its value; with word enable 1 and data = 1, data’ = 0, a 1 is written.]
• “Static” RAM cell
– 6 transistors (recall inverter is 2 transistors)
– Writing this cell
• word enable input comes from decoder
• When 0, value d loops around inverters
– That loop is where a bit stays stored
• When 1, the data bit value enters the loop
– data is the bit to be stored in this cell
– data’ enters on other side
– Example shows a “1” being written into cell
Static RAM (SRAM)
[Figure: RAM internal structure as before, with a detail of the SRAM cell during a read: data and data’ are both set to 1, word enable is 1, the stored bit pulls one of the two lines slightly below 1, and both lines go to the sense amplifiers.]
• “Static” RAM cell
– Reading this cell
• Somewhat trickier
• When rw set to read, the RAM logic sets both data and data’ to 1
• The stored bit d will pull either the left line or the right line down slightly below 1
• “Sense amplifiers” detect which side is slightly pulled down
– The electrical description of SRAM is really beyond our scope – just general idea here, mainly to contrast with DRAM...
Dynamic RAM (DRAM)
[Figure: RAM internal structure as before, with a detail of the DRAM cell: (a) a single transistor, controlled by word enable, connects the data line to a capacitor that holds d; (b) waveforms for data, enable, and d, with d slowly discharging after the write.]
• “Dynamic” RAM cell
– 1 transistor (rather than 6)
– Relies on large capacitor to store bit
• Write: Transistor conducts, data voltage level gets stored on top plate of capacitor
• Read: Just look at value of d
• Problem: Capacitor discharges over time
– Must “refresh” regularly, by reading d and then writing it right back
Comparing Memory Types
• Register file
– Fastest
– But biggest size
• SRAM
– Fast
– More compact than register file
• DRAM
– Slowest
• And refreshing takes time
– But very compact
• Use register file for small items, SRAM for large items, and DRAM for huge items
– Note: DRAM’s big capacitor requires a special chip design process, so DRAM is often a separate chip
[Figure: size comparison for same number of bits (not to scale) of an MxN memory implemented as a register file, an SRAM, or a DRAM]
Reading and Writing a RAM
[Figure (a): timing diagram over clock cycles 1 to 3. addr = 9, 13, 9; data = 500, 999, then Z followed by 500; rw (1 means write); en. After the first write RAM[9] now equals 500; after the second write RAM[13] now equals 999.]
[Figure (b): timing detail showing setup time and hold time for addr, data, and rw around the clock edge, and the access time after which read data is valid.]
• Writing
– Put address on addr lines, data on data lines, set rw=1, en=1
• Reading
– Set addr and en lines, but put nothing (Z) on data lines, set rw=0
– Data will appear on data lines
• Don’t forget to obey setup and hold times
– In short – keep inputs stable before and after a clock edge
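The write and read steps above can be sketched as a behavioral model of a single-port RAM. This is an illustration only: the names Ram and ram_access are hypothetical, one call stands for one clocked access, the shared data lines are modeled by a pointer that is either read from (write access) or written to (read access), and setup, hold, and access times are not modeled.

```c
#include <assert.h>
#include <stdint.h>

#define RAM_WORDS 1024u   /* 1024x32 RAM, as in the block symbol */

typedef struct {
    uint32_t word[RAM_WORDS];
} Ram;

/* One clocked access to the single port.
   en=0: nothing happens.
   rw=1: the value on the data lines (*data) is stored at addr.
   rw=0: the word at addr is driven onto the data lines (*data).
   Returns 1 when the RAM drove the data lines (a read), else 0. */
int ram_access(Ram *ram, unsigned addr, int rw, int en, uint32_t *data)
{
    assert(addr < RAM_WORDS);
    if (!en)
        return 0;
    if (rw) {
        ram->word[addr] = *data;
        return 0;
    }
    *data = ram->word[addr];
    return 1;
}
```

Following the timing diagram, writing 500 to address 9 and 999 to address 13 and then reading address 9 puts 500 back on the data lines.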
RAM Example: Digital Sound Recorder
[Figure: processor connected to a 4096×16 RAM (addr, data, rw, en, driven through processor signals Ra, Rrw, Ren; 16-bit data bus), to an analog-to-digital converter fed by a microphone (controls ad_ld, ad_buf), and to a digital-to-analog converter driving a speaker (control da_ld)]
• Behavior
– Record: Digitize sound, store as series of 4096 12-bit digital values in RAM
• We’ll use a 4096x16 RAM (12-bit wide RAM not common)
– Play back later
– Common behavior in telephone answering machine, toys, voice recorders
• To record, processor should read a-to-d, store read values into successive RAM words
– To play, processor should read successive RAM words and enable d-to-a
RAM Example: Digital Sound Recorder
• RTL design of processor
– Create high-level state machine
– Begin with the record behavior
– Keep local register a
• Stores current address, ranges from 0 to 4095 (thus need 12 bits)
– Create state machine that counts from 0 to 4095 using a
• For each a
– Read analog-to-digital conv.
» ad_ld=1, ad_buf=1
– Write to RAM at address a
» Ra=a, Rrw=1, Ren=1
[Figure: record behavior. Local register: a (12 bits). State S: a=0. State T: ad_ld=1, ad_buf=1, Ra=a, Rrw=1, Ren=1. State U: a=a+1. Transition conditions: a<4095 (continue counting) and a=4095 (done).]
RAM Example: Digital Sound Recorder
– Now create play behavior
– Use local register a again, create state machine that counts from 0 to 4095 again
• For each a
– Read RAM
– Write to digital-to-analog conv.
• Note: Must write d-to-a one cycle after reading RAM, when the read data is available on the data bus
– The record and play state machines would be parts of a larger state machine controlled by signals that determine when to record or play
[Figure: play behavior. Local register: a (12 bits). State V: a=0. State W: ad_buf=0, Ra=a, Rrw=0, Ren=1. State X: da_ld=1, a=a+1. Transition conditions: a<4095 (continue counting) and a=4095 (done).]
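The record and play behaviors can be tried out in software with arrays standing in for the converters. The sketch below makes several assumptions that are not in the slides: the function names record and play, the arrays adc and dac that replace the real converters, and the choice to end each loop once address 4095 has been handled. Comments tie each step to the states named in the figures.

```c
#include <assert.h>
#include <stdint.h>

#define SAMPLES 4096u

static uint16_t ram[SAMPLES];   /* models the 4096x16 RAM */

/* Record: adc[] stands in for successive analog-to-digital readings. */
void record(const uint16_t adc[SAMPLES])
{
    unsigned a = 0;                    /* state S: a=0 */
    for (;;) {
        assert(adc[a] <= 0x0FFFu);     /* samples are 12-bit values */
        ram[a] = adc[a];               /* state T: ad_ld=1, ad_buf=1,
                                          Ra=a, Rrw=1, Ren=1 */
        if (a == SAMPLES - 1)          /* a=4095: last word written */
            break;
        a = a + 1;                     /* state U: a=a+1 (a<4095) */
    }
}

/* Play: dac[] collects the values loaded into the digital-to-analog
   converter, in order. */
void play(uint16_t dac[SAMPLES])
{
    unsigned a = 0;                    /* state V: a=0 */
    for (;;) {
        uint16_t data_bus = ram[a];    /* state W: ad_buf=0, Ra=a,
                                          Rrw=0, Ren=1 */
        dac[a] = data_bus;             /* state X: da_ld=1, one cycle
                                          after the read */
        if (a == SAMPLES - 1)          /* a=4095: last word played */
            break;
        a = a + 1;                     /* state X: a=a+1 (a<4095) */
    }
}
```

Recording a buffer and then playing it back should reproduce the same 4096 values in the same order.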

Read-Only Memory – ROM
[Figure: RAM block symbol for a 1024× 32 RAM (32-bit data, 10-bit addr, rw, en) next to ROM block symbol for a 1024x32 ROM (32-bit data, 10-bit addr, en)]
• Memory that can only be read from, not written to
– Data lines are output only
– No need for rw input
• Advantages over RAM
– Compact: May be smaller
– Nonvolatile: Saves bits even if power supply is turned off
– Speed: May be faster (especially than DRAM)
– Low power: Doesn’t need power supply to save bits, so can extend battery life
• Choose ROM over RAM if stored data won’t change (or won’t change often)
– For example, a table of Celsius to Fahrenheit conversions in a digital thermometer
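In software, a ROM holding such a table corresponds to a constant lookup array indexed by address. The sketch below is hypothetical in its details: the 8-word size, the 10-degree step between entries, and the names c_to_f_rom and rom_read are all assumptions chosen to keep the example small.

```c
#include <assert.h>
#include <stdint.h>

#define ROM_WORDS 8u   /* tiny hypothetical ROM: 8 words, 8 bits each */

/* Word at address a holds the Fahrenheit value for 10*a degrees Celsius
   (0, 10, ..., 70). Contents are fixed, as in a ROM. */
static const uint8_t c_to_f_rom[ROM_WORDS] = {
    32, 50, 68, 86, 104, 122, 140, 158
};

/* Read-only access: an address goes in, the stored word comes out. */
uint8_t rom_read(unsigned addr)
{
    assert(addr < ROM_WORDS);
    return c_to_f_rom[addr];
}
```

For example, rom_read(3) looks up 30 degrees Celsius and returns 86.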
Read-Only Memory – ROM
[Figure: ROM block symbol for a 1024x32 ROM (32-bit data, 10-bit addr, en) and its internal structure. Let A = log2 M. An AxM decoder with inputs addr0 to addr(A-1) and enable e drives the word enable lines d0 to d(M-1); each word is a row of bit storage blocks (aka “cells”) that drive the read lines rdata(N-1) to rdata0.]
• Internal logical structure similar to RAM, without the data input lines
ROM Types
• If a ROM can only be read, how are the stored bits stored in the first place?
– Storing bits in a ROM known as programming
– Several methods
• Mask-programmed ROM
– Bits are hardwired as 0s or 1s during chip manufacturing
• 2-bit word on right stores “10”
• word enable (from decoder) simply passes the hardwired value through transistor
– Notice how compact, and fast, this memory would be
[Figure: ROM internal structure as before, with a detail of a 2-bit word: one cell hardwired to 1 and one cell hardwired to 0, each passed to its data line through a transistor controlled by word enable]
ROM Types
• Fuse-Based Programmable ROM
– Each cell has a fuse
– A special device, known as a programmer, blows certain fuses (using higher-than-normal voltage)
• Those cells will be read as 0s (involving some special electronics)
• Cells with unblown fuses will be read as 1s
• 2-bit word on right stores “10”
– Also known as One-Time Programmable (OTP) ROM
[Figure: ROM internal structure as before, with a detail of a 2-bit word: both cells are wired to 1 through a fuse and connect to their data lines under control of word enable; one fuse is intact and the other is blown]
ROM Types
• Erasable Programmable ROM (EPROM)
– Uses “floating-gate transistor” in each cell
– Special programmer device uses higher-than-normal voltage to cause electrons to tunnel into the gate
• Electrons become trapped in the gate
• Only done for cells that should store 0
• Other cells (without electrons trapped in gate) will be 1
– 2-bit word on right stores “10”
• Details beyond our scope – just general idea is necessary here
– To erase, shine ultraviolet light onto chip
• Gives trapped electrons energy to escape
• Requires chip package to have window
[Figure: ROM internal structure as before, with a detail of a 2-bit word built from floating-gate transistors on the word enable line: one cell reads 1 and the other, with trapped electrons in its floating gate, reads 0]
ROM Types
• Electronically-Erasable Programmable ROM (EEPROM)
– Similar to EPROM
• Uses floating-gate transistor, electronic programming to trap electrons in certain cells
– But erasing done electronically, not using UV light
– Erasing done one word at a time
• Flash memory
– Like EEPROM, but all words (or large blocks of words) can be erased simultaneously
– Became common relatively recently (late 1990s)
• Both types are in-system programmable
– Can be programmed with new stored bits while in the system in which the ROM operates
• Requires bi-directional data lines, and write control input
• Also need busy output to indicate that erasing is in progress – erasing takes some time
[Figure: block symbol for a 1024x32 EEPROM with 32-bit data, 10-bit addr, en, write, and busy]
ROM Example: Digital Telephone Answering Machine
Using a Flash Memory
• Want to record the outgoing announcement
– When rec=1, record digitized sound in locations 0 to 4095
– When play=1, play those stored sounds to digital-to-analog converter
• What type of memory?
– Should store without power supply – ROM, not RAM
– Should be in-system programmable – EEPROM or Flash, not EPROM, OTP ROM, or mask-programmed ROM
– Will always erase entire memory when reprogramming – Flash better than EEPROM
[Figure: processor connected to a 4096x16 Flash holding the announcement “We’re not home.” (processor signals Ra, Rrw, Ren, er, bu; flash output busy; 16-bit data bus), to an analog-to-digital converter fed by a microphone (ad_ld, ad_buf), and to a digital-to-analog converter driving a speaker (da_ld); record and play buttons drive the processor inputs rec and play]
ROM Example: Digital Telephone Answering Machine
Using a Flash Memory
• High-level state machine
– Once rec=1, begin erasing flash by setting er=1
– Wait for flash to finish erasing by waiting for bu=0
– Execute loop that sets local register a from 0 to 4095, reading analog-to-digital converter and writing to flash for each a
[Figure: same system as on the previous slide, with the record state machine. Local register: a (13 bits). States S, T, U, V with actions a=0, er=1, er=0, ad_ld=1, ad_buf=1, Ra=a, Rrw=1, Ren=1, and a=a+1; transition conditions rec, bu, bu’, a<4096, and a=4096.]
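The erase, wait, and write sequence can be sketched with a toy flash model. Everything specific here is an assumption for illustration: the Flash struct, the function names, the fixed erase time of ERASE_CYCLES clock cycles, and the choice that erased words read as all 1s. The 13-bit width of a in the figure is reflected in the loop counter reaching 4096 before the loop ends.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_WORDS  4096u
#define ERASE_CYCLES 5      /* hypothetical erase time in clock cycles */

typedef struct {
    uint16_t word[FLASH_WORDS];
    int busy_left;          /* cycles for which bu will still read 1 */
} Flash;

/* Models setting er=1: start erasing the whole memory. */
void flash_erase(Flash *f)
{
    for (unsigned k = 0; k < FLASH_WORDS; k++)
        f->word[k] = 0xFFFFu;   /* assumed erased value */
    f->busy_left = ERASE_CYCLES;
}

/* Models sampling the bu output once per clock cycle. */
int flash_bu(Flash *f)
{
    if (f->busy_left > 0) {
        f->busy_left--;
        return 1;
    }
    return 0;
}

/* Record behavior: erase, wait for bu=0, then store 4096 samples.
   Returns the number of cycles spent waiting on bu. */
unsigned record_announcement(Flash *f, const uint16_t adc[FLASH_WORDS])
{
    unsigned a = 0;                 /* a=0 */
    unsigned wait_cycles = 0;

    flash_erase(f);                 /* er=1 */
    while (flash_bu(f))             /* stay in the wait state while bu=1 */
        wait_cycles++;

    while (a < FLASH_WORDS) {       /* a<4096 */
        assert(f->busy_left == 0);  /* never write while erasing */
        f->word[a] = adc[a];        /* ad_ld=1, ad_buf=1, Ra=a, Rrw=1, Ren=1 */
        a = a + 1;                  /* a=a+1; loop ends when a=4096 */
    }
    return wait_cycles;
}
```

With this model, record_announcement returns ERASE_CYCLES and leaves the samples in words 0 to 4095.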

Blurring of Distinction Between ROM and RAM
• We said that
– RAM is readable and writable
– ROM is read-only
• But some ROMs act almost like RAMs
– EEPROM and Flash are in-system programmable
• Essentially means that writes are slow
– Also, number of writes may be limited (perhaps a few million times)
• And, some RAMs act almost like ROMs
– Non-volatile RAMs: Can save their data without the power supply
• One type: Built-in battery, may work for up to 10 years
• Another type: Includes ROM backup for RAM – controller writes RAM contents to ROM before turning off
• New memory technologies evolving that merge RAM and ROM benefits
– e.g., MRAM
• Bottom line
– Lots of choices available to designer, must find best fit with design goals
[Figure: diagram relating ROM, EEPROM, Flash, NVRAM, and RAM]
Hierarchy and Abstraction

• Abstraction
– Hierarchy often involves not just grouping items into a new item, but also associating higher-level behavior with the new item, known as abstraction
• e.g., an 8-bit adder has an understandable high-level behavior – it adds two 8-bit binary numbers
– Frees designer from having to remember, or even from having to understand, the lower-level details
[Figure: 8-bit adder block symbol with inputs a7..a0, b7..b0, and ci, and outputs co and s7..s0]
Hierarchy and Composing Larger Components
from Smaller Versions
• A common task is to compose smaller components into a larger one
– Gates: Suppose you have plenty of 3-input AND gates, but need a 9-input AND gate
• Can simply compose the 9-input gate from several 3-input gates
– Muxes: Suppose you have 4x1 and 2x1 muxes, but need an 8x1 mux
• s2 selects either top or bottom 4x1
• s1s0 select particular 4x1 input
• Implements 8x1 mux – 8 data inputs, 3 selects, one output
[Figure: 8x1 mux built from two 4×1 muxes (one with inputs i0 to i3, one with inputs i4 to i7, both selected by s1 s0) whose outputs d feed a 2×1 mux selected by s2]
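The mux composition can be checked with a few small functions that mirror the structure in the figure. This is a sketch with hypothetical function names (mux2, mux4, mux8); data inputs are passed as int values and each select line as 0 or 1.

```c
#include <assert.h>

/* 2x1 mux: s0 chooses between i0 and i1. */
int mux2(int i0, int i1, int s0)
{
    assert(s0 == 0 || s0 == 1);
    return s0 ? i1 : i0;
}

/* 4x1 mux: s1s0 chooses one of i[0..3]. */
int mux4(const int i[4], int s1, int s0)
{
    assert((s1 == 0 || s1 == 1) && (s0 == 0 || s0 == 1));
    return i[(s1 << 1) | s0];
}

/* 8x1 mux composed from two 4x1 muxes and one 2x1 mux. */
int mux8(const int i[8], int s2, int s1, int s0)
{
    int top = mux4(&i[0], s1, s0);     /* handles i0..i3 */
    int bottom = mux4(&i[4], s1, s0);  /* handles i4..i7 */
    return mux2(top, bottom, s2);      /* s2 picks top or bottom */
}
```

Both 4x1 muxes see the same s1s0, so the composed mux selects input number 4*s2 + 2*s1 + s0.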

Hierarchy and Composing Larger Components
from Smaller Versions
• Composing memory very common
• Making memory words wider
– Easy – just place memories side-by-side until desired width obtained
– Share address/control lines, concatenate data lines
– Example: Compose 1024x8 ROMs into 1024x32 ROM
[Figure: four 1024x8 ROMs placed side by side, all sharing the 10-bit addr input and the en input; their four 8-bit data outputs are concatenated into data(31..0), forming a 1024x32 ROM with a 32-bit data output]
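Widening by placing memories side by side amounts to reading the same address in every memory and concatenating the results. The sketch below assumes hypothetical names (Rom8, rom32_read) and an arbitrary byte order in which rom[3] supplies data(31..24) and rom[0] supplies data(7..0); the slide does not state which ROM provides which byte.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS 1024u

typedef struct {
    uint8_t word[WORDS];   /* one 1024x8 ROM */
} Rom8;

/* 1024x32 read built from four 1024x8 ROMs that share the address. */
uint32_t rom32_read(const Rom8 rom[4], unsigned addr)
{
    assert(addr < WORDS);
    return ((uint32_t)rom[3].word[addr] << 24) |   /* data(31..24) */
           ((uint32_t)rom[2].word[addr] << 16) |   /* data(23..16) */
           ((uint32_t)rom[1].word[addr] << 8)  |   /* data(15..8)  */
            (uint32_t)rom[0].word[addr];           /* data(7..0)   */
}
```

The address width stays at 10 bits; only the data width grows.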
Hierarchy and Composing Larger Components
from Smaller Versions
• Creating memory with more words
– Put memories on top of one another until the number of desired words is achieved
– Use decoder to select among the memories
• Can use highest order address input(s) as decoder input
• Although actually, any address line could be used
– Example: Compose 1024x8 memories into 2048x8 memory

[Figure: 2048x8 ROM with an 11-bit addr input, built from two 1024x8 ROMs. Address lines a9..a0 go to the addr inputs of both ROMs; a10 goes to input i0 of a 1x2 decoder (dcd) whose enable e comes from en and whose outputs d0 and d1 drive the en inputs of the two ROMs; the 8-bit data outputs are tied together.]

a10 a9 a8 ... a0
 0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  1
 0  0  0  0  0  0  0  0  0  1  0
 ...
 0  1  1  1  1  1  1  1  1  1  0
 0  1  1  1  1  1  1  1  1  1  1
 1  0  0  0  0  0  0  0  0  0  0
 1  0  0  0  0  0  0  0  0  0  1
 1  0  0  0  0  0  0  0  0  1  0
 ...
 1  1  1  1  1  1  1  1  1  1  0
 1  1  1  1  1  1  1  1  1  1  1

a10 just chooses which memory to access

To create memory with more words and wider words, can first compose to enough words, then widen.
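The address split in the table can be sketched as follows. The names Rom8 and rom2048_read are hypothetical, and the two ROMs are passed in as the one enabled when a10 is 0 and the one enabled when a10 is 1.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS 1024u

typedef struct {
    uint8_t word[WORDS];   /* one 1024x8 ROM */
} Rom8;

/* 2048x8 read built from two 1024x8 ROMs. The decoder on a10 is
   modeled by choosing which ROM supplies the data. */
uint8_t rom2048_read(const Rom8 *rom_a10_0, const Rom8 *rom_a10_1,
                     unsigned addr)
{
    assert(addr < 2u * WORDS);
    unsigned a10 = (addr >> 10) & 1u;   /* highest-order address bit */
    unsigned low = addr & 0x3FFu;       /* a9..a0, shared by both ROMs */
    return a10 ? rom_a10_1->word[low] : rom_a10_0->word[low];
}
```

Addresses 0 to 1023 land in the first ROM and 1024 to 2047 in the second, matching the two halves of the table.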
Chapter Summary
– Modern digital design involves creating processor-level components
– Four-step RTL method can be used
• 1. High-level state machine 2. Create datapath 3. Connect datapath
to controller 4. Derive controller FSM
– Several examples
• Control dominated, data dominated, and mix
– Determining fastest clock frequency
• By finding critical path
– Behavioral-level design – C to gates
• By using method to convert C (subset) to high-level state machine
– Additional RTL components
• Memory: RAM, ROM
• Queues
– Hierarchy: A key concept used throughout Chapters 2-5