You are on page 1of 47

Digital Design

Chapter 5:
Register-Transfer Level
(RTL) Design
Slides to accompany the textbook Digital Design, First Edition,
by Frank Vahid, John Wiley and Sons Publishers, 2007.
http://www.ddvahid.com

Copyright © 2007 Frank Vahid


Instructors of courses requiring Vahid's Digital Design textbook (published by John Wiley and Sons) have permission to modify and use these slides for customary course-related activities,
Digital
subject to keeping Design
this copyright notice in place and unmodified. These slides may be posted as unanimated pdf versions on publicly-accessible course websites.. PowerPoint source (or pdf
with animations) may not be posted to publicly-accessible websites, but may be posted for students on internal protected sites or distributed directly to students by other electronic means.
Copyright © 2006 1
Instructors may make printouts of the slides available to students for a reasonable photocopying charge, without incurring royalties. Any other use requires explicit permission. Instructors
Franksource
may obtain PowerPoint Vahid or obtain special use permissions from Wiley – see http://www.ddvahid.com for information.

5.1

Introduction outputs
inputs

FSM
FSM

bi bo
• Chapter 3: Controllers Combinational
– Control input/output: single bit (or just a logic n1
few) representing event or state n0
s1 s0
– Finite-state machine describes
State register
behavior; implemented as state register clk
and combinational logic
• Chapter 4: Datapath components
– Data input/output: Multiple bits Register Comparator
collectively representing single entity i
s

sa
i
n

– Datapath components included


ALU Register file
registers, adders, ALU, comparators,
register files, etc. e

bi bo
• This chapter: custom processors
z

Combinational Register file


– Processor: Controller and datapath logic n1
components working together to n0
implement an algorithm s1 s0 ALU
State register
Datapath
Digital Design
Copyright © 2006 Controller 2
Frank Vahid
Note: Slides with animation are denoted with a small red "a" near the animated items

1
RTL Design: Capture Behavior, Convert to Circuit
• Recall
– Chapter 2: Combinational Logic Design
• First step: Capture behavior (using equation
or truth table)
• Remaining steps: Convert to circuit
Capture behavior
– Chapter 3: Sequential Logic Design
• First step: Capture behavior (using FSM)
• Remaining steps: Convert to circuit
• RTL Design (the method for creating
Convert to circuit
custom processors)
– First step: Capture behavior (using high-
level state machine, to be introduced)
– Remaining steps: Convert to circuit

Digital Design
Copyright © 2006 3
Frank Vahid

5.2

RTL Design Method

Digital Design
Copyright © 2006 4
Frank Vahid

2
RTL Design Method: “Preview” Example
• Soda dispenser s a
– c: bit input, 1 when coin
deposited
– a: 8-bit input having value of c Soda
deposited coin d dispenser
– s: 8-bit input having cost of a processor
soda
– d: bit output, processor sets to s a 25
1 when total value of
deposited coins equals or 50 25
0 1 0 1 0
exceeds cost of a soda c Soda tot:
tot:
d dispenser a

0 1 0 processor50
25

How can we precisely describe this


Digital Design
Copyright © 2006 processor’s behavior? 5
Frank Vahid

Preview Example: Step 1 --


Capture High-Level State Machine s a
• Declare local register tot 8 8
c Soda
• Init state: Set d=0, tot=0 d dispenser
processor
• Wait state: wait for coin
– If see coin, go to Add state
Inputs: c (bit), a (8 bits), s (8 bits)
• Add state: Update total value: Outputs: d (bit)
tot = tot + a Local registers: tot (8 bits)
– Remember, a is present coin’s
c
value Add
– Go back to Wait state
Init Wait
• In Wait state, if tot >= s, go to tot=tot+a
Disp(ense) state d=0 c’*(tot<s)
tot=0 c’*(tot<s)’
• Disp state: Set d=1 (dispense
soda) Disp
– Return to Init state
d=1
Digital Design
Copyright © 2006 6
Frank Vahid

3
Preview Example:
Step 2 -- Create Datapath
Inputs : c (bit), a(8 bits), s (8 bits)
Outputs : d (bit)
Local reg isters: t ot (8 bits)

• Need tot register c


Add
Init Wait
• Need 8-bit comparator d=0 c‘
tot= tot+a
c‘ ∗(tot<s)
(tot<s)‘
to compare s and tot t ot=0
Disp

• Need 8-bit adder to s a


d=1

perform tot = tot + a


• Wire the components
tot_ld ld
as needed for above tot
tot_clr clr
• Create control
8
input/outputs, give 8 8
them names
8-bit 8-bit
tot_lt_s
< adder

Datapath 8
Digital Design
Copyright © 2006 7
Frank Vahid

Preview Example: Step 3 –


Connect Datapath to a Controller s a

• Controller’s inputs tot_ld ld


tot
tot_clr clr
– External input c 8
8 8
(coin detected)
8-bit 8-bit
– Input from datapath tot_lt_s
< adder

comparator’s output, Datapath 8

s a
which we named
tot_lt_s 8 8

• Controller’s outputs
– External output d c
(dispense soda)
– Outputs to datapath d tot_ld
to load and clear the
tot register tot_clr

Controller Datapath
Digital Design
tot_lt_s
Copyright © 2006 8
Frank Vahid

4
Preview Example: Step 4 – Derive the Controller’s
FSM s a

• Same states 8 8

and arcs as
c
high-level
d
state machine tot_ld

Controller

Datapath
tot_clr
• But set/read
tot_lt_s
datapath s a
control Inputs::c, tot_lt_s(bit)
Outputs:d, tot_ld, tot_clr (bit)
signals for all tot_ld
tot_ld
tot_clr
ld
clr
tpt

datapath c c
Add
8
8 8
tot_clr
operations d Init Wait
tot_ld=1 tot_lt_s 8-bit
tot_lt_s 8-bit
and d=0 c’*
tot
c’*tot_lt_s < adder

tot_clr=1 8
conditions _lt
_s
’ Disp
Datapath

d=1
Digital Design Controller
Copyright © 2006 9
Frank Vahid

Preview Example: Completing the Design


• Implement the FSM as
a state register and
tot_lt_s

tot_clr

logic
tot_ld

s1 s0 c n1 n0 d
– As in Ch3 0 0 0 0 0 1 0 0 1
– Table shown on right 0 0 0 1 0 1 0 0 1
Init

0 0 1 0 0 1 0 0 1
0 0 1 1 0 1 0 0 1
Inputs::c, tot_lt_s (bit)
0 1 0 0 1 1 0 0 0
Outputs:d, tot_ld, tot_clr (bit)
0 1 0 1 0 1 0 0 0
Wait

tot_ld
c c 0 1 1 0 1 0 0 0 0
Add tot_clr 0 1 1 1 1 0 0 0 0
d Init Wait 1 0 0 0 0 1 0 1 0
Add

tot_ld=1
tot_lt_s
d=0 c’* c’*tot_lt_s
tot 1 1 0 0 0 0 1 0 0
Disp

tot_clr=1 _lt
_s
’ Disp

d=1
Controller

Digital Design
Copyright © 2006 10
Frank Vahid

5
Step 1: Create a High-Level State Machine
• Let’s consider each step of the
RTL design process in more
detail Inputs: c (bit), a (8 bits), s (8 bits)
• Step 1 Outputs: d (bit)
Local registers: tot (8 bits)
– Soda dispenser example
c
– Not an FSM because:
• Multi-bit (data) inputs a and s Init Wait
• Local register tot tot= tot+a
• Data operations tot=0, tot<s, d=0 c’ (tot<s)
c’(tot<s)’
tot=tot+a. tot=0
– Useful high-level state machine: Disp
• Data types beyond just bits d=1
• Local registers
• Arithmetic equations/expressions

Digital Design
Copyright © 2006 11
Frank Vahid

Step 1 Example: Laser-Based Distance Measurer


T (in seconds)
laser
D
Object of
interest
sensor
2D = T sec * 3*108 m/sec

• Example of how to create a high-level state machine to


describe desired processor behavior
• Laser-based distance measurement – pulse laser,
measure time T to sense reflection
– Laser light travels at speed of light, 3*108 m/sec
– Distance is thus D = T sec * 3*108 m/sec / 2
Digital Design
Copyright © 2006 12
Frank Vahid

6
Step 1 Example: Laser-Based Distance Measurer
T (in seconds)
B L
laser from button to laser
Laser-based
distance
sensor D 16 measurer S
to display from sensor

• Inputs/outputs
– B: bit input, from button to begin measurement
– L: bit output, activates laser
– S: bit input, senses laser reflection
– D: 16-bit output, displays computed distance

Digital Design
Copyright © 2006 13
Frank Vahid

Step 1 Example: Laser-Based Distance Measurer


from button B
L
Laser- to laser
Inputs: B, S(1 bit each) based
Outputs: L (bit), D (16 bits) distance
D 16 measurer S
to display from sensor

S0 ?
a
L = 0 (laser off)
D = 0 (distance = 0)

• Step 1: Create high-level state machine


• Begin by declaring inputs and outputs
• Create initial state, name it S0
– Initialize laser to off (L=0)
– Initialize displayed distance to 0 (D=0)

Digital Design
Copyright © 2006 14
Frank Vahid

7
Step 1 Example: Laser-Based Distance Measurer
from button B
Inputs: B, S (1 bit each) L
Laser- to laser
Outputs: L (bit), D (16 bits) based
distance
B’ (button not pressed) to display
D 16 measurer S
from sensor

S0 S1 ?
B
L=0 (button
D=0 pressed)

• Add another state, call S1, that waits for a button press
– B’ – stay in S1, keep waiting
– B – go to a new state S2

Q: What should S2 do? A: Turn on the laser


a
Digital Design
Copyright © 2006 15
Frank Vahid

Step 1 Example: Laser-Based Distance Measurer


from button B
Inputs: B, S (1 bit each) L
Laser- to laser
Outputs: L (bit), D (16 bits) based
distance
D 16 measurer S
to display from sensor
B’

S0 S1 S2 S3
B
L=0 L=1 L=0 a

D=0 (laser on) (laser off)

• Add a state S2 that turns on the laser (L=1)


• Then turn off laser (L=0) in a state S3

Q: What do next? A: Start timer, wait to sense reflection


a

Digital Design
Copyright © 2006 16
Frank Vahid

8
Step 1 Example: Laser-Based Distance Measurer
B L
Inputs: B, S (1 bit each) Outputs: L (bit), D (16 bits) from button to laser
Lase
r-based
Local Registers: Dctr (16 bits) D 16
distance
measu rer S
to display from sensor
B’ S’ (no reflection)

S (reflection)
S0 S1 S2 S3 ?
B
L=0 Dctr = 0 L=1 L=0 a
D=0 (reset cycle Dctr = Dctr + 1
count) (count cycles)

• Stay in S3 until sense reflection (S)


• To measure time, count cycles for which we are in S3
– To count, declare local register Dctr
– Increment Dctr each cycle in S3
– Initialize Dctr to 0 in S1. S2 would have been O.K. too
Digital Design
Copyright © 2006 17
Frank Vahid

Step 1 Example: Laser-Based Distance Measurer


B L
from button to laser
Lase
r-based
Inputs: B, S (1 bit each) Outputs: L (bit), D (16 bits) distance
D 16 measu rer S
Local Registers: Dctr (16 bits) to display from sensor

B’ S’

S0 S1 S2 S3 S4
B S
L=0 Dctr = 0 L=1 L=0 D = Dctr / 2
D=0 Dctr = Dctr + 1 (calculate D)

• Once reflection detected (S), go to new state S4


– Calculate distance
– Assuming clock frequency is 3x108, Dctr holds number of meters, so
D=Dctr/2
• After S4, go back to S1 to wait for button again
Digital Design
Copyright © 2006 18
Frank Vahid

9
Step 2: Create a Datapath
• Datapath must
– Implement data storage
– Implement data computations
• Look at high-level state machine, do
three substeps
– (a) Make data inputs/outputs be datapath
inputs/outputs
– (b) Instantiate declared registers into the
datapath (also instantiate a register for each Instantiate: to
data output) introduce a new
– (c) Examine every state and transition, and
instantiate datapath components and component into a
connections to implement any data design.
computations

Digital Design
Copyright © 2006 19
Frank Vahid

Step 2 Example: Laser-Based Distance Measurer


Inputs: B, S (1 bit each) Outputs: L (bit), D (16 bits)
(a) Make data Local Registers: Dctr (16 bits)
inputs/outputs be
datapath B‘ S‘
inputs/outputs
(b) Instantiate declared
registers into the S0 S1 S2 S3 S4
B S
datapath (also
L=0 Dctr = 0 L=1 L=0 D = Dctr / 2
instantiate a D=0 Dctr = Dctr + 1 (calculate D)
register for each
data output) a
Datapath
(c) Examine every Dreg_clr
state and Dreg_ld
transition, and clear clear
Dctr_clr I
instantiate Dctr: 16-bit Dreg: 16-bit
Dctr_cnt count load
datapath up-counter register
components and Q Q
connections to
implement any 16
data computations
D

Digital Design
Copyright © 2006 20
Frank Vahid

10
Step 2 Example: Laser-Based Distance Measurer
Inputs: B, S (1 bit each) Outputs: L (bit), D (16 bits)
(c) (continued) Local Registers: Dctr (16 bits)
Examine every
state and B‘ S‘
transition, and
instantiate
S0 S1 S2 S3 S4
datapath B S
components and L=0 Dctr = 0 L=1 L=0 D = Dctr / 2
D=0 Dctr = Dctr + 1 (calculate D)
connections to
implement any Datapath
a

data computations
Dreg_clr >>1
16
Dreg_ld

Dctr_clr clear clear I


Dctr: 16-bit Dreg: 16-bit
Dctr_cnt count load
up-counter register
Q Q
16

16
D
Digital Design
Copyright © 2006 21
Frank Vahid

Step 2 Example Showing Mux Use


Localregisters:
E,F, G, R (16 bits)
E F G E F G E F G

T0 R = E + F
A B A B add_A_s0 ⋅1
2× ⋅1

+ + add_B_s0
T1 R = R + G A B
a
+
R R

R
(a) (b) (c)

(d)
• Introduce mux when one component input can come from
more than one source
Digital Design
Copyright © 2006 22
Frank Vahid

11
Step 3: Connecting the Datapath to a Controller

L
B to laser
from button
Controller from sensor
Dreg_clr S

Dreg_ld
• Laser-based distance
measurer example
Dctr_clr Datapath
• Easy – just connect all
Dctr_cnt
D
control signals
to display between controller and
16 300 MHz Clock
datapath

Datapath

Dreg_clr >>1
Dreg_ld 16

Dctr_clr clear clear I


count Dctr: 16-bit Dreg: 16-bit
Dctr_cnt up-counter load register
Q Q
16
Digital Design
Copyright © 2006 16
23
Frank Vahid D

Step 4: Deriving the Controller’s FSM


B
L Inputs: B, S (1 bit each) Outputs: L (bit), D (16 bits)
from button
Controller
to laser
Local Registers: Dctr (16 bits)
from sensor
Dreg_clr S

Dreg_ld
B’ S’
Dctr_clr Datapath

Dctr_cnt
D S0 S1 S2 S3 S4
to display B S
16 300 MHz Clock
L=0 Dctr = 0 L=1 L=0 D = Dctr / 2
D=0 Dctr = Dctr + 1 (calculate D)
Inputs: B, S
• FSM has same Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt
structure as high-
level state machine B’ S’
a
– Inputs/outputs all
bits now S0 S1
B
S2 S3
S
S4
– Replace data
operations by bit L=0 L=0 L=1 L=0 L=0
Dreg_clr = 1 Dreg_clr = 0 Dreg_clr = 0 Dreg_clr = 0 Dreg_clr = 0
operations using Dreg_ld = 0 Dreg_ld = 0 Dreg_ld = 0 Dreg_ld = 0 Dreg_ld = 1
datapath Dctr_clr = 0 Dctr_clr = 1 Dctr_clr = 0 Dctr_clr = 0 Dctr_clr = 0
Dctr_cnt = 0 Dctr_cnt = 0 Dctr_cnt = 0 Dctr_cnt = 1 Dctr_cnt = 0
Digital Design (laser off) (clear count) (laser on) (laser off) (load D reg with Dctr/2)
Copyright © 2006 (clear D reg) (count up) (stop counting)24
Frank Vahid

12
Step 4: Deriving the Controller’s FSM
B’ S’

B S
S0 S1 S2 S3 S4

L=0 L=0 L=1 L=0 L=0


Dreg_clr = 1 Dreg_clr = 0 Dreg_clr = 0 Dreg_clr = 0 Dreg_clr = 0
Dreg_ld = 0 Dreg_ld = 0 Dreg_ld = 0 Dreg_ld = 0 Dreg_ld = 1
Dctr_clr = 0 Dctr_clr = 1 Dctr_clr = 0 Dctr_clr = 0 Dctr_clr = 0
Dctr_cnt = 0 Dctr_cnt = 0 Dctr_cnt = 0 Dctr_cnt = 1 Dctr_cnt = 0
(laser off) (clear count) (laser on) (laser off) (load D reg with Dctr/2)
(clear D reg) (count up) (stop counting)

Inputs: B, S Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt


• Using
shorthand of B’ S’
a
outputs not
assigned B S
S0 S1 S2 S3 S4
implicitly
assigned 0 L=0 Dctr_clr = 1 L=1 L=0 Dreg_ld = 1
Dreg_clr = 1 (clear count) (laser on) Dctr_cnt = 1 Dctr_cnt = 0
(laser off) (laser off) (load D reg with Dctr/2)
(clear D reg) (count up) (stop counting)
Digital Design
Copyright © 2006 25
Frank Vahid

Step 4
B L
to laser
Controller

from button Datapath


from sensor
Dreg_clr S
>>1
Datapath

Dreg_ld
Dreg_clr
Dreg_ld 16
Dctr_clr
Dctr_clr clear clear I
Dctr_cnt count Dctr: 16-bit Dreg: 16-bit
Dctr_cnt up-counter load register
D
to display Q Q
16 300 MHz Clock 16
16
D

Inputs: B, S Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt

B’ S’

• Implement
B S
S0 S1 S2 S3 S4 FSM as state
L=0 Dctr_clr = 1 L=1 L=0 Dreg_ld = 1 register and
Dreg_clr = 1 (laser on) Dctr_cnt = 1
(laser off)
(clear count)
(laser off)
Dctr_cnt = 0 logic (Ch3) to
(load D reg with Dctr/2)
(clear D reg) (count up) (stop counting) complete the
design
Digital Design
Copyright © 2006 26
Frank Vahid

13
5.3

RTL Design Examples and Issues


• We’ll use several more Master
processor
examples to illustrate RTL
design rd
D
• Example: Bus interface 32
4 A
– Master processor can read
register from any peripheral Per0 Per1 Per15
• Each register has unique 4-bit
address to/from processor bus
rd D A
• Assume 1 register/periph.
– Sets rd=1, A=address 32 4

– Appropriate peripheral places Faddr


Bus interface
register data on 32-bit D lines 4
• Periph’s address provided on Q
32
Faddr inputs (maybe from DIP
switches, or another register) Main part

Digital Design Peripheral


Copyright © 2006 27
Frank Vahid

RTL Example: Bus Interface


Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits)
Outputs: D (32 bits)
Local register: Q1 (32 bits)
rd’ rd
((A = Faddr)
and rd’)
WaitMyAddress SendData
(A = Faddr)
D = “Z” and rd D = Q1
Q1 = Q

• Step 1: Create high-level state machine


– State WaitMyAddress
• Output “nothing” (“Z”) on D, store peripheral’s register value Q into local
register Q1
• Wait until this peripheral’s address is seen (A=Faddr) and rd=1
– State SendData
• Output Q1 onto D, wait for rd=0 (meaning main processor is done
reading the D lines)
Digital Design
Copyright © 2006 28
Frank Vahid

14
RTL Example: Bus Interface
Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits)
Outputs: D (32 bits)
Local register: Q1 (32 bits)
rd’ rd
((A = Faddr)
and rd’)
WaitMyAddress SendData
(A = Faddr)
D = “Z” and rd D = Q1
Q1 = Q

clk
Inputs
rd

State W W SD W W SD SD W
Outputs
D Z Q1 Z Q1 Z

Digital Design
Copyright © 2006 29
Frank Vahid

RTL Example: Bus Interface

Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits)


Outputs: D (32 bits)
Local register: Q1 (32 bits) A Faddr Q
rd’ rd
4 4 32
((A = Faddr)
and rd)’ Q1_ld
ld Q1
WaitMyAddress SendData
(A = Faddr)
D = “Z” and rd D = Q1 = (4-bit)
Q1 = Q 32
A_eq_Faddr

D_en
32
a

• Step 2: Create a datapath Datapath


(a) Datapath inputs/outputs
Bus interface
(b) Instantiate declared registers
D
(c) Instantiate datapath components and
connections
Digital Design
Copyright © 2006 30
Frank Vahid

15
RTL Example: Bus Interface
Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits)
Outputs: D (32 bits)
Local register: Q1 (32 bits)
rd’ rd A Faddr Q
Inputs: rd, A_eq_Faddr
((A =(bit)
Faddr)
Outputs: Q1_ld, D_enand
(bit)rd)’ 4 4 32

rdSendData Q1_ld
rd WaitMyAddress rd ld
(A = Faddr) Q1
and (A_eq_Faddr
rd D = Q1
D = “Z”
Q1 = Q and rd)‘
= (4-bit) 32
WaitMyAddress SendData A_eq_Faddr
A_eq_Faddr
D_en = 0 and rd D_en = 1 D_en
a Q1_ld = 1 Q1_ld = 0 32

Datapath
Bus interface

• Step 3: Connect datapath to controller D

• Step 4: Derive controller’s FSM


Digital Design
Copyright © 2006 31
Frank Vahid

RTL Example: Video Compression – Sum of Absolute


Only difference: ball moving
Differences
Frame 1 Frame 2 Frame 1 Frame 2

Digitized Digitized Digitized Difference of a


frame 1 frame 2 frame 1 2 from 1

1 Mbyte 1 Mbyte 1 Mbyte 0.01 Mbyte


(a) (b)
Just send
• Video is a series of frames (e.g., 30 per second) difference
• Most frames similar to previous frame
– Compression idea: just send difference from previous frame
Digital Design
Copyright © 2006 32
Frank Vahid

16
RTL Example: Video Compression – Sum of Absolute
Differences
compare Each is a pixel, assume
Frame 1 Frame 2
represented as 1 byte
(actually, a color picture
might have 3 bytes per
pixel, for intensity of
red, green, and blue
components of pixel)
• Need to quickly determine whether two frames are similar
enough to just send difference for second frame
– Compare corresponding 16x16 “blocks”
• Treat 16x16 block as 256-byte array
– Compute the absolute value of the difference of each array item
– Sum those differences – if above a threshold, send complete frame
for second frame; if below, can use difference method (using
another technique, not described)
Digital Design
Copyright © 2006 33
Frank Vahid

RTL Example: Video Compression – Sum of Absolute


Differences

A SAD
256-byte array

integer
B sad
256-byte array
go

<
)!(i
6
5
2

• Want fast sum-of-absolute-differences (SAD) component


– When go=1, sums the differences of element pairs in arrays A and
B, outputs that sum

Digital Design
Copyright © 2006 34
Frank Vahid

17
RTL Example: Video Compression – Sum of Absolute
Differences
A SAD
Inputs: A, B (256 byte memory); go (bit)
Outputs: sad (32 bits)
B sad Local registers: sum, sad_reg (32 bits); i (9 bits)

go
S0 !go
go
• S0: wait for go sum = 0 a
S1
i=0
• S1: initialize sum and index
(i<256)’
• S2: check if done (i>=256) <
)!(i
6
5
2

S2
• S3: add difference to sum, i<256
sum=sum+abs(A[i]-B[i])
increment index S3
i=i+1
• S4: done, write to output
S4 sad_reg = sum
sad_reg
Digital Design
Copyright © 2006 35
Frank Vahid

RTL Example: Video Compression – Sum of Absolute


Differences
Inputs: A, B (256 byte memory); go (bit) AB_addr A_data B_data
Outputs: sad (32 bits)
Local registers: sum, sad_reg (32 bits); i (9 bits) i_lt_256
<256 8 8
9
S0 !go i_inc
go i_clr
i –
sum = 0 a
8
S1
i=0
sum_ld
(i<256)’ sum 32 abs
S2 sum_clr
<
)!(i
6
5
2

i<256 32 32 8
sum=sum+abs(A[i]-B[i]) sad_reg_ld
S3
i=i+1
<
t_
)l(i!
6
5
2
sad_reg +
S4 sad_reg=sum 32
Datapath
sad
• Step 2: Create datapath
Digital Design
Copyright © 2006 36
Frank Vahid

18
RTL Example: Video Compression – Sum of Absolute
Differences
go AB_rd AB_addr A_data B_data

i_lt_256
<256 8 8
S0 go’
9
go i_inc
S1
sum=0 sum_clr=1
i_clr
i –
i=0 i_clr=1
8
?

S2 sum_ld
i<256 i_lt_256 sum 32 abs
sum_clr
S3 sum=sum+abs(A[i]-B[i])
sum_ld=1; AB_rd=1
<
)!(i
6
5
2

32 32 8
i=i+1 i_inc=1 sad_reg_ld
<
t_
)l(i!
6
5
2
S4 sad_reg=sum a
sad_reg +
<
t_
)l(i!
6
5
2

sad_reg_ld=1
Controller 32

sad
• Step 3: Connect to controller
Digital Design • Step 4: Replace high-level state machine by FSM
Copyright © 2006 37
Frank Vahid

RTL Example: Video Compression – Sum of Absolute


Differences
• Comparing software and custom
circuit SAD
– Circuit: Two states (S2 & S3) for
each i, 256 i’sÆ 512 clock cycles
– Software: Loop (for i = 1 to 256), but (i<256)’
S2
for each i, must move memory to
local registers, subtract, compute i<256
sum=sum+abs(A[i]-B[i])
absolute value, add to sum, S3
i=i+1
increment i – say about 6 cycles per <
)!(i
6
5
2

array item Æ 256*6 = 1536 cycles


– Circuit is about 3 times (300%)
faster
<
t_
)l(i!
6
5
2

– Later, we’ll see how to build SAD


circuit that is even faster

Digital Design
Copyright © 2006 38
Frank Vahid

19
RTL Design Pitfalls and Good Practice
• Common pitfall: Assuming Local registers: R, Q (8 bits)
register is update in the
state it’s written R<100 C

– Final value of Q? A B R>=100


– Final state?
R=99 R=R+1 D
– Answers may surprise you Q=R
(a)
• Value of Q unknown
R<100
• Final state is C, not D
clk A B C
– Why?
99 100
• State A: R=99 and Q=R R ? 99 100
happen simultaneously
• State B: R not updated with Q ? ? ?
R+1 until next clock cycle,
simultaneously with state (b)
register being updated

Digital Design
Copyright © 2006 39
Frank Vahid

RTL Design Pitfalls and Good Practice


• Solutions Local registers: R, Q (8 bits)

– Read register in R<100 C


following state (Q=R) A B B2 R>=100
– Insert extra state so that
R=99 R=R+1 D
conditions use updated Q=R Q=R
value (a)

– Other solutions are R<100 R>=100


possible, depends on clk A B B2 D
the example 99 100
R ? 99 100 100

Q ? ? 99 99

(b)

Digital Design
Copyright © 2006 40
Frank Vahid

20
RTL Design Pitfalls and Good Practice
• Common pitfall: Inputs: A, B (8 bits) Inputs: A, B (8 bits)
Reading outputs Outputs: P (8 bits) Outputs: P (8 bits)
Local register: R (8 bits)
– Outputs can only be
written
– Solution: Introduce S T S T
additional register,
which can be written P=A P=P+B R=A P=R+B
and read P=A

(a) (b)

Digital Design
Copyright © 2006 41
Frank Vahid

RTL Design Pitfalls and Good Practice


• Good practice: Register B B
all data outputs R R
– In fig (a), output P would
show spurious values as
addition computes
• Furthermore, longest + +
register-to-register path,
which determines clock
period, is not known until P
that output is connected
to another component (a) Preg
– In fig (b), spurious outputs
reduced, and longest P
register-to-register path is (b)
clear

Digital Design
Copyright © 2006 42
Frank Vahid

21
Control vs. Data Dominated RTL Design
• Designs often categorized as control-dominated or data-
dominated
– Control-dominated design – Controller contains most of the
complexity
– Data-dominated design – Datapath contains most of the complexity
– General, descriptive terms – no hard rule that separates the two
types of designs
– Laser-based distance measurer – control dominated
– Bus interface, SAD circuit – mix of control and data
– Now let’s do a data dominated design

Digital Design
Copyright © 2006 43
Frank Vahid

Data Dominated RTL Design Example: FIR Filter


• Filter concept
– Suppose X is data from a
temperature sensor, and
particular input sequence is
180, 180, 181, 240, 180, 181 X Y
(one per clock cycle)
– That 240 is probably wrong! 12 digital filter 12
• Could be electrical noise clk
– Filter should remove such
noise in its output Y
– Simple filter: Output average
of last N values
• Small N: less filtering
• Large N: more filtering, but
less sharp output

Digital Design
Copyright © 2006 44
Frank Vahid

22
Data Dominated RTL Design Example: FIR Filter
• FIR filter
– “Finite Impulse Response” X Y
– Simply a configurable weighted 12 digital filter 12
sum of past input values clk
– y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
• Above known as “3 tap”
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
• Tens of taps more common
• Very general filter – User sets the
constants (c0, c1, c2) to define
specific filter
– RTL design
• Step 1: Create high-level state
machine
– But there really is none! Data
dominated indeed.
• Go straight to step 2
Digital Design
Copyright © 2006 45
Frank Vahid

Data Dominated RTL Design Example: FIR Filter


• Step 2: Create datapath X Y
12 digital filter 12
– Begin by creating chain clk
of xt registers to hold past
values of X
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
Suppose sequence is: 180, 181, 240
3-tap FIR filter
x(t) x(t-1) x(t-2)

xt0 xt1 xt2


X 240
180
181 180
181 180 Y
12 12 12 12 a

clk

Digital Design
Copyright © 2006 46
Frank Vahid

23
Data Dominated RTL Design Example: FIR Filter
• Step 2: Create datapath X Y
12 digital filter 12
(cont.) clk
– Instantiate registers for
c0, c1, c2
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
– Instantiate multipliers to
compute c*x values
3-tap FIR filter
x(t) x(t-1) x(t-2)
c0 c1 c2
xt0 xt1 xt2
X
a
clk

∗ ∗ ∗
Y

Digital Design
Copyright © 2006 47
Frank Vahid

Data Dominated RTL Design Example: FIR Filter


• Step 2: Create datapath X Y
12 digital filter 12
(cont.) clk
– Instantiate adders
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
3-tap FIR filter
x(t) x(t-1) x(t-2)
c0 c1 c2
xt0 xt1 xt2
X

clk
a
∗ ∗ ∗

+ +
Y

Digital Design
Copyright © 2006 48
Frank Vahid

24
Data Dominated RTL Design Example: FIR Filter
• Step 2: Create datapath (cont.) X Y
12 digital filter 12
– Add circuitry to allow loading of clk
particular c register
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
CL 3-tap FIR filter
e
3
Ca1 2x4 2
Ca0 1
0
C

x(t) x(t-1) x(t-2)


c0 c1 c2
xt0 xt1 xt2 a
X

clk

* * *

+ + yreg
Y
Digital Design
Copyright © 2006 49
Frank Vahid

Data Dominated RTL Design Example: FIR Filter


y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
• Step 3 & 4: Connect to controller, Create FSM
– No controller needed
– Extreme data-dominated example
– (Example of an extreme control-dominated design – an FSM, with no
datapath)
• Comparing the FIR circuit to a software implementation
– Circuit
• Assume adder has 2-gate delay, multiplier has 20-gate delay
• Longest past goes through one multiplier and two adders
– 20 + 2 + 2 = 24-gate delay
• 100-tap filter, following design on previous slide, would have about a 34-gate
delay: 1 multiplier and 7 adders on longest path
– Software
• 100-tap filter: 100 multiplications, 100 additions. Say 2 instructions per
multiplication, 2 per addition. Say 10-gate delay per instruction.
• (100*2 + 100*2)*10 = 4000 gate delays
– Circuit is more than 100 times faster (10,000% faster). Wow.
Digital Design
Copyright © 2006 50
Frank Vahid

25
5.4

Determining Clock Frequency


• Designers of digital circuits
often want fastest
performance clk a b
– Means want high clock
frequency
• Frequency limited by longest
register-to-register delay
2 ns +
delay
– Known as critical path
– If clock is any faster, incorrect
data may be stored into register c
– Longest path on right is 2 ns
• Ignoring wire delays, and
register setup and hold times,
for simplicity

Digital Design
Copyright © 2006 51
Frank Vahid

Critical Path
• Example shows four paths
– a to c through +: 2 ns
– a to d through + and *: 7 ns
a b
– b to d through + and *: 7 ns
– b to d through *: 5 ns
• Longest path is thus 7 ns 2 ns + * 5 ns
delay delay
• Fastest frequency 7 ns 7 ns
2 ns

5 ns
7 ns
7 ns

– 1 / 7 ns = 142 MHz c d
Max
(2,7,7,5)
= 7 ns

Digital Design
Copyright © 2006 52
Frank Vahid

26
Critical Path Considering Wire Delays
• Real wires have delay too
– Must include in critical path
• Example shows two paths
– Each is 0.5 + 2 + 0.5 = 3 ns clk a b
• Trend
0.5 ns
– 1980s/1990s: Wire delays were tiny 0.5 ns
compared to logic delays
– But wire delays not shrinking as fast as + 2 ns
logic delays
• Wire delays may even be greater than 0.5 ns

3 ns
s
n
3

3 ns
logic delays!
c
• Must also consider register setup and
hold times, also add to path
• Then add some time to the computed
path, just to be safe
– e.g., if path is 3 ns, say 4 ns instead
Digital Design
Copyright © 2006 53
Frank Vahid

A Circuit May Have Numerous Paths


s a
• Paths can exist
– In the datapath Combinational logic 8 8
d
– In the controller
– Between the tot_ld
ld
controller and t ot_clr tot
c clr
datapath
(c) 8
tot_lt_s
– May be n1
hundreds or
thousands of n0
8-bit 8-bit
< adder
paths tot_lt_s 8

• Timing analysis Datapath


s1 s0
tools that evaluate (b) (a)
all possible paths clk State register

automatically very
helpful
Digital Design
Copyright © 2006 54
Frank Vahid

27
5.5

Behavioral Level Design: C to Gates


C code
S0 !go
int SAD (byte A[256], byte B[256]) // not quite C syntax
go
{
sum = 0
S1 uint sum; short uint I;
i=0
sum = 0;
(i<256)’ i = 0;
S2 while (i < 256) {
sum = sum + abs(A[i] – B[i]);
i<256
i = i + 1;
sum=sum+abs(A[i]-B[i])
S3 }
i=i+1
return sum;
}
a
S4 sad_reg = sum

• Earlier sum-of-absolute-differences example


– Started with high-level state machine
– C code is an even better starting point -- easier to understand
Digital Design
Copyright © 2006 55
Frank Vahid

Behavioral-Level Design: Start with C (or Similar


Language)
• Replace first step of RTL design method by two steps
– Capture in C, then convert C to high-level state machine
– How convert from C to high-level state machine?
Step 1A: Capture in C
a
Step 1B: Convert to high-level state machine

Digital Design
Copyright © 2006 56
Frank Vahid

28
Converting from C to High-Level State Machine
• Convert each C construct to
equivalent states and
transitions
• Assignment statement
target= a
– Becomes one state with target = expression;
expression
assignment
• If-then statement
– Becomes state with condition !cond
check, transitioning to “then” cond
if (cond) {
statements if condition true, // then stmts (then stmts) a

otherwise to ending state }

• “then” statements would also (end)


be converted to states

Digital Design
Copyright © 2006 57
Frank Vahid

Converting from C to High-Level State Machine


• If-then-else
!cond
– Becomes state with condition if (cond) { cond
check, transitioning to “then” // then stmts
(then stmts) (else stmts)
statements if condition true, or }
a
else {
to “else” statements if condition // else stmts (end)
false }

• While loop statement !cond

– Becomes state with condition while (cond) { cond


// while stmts (while stmts)
check, transitioning to while a
}
loop’s statements if true, then
transitioning back to condition
(end)
check
Digital Design
Copyright © 2006 58
Frank Vahid

29
Simple Example of Converting from C to High-
Level State Machine
Inputs: uint X, Y
Outputs: uint Max !(X>Y) !(X>Y)

X>Y X>Y
if (X > Y) {
Max = X; (then stmts) (else stmts) Max=X Max=Y
}
else {
Max = Y;
(end) (end)
}
a a

(a) (b) (c)

• Simple example: Computing the maximum of two numbers


– Convert if-then-else statement to states (b)
– Then convert assignment statements to states (c)
Digital Design
Copyright © 2006 59
Frank Vahid

Example: Converting Sum-of-Absolute-Differences C


code to High-Level State Machine
Inputs: byte A[256, B[256]
• Convert each construct to bit go;
!(!go)
Output: int sad
states main() !go !go go !go go
{
– Simplify when possible, uint sum; short uint I;
while (1) {
sum=0 sum=0
i=0
e.g., merge states
while (!go); i=0
(d)
• From high-level state sum = 0;
i = 0;
machine, follow RTL design while (i < 256) {
(b)
(c)

method to create circuit sum = sum + abs(A[i] - B[i]);


i = i + 1;
• Thus, can convert C to }
}
sad = sum;
!go go !go go
gates using straightforward }
(a)
a

automatable process !go go


sum=0
i=0
sum=0
i=0
– Not all C constructs can be sum=0 !(i<256) !(i<256)
efficiently converted i=0
i<256 i<256
– Use C subset if intended !(i<256)
sum=sum sum=sum
for circuit i<256
+ abs
i=i+1
+ abs
i=i+1
while stmts
– Can use languages other sad =
sum
than C, of course
(g)
Digital Design
(e) sad =
Copyright © 2006 sum 60
Frank Vahid (f)

30
5.6

Memory Components
• Register-transfer level
design instantiates datapath
components to create
datapath, controlled by a
controller

M words
– A few more components are
often used outside the
controller and datapath
• MxN memory
– M words, N bits wide each N-bits
wide each
• Several varieties of memory,
which we now introduce M×N memory

Digital Design
Copyright © 2006 61
Frank Vahid

Random Access Memory (RAM)


• RAM – Readable and writable memory 32 32
W_data R_data
– “Random access memory” 4 4
W_addr R_addr
• Strange name – Created several decades ago to
contrast with sequentially-accessed storage like W_en R_en
tape drives 16×32
register file
– Logically same as register file – Memory with
address inputs, data inputs/outputs, and control Register file from Chpt. 4
• RAM usually just one port; register file usually two
or more
32
– RAM vs. register file data
• RAM typically larger than roughly 512 or 1024 10
addr
words 1024× 32
rw RAM
• RAM typically stores bits using a bit storage
approach that is more efficient than a flip flop en

• RAM typically implemented on a chip in a square


rather than rectangular shape – keeps longest RAM block symbol
wires (hence delay) short
Digital Design
Copyright © 2006 62
Frank Vahid

31
RAM Internal Structure
32
data
10
addr Let A = log2M wdata(N-1) wdata(N-2) wdata0
1024x32
rw RAM word bit storage
en enable block
d0 (aka “cell”)
addr0 a0 word
addr1 a1 AxM
d1
r
d
a
decoder
addr(A-1) a(A-1) data cell
word word
e d(M-1) enable enable
clk
en rw data
rw to all cells

rdata(N-1) rdata(N-2) rdata0 RAM cell

• Similar internal structure as register file


– Decoder enables appropriate word based on address
inputs
– rw controls whether cell is written or read
Digital Design
Copyright © 2006 – Let’s see what’s inside each RAM cell 63
Frank Vahid

Static RAM (SRAM)


wdata(N-1) wdata(N-2) wdata0
SRAM cell
32 Let A = log2 M
data data’
data word bit storage
10 enable block ,,
addr d0
,,
(aka cell ) cell
1024x32 addr0 a0 word d d’
rw RAM addr1 a1 A ⋅ M
addr

d1
decoder
en a(A-1) data cell
addr(A-1) a
word word
e d(M-1) enable enable
clk
rw data
en
rw to all cells
word 0
rdata(N-1) rdata(N-2) rdata0 enable

• “Static” RAM cell SRAM cell


– 6 transistors (recall inverter is 2 transistors) data data’
1 0
– Writing this cell d
• word enable input comes from decoder a

• When 0, value d loops around inverters 1 0

– That loop is where a bit stays stored


• When 1, the data bit value enters the loop word 1
– data is the bit to be stored in this cell enable
– data’ enters on other side
data data’
– Example shows a “1” being written into cell cell
d d’
1 0 a
Digital Design
Copyright © 2006 64
Frank Vahid word 0
enable

32
Static RAM (SRAM)
wdata(N-1) wdata(N-2) wdata0
32 Let A = log2 M
data word bit storage
10 enable block ,,
,,
addr d0 (aka cell )
1024x32 addr0 a0 word
rw RAM addr1 a1 A ⋅ M

addr
d1
decoder
en a(A-1) data cell
addr(A-1)
word word
e d(M-1) enable enable
clk
rw data
en
rw to all cells
SRAM cell
• “Static” RAM cell rdata(N-1) rdata(N-2) rdata0

data data’
– Reading this cell 1 1
• Somewhat trickier d
• When rw set to read, the RAM logic sets
1 0
both data and data’ to 1
a
• The stored bit d will pull either the left line or
the right bit down slightly below 1 1 1 <1
word
• “Sense amplifiers” detect which side is enable
slightly pulled down To sense amplifiers

– The electrical description of SRAM is really


beyond our scope – just general idea here,
mainly to contrast with DRAM...
Digital Design
Copyright © 2006 65
Frank Vahid

Dynamic RAM (DRAM)


wdata(N-1) wdata(N-2) wdata0
32 Let A = log2 M
data word bit storage
10 enable block ,,
,,
addr d0 (aka cell )
1024x32 addr0 a0 word
rw RAM addr1 a1 A ⋅ M
addr

d1
decoder
en a(A-1) data cell
addr(A-1)
word word
e d(M-1) enable enable
clk
rw data
en
rw to all cells DRAM cell
data
• “Dynamic” RAM cell rdata(N-1) rdata(N-2) rdata0

cell
– 1 transistor (rather than 6)
word
– Relies on large capacitor to store bit enable
d
capacitor
• Write: Transistor conducts, data voltage slowly
level gets stored on top plate of capacitor discharging

• Read: Just look at value of d (a)


• Problem: Capacitor discharges over time
data
– Must “refresh” regularly, by reading d and
then writing it right back enable

d discharges
(b)
Digital Design
Copyright © 2006 66
Frank Vahid

33
Comparing Memory Types
• Register file MxN Memory
– Fastest implemented as a:

– But biggest size register


file
• SRAM
– Fast SRAM
– More compact than register file DRAM
• DRAM
– Slowest
• And refreshing takes time
Size comparison for same
– But very compact
number of bits (not to scale)
• Use register file for small items,
SRAM for large items, and DRAM
for huge items
– Note: DRAM’s big capacitor requires
a special chip design process, so
DRAM is often a separate chip
Digital Design
Copyright © 2006 67
Frank Vahid

Reading and Writing a RAM


clk clk
1 2 3
addr 9 13 9 addr valid setup
time
data 500 999 Z 500 data valid hold Z 500
time
rw 1 means write setup
rw
time
en access
RAM[9] RAM[13] time
now equals 500 now equals 999
• Writing (b)
– Put address on addr lines, data on data lines, set rw=1, en=1
• Reading
– Set addr and en lines, but put nothing (Z) on data lines, set rw=0
– Data will appear on data lines
• Don’t forget to obey setup and hold times
– In short – keep inputs stable before and after a clock edge
Digital Design
Copyright © 2006 68
Frank Vahid

34
RAM Example: Digital Sound Recorder
4096⋅ 16
RAM

data
addr
rw
en
wire 16
analog-to- digital-to-
digital 12 analog
ad_buf Ra Rrw Ren wire
microphone converter converter
ad_ld processor da_ld

• Behavior speaker

– Record: Digitize sound, store as series of 4096 12-bit digital values in RAM
• We’ll use a 4096x16 RAM (12-bit wide RAM not common)
– Play back later
– Common behavior in telephone answering machine, toys, voice recorders
• To record, processor should read a-to-d, store read values into
successive RAM words
– To play, processor should read successive RAM words and enable d-to-a
Digital Design
Copyright © 2006 69
Frank Vahid

RAM Example: Digital Sound Recorder


4096x16
• RTL design of processor RAM

– Create high-level state


machine 16
analog-to- digital-to-
– Begin with the record behavior digital ad_buf
12
Ra Rw Ren analog
converter converter
– Keep local register a ad_ld processor da_ld

• Stores current address,


ranges from 0 to 4095 (thus
Record behavior
need 12 bits)
Local register: a (12 bits)
– Create state machine that a<4095
counts from 0 to 4095 using a S T
• For each a a=0 ad_ld=1 a

– Read analog-to-digital conv. ad_buf=1


Ra=a U
» ad_ld=1, ad_buf=1 Rrw=1 a=a+1
– Write to RAM at address a Ren=1

» Ra=a, Rrw=1, Ren=1 a=4095

Digital Design
Copyright © 2006 70
Frank Vahid

35
RAM Example: Digital Sound Recorder
4096x16
– Now create play behavior RAM data bus
– Use local register a again,
create state machine that 16
counts from 0 to 4095 again analog-to-
digital 12
digital-to-
analog
ad_buf Ra Rw Ren
• For each a converter converter
ad_ld processor da_ld
– Read RAM
– Write to digital-to-analog conv.
• Note: Must write d-to-a one Play behavior
cycle after reading RAM, when
Local register: a (12 bits)
the read data is available on
the data bus a<4095
V W
– The record and play state a=0
a
ad_buf=0
machines would be parts of a Ra=a
X
larger state machine controlled Rrw=0
Ren=1
by signals that determine when da_ld=1
a=a+1
to record or play
a=4095

Digital Design
Copyright © 2006 71
Frank Vahid

Read-Only Memory – ROM


• Memory that can only be read from, not 32
data
10
written to addr
1024× 32
– Data lines are output only rw RAM
– No need for rw input en

• Advantages over RAM


RAM block symbol
– Compact: May be smaller
– Nonvolatile: Saves bits even if power supply
is turned off 32
– Speed: May be faster (especially than data
10
DRAM) addr 1024x32
ROM
– Low power: Doesn’t need power supply to
en
save bits, so can extend battery life
• Choose ROM over RAM if stored data won’t ROM block symbol
change (or won’t change often)
– For example, a table of Celsius to Fahrenheit
conversions in a digital thermometer
Digital Design
Copyright © 2006 72
Frank Vahid

36
Read-Only Memory – ROM
32
data
10 1024x32
addr Let A = log2M
ROM
en bit storage
word
enable block
ROM block symbol d0 (aka “cell”)
addr0 a0 word
addr1 a1 AxM
d1
r
d
a
decoder
addr(A-1) a(A-1) data
word word
e d(M-1) enable enable
clk
en data

rdata(N-1) rdata(N-2) rdata0 ROM cell

• Internal logical structure similar to RAM, without the data


input lines

Digital Design
Copyright © 2006 73
Frank Vahid

ROM Types
• If a ROM can only be read, how Let A = log2 M
word
enable
bit storage
block ,,

are the stored bits stored in the


,,
d0 (a cell )
addr0 a0 word
addr1 a1 A ⋅ M
addr

d1

first place?
decoder
data
addr(A-1) a(A-1) cell
word word
e d(M-1) enable enable

– Storing bits in a ROM known as


data
en

programming
data(N-1) data(N-2) data0

– Several methods
• Mask-programmed ROM 1 data line 0 data line

– Bits are hardwired as 0s or 1s cell cell


during chip manufacturing word
• 2-bit word on right stores “10” enable
• word enable (from decoder) simply
passes the hardwired value
through transistor
– Notice how compact, and fast, this
memory would be
Digital Design
Copyright © 2006 74
Frank Vahid

37
ROM Types
• Fuse-Based Programmable Let A = log2 M
word
enable
bit storage
block ,,
,,
d0 (a cell )

ROM addr0
addr1
a0
a1 A ⋅ M
word

addr
d1
decoder
data
addr(A-1) a(A-1) cell

– Each cell has a fuse e d(M-1)


word word
enable enable
data
en

– A special device, known as a data(N-1) data(N-2) data0

programmer, blows certain fuses


(using higher-than-normal voltage)
1 data line 1 data line
• Those cells will be read as 0s
(involving some special electronics) cell cell
• Cells with unblown fuses will be read word
a

as 1s enable

• 2-bit word on right stores “10”


fuse blown fuse
– Also known as One-Time
Programmable (OTP) ROM

Digital Design
Copyright © 2006 75
Frank Vahid

ROM Types
• Erasable Programmable ROM Let A = log2 M
word bit storage
enable block ,,

(EPROM) addr0
addr1
a0
a1 A ⋅ M
d0
,,
(a cell )
word
addr

d1
decoder
– Uses “floating-gate transistor” in each cell addr(A-1) a(A-1)
data

word word
cell

e d(M-1) enable enable

– Special programmer device uses higher- en


data

than-normal voltage to cause electrons to data(N-1) data(N-2) data0

tunnel into the gate


floating-gate

• Electrons become trapped in the gate data line data line


transistor

• Only done for cells that should store 0 cell cell


• Other cells (without electrons trapped in r
o

t
1 0
gate) will be 1 word
n
ti
g

eÐeÐ
– 2-bit word on right stores “10” enable
a

t
e

• Details beyond our scope – just general


a

trapped electrons
idea is necessary here
– To erase, shine ultraviolet light onto chip
• Gives trapped electrons energy to escape
• Requires chip package to have window
Digital Design
Copyright © 2006 76
Frank Vahid

38
ROM Types
• Electronically-Erasable Programmable ROM
(EEPROM)
– Similar to EPROM
• Uses floating-gate transistor, electronic programming to
trap electrons in certain cells
– But erasing done electronically, not using UV light
– Erasing done one word at a time
• Flash memory
– Like EEPROM, but all words (or large blocks of
words) can be erased simultaneously r
o

32
data
– Become common relatively recently (late 1990s) n
ti

a
g

10
t
e

addr
• Both types are in-system programmable
t

en 1024x32
– Can be programmed with new stored bits while in the EEPROM
system in which the ROM operates write
• Requires bi-directional data lines, and write control input busy
• Also need busy output to indicate that erasing is in
progress – erasing takes some time
Digital Design
Copyright © 2006 77
Frank Vahid

ROM Example: Talking Doll


“Hello there!” audio
divided into 4096 speaker
“Hello there!”

4096x16 ROM
samples, stored
in ROM “Hello there!”
a

16 a
digital-to-
analog vibration
Ra Ren converter sensor

da_ld
processor

• Doll plays prerecorded message, trigger by vibration


– Message must be stored without power supply Æ Use a ROM, not a RAM,
because ROM is nonvolatile
• And because message will never change, use a mask-programmed ROM or
OTP ROM
– Processor should wait for vibration (v=1), then read words 0 to 4095 from
the ROM, writing each to the d-to-a
Digital Design
Copyright © 2006 78
Frank Vahid

39
ROM Example: Talking Doll
Local register: a (12 bits)
4096x16 ROM
v a<4095
a=0 S T
a

16 Ra=a
digital-to- Ren=1
analog U a

Ra Ren converter v’
da_ld=1
da_ld a=4095 a=a+1
processor
v

• High-level state machine


– Create state machine that waits for v=1, and then counts from 0 to
4095 using a local register a
– For each a, read ROM, write to digital-to-analog converter
Digital Design
Copyright © 2006 79
Frank Vahid

ROM Example: Digital Telephone Answering Machine


Using a Flash Memory
• Want to record the outgoing
announcement 4096x16 Flash
– When rec=1, record digitized “We’re not home.”
sound in locations 0 to 4095
y
s
u
b

– When play=1, play those


stored sounds to digital-to- 16
analog converter analog-to-
digital 12 digital-to-
ad_buf Ra Rrw Ren er bu
• What type of memory? converter analog
ad_ld processor converter
– Should store without power da_ld
supply – ROM, not RAM
– Should be in-system rec
programmable – EEPROM record play
or Flash, not EPROM, OTP microphone speaker
ROM, or mask-programmed
ROM
– Will always erase entire
memory when
reprogramming – Flash
better than EEPROM

Digital Design
Copyright © 2006 80
Frank Vahid

40
ROM Example: Digital Telephone Answering Machine
Using a Flash Memory
• High-level state machine 4096x16 Flash

– Once rec=1, begin


erasing flash by setting
a

d r n
e

16
er=1 analog-to-
digital 12 digital-to-
ad_buf Ra Rrw Ren er bu
converter analog
– Wait for flash to finish ad_ld processor converter
da_ld
erasing by waiting for
rec
bu=0 record play

– Execute loop that sets microphone speaker

local register a from 0 to


4095, reading analog-to- Local register: a (13 bits)
bu
digital converter and a<4096 a
writing to flash for each a S T bu’ U
a=0 er=0 ad_ld=1
er=1 ad_buf=1
Ra=a V
rec
Rrw=1
Ren=1
a=a+1 a=4096

Digital Design
Copyright © 2006 81
Frank Vahid

Blurring of Distinction Between ROM and RAM


• We said that
– RAM is readable and writable ROM Flash RAM
a
EEPROM NVRAM
– ROM is read-only
• But some ROMs act almost like RAMs
– EEPROM and Flash are in-system programmable
• Essentially means that writes are slow
– Also, number of writes may be limited (perhaps a few million times)
• And, some RAMs act almost like ROMs
– Non-volatile RAMs: Can save their data without the power supply
• One type: Built-in battery, may work for up to 10 years
• Another type: Includes ROM backup for RAM – controller writes RAM contents to
ROM before turning off
• New memory technologies evolving that merge RAM and ROM benefits
– e.g., MRAM
• Bottom line
– Lot of choices available to designer, must find best fit with design goals
Digital Design
Copyright © 2006 82
Frank Vahid

41
5.7

Queues
• A queue is another component
sometimes used during RTL
back front
design
• Queue: A list written to at the
back, from read from the front
– Like a list of waiting restaurant
customers write items read (and
to the back remove) items
• Writing called a push, reading of the queue from front of
called a pop the queue

• Because first item written into a


queue will be the first item read
out, also called a FIFO (first-in-
first-out)

Digital Design
Copyright © 2006 83
Frank Vahid

Queues
7 6 5 4 3 2 1 0
• Queue has addresses, and two
pointers: rear and front
– Initially both point to 0
• Push (write) rf
7 6 5 4 3 2 1 0
– Item written to address pointed to
by rear A
A a
– rear incremented
• Pop (read) r f
– Item read from address pointed 7 6 5 4 3 2 1 0
to by front
– front incremented B B A a

• If front or rear reaches 7, next


r f
(incremented) value should be 0 7 6 5 4 3 2 1 0
(for a queue with addresses 0 to a

7) B A

Digital Design r f
Copyright © 2006 84
Frank Vahid

42
Queues
• Treat memory as a circle 7 6 5 4 3 2 1 0
– If front or rear reaches 7, next (incremented)
value should be 0 rather than 8 (for a queue B A
with addresses 0 to 7)
r f
• Two conditions of interest
– Full queue – no room for more items 0
• In 8-entry queue, means 8 items present
1 7 a
• No further pushes allowed until a pop occurs
• Causes front=rear B
– Empty queue – no items
f
• No pops allowed until a push occurs
2 r
r

6
• Causes front=rear
– Both conditions have front=rear
• To detect whether front=rear means full or
empty, need state machine that detects if
previous operation was push or pop, sets full 3 5
or empty output signal (respectively) 4
Digital Design
Copyright © 2006 85
Frank Vahid

Queue Implementation
• Can use register file for 16
8⋅ 16 register file
16
wdata rdata
item storage wdata rdata

3 3
• Implement rear and front waddr raddr
using up counters wr rd
– rear used as register file’s
clr
write address, front as read wr clr
address inc
inc
rd 3-bit 3-bit
• Simple controller would up counter up counter
Controller

set control lines for rear front


reset
pushes and pops, and
also detect full and empty eq
=
situations full

– FSM for controller not empty


shown 8-word 16-bit queue

Digital Design
Copyright © 2006 86
Frank Vahid

43
Common Uses of a Queue
• Computer keyboard
– Pushes pressed keys onto queue, meanwhile pops and sends to
computer
• Digital video recorder
– Pushes captured frames, meanwhile pops frames, compresses
them, and stores them
• Computer network routers
– Pushes incoming packets onto queue, meanwhile pops packets,
processes destination information, and forwards each packet out
over appropriate port

Digital Design
Copyright © 2006 87
Frank Vahid

Queue Usage Example


7 6 5 4 3 2 1 0

• Example series of pushes Initially empty


and pops queue
rf
– Note how rear and front 7 6 5 4 3 2 1 0

pointers move 1. After pushing 3 2 7 5 8 5 9


9, 5, 8, 5, 7, 2, 3
– Note that popping doesn’t
r f
really remove the data from the 7 6 5 4 3 2 1 0
queue, but that data is no data:
2. After popping 3 2 7 5 8 5 9
longer accessible 9

– Note how rear (and front) r


7 6 5 4 3 2
f
1 0
wraps around from address 7
to 0 3. After pushing 6 6 3 2 7 5 8 5 9

• Note: pushing a full queue is 7 6 5 4 3 2


f
1
r
0
an error
4. After pushing 3 6 3 2 7 5 8 5 3 full
– As is popping an empty queue
rf

5. After pushing 4 ERROR! Pushing a full queue


Digital Design results in unknown state
Copyright © 2006 88
Frank Vahid

44
5.8

Hierarchy – A Key Design Concept


CityA CityD

Province 2
• Hierarchy

Province 3
Province 1
CityF
– An organization with a few items at the CityB
top, with each item decomposed into other CityE
items CityC
CityG
– Common example: A country Country A

• 1 item at the top (the country) Map showing all levels of hierarchy
• Country item decomposed into
state/province items
• Each state/province item decomposed into

Province 2

Province 3
Province 1
city items
• Hierarchy helps us manage complexity P

o
P

o
P

– To go from transistors to gates, muxes,


P P

n
i
v n
i
v

n
i
v

r r

o o

c c

n
i
v n
i
v

decoders, registers, ALUs, controllers,


o

2
e

1
e

Country A
n
vi

c c

datapaths, memories, queues, etc.


2
e 3
e

1
e

Map showing just top two levels


– Imagine trying to comprehend a controller
and datapath at the level of gates of hierarchy

Digital Design
Copyright © 2006 89
Frank Vahid

Hierarchy and Abstraction

• Abstraction
– Hierarchy often involves not just grouping
items into a new item, but also associating
higher-level behavior with the new item,
known as abstraction
• e.g., an 8-bit adder has an understandable a7.. a0 b7.. b0
high-level behavior – it adds two 8-bit binary
numbers 8-bit adder ci
– Frees designer from having to remember,
co s7.. s0
or even from having to understand, the P

r
P

r
P

lower-level details
o o

P P

n
i
v n
i
v

n
i
v

r r

o o

c c

n
i
v n
i
v

2
e 3
e

1
e

n
vi

c c

2
e 3
e

1
e

Digital Design
Copyright © 2006 90
Frank Vahid

45
Hierarchy and Composing Larger Components
from Smaller Versions
4⋅ 1
• A common task is to compose smaller components i0 i0
into a larger one i1 i1 a
– Gates: Suppose you have plenty of 3-input AND gates, i2 i2 d
but need a 9-input AND gate
i3 i3
• Can simple compose the 9-input gate from several 3-input
gates 2⋅ 1
s1 s0 i0
– Muxes: Suppose you have 4x1 and 2x1 muxes, but
need an 8x1 mux d
4⋅ 1 i1
• s2 selects either top or bottom 4x1
• s1s0 select particular 4x1 input i4 i0 s0
• Implements 8x1 mux – 8 data inputs, 3 selects, one output i5 i1
i6 P

r
P

r
i2 P

r
d
o o

P
P

r
P

r
n
i
v
n
i
v
i7 i3
n
i
v

o o

c c

n
i
v n
i
v

2
e 3
e

1
e

n
vi

c c

1
e
2
e 3
e

s1 s0

s1 s0 s2

Digital Design
Copyright © 2006 91
Frank Vahid

Hierarchy and Composing Larger Components


from Smaller Versions
• Composing memory very common
• Making memory words wider
– Easy – just place memories side-by-side until desired width obtained
– Share address/control lines, concatenate data lines
– Example: Compose 1024x8 ROMs into 1024x32 ROM
10
addr addr addr addr
r
d
a

1024x8 1024x8 1024x8 1024x8


ROM ROM ROM ROM
en en en en
data data data data
n
e

P P

8 8 8 8
r r

o o

P P

n
i
v

r r

o o

data(31..0)
r

n
i
v n
i
v

3
e

n
vi

c c

2
e 3
e

10
c

1
e

r
d
a
1024x32
ROM
n
e

data
Digital Design
Copyright © 2006 32
92
Frank Vahid

46
Hierarchy and Composing Larger Components
from Smaller Versions
11
a9..a0
• Creating memory with more words r
d
a
addr
– Put memories on top of one another until the a10 1x2 d0 1024x8
number of desired words is achieved i0 dcd ROM
– Use decoder to select among the memories e d1 en data
• Can use highest order address input(s) as 8
decoder input n
e

• Although actually, any address line could be addr


used 1024x8
11 ROM
– Example: Compose 1024x8 memories into
2048x8 memory r
d
a
2048x8 en data
ROM
a10a9a8 a0 n
e
8
data
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 addr P
P

r
8 P

o o

a 0 0 0 0 0 0 0 0 0 1 0
P

1024x8
P

n
i
v n
i
v

ROM
r r

o o

c c

n
i
v n
i
v

2
e 3
e

0 1 1 1 1 1 1 1 1 1 0
n
vi

a10 just chooses c

en data
c

2
e 3
e

a
0 1 1 1 1 1 1 1 1 1 1
c

To create memory with more


1
e

which memory
to access 1 0 0 0 0 0 0 0 0 0 0 words and wider words, can first
1 0 0 0 0 0 0 0 0 0 1 addr
compose to enough words, then
1 0 0 0 0 0 0 0 0 1 0 1024x8
ROM
widen.
Digital Design
Copyright © 2006 1 1 1 1 1 1 1 1 1 1 0 en data 93
Frank Vahid 1 1 1 1 1 1 1 1 1 1 1

Chapter Summary
– Modern digital design involves creating processor-level components
– Four-step RTL method can be used
• 1. High-level state machine 2. Create datapath 3. Connect datapath
to controller 4. Derive controller FSM
– Several example
• Control dominated, data dominated, and mix
– Determining fastest clock frequency
• By finding critical path
– Behavioral-level design – C to gates
• By using method to convert C (subset) to high-level state machine
– Additional RTL components
• Memory: RAM, ROM
• Queues
– Hierarchy: A key concept used throughout Chapters 2-5
Digital Design
Copyright © 2006 94
Frank Vahid

47

You might also like