
# CHAPTER 2

Custom single-purpose processors
Outline
Introduction

Combinational logic

Sequential logic

design

Introduction
•Processor
–But, high NRE, longer time-to-market, less flexible.

CMOS transistor on silicon
•Transistor
–Conducts from source to drain if gate=1
–Don’t confuse this “gate” with a logic gate.
[Figure: an IC package and the IC inside it; cross-section of a transistor showing the gate, oxide, source, channel, and drain on a silicon substrate.]
CMOS transistor implementations
•Complementary Metal Oxide Semiconductor
•Typically 0 is 0V, 1 is 5V
•Two basic CMOS types
–nMOS conducts if gate=1
–pMOS conducts if gate=0
–Hence “complementary”
•Basic gates
–Inverter, NAND, NOR
[Figure: transistor-level circuits for the inverter (F = x'), the NAND gate (F = (xy)'), and the NOR gate (F = (x+y)').]
Basic logic gates

One-input gates:

| x | Driver: F = x | Inverter: F = x' |
|---|---|---|
| 0 | 0 | 1 |
| 1 | 1 | 0 |

Two-input gates:

| x | y | AND: F = xy | OR: F = x + y | XOR: F = x ⊕ y | NAND: F = (xy)' | NOR: F = (x+y)' | XNOR: F = (x ⊕ y)' |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
| 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 |
| 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 |
| 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
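Combinational logic design

A) Problem description
y is 1 if a is 1, or b and c are 1. z is 1 if b or c is 1, but not both, or if all are 1.

B) Truth table

| a | b | c | y | z |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 | 1 |
| 0 | 1 | 0 | 0 | 1 |
| 0 | 1 | 1 | 1 | 0 |
| 1 | 0 | 0 | 1 | 0 |
| 1 | 0 | 1 | 1 | 1 |
| 1 | 1 | 0 | 1 | 1 |
| 1 | 1 | 1 | 1 | 1 |

C) Output equations
y = a'bc + ab'c' + ab'c + abc' + abc
z = a'b'c + a'bc' + ab'c + abc' + abc

D) Minimized output equations

K-map for y:

| a \ bc | 00 | 01 | 11 | 10 |
|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 0 |
| 1 | 1 | 1 | 1 | 1 |

y = a + bc

K-map for z:

| a \ bc | 00 | 01 | 11 | 10 |
|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 1 |
| 1 | 0 | 1 | 1 | 1 |

z = ab + b'c + bc'

E) Logic gates
[Figure: gate-level circuit with inputs a, b, and c implementing y = a + bc and z = ab + b'c + bc'.]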
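One way to gain confidence in a minimization step is to compare the minimized equations against the sum-of-minterms equations on every input row. The following is a minimal sketch of that check in C; the function names (`y_full`, `z_min`, `count_mismatches`, and so on) are illustrative and do not come from the slides.

```c
#include <stdbool.h>

/* Sum-of-minterms forms, read directly off the truth table. */
bool y_full(bool a, bool b, bool c) {
    return (!a && b && c) || (a && !b && !c) || (a && !b && c) ||
           (a && b && !c) || (a && b && c);
}

bool z_full(bool a, bool b, bool c) {
    return (!a && !b && c) || (!a && b && !c) || (a && !b && c) ||
           (a && b && !c) || (a && b && c);
}

/* Minimized forms: y = a + bc, z = ab + b'c + bc'. */
bool y_min(bool a, bool b, bool c) { return a || (b && c); }
bool z_min(bool a, bool b, bool c) { return (a && b) || (!b && c) || (b && !c); }

/* Counts the (row, output) pairs on which the two forms disagree. */
int count_mismatches(void) {
    int mismatches = 0;
    for (int row = 0; row < 8; row++) {
        bool a = (row >> 2) & 1, b = (row >> 1) & 1, c = row & 1;
        if (y_full(a, b, c) != y_min(a, b, c)) mismatches++;
        if (z_full(a, b, c) != z_min(a, b, c)) mismatches++;
    }
    return mismatches;
}
```

With only three inputs the check is exhaustive, so a result of zero means the minimized equations implement the same truth table.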
RT-Level Combinational Components
A multiplexor allows only one of its data inputs Im to pass through to the output O.
A decoder converts its binary input I into a one-hot output O. A common feature on a decoder is an extra input called enable. When enable is 0, all outputs are 0. When enable is 1, the decoder functions as before.
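The behavior of these two components can be sketched in a few lines of C. This is only a behavioral model under assumed sizes (a 4-input multiplexor and a 2-to-4 decoder); the names `mux4` and `decode2to4` are hypothetical.

```c
#include <stdint.h>

/* 4x1 multiplexor: the select value (0..3) picks which data input
 * reaches the output O. */
uint8_t mux4(const uint8_t inputs[4], unsigned sel) {
    return inputs[sel & 3u];
}

/* 2-to-4 decoder with enable: returns a one-hot 4-bit output in which
 * bit i is set, or all zeros when enable is 0. */
uint8_t decode2to4(unsigned i, unsigned enable) {
    if (!enable) {
        return 0;
    }
    return (uint8_t)(1u << (i & 3u));
}
```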
RT-Level Combinational Components (cont.)
An adder adds two n-bit binary inputs A and B, generating an n-bit output sum along with an output carry.
A comparator compares two n-bit binary inputs A and B, generating outputs that indicate whether A is less than, equal to, or greater than B.
An ALU (arithmetic-logic unit) can perform a variety of arithmetic and logic functions on its n-bit inputs A and B.
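As a rough behavioral sketch, the three components can be modeled in C for an assumed width of n = 4 bits. The type and function names, and the particular set of ALU operations, are illustrative choices rather than anything fixed by the slides.

```c
#define N_BITS 4u
#define N_MASK ((1u << N_BITS) - 1u)

typedef struct { unsigned sum; unsigned carry; } AddResult;
typedef struct { unsigned lt, eq, gt; } CmpResult;
typedef enum { ALU_ADD, ALU_SUB, ALU_AND, ALU_OR } AluOp;

/* n-bit adder: n-bit sum plus a one-bit carry out. */
AddResult add_n(unsigned a, unsigned b) {
    unsigned total = (a & N_MASK) + (b & N_MASK);
    AddResult r = { total & N_MASK, (total >> N_BITS) & 1u };
    return r;
}

/* n-bit comparator: exactly one of lt, eq, gt is 1. */
CmpResult compare_n(unsigned a, unsigned b) {
    a &= N_MASK;
    b &= N_MASK;
    CmpResult r = { a < b, a == b, a > b };
    return r;
}

/* ALU: the op input selects which function is applied to A and B;
 * the result is truncated to n bits. */
unsigned alu_n(AluOp op, unsigned a, unsigned b) {
    switch (op) {
    case ALU_ADD: return (a + b) & N_MASK;
    case ALU_SUB: return (a - b) & N_MASK;
    case ALU_AND: return (a & b) & N_MASK;
    case ALU_OR:  return (a | b) & N_MASK;
    }
    return 0;
}
```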
Sequential logic design
A sequential circuit is a digital circuit whose outputs are a function of the current as well as previous input values.
One of the most basic sequential circuits is the flip-flop.
The simplest type of flip-flop is the D flip-flop. It has two inputs: D and clock.
When clock is 1, the value of D is stored in the flip-flop, and that value appears at an output Q.

Sequential logic design (cont.)
The SR flip-flop has three inputs: S, R and clock.
When clock is 0, the previously stored bit is maintained and appears at output Q.
When clock is 1, the inputs S and R are examined. If S is 1, a 1 is stored. If R is 1, a 0 is stored.
If both are 0, there’s no change. If both are 1, behavior is undefined. Thus, S stands for set, and R for reset.
Sequential logic design (cont.)
The JK flip-flop is the same as an SR flip-flop except that when both J and K are 1, the stored bit toggles from 1 to 0 or 0 to 1.
To prevent unexpected behavior from signal glitches, flip-flops are typically designed to be edge triggered.
They only pay attention to their non-clock inputs when the clock is rising from 0 to 1, or alternatively when the clock is falling from 1 to 0.
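The edge-triggered behavior can be illustrated with a small C model. This sketch assumes rising-edge triggering and models only the D and JK flip-flops; an SR flip-flop would behave like `jk_ff` except that S = R = 1 is undefined. The `FlipFlop` type and function names are hypothetical.

```c
#include <stdbool.h>

typedef struct {
    bool q;         /* stored bit, visible at output Q */
    bool prev_clk;  /* previous clock level, used to detect edges */
} FlipFlop;

/* Returns true only when the clock goes from 0 to 1. */
bool rising_edge(FlipFlop *ff, bool clk) {
    bool edge = clk && !ff->prev_clk;
    ff->prev_clk = clk;
    return edge;
}

/* D flip-flop: D is captured on the rising edge and held otherwise. */
bool d_ff(FlipFlop *ff, bool clk, bool d) {
    if (rising_edge(ff, clk)) {
        ff->q = d;
    }
    return ff->q;
}

/* JK flip-flop: J sets, K resets, J = K = 1 toggles, J = K = 0 holds. */
bool jk_ff(FlipFlop *ff, bool clk, bool j, bool k) {
    if (rising_edge(ff, clk)) {
        if (j && k)  ff->q = !ff->q;
        else if (j)  ff->q = true;
        else if (k)  ff->q = false;
    }
    return ff->q;
}
```

Because the inputs are examined only on the edge, a change on D while the clock stays high has no effect, which is the glitch protection described above.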
RT-Level Sequential Components
•A register stores n bits from its n-bit data input I, with those stored bits appearing at its output O.
•A register usually has at least two control inputs, clock and load.
•Another common register control input is clear, which resets all bits to 0, regardless of the value of I.

RT-Level Sequential Components (cont.)
Because all n bits of the register can be stored in parallel, we often refer to this type of register as a parallel-load register.
A shift register stores n bits, but these bits cannot be stored in parallel. Instead, they must be shifted into the register serially, meaning one bit per clock edge.
A shift register has a one-bit data input I, and at least two control inputs, clock and shift.

RT-Level Sequential Components (cont.)
When shift is 1 at a clock edge, the value of I is stored in the n'th bit, while the n'th bit is stored in the (n-1)'th bit, and likewise, until the second bit is stored in the first bit.
The first bit is typically shifted out, meaning it appears at an output Q.
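The difference between a parallel-load register and a shift register can be sketched in C as follows. The model assumes n = 8 bits, treats each function call as one clock edge, and gives clear priority over load; those choices and all names are assumptions made for illustration.

```c
#include <stdint.h>

#define REG_BITS 8u

typedef struct { uint8_t bits; } Reg;

/* Parallel-load register, one clock edge: clear wins regardless of I,
 * otherwise load stores all n bits of I at once. */
void reg_clock(Reg *r, unsigned load, unsigned clear, uint8_t i) {
    if (clear) {
        r->bits = 0;
    } else if (load) {
        r->bits = i;
    }
}

/* Shift register, one clock edge: I enters at the n'th (highest) bit,
 * every bit moves one position down, and the first (lowest) bit is
 * shifted out. Returns Q, the value of the first bit before the edge. */
unsigned shift_clock(Reg *r, unsigned shift, unsigned i) {
    unsigned q = r->bits & 1u;
    if (shift) {
        r->bits = (uint8_t)((r->bits >> 1) | ((i & 1u) << (REG_BITS - 1u)));
    }
    return q;
}
```

A bit shifted in at the top therefore needs n clock edges to appear at Q, whereas the parallel-load register stores a whole value in a single edge.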

RT-Level Sequential Components (cont.)
A counter is a register that can also increment (add binary 1 to) its stored binary value.
A counter has a clear input, which resets all stored bits to 0, and a count input, which enables incrementing on the clock edge.
A counter often also has a parallel load data input and associated control signal.

RT-Level Sequential Components (cont.)
•A counter may also count in both directions (incrementing and decrementing), requiring an additional control input to indicate the count direction.
•The control inputs discussed above can be either synchronous or asynchronous. A synchronous input’s value only has an effect during a clock edge.
•An asynchronous input’s value affects the circuit independent of the clock. Typically, clear control lines are asynchronous.
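The distinction between synchronous and asynchronous control inputs can be made concrete with a small counter model in C. This is a sketch under assumptions: an 8-bit up/down counter in which load takes priority over count, with the asynchronous clear modeled as a separate function that can be called at any time. The names are hypothetical.

```c
#include <stdint.h>

typedef struct { uint8_t value; } Counter;

/* Asynchronous clear: takes effect immediately, independent of the clock. */
void counter_async_clear(Counter *c) {
    c->value = 0;
}

/* Synchronous inputs: examined only when this function is called,
 * which stands for one clock edge. */
void counter_clock_edge(Counter *c, unsigned load, uint8_t i,
                        unsigned count, unsigned down) {
    if (load) {
        c->value = i;                       /* parallel load */
    } else if (count) {
        c->value = (uint8_t)(down ? c->value - 1 : c->value + 1);
    }
}
```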

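Sequential logic design

A) Problem Description
You want to construct a clock divider. Slow down an existing clock so that you output a 1 for every four clock cycles.

B) State Diagram
Four states, 0 through 3, with output x=0 in states 0, 1, and 2 and x=1 in state 3. From each state, a=1 advances to the next state (state 3 returns to state 0) and a=0 stays in the current state.

C) Implementation Model
A state register holds Q1 Q0. Combinational logic takes a, Q1, and Q0 as inputs and produces the output x and the next-state values I1 I0, which feed the state register.

D) State Table (Moore-type)

| Q1 | Q0 | a | I1 | I0 | x |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 | 1 | 0 |
| 0 | 1 | 0 | 0 | 1 | 0 |
| 0 | 1 | 1 | 1 | 0 | 0 |
| 1 | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 1 | 1 | 1 | 0 |
| 1 | 1 | 0 | 1 | 1 | 1 |
| 1 | 1 | 1 | 0 | 0 | 1 |

•Given this implementation model
–Sequential logic design quickly reduces to combinational logic design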

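Sequential logic design (cont.)

E) Minimized Output Equations

K-map for I1:

| a \ Q1Q0 | 00 | 01 | 11 | 10 |
|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 1 |
| 1 | 0 | 1 | 0 | 1 |

I1 = Q1'Q0a + Q1a' + Q1Q0'

K-map for I0:

| a \ Q1Q0 | 00 | 01 | 11 | 10 |
|---|---|---|---|---|
| 0 | 0 | 1 | 1 | 0 |
| 1 | 1 | 0 | 0 | 1 |

I0 = Q0a' + Q0'a

K-map for x:

| a \ Q1Q0 | 00 | 01 | 11 | 10 |
|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 0 |
| 1 | 0 | 0 | 1 | 0 |

x = Q1Q0

F) Combinational Logic
[Figure: gate-level circuit with inputs a, Q1, and Q0 producing x, I1, and I0.]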
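The implementation model (a state register plus combinational logic) maps directly onto a few lines of C. The sketch below evaluates the minimized equations I1 = Q1'Q0a + Q1a' + Q1Q0', I0 = Q0a' + Q0'a, and x = Q1Q0 once per clock edge; the `DividerState` type and function names are illustrative.

```c
typedef struct { unsigned q1, q0; } DividerState;

/* Moore output: depends only on the current state. x = Q1Q0 */
unsigned divider_output(const DividerState *s) {
    return (s->q1 && s->q0) ? 1u : 0u;
}

/* One clock edge: the combinational logic computes I1 and I0 from the
 * current state and input a, then the state register stores them. */
void divider_clock_edge(DividerState *s, unsigned a) {
    unsigned q1 = s->q1, q0 = s->q0;
    unsigned i1 = (!q1 && q0 && a) || (q1 && !a) || (q1 && !q0);
    unsigned i0 = (q0 && !a) || (!q0 && a);
    s->q1 = i1;
    s->q0 = i0;
}
```

Holding a at 1 steps the state through 00, 01, 10, 11 and back to 00, so x is 1 during one cycle out of every four; holding a at 0 freezes the state.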
Custom single-purpose processor design
We can apply the above combinational and sequential logic design techniques to build data path components and controllers.
These are what we need to build a custom single-purpose processor for a given program, since a processor consists of a controller and a data path.
The data path stores and manipulates a system’s data. It contains register units, functional units, and connection units like wires and multiplexors.
The controller sets the data path control inputs, like register load and multiplexor select signals, of the register units, functional units and connection units to obtain the desired configuration at a particular time.
It monitors external control inputs as well as data path control outputs, known as status signals, coming from functional units, and sets external control outputs.
[Figure, left: controller and datapath. The controller receives external control inputs and datapath control outputs, and produces external control outputs and datapath control inputs. The datapath receives external data inputs and produces external data outputs.]
[Figure, right: a view inside the controller and datapath. The controller contains next-state and control logic and a state register; the datapath contains registers and functional units.]
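Example: greatest common divisor
•First create algorithm
•Convert algorithm to a state machine
–Known as FSMD: finite-state machine with data path.
–Can use templates to perform such conversion.

(a) black-box view
Inputs go_i, x_i, and y_i enter the GCD block; its output is d_o.

(b) desired functionality
0: int x, y;
1: while (1) {
2:    while (!go_i);
3:    x = x_i;
4:    y = y_i;
5:    while (x != y) {
6:       if (x < y)
7:          y = y - x;
         else
8:          x = x - y;
      }
9:    d_o = x;
   }

(c) state diagram
States are named after the program line numbers; a "-J" suffix marks a join state.
1: goes to 2: on 1 and to 1-J: on !1
2: goes to 3: on !(!go_i) and to 2-J: on !go_i
2-J: returns to 2:
3: x = x_i, then 4:
4: y = y_i, then 5:
5: goes to 6: on x!=y and to 9: on !(x!=y)
6: goes to 7: on x<y and to 8: on !(x<y)
7: y = y - x, then 6-J:
8: x = x - y, then 6-J:
6-J: goes to 5-J:
5-J: returns to 5:
9: d_o = x, then 1-J:
1-J: returns to 1: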
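The state diagram can be exercised in software by treating each state as a case in a switch statement. The following C sketch is one possible rendering, not the slides' own code: it assumes go_i is held at 1, assumes both inputs are positive (the subtraction loop does not terminate if one input is 0 and the other is not), and stops after one pass rather than looping back to state 1. The `steps` counter is added purely for illustration.

```c
/* States named after the numbered program lines; "J" marks a join state. */
typedef enum { S1, S2, S2J, S3, S4, S5, S6, S7, S8, S6J, S5J, S9, S1J } GcdState;

/* Runs the FSMD through one GCD computation. Returns d_o and stores
 * the number of states visited in *steps. */
unsigned gcd_fsmd(unsigned x_i, unsigned y_i, unsigned *steps) {
    const int go_i = 1;                 /* assumption: go_i already asserted */
    unsigned x = 0, y = 0, d_o = 0;
    GcdState s = S1;
    *steps = 0;
    for (;;) {
        (*steps)++;
        switch (s) {
        case S1:  s = S2; break;                    /* while (1) */
        case S2:  s = go_i ? S3 : S2J; break;       /* while (!go_i) */
        case S2J: s = S2; break;
        case S3:  x = x_i; s = S4; break;
        case S4:  y = y_i; s = S5; break;
        case S5:  s = (x != y) ? S6 : S9; break;    /* while (x != y) */
        case S6:  s = (x < y) ? S7 : S8; break;     /* if (x < y) */
        case S7:  y = y - x; s = S6J; break;
        case S8:  x = x - y; s = S6J; break;
        case S6J: s = S5J; break;
        case S5J: s = S5; break;
        case S9:  d_o = x; s = S1J; break;
        case S1J: return d_o;                       /* stop after one pass */
        }
    }
}
```

Even for equal inputs the machine visits seven states (1, 2, 3, 4, 5, 9, 1-J), which hints at how much room the unoptimized FSMD leaves for merging states.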
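State diagram templates
•Assignment statement
   a = b
   next statement
–One state performs a = b and then transitions to the next statement.
•Loop statement
   while (cond) {
      loop-body-statements
   }
   next statement
–A condition state C: transitions on cond to the loop-body-statements and on !cond to the next statement. The loop body ends in a join state J:, which returns to C:.
•Branch statement
   if (c1)
      c1 stmts
   else if c2
      c2 stmts
   else
      other stmts
   next statement
–A condition state C: transitions on c1 to the c1 stmts, on !c1*c2 to the c2 stmts, and on !c1*!c2 to the others. All branches meet at a join state J:, which transitions to the next statement.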
Creating the data path
•Create a register for any declared variable.
•Create a functional unit for each arithmetic operation.
•Connect the ports, registers and functional units.
–Use multiplexors for multiple sources
•Create a unique identifier for each data path component control input and output.
[Figure: the GCD FSMD beside its datapath. Inputs x_i and y_i each feed an n-bit 2x1 multiplexor (select lines x_sel and y_sel), whose outputs load the registers 0: x and 0: y (load lines x_ld and y_ld). The registers feed a != comparator (5: x!=y, output x_neq_y), a < comparator (6: x<y, output x_lt_y), and two subtractors (8: x-y and 7: y-x) whose results return to the multiplexors. Register 9: d (load line d_ld) drives the output d_o.]
Creating the controller’s FSM
•Same structure as FSMD
•Replace complex actions/conditions with data path configurations.

| State code | State | Data path configuration | Transition conditions |
|---|---|---|---|
| 0000 | 1: | | 1, !1 |
| 0001 | 2: | | !(!go_i), !go_i |
| 0010 | 2-J: | | |
| 0011 | 3: | x_sel = 0, x_ld = 1 | |
| 0100 | 4: | y_sel = 0, y_ld = 1 | |
| 0101 | 5: | | x_neq_y, !x_neq_y |
| 0110 | 6: | | x_lt_y, !x_lt_y |
| 0111 | 7: | y_sel = 1, y_ld = 1 | |
| 1000 | 8: | x_sel = 1, x_ld = 1 | |
| 1001 | 6-J: | | |
| 1010 | 5-J: | | |
| 1011 | 9: | d_ld = 1 | |
| 1100 | 1-J: | | |

[Figure: the FSMD, the controller FSM derived from it, and the datapath shown side by side.]
Splitting into a controller and data path
(a) Controller implementation model: combinational logic takes go_i, the status signals x_neq_y and x_lt_y, and the current state Q3 Q2 Q1 Q0, and produces the data path control signals x_sel, y_sel, x_ld, y_ld, and d_ld together with the next state I3 I2 I1 I0, which feeds the state register.
(b) Datapath: the multiplexors, registers x, y, and d, comparators, and subtractors, connected to the controller only through x_sel, y_sel, x_ld, y_ld, d_ld, x_neq_y, and x_lt_y.
In the controller FSM, the transition conditions are now written as status-signal values: x_neq_y=0 and x_neq_y=1 out of state 5 (0101), and x_lt_y=1 and x_lt_y=0 out of state 6 (0110).
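The split can be mimicked in C by letting a controller function see only go_i and the status signals, and a datapath function see only the control signals. This is a behavioral sketch, not a gate-level design; it uses the state codes from the controller FSM (0000 through 1100), assumes positive inputs and go_i held at 1, and stops when state 1-J is reached. The struct and function names are hypothetical.

```c
#include <stdbool.h>

typedef struct { unsigned x, y, d; } Datapath;
typedef struct { bool x_sel, y_sel, x_ld, y_ld, d_ld; } Control;

/* Datapath, one clock edge: multiplexors choose each register's source,
 * load signals decide whether the register stores it. */
void datapath_clock(Datapath *dp, Control c, unsigned x_i, unsigned y_i) {
    unsigned x_next = c.x_sel ? dp->x - dp->y : x_i;  /* 1: subtractor x-y */
    unsigned y_next = c.y_sel ? dp->y - dp->x : y_i;  /* 1: subtractor y-x */
    if (c.d_ld) dp->d = dp->x;
    if (c.x_ld) dp->x = x_next;
    if (c.y_ld) dp->y = y_next;
}

/* Controller: returns the control signals for the current state and
 * advances *state to the next state code. */
Control controller_step(unsigned *state, bool go_i, bool x_neq_y, bool x_lt_y) {
    Control c = { false, false, false, false, false };
    switch (*state) {
    case 0x0: *state = 0x1; break;                           /* 1:   */
    case 0x1: *state = go_i ? 0x3 : 0x2; break;              /* 2:   */
    case 0x2: *state = 0x1; break;                           /* 2-J: */
    case 0x3: c.x_sel = false; c.x_ld = true; *state = 0x4; break;
    case 0x4: c.y_sel = false; c.y_ld = true; *state = 0x5; break;
    case 0x5: *state = x_neq_y ? 0x6 : 0xB; break;           /* 5:   */
    case 0x6: *state = x_lt_y ? 0x7 : 0x8; break;            /* 6:   */
    case 0x7: c.y_sel = true; c.y_ld = true; *state = 0x9; break;
    case 0x8: c.x_sel = true; c.x_ld = true; *state = 0x9; break;
    case 0x9: *state = 0xA; break;                           /* 6-J: */
    case 0xA: *state = 0x5; break;                           /* 5-J: */
    case 0xB: c.d_ld = true; *state = 0xC; break;            /* 9:   */
    default:  *state = 0x0; break;                           /* 1-J: */
    }
    return c;
}

/* Runs controller and datapath together until state 1-J (1100). */
unsigned gcd_run(unsigned x_i, unsigned y_i) {
    Datapath dp = { 0, 0, 0 };
    unsigned state = 0x0;
    do {
        bool x_neq_y = dp.x != dp.y;    /* status signals from the datapath */
        bool x_lt_y  = dp.x < dp.y;
        Control c = controller_step(&state, true, x_neq_y, x_lt_y);
        datapath_clock(&dp, c, x_i, y_i);
    } while (state != 0xC);
    return dp.d;
}
```

The controller never touches x or y directly, which is the point of the split: its next-state and output logic depends only on a handful of one-bit signals and can be designed as ordinary combinational logic.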
Completing the GCD custom single-purpose processor design
•We finished the data path
•We have a state table for the next state and control logic
–All that’s left is combinational logic design
•This is not an optimized design, but we see the basic steps
[Figure: a view inside the controller and datapath. The controller contains next-state and control logic and a state register; the datapath contains registers and functional units.]
Controller state table for the GCD Example
RT-level custom single-purpose processor design
•We often start with a state machine
–Rather than algorithm
–Cycle timing often too central to functionality
•Example
–Bus bridge that converts 4-bit bus to 8-bit bus
–Known as register-transfer (RT) level
–Exercise: complete the design

Problem specification
Sender → Bridge → Receiver. The Bridge has inputs rdy_in, clock, and data_in(4), and outputs rdy_out and data_out(8). It is a single-purpose processor that converts two 4-bit inputs, arriving one at a time over data_in along with a rdy_in pulse, into one 8-bit output on data_out along with a rdy_out pulse.

FSMD (Bridge)
Inputs
   rdy_in: bit; data_in: bit[4];
Outputs
   rdy_out: bit; data_out: bit[8]
Variables
   data_lo, data_hi: bit[4];

| State | Action | Next state |
|---|---|---|
| WaitFirst4 | | rdy_in=0: WaitFirst4; rdy_in=1: RecFirst4Start |
| RecFirst4Start | data_lo=data_in | RecFirst4End |
| RecFirst4End | | rdy_in=1: RecFirst4End; rdy_in=0: WaitSecond4 |
| WaitSecond4 | | rdy_in=0: WaitSecond4; rdy_in=1: RecSecond4Start |
| RecSecond4Start | data_hi=data_in | RecSecond4End |
| RecSecond4End | | rdy_in=1: RecSecond4End; rdy_in=0: Send8Start |
| Send8Start | data_out=data_hi & data_lo, rdy_out=1 | Send8End |
| Send8End | rdy_out=0 | WaitFirst4 |
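The Bridge FSMD can be simulated in C with one function call per clock cycle. This is a sketch under two assumptions: each state's action takes effect on the clock edge that leaves the state, and the `&` in `data_hi & data_lo` denotes concatenation (as in VHDL), which is written here as a shift and OR. The `Bridge` struct and enumerator names are illustrative.

```c
#include <stdint.h>

typedef enum {
    WAIT_FIRST4, REC_FIRST4_START, REC_FIRST4_END,
    WAIT_SECOND4, REC_SECOND4_START, REC_SECOND4_END,
    SEND8_START, SEND8_END
} BridgeState;

typedef struct {
    BridgeState state;
    uint8_t data_lo, data_hi;   /* 4-bit variables */
    uint8_t data_out;           /* 8-bit output */
    unsigned rdy_out;
} Bridge;

/* One clock cycle of the Bridge FSMD. */
void bridge_clock(Bridge *b, unsigned rdy_in, uint8_t data_in) {
    switch (b->state) {
    case WAIT_FIRST4:
        if (rdy_in) b->state = REC_FIRST4_START;
        break;
    case REC_FIRST4_START:
        b->data_lo = data_in & 0xF;
        b->state = REC_FIRST4_END;
        break;
    case REC_FIRST4_END:
        if (!rdy_in) b->state = WAIT_SECOND4;
        break;
    case WAIT_SECOND4:
        if (rdy_in) b->state = REC_SECOND4_START;
        break;
    case REC_SECOND4_START:
        b->data_hi = data_in & 0xF;
        b->state = REC_SECOND4_END;
        break;
    case REC_SECOND4_END:
        if (!rdy_in) b->state = SEND8_START;
        break;
    case SEND8_START:
        /* Concatenate: data_hi in the upper nibble, data_lo in the lower. */
        b->data_out = (uint8_t)((b->data_hi << 4) | b->data_lo);
        b->rdy_out = 1;
        b->state = SEND8_END;
        break;
    case SEND8_END:
        b->rdy_out = 0;
        b->state = WAIT_FIRST4;
        break;
    }
}
```

The separate Start and End states make the machine wait for each rdy_in pulse to fall before looking for the next one, which is why cycle timing, rather than an algorithm, is the natural starting point for this design.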

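RT-level custom single-purpose processor design (cont’)
(a) Controller
The controller FSM for the Bridge has the states WaitFirst4, RecFirst4Start, RecFirst4End, WaitSecond4, RecSecond4Start, RecSecond4End, Send8Start, and Send8End, with transitions on rdy_in=0 and rdy_in=1. Data operations are replaced by data path control signals:

| State | Control outputs |
|---|---|
| RecFirst4Start | data_lo_ld = 1 |
| RecSecond4Start | data_hi_ld = 1 |
| Send8Start | data_out_ld = 1, rdy_out=1 |
| Send8End | rdy_out=0 |

(b) Datapath
[Figure: registers data_hi and data_lo are loaded from data_in(4) under data_hi_ld and data_lo_ld; together they feed the register data_out, loaded under data_out_ld. clk goes to all registers. The controller takes rdy_in and clk and drives rdy_out.]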

Optimizing single-purpose processors
•Optimization is the task of making design metric values the best possible
•Optimization opportunities
–original program
–FSMD
–Data path
–FSM

Optimizing the original program
•Analyze program attributes and look for areas of possible improvement
–number of computations
–size of variable
–operations used
   •multiplication and division very expensive

Optimizing the original program (cont.)

original program
0: int x, y;
1: while (1) {
2:    while (!go_i);
3:    x = x_i;
4:    y = y_i;
5:    while (x != y) {
6:       if (x < y)
7:          y = y - x;
         else
8:          x = x - y;
      }
9:    d_o = x;
   }

optimized program
0: int x, y, r;
1: while (1) {
2:    while (!go_i);
      // x must be the larger number
3:    if (x_i >= y_i) {
4:       x = x_i;
5:       y = y_i;
      }
6:    else {
7:       x = y_i;
8:       y = x_i;
      }
9:    while (y != 0) {
10:      r = x % y;
11:      x = y;
12:      y = r;
      }
13:   d_o = x;
   }

replace the subtraction operation(s) with modulo operation in order to speed up program

original program: GCD(42, 8) - 9 iterations to complete the loop
x and y values evaluated as follows: (42,8), (34,8), (26,8), (18,8), (10,8), (2,8), (2,6), (2,4), (2,2).

optimized program: GCD(42, 8) - 3 iterations to complete the loop
x and y values evaluated as follows: (42,8), (8,2), (2,0).
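The effect of the modulo substitution can be measured with a small C comparison. The sketch below wraps the two inner loops in hypothetical functions `gcd_subtract` and `gcd_modulo` and counts loop-body executions, which is one fewer than the number of (x, y) pairs listed for each program because the final pair only fails the loop test. Positive inputs are assumed for the subtraction version.

```c
/* Original inner loop: repeated subtraction. */
unsigned gcd_subtract(unsigned x, unsigned y, unsigned *iters) {
    *iters = 0;
    while (x != y) {
        if (x < y) y = y - x;
        else       x = x - y;
        (*iters)++;
    }
    return x;
}

/* Optimized inner loop: modulo, with x arranged to be the larger number. */
unsigned gcd_modulo(unsigned x_i, unsigned y_i, unsigned *iters) {
    unsigned x, y, r;
    if (x_i >= y_i) { x = x_i; y = y_i; }
    else            { x = y_i; y = x_i; }
    *iters = 0;
    while (y != 0) {
        r = x % y;
        x = y;
        y = r;
        (*iters)++;
    }
    return x;
}
```

For GCD(42, 8) the subtraction loop body runs 8 times and the modulo loop body runs twice. Whether that is a net win in hardware depends on the cost of the modulo unit, since division-like operations are expensive.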

Optimizing the FSMD
•Areas of possible improvements
–merge states
   •states with constants on transitions can be eliminated, transition taken is already known
   •states with independent operations can be merged
–separate states
   •states which require complex operations (a*b*c*d) can be broken into smaller states to reduce hardware size
–scheduling

Optimizing the FSMD (cont.)

original FSMD
int x, y; with states 1:, 2:, 2-J:, 3:, 4:, 5:, 6:, 7:, 8:, 6-J:, 5-J:, 9:, and 1-J:.

•eliminate state 1 – transitions have constant values
•merge state 2 and state 2J – no loop operation in between them
•merge state 3 and state 4 – assignment operations are independent of one another
•merge state 5 and state 6 – transitions from state 6 can be done in state 5
•eliminate state 5J and 6J – transitions from each state can be done from state 7 and state 8, respectively
•eliminate state 1-J – transition from state 1-J can be done directly from state 9

optimized FSMD
int x, y;

| State | Action | Next state |
|---|---|---|
| 2: | | !go_i: 2:; go_i: 3: |
| 3: | x = x_i, y = y_i | 5: |
| 5: | | x<y: 7:; x>y: 8:; otherwise: 9: |
| 7: | y = y - x | 5: |
| 8: | x = x - y | 5: |
| 9: | d_o = x | 2: |
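A C rendering of the optimized FSMD makes the savings visible as a state count. This sketch assumes go_i is held at 1, positive inputs, and a single pass that stops at state 9 instead of returning to state 2; the `OptState` type and `steps` counter are illustrative additions.

```c
typedef enum { OS2, OS3, OS5, OS7, OS8, OS9 } OptState;

/* Runs the optimized FSMD through one GCD computation. Returns d_o and
 * stores the number of states visited in *steps. */
unsigned gcd_fsmd_opt(unsigned x_i, unsigned y_i, unsigned *steps) {
    const int go_i = 1;                 /* assumption: go_i already asserted */
    unsigned x = 0, y = 0;
    OptState s = OS2;
    *steps = 0;
    for (;;) {
        (*steps)++;
        switch (s) {
        case OS2: if (go_i) s = OS3; break;            /* merged 2 and 2-J */
        case OS3: x = x_i; y = y_i; s = OS5; break;    /* merged 3 and 4 */
        case OS5:                                      /* merged 5 and 6 */
            s = (x < y) ? OS7 : (x > y) ? OS8 : OS9;
            break;
        case OS7: y = y - x; s = OS5; break;           /* straight back to 5 */
        case OS8: x = x - y; s = OS5; break;
        case OS9: return x;                            /* d_o = x */
        }
    }
}
```

For equal inputs this machine visits 4 states (2, 3, 5, 9), and for GCD(42, 8) it visits 20, since each subtraction now costs two states (5 plus 7 or 8) instead of five.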

Optimizing the data path
•Sharing of functional units
–one-to-one mapping, as done previously, is not necessary
–if same operation occurs in different states, they can share a single functional unit
•Multi-functional units
–ALUs support a variety of operations and can be shared among operations occurring in different states
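In the GCD data path, the two subtractors (x-y in state 8 and y-x in state 7) are never used in the same state, so they are candidates for sharing. The C sketch below models one possible arrangement: a single subtractor whose operands pass through multiplexors controlled by a hypothetical `swap` select signal. It illustrates the idea only and is not a design taken from the slides.

```c
/* Single shared subtractor. swap = 0 computes x - y (state 8);
 * swap = 1 computes y - x (state 7). swap stands in for the select
 * lines of the two operand multiplexors. */
unsigned shared_sub(unsigned x, unsigned y, unsigned swap) {
    unsigned a = swap ? y : x;
    unsigned b = swap ? x : y;
    return a - b;
}

/* GCD loop using only the shared subtractor (positive inputs assumed). */
unsigned gcd_shared(unsigned x, unsigned y) {
    while (x != y) {
        if (x < y) y = shared_sub(x, y, 1);
        else       x = shared_sub(x, y, 0);
    }
    return x;
}
```

The trade is one subtractor saved for two extra multiplexors and one more control signal from the controller.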

Summary
Custom single-purpose processors