
CHAPTER 2

Custom single-purpose processors
Outline
Introduction

Combinational logic

Sequential logic

Custom single-purpose processor design

RT-level custom single-purpose processor design


Introduction
•Processor

–Digital circuit that performs computation tasks

–Controller and data path.

–General-purpose: variety of computation tasks.

–Single-purpose: one particular computation task.

–Custom single-purpose: non-standard task.

•A custom single-purpose processor may be

–Fast, small, low power

–But, high NRE, longer time-to-market, less flexible.


CMOS transistor on silicon
•Transistor
–The basic electrical component in digital systems
–Acts as an on/off switch
–Voltage at “gate” controls whether current flows from source to drain
–Don’t confuse this “gate” with a logic gate.

[Figure: IC package and IC; cross-section of a transistor on the silicon substrate showing source, gate, oxide, channel, and drain; transistor symbol that conducts if gate=1]
CMOS transistor implementations
•Complementary Metal Oxide Semiconductor
•We refer to logic levels
–Typically 0 is 0V, 1 is 5V
•Two basic CMOS types
–nMOS conducts if gate=1
–pMOS conducts if gate=0
–Hence “complementary”
•Basic gates
–Inverter, NAND, NOR

[Figure: nMOS and pMOS transistor symbols; CMOS circuits for the inverter, F = x'; the NAND gate, F = (xy)'; and the NOR gate, F = (x+y)']
Basic logic gates

Single-input gates
x | Driver  | Inverter
  | F = x   | F = x'
0 |    0    |    1
1 |    1    |    0

Two-input gates
x y | AND     | OR        | XOR       | NAND       | NOR        | XNOR
    | F = x y | F = x + y | F = x ⊕ y | F = (x y)' | F = (x+y)' | F = x ⊙ y
0 0 |    0    |     0     |     0     |     1      |     1      |     1
0 1 |    0    |     1     |     1     |     1      |     0      |     0
1 0 |    0    |     1     |     1     |     1      |     0      |     0
1 1 |    1    |     1     |     0     |     0      |     0      |     1
Combinational logic design

A) Problem description
y is 1 if a is 1, or b and c are 1. z is 1 if b or c is 1, but not both, or if all are 1.

B) Truth table
Inputs | Outputs
a b c  | y z
0 0 0  | 0 0
0 0 1  | 0 1
0 1 0  | 0 1
0 1 1  | 1 0
1 0 0  | 1 0
1 0 1  | 1 1
1 1 0  | 1 1
1 1 1  | 1 1

C) Output equations
y = a'bc + ab'c' + ab'c + abc' + abc
z = a'b'c + a'bc' + ab'c + abc' + abc

D) Minimized output equations
Karnaugh map for y
a \ bc | 00 01 11 10
0      |  0  0  1  0
1      |  1  1  1  1
y = a + bc

Karnaugh map for z
a \ bc | 00 01 11 10
0      |  0  1  0  1
1      |  0  1  1  1
z = ab + b'c + bc'

E) Logic gates
[Figure: gate-level circuit with inputs a, b, c and outputs y and z implementing the minimized equations]
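The minimization step can be checked by exhaustively comparing the sum-of-minterms equations with the minimized ones. The following is a minimal sketch; the function names are illustrative and not part of the original slides.

```c
#include <stdbool.h>

/* Sum-of-minterms forms, read from the truth table rows where the output is 1. */
bool y_minterms(bool a, bool b, bool c) {
    return (!a && b && c) || (a && !b && !c) || (a && !b && c) ||
           (a && b && !c) || (a && b && c);
}

bool z_minterms(bool a, bool b, bool c) {
    return (!a && !b && c) || (!a && b && !c) || (a && !b && c) ||
           (a && b && !c) || (a && b && c);
}

/* Minimized forms taken from the Karnaugh maps. */
bool y_min(bool a, bool b, bool c) { return a || (b && c); }
bool z_min(bool a, bool b, bool c) { return (a && b) || (!b && c) || (b && !c); }

/* Returns true if both forms agree on all eight input combinations. */
bool minimization_is_equivalent(void) {
    for (int i = 0; i < 8; i++) {
        bool a = (i >> 2) & 1, b = (i >> 1) & 1, c = i & 1;
        if (y_minterms(a, b, c) != y_min(a, b, c)) return false;
        if (z_minterms(a, b, c) != z_min(a, b, c)) return false;
    }
    return true;
}
```

Calling minimization_is_equivalent() returns true when the minimized equations match the truth table on every row.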
RT-Level Combinational Components

A multiplexor, sometimes called a selector, allows only one of its data inputs Im to pass through to the output O.

A decoder converts its binary input I into a one-hot output O. A common feature on a decoder is an extra input called enable. When enable is 0, all outputs are 0. When enable is 1, the decoder functions as before.
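As a behavioral illustration of these two components, here is a small sketch of a 4x1 multiplexor and a 2-to-4 decoder with enable. The sizes and the names mux4 and decode2to4 are assumptions chosen for the example.

```c
#include <stdint.h>

/* 4x1 multiplexor: passes the data input selected by s (0..3) to the output. */
uint8_t mux4(const uint8_t in[4], unsigned s) {
    return in[s & 3u];
}

/* 2-to-4 decoder with enable: one-hot output, or all zeros when enable is 0. */
uint8_t decode2to4(unsigned i, int enable) {
    return enable ? (uint8_t)(1u << (i & 3u)) : 0;
}
```

For example, decode2to4(2, 1) sets only output bit 2, while decode2to4(2, 0) returns 0 regardless of the input.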
RT-Level Combinational Components

An adder adds two n-bit binary inputs A and B, generating an n-bit output sum along with an output carry.

A comparator compares two n-bit binary inputs A and B, generating outputs that indicate whether A is less than, equal to, or greater than B.

An ALU (arithmetic-logic unit) can perform a variety of arithmetic and logic functions on its n-bit inputs A and B.
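A behavioral sketch of these components for n = 8 follows. The type and function names, and the particular set of four ALU operations, are assumptions made for illustration; a real ALU may offer a different operation set.

```c
#include <stdint.h>

typedef struct { uint8_t sum; int carry; } AddResult;

/* 8-bit adder: 8-bit sum plus a carry output. */
AddResult add8(uint8_t a, uint8_t b) {
    unsigned wide = (unsigned)a + (unsigned)b;
    AddResult r = { (uint8_t)(wide & 0xFFu), (int)((wide >> 8) & 1u) };
    return r;
}

typedef struct { int lt, eq, gt; } CmpResult;

/* 8-bit comparator: exactly one of lt, eq, gt is 1. */
CmpResult compare8(uint8_t a, uint8_t b) {
    CmpResult r = { a < b, a == b, a > b };
    return r;
}

/* Hypothetical operation set for the ALU sketch. */
typedef enum { ALU_ADD, ALU_SUB, ALU_AND, ALU_OR } AluOp;

/* 8-bit ALU: the op input selects which function is applied to a and b. */
uint8_t alu8(AluOp op, uint8_t a, uint8_t b) {
    switch (op) {
    case ALU_ADD: return (uint8_t)(a + b);
    case ALU_SUB: return (uint8_t)(a - b);
    case ALU_AND: return a & b;
    case ALU_OR:  return a | b;
    }
    return 0;
}
```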
Sequential logic design
A sequential circuit is a digital circuit whose outputs are a function of the current as well as previous input values.

One of the most basic sequential circuits is the flip-flop.

The simplest type of flip-flop is the D flip-flop. It has two inputs: D and clock.

When clock is 1, the value of D is stored in the flip-flop, and that value appears at an output Q.


Sequential logic design
The SR flip-flop has three inputs: S, R, and clock.

When clock is 0, the previously stored bit is maintained and appears at output Q.

When clock is 1, the inputs S and R are examined. If S is 1, a 1 is stored. If R is 1, a 0 is stored.

If both are 0, there’s no change. If both are 1, behavior is undefined. Thus, S stands for set, and R for reset.
Sequential logic design
The JK flip-flop is the same as an SR flip-flop except that when both J and K are 1, the stored bit toggles from 1 to 0 or 0 to 1.

To prevent unexpected behavior from signal glitches, flip-flops are typically designed to be edge triggered.

They only pay attention to their non-clock inputs when the clock is rising from 0 to 1, or alternatively when the clock is falling from 1 to 0.
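The flip-flop behaviors described above can be summarized as next-state functions. This is a behavioral sketch only; the names are illustrative, and returning -1 for the undefined S = R = 1 case is an arbitrary choice made here to flag it.

```c
/* Next stored bit for a D flip-flop at an active clock edge. */
int d_next(int q, int d) { (void)q; return d; }

/* SR: S=1 sets, R=1 resets, both 0 holds. S=R=1 is undefined in hardware;
   this sketch returns -1 to flag that case. */
int sr_next(int q, int s, int r) {
    if (s && r) return -1;
    if (s) return 1;
    if (r) return 0;
    return q;
}

/* JK: like SR, except that J=K=1 toggles the stored bit. */
int jk_next(int q, int j, int k) {
    if (j && k) return !q;
    if (j) return 1;
    if (k) return 0;
    return q;
}

/* Rising-edge-triggered D flip-flop: D is sampled only on a 0-to-1 clock change. */
typedef struct { int q; int prev_clk; } DFlipFlop;

void dff_tick(DFlipFlop *ff, int clk, int d) {
    if (!ff->prev_clk && clk) ff->q = d;
    ff->prev_clk = clk;
}
```

Calling dff_tick repeatedly with clk held at 1 leaves q unchanged after the first call, which is the point of edge triggering: changes on D while the clock stays high are ignored.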
RT-Level Sequential Components
•A register stores n bits from its n-bit data input I, with those stored bits appearing at its output O.
•A register usually has at least two control inputs, clock and load.
•For a rising-edge-triggered register, the inputs I are only stored when load is 1 and clock is rising from 0 to 1.
•The clock input is usually drawn as a small triangle, as shown in the figure. Another common register control input is clear, which resets all bits to 0, regardless of the value of I.



RT-Level Sequential Components
Because all n bits of the register can be stored in parallel, we often refer to this type of register as a parallel-load register.

A shift register stores n bits, but these bits cannot be stored in parallel. Instead, they must be shifted into the register serially, meaning one bit per clock edge.

A shift register has a one-bit data input I, and at least two control inputs, clock and shift.


RT-Level Sequential Components

When clock is rising and shift is 1, the value of I is stored in the n-th bit, while the previous value of the n-th bit is stored in the (n-1)-th bit, and likewise, until the second bit is stored in the first bit.

The first bit is typically shifted out, meaning it appears at an output Q.
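The shifting rule can be sketched for n = 8 as follows, taking bit 7 of a byte as the n-th bit and bit 0 as the first bit. The names ShiftReg8 and shiftreg_clock are assumptions for this example.

```c
#include <stdint.h>

/* 8-bit shift register: bit 7 is the "n-th" bit, bit 0 is the first bit. */
typedef struct { uint8_t bits; } ShiftReg8;

/* Models one rising clock edge. When shift is 1, input i enters the n-th bit
   and every other bit moves one position toward the first bit.
   Returns the value on output Q (the first bit) after the edge. */
int shiftreg_clock(ShiftReg8 *r, int shift, int i) {
    if (shift)
        r->bits = (uint8_t)((r->bits >> 1) | ((i & 1) << 7));
    return r->bits & 1;
}
```

A 1 shifted in on one edge reaches output Q after seven further shifting edges, that is, on the eighth edge overall.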



RT-Level Sequential Components
A counter is a register that can also increment (add binary 1 to) its stored binary value.

A counter has a clear input, which resets all stored bits to 0, and a count input, which enables incrementing on the clock edge.

A counter often also has a parallel load data input and associated control signal.


RT-Level Sequential Components
•A common counter feature is both up and down counting (incrementing and decrementing), requiring an additional control input to indicate the count direction.
•The control inputs discussed above can be either synchronous or asynchronous. A synchronous input’s value only has an effect during a clock edge.
•An asynchronous input’s value affects the circuit independent of the clock. Typically, clear control lines are asynchronous.
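The difference between synchronous and asynchronous control inputs can be sketched with an 8-bit up/down counter. The names are illustrative, and giving load priority over count is an assumption of this sketch rather than something the slides specify.

```c
#include <stdint.h>

typedef struct { uint8_t value; } Counter8;

/* Asynchronous clear: takes effect immediately, independent of the clock. */
void counter_clear(Counter8 *c) { c->value = 0; }

/* Synchronous inputs: evaluated only when this clock-edge function is called.
   load stores data in parallel; otherwise count enables counting,
   and up selects the direction (1 = increment, 0 = decrement). */
void counter_clock(Counter8 *c, int load, uint8_t data, int count, int up) {
    if (load)
        c->value = data;
    else if (count)
        c->value = (uint8_t)(up ? c->value + 1 : c->value - 1);
}
```

Because the stored value is 8 bits wide, decrementing 0 wraps around to 255 in this model.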

Sequential logic design

A) Problem Description
You want to construct a clock divider. Slow down your pre-existing clock so that you output a 1 for every four clock cycles.

B) State Diagram
[Figure: four states 0, 1, 2, 3. Output x=0 in states 0, 1, and 2; x=1 in state 3. In every state, a=1 advances to the next state (3 wraps to 0) and a=0 stays in the current state.]

C) Implementation Model
[Figure: combinational logic with inputs a, Q1, Q0 and outputs x, I1, I0; a state register that stores I1, I0 and outputs Q1, Q0]

D) State Table (Moore-type)
Inputs   | Outputs
Q1 Q0 a  | I1 I0 | x
0  0  0  | 0  0  | 0
0  0  1  | 0  1  |
0  1  0  | 0  1  | 0
0  1  1  | 1  0  |
1  0  0  | 1  0  | 0
1  0  1  | 1  1  |
1  1  0  | 1  1  | 1
1  1  1  | 0  0  |

•Given this implementation model
–Sequential logic design quickly reduces to combinational logic design

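Sequential logic design

E) Minimized Output Equations
Karnaugh map for I1
a \ Q1Q0 | 00 01 11 10
0        |  0  0  1  1
1        |  0  1  0  1
I1 = Q1'Q0a + Q1a' + Q1Q0'

Karnaugh map for I0
a \ Q1Q0 | 00 01 11 10
0        |  0  1  1  0
1        |  1  0  0  1
I0 = Q0a' + Q0'a

Karnaugh map for x
a \ Q1Q0 | 00 01 11 10
0        |  0  0  1  0
1        |  0  0  1  0
x = Q1Q0

F) Combinational Logic
[Figure: gate-level circuit with inputs a, Q1, Q0 and outputs x, I1, I0 implementing the minimized equations]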
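The minimized equations can be exercised in software by treating them as the next-state and output logic around a two-bit state register. The following is a sketch with illustrative names; it assumes the register starts at Q1Q0 = 00.

```c
typedef struct { int q1, q0; } DividerState;

/* Moore-type output: x depends only on the current state. x = Q1Q0 */
int divider_output(DividerState s) { return s.q1 && s.q0; }

/* Next-state logic from the minimized equations:
   I1 = Q1'Q0a + Q1a' + Q1Q0'   and   I0 = Q0a' + Q0'a */
DividerState divider_next(DividerState s, int a) {
    DividerState n;
    n.q1 = (!s.q1 && s.q0 && a) || (s.q1 && !a) || (s.q1 && !s.q0);
    n.q0 = (s.q0 && !a) || (!s.q0 && a);
    return n;
}

/* Clocks the divider for the given number of cycles with a held at 1
   and returns the number of cycles in which x was 1. */
int divider_count_pulses(int cycles) {
    DividerState s = {0, 0};
    int pulses = 0;
    for (int i = 0; i < cycles; i++) {
        pulses += divider_output(s);
        s = divider_next(s, 1);
    }
    return pulses;
}
```

With a held at 1, the state sequence is 00, 01, 10, 11, 00, and so on, so x is 1 in one cycle out of every four.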
Custom single-purpose processor design
We can apply the above combinational and sequential logic design techniques to build data path components and controllers. Since a processor consists of a controller and a data path, these are the techniques we need to build a custom single-purpose processor for a given program.

The data path stores and manipulates a system’s data. It contains register units, functional units, and connection units like wires and multiplexors.

The controller sets the data path control inputs, like register load and multiplexor select signals, of the register units, functional units, and connection units to obtain the desired configuration at a particular time.

It monitors external control inputs as well as data path control outputs, known as status signals, coming from functional units, and it sets external control outputs.
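[Figure, left: controller and datapath. The controller has external control inputs and external control outputs; the datapath has external data inputs and external data outputs. Datapath control inputs run from the controller to the datapath, and datapath control outputs run from the datapath to the controller.]
[Figure, right: a view inside the controller and datapath. The controller contains next-state and control logic plus a state register; the datapath contains registers and functional units.]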


Example: greatest common divisor
•First create algorithm
•Convert algorithm to “complex” state machine
–Known as FSMD: finite-state machine with data path
–Can use templates to perform such conversion

(a) black-box view
[Figure: GCD block with inputs go_i, x_i, y_i and output d_o]

(b) desired functionality
0: int x, y;
1: while (1) {
2:   while (!go_i);
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }

(c) state diagram
1:   if 1, go to 2; if !1, leave the loop (never taken)
2:   if !go_i, go to 2-J; if !(!go_i), go to 3
2-J: go to 2
3:   x = x_i; go to 4
4:   y = y_i; go to 5
5:   if x!=y, go to 6; if !(x!=y), go to 9
6:   if x<y, go to 7; if !(x<y), go to 8
7:   y = y - x; go to 6-J
8:   x = x - y; go to 6-J
6-J: go to 5-J
5-J: go to 5
9:   d_o = x; go to 1-J
1-J: go to 1
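To see how the state diagram executes the algorithm, the FSMD can be stepped in software with one switch case per state. This is a sketch under stated assumptions: go_i is held at 1, the function returns after state 9 writes d_o instead of looping forever, and the names State and gcd_fsmd are illustrative.

```c
typedef enum { S1, S2, S2J, S3, S4, S5, S6, S7, S8, S6J, S5J, S9, S1J } State;

/* Runs the FSMD from state 1 until d_o is written in state 9.
   Returns d_o; *steps receives the number of states visited.
   Assumes x_i > 0 and y_i > 0, otherwise the subtraction loop never ends. */
unsigned gcd_fsmd(unsigned x_i, unsigned y_i, int *steps) {
    unsigned x = 0, y = 0;
    const int go_i = 1;                 /* assumption: go_i is already asserted */
    State s = S1;
    *steps = 0;
    for (;;) {
        (*steps)++;
        switch (s) {
        case S1:  s = S2; break;                    /* while (1): always taken */
        case S2:  s = !go_i ? S2J : S3; break;      /* while (!go_i); */
        case S2J: s = S2; break;
        case S3:  x = x_i; s = S4; break;
        case S4:  y = y_i; s = S5; break;
        case S5:  s = (x != y) ? S6 : S9; break;    /* while (x != y) */
        case S6:  s = (x < y) ? S7 : S8; break;     /* if (x < y) */
        case S7:  y = y - x; s = S6J; break;
        case S8:  x = x - y; s = S6J; break;
        case S6J: s = S5J; break;
        case S5J: s = S5; break;
        case S9:  return x;                         /* d_o = x */
        case S1J: s = S1; break;                    /* not reached in this sketch */
        }
    }
}
```

The step count makes the cost of the unoptimized diagram visible: each pass through the inner loop visits five states (5, 6, 7 or 8, 6-J, 5-J).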
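State diagram templates

Assignment statement
  a = b
  next statement
Template: a state with action a = b, followed by a transition to the next statement.

Loop statement
  while (cond) {
    loop-body-statements
  }
  next statement
Template: a condition state C. If cond, go to loop-body-statements, then to a join state J, which returns to C. If !cond, go to the next statement.

Branch statement
  if (c1)
    c1 stmts
  else if c2
    c2 stmts
  else
    other stmts
  next statement
Template: a condition state C. If c1, go to c1 stmts; if !c1*c2, go to c2 stmts; if !c1*!c2, go to others. All branches meet at a join state J, which goes to the next statement.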
Creating the data path
•Create a register for any declared variable.
•Create a functional unit for each arithmetic operation.
•Connect the ports, registers and functional units.
–Based on reads and writes
–Use multiplexors for multiple sources
•Create a unique identifier
–For each data path component control input and output
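[Figure, left: the GCD FSMD state diagram, states 1 through 1-J.]
[Figure, right: the datapath. Inputs x_i and y_i feed two n-bit 2x1 multiplexors, selected by x_sel and y_sel. The multiplexors feed registers "0: x" and "0: y", loaded by x_ld and y_ld. The registers feed four functional units: the != comparator "5: x!=y" (output x_neq_y), the < comparator "6: x<y" (output x_lt_y), the subtractor "8: x-y", and the subtractor "7: y-x". The subtractor outputs return to the multiplexors. Register "9: d", loaded by d_ld, stores x and drives output d_o.]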
Creating the controller’s FSM
•Same structure as FSMD
•Replace complex actions/conditions with data path configurations.

FSMD state and action  | Code | Controller FSM state and action
1:                     | 0000 | 1:
2:                     | 0001 | 2:
2-J:                   | 0010 | 2-J:
3: x = x_i             | 0011 | 3: x_sel = 0, x_ld = 1
4: y = y_i             | 0100 | 4: y_sel = 0, y_ld = 1
5: x!=y / !(x!=y)      | 0101 | 5: x_neq_y / !x_neq_y
6: x<y / !(x<y)        | 0110 | 6: x_lt_y / !x_lt_y
7: y = y - x           | 0111 | 7: y_sel = 1, y_ld = 1
8: x = x - y           | 1000 | 8: x_sel = 1, x_ld = 1
6-J:                   | 1001 | 6-J:
5-J:                   | 1010 | 5-J:
9: d_o = x             | 1011 | 9: d_ld = 1
1-J:                   | 1100 | 1-J:

[Figure: the datapath, as before, with control inputs x_sel, y_sel, x_ld, y_ld, d_ld and status outputs x_neq_y, x_lt_y]
Splitting into a controller and data path

Controller implementation model
[Figure: combinational logic with inputs go_i, x_neq_y, x_lt_y and the current state Q3 Q2 Q1 Q0; outputs x_sel, y_sel, x_ld, y_ld, d_ld and the next state I3 I2 I1 I0, which is stored in the state register]

Controller FSM
0000  1:
0001  2:    if !go_i go to 2-J, otherwise go to 3
0010  2-J:
0011  3:    x_sel = 0, x_ld = 1
0100  4:    y_sel = 0, y_ld = 1
0101  5:    if x_neq_y=1 go to 6; if x_neq_y=0 go to 9
0110  6:    if x_lt_y=1 go to 7; if x_lt_y=0 go to 8
0111  7:    y_sel = 1, y_ld = 1
1000  8:    x_sel = 1, x_ld = 1
1001  6-J:
1010  5-J:
1011  9:    d_ld = 1
1100  1-J:

(b) Datapath
[Figure: the datapath, as before, with control inputs x_sel, y_sel, x_ld, y_ld, d_ld and status outputs x_neq_y, x_lt_y]
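The split can be sketched in software as two cooperating pieces: a controller function that maps the current state and status signals to control signals and a next state, and a datapath function that updates the registers on each clock edge. The state codes and signal names follow the slides; the type and function names, holding go_i at 1, and stopping at state 1-J are assumptions of this sketch.

```c
typedef struct { unsigned x, y, d; } Datapath;                  /* registers x, y, d */
typedef struct { int x_sel, y_sel, x_ld, y_ld, d_ld; } Ctrl;    /* controller outputs */

/* One clock edge of the datapath: multiplexors choose each register's source,
   and a register stores a new value only when its load signal is 1. */
void datapath_clock(Datapath *dp, Ctrl c, unsigned x_i, unsigned y_i) {
    unsigned x_sub = dp->x - dp->y;              /* subtractor "8: x-y" */
    unsigned y_sub = dp->y - dp->x;              /* subtractor "7: y-x" */
    unsigned x_new = c.x_sel ? x_sub : x_i;      /* n-bit 2x1 multiplexor for x */
    unsigned y_new = c.y_sel ? y_sub : y_i;      /* n-bit 2x1 multiplexor for y */
    if (c.d_ld) dp->d = dp->x;
    if (c.x_ld) dp->x = x_new;
    if (c.y_ld) dp->y = y_new;
}

/* Controller combinational logic: sets the control outputs for the current
   state and returns the next state code. */
unsigned controller(unsigned state, int go_i, int x_neq_y, int x_lt_y, Ctrl *c) {
    Ctrl none = {0, 0, 0, 0, 0};
    *c = none;
    switch (state) {
    case 0x0: return 0x1;                                   /* 1:   */
    case 0x1: return go_i ? 0x3 : 0x2;                      /* 2:   */
    case 0x2: return 0x1;                                   /* 2-J: */
    case 0x3: c->x_sel = 0; c->x_ld = 1; return 0x4;        /* 3:   */
    case 0x4: c->y_sel = 0; c->y_ld = 1; return 0x5;        /* 4:   */
    case 0x5: return x_neq_y ? 0x6 : 0xB;                   /* 5:   */
    case 0x6: return x_lt_y ? 0x7 : 0x8;                    /* 6:   */
    case 0x7: c->y_sel = 1; c->y_ld = 1; return 0x9;        /* 7:   */
    case 0x8: c->x_sel = 1; c->x_ld = 1; return 0x9;        /* 8:   */
    case 0x9: return 0xA;                                   /* 6-J: */
    case 0xA: return 0x5;                                   /* 5-J: */
    case 0xB: c->d_ld = 1; return 0xC;                      /* 9:   */
    default:  return 0x0;                                   /* 1-J: and unused codes */
    }
}

/* Runs controller and datapath together until state 1-J is reached.
   Assumes go_i = 1 and nonzero inputs; returns the value stored in d. */
unsigned gcd_processor(unsigned x_i, unsigned y_i) {
    Datapath dp = {0, 0, 0};
    unsigned state = 0x0;
    while (state != 0xC) {
        Ctrl c;
        unsigned next = controller(state, 1, dp.x != dp.y, dp.x < dp.y, &c);
        datapath_clock(&dp, c, x_i, y_i);
        state = next;
    }
    return dp.d;
}
```

Note that the controller never touches data values: it sees only go_i and the two status bits, which is what allows it to be implemented as a small FSM.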
Completing the GCD custom single-purpose processor design
•We finished the data path
•We have a state table for the next state and control logic
–All that’s left is combinational logic design
•This is not an optimized design, but we see the basic steps

[Figure: a view inside the controller and datapath. The controller contains next-state and control logic plus a state register; the datapath contains registers and functional units.]
Controller state table for the GCD Example
RT-level custom single-purpose processor design
•We often start with a state machine
–Rather than algorithm
–Cycle timing often too central to functionality
•Example
–Bus bridge that converts 4-bit bus to 8-bit bus
–Start with FSMD
–Known as register-transfer (RT) level
–Exercise: complete the design

Problem Specification
[Figure: Sender, Bridge, Receiver. The sender drives rdy_in and data_in(4) into the bridge; the bridge drives rdy_out and data_out(8) into the receiver; the bridge also has a clock input.]
Bridge: a single-purpose processor that converts two 4-bit inputs, arriving one at a time over data_in along with a rdy_in pulse, into one 8-bit output on data_out along with a rdy_out pulse.

FSMD (Bridge)
WaitFirst4:       if rdy_in=0 stay; if rdy_in=1 go to RecFirst4Start
RecFirst4Start:   data_lo=data_in; go to RecFirst4End
RecFirst4End:     if rdy_in=1 stay; if rdy_in=0 go to WaitSecond4
WaitSecond4:      if rdy_in=0 stay; if rdy_in=1 go to RecSecond4Start
RecSecond4Start:  data_hi=data_in; go to RecSecond4End
RecSecond4End:    if rdy_in=1 stay; if rdy_in=0 go to Send8Start
Send8Start:       data_out=data_hi & data_lo; rdy_out=1; go to Send8End
Send8End:         rdy_out=0; go to WaitFirst4

Inputs
  rdy_in: bit; data_in: bit[4];
Outputs
  rdy_out: bit; data_out: bit[8]
Variables
  data_lo, data_hi: bit[4];
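The bridge FSMD can be sketched as a clocked state machine in software. The state and signal names follow the slide; the struct layout and the function name bridge_clock are assumptions, and the "&" in data_out=data_hi & data_lo is read here as concatenation (data_hi in the upper four bits), which is how an 8-bit value results from two 4-bit halves.

```c
#include <stdint.h>

typedef enum { WaitFirst4, RecFirst4Start, RecFirst4End,
               WaitSecond4, RecSecond4Start, RecSecond4End,
               Send8Start, Send8End } BridgeState;

typedef struct {
    BridgeState state;
    uint8_t data_lo, data_hi;   /* 4-bit variables */
    uint8_t data_out;           /* 8-bit output */
    int rdy_out;
} Bridge;

/* One clock cycle: perform the current state's action, then take its transition. */
void bridge_clock(Bridge *b, int rdy_in, uint8_t data_in) {
    switch (b->state) {
    case WaitFirst4:      if (rdy_in) b->state = RecFirst4Start; break;
    case RecFirst4Start:  b->data_lo = data_in & 0xF; b->state = RecFirst4End; break;
    case RecFirst4End:    if (!rdy_in) b->state = WaitSecond4; break;
    case WaitSecond4:     if (rdy_in) b->state = RecSecond4Start; break;
    case RecSecond4Start: b->data_hi = data_in & 0xF; b->state = RecSecond4End; break;
    case RecSecond4End:   if (!rdy_in) b->state = Send8Start; break;
    case Send8Start:      b->data_out = (uint8_t)((b->data_hi << 4) | b->data_lo);
                          b->rdy_out = 1;
                          b->state = Send8End;
                          break;
    case Send8End:        b->rdy_out = 0; b->state = WaitFirst4; break;
    }
}
```

The sketch shows why cycle timing is central here: each rdy_in pulse must stay high long enough for the Start state to capture data_in, and must drop before the next nibble is accepted.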
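RT-level custom single-purpose processor design (cont’)

(a) Controller (Bridge)
WaitFirst4:       if rdy_in=0 stay; if rdy_in=1 go to RecFirst4Start
RecFirst4Start:   data_lo_ld = 1; go to RecFirst4End
RecFirst4End:     if rdy_in=1 stay; if rdy_in=0 go to WaitSecond4
WaitSecond4:      if rdy_in=0 stay; if rdy_in=1 go to RecSecond4Start
RecSecond4Start:  data_hi_ld = 1; go to RecSecond4End
RecSecond4End:    if rdy_in=1 stay; if rdy_in=0 go to Send8Start
Send8Start:       data_out_ld = 1; rdy_out=1; go to Send8End
Send8End:         rdy_out=0; go to WaitFirst4

(b) Datapath
[Figure: data_in(4) feeds registers data_hi and data_lo, loaded by data_hi_ld and data_lo_ld; their outputs together feed register data_out, loaded by data_out_ld, which drives data_out. clk goes to all registers. The controller receives rdy_in and produces rdy_out.]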
Optimizing single-purpose processors
•Optimization is the task of making design metric values the best possible
•Optimization opportunities
–original program
–FSMD
–Data path
–FSM
Optimizing the original program
•Analyze program attributes and look for areas of possible improvement
–number of computations
–size of variable
–time and space complexity
–operations used
  •multiplication and division very expensive
Optimizing the original program

original program
0: int x, y;
1: while (1) {
2:   while (!go_i);
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }

Replace the subtraction operation(s) with the modulo operation in order to speed up the program.

optimized program
0:  int x, y, r;
1:  while (1) {
2:    while (!go_i);
      // x must be the larger number
3:    if (x_i >= y_i) {
4:      x = x_i;
5:      y = y_i;
      }
6:    else {
7:      x = y_i;
8:      y = x_i;
      }
9:    while (y != 0) {
10:     r = x % y;
11:     x = y;
12:     y = r;
      }
13:   d_o = x;
    }

Original program: GCD(42, 8) takes 9 iterations to complete the loop. The x and y values are evaluated as follows: (42, 8), (34, 8), (26, 8), (18, 8), (10, 8), (2, 8), (2, 6), (2, 4), (2, 2).

Optimized program: GCD(42, 8) takes 3 iterations to complete the loop. The x and y values are evaluated as follows: (42, 8), (8, 2), (2, 0).
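The two loops can be compared directly in software. In this sketch the counter records each (x, y) pair at which the loop condition is evaluated, which is how the slide counts iterations; the function names and the evals parameter are assumptions, and both functions assume nonzero inputs.

```c
/* Subtraction-based GCD (original program).
   *evals counts the (x, y) pairs at which the loop condition is evaluated.
   Assumes x > 0 and y > 0. */
unsigned gcd_subtract(unsigned x, unsigned y, int *evals) {
    *evals = 1;
    while (x != y) {
        if (x < y) y = y - x;
        else       x = x - y;
        (*evals)++;
    }
    return x;
}

/* Modulo-based GCD (optimized program). x is made the larger number first. */
unsigned gcd_modulo(unsigned x_i, unsigned y_i, int *evals) {
    unsigned x, y, r;
    if (x_i >= y_i) { x = x_i; y = y_i; }
    else            { x = y_i; y = x_i; }
    *evals = 1;
    while (y != 0) {
        r = x % y;
        x = y;
        y = r;
        (*evals)++;
    }
    return x;
}
```

For inputs 42 and 8, gcd_subtract reports 9 evaluations and gcd_modulo reports 3, and both return 2. Whether the modulo version is faster in hardware also depends on the cost of the modulo unit, since division-like operations are expensive.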
Optimizing the FSMD
•Areas of possible improvements
–merge states
  •states with constants on transitions can be eliminated, transition taken is already known
  •states with independent operations can be merged
–separate states
  •states which require complex operations (a*b*c*d) can be broken into smaller states to reduce hardware size
–scheduling
Optimizing the FSMD (cont.)

original FSMD
[Figure: the GCD state diagram with states 1, 2, 2-J, 3, 4, 5, 6, 7, 8, 6-J, 5-J, 9, 1-J, and declaration int x, y;]

Optimizations
•eliminate state 1: transitions have constant values
•merge state 2 and state 2-J: no loop operation in between them
•merge state 3 and state 4: assignment operations are independent of one another
•merge state 5 and state 6: transitions from state 6 can be done in state 5
•eliminate states 5-J and 6-J: transitions from each state can be done from state 7 and state 8, respectively
•eliminate state 1-J: transition from state 1-J can be done directly from state 9

optimized FSMD
int x, y;
2: if !go_i stay; if go_i go to 3
3: x = x_i, y = y_i; go to 5
5: if x<y go to 7; if x>y go to 8; otherwise go to 9
7: y = y - x; go to 5
8: x = x - y; go to 5
9: d_o = x; go to 2
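The optimized FSMD can be stepped in software the same way as a state diagram, with one switch case per remaining state. This is a sketch under stated assumptions: go_i is held at 1, the function returns once state 9 is reached rather than going back to state 2, and the names are illustrative.

```c
typedef enum { OPT_S2, OPT_S3, OPT_S5, OPT_S7, OPT_S8, OPT_S9 } OptState;

/* Runs the optimized FSMD until state 9 produces d_o.
   Returns d_o; *steps receives the number of states visited.
   Assumes x_i > 0 and y_i > 0. */
unsigned gcd_fsmd_optimized(unsigned x_i, unsigned y_i, int *steps) {
    unsigned x = 0, y = 0;
    const int go_i = 1;                 /* assumption: go_i is already asserted */
    OptState s = OPT_S2;
    *steps = 0;
    for (;;) {
        (*steps)++;
        switch (s) {
        case OPT_S2: if (go_i) s = OPT_S3; break;       /* merged 2 and 2-J */
        case OPT_S3: x = x_i; y = y_i; s = OPT_S5; break; /* merged 3 and 4 */
        case OPT_S5: s = (x < y) ? OPT_S7               /* merged 5 and 6 */
                       : (x > y) ? OPT_S8 : OPT_S9;
                     break;
        case OPT_S7: y = y - x; s = OPT_S5; break;
        case OPT_S8: x = x - y; s = OPT_S5; break;
        case OPT_S9: return x;                          /* d_o = x */
        }
    }
}
```

Each pass through the loop now visits two states (5, then 7 or 8), so GCD(42, 8) completes after visiting 20 states in this model.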
Optimizing the data path
•Sharing of functional units
–one-to-one mapping, as done previously, is not necessary
–if the same operation occurs in different states, those occurrences can share a single functional unit
•Multi-functional units
–an ALU supports a variety of operations, so it can be shared among operations occurring in different states
Summary
Custom single-purpose processors

Straightforward design techniques

Can be built to execute algorithms

Typically start with FSMD

CAD tools can be of great assistance

