Lecture 04 Control Units

Control Unit: Hardwired vs. Microprogrammed Approach
Dr Shankar Balachandran, Indian Institute of Technology Madras, shankar@cse.iitm.ernet.in, 14 October 2006
Two Major Blocks in a CPU
- Datapath
  - Adders, multipliers, dividers
  - Shifters, registers
  - Anything that changes or stores data
- Control Unit
  - Controls the data
  - How is data stored?
  - Where is it stored?
  - When should data be available?
Control Unit
- Correct sequencing of control signals
- Much like the human brain controlling various parts of the body
- Sequence and timing are the key
  - Any aberration will result in wrong operation
A Simplified Control Unit

[Figure: the control unit drives four signals, Fetch, Decode, Execute and Write Back, which go to the Fetch Unit, Decode Unit, Execution Unit and Write Back Unit respectively.]
A Possible Implementation
[Figure: CLK drives a mod-4 counter (2 bits, counting 0 to 3); a 2-to-4 decoder turns the count into the four phase signals.]
Timing Diagram
[Figure: timing diagram with traces for CLK, Fetch, Decode, Execute and Write Back; the four phase signals go high one after another.]
Let's Sample the Signals
Fetch  Decode  Execute  Write Back
1      0       0        0
0      1       0        0
0      0       1        0
0      0       0        1
Another Way to Generate Signals
[Figure: the one-hot pattern 1000, 0100, 0010, 0001 circulating one step per clock, as in a ring counter.]
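The two ways of generating the phase signals can be compared with a small sketch. This is only an illustration under assumptions: the function names are hypothetical, the counter is taken to be a 2-bit counter that wraps after 3, and the ring counter is assumed to start at 1000.

```python
def decode_2_to_4(count):
    """2-to-4 decoder: output i is 1 only when the 2-bit count equals i."""
    return tuple(1 if count == i else 0 for i in range(4))


def counter_phases(cycles):
    """Phase signals from a counter followed by a 2-to-4 decoder."""
    count = 0
    phases = []
    for _ in range(cycles):
        phases.append(decode_2_to_4(count))
        count = (count + 1) % 4  # wrap around after Write Back
    return phases


def ring_counter_phases(cycles):
    """Phase signals from a 4-bit ring counter holding a single 1."""
    state = (1, 0, 0, 0)  # assumed reset state: Fetch active
    phases = []
    for _ in range(cycles):
        phases.append(state)
        state = state[-1:] + state[:-1]  # rotate the 1 one position right
    return phases


for fetch, decode, execute, write_back in counter_phases(4):
    print(fetch, decode, execute, write_back)
```

Both functions produce the same (Fetch, Decode, Execute, Write Back) sequence; the ring counter needs no decoder but uses one flip-flop per phase.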
Hardwired vs Microprogrammed
- Hardwired
  - Use gates to generate signals
  - Squeeze out the juice for performance
  - Different logic styles possible
- Microprogrammed
  - Store the control signals in sequence
  - Just read from the memory every clock cycle
A Model Computer
(Richard Eckert, SIGCSE Bulletin, Vol. 20, No. 3, September 1988)
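[Figure: block diagram of the model computer. PC, MAR, RAM, MDR, IR, Accumulator, Register B and the ALU share a bus, and a Control block receives the opcode. Each block is annotated with its control signals, and the paths are marked 4, 8 or 12 bits wide.]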
More Details
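- L = Load
- E = Copy to bus
- A, S = Add and Subtract
- Sign bit to control unit
- IP = Increment PC
[Figure: the same datapath with the control signals of each block: PC (IP, LP, EP), ACC (LA, EA), ALU (A, S, EU), MAR (LM), RAM (R, W), B (LB), MDR (LD, ED), IR (LI, EI), all attached to the bus and to Control.]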
Mnemonic                 Opcode  Action                        Register Transfers         Active Controls
LDA (Load Accumulator)   1       A ← (Mem)                     1. MAR ← IR                EI, LM
                                                               2. MDR ← M(MAR)            R
                                                               3. A ← MDR                 ED, LA
STA (Store Accumulator)  2       (Mem) ← A                     1. MAR ← IR                EI, LM
                                                               2. MDR ← A                 EA, LD
                                                               3. M(MAR) ← MDR            W
ADD                      3       A ← A + B                     1. A ← ALU(Add)            A, EU, LA
SUB                      4       A ← A - B                     1. A ← ALU(Sub)            S, EU, LA
MBA                      5       B ← A                         1. B ← A                   EA, LB
JMP                      6       PC ← Mem                      1. PC ← IR                 EI, LP
JN                       7       PC ← Mem if -ve flag is set   1. PC ← IR if NF is set    NF: EI, LP
HLT                      8-15    Stop Clock
Fetch                            IR ← Next Instruction         1. MAR ← PC                EP, LM
                                                               2. MDR ← M(MAR)            R
                                                               3. IR ← MDR                ED, LI, IP
Hardwired Unit
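[Figure: the opcode field of the IR feeds a decoder with one output per instruction (LDA, STA, ADD, SUB, MBA, JMP, JN, Halt). CLK drives a ring counter that produces the timing signals (labelled T1 ... T5 in the figure). The decoder outputs, the timing signals and NF enter the control matrix, which produces the control signals.]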
Table with Sequencing

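        IP  LP     EP  LM  R   W   LD  ED  LI  EI     LA  EA  A   S   EU  LB
Fetch   T2  .      T0  T0  T1  .   .   T2  T2  .      .   .   .   .   .   .
LDA     .   .      .   T3  T4  .   .   T5  .   T3     T5  .   .   .   .   .
STA     .   .      .   T3  .   T5  T4  .   .   T3     .   T4  .   .   .   .
MBA     .   .      .   .   .   .   .   .   .   .      .   T3  .   .   .   T3
ADD     .   .      .   .   .   .   .   .   .   .      T3  .   T3  .   T3  .
SUB     .   .      .   .   .   .   .   .   .   .      T3  .   .   T3  T3  .
JMP     .   T3     .   .   .   .   .   .   .   T3     .   .   .   .   .   .
JN      .   T3*NF  .   .   .   .   .   .   .   T3*NF  .   .   .   .   .   .

Reading down each column gives the logic equation for that signal (* is AND, + is OR):
IP = T2; LP = T3*JMP + T3*JN*NF; EP = T0; LM = T0 + T3*LDA + T3*STA
R = T1 + T4*LDA; W = T5*STA; LD = T4*STA; ED = T2 + T5*LDA
LI = T2; A = T3*ADD; S = T3*SUB; ...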
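The control matrix equations can be evaluated directly in software. The sketch below is illustrative only: `control_matrix` is a hypothetical name, the first eleven equations are the ones listed on the slide, and the last five (EI, LA, EA, EU, LB) are an assumption read off the register-transfer and sequencing tables, since the slide elides them.

```python
def control_matrix(t, op, nf=False):
    """Active control signals in timing state t (0..5) for instruction op."""
    T = [t == i for i in range(6)]  # one-hot timing signals T0..T5
    LDA, STA, ADD, SUB, MBA, JMP, JN = (
        op == name for name in ("LDA", "STA", "ADD", "SUB", "MBA", "JMP", "JN")
    )
    signals = {
        # Equations listed on the slide (* is AND, + is OR).
        "IP": T[2],
        "LP": (T[3] and JMP) or (T[3] and JN and nf),
        "EP": T[0],
        "LM": T[0] or (T[3] and LDA) or (T[3] and STA),
        "R": T[1] or (T[4] and LDA),
        "W": T[5] and STA,
        "LD": T[4] and STA,
        "ED": T[2] or (T[5] and LDA),
        "LI": T[2],
        "A": T[3] and ADD,
        "S": T[3] and SUB,
        # Assumption: derived from the sequencing table, not listed on the slide.
        "EI": T[3] and (LDA or STA or JMP or (JN and nf)),
        "LA": (T[5] and LDA) or (T[3] and (ADD or SUB)),
        "EA": (T[4] and STA) or (T[3] and MBA),
        "EU": T[3] and (ADD or SUB),
        "LB": T[3] and MBA,
    }
    return {name for name, on in signals.items() if on}


# Trace LDA through T0..T5: three fetch states, then three execute states.
for t in range(6):
    print(f"T{t}:", sorted(control_matrix(t, "LDA")))
```

In hardware each dictionary entry corresponds to a small AND-OR network, which is why the matrix maps naturally onto a PLA.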
Control Matrix
- Implement using discrete gates
- Usually done using PLAs
- Large control matrices are implemented hierarchically
  - For speed
- A well-known process; design flows are widespread
An Alternate Implementation
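[Figure: microprogrammed control unit. Components shown: the 4-bit opcode from the IR, the starting address generator (MAP), a multiplexer with inputs labelled 1*, 01 and 00, an AND gate on CD and NF, the uPC clocked by CLK with a +1 incrementer, the 32 x 24 control store (control ROM), and the microinstruction register, whose fields supply the jump address, HLT and the control signals.]

Map  CD  Meaning
1    *   From IR
0    0   Unconditional branch within microprogram
0    1   NF=0 => increment; NF=1 => conditional branch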
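The next-address selection described by the Map and CD bits can be written as a short function. This is a sketch under assumptions: the function and parameter names are hypothetical, `start_addr` stands for the output of the starting address generator, and `jump_addr` stands for the address field of the current microinstruction.

```python
def next_upc(upc, map_bit, cd, nf, start_addr, jump_addr):
    """Select the next microprogram counter value.

    map_bit=1        -> take the starting address derived from the IR
    map_bit=0, cd=0  -> unconditional branch to jump_addr
    map_bit=0, cd=1  -> branch to jump_addr if NF is set, else increment
    """
    if map_bit:
        return start_addr
    if cd and not nf:
        return upc + 1
    return jump_addr


# End of fetch: MAP is set, so the opcode decides where to go next.
print(hex(next_upc(0x02, 1, 0, False, start_addr=0x0B, jump_addr=0x00)))
```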
Control Store
Control word bit order, left to right: IP LP EP LM R W LD ED LI EI LA EA A S EU LB

Instruction  Op-Code  uInstruction Address  Control Signals   CD  MAP  HLT  Addr. of Next
Fetch        0        00                    0011000000000000  0   0    0    01
                      01                    0000100000000000  0   0    0    02
                      02                    1000000110000000  0   1    0    XX
LDA          1        03                    0001000001000000  0   0    0    04
                      04                    0000100000000000  0   0    0    05
                      05                    0000000100100000  0   0    0    00
STA          2        06                    0001000001000000  0   0    0    07
                      07                    0000001000010000  0   0    0    08
                      08                    0000010000000000  0   0    0    00
ADD          3        09                    0000000000101010  0   0    0    00
SUB          4        0A                    0000000000100110  0   0    0    00
MBA          5        0B                    0000000000010001  0   0    0    00
JMP          6        0C                    0100000001000000  0   0    0    00
JN           7        0D                    0000000000000000  1   0    0    0F
                      0E                    0000000000000000  0   0    0    00
                      0F                    0100000001000000  0   0    0    00
Expansion    8-E      10-1E
HLT          F        1F                    0000000000000000  0   0    1    XX
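A control word is easier to read once its bits are translated back into signal names. The helper below is a minimal sketch; `decode_control_word` is a hypothetical name, and the bit order is assumed to be the column order of the sequencing table (IP first, LB last).

```python
SIGNALS = "IP LP EP LM R W LD ED LI EI LA EA A S EU LB".split()


def decode_control_word(word):
    """Return the names of the signals whose bit is 1 in a 16-bit control word."""
    if len(word) != len(SIGNALS):
        raise ValueError("control word must have 16 bits")
    return [name for name, bit in zip(SIGNALS, word) if bit == "1"]


print(decode_control_word("0011000000000000"))  # first fetch microinstruction
print(decode_control_word("0000000000101010"))  # ADD
```

Decoding the stored words this way is a quick check that each row of the control store matches the active controls listed for its register transfer.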
Example 1: MBA followed by ADD
[Control store table repeated, with the control word columns labelled by signal name and the microinstructions used in this example highlighted: 00, 01, 02 and then 0B for MBA, followed by 00, 01, 02 and then 09 for ADD.]
Sequence for MBA,ADD

Instruction    Register Transfer    Control Word
MOV B,A (MBA)  1. MAR ← PC          0011000000000000
               2. MDR ← M(MAR)      0000100000000000
               3. IR ← MDR          1000000110000000
               B ← A                0000000000010001
ADD            1. MAR ← PC          0011000000000000
               2. MDR ← M(MAR)      0000100000000000
               3. IR ← MDR          1000000110000000
               A ← ALU(Add)         0000000000101010
Example 2: JN with Flag Set
[Control store table repeated, with the path for JN highlighted: 00, 01, 02 and then 0D.]
If the negative flag is set, jump to a new location by branching to the uInstruction at 0F. Microinstruction 0D has CD = 1 and next address 0F; microinstruction 0F asserts LP and EI, which loads the PC from the IR.
Example 3: JN with Flag Not Set
[Control store table repeated, with the path for JN highlighted: 00, 01, 02 and then 0D.]
If the negative flag is not set, the uPC increments from 0D to 0E. Microinstruction 0E asserts no control signals and returns to 00, so no jump is taken.
Let's Review the Microprogramming Model

- Store the microprogram in the control store
- Fetch the instruction
- Get the set of control signals from the control word
- Move the microinstruction address
- Lather, rinse, repeat
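The loop above can be sketched as a small microsequencer that walks the control store. This is a simplified illustration, not the lecture's implementation: the datapath is not modelled, each fetch simply takes the next opcode from a list, NF is held constant, and HLT (opcode F) is assumed once the list runs out. The names `CONTROL_STORE`, `START` and `run` are hypothetical; the table contents are transcribed from the control store slide, with `None` standing for XX.

```python
SIGNALS = "IP LP EP LM R W LD ED LI EI LA EA A S EU LB".split()

# address -> (control word, CD, MAP, HLT, address of next)
CONTROL_STORE = {
    0x00: ("0011000000000000", 0, 0, 0, 0x01),
    0x01: ("0000100000000000", 0, 0, 0, 0x02),
    0x02: ("1000000110000000", 0, 1, 0, None),
    0x03: ("0001000001000000", 0, 0, 0, 0x04),
    0x04: ("0000100000000000", 0, 0, 0, 0x05),
    0x05: ("0000000100100000", 0, 0, 0, 0x00),
    0x06: ("0001000001000000", 0, 0, 0, 0x07),
    0x07: ("0000001000010000", 0, 0, 0, 0x08),
    0x08: ("0000010000000000", 0, 0, 0, 0x00),
    0x09: ("0000000000101010", 0, 0, 0, 0x00),
    0x0A: ("0000000000100110", 0, 0, 0, 0x00),
    0x0B: ("0000000000010001", 0, 0, 0, 0x00),
    0x0C: ("0100000001000000", 0, 0, 0, 0x00),
    0x0D: ("0000000000000000", 1, 0, 0, 0x0F),
    0x0E: ("0000000000000000", 0, 0, 0, 0x00),
    0x0F: ("0100000001000000", 0, 0, 0, 0x00),
    0x1F: ("0000000000000000", 0, 0, 1, None),
}

# Starting address generator: opcode -> first microinstruction of its routine.
START = {1: 0x03, 2: 0x06, 3: 0x09, 4: 0x0A, 5: 0x0B, 6: 0x0C, 7: 0x0D, 0xF: 0x1F}


def run(opcodes, nf=False):
    """Return the microinstruction addresses visited for a stream of opcodes."""
    pending = iter(opcodes)
    opcode = None
    upc = 0x00
    trace = []
    while True:
        word, cd, map_bit, hlt, nxt = CONTROL_STORE[upc]
        trace.append(upc)
        if hlt:
            break                            # HLT stops the clock
        if word[SIGNALS.index("LI")] == "1":
            opcode = next(pending, 0xF)      # IR loaded; assume HLT at the end
        if map_bit:
            upc = START[opcode]              # jump to the instruction's routine
        elif cd and not nf:
            upc += 1                         # condition not met: fall through
        else:
            upc = nxt                        # branch to the stored next address
    return trace


print([f"{a:02X}" for a in run([5, 3])])        # MBA followed by ADD
print([f"{a:02X}" for a in run([7], nf=True)])  # JN with the flag set
```

Running the same JN stream with `nf=False` visits 0E instead of 0F, which matches the two JN examples.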
What is Microcode?
Michael Slater's "Microprocessor Based Design" (pg.42):

"Microcode tells the processor every detailed step required to execute each machine language instruction. Microcode is thus at an even more detailed level than machine language, and in fact defines the machine language. In a standard microprocessor, the microcode is stored in a ROM or a programmable logic array (PLA) that is part of the microprocessor chip and cannot be modified by the user."
Thought Experiment
- Why is the design a little clumsy?
- What can we do about it?
Reason for Clumsiness
- JN
  - Conditional flag check
- Without any condition check, the whole process is very smooth
- Solution
  - Avoid all conditional checks
Real Life
- A little American Football Story
- Theory vs. Practice
  - In theory, there is no difference between theory and practice
  - In practice, theory and practice are two different things altogether
- Live with condition checks
  - Keep designs as clean as possible
A General Approach
[Figure: the IR, external inputs and condition codes feed the starting and branch address generator, which sets the uPC; the uPC addresses the control store, which outputs the control word.]
Format of Microinstructions
- Pick yours
  - Your choice is as good as your neighbor's
- What we did:
  - One bit position per control signal
  - Order of the bits?
    - Doesn't matter
  - Can result in long microinstructions
    - Not the number of microinstructions, but the width
A Note About Density

- Observe that only a few bits are set to 1
- Poor usage of bit space
- This scheme is called a horizontal microprogram
- Alternate version: encode the bits
  - Vertical microprogram
Vertical Microprogram
- Encode the bits by grouping similar elements together
- General idea:
  - Group similar resources together
  - Some operations are mutually exclusive
    - There can be only one source or destination register
    - Read vs. write of memory
Design Issues
- Encoding reduces the bit-space
  - But requires decoders
- Cost of decoder vs. bit-space
  - Usually decoder cost is very low
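Grouping mutually exclusive signals into one encoded field can be illustrated with the bus enables of the model computer. The grouping below is an assumption made for illustration, not a format given in the slides: the five E signals are treated as one field because only one source may drive the bus at a time, and all names are hypothetical.

```python
# Assumed field: the five bus-source enables, of which at most one is active.
BUS_SOURCES = ["EP", "ED", "EI", "EA", "EU"]

# Codes 0..5 are needed (0 means "no source"), so 3 bits replace 5.
FIELD_WIDTH = len(BUS_SOURCES).bit_length()


def encode_bus_source(active):
    """Encode a set holding at most one enable name as a field value."""
    if len(active) > 1:
        raise ValueError("only one source may drive the bus")
    if not active:
        return 0
    return BUS_SOURCES.index(next(iter(active))) + 1


def decode_bus_source(code):
    """Decoder: expand the field value back into one-hot enable signals."""
    return {name: int(code == i + 1) for i, name in enumerate(BUS_SOURCES)}


code = encode_bus_source({"EI"})
print(code, FIELD_WIDTH, decode_bus_source(code))
```

The saving here is two bits per microinstruction, paid for with one small decoder, which is the trade-off the slide describes.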
Another Idea
- Group concurrently active signals
- Every meaningful combination gets a code
- Complex decoder to interpret every code
Vertical vs Horizontal
- Horizontal
  - Faster
  - More area
  - More common currently
    - Cheap transistors
- Vertical
  - Slower
  - More microinstructions
Microsequencing
- Other ways to save on hardware
- Every instruction had its own microprogram sequence
- Also, instructions have several addressing modes
  - Only the first few microinstructions differ
- Can we share microcode?
A Powerful Technique in Sharing
- Bit-ORing
  - Example
    - Two instructions share some microcode
    - Eventually, they must branch
    - The default branch (one instruction) is at X0
    - The other branch is stored at X1
    - Change the least significant bit(s?) to get a new address
- Compare that with:
  - Having two conditional branches
  - Storing two fields, one for each branch
  - Both very unclean
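Bit-ORing can be shown in a couple of lines. This is a sketch with hypothetical names and addresses: the shared microcode stores a next address ending in 0 (the "X0" case), and a condition bit is ORed into the least significant bit to reach the neighbouring address "X1".

```python
def bit_or_branch(base_addr, condition_bit):
    """OR a 1-bit condition into the least significant bit of the next address."""
    if base_addr & 1:
        raise ValueError("base address must end in 0 so the OR can select X0 or X1")
    return base_addr | (condition_bit & 1)


# Hypothetical shared routine whose stored next address is 0b10100.
print(bin(bit_or_branch(0b10100, 0)))  # default branch, X0
print(bin(bit_or_branch(0b10100, 1)))  # other branch, X1
```

Only one address field is stored, and no separate conditional-branch microinstruction is needed to pick between the two continuations.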

Thought Experiment :
- What if we provided an explicit branch instead of storing a next field in our microprogram?
- A typical instruction set will need a lot of branches
- A lot of time will be wasted on branching
A Pat on Our Back
- We provided an explicit field for the address
  - Branch location is now data
  - It is already saved
- Caution:
  - Microinstructions can get very wide
- Solution:
  - There is no free lunch.
Can we pipeline microfetch?
- A neat idea:
  - Why wait till the current micro-op is over?
  - Branch field gives the next operation
  - Get the next op
- Caveat:
  - External inputs and status flags may change the order
  - What about interrupts?
    - They are going to follow you everywhere
- Should have a mechanism that can invalidate microcode prefetch
  - Similar to pipeline flush for instructions
- Commonly used
Historical Perspectives
- Hardwired Logic
  - Popular before 60s
    - Only way people did it
  - Popular now
    - Speed benefits
- Microprogram
  - Popular in 70s
    - Memory was slower than CPU
    - No on-chip cache
    - Best way is to store the microcode
  - Now
    - Depends on who you ask
- Shades of gray:
  - Extremes of the spectrum are harder to find nowadays
Tools for Design
- Hardwired
  - Any state machine optimizer
  - Assigning states, minimizing transitions, races, hazards, ...
- Microcoding
  - Small ones can be in binary
  - Large ones
    - Use a microassembler
    - Very useful debug tool
    - Can use the microassembler simultaneously with actual hardware development
Hardwired vs Microcoding
- Hardwired units are faster and smaller
- Emulation is easy with microcoding
- Hardwired design is complex if large
- Bugs in hardwired design cannot be fixed in the field
- Hardwired control is not suited for loops
  - Looping with microcode can be made as fast
Hardwired vs Microcode vs RISC
- RISC
  - Simpler instruction set
  - Hardwired implementation
- RISC instructions are like microcode
  - Instructions come from the I-Cache instead of the control store
- Difference:
  - Contents are not fixed
  - Advantage: only load what you want into the I-Cache
    - Keeps size smaller as compared to control stores
Microprogram vs Software
- Imagine floating point division
- Solution 1: Write in software
  - Long process
  - Error prone
  - Many fetches repeatedly from memory for the given sequence of operations
- Solution 2: Microcode
  - Long process too, but done by designers, not programmers
  - Relatively error free: more thorough design
  - Requires many cycles, but fetched and used locally
Emulation

- A very common use of microcoding
- IBM System/360
  - 32-bit architecture
  - 16 general-purpose registers
  - Most implementations were 8-bit
  - Secret:
    - Heavy microcoding
    - Programmers oblivious
    - Keep cost low
- In 1992, International Meta Systems (IMS) announced the 3250
  - Designed to emulate the x86, 68K, and 6502 architectures
  - Uses customizable microcode, among other techniques
  - Went bust, never released
Another Interesting Note
- Writable Control Store
  - What if you, a programmer, can write your own control store?
  - Not a mad scientist thought
- Implemented in
  - VAX 8800
  - PDP-11/60
  - IBM System/370
Current Trends
- Microcode Update Linux Utility: microcode_ctl
  - Companion to the IA32 microcode driver
  - It decodes and sends new microcode to the kernel driver to be uploaded to Intel IA32 processors
  - Update is volatile: lost on reboots
- Microcode updates are also typically rolled into BIOS updates
  - Ready even before an OS is loaded
Intel Said..
The Pentium(R) Pro processor and Pentium(R) II processor may contain design defects or errors known as errata that may cause the product to deviate from published specifications. Many times, the effects of the errata can be avoided by implementing hardware or software work-arounds, which are documented in the Pentium Pro Processor Specification Update and the Pentium II Processor Specification Update. Pentium Pro and Pentium II processors include a feature called "reprogrammable microcode", which allows certain types of errata to be worked around via microcode updates. The microcode updates reside in the system BIOS and are loaded into the processor by the system BIOS during the Power-On Self Test, or POST.
Current Trends
- Hyperthreading in P4
  - A second logical CPU
  - Complete state of the system in both CPUs
- Microcoding in P4
  - Two pointers control flow independently
  - Both processors share the ROM entries
  - Access is alternated between the CPUs
Thank You
