
Control Unit: Hardwired vs. Microprogrammed Approach

Dr Shankar Balachandran
Indian Institute of Technology Madras
shankar@cse.iitm.ernet.in
14 October 2006

Two Major Blocks in a CPU

Datapath
  Adders, multipliers, dividers
  Shifters, registers
  Anything that changes or stores data

Control Unit
  Controls the data
  How is data stored?
  Where is it stored?
  When should data be available?

Control Unit
Correct sequencing of control signals
Much like the human brain controlling various parts of the body
Sequence and timing are the key
  Any aberration will result in wrong operation

A Simplified Control Unit

[Figure: the Control Unit issues four signals in sequence: Fetch to the Fetch Unit, Decode to the Decode Unit, Execute to the Execution Unit, and Write Back to the Write Back Unit.]

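A Possible Implementation

[Figure: CLK drives a mod-4 counter (2 bits, counting 0 to 3) whose output feeds a 2 to 4 decoder; the four decoder outputs are the Fetch, Decode, Execute and Write Back signals.]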

Timing Diagram

[Figure: waveforms for CLK, Fetch, Decode, Execute and Write Back; the four signals go high one after another, one per clock period.]

Let's Sample the Signals

Fetch  Decode  Execute  Write Back
1      0       0        0
0      1       0        0
0      0       1        0
0      0       0        1

Another Way to Generate Signals

[Figure: a circular state sequence 1000, 0100, 0010, 0001, then back to 1000.]
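The two ways of generating the four timing signals can be compared with a minimal sketch. It assumes a 2-bit counter that wraps after four states feeding a 2 to 4 decoder, and a 4-bit ring counter in which a single 1 rotates each clock. The function names are illustrative and not taken from the slides.

```python
PHASES = ("Fetch", "Decode", "Execute", "Write Back")

def decoder_2to4(count):
    """2-to-4 decoder: exactly one of the four outputs is high."""
    return tuple(1 if count == i else 0 for i in range(4))

def counter_decoder_phases(cycles):
    """Binary counter (assumed to wrap after 3) feeding a 2-to-4 decoder."""
    count = 0
    for _ in range(cycles):
        yield decoder_2to4(count)
        count = (count + 1) % 4

def ring_counter_phases(cycles):
    """4-bit ring counter: the single 1 moves one position per clock."""
    state = (1, 0, 0, 0)
    for _ in range(cycles):
        yield state
        state = state[-1:] + state[:-1]  # rotate right by one position

if __name__ == "__main__":
    for signals in ring_counter_phases(4):
        print(signals, PHASES[signals.index(1)])
```

Both generators produce the same one-hot sequence; the ring counter needs no decoder but uses one flip-flop per phase.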

Hardwired vs Microprogrammed

Hardwired
  Use gates to generate signals
  Squeeze out the juice for performance
  Different logic styles possible

Microprogrammed
  Store the control signals in sequence
  Just read from the memory every clock cycle

A Model Computer
(Richard Eckert, SIGCSE Bulletin, Vol. 20, No. 3, September 1988)
[Figure: block diagram of the model computer. A shared bus connects the PC (controls IP, LP, EP), MAR (LM), RAM (R, W), MDR (LD, ED), IR (LI, EI), Accumulator (LA, EA), Register B (LB) and the ALU (A, S, EU). Path widths labeled in the figure are 12, 8 and 4 bits. The Control block receives input from the IR.]

More Details
L = Load
E = Copy to bus
A, S = Add and Subtract
IP = Increment PC
The sign bit goes to the control unit

Mnemonic  Opcode  Action                              Register Transfers      Active Controls
LDA       1       Load Accumulator: A ← (Mem)         1. MAR ← IR             EI, LM
                                                      2. MDR ← M(MAR)         R
                                                      3. A ← MDR              ED, LA
STA       2       Store Accumulator: (Mem) ← A        1. MAR ← IR             EI, LM
                                                      2. MDR ← A              EA, LD
                                                      3. M(MAR) ← MDR         W
ADD       3       A ← A + B                           1. A ← ALU(Add)         A, EU, LA
SUB       4       A ← A - B                           1. A ← ALU(Sub)         S, EU, LA
MBA       5       B ← A                               1. B ← A                EA, LB
JMP       6       PC ← Mem                            1. PC ← IR              EI, LP
JN        7       PC ← Mem if negative flag is set    1. PC ← IR if NF is set NF: EI, LP
HLT       8-15    Stop clock
Fetch             IR ← next instruction               1. MAR ← PC             EP, LM
                                                      2. MDR ← M(MAR)         R
                                                      3. IR ← MDR             ED, LI, IP
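The register transfers in the table can be exercised with a small behavioral sketch. Several details are assumptions, not statements from the slides: a 12-bit instruction word with the opcode in the top 4 bits and an 8-bit address in the low 8 bits, a 256-word memory, and the negative flag taken as bit 11 of the accumulator. The names `step` and `run` are illustrative.

```python
WORD_MASK = 0xFFF  # assumed 12-bit data path

def step(cpu, mem):
    """Fetch and execute one instruction; return False once HLT is reached."""
    # Fetch: MAR <- PC, MDR <- M(MAR), IR <- MDR, and increment PC.
    cpu["MAR"] = cpu["PC"]
    cpu["MDR"] = mem[cpu["MAR"]]
    cpu["IR"] = cpu["MDR"]
    cpu["PC"] = (cpu["PC"] + 1) & 0xFF
    opcode, addr = cpu["IR"] >> 8, cpu["IR"] & 0xFF  # assumed field layout
    if opcode == 1:    # LDA: MAR <- IR, MDR <- M(MAR), A <- MDR
        cpu["MAR"] = addr
        cpu["MDR"] = mem[cpu["MAR"]]
        cpu["A"] = cpu["MDR"]
    elif opcode == 2:  # STA: MAR <- IR, MDR <- A, M(MAR) <- MDR
        cpu["MAR"] = addr
        cpu["MDR"] = cpu["A"]
        mem[cpu["MAR"]] = cpu["MDR"]
    elif opcode == 3:  # ADD: A <- A + B
        cpu["A"] = (cpu["A"] + cpu["B"]) & WORD_MASK
    elif opcode == 4:  # SUB: A <- A - B
        cpu["A"] = (cpu["A"] - cpu["B"]) & WORD_MASK
    elif opcode == 5:  # MBA: B <- A
        cpu["B"] = cpu["A"]
    elif opcode == 6:  # JMP: PC <- IR
        cpu["PC"] = addr
    elif opcode == 7:  # JN: PC <- IR only if the (assumed) sign bit of A is set
        if cpu["A"] & 0x800:
            cpu["PC"] = addr
    else:              # opcodes 8-15: HLT stops the clock
        return False
    return True

def run(mem):
    """Run from address 0 until HLT and return the final register state."""
    cpu = {"PC": 0, "MAR": 0, "MDR": 0, "IR": 0, "A": 0, "B": 0}
    while step(cpu, mem):
        pass
    return cpu

if __name__ == "__main__":
    mem = [0] * 256
    mem[0:5] = [0x110, 0x500, 0x300, 0x211, 0x800]  # LDA 10h; MBA; ADD; STA 11h; HLT
    mem[0x10] = 21
    run(mem)
    print(mem[0x11])  # prints 42
```

This models only what each instruction does to the registers; the control signals that make each transfer happen are the job of the control unit discussed next.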

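Hardwired Unit

[Figure: the opcode field of the IR feeds a decoder that asserts one of LDA, STA, ADD, SUB, MBA, JMP, JN or Halt. CLK drives a ring counter that produces the timing signals (labeled T1 to T5 in the figure). The decoder outputs, the timing signals and the negative flag NF feed the control matrix, which generates the control signals.]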

Table with Sequencing

Signal  Fetch  LDA  STA  MBA  ADD  SUB  JMP  JN
IP      T2
LP                                      T3   T3*NF
EP      T0
LM      T0     T3   T3
R       T1     T4
W                   T5
LD                  T4
ED      T2     T5
LI      T2
EI             T3   T3                  T3   T3*NF
LA             T5             T3   T3
EA                  T4   T3
A                             T3
S                                  T3
EU                            T3   T3
LB                       T3

IP = T2; LP = T3*JMP + T3*JN*NF; EP = T0; LM = T0 + T3*LDA + T3*STA;

R = T1 + T4*LDA; W = T5*STA; LD = T4*STA; ED = T2 + T5*LDA;

LI = T2; A = T3*ADD; S = T3*SUB; ...
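The listed equations translate directly into combinational logic. The sketch below models only the equations written out above (the elided ones are left out), with `*` read as AND and `+` as OR. The function name and the use of mnemonic strings for the decoded opcode are assumptions made for illustration.

```python
def control_matrix(t, opcode, nf):
    """Return the set of control signals active in time step t (0 to 5).

    opcode is the decoded mnemonic; nf is the negative flag (0 or 1).
    Only the equations listed on the slide are modeled.
    """
    T = [t == i for i in range(6)]  # one-hot timing signals T0..T5
    op = {name: opcode == name for name in
          ("LDA", "STA", "MBA", "ADD", "SUB", "JMP", "JN")}
    nf = bool(nf)
    signals = {
        "IP": T[2],
        "LP": (T[3] and op["JMP"]) or (T[3] and op["JN"] and nf),
        "EP": T[0],
        "LM": T[0] or (T[3] and op["LDA"]) or (T[3] and op["STA"]),
        "R":  T[1] or (T[4] and op["LDA"]),
        "W":  T[5] and op["STA"],
        "LD": T[4] and op["STA"],
        "ED": T[2] or (T[5] and op["LDA"]),
        "LI": T[2],
        "A":  T[3] and op["ADD"],
        "S":  T[3] and op["SUB"],
    }
    return {name for name, active in signals.items() if active}

if __name__ == "__main__":
    for t in range(6):
        print(f"T{t}", sorted(control_matrix(t, "STA", nf=0)))
```

Each dictionary entry corresponds to one sum-of-products term, which is why a PLA is a natural fit for the control matrix.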

Control Matrix
Implement using discrete gates
Usually done using PLAs
Large control matrices are implemented hierarchically
  For speed
A well-known process, and design flows are widespread

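An Alternate Implementation

[Figure: microprogrammed control unit. The 4-bit opcode from the IR feeds the Starting Address Generator. Next-address logic, driven by the MAP bit, the CD bit and the negative flag NF, loads the uPC with the starting address, the jump address, or uPC + 1. The uPC, clocked by CLK, addresses the 32 x 24 control store (control ROM). Each word is held in the microinstruction register, which supplies the control signals, the jump address and the MAP, CD and HLT bits.]

MAP  CD  Meaning
1    *   Next address comes from the IR
0    0   Unconditional branch within the microprogram
0    1   NF=0 => increment; NF=1 => conditional branch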

Control Store

Instruction  Op-Code  uInstruction  Control Signals    CD  MAP  HLT  Addr. of
                      Address                                        Next
Fetch        0        00            0011000000000000   0   0    0    01
                      01            0000100000000000   0   0    0    02
                      02            1000000110000000   0   1    0    XX
LDA          1        03            0001000001000000   0   0    0    04
                      04            0000100000000000   0   0    0    05
                      05            0000000100100000   0   0    0    00
STA          2        06            0001000001000000   0   0    0    07
                      07            0000001000010000   0   0    0    08
                      08            0000010000000000   0   0    0    00
ADD          3        09            0000000000101010   0   0    0    00
SUB          4        0A            0000000000100110   0   0    0    00
MBA          5        0B            0000000000010001   0   0    0    00
JMP          6        0C            0100000001000000   0   0    0    00
JN           7        0D            0000000000000000   1   0    0    0F
                      0E            0000000000000000   0   0    0    00
                      0F            0100000001000000   0   0    0    00
Expansion    8-E      10-1E
HLT          F        1F            0000000000000000   0   0    1    XX

Control word bit order: IP LP EP LM R W LD ED LI EI LA EA A S EU LB
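A short simulation makes the sequencing concrete. This is a sketch under stated assumptions: the control store contents and bit order are taken from the table above, the starting address generator is modeled as a simple lookup by mnemonic, and one instruction is traced from address 00 until the microprogram returns to 00 or halts. `run_instruction` and `START` are illustrative names.

```python
SIGNALS = "IP LP EP LM R W LD ED LI EI LA EA A S EU LB".split()

# address: (control bits, CD, MAP, HLT, next address; None stands for XX)
CONTROL_STORE = {
    0x00: ("0011000000000000", 0, 0, 0, 0x01),
    0x01: ("0000100000000000", 0, 0, 0, 0x02),
    0x02: ("1000000110000000", 0, 1, 0, None),
    0x03: ("0001000001000000", 0, 0, 0, 0x04),
    0x04: ("0000100000000000", 0, 0, 0, 0x05),
    0x05: ("0000000100100000", 0, 0, 0, 0x00),
    0x06: ("0001000001000000", 0, 0, 0, 0x07),
    0x07: ("0000001000010000", 0, 0, 0, 0x08),
    0x08: ("0000010000000000", 0, 0, 0, 0x00),
    0x09: ("0000000000101010", 0, 0, 0, 0x00),
    0x0A: ("0000000000100110", 0, 0, 0, 0x00),
    0x0B: ("0000000000010001", 0, 0, 0, 0x00),
    0x0C: ("0100000001000000", 0, 0, 0, 0x00),
    0x0D: ("0000000000000000", 1, 0, 0, 0x0F),
    0x0E: ("0000000000000000", 0, 0, 0, 0x00),
    0x0F: ("0100000001000000", 0, 0, 0, 0x00),
    0x1F: ("0000000000000000", 0, 0, 1, None),
}

# Stand-in for the starting address generator (opcode -> first uInstruction).
START = {"LDA": 0x03, "STA": 0x06, "ADD": 0x09, "SUB": 0x0A,
         "MBA": 0x0B, "JMP": 0x0C, "JN": 0x0D, "HLT": 0x1F}

def run_instruction(mnemonic, nf=0):
    """Trace one instruction, fetch included, as (uPC, active signals) pairs."""
    trace, upc = [], 0x00
    while True:
        bits, cd, map_bit, hlt, nxt = CONTROL_STORE[upc]
        trace.append((upc, [s for s, b in zip(SIGNALS, bits) if b == "1"]))
        if hlt:
            break                    # HLT bit stops the clock
        if map_bit:
            upc = START[mnemonic]    # MAP = 1: address comes from the IR
        elif cd and not nf:
            upc += 1                 # CD = 1, NF = 0: increment
        else:
            upc = nxt                # branch to the stored next address
        if upc == 0x00:
            break                    # back at fetch: instruction finished
    return trace

if __name__ == "__main__":
    for name in ("MBA", "ADD"):
        for upc, active in run_instruction(name):
            print(f"{name} {upc:02X} {active}")
```

Changing the behavior of an instruction here means editing a table entry rather than rewiring logic, which is the main attraction of the microprogrammed approach.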

Control Word

Example 1: MBA followed by ADD

The path through the control store is 00, 01, 02, 0B for MBA, then 00, 01, 02, 09 for ADD.

Sequence for MBA, ADD

Instruction  Register Transfer   uAddr  Control Signals
MOV B,A      1. MAR ← PC         00     0011000000000000
(MBA)        2. MDR ← M(MAR)     01     0000100000000000
             3. IR ← MDR         02     1000000110000000
             4. B ← A            0B     0000000000010001
ADD          1. MAR ← PC         00     0011000000000000
             2. MDR ← M(MAR)     01     0000100000000000
             3. IR ← MDR         02     1000000110000000
             4. A ← ALU(Add)     09     0000000000101010

Control word bit order: IP LP EP LM R W LD ED LI EI LA EA A S EU LB

Example 2: JN with Flag Set

After the fetch (00, 01, 02), the MAP bit selects the JN microprogram at 0D. The uInstruction at 0D has CD = 1 and next address 0F.

If the negative flag is set, jump to a new location by skipping to the uInstruction at 0F, which asserts EI and LP to load the PC from the IR.

Path: 00, 01, 02, 0D, 0F.

Example 3: JN with Flag Not Set

After the fetch (00, 01, 02), the MAP bit selects the JN microprogram at 0D. CD = 1 but NF = 0, so the uPC increments to 0E. The uInstruction at 0E asserts no control signals and returns to 00, so the PC is left unchanged.

Path: 00, 01, 02, 0D, 0E.

Let's Review the Microprogramming Model

Store the microprogram in the control store
Fetch the instruction
Get the set of control signals from the control word
Move the microinstruction address
Lather, Rinse, Repeat

What is Microcode?

Michael Slater's "Microprocessor Based Design" (pg. 42):

"Microcode tells the processor every detailed step required to execute each machine language instruction. Microcode is thus at an even more detailed level than machine language, and in fact defines the machine language. In a standard microprocessor, the microcode is stored in a ROM or a programmable logic array (PLA) that is part of the microprocessor chip and cannot be modified by the user."

Thought Experiment
Why is the design a little clumsy?
What can we do about it?

Reason for Clumsiness

JN
  Conditional flag check
Without any condition check, the whole process is very smooth
Solution
  Avoid all conditional checks

Real Life
A little American Football Story
Theory vs. Practice
  In theory, there is no difference between theory and practice
  In practice, theory and practice are two different things altogether
Live with condition checks
  Keep designs as clean as possible

A General Approach

[Figure: the IR, external inputs and condition codes feed the Starting and Branch Address Generator, which loads the uPC. The uPC addresses the control store, which outputs the control word.]

Format of Microinstructions

Pick yours
  Your choice is as good as your neighbor's
What we did:
  One bit position per control signal
  Order of the bits?
    Doesn't matter
  Can result in long microinstructions
    Not the number of microinstructions, but the width

A Note About Density

Observe that only a few bits are set to 1
Poor usage of bit space
This scheme is called Horizontal Microprogram
Alternate version: encode the bits
  Vertical Microprogram

Vertical Microprogram
Encode the bits by grouping similar elements together
General idea:
  Group similar resources together
  Some operations are mutually exclusive
    There can be only one source or destination register
    Read vs Write of memory

Design Issues

Encoding reduces the bit-space
  But requires decoders
Cost of decoder vs bit-space
  Usually decoder cost is very low
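The trade-off can be made concrete with the 16 control signals of the model computer. The grouping below is a hypothetical one chosen for illustration, not taken from the slides: signals that drive the bus form one field, load signals another, and so on. Each field reserves code 0 for "no signal active".

```python
SIGNALS = "IP LP EP LM R W LD ED LI EI LA EA A S EU LB".split()

# Hypothetical fields: at most one signal per field is active at a time.
FIELDS = [
    ("bus_source", ["EP", "ED", "EI", "EA", "EU"]),
    ("load_dest",  ["LP", "LM", "LD", "LI", "LA", "LB"]),
    ("memory",     ["R", "W"]),
    ("alu",        ["A", "S"]),
    ("inc_pc",     ["IP"]),
]

def field_width(members):
    """Bits needed for len(members) codes plus code 0 meaning 'none'."""
    return len(members).bit_length()

def encode(horizontal):
    """Encode a 16-bit horizontal control string as a tuple of field codes."""
    active = {s for s, bit in zip(SIGNALS, horizontal) if bit == "1"}
    codes = []
    for _, members in FIELDS:
        hits = [i + 1 for i, s in enumerate(members) if s in active]
        if len(hits) > 1:
            raise ValueError("signals in one field are not mutually exclusive")
        codes.append(hits[0] if hits else 0)
    return tuple(codes)

def decode(codes):
    """Expand field codes back to the 16-bit string (the decoders' job)."""
    active = {members[c - 1] for (_, members), c in zip(FIELDS, codes) if c}
    return "".join("1" if s in active else "0" for s in SIGNALS)

VERTICAL_WIDTH = sum(field_width(members) for _, members in FIELDS)

if __name__ == "__main__":
    word = "1000000110000000"  # IP, ED, LI (third fetch step)
    print(encode(word), VERTICAL_WIDTH, "bits instead of", len(SIGNALS))
```

With this grouping the control field shrinks from 16 to 11 bits, at the price of one decoder per field between the control store and the datapath.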

Another Idea
Group concurrently active signals
Every meaningful combination gets a code
Complex decoder to interpret every code

Vertical vs Horizontal

Horizontal
  Faster
  More area
  More common currently
    Cheap transistors
Vertical
  Slower
  More microinstructions

Microsequencing
Other ways to save on hardware
Every instruction had its own microprogram sequence
Also, instructions have several addressing modes
  Only the first few microinstructions differ
Can we share microcode?

A Powerful Technique in Sharing

Bit-ORing
  Example: two instructions share some microcode
  Eventually, they must branch
  The default branch (one instruction) is at X0
  The other branch is stored at X1
  Change the least significant bit(s) to get a new address
Compare that with:
  Having two conditional branches
  Storing two fields, one for each branch
  Both very unclean
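A minimal sketch of bit-ORing, assuming a single condition bit is ORed into the least significant bit of the next-address field. The address value and the function name are hypothetical.

```python
def bit_or_next_address(base_address, condition_bit):
    """OR a 1-bit condition into the least significant bit of the next address.

    base_address is the stored X0 address and must end in 0 so that the
    OR can select either X0 (condition 0) or X1 (condition 1).
    """
    if base_address & 1:
        raise ValueError("base address must have its least significant bit clear")
    return base_address | (condition_bit & 1)

if __name__ == "__main__":
    shared_exit = 0b10100  # hypothetical X0 address where the shared microcode ends
    print(format(bit_or_next_address(shared_exit, 0), "05b"))  # default branch, X0
    print(format(bit_or_next_address(shared_exit, 1), "05b"))  # other branch, X1
```

The microinstruction stores only one next-address field, and the branch costs an OR gate rather than an extra field or an extra conditional microinstruction.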


Thought Experiment:
What if we provided an explicit branch instead of storing a next-address field in our microprogram?
A typical instruction set will need a lot of branches
A lot of time will be wasted on branching

A Pat on Our Back

We provided an explicit field for the address
  Branch location is now data
  It is already saved
Caution:
  Microinstructions can get very wide
Solution:
  There is no free lunch.

Can we pipeline microfetch?

A neat idea:
  Why wait till the current micro-op is over?
  Branch field gives next operation
  Get the next op
Caveat:
  External inputs and status flags may change the order
  What about interrupts?
    They are going to follow you everywhere
  Should have a mechanism that can invalidate microcode prefetch
    Similar to pipeline flush for instructions
    Commonly used

Historical Perspectives

Hardwired Logic
  Popular before the 60s
    Only way people did it
  Popular now
    Speed benefits
Microprogram
  Popular in the 70s
    Memory was slower than CPU
    No on-chip cache
    Best way is to store the microcode
  Now
    Depends on who you ask
Shades of gray:
  Extremes of the spectrum are harder to find nowadays

Tools for Design

Hardwired
  Any state machine optimizer
  Assigning states, minimizing transitions, races, hazards, ...
Microcoding
  Small ones can be in binary
  Large ones
    Use a microassembler
    Very useful debug tool
    Can use the microassembler simultaneously with actual hardware development

Hardwired vs Microcoding
Hardwired units are faster and smaller
Emulation is easy with microcoding
Hardwired design is complex if large
Bugs in hardwired design cannot be fixed in the field
Hardwired control is not suited for loops
  Looping with microcode can be made as fast

Hardwired vs Microcode vs RISC

RISC
  Simpler instruction set
  Hardwired implementation
RISC instructions are like microcode
  Instructions come from the I-Cache instead of the Control Store
Difference:
  Contents are not fixed
  Advantage: only load what you want into the I-Cache
  Keeps size smaller as compared to Control Stores

Microprogram vs Software

Imagine floating point division
Solution 1: Write in software
  Long process
  Error prone
  Many fetches repeatedly from memory for the given sequence of operations
Solution 2: Microcode
  Long process too, but done by designers, not programmers
  Relatively error free, more thorough design
  Requires many cycles, but fetched and used locally

Emulation

A very common use of microcoding
IBM System/360
  32-bit architecture
  16 general-purpose registers
  Most implementations were 8-bit
    Keep cost low
  Secret:
    Heavy microcoding
    Programmers oblivious
In 1992, International Meta Systems (IMS) announced the 3250
  Designed to emulate the x86, 68K, and 6502 architectures
  Uses customizable microcode, among other techniques
  Went bust, never released

Another Interesting Note

Writable Control Store
  What if you, a programmer, can write your own control store?
  Not a mad scientist thought
Implemented in
  VAX 8800
  PDP-11/60
  IBM System/370

Current Trends

Microcode update
Linux utility: microcode_ctl
  Companion to the IA32 microcode driver
  It decodes and sends new microcode to the kernel driver to be uploaded to Intel IA32 processors
  Update is volatile: lost on reboots
Microcode updates are also typically rolled into BIOS updates
  Ready even before an OS is loaded

Intel Said..
The Pentium(R) Pro processor and Pentium(R) II processor may contain design defects or errors known as errata that may cause the product to deviate from published specifications. Many times, the effects of the errata can be avoided by implementing hardware or software work-arounds, which are documented in the Pentium Pro Processor Specification Update and the Pentium II Processor Specification Update. Pentium Pro and Pentium II processors include a feature called "reprogrammable microcode", which allows certain types of errata to be worked around via microcode updates. The microcode updates reside in the system BIOS and are loaded into the processor by the system BIOS during the Power-On Self Test, or POST.

Current Trends

Hyperthreading in P4
  A second logical CPU
  Complete state of the system in both CPUs
Microcoding in P4
  Two pointers control flow independently
  Both processors share the ROM entries
  Access is alternated between the CPUs

Thank You
