Professional Documents
Culture Documents
Microprogramming
Arvind
M.I.T.
6.823 L4- 2
Arvind
CISC
RISC
VLIW
JVM
microcoded
hardwired, pipelined
software interpretation
microarchitectural style
6.823 L4- 3
Arvind
Microarchitecture:
status
lines
Implementation of an ISA
Controller
control
points
Data
path
6.823 L4- 4
Arvind
op
code
conditional
flip-flop
Next state
address
Matrix A
Matrix B
Decoder
Control lines to
ALU, MUXs, Registers
September 21, 2005
6.823 L4- 5
Arvind
Microcoded Microarchitecture
busy?
zero?
opcode
holds fixed
microcode instructions
controller
(ROM)
Datapath
Data
Addr
Memory
(RAM)
enMem
MemWrt
6.823 L4- 6
Arvind
Processor State
Data types
6.823 L4- 7
Arvind
5
rs
5
rt
5
rd
5
0
6
func
ALU
ALUi
opcode
rs
rt
immediate
rt (rs) op immediate
Mem
6
opcode
5
rs
5
rt
16
displacement
M[(rs) + displacement]
6
opcode
5
rs
16
offset
6
opcode
5
rs
16
6
opcode
26
offset
BEQZ, BNEZ
JR, JALR
J, JAL
6.823 L4- 8
Arvind
Opcode
ldIR
zero?
OpSel
ldA
Busy?
32(PC)
31(Link)
rd
rt
rs
ldB
rd
rt
rs
IR
ExtSel
Imm
Ext
2
enImm
3
A
ALU
control
32 GPRs
+ PC ...
32-bit Reg
enALU
RegWrt
Memory
MemWrt
enReg
data
Bus
MA
addr
addr
ALU
RegSel
ldMA
data
enMem
32
PC
means RegSel = PC; enReg=yes;
Reg[rt] means RegSel = rt; enReg=yes;
ldMA= yes
ldB = yes
6.823 L4- 9
Arvind
Memory Module
addr
busy
RAM
din
we
Write(1)/Read(0)
Enable
dout
bus
Assumption: Memory operates asynchronously
and is slow as compared to Reg-to-Reg transfers
September 21, 2005
6.823 L4- 10
Arvind
Instruction Execution
Execution of a MIPS instruction involves
1.
2.
3.
4.
5.
instruction fetch
decode and register fetch
ALU operation
memory operation (optional)
write back to register file (optional)
+ the computation of the
next instruction address
6.823 L4- 11
Arvind
Microprogram Fragments
instr fetch:
MA PC
A PC
IR Memory
PC A + 4
dispatch on OPcode
can be
treated as
a macro
ALU:
A Reg[rs]
B Reg[rt]
Reg[rd] func(A,B)
do instruction fetch
ALUi:
A Reg[rs]
B Imm
sign extension ...
Reg[rt] Opcode(A,B)
do instruction fetch
6.823 L4- 12
Arvind
Microprogram Fragments
(cont.)
LW:
A Reg[rs]
B Imm
MA A + B
Reg[rt] Memory
do instruction fetch
J:
A PC
{A[31:28],B[25:0],00}
B IR
PC JumpTarg(A,B)
do instruction fetch
beqz:
A Reg[rs]
If zero?(A) then go to bz-taken
do instruction fetch
bz-taken:
A PC
B Imm << 2
PC A + B
do instruction fetch
JumpTarg(A,B) =
6.823 L4- 13
Arvind
MIPS Microcontroller:
Opcode
zero?
Busy (memory)
first attempt
PC (state)
s
addr
ROM size ?
= 2(opcode+status+s) words
Word size ?
= control+s bits
Program ROM
data
How big
is s?
next
state
6.823 L4- 14
Arvind
zero?
busy
Control points
next-state
fetch0
fetch1
fetch1
fetch2
fetch3
*
*
*
*
*
*
*
*
*
*
*
yes
no
*
*
MA PC
....
IR Memory
A PC
PC A + 4
fetch1
fetch1
fetch2
fetch3
?
fetch3
ALU
PC A + 4
ALU0
*
*
*
*
*
*
A Reg[rs]
ALU1
B Reg[rt]
ALU2
Reg[rd] func(A,B) fetch0
ALU0
ALU1
ALU2
*
*
*
6.823 L4- 15
Arvind
State Op
fetch0
fetch1
fetch1
fetch2
fetch3
fetch3
fetch3
fetch3
fetch3
fetch3
fetch3
fetch3
fetch3
...
ALU0
ALU1
ALU2
September 21, 2005
zero?
busy
Control points
next-state
*
*
*
*
ALU
ALUi
LW
SW
J
JAL
JR
JALR
beqz
*
*
*
*
*
*
*
*
*
*
*
*
*
*
yes
no
*
*
*
*
*
*
*
*
*
*
MA PC
....
IR Memory
A PC
PC A + 4
PC A + 4
PC A + 4
PC A + 4
PC A + 4
PC A + 4
PC A + 4
PC A + 4
PC A + 4
*
*
*
*
*
*
*
*
*
A Reg[rs]
ALU1
B Reg[rt]
ALU2
Reg[rd] func(A,B) fetch0
fetch1
fetch1
fetch2
fetch3
ALU0
ALUi0
LW0
SW0
J0
JAL0
JR0
JALR0
beqz0
6.823 L4- 16
Arvind
zero?
busy
Cont.
Control points
next-state
*
sExt
uExt
*
*
*
*
*
*
*
*
*
A Reg[rs]
B sExt16(Imm)
B uExt16(Imm)
Reg[rd] Op(A,B)
ALUi1
ALUi2
ALUi2
fetch0
*
*
*
*
*
*
*
*
*
A PC
J1
B IR
J2
PC JumpTarg(A,B) fetch0
*
*
*
*
*
*
yes
no
*
*
*
*
*
*
*
A Reg[rs]
A PC
....
B sExt16(Imm)
PC A+B
beqz1
beqz2
fetch0
beqz3
fetch0
JumpTarg(A,B) = {A[31:28],B[25:0],00}
6.823 L4- 17
Arvind
/
w
PC
addr
size = 2(w+s) x (c + s)
Control signals
Control ROM
/ s
next PC
data
/ c
MIPS:
w = 6+2
c = 17
s=?
no. of steps per opcode = 4 to 6 + fetch-sequence
no. of states (4 steps per op-group ) x op-groups
+ common sequences
= 4 x 8 + 10 states = 42 states s = 6
Control ROM = 2(8+6) x 23 bits 48 Kbytes
September 21, 2005
6.823 L4- 18
Arvind
6.823 L4- 19
Arvind
MIPS Controller V2
Opcode
ext
input encoding
reduces ROM height
PC
PC (state)
address
JumpType =
next | spin
| fetch | dispatch
| feqz | fnez
+1
PCSrc
jump
logic
zero
busy
Control ROM
data
PC+1
next-state encoding
reduces ROM width
6.823 L4- 20
Arvind
Jump Logic
next
PC+1
spin
fetch
absolute
dispatch
op-group
feqz
fnez
6.823 L4- 21
Arvind
State
Control points
fetch0
fetch1
fetch2
fetch3
...
ALU0
ALU1
ALU2
MA PC
IR Memory
A PC
PC A + 4
next
spin
next
dispatch
A Reg[rs]
B Reg[rt]
Reg[rd]func(A,B)
next
next
fetch
ALUi0
ALUi1
ALUi2
A Reg[rs]
B sExt16(Imm)
Reg[rd] Op(A,B)
next
next
fetch
next-state
6.823 L4- 22
Arvind
MIPS-Controller-2
State
Control points
next-state
LW0
LW1
LW2
LW3
LW4
A Reg[rs]
B sExt16(Imm)
MA A+B
Reg[rt] Memory
next
next
next
spin
fetch
SW0
SW1
SW2
SW3
SW4
A Reg[rs]
B sExt16(Imm)
MA A+B
Memory Reg[rt]
next
next
next
spin
fetch
Branches:
6.823 L4- 23
Arvind
MIPS-Controller-2
State
Control points
BEQZ0
BEQZ1
BEQZ2
BEQZ3
BEQZ4
A Reg[rs]
BNEZ0
BNEZ1
BNEZ2
BNEZ3
BNEZ4
A Reg[rs]
next-state
next
fnez
A PC
next
B sExt16(Imm<<2) next
PC A+B
fetch
next
feqz
A PC
next
B sExt16(Imm<<2) next
PC A+B
fetch
6.823 L4- 24
Arvind
Jumps:
MIPS-Controller-2
State
Control points
J0
J1
J2
A PC
next
B IR
next
PC JumpTarg(A,B) fetch
JR0
JR1
A Reg[rs]
PC A
JAL0
JAL1
JAL2
JAL3
A PC
next
Reg[31] A
next
B IR
next
PC JumpTarg(A,B) fetch
JALR0
JALR1
JALR2
JALR3
A PC
B Reg[rs]
Reg[31] A
PC B
next-state
next
fetch
next
next
next
fetch
25
6.823 L4- 26
Arvind
Implementing Complex
Instructions
Opcode
ldIR
zero?
OpSel
ldA
Busy?
32(PC)
31(Link)
rd
rt
rs
ldB
IR
ExtSel
2
Imm
Ext
enImm
rd
rt
rs
3
A
ALU
control
32 GPRs
+ PC ...
data
Bus
Memory
MemWrt
enReg
data
enMem
32
rd M[(rs)] op (rt)
RegWrt
32-bit Reg
enALU
MA
addr
addr
ALU
RegSel
ldMA
Reg-Memory-src ALU op
Reg-Memory-dst ALU op
Mem-Mem ALU op
6.823 L4- 27
Arvind
MIPS-Controller-2
Mem-Mem ALU op
ALUMM0
ALUMM1
ALUMM2
ALUMM3
ALUMM4
ALUMM5
ALUMM6
MA Reg[rs]
next
A Memory
spin
MA Reg[rt]
next
B Memory
spin
MA Reg[rd]
next
Memory func(A,B) spin
fetch
6.823 L4- 28
Arvind
Performance Issues
Microprogrammed control
multiple cycles per instruction
Cycle time ?
tC > max(treg-reg, tALU, tROM, tRAM)
Given complex control, tALU & tRAM can be broken
into multiple cycles. However, tROM cannot be
broken down. Hence
tC > max(treg-reg, tROM)
Suppose 10 * tROM < tRAM
Good performance, relative to the single-cycle
hardwired implementation, can be achieved
even with a CPI of 10
September 21, 2005
6.823 L4- 29
Arvind
Nanocoding
Tries to combine best of horizontal and vertical code
September 21, 2005
6.823 L4- 30
Arvind
Nanocoding
Exploits recurring
in code, e.g.,
ALU0 A Reg[rs]
...
ALUi0 A Reg[rs]
...
PC (state)
code
next-state
address
code ROM
nanoaddress
nanoinstruction ROM
data
MC68000 had 17-bit code containing either 10-bit jump or 9bit nanoinstruction pointer
Nanoinstructions were 68 bits wide, decoded to give 196
control signals
September 21, 2005
6.823 L4- 31
Arvind
6.823 L4- 32
Arvind
M40
M50
M65
16
32
64
inst width
(bits)
50
52
85
87
code size
(K minsts)
2.75
2.75
store
technology
CCROS
TCROS
BCROS
BCROS
store cycle
(ns)
750
625
500
200
memory
cycle (ns)
Rental fee
($K/month)
1500
2500
2000
750
15
35
Datapath
width (bits)
6.823 L4- 33
Arvind
Microcode Emulation
6.823 L4- 34
Arvind
6.823 L4- 35
Arvind
User-WCS failed
6.823 L4- 36
Arvind
Microprogramming:
late seventies
6.823 L4- 37
Arvind
Modern Usage
Microprogramming is far from extinct
Played a crucial role in micros of the Eighties
38
Thank you !