You are on page 1of 7

The Big Picture: Where are We Now?

The Five Classic Components of a Computer

ECE4680 Computer Organization and Architecture Designing a Single Cycle Datapath

Processor Input Control Memory Datapath

Output

Processor Design: How to Implement MIPS Simplicity favors regularity


Todays Topic: Datapath Design What is data? What is datapath?

ECE4680 Datapath.1

2003-3-19

ECE4680 Datapath.2

2003-3-19

The Big Picture: The Performance Perspective


Performance of a machine was determined by: Instruction count Clock cycle time Clock cycles per instruction Processor design (datapath and control) will determine: Clock cycle time Clock cycles per instruction In the next two lectures: Single cycle processor: Advantage: One clock cycle per instruction Disadvantage: long cycle time

The MIPS Instruction Formats


All MIPS instructions are 32 bits long. The three instruction formats:
31 26 op 6 bits 31 op 6 bits 31 op 6 bits 26 target address 26 bits 26 rs 5 bits rs 5 bits 21 rt 5 bits 21 rt 5 bits 16 immediate 16 bits 0 16 rd 5 bits 11 shamt 5 bits 6 funct 6 bits 0 0

R-type I-type J-type

The different fields are: op: operation of the instruction rs, rt, rd: the source and destination register specifiers shamt: shift amount funct: selects the variant of the operation in the op field address / immediate: address offset or immediate value target address: target address of the jump instruction

ECE4680 Datapath.3

2003-3-19

ECE4680 Datapath.4

2003-3-19

The MIPS Subset


ADD and subtract add rd, rs, rt sub rd, rs, rt OR Immediate: ori rt, rs, imm16 LOAD and STORE lw rt, rs, imm16 sw rt, rs, imm16 BRANCH: beq rs, rt, imm16 JUMP: j target
31 26 op 6 bits target address 26 bits
2003-3-19

An Abstract View of the Implementation


26 op 6 bits rs 5 bits 21 rs 5 bits 21 rt 5 bits 16 rt 5 bits immediate 16 bits 16 rd 5 bits 11 shamt 5 bits 6 funct 6 bits 0 Clk PC Instruction Address Ideal Instruction Memory Instruction bus Rd Rs 5 5 Rt 5 0
Two types of functional units Operational element that operate on data (combinational State element that contain data (sequential) Generic Implementation: use PC to supply instruction address get the instruction from memory read registers use the instruction to decide exactly what to do All instructions use the ALU after reading the registers

31

31

26 op 6 bits

Imm Why? memory-reference? arithmetic? control flow? 16 Data Address Data In Clk

32 Clk 0

Rw Ra Rb 32 32-bit Registers

32 32 ALU 32

Ideal Data Memory

DataOut

Next step: to fill in the details: more units, more connections, and control unit
ECE4680 Datapath.5 ECE4680 Datapath.6 2003-3-19

State Elements
Unclocked vs. Clocked Clocks used in synchronous logic when should an element that contains state be updated? falling edge

An unclocked state element


The set-reset latch output depends on present inputs and also on past inputs

__

cycle time rising edge

ECE4680 Datapath.7

2003-3-19

ECE4680 Datapath.8

2003-3-19

Latches and Flip-flops


Output is equal to the stored value inside the element (don't need to ask for permission to look at the value) Change of state (value) is based on the clock Latches: whenever the inputs change, and the clock is asserted Flip-flop: state changes only on a clock edge (edge-triggered methodology) "logically true", could mean electrically low A clocking methodology defines when signals can be read and written wouldn't want to read a signal at the same time it was being written

D-latch and D flip-flop


Two inputs: the data value to be stored (D) the clock signal (C) indicating when to read & store D Output changes when C is high
C Q

_ Q D

Output changes only on the clock edge

D C

D latch

D C

Q D latch _ Q

Q _ Q
D

C
Q

ECE4680 Datapath.9

2003-3-19

ECE4680 Datapath.10

2003-3-19

Clocking Methodology (Appendix B.7)


Clk Setup Hold Dont Care Clk . . . . . . . . . . . . Setup Hold

An Abstract View of the Critical Path


Register file and ideal memory: The CLK input is a factor ONLY during write operation During read operation, behave as combinational logic: - Address valid => Output valid after access time.
PC Instruction Address Ideal Instruction Memory Instruction bus Rd Rs 5 5 Rt 5 Imm 16 Critical Path (Load Operation) = PCs prop time + Instruction Memorys Access Time + Register Files Access Time + ALU to Perform a 32-bit Add + Data Memory Access Time + Setup Time for Register File Write + Clock Skew Data Address Data In Clk
2003-3-19

All storage elements are clocked by the same clock edge Edge-trigged: all stored values are updated on a clock edge Cycle Time = Latch Prop + Longest Delay Path + Setup + Clock Skew (Latch Prop + Shortest Delay Path - Clock Skew) > Hold Time
32 Clk

Rw Ra Rb 32 32-bit Registers

32 32 ALU 32

Ideal Data Memory

DataOut

ECE4680 Datapath.11

2003-3-19

ECE4680 Datapath.12

The Steps of Designing a Processor


Instruction Set Architecture => Register Transfer Language Register Transfer Language (RTL) => Datapath components Datapath interconnect Datapath components => Control signals Control signals => Control logic

What is RTL: The ADD Instruction


Register Transfer Language

add

rd, rs, rt Fetch the instruction from memory The ADD operation Calculate the next instructions address

mem[PC] R[rd] PC R[rs] + R[rt] PC + 4

ponent Element < com

ECE4680 Datapath.13

2003-3-19

ECE4680 Datapath.14

2003-3-19

What is RTL: The Load Instruction


lw

Combinational Logic Elements


Adder
CarryIn A 32 Adder

rt, rs, imm16 mem[PC] Addr Fetch the instruction from memory MUX (p.B-9,B-19) R[rs] + SignExt(imm16) Calculate the memory address
A B

32

Sum Carry

32 Select 32 32 OP

Decoder
out0 out1 out2 out7

Decoder

MUX

R[rt] PC

Mem[Addr] PC + 4

Load the data into the register


B

32

Calculate the next instructions address

ALU

32

32

Result Zero

In which cases do we need an adder, ALU, MUX or Decoder?

ALU

B
ECE4680 Datapath.15 2003-3-19 ECE4680 Datapath.16

32

2003-3-19

Storage Element: Register (p.B22-B25)


Register Similar to the D Flip Flop except - N-bit input and output - Write Enable input Write Enable: - 0: Data Out will not change Clk - 1: Data Out will become Data In Array of logical elements(see register file on next 2 slides)
signal is set to 1.

Storage Element: Register File


Write Enable Data In N Data Out N

Register File consists of 32 registers: Two 32-bit output busses: busA and busB One 32-bit input bus: busW Register is selected by: RA selects the register to put on busA RB selects the register to put on busB RW selects the register to be written via busW when Write Enable is 1 Clock input (CLK)

RW RA RB Write Enable 5 5 5 busA busW 32 Clk 32 32-bit Registers 32 busB 32

The content is upd

ated at the cloc

Write Enable k tick ONLY if the

The CLK input is a factor ONLY during write operation During read operation, behaves as a combinational logic block: - RA or RB valid => busA or busB valid after access time.

ECE4680 Datapath.17

2003-3-19

ECE4680 Datapath.18

2003-3-19

Storage Element: Register File -- Detailed diagram


RW RA RB Write Enable 5 5 5 busA busW 32 Clk 32 32-bit Registers 32 busB 32

Storage Element: Idealized Memory


Write Enable Address

Memory (idealized) One input bus: Data In One output bus: Data Out

Data In 32 Clk

DataOut 32

Write Enable
C D C D

RA RB

0 1

Register 0 Register 1

RW

32-to-1 Decoder
30 31

M U X

Memory word is selected by: Address selects the word to put on Data Out Write Enable = 1: address selects the memory memory word to be written via the Data In bus Clock input (CLK) The CLK input is a factor ONLY during write operation During read operation, behaves as a combinational logic block: Address valid => Data Out valid after access time.

busA

C D C

Register 30 Register 31

busW

Clk

M U X
busB

ECE4680 Datapath.19

2003-3-19

ECE4680 Datapath.20

2003-3-19

Overview of the Instruction Fetch Unit (Fig. 5.5)


The common RTL operations Fetch the Instruction: mem[PC] Update the program counter: Sequential Code: PC <- PC + 4 Branch and Jump PC <- something else

RTL: The ADD Instruction


31 op 6 bits 26 rs 5 bits 21 rt 5 bits 16 rd 5 bits 11 shamt 5 bits 6 funct 6 bits 0

add

rd, rs, rt Fetch the instruction from memory The actual operation Calculate the next instructions address

mem[PC] R[rd]
Next Address Logic Address Instruction Memory

Clk

PC

R[rs] + R[rt] PC + 4

PC

Instruction Word 32

ECE4680 Datapath.21

2003-3-19

ECE4680 Datapath.22

2003-3-19

RTL: The Subtract Instruction


31 op 6 bits 26 rs 5 bits 21 rt 5 bits 16 rd 5 bits 11 shamt 5 bits 6 funct 6 bits 0

Datapath for Register-Register Operations


R[rd] <- R[rs] op R[rt] Example: add rd, rs, rt Ra, Rb, and Rw comes from instructions rs, rt, and rd fields ALUctr and RegWr: control logic after decoding the instruction
31 26 op 6 bits rs 5 bits 21 rt 5 bits 16 rd 5 bits 11 shamt 5 bits 6 funct 6 bits 0

sub

rd, rs, rt Fetch the instruction from memory The actual operation Calculate the next instructions address

mem[PC] R[rd] PC R[rs] - R[rt] PC + 4

Rd Rs Rt RegWr 5 5 5 Rw Ra Rb busW 32 Clk 32 32-bit Registers busA 32 busB 32

ALUctr

ALU

Result 32

ECE4680 Datapath.23

2003-3-19

ECE4680 Datapath.24

2003-3-19

Register-Register Timing
Clk PC Old Value Clk-to-Q New Value Old Value Old Value Old Value Old Value Old Value Instruction Memory Access Time New Value Delay through Control Logic New Value New Value Register File Access Time New Value ALU Delay New Value

RTL: The OR Immediate Instruction


31 op 6 bits 26 rs 5 bits 21 rt 5 bits 16 immediate 16 bits 0

Rs, Rt, Rd, Op, Func ALUctr RegWr busA, B busW

ori

rt, rs, imm16 Fetch the instruction from memory

mem[PC] R[rt]

R[rs] or ZeroExt(imm16) The OR operation PC + 4


31

Rd Rs Rt RegWr 5 5 5 busW 32 Clk Rw Ra Rb 32 32-bit Registers busA 32 busB 32

PC
ALUctr Register Write Occurs Here

Calculate the next instructions address


16 15 immediate 16 bits 0

ALU

Result 32

0000000000000000 16 bits

ECE4680 Datapath.25

2003-3-19

ECE4680 Datapath.26

2003-3-19

Datapath for Logical Operations with Immediate


R[rt] <- R[rs] op ZeroExt[imm16]]
31 26 op 6 bits 21 rs 5 bits 16 rt 5 bits immediate 16 bits

RTL: The Load Instruction


31 26 op 6 bits 21 rs 5 bits 16 rt 5 bits immediate 16 bits 0

Example: ori

rt, rs, imm16


0

lw

rt, rs, imm16 mem[PC] Addr R[rt] PC

Fetch the instruction from memory

Rd RegDst Mux RegWr 5 busW 32 Clk

Rt Dont Care Rs (Rt) 5 5 busA 32 busB Mux 32 ZeroExt

Newly added parts are in blue color.

R[rs] + SignExt(imm16) Mem[Addr] PC + 4


31

ALUctr

Calculate the memory address Load the data into the register Calculate the next instructions address
0 immediate 16 bits 0 immediate 16 bits
2003-3-19

Rw Ra Rb 32 32-bit Registers

Result 32 16 15 0 0000000000000000 16 bits 31 16 15 1111111111111111 1 16 bits


2003-3-19 ECE4680 Datapath.28

ALU

imm16

16

32 ALUSrc

ECE4680 Datapath.27

Datapath for Load Operations


R[rt] <- Mem[R[rs] + SignExt[imm16]]
31 op 6 bits Rd RegDst Mux RegWr 5 busW 32 Clk Rs 5 5 Rt 26 rs 5 bits 21 rt 5 bits 16 immediate 16 bits

RTL: The Store Instruction


rt, rs, imm16
0 31 op 6 bits 26 rs 5 bits 21 rt 5 bits 16 immediate 16 bits 0

Example: lw

sw

rt, rs, imm16 Fetch the instruction from memory

mem[PC]
Dont Care (Rt) busA 32 busB 32 Extender 32 ALUSrc ALU ALUctr MemtoReg

Addr

R[rs] + SignExt(imm16) Calculate the memory address

Rw Ra Rb 32 32-bit Registers

Mux

32 MemWr WrEn Adr

32

Mem[Addr] PC PC + 4

R[rt]

Store the register into memory Calculate the next instructions address

Mux Data In 32 Clk

imm16

16

Data Memory

ECE4680 Datapath.29

ExtOp

2003-3-19

ECE4680 Datapath.30

2003-3-19

Datapath for Store Operations


Mem[R[rs] + SignExt[imm16] <- R[rt]]
31 op 6 bits Rd RegDst Mux RegWr 5 busW 32 Clk Rs 5 5 busA 32 busB 32 Extender 32 ALUSrc Clk Rt ALUctr MemWr ALU MemtoReg Rt 26 rs 5 bits 21 rt 5 bits 16 immediate 16 bits

RTL: The Branch Instruction


rt, rs, imm16
0 31 op 6 bits 26 rs 5 bits 21 rt 5 bits 16 immediate 16 bits 0

Example: sw

beq

rs, rt, imm16 Fetch the instruction from memory Calculate the branch condition Calculate the next instructions address

mem[PC] Cond R[rs] - R[rt]

Rw Ra Rb 32 32-bit Registers

if (COND eq 0)
Mux 32 32 WrEn Adr Data Memory

Data In 32

- PC else - PC

PC + 4 + ( SignExt(imm16) x 4 ) PC + 4

Mux

imm16

16

ECE4680 Datapath.31

ExtOp

2003-3-19

ECE4680 Datapath.32

2003-3-19

Datapath for Branch Operations


beq rs, rt, imm16
31 op 6 bits Rd RegDst Mux RegWr 5 busW 32 Clk Rs 5 5 busA 32 busB 32 Extender 32 ALUSrc
ECE4680 Datapath.33

Binary Arithmetic for the Next Address


In theory, the PC is a 32-bit byte address into the instruction memory: Sequential operation: PC<31:0> = PC<31:0> + 4 Branch operation: PC<31:0> = PC<31:0> + 4 + SignExt[Imm16] * 4
PC Clk

We need to compare Rs and Rt!


21 rs 5 bits rt 5 bits 16 immediate 16 bits Branch Rt 0

26

Rt

The magic number 4 always comes up because: The 32-bit PC is a byte address And all our instructions are 4 bytes (32 bits) long In other words: The 2 LSBs of the 32-bit PC are always zeros There is no reason to have hardware to keep the 2 LSBs In practice, we can simplify the hardware by using a 30-bit PC<31:2>: Sequential operation: PC<31:2> = PC<31:2> + 1 Branch operation: PC<31:2> = PC<31:2> + 1 + SignExt[Imm16] In either case: Instruction Memory Address = PC<31:2> concat 00

ALUctr

imm16 16

Next Address Logic

Rw Ra Rb 32 32-bit Registers

ALU

Zero

To Instruction Memory

Mux

imm16

16

ExtOp

2003-3-19

ECE4680 Datapath.34

2003-3-19

Next Address Logic: Expensive and Fast Solution


Using a 30-bit PC: Sequential operation: PC<31:2> = PC<31:2> + 1 Branch operation: PC<31:2> = PC<31:2> + 1 + SignExt[Imm16] In either case: Instruction Memory Address = PC<31:2> concat 00
30 Addr<31:2> 30 30 1 Clk imm16 Instruction<15:0> 16 SignExt 00 0 30 Adder Mux Adder Addr<1:0> Instruction Memory 32 PC
ECE4680 Datapath.35

Next Address Logic: Cheap and Slow Solution


Why is this slow? Cannot start the address add until Zero (output of ALU) is valid Does it matter that this is slow in the overall scheme of things? Probably not here. Critical path is the load operation.
30 30 0 Clk imm16 Instruction<15:0> 16 0 Mux Addr<31:2> Addr<1:0> Instruction Memory 32 PC

1 Carry In Adder

00

1 30

1 30 30

30

SignExt

30 Instruction<31:0> Branch Zero

Instruction<31:0> Branch Zero

2003-3-19

ECE4680 Datapath.36

2003-3-19

RTL: The Jump Instruction


j
31 op 6 bits 26 target address 26 bits 0

Instruction Fetch Unit


target PC<31:2> PC<31:28> concat target<25:0>
30 PC<31:28> 30 4 30 26 0 Mux 00 1 Mux Addr<31:2> Addr<1:0> Instruction Memory 32

target mem[PC] PC<31:2> Fetch the instruction from memory PC<31:28> concat target<25:0> Calculate the next instructions address
Clk imm16 Instruction<15:0> PC

Target Instruction<25:0> 30 1 Adder

30 Adder

1 30

Jump

Instruction<31:0>

SignExt

16

30

Branch

Zero

This is the whole design of Instruction Fetch Unit: 3 inputs: jump, Branch and Zero; 1 output: instruction word.
ECE4680 Datapath.37 2003-3-19 ECE4680 Datapath.38 2003-3-19

Putting it All Together: A Single Cycle Datapath


We have everything except control signals (underline)

Where to get more information?


To be continued ...

Branch Rd RegDst Rt Rs 5 5 busA 32 0 Mux ALU Rt Jump Clk Instruction Fetch Unit

Instruction<31:0> <11:15> <21:25> <16:20> <0:15>

1 Mux 0 RegWr 5 ALUctr

Rt Zero

Rs

Rd

Imm16 MemtoReg 0 Mux

busW 32 Clk

Rw Ra Rb 32 32-bit Registers busB 32 Extender

MemWr

32 32 WrEn Adr

1 32 ALUSrc

imm16

Data In 32 Clk

16

Data Memory

ExtOp
ECE4680 Datapath.39 2003-3-19 ECE4680 Datapath.40 2003-3-19

You might also like