L3 - The Processor-SingleClock
COMPUTER ARCHITECTURE
Bo Yu
Computer Engineering Department
SJSU
Adapted from Computer Organization and Design, 5th Edition, 4th edition, Patterson and Hennessy, MK
and Computer Architecture – A Quantitative Approach, 4th edition, Patterson and Hennessy, MK
Lecture 2 Key Concepts Review
■ Last Lecture
o MIPS Architecture
op rs rt rd shamt funct
6 bits 5 bits 5 bits 5 bits 5 bits 6 bits
0 17 18 8 0 32
000000 10001 10010 01000 00000 100000₂ = 02324020₁₆
op rs rt constant or address
6 bits 5 bits 5 bits 16 bits
o Immediate arithmetic and load/store instructions
❖ rt: destination or source register number
❖ Constant: –2^15 to +2^15 – 1
❖ Address: offset added to base address in rs
● Example of translating MIPS Assembly Language into Machine Language:
A[300] = h + A[300]; $t1 has base address of array A, $s2 corresponds to h
The above assignment statement is compiled into:
[Table: assembly language, machine language instructions, and machine binary code for the compiled statement]
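The compiled instructions themselves did not survive in these notes. The sketch below assumes the usual textbook answer for this statement (lw, add, sw with a 1200-byte offset, since A[300] is 300 words past the base) and packs each instruction into its 32-bit word using the R-format and I-format field widths listed above. The helper names encode_r and encode_i and the REG table are illustrative, not part of the lecture.

```python
# Register numbers follow the standard MIPS convention
# ($t0-$t7 = 8-15, $s0-$s7 = 16-23). Only the ones used here are listed.
REG = {"$t0": 8, "$t1": 9, "$s1": 17, "$s2": 18}


def encode_r(op, rs, rt, rd, shamt, funct):
    """Pack R-format fields (6/5/5/5/5/6 bits) into one 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(op, rs, rt, imm):
    """Pack I-format fields (6/5/5/16 bits); imm is masked to 16 bits."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


# Assumed compilation of A[300] = h + A[300] (base of A in $t1, h in $s2).
program = [
    encode_i(35, REG["$t1"], REG["$t0"], 1200),               # lw  $t0, 1200($t1)
    encode_r(0, REG["$s2"], REG["$t0"], REG["$t0"], 0, 32),   # add $t0, $s2, $t0
    encode_i(43, REG["$t1"], REG["$t0"], 1200),               # sw  $t0, 1200($t1)
]

for word in program:
    print(f"{word:032b}  0x{word:08x}")
```

The same encode_r call with fields 0, 17, 18, 8, 0, 32 reproduces the R-format word shown earlier in this review.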
Cmpe 200 Lecture 3 8
Lecture 2: Instructions: Language of the Computer
■ Signed Negation
◼ Complement and add 1
◼ Complement means 1 → 0, 0 → 1
x + x̄ = 1111…111₂ = −1
x̄ + 1 = −x
◼ Example: negate +2
◼ +2 = 0000 0000 … 0010₂
◼ –2 = 1111 1111 … 1101₂ + 1
= 1111 1111 … 1110₂
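The negation rule above can be checked with a few lines of code. This is a minimal sketch assuming 32-bit words; the names negate32 and to_signed32 are illustrative.

```python
MASK32 = 0xFFFFFFFF


def negate32(x):
    """Two's-complement negation of a 32-bit pattern: complement, then add 1."""
    return ((x ^ MASK32) + 1) & MASK32


def to_signed32(x):
    """Interpret a 32-bit pattern as a signed two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


plus_two = 0b0010
minus_two = negate32(plus_two)

print(f"+2 = {plus_two:032b}")
print(f"-2 = {minus_two:032b}")
print(to_signed32(minus_two))
```

Adding a pattern to its complement always gives all ones, which is why complementing and adding 1 yields the negative.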
■ Hexadecimal
◼ Base 16
◼ Compact representation of bit strings
◼ 4 bits per hex digit
■ Logical Operations
● Instructions for bitwise manipulation
● Useful for extracting and inserting groups of bits in a
word
Logical Operations   C Operators   Java Operators   MIPS Instructions
Shift left logical   <<            <<               sll
■ Shift Operations
op rs rt rd shamt funct
6 bits 5 bits 5 bits 5 bits 5 bits 6 bits
■ Conditional Operations
● Branch to a labeled instruction if a condition is true
o Otherwise, continue sequentially
● beq rs, rt, L1
o if (rs == rt) branch to instruction labeled L1;
● bne rs, rt, L1
o if (rs != rt) branch to instruction labeled L1;
● j L1
o unconditional jump to instruction labeled L1
■ Basic Blocks
● C code:
● A basic block is a sequence of instructions with
o No embedded branches (except at end)
o No branch targets (except at beginning)
❖ –1 < +1 ⇒ $t0 = 1
■ Memory Layout
● Text: program code
● Static data: global variables
o e.g., static variables in C, constant arrays and strings
o $gp initialized to address allowing ±offsets into this
segment
o $sp → starts at 7fff fffc, grows down towards data
segment
o $gp → starts at 1000 8000, moves using positive
and negative 16-bit offset
o Text Segment: MIPS machine code
o Static data: starts at 1000 0000
o Dynamic data: heap grows towards stack
● Dynamic data: heap
o E.g., malloc in C, new in Java
● Stack: automatic storage
Static linking
■ Instruction Execution
● Instruction Fetch
o PC => instruction memory => Instruction register
● Operand Read
o Register Numbers => Register File
o Read Registers
● Execute
o ALU to calculate
o Results for arithmetic operations
o Data memory address for Load/Store
o Branch target address (alternative implementation?)
● Next Instruction
o PC ← target address (for jumps and taken branches) or PC + 4
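The four steps above can be sketched as a single function that executes one instruction per call. This is a simplified model under stated assumptions: memories are Python dicts keyed by byte address, only add, lw, sw, beq and j are handled, and the function name step is illustrative. The opcodes (35, 43, 4, 2) and the add funct value (32) are the ones used elsewhere in these notes.

```python
def sign_extend16(imm16):
    """Sign-extend a 16-bit field to a Python integer."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def step(pc, regs, imem, dmem):
    """Execute one instruction and return the next PC.

    regs is a list of 32 integers; imem and dmem map byte addresses to words.
    """
    # Instruction fetch: PC -> instruction memory
    instr = imem[pc]
    op = instr >> 26
    rs, rt, rd = (instr >> 21) & 31, (instr >> 16) & 31, (instr >> 11) & 31
    imm = sign_extend16(instr & 0xFFFF)
    funct = instr & 0x3F

    # Operand read: register numbers -> register file
    a, b = regs[rs], regs[rt]

    # Execute, write the result, and choose the next PC
    next_pc = pc + 4
    if op == 0 and funct == 32:            # add
        regs[rd] = (a + b) & 0xFFFFFFFF
    elif op == 35:                         # lw: ALU computes the address
        regs[rt] = dmem[(a + imm) & 0xFFFFFFFF]
    elif op == 43:                         # sw
        dmem[(a + imm) & 0xFFFFFFFF] = b
    elif op == 4:                          # beq
        if a == b:
            next_pc = pc + 4 + (imm << 2)
    elif op == 2:                          # j
        next_pc = ((pc + 4) & 0xF0000000) | ((instr & 0x03FFFFFF) << 2)
    else:
        raise ValueError(f"unsupported instruction {instr:#010x}")

    regs[0] = 0                            # $zero is hard-wired to 0
    return next_pc
```

Calling step repeatedly, feeding the returned value back in as pc, walks through a program one instruction at a time.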
Lecture 3: The Processor
■ CPU Overview
■ Multiplexers
[Figure: CPU overview with the MUX points marked]
■ Control
[Figure: CPU overview with control signals; clocked writes are labeled PC clock, RegWrite clock, and MemWrite clock]
● Instruction flow: Instruction Fetch (PC → Instruction Memory), Instruction Decode, Operand Read, Execute, Result Write
● State elements: a memory element such as a register or a memory (edge-triggered FFs)
■ Combinational Elements
◼ AND-gate: Y = A & B
◼ Adder: Y = A + B
◼ Multiplexer: Y = S ? I1 : I0
◼ Arithmetic/Logic Unit: Y = F(A, B)
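Each combinational element is a pure function of its current inputs, with no stored state. A minimal sketch, assuming 32-bit operands; the function names and the string codes used to select the ALU function F are illustrative choices, not the lecture's encoding.

```python
def and_gate(a, b):
    """Y = A & B"""
    return a & b


def adder(a, b, width=32):
    """Y = A + B, truncated to the datapath width."""
    return (a + b) & ((1 << width) - 1)


def mux(s, i0, i1):
    """Y = S ? I1 : I0"""
    return i1 if s else i0


def alu(f, a, b, width=32):
    """Y = F(A, B) for a small, assumed set of functions."""
    mask = (1 << width) - 1
    results = {"and": a & b, "or": a | b, "add": a + b, "sub": a - b}
    return results[f] & mask


print(mux(0, 10, 20), mux(1, 10, 20))
print(hex(alu("sub", 2, 3)))
```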
■ Sequential Elements
● Register: stores data in a circuit
o Uses a clock signal to determine when to
update the stored value
o Edge-triggered: update when Clk changes
from 0 to 1
[Figure: D flip-flop symbol (D, Clk, Q) and timing diagram for Clk, D, Q]
■ Sequential Elements
● Register with write control
o Only updates on clock edge when write
control input is 1
o Used when stored value is required later
[Figure: flip-flop symbol with Write input (D, Clk, Write, Q) and timing diagram for Write, D, Clk, Q]
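The behavior of an edge-triggered register with write control can be modeled in a few lines. This is a behavioral sketch, not a hardware description; the class name Register and the method tick are illustrative.

```python
class Register:
    """Positive-edge-triggered register with a write-enable input."""

    def __init__(self, value=0):
        self.q = value
        self._prev_clk = 0

    def tick(self, clk, d, write=1):
        """Apply the current Clk, D and Write levels; Q changes only on a 0 -> 1 edge."""
        rising_edge = self._prev_clk == 0 and clk == 1
        if rising_edge and write:
            self.q = d
        self._prev_clk = clk
        return self.q


r = Register()
r.tick(0, 5)            # clock low: Q keeps its old value
r.tick(1, 5)            # rising edge with Write = 1: Q takes D
r.tick(0, 9)
r.tick(1, 9, write=0)   # rising edge with Write = 0: Q is held
print(r.q)
```

With write left at its default of 1, the same class models the plain register from the previous slide.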
■ Building a Datapath
● Datapath
o Elements that process data and addresses
in the CPU
❖ Registers, ALUs, muxes, memories, …
● We will build a MIPS datapath incrementally
o Refining the overview design
■ Data Path
● Instruction Fetch
[Figure: instruction fetch datapath; the PC is a 32-bit register and an adder increments it by 4 for the next instruction]
■ R-Format Instructions
● Read two register operands
● Perform arithmetic/logical operation
● Write register result
■ Branch Instructions
● Read register operands
● Compare operands
o Perform ALU op as required for condition testing
o Use ALU, subtract and check Zero output
● Calculate target address based on the
condition
o Sign-extend displacement
o Shift left 2 places (word displacement)
o Add to PC + 4
❖Already calculated by instruction fetch
Examples (with RTL descriptions):
BEQ    if (R(rs) == R(rt)) then PC ← PC + 4 + (signExt(Imm16) || 00)
       else PC ← PC + 4
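The branch steps listed above (subtract and check Zero, sign-extend, shift left 2, add to PC + 4) can be written out directly. A minimal sketch assuming 32-bit addresses; the function name beq_next_pc is illustrative.

```python
def sign_extend16(imm16):
    """Sign-extend a 16-bit displacement (replicates the sign bit)."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def beq_next_pc(pc, rs_val, rt_val, imm16):
    """Return the next PC for beq given the two register values."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF            # already computed by instruction fetch
    zero = ((rs_val - rt_val) & 0xFFFFFFFF) == 0  # ALU subtract, check Zero output
    if zero:
        offset = sign_extend16(imm16) << 2        # word displacement -> byte displacement
        return (pc_plus_4 + offset) & 0xFFFFFFFF
    return pc_plus_4


print(hex(beq_next_pc(0x1000, 3, 3, 4)))       # taken, forward
print(hex(beq_next_pc(0x1000, 3, 4, 4)))       # not taken
print(hex(beq_next_pc(0x1000, 0, 0, 0xFFFF)))  # taken, displacement of -1 word
```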
■ Branch Instructions
[Figure: branch datapath; the shift-left-2 unit just re-routes wires, and the sign-extend unit replicates the sign-bit wire]
[Figure: single-cycle datapath. Components: PC, Add (PC + 4), Instruction Memory, Register File, Sign Extend (16 → 32), ALU (zero and ovf outputs), Data Memory. Instruction fields: Instr[25-21] → Read Addr 1, Instr[20-16] → Read Addr 2, Instr[20-16] or Instr[15-11] → Write Addr. Control signals: RegDst, RegWrite, ALUSrc, MemWrite, MemtoReg, zeroExt]
[Figure: the same datapath with the Branch control signal added and Instr[5-0] routed to the ALU control]
■ ALU Control
● ALU used for
o Load/Store: F = add (to compute the address)
o Branch: F = subtract
o R-type: F depends on funct field
■ ALU Control
● Assume 2-bit ALUOp derived from opcode
o Combinational logic derives ALU control
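The combinational ALU control logic can be sketched as a lookup. The 4-bit ALU control codes and funct values below are the ones listed in the control table of these notes; the 2-bit ALUOp encoding (00 for load/store, 01 for branch, 10 for R-type) is the common textbook convention and is an assumption here, as is the function name alu_control.

```python
# funct field -> 4-bit ALU control, for the R-type instructions in these notes
FUNCT_TO_ALU_CONTROL = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # sub
    0b100100: 0b0000,  # and
    0b100101: 0b0001,  # or
}


def alu_control(alu_op, funct=0):
    """Derive the 4-bit ALU control from ALUOp (assumed encoding) and funct."""
    if alu_op == 0b00:        # load/store: F = add (address computation)
        return 0b0010
    if alu_op == 0b01:        # branch: F = subtract
        return 0b0110
    if alu_op == 0b10:        # R-type: F depends on the funct field
        return FUNCT_TO_ALU_CONTROL[funct]
    raise ValueError(f"unsupported ALUOp {alu_op:02b}")


print(f"{alu_control(0b10, 0b100010):04b}")
```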
Load/Store   35 or 43   rs      rt      address
             31:26      25:21   20:16   15:0
Branch       4          rs      rt      address
             31:26      25:21   20:16   15:0
■ J-Type Instruction
Jump         2          address
             31:26      25:0
Instruction   Opcode  funct   shamt  ALU func  ALUControl  RegDst  Branch  Jump  MemtoReg  MemWrite  ALUSrc  RegWrite  zeroExt
LW            xxxxx   -       -      ADD       0010        0       0       0     1         0         1       1         0
SW            yyyyyy  -       -      ADD       0010        x       0       0     x         1         1       0         0
BEQ           zzzzz   -       -      SUB       0110        x       1       0     x         0         0       0         0
R-type add    -       100000  -      ADD       0010        1       0       0     0         0         0       1         0
R-type sub    -       100010  -      SUB       0110        1       0       0     0         0         0       1         0
R-type and    -       100100  -      AND       0000        1       0       0     0         0         0       1         0
R-type or     -       100101  -      OR        0001        1       0       0     0         0         0       1         0
R-type shift  -       xxxx    xxxx   SHIFT     -           1       0       0     0         0         0       1         0
ori           -       -       -      -         -           0       0       0     0         0         1       1         1
Jump          -       -       -      -         x           x       x       1     x         0         x       0         x
(x = don't care, - = not listed)
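The main control unit is a pure function of the opcode, so the rows above can be expressed as a lookup table. A minimal sketch covering lw, sw, beq, R-type and jump only; the opcode numbers (35, 43, 4, 2, and 0 for R-type) come from the instruction formats in these notes, None stands for a don't-care, and the names CONTROL and decode are illustrative.

```python
X = None  # don't care

SIGNALS = ("RegDst", "Branch", "Jump", "MemtoReg", "MemWrite", "ALUSrc", "RegWrite")

# opcode -> control signal values, in the order given by SIGNALS
CONTROL = {
    35: (0, 0, 0, 1, 0, 1, 1),   # lw
    43: (X, 0, 0, X, 1, 1, 0),   # sw
    4:  (X, 1, 0, X, 0, 0, 0),   # beq
    0:  (1, 0, 0, 0, 0, 0, 1),   # R-type
    2:  (X, X, 1, X, 0, X, 0),   # j
}


def decode(opcode):
    """Return the control signals for an opcode as a name -> value dict."""
    return dict(zip(SIGNALS, CONTROL[opcode]))


for opcode in CONTROL:
    print(opcode, decode(opcode))
```

Note that the two signals that change state (MemWrite and RegWrite) are never don't-cares: an instruction that must not write has to drive them to 0.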
■ Performance Issues
● Single Clock
o Functional blocks have different delays
o Single clock designs use the clock inefficiently – the clock must be
timed to accommodate the slowest instruction
o Longest delay determines clock period
√ Critical path: load instruction
√ Instruction memory → register file → ALU → data memory → register file
o Violates the design principle of making the common case fast
o We will improve performance by pipelining
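The effect of the critical path on the clock period can be worked through numerically. The unit delays below are assumed, textbook-style figures (200 ps for memories and the ALU, 100 ps for register file access); they are not stated in these notes. The dictionary names are illustrative.

```python
# Assumed delay of each functional block, in picoseconds
DELAY_PS = {
    "instr_mem": 200,
    "reg_read": 100,
    "alu": 200,
    "data_mem": 200,
    "reg_write": 100,
}

# Functional blocks each instruction class passes through
PATHS = {
    "lw":     ["instr_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "sw":     ["instr_mem", "reg_read", "alu", "data_mem"],
    "R-type": ["instr_mem", "reg_read", "alu", "reg_write"],
    "beq":    ["instr_mem", "reg_read", "alu"],
}

latency = {name: sum(DELAY_PS[unit] for unit in units) for name, units in PATHS.items()}
clock_period = max(latency.values())   # the slowest instruction sets the clock

for name, ps in latency.items():
    print(f"{name:7s} needs {ps} ps, wastes {clock_period - ps} ps per cycle")
print(f"clock period = {clock_period} ps")
```

Under these assumptions the load instruction sets the clock period, and every other instruction leaves part of its cycle unused.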
[Figure: clock timing diagram; lw fills Cycle 1, while sw finishes early in Cycle 2 and the rest of the cycle is wasted]
■ Solution
Single-cycle (Tc = 800 ps)
■ Next lecture
● Multicycle datapath - Pipelining