11 Processor 1
The Processor
§4.1 Introduction
Introduction
CPU performance factors
Instruction count
Determined by ISA and compiler
CPI and cycle time
Determined by CPU hardware
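The three factors multiply together into execution time: CPU time = instruction count × CPI × clock cycle time. Here is a minimal Python sketch. The numbers are made up purely for illustration.

```python
def cpu_time(instruction_count: int, cpi: float, cycle_time_s: float) -> float:
    """CPU time (s) = instructions x cycles per instruction x seconds per cycle."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical program: 10^9 instructions, average CPI 1.5, 2 GHz clock (0.5 ns cycle).
print(f"{cpu_time(10**9, 1.5, 0.5e-9):.2f} s")       # 0.75 s

# Halving the instruction count (ISA/compiler side) halves CPU time as well.
print(f"{cpu_time(5 * 10**8, 1.5, 0.5e-9):.3f} s")   # 0.375 s
```

Halving the CPI or the cycle time (the hardware side) would have the same effect, which is why the two implementations are compared on CPI and cycle time for a fixed instruction set.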
We will examine two LEGv8 implementations
A simplified version
A more realistic pipelined version
Simple subset, shows most aspects
Memory reference: LDUR, STUR
Arithmetic/logical: ADD, SUB, AND, ORR
Control transfer: compare and branch on zero (CBZ), branch (B)
Instruction Execution
PC → instruction memory, fetch instruction
Register numbers → register file, read registers
Depending on instruction class
Use ALU to calculate
Arithmetic result
Memory address for load/store
Branch target address
Access data memory for load/store
PC ← target address or PC + 4
CPU Overview
Multiplexers
Can’t just join wires together
Use multiplexers
Control
§4.2 Logic Design Conventions
Logic Design Basics
Information encoded in binary
Low voltage = 0, High voltage = 1
One wire per bit
Multi-bit data encoded on multi-wire buses
Combinational element
Operate on data
Output is a function of input
State (sequential) elements
Store information
Combinational Elements
AND gate: Y = A & B
Adder: Y = A + B
Multiplexer: Y = S ? I1 : I0
Arithmetic/Logic Unit: Y = F(A, B)
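Here is a rough behavioural sketch in Python, assuming 64-bit buses as in LEGv8. Each combinational element is modelled as a pure function, so its output depends only on its current inputs. Modelling the ALU function F as a Python callable is an illustrative choice, not how the control lines work in hardware.

```python
MASK64 = (1 << 64) - 1  # LEGv8 data buses are 64 bits wide


def and_gate(a: int, b: int) -> int:
    """Y = A & B, applied bitwise across a multi-wire bus."""
    return a & b


def adder(a: int, b: int) -> int:
    """Y = A + B; the carry out of bit 63 is dropped, as in a 64-bit adder."""
    return (a + b) & MASK64


def mux(s: int, i0: int, i1: int) -> int:
    """Y = S ? I1 : I0."""
    return i1 if s else i0


def alu(f, a: int, b: int) -> int:
    """Y = F(A, B); F is a Python callable standing in for the ALU control lines."""
    return f(a, b) & MASK64


print(adder(MASK64, 1))                         # 0: wraps around at 64 bits
print(mux(1, 0xA, 0xB))                         # 11: S = 1 selects I1
print(alu(lambda a, b: a | b, 0b1010, 0b0101))  # 15
```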
Sequential Elements
Register: stores data in a circuit
Uses a clock signal to determine when to update the stored value
Edge-triggered: update when Clk changes from 0 to 1
(Figure: register with data input D, clock Clk and output Q, with timing diagram)
Sequential Elements
Register with write control
Only updates on clock edge when write control input is 1
Used when stored value is required later
(Figure: register with inputs D, Write and Clk and output Q, with timing diagram)
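Here is a behavioural sketch of both register kinds, assuming the clock is fed in as a sequence of 0/1 levels. The class and method names are hypothetical. The stored value changes only on a rising (0 to 1) edge of Clk, and only when Write is 1. Leaving `write` at its default of 1 models the plain register.

```python
class Register:
    """Edge-triggered register with a write-control input (hypothetical model)."""

    def __init__(self, value: int = 0) -> None:
        self.q = value      # output Q: the stored value
        self._prev_clk = 0  # previous Clk level, used to detect a 0 -> 1 transition

    def step(self, clk: int, d: int, write: int = 1) -> int:
        """Apply one Clk level and return Q."""
        rising_edge = self._prev_clk == 0 and clk == 1
        if rising_edge and write:
            self.q = d
        self._prev_clk = clk
        return self.q


r = Register()
print(r.step(0, 5))           # 0: no clock edge yet
print(r.step(1, 5))           # 5: rising edge, D is stored
print(r.step(1, 9))           # 5: Clk stays high, no edge
print(r.step(0, 9))           # 5: falling edge is ignored
print(r.step(1, 9, write=0))  # 5: rising edge, but Write is 0
```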
Clocking Methodology
Combinational logic transforms data during clock cycles
Between clock edges
Input from state elements, output to state element
Longest delay determines clock period
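This small sketch shows how the longest delay sets the clock period. The component delays are hypothetical values in picoseconds, chosen for illustration only. Each instruction's path starts at state elements, passes through combinational logic, and ends at state elements within a single cycle.

```python
# Hypothetical component delays in picoseconds (illustrative, not measured).
DELAYS_PS = {"imem": 200, "reg_read": 100, "alu": 200, "dmem": 200, "reg_write": 100}

# Elements each instruction class passes through between two clock edges.
PATHS = {
    "R-type": ["imem", "reg_read", "alu", "reg_write"],
    "LDUR": ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "STUR": ["imem", "reg_read", "alu", "dmem"],
    "CBZ": ["imem", "reg_read", "alu"],
}


def path_delay(stages) -> int:
    return sum(DELAYS_PS[s] for s in stages)


def min_clock_period_ps(paths) -> int:
    # One clock must cover every path, so the slowest path sets the period.
    return max(path_delay(stages) for stages in paths.values())


slowest = max(PATHS, key=lambda name: path_delay(PATHS[name]))
print(min_clock_period_ps(PATHS))                 # 800
print(slowest, path_delay(PATHS[slowest]), "ps")  # LDUR 800 ps
```

With these numbers, an R-type instruction needs only 600 ps but still takes a full 800 ps cycle.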
§4.3 Building a Datapath
Building a Datapath
Datapath
Elements that process data and addresses in the CPU
Registers, ALUs, muxes, memories, …
We will build a LEGv8 datapath incrementally
Refining the overview design
Instruction Fetch
Increment by 4 for next instruction
64-bit register (PC)
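Here is a minimal fetch sketch. Instruction memory is modelled as a Python dict mapping a byte address to a 32-bit instruction word, which is an assumption for illustration. Every LEGv8 instruction is 4 bytes long, so the adder computes PC + 4 on the 64-bit PC.

```python
MASK64 = (1 << 64) - 1  # the PC is a 64-bit register


def fetch(imem: dict, pc: int) -> tuple:
    """Read the instruction at PC and compute the default next PC."""
    instruction = imem[pc]       # hypothetical dict model: byte address -> 32-bit word
    next_pc = (pc + 4) & MASK64  # every LEGv8 instruction is 4 bytes long
    return instruction, next_pc


imem = {0x1000: 0x8B020020}      # encoding of ADD X0, X1, X2
instr, next_pc = fetch(imem, 0x1000)
print(hex(instr), hex(next_pc))  # 0x8b020020 0x1004
```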
R-Format Instructions
Read two register operands
Perform arithmetic/logical operation
Write register result
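This sketch walks through the three R-format steps. It assumes the LEGv8 R-format field layout: opcode in bits 31:21, Rm in 20:16, shamt in 15:10, Rn in 9:5 and Rd in 4:0. It covers only ADD, SUB, AND and ORR, and ignores shamt. The register file is a plain list of 32 integers, and register 31 is treated as XZR, which always reads as zero.

```python
MASK64 = (1 << 64) - 1
XZR = 31  # register 31 reads as zero and ignores writes (LEGv8 convention)

# 11-bit opcodes of the R-format subset and the operation each performs
R_OPS = {
    0b10001011000: lambda a, b: a + b,  # ADD
    0b11001011000: lambda a, b: a - b,  # SUB
    0b10001010000: lambda a, b: a & b,  # AND
    0b10101010000: lambda a, b: a | b,  # ORR
}


def execute_r(regs: list, instr: int) -> None:
    opcode = (instr >> 21) & 0x7FF         # bits 31:21
    rm = (instr >> 16) & 0x1F              # bits 20:16, second source register
    rn = (instr >> 5) & 0x1F               # bits 9:5, first source register
    rd = instr & 0x1F                      # bits 4:0, destination register
    a, b = regs[rn], regs[rm]              # read two register operands
    result = R_OPS[opcode](a, b) & MASK64  # perform the operation, truncated to 64 bits
    if rd != XZR:
        regs[rd] = result                  # write register result


regs = [0] * 32
regs[1], regs[2] = 7, 5
execute_r(regs, 0x8B020020)                # ADD X0, X1, X2
print(regs[0])                             # 12
```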
Load/Store Instructions
LDUR X1, [X2, offset_value] or STUR X1, [X2, offset_value]
Read register operands, and calculate the memory address by adding the base register X2 to the 9-bit signed offset
Use the ALU, but first sign-extend the 9-bit offset field in the instruction to a 64-bit signed value
Load: read memory and write the value into the register file (register X1 here)
Store: read the register file (X1) and write the value to memory
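This sketch shows the load/store address calculation. It assumes the LEGv8 D-format layout: opcode in bits 31:21, the 9-bit DT_address offset in 20:12, op in 11:10, Rn in 9:5 and Rt in 4:0. Sign extension replicates bit 8 of the offset into all upper bits. The helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Replicate the sign bit of a `bits`-wide field into a 64-bit pattern."""
    sign = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return ((value ^ sign) - sign) & MASK64  # two's-complement value as unsigned 64 bits


def effective_address(regs: list, instr: int) -> int:
    dt_address = (instr >> 12) & 0x1FF   # 9-bit signed offset, bits 20:12
    rn = (instr >> 5) & 0x1F             # base register, bits 9:5
    return (regs[rn] + sign_extend(dt_address, 9)) & MASK64  # ALU add


LDUR_OPCODE = 0b11111000010
offset = -8
instr = (LDUR_OPCODE << 21) | ((offset & 0x1FF) << 12) | (2 << 5) | 1  # LDUR X1, [X2, #-8]
regs = [0] * 32
regs[2] = 0x1000
print(hex(effective_address(regs, instr)))  # 0xff8
```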
Branch Instructions
CBZ X1, offset
Register X1 is tested for zero, and a 19-bit offset is used to compute the branch target address relative to the branch instruction address
Use ALU: pass the register value through and check the Zero output
Calculate target address
Sign-extend the 19-bit offset
Shift the offset left by 2 bits, converting the word offset into a byte offset (instructions are 4 bytes)
Add it to the address of the branch instruction, which is the base for the branch address calculation
If the operand (X1) is zero, the branch target address becomes the new PC
If the operand is not zero, the incremented PC (PC + 4, computed during instruction fetch) replaces the current PC
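Here is a sketch of the CBZ next-PC logic. It assumes the LEGv8 CB-format layout: opcode in bits 31:24, the 19-bit COND_BR_address in 23:5 and Rt in 4:0. The function names are hypothetical. The offset counts instructions, so it is sign-extended, shifted left by 2 and added to the address of the CBZ itself.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Replicate the sign bit of a `bits`-wide field into a 64-bit pattern."""
    sign = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return ((value ^ sign) - sign) & MASK64


def cbz_next_pc(regs: list, pc: int, instr: int) -> int:
    rt = instr & 0x1F                                  # register to test, bits 4:0
    offset = sign_extend((instr >> 5) & 0x7FFFF, 19)   # 19-bit signed word offset
    target = (pc + (offset << 2)) & MASK64             # words -> bytes, relative to the CBZ
    zero = regs[rt] == 0                               # ALU passes Rt through; Zero output
    return target if zero else (pc + 4) & MASK64


CBZ_OPCODE = 0b10110100
instr = (CBZ_OPCODE << 24) | ((-3 & 0x7FFFF) << 5) | 1  # CBZ X1, -3 (three instructions back)
regs = [0] * 32
print(hex(cbz_next_pc(regs, 0x100, instr)))  # 0xf4: X1 == 0, branch taken
regs[1] = 5
print(hex(cbz_next_pc(regs, 0x100, instr)))  # 0x104: X1 != 0, PC + 4
```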
Datapath segment for branches
Shift left 2: just re-routes wires
Sign extension: sign-bit wire replicated
Composing the Elements
The simplest datapath executes each instruction in one clock cycle
Each datapath element can only do one function at a time
Hence, we need separate instruction and data memories
Use multiplexers where alternate data sources are used for different instructions
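For example, the ALU's second input comes from a register for R-type instructions but from the sign-extended offset for loads and stores. Similarly, the value written to the register file comes either from the ALU or from data memory. Here is a sketch of those two multiplexers. The control-signal names ALUSrc and MemtoReg follow common textbook usage but are assumptions here, as are the function names.

```python
def mux(select: int, in0: int, in1: int) -> int:
    """Two-input multiplexer: select = 0 passes in0, select = 1 passes in1."""
    return in1 if select else in0


def alu_second_operand(alu_src: int, read_data2: int, sign_extended_offset: int) -> int:
    # ALUSrc = 0: second register operand (R-type, CBZ)
    # ALUSrc = 1: sign-extended offset from the instruction (LDUR, STUR)
    return mux(alu_src, read_data2, sign_extended_offset)


def register_write_data(mem_to_reg: int, alu_result: int, mem_read_data: int) -> int:
    # MemtoReg = 0: ALU result (R-type)
    # MemtoReg = 1: value read from data memory (LDUR)
    return mux(mem_to_reg, alu_result, mem_read_data)


# R-type: the ALU uses a register operand and its result is written back.
print(alu_second_operand(0, 5, 16), register_write_data(0, 12, 99))  # 5 12
# LDUR: the ALU adds the offset, and the loaded value is written back.
print(alu_second_operand(1, 5, 16), register_write_data(1, 12, 99))  # 16 99
```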
R-Type/Load/Store Datapath
Full Datapath
§4.4 A Simple Implementation Scheme
ALU Control
Load/Store (LDUR/STUR): ALU computes the memory address by addition
R-type instructions: ALU performs one of the four actions (AND, OR, subtract, or add),
depending on the value of the 11-bit opcode field in the instruction
Compare and branch on zero (CBZ): ALU just passes the register input value through
Small control unit
Input: opcode field of the instruction and a 2-bit control field, called ALUOp, with the following values:
(00): add, for loads and stores
(01): pass input b, for CBZ
(10): operation determined by the opcode field (R-type)
Output: 4-bit signal that directly controls the ALU, set to one of the 6 combinations shown below
ALU control lines   Function
0000                AND
0001                OR
0010                add
0110                subtract
0111                pass input b
1100                NOR
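Here is a behavioural sketch of an ALU driven by the six 4-bit control values in the table, using 64-bit operands. It also returns the Zero output (1 when the result is 0), which CBZ tests. The function name and the tuple return are illustrative choices.

```python
MASK64 = (1 << 64) - 1


def alu(control: int, a: int, b: int) -> tuple:
    """Return (result, zero) for one of the six 4-bit ALU control values."""
    if control == 0b0000:
        result = a & b          # AND
    elif control == 0b0001:
        result = a | b          # OR
    elif control == 0b0010:
        result = a + b          # add
    elif control == 0b0110:
        result = a - b          # subtract
    elif control == 0b0111:
        result = b              # pass input b (CBZ)
    elif control == 0b1100:
        result = ~(a | b)       # NOR
    else:
        raise ValueError(f"unsupported ALU control {control:04b}")
    result &= MASK64            # keep 64 bits, two's complement
    return result, int(result == 0)


print(alu(0b0111, 123, 0))  # (0, 1): CBZ on a zero register sets Zero
print(alu(0b0010, 2, 3))    # (5, 0)
```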
ALU Control
ALU control inputs are based on the 2-bit ALUOp control and the 11-bit opcode
ALUOp bits are generated by the main control unit
Multiple levels of decoding: a common implementation technique
Can reduce the size of the main control unit
Can potentially reduce the latency of the control unit
Instruction  ALUOp  Operation                   Opcode field  ALU function  ALU control
LDUR         00     load register               XXXXXXXXXXX   add           0010
STUR         00     store register              XXXXXXXXXXX   add           0010
CBZ          01     compare and branch on zero  XXXXXXXXXXX   pass input b  0111
R-type       10     add                         10001011000   add           0010
R-type       10     subtract                    11001011000   subtract      0110
R-type       10     AND                         10001010000   AND           0000
R-type       10     ORR                         10101010000   OR            0001
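The second level of decoding can be sketched as a small function that maps ALUOp and the opcode to the 4-bit ALU control input. The sketch assumes the LEGv8 11-bit opcodes for ADD, SUB, AND and ORR. Opcodes outside this subset are rejected rather than guessed. The names are hypothetical.

```python
# 11-bit LEGv8 opcodes for the R-type subset, mapped to ALU control inputs
R_TYPE_ALU_CONTROL = {
    0b10001011000: 0b0010,  # ADD -> add
    0b11001011000: 0b0110,  # SUB -> subtract
    0b10001010000: 0b0000,  # AND -> AND
    0b10101010000: 0b0001,  # ORR -> OR
}


def alu_control(alu_op: int, opcode: int) -> int:
    """Second-level decode: 2-bit ALUOp plus 11-bit opcode -> 4-bit ALU control."""
    if alu_op == 0b00:   # LDUR/STUR: add base and offset; opcode is a don't-care
        return 0b0010
    if alu_op == 0b01:   # CBZ: pass input b; opcode is a don't-care
        return 0b0111
    if alu_op == 0b10:   # R-type: the opcode field selects the operation
        if opcode not in R_TYPE_ALU_CONTROL:
            raise ValueError(f"opcode {opcode:011b} not in this subset")
        return R_TYPE_ALU_CONTROL[opcode]
    raise ValueError(f"unsupported ALUOp {alu_op:02b}")


print(f"{alu_control(0b10, 0b11001011000):04b}")  # 0110 (SUB -> subtract)
print(f"{alu_control(0b00, 0b11111000010):04b}")  # 0010 (LDUR -> add)
```

The main control unit only has to produce the 2-bit ALUOp. This function decodes the opcode details, which is the size and latency benefit of multi-level decoding mentioned above.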
The Main Control Unit
Control signals derived from instruction