11 Processor 1

Chapter 4

The Processor

§4.1 Introduction
Introduction
 CPU performance factors
 Instruction count
 Determined by ISA and compiler
 CPI and Cycle time
 Determined by CPU hardware
 We will examine two LEGv8 implementations
 A simplified version
 A more realistic pipelined version
 Simple subset, shows most aspects
 Memory reference: LDUR, STUR
 Arithmetic/logical: ADD, SUB, AND, ORR
 Control transfer: Compare and branch on zero (CBZ), Branch (B)

Instruction Execution
 PC → instruction memory, fetch instruction
 Register numbers → register file, read registers
 Depending on instruction class
 Use ALU to calculate
 Arithmetic result
 Memory address for load/store
 Branch target address
 Access data memory for load/store
 PC ← target address or PC + 4

CPU Overview

Multiplexers
 Can’t just join wires together
 Use multiplexers

Control

§4.2 Logic Design Conventions
Logic Design Basics
 Information encoded in binary
 Low voltage = 0, High voltage = 1
 One wire per bit
 Multi-bit data encoded on multi-wire buses
 Combinational element
 Operate on data
 Output is a function of input
 State (sequential) elements
 Store information

Combinational Elements
 AND-gate
 Y = A & B
 Adder
 Y = A + B
 Multiplexer
 Y = S ? I1 : I0
 Arithmetic/Logic Unit
 Y = F(A, B)
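The four combinational elements above can be modeled as pure functions, since each output depends only on the current inputs. This is a minimal sketch, not a hardware description: the 64-bit width (`MASK64`), the function names, and the string-valued function select `f` are assumptions made for illustration.

```python
MASK64 = (1 << 64) - 1  # assumption: 64-bit data buses, as in LEGv8


def and_gate(a: int, b: int) -> int:
    """Y = A & B (bitwise on the whole bus)."""
    return a & b


def adder(a: int, b: int) -> int:
    """Y = A + B, wrapping around at 64 bits like a hardware adder."""
    return (a + b) & MASK64


def mux(s: int, i0: int, i1: int) -> int:
    """Y = S ? I1 : I0."""
    return i1 if s else i0


def alu_model(f: str, a: int, b: int) -> int:
    """Y = F(A, B); the set of functions here is illustrative only."""
    if f == "and":
        return a & b
    if f == "or":
        return a | b
    if f == "add":
        return (a + b) & MASK64
    if f == "sub":
        return (a - b) & MASK64
    raise ValueError(f"unknown ALU function: {f}")
```

Because none of these functions keeps any state, calling one twice with the same inputs always gives the same output, which is the defining property of a combinational element.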

Sequential Elements
 Register: stores data in a circuit
 Uses a clock signal to determine when to update the
stored value
 Edge-triggered: update when Clk changes from 0 to 1

[Figure: register with data input D, clock input Clk, and output Q, with a timing diagram of Clk, D, and Q]

Sequential Elements
 Register with write control
 Only updates on clock edge when write control input is 1
 Used when stored value is required later

[Figure: register with inputs D, Write, and Clk and output Q, with a timing diagram of Clk, Write, D, and Q]
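The behavior of an edge-triggered register with write control can be sketched in software as follows. The class name `Register` and the `tick` method are hypothetical; the model simply remembers the previous clock level so it can detect a 0-to-1 transition.

```python
class Register:
    """Edge-triggered register with a write-control input (illustrative model)."""

    def __init__(self, value: int = 0) -> None:
        self.q = value        # stored value, visible on output Q
        self._prev_clk = 0    # previous clock level, used to detect a rising edge

    def tick(self, clk: int, d: int, write: int = 1) -> int:
        """Present new input levels; update Q only on a rising edge with write == 1."""
        rising_edge = self._prev_clk == 0 and clk == 1
        if rising_edge and write:
            self.q = d
        self._prev_clk = clk
        return self.q
```

With `write` left at its default of 1 the model behaves like the plain edge-triggered register; holding `write` at 0 keeps the stored value across clock edges, which is what is needed when the value is required later.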

Clocking Methodology
 Combinational logic transforms data during clock
cycles
 Between clock edges
 Input from state elements, output to state element
 Longest delay determines clock period
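As a rough illustration of why the longest delay sets the clock period, the sketch below sums the delays along each path between state elements and takes the maximum. The path names and picosecond values are made-up numbers for illustration and do not come from the slides.

```python
def min_clock_period(path_delays_ps: dict[str, list[int]]) -> int:
    """Return the shortest usable clock period: the delay of the slowest path."""
    return max(sum(stage_delays) for stage_delays in path_delays_ps.values())


# Hypothetical per-element delays (in ps) along two combinational paths.
example_paths = {
    "load": [200, 100, 200, 200, 100],
    "r-type": [200, 100, 200, 100],
}
period_ps = min_clock_period(example_paths)
```

Every path must settle before the next clock edge, so even a path that is rarely exercised limits how fast the clock can run.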

§4.3 Building a Datapath
Building a Datapath
 Datapath
 Elements that process data and addresses
in the CPU
 Registers, ALUs, mux’s, memories, …
 We will build a LEGv8 datapath incrementally
 Refining the overview design

Instruction Fetch

[Figure annotations: the PC is a 64-bit register; increment by 4 for the next instruction]
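The fetch step can be sketched as a function that reads the instruction at the PC and computes PC + 4. Modeling instruction memory as a dictionary keyed by byte address is an assumption made to keep the example short, and the function and variable names are hypothetical.

```python
MASK64 = (1 << 64) - 1  # the PC is a 64-bit register


def fetch(pc: int, instruction_memory: dict[int, int]) -> tuple[int, int]:
    """Return (instruction word at pc, incremented PC)."""
    instruction = instruction_memory[pc]
    next_pc = (pc + 4) & MASK64  # instructions are 4 bytes apart
    return instruction, next_pc


# Two placeholder 32-bit instruction words at byte addresses 0 and 4.
imem = {0: 0x8B030041, 4: 0xCB030041}
word, next_pc = fetch(0, imem)
```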

R-Format Instructions
 Read two register operands
 Perform arithmetic/logical operation
 Write register result

Load/Store Instructions
 LDUR X1, [X2,offset_value] or STUR X1, [X2,offset_value]
 Read register operands and calculate the memory address by adding the
base register X2 to the 9-bit signed offset
 Use ALU, but sign-extend the 9-bit offset field in the instruction to a 64-bit signed
value
 Load: read memory and write into register file (register X1 here)
 Store: read register file (X1) and write value to memory
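The address calculation for LDUR and STUR can be sketched as below: the 9-bit offset field is sign-extended and added to the base register value. The helper names `sign_extend` and `effective_address` are hypothetical, and register and memory access are left out.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value ^ sign_bit) - sign_bit


def effective_address(base: int, offset9: int) -> int:
    """Base register value plus the sign-extended 9-bit offset, wrapped to 64 bits."""
    return (base + sign_extend(offset9, 9)) & MASK64
```

For example, an offset field of 0x1F8 has its sign bit set and represents -8, so with a base of 0x1000 the access goes to address 0xFF8.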

Branch Instructions

 CBZ X1,offset
 X1 register is tested for zero, and a 19-bit offset is used to compute the branch
target address relative to the branch instruction address
 Use ALU to pass the register value through and check the Zero output
 Calculate target address
 Sign-extend displacement
 The base for the branch address calculation is the address of the branch
instruction
 Shift the offset field left by 2 bits to convert the word offset into a byte offset
 If the operand (X1) is zero, the branch target address is the new PC
 If the operand is not zero, the incremented PC (PC+4, during
instruction fetch) replaces the current PC
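The next-PC selection for CBZ can be sketched as follows, assuming `pc` holds the address of the branch instruction itself. The function names are hypothetical, and the 19-bit offset is passed in as the raw field value.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value ^ sign_bit) - sign_bit


def cbz_next_pc(pc: int, reg_value: int, offset19: int) -> int:
    """Return the new PC after CBZ: branch target if the register is zero, else PC + 4."""
    byte_offset = sign_extend(offset19, 19) << 2   # word offset -> byte offset
    target = (pc + byte_offset) & MASK64           # relative to the branch address
    fall_through = (pc + 4) & MASK64
    return target if reg_value == 0 else fall_through
```

With the branch at address 0x100 and an offset field of 3, a zero register gives a new PC of 0x10C, while a nonzero register gives 0x104.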
Datapath segment for branches
[Figure annotations: the shift-left-2 unit just re-routes wires; sign extension replicates the sign-bit wire]

Composing the Elements
 The simplest datapath executes all instructions in one clock cycle
 Each datapath element can only do one function at a time
 Hence, we need separate instruction and data memories
 Use multiplexers where alternate data sources are used for
different instructions

R-Type/Load/Store Datapath

Full Datapath

§4.4 A Simple Implementation Scheme
ALU Control
 Load/Store (LDUR/STUR): ALU computes the memory address by addition
 R-type instructions: ALU performs one of the four actions (AND, OR, subtract, or add),
depending on the value of the 11-bit opcode field in the instruction
 Compare and branch on zero (CBZ): ALU just passes the register input value.
 Small control unit
 Input: opcode field of the instruction and a 2-bit control field, called ALUOp, with the following values:
 (00) indicates the operation to be performed should be add for loads and stores,
 (01) pass input b for CBZ,
 (10) determined by the operation encoded in the opcode field.
 Output: 4-bit signal that directly controls the ALU by generating one of the 6 combinations shown below
ALU control lines   Function
0000                AND
0001                OR
0010                add
0110                subtract
0111                pass input b
1100                NOR
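The six ALU control combinations can be sketched as a function from the 4-bit control value and two operands to a result and a Zero flag. The name `alu_4bit`, the 64-bit wraparound, and returning Zero alongside the result are assumptions made for illustration.

```python
MASK64 = (1 << 64) - 1


def alu_4bit(control: int, a: int, b: int) -> tuple[int, int]:
    """Return (result, zero) for the given 4-bit ALU control value."""
    if control == 0b0000:
        result = a & b            # AND
    elif control == 0b0001:
        result = a | b            # OR
    elif control == 0b0010:
        result = a + b            # add
    elif control == 0b0110:
        result = a - b            # subtract
    elif control == 0b0111:
        result = b                # pass input b
    elif control == 0b1100:
        result = ~(a | b)         # NOR
    else:
        raise ValueError(f"unsupported ALU control: {control:04b}")
    result &= MASK64              # wrap to 64 bits
    return result, int(result == 0)
```

For CBZ the control value 0111 passes input b through unchanged, so the Zero output directly reports whether the tested register is zero.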
ALU Control
 ALU control inputs based on the 2-bit ALUOp control and the 11-bit
opcode.
 ALUOp bits are generated from the main control unit.
 Multiple levels of decoding - common implementation technique
 can reduce the size of the main control unit
 potentially reduce the latency of the control unit
Instruction  ALUOp  Operation                   Opcode field  ALU function  ALU control
LDUR         00     load register               XXXXXXXXXXX   add           0010
STUR         00     store register              XXXXXXXXXXX   add           0010
CBZ          01     compare and branch on zero  XXXXXXXXXXX   pass input b  0111
R-type       10     add                         10001011000   add           0010
R-type       10     subtract                    11001011000   subtract      0110
R-type       10     AND                         10001010000   AND           0000
R-type       10     ORR                         10101010000   OR            0001
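The second level of decoding can be sketched as a small function that maps ALUOp and the opcode field to the 4-bit ALU control value. The 11-bit opcode values used here are the LEGv8 encodings of ADD, SUB, AND, and ORR, supplied as an assumption for this example; the names `R_TYPE_ALU_CONTROL` and `alu_control` are hypothetical.

```python
# Assumed 11-bit LEGv8 R-type opcodes mapped to 4-bit ALU control values.
R_TYPE_ALU_CONTROL = {
    0b10001011000: 0b0010,  # ADD -> add
    0b11001011000: 0b0110,  # SUB -> subtract
    0b10001010000: 0b0000,  # AND -> AND
    0b10101010000: 0b0001,  # ORR -> OR
}


def alu_control(alu_op: int, opcode: int) -> int:
    """Return the 4-bit ALU control value for a 2-bit ALUOp and an opcode field."""
    if alu_op == 0b00:      # LDUR/STUR: opcode is a don't-care, always add
        return 0b0010
    if alu_op == 0b01:      # CBZ: opcode is a don't-care, pass input b
        return 0b0111
    if alu_op == 0b10:      # R-type: decided by the opcode field
        return R_TYPE_ALU_CONTROL[opcode]
    raise ValueError(f"unsupported ALUOp: {alu_op:02b}")
```

The main control unit only has to produce the 2-bit ALUOp, and the opcode is consulted only in the R-type case, which is how multi-level decoding keeps the main control unit small.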
The Main Control Unit
 Control signals derived from instruction

 Opcode field: 6 – 11 bits wide, bit positions 31:26 to 31:21


 First register operand: bit positions 9:5 (Rn)
 Other register operand: bit positions 20:16 (Rm), 4:0 (Rt)
 Another operand: 19-bit offset (CBZ) or 9-bit offset (Load/Store)
 The destination register for R-type instructions (Rd) and for loads (Rt) is in bit positions 4:0.
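The field positions listed above can be sketched as bit-slicing on a 32-bit instruction word. The helper names are hypothetical, and the offset positions (20:12 for the 9-bit load/store offset, 23:5 for the 19-bit CBZ offset) are not given on the slide; they are taken here from the LEGv8 D and CB instruction formats as an assumption.

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Extract bit positions hi:lo (inclusive) from a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_fields(instruction: int) -> dict[str, int]:
    """Slice out every field the control unit and datapath may need."""
    return {
        "opcode11": bits(instruction, 31, 21),   # widest opcode view, 31:21
        "Rm": bits(instruction, 20, 16),         # second register operand (R-type)
        "Rn": bits(instruction, 9, 5),           # first register operand
        "Rd_or_Rt": bits(instruction, 4, 0),     # destination (R-type, loads) or Rt
        "offset9": bits(instruction, 20, 12),    # load/store offset (assumed position)
        "offset19": bits(instruction, 23, 5),    # CBZ offset (assumed position)
    }


# 0x8B030041 encodes ADD X1, X2, X3 under the assumed LEGv8 R-format layout.
fields = decode_fields(0x8B030041)
```

All fields are extracted unconditionally, much as hardware wires every slice out of the instruction; the control signals then decide which of them are actually used.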