Microarchitecture
• Microarchitecture is the connection between logic and architecture.
• Microarchitecture is the specific arrangement of registers, ALUs,
finite state machines (FSMs), memories, and other logic building
blocks needed to implement an architecture.
• A particular architecture, such as ARM, may have many different
microarchitectures, each with different trade-offs of performance,
cost, and complexity. They all run the same programs, but their
internal designs vary widely.
Pre-requisite / Recaps
• Combinational Logic Circuits
• Memory
[Figure: state elements of the ARM processor: program counter (PC′ → PC), 4-bit status register, instruction memory (A, RD), register file (A1, A2, A3, WD3, WE3, R15, RD1, RD2), and data memory (A, WD, WE, RD)]
• The 15-element × 32-bit register file holds registers R0–R14 and has an
additional input to receive R15 from the PC.
• The register file has two read ports and one write port. The read ports take 4-bit
address inputs, A1 and A2, each specifying one of 2^4 = 16 registers as source
operands.
• They read the 32-bit register values onto read data outputs RD1 and RD2,
respectively.
• The write port takes a 4-bit address input, A3; a 32-bit write data input, WD3; a
write enable input, WE3; and a clock.
• If the write enable is asserted, then the register file writes the data into the
specified register on the rising edge of the clock.
• A read of R15 returns the value of the PC plus 8. Writes to R15 must be
specially handled to update the PC, because the PC is separate from the register file.
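The register file behaviour described above can be sketched as a small behavioural model. This is only an illustration, not the course's HDL: the class name `RegisterFile` and the method names `read` and `clock_edge` are hypothetical, and writes to R15 are simply ignored here because the text says they are handled outside the register file.

```python
class RegisterFile:
    """Behavioural sketch of the 15 x 32-bit register file (R0-R14)."""

    def __init__(self):
        self.regs = [0] * 15  # R0..R14; R15 arrives on a separate input

    def read(self, a1, a2, r15):
        """Combinational read: returns (RD1, RD2); r15 is the PC + 8 input."""
        rd1 = r15 if a1 == 15 else self.regs[a1]
        rd2 = r15 if a2 == 15 else self.regs[a2]
        return rd1, rd2

    def clock_edge(self, a3, wd3, we3):
        """Rising clock edge: write WD3 to register A3 when WE3 is asserted."""
        if we3 and a3 != 15:  # assumption: R15 writes are handled elsewhere
            self.regs[a3] = wd3 & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(a3=2, wd3=0x1234, we3=True)
rf.clock_edge(a3=3, wd3=0xFFFF, we3=False)  # WE3 low: no write
pc = 0x100
rd1, rd2 = rf.read(a1=2, a2=15, r15=pc + 8)
print(hex(rd1), hex(rd2))  # 0x1234 0x108
```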
Dr. Sajid Muhaimin Choudhury 12
EEE 415 - Department of EEE, BUET Digital Design and Computer Architecture: ARM® Edition © 2015
DRAFT 7/12/2023
The ALU receives two operands, SrcA and SrcB. SrcA comes from the register
file, and SrcB comes from the extended immediate. The ALU can perform
many operations (described in Section 5.2.4). The 2-bit ALUControl signal
specifies the operation. The ALU generates a 32-bit ALUResult.
Synchronization
• The instruction memory, register file, and data memory share the same timing:
• Read: asynchronous (combinational)
• Write: synchronous (on the rising edge of the clock)
• The state of the system is changed only at the clock edge.
• The address, data, and write enable must be set up before the clock edge and must
remain stable until a hold time after the clock edge.
• Because the state elements change their state only on the rising edge of the
clock, they are synchronous sequential circuits (as covered in EEE 303).
Processor Performance
• Program execution time = (# instructions) × (cycles/instruction) × (seconds/cycle)
• Definitions:
• CPI: cycles/instruction
• Clock period: seconds/cycle
• IPC: instructions/cycle = 1/CPI
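As a quick numeric illustration of these definitions, the sketch below multiplies instruction count, CPI, and clock period to get execution time. The function names and the numbers (100 billion instructions, CPI of 1, 840 ps clock period) are assumptions chosen only for the example.

```python
def execution_time(num_instructions, cpi, clock_period_s):
    """Execution time = instructions x cycles/instruction x seconds/cycle."""
    return num_instructions * cpi * clock_period_s


def ipc(cpi):
    """Instructions per cycle is the reciprocal of CPI."""
    return 1.0 / cpi


# Illustrative values only (not taken from the slides)
t = execution_time(num_instructions=100e9, cpi=1.0, clock_period_s=840e-12)
print(f"{t:.1f} s")   # about 84 s
print(ipc(2.0))       # 0.5
```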
Microarchitecture
• Multiple implementations for a single architecture:
• Single-cycle: Each instruction executes in a single cycle
• Multicycle: Each instruction is broken up into a series of shorter steps
• Pipelined: Each instruction is broken up into a series of steps, and multiple instructions execute at
once
ARM Processor
• We consider a subset of ARM instructions:
• Data-processing instructions:
– ADD, SUB, AND, ORR
– with register and immediate Src2, but no shifts
• Memory instructions:
– LDR, STR
– with positive immediate offset
• Branch instructions:
– B
[Figure: instruction fetch. The PC drives the address input (A) of the instruction memory; the read data output (RD) is Instr.]
The program counter contains the address of the instruction to execute. The first step is
to read this instruction from instruction memory. The figure shows that the PC is simply
connected to the address input of the instruction memory. The instruction memory
reads out, or fetches, the 32-bit instruction, labeled Instr.
[Figures: datapath for LDR, built up step by step. Instr19:16 drives RA1 (register file input A1); Instr11:0 passes through Extend to produce ExtImm; the ALU takes SrcA = RD1 and SrcB = ExtImm and computes the memory address, Address = RD1 + ExtImm (ALUResult); the data memory returns ReadData; Instr15:12 drives A3 of the register file.]
• While the instruction is being executed, the processor must compute the
address of the next instruction, PC′.
• Because instructions are 32 bits (4 bytes), the next instruction is at PC + 4.
• We can use an additional adder to compute the next address.
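A minimal sketch of the next-address computation, assuming 32-bit wrap-around arithmetic. The helper names `next_pc` and `r15_read_value` are hypothetical; the PC + 8 value is the one the register file sees on its R15 input, as stated earlier in the lecture.

```python
MASK32 = 0xFFFFFFFF


def next_pc(pc):
    """PCPlus4: address of the next sequential instruction."""
    return (pc + 4) & MASK32


def r15_read_value(pc):
    """PCPlus8: value returned when an instruction reads R15."""
    return (next_pc(pc) + 4) & MASK32


print(hex(next_pc(0x8000)))         # 0x8004
print(hex(r15_read_value(0x8000)))  # 0x8008
```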
[Figures: an adder computes PCPlus4 = PC + 4, which feeds PC′; a second adder computes PCPlus8 = PCPlus4 + 4, which drives the R15 input of the register file; a multiplexer selects PC′; for STR, RA2 selects the register whose value RD2 is sent to the data memory as WriteData.]
Lecture 5.2
Week 5
Dr. Sajid Muhaimin Choudhury, Assistant Professor
Department of Electrical and Electronics Engineering
Bangladesh University of Engineering and Technology
[Figures: datapath extended for data-processing instructions. A multiplexer selects Result from ALUResult or ReadData; a multiplexer selects SrcB from RD2 or ExtImm; a multiplexer selects RA2 from Instr3:0 or Instr15:12.]
Single-Cycle Datapath: B
Calculate branch target address:
BTA = (ExtImm) + (PC + 8)
ExtImm = Imm24 << 2 and sign-extended
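The branch target calculation can be checked with a few lines of arithmetic. This is a sketch under the assumption that Imm24 is sign-extended and then shifted left by 2, with the result wrapped to 32 bits; the helper names are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def branch_target(pc, imm24):
    """BTA = ExtImm + (PC + 8), where ExtImm = sign-extended Imm24 << 2."""
    ext_imm = sign_extend(imm24 & 0xFFFFFF, 24) << 2
    return (pc + 8 + ext_imm) & 0xFFFFFFFF


print(hex(branch_target(0x1000, 0x000001)))  # 0x100c (forward branch)
print(hex(branch_target(0x1000, 0xFFFFFE)))  # 0x1000 (branch to itself)
```

An Imm24 of 0xFFFFFE encodes -2 words, which cancels the PC + 8 offset, so the branch targets its own address.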
Control signals for B:
PCSrc  RegSrc  RegWrite  ImmSrc  ALUSrc  ALUControl  MemWrite  MemtoReg
1      X1      0         10      1       00          0         0
[Figure: single-cycle datapath for B. R15 (PCPlus8) is selected on RA1 as SrcA, Instr23:0 passes through Extend to give ExtImm as SrcB, and the ALU result is routed through Result back to PC′.]
Example: B Label
Single-Cycle Control
Control Unit
• Signals that change the state (PC, RF, Memory) are sent through the Conditional Logic first, then to the datapath.
• If the instruction shouldn't execute, these signals are forced to 0.
• The remaining control signals are sent directly to the datapath.
Single-Cycle Control
• FlagW1:0: Flag Write signal, asserted when ALUFlags should be saved (i.e., on an instruction with S = 1)
• ADD, SUB update all flags (NZCV)
• AND, ORR only update the NZ flags
• So, two bits are needed:
FlagW1 = 1: NZ saved (ALUFlags3:2 saved)
FlagW0 = 1: CV saved (ALUFlags1:0 saved)
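The two-bit FlagW scheme can be sketched as a function that updates a 4-bit NZCV value. This is an illustrative model only: the function name `update_flags` is hypothetical, and the conditional-execution gating is left out.

```python
def update_flags(flags, alu_flags, flag_w):
    """Return the new 4-bit NZCV value.

    flags, alu_flags: 4-bit values ordered N, Z, C, V (bit 3 down to bit 0)
    flag_w: 2-bit FlagW1:0
    """
    if flag_w & 0b10:  # FlagW1: save NZ (ALUFlags3:2)
        flags = (flags & 0b0011) | (alu_flags & 0b1100)
    if flag_w & 0b01:  # FlagW0: save CV (ALUFlags1:0)
        flags = (flags & 0b1100) | (alu_flags & 0b0011)
    return flags


# ADD/SUB with S = 1 use FlagW = 11; AND/ORR with S = 1 use FlagW = 10
print(bin(update_flags(0b0011, 0b0100, 0b10)))  # 0b111: NZ replaced, CV kept
print(bin(update_flags(0b0011, 0b0100, 0b11)))  # 0b100: all four replaced
```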
Submodules:
• Main Decoder
• ALU Decoder
• PC Logic
Main Decoder truth table:
Op  Funct5  Funct0  Type    Branch  MemtoReg  MemW  ALUSrc  ImmSrc  RegW  RegSrc  ALUOp
00  0       X       DP Reg  0       0         0     0       XX      1     00      1
00  1       X       DP Imm  0       0         0     1       00      1     X0      1
01  X       0       STR     0       X         1     1       01      0     10      0
01  X       1       LDR     0       1         0     1       01      1     X0      0
10  X       X       B       1       0         0     1       10      0     X1      0
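The truth table maps directly onto a lookup function. The sketch below is one possible rendering, with don't-care bits kept as the string "X"; the function name `main_decoder` and the dictionary layout are hypothetical. It assumes Op = 10 for B, which is the ARM encoding of Instr27:26 for branch instructions.

```python
def main_decoder(op, funct5, funct0):
    """Return the main decoder outputs as a dict ('X' marks a don't-care)."""
    if op == 0b00:  # data-processing
        if funct5 == 0:
            row = ("DP Reg", 0, 0, 0, 0, "XX", 1, "00", 1)
        else:
            row = ("DP Imm", 0, 0, 0, 1, "00", 1, "X0", 1)
    elif op == 0b01:  # memory
        if funct0 == 0:
            row = ("STR", 0, "X", 1, 1, "01", 0, "10", 0)
        else:
            row = ("LDR", 0, 1, 0, 1, "01", 1, "X0", 0)
    elif op == 0b10:  # branch (assumed Op encoding)
        row = ("B", 1, 0, 0, 1, "10", 0, "X1", 0)
    else:
        raise ValueError("Op value not in the supported subset")
    names = ("Type", "Branch", "MemtoReg", "MemW", "ALUSrc",
             "ImmSrc", "RegW", "RegSrc", "ALUOp")
    return dict(zip(names, row))


ldr = main_decoder(op=0b01, funct5=0, funct0=1)
print(ldr["Type"], ldr["RegW"], ldr["MemtoReg"])  # LDR 1 1
```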
Review: ALU
ALUControl1:0 Function
00 Add
01 Subtract
10 AND
11 OR
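A behavioural sketch of this ALU, returning the 32-bit result together with a 4-bit NZCV flag value. The function name `alu` is hypothetical, and setting C and V to 0 for AND and OR is an assumption of this sketch.

```python
MASK32 = 0xFFFFFFFF


def alu(src_a, src_b, alu_control):
    """Return (ALUResult, ALUFlags) with ALUFlags ordered N, Z, C, V."""
    a, b = src_a & MASK32, src_b & MASK32
    carry = overflow = 0
    if alu_control == 0b00:    # Add
        total = a + b
        result = total & MASK32
        carry = (total >> 32) & 1
        overflow = ((a ^ result) & (b ^ result)) >> 31 & 1
    elif alu_control == 0b01:  # Subtract: a + ~b + 1
        total = a + (~b & MASK32) + 1
        result = total & MASK32
        carry = (total >> 32) & 1
        overflow = ((a ^ b) & (a ^ result)) >> 31 & 1
    elif alu_control == 0b10:  # AND
        result = a & b
    else:                      # OR
        result = a | b
    n = result >> 31
    z = int(result == 0)
    return result, (n << 3) | (z << 2) | (carry << 1) | overflow


print(alu(5, 5, 0b01))                    # (0, 6): Z and C set
print(alu(0x7FFFFFFF, 1, 0b00))           # (2147483648, 9): N and V set
print(alu(0b1100, 0b1010, 0b11))          # (14, 0)
```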
Conditional Logic
Function:
1. Check if instruction should execute (if not, force PCSrc, RegWrite, and MemWrite to 0)
2. Possibly update Status Register (Flags3:0)
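The first function of the conditional logic can be sketched as a condition check followed by gating of the state-changing signals. The names `condition_passed` and `conditional_logic` are hypothetical; the condition table follows the standard ARM condition-code encoding with Flags3:0 ordered N, Z, C, V, and the 1111 encoding is not modelled.

```python
def condition_passed(cond, flags):
    """Evaluate a 4-bit ARM condition code against 4-bit NZCV flags."""
    n, z = (flags >> 3) & 1, (flags >> 2) & 1
    c, v = (flags >> 1) & 1, flags & 1
    table = {
        0b0000: z == 1,              # EQ
        0b0001: z == 0,              # NE
        0b0010: c == 1,              # CS/HS
        0b0011: c == 0,              # CC/LO
        0b0100: n == 1,              # MI
        0b0101: n == 0,              # PL
        0b0110: v == 1,              # VS
        0b0111: v == 0,              # VC
        0b1000: c == 1 and z == 0,   # HI
        0b1001: c == 0 or z == 1,    # LS
        0b1010: n == v,              # GE
        0b1011: n != v,              # LT
        0b1100: z == 0 and n == v,   # GT
        0b1101: z == 1 or n != v,    # LE
        0b1110: True,                # AL
    }
    return table[cond]  # raises KeyError for 1111 (not modelled)


def conditional_logic(cond, flags, pcs, reg_w, mem_w):
    """Force PCSrc, RegWrite, MemWrite to 0 when the condition fails."""
    ex = condition_passed(cond, flags)
    return int(bool(pcs) and ex), int(bool(reg_w) and ex), int(bool(mem_w) and ex)


print(conditional_logic(0b0000, 0b0100, pcs=0, reg_w=1, mem_w=0))  # (0, 1, 0)
print(conditional_logic(0b0000, 0b0000, pcs=1, reg_w=1, mem_w=1))  # (0, 0, 0)
```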
Flags3:0 is the status register: Flags3:0 = NZCV
• All flags updated: Flags3:0 = NZCV
• Only Flags3:2 updated, i.e., only the NZ flags are updated
Example: ORR
Op  Funct5  Funct0  Type    Branch  MemtoReg  MemW  ALUSrc  ImmSrc  RegW  RegSrc  ALUOp
00  0       X       DP Reg  0       0         0     0       XX      1     00      1
No change to datapath
No change to controller
Lecture 5.3
Week 5
Single-Cycle Performance
• Single-cycle critical path:
Tc1 = tpcq_PC + tmem + tdec + max[tmux + tRFread, tsext + tmux] + tALU + tmem + tmux + tRFsetup
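The critical-path expression can be evaluated numerically once element delays are known. The delay values below (in picoseconds) are assumptions chosen only to illustrate the calculation; they are not taken from the slides.

```python
# Assumed element delays in picoseconds (illustrative only)
delays = {
    "tpcq_PC": 40,    # PC register clock-to-Q
    "tmem": 200,      # memory read
    "tdec": 70,       # decoder
    "tmux": 25,       # multiplexer
    "tRFread": 100,   # register file read
    "tsext": 35,      # extend unit
    "tALU": 120,      # ALU
    "tRFsetup": 60,   # register file setup
}


def single_cycle_period(d):
    """Tc1 per the critical-path expression above."""
    operand_path = max(d["tmux"] + d["tRFread"], d["tsext"] + d["tmux"])
    return (d["tpcq_PC"] + d["tmem"] + d["tdec"] + operand_path
            + d["tALU"] + d["tmem"] + d["tmux"] + d["tRFsetup"])


tc1 = single_cycle_period(delays)
print(tc1, "ps")  # 840 ps
```

With these assumed numbers the register-file read path (125 ps) dominates the extend path (60 ps), and the two memory accesses account for almost half of the cycle.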
ADD, SUB, AND, ORR, LDR will not work.
SUB, AND, ORR will not work!
STR will not work!
STR, B: these instructions write to the register file when they shouldn't.
… to determine the operation to perform. The ALU cannot perform addition, so LDR, STR, and B cannot work!
ADD, SUB, AND, ORR, LDR, B: these instructions inadvertently write to memory.
Performs a logical AND operation and updates the flags, but does not write the result.
Shifts the value of the second source register and uses the shifted result. Add a shifter and a MUX.
https://developer.arm.com/documentation/dui0204/j/arm-and-thumb-instructions/general-data-processing-instructions/cmp-and-cmn
It is not possible to implement this instruction without either modifying the register
file or making the instruction take at least two cycles to execute.
Add WE1 and WD1 signals to the register file.
WE1 connects to the PostIndex signal (from the control unit).
WD1 connects to ALUResult, which is the sum Rn + Rm (or Rn + Src2, more
generally).
Add a multiplexer before the Data Memory address input to choose between (Rn + Src2) and
Rn. With post-indexing, the Data Memory address input connects to Rn.
It is not possible to implement this instruction without either modifying the register file or
making the instruction take at least two cycles to execute. We modify the register file and
datapath as shown below.
Add WE1 and WD1 signals to the register file.
WE1 connects to the PreIndex signal (from the control unit).
WD1 connects to ALUResult, which is the sum Rn + Rm (or Rn + Src2, more generally).
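The effect of the extra write port can be sketched behaviourally for an LDR with base-register writeback. This is an illustration under stated assumptions: the function name `ldr_indexed`, the `mode` parameter, and the dictionary memory are hypothetical, Rd and Rn are assumed to be different registers, and the pre-indexed case is assumed to use ALUResult (Rn + Src2) as the memory address.

```python
MASK32 = 0xFFFFFFFF


def ldr_indexed(regs, mem, rd, rn, src2, mode):
    """One-cycle LDR with writeback. mode is 'pre' or 'post'."""
    base = regs[rn]
    alu_result = (base + src2) & MASK32              # Rn + Src2
    address = base if mode == "post" else alu_result  # address multiplexer
    read_data = mem[address]
    regs[rn] = alu_result   # second write port: WE1 asserted, WD1 = ALUResult
    regs[rd] = read_data    # normal write port: WE3 asserted, WD3 = ReadData


mem = {0x100: 11, 0x104: 22}

regs_post = [0] * 15
regs_post[1] = 0x100
ldr_indexed(regs_post, mem, rd=0, rn=1, src2=4, mode="post")
print(regs_post[0], hex(regs_post[1]))  # 11 0x104

regs_pre = [0] * 15
regs_pre[1] = 0x100
ldr_indexed(regs_pre, mem, rd=0, rn=1, src2=4, mode="pre")
print(regs_pre[0], hex(regs_pre[1]))    # 22 0x104
```

Both modes leave Rn + Src2 in the base register; they differ only in which address reaches the data memory.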