
Adeel Pasha, LUMS, Fall’23

CS-225: Lecture-24

Y86-64 Sequential Processor Working


Basics of Pipelining and its Side Effects

The majority of the content is adapted from CMU’s “Introduction to Computer Systems” course by Randal E.
Bryant and David R. O’Hallaron, who are also the authors of the course textbook.
Slides modified from: Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition Adeel Pasha, LUMS, Fall’23

Y86-64 Sequential Processor Working

CS-225: Comp. Sys. Fund. 11/22/2023

SEQ Processor: Operation


 Processor State
 PC register
 CC register
 Data memory
 Register file (RF)
 All updated as clock rises
 Combinational Logic
 ALU
 Control logic
 All the gray boxes
 PC Increment
 Memory reads
 Instruction memory
 Register file
 Data memory

SEQ Processor: Operation

 Processor State
 PC register
 CC register
 Data memory
 Register file (RF)
 All updated as clock rises
 Combinational Logic
 ALU
 Control logic
 All the gray boxes
 PC Increment
 Memory reads
 Instruction memory
 Register file
 Data memory

[Figure: SEQ hardware state. Combinational logic connected to the data memory (read and write), the CC register (100), the register file (read and write ports, %rbx = 0x100), and the PC (0x014)]

SEQ Processor: Operation#2

Clock:        Cycle 1   Cycle 2   Cycle 3   Cycle 4
Instruction:  irmovq    irmovq    addq      je

Cycle 1: 0x000: irmovq $0x100,%rbx # %rbx <-- 0x100
Cycle 2: 0x00a: irmovq $0x200,%rdx # %rdx <-- 0x200
Cycle 3: 0x014: addq %rdx,%rbx # %rbx <-- 0x300 CC <-- 000
Cycle 4: 0x016: je dest # Not taken
Cycle 5: 0x01f: rmmovq %rbx,0(%rdx) # M[0x200] <-- 0x300

 state set according to second irmovq instr.
 combinational logic starting to react to state changes

[Figure: SEQ hardware state. CC = 100; RF: %rbx = 0x100, %rdx = 0x200; PC = 0x014]


SEQ Processor: Operation#3

Clock:        Cycle 1   Cycle 2   Cycle 3   Cycle 4
Instruction:  irmovq    irmovq    addq      je

Cycle 1: 0x000: irmovq $0x100,%rbx # %rbx <-- 0x100
Cycle 2: 0x00a: irmovq $0x200,%rdx # %rdx <-- 0x200
Cycle 3: 0x014: addq %rdx,%rbx # %rbx <-- 0x300 CC <-- 000
Cycle 4: 0x016: je dest # Not taken
Cycle 5: 0x01f: rmmovq %rbx,0(%rdx) # M[0x200] <-- 0x300

 state set according to second irmovq instr.
 combinational logic generates results for addq instruction

[Figure: SEQ hardware state. CC = 100 with new value 000 pending; RF: %rbx = 0x100, %rdx = 0x200, with %rbx <-- 0x300 pending at the write port; PC = 0x014 with new value 0x016 pending]


SEQ Processor: Operation#4

Clock:        Cycle 1   Cycle 2   Cycle 3   Cycle 4
Instruction:  irmovq    irmovq    addq      je

Cycle 1: 0x000: irmovq $0x100,%rbx # %rbx <-- 0x100
Cycle 2: 0x00a: irmovq $0x200,%rdx # %rdx <-- 0x200
Cycle 3: 0x014: addq %rdx,%rbx # %rbx <-- 0x300 CC <-- 000
Cycle 4: 0x016: je dest # Not taken
Cycle 5: 0x01f: rmmovq %rbx,0(%rdx) # M[0x200] <-- 0x300

 state set according to addq instruction
 combinational logic starting to react to state changes

[Figure: SEQ hardware state. CC = 000; RF: %rbx = 0x300; PC = 0x016]


SEQ Processor: Operation#5

Clock:        Cycle 1   Cycle 2   Cycle 3   Cycle 4
Instruction:  irmovq    irmovq    addq      je

Cycle 1: 0x000: irmovq $0x100,%rbx # %rbx <-- 0x100
Cycle 2: 0x00a: irmovq $0x200,%rdx # %rdx <-- 0x200
Cycle 3: 0x014: addq %rdx,%rbx # %rbx <-- 0x300 CC <-- 000
Cycle 4: 0x016: je dest # Not taken
Cycle 5: 0x01f: rmmovq %rbx,0(%rdx) # M[0x200] <-- 0x300

 state set according to addq instruction
 combinational logic generates results for je instruction

[Figure: SEQ hardware state. CC = 000; RF: %rbx = 0x300; PC = 0x016 with new value 0x01f pending]
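
The cycle-by-cycle snapshots above can be replayed with a small model. This is a minimal Python sketch, not the course simulator: `State`, `next_state`, `run`, and the branch target `DEST` are hypothetical names, and the condition codes are reduced to the three-bit ZF/SF/OF string shown on the slides. It keeps the combinational computation of new values separate from the clock edge that commits them.

```python
from dataclasses import dataclass, field

DEST = 0x040  # hypothetical address of label `dest`; the slides do not give it
MASK = (1 << 64) - 1


@dataclass
class State:
    pc: int = 0x000
    cc: str = "100"  # ZF, SF, OF bits; initial value as shown on the slides
    regs: dict = field(default_factory=lambda: {"%rbx": 0, "%rdx": 0})


def next_state(s):
    """Combinational logic: compute new values from the current state only."""
    regs, cc = dict(s.regs), s.cc
    if s.pc == 0x000:    # irmovq $0x100,%rbx
        regs["%rbx"], pc = 0x100, 0x00a
    elif s.pc == 0x00a:  # irmovq $0x200,%rdx
        regs["%rdx"], pc = 0x200, 0x014
    elif s.pc == 0x014:  # addq %rdx,%rbx
        a, b = regs["%rdx"], regs["%rbx"]
        result = (a + b) & MASK
        zf = int(result == 0)
        sf = result >> 63
        of = int((a >> 63) == (b >> 63) and (result >> 63) != (a >> 63))
        regs["%rbx"], cc, pc = result, f"{zf}{sf}{of}", 0x016
    elif s.pc == 0x016:  # je dest: taken only when ZF = 1
        pc = DEST if cc[0] == "1" else 0x01f
    else:
        raise ValueError("address outside the four traced instructions")
    return State(pc, cc, regs)


def run(cycles):
    """Commit the computed values once per rising clock edge."""
    s = State()
    trace = [s]
    for _ in range(cycles):
        s = next_state(s)  # values settle during the cycle
        trace.append(s)    # the clock edge makes them the new state
    return trace


trace = run(4)
for cycle, s in enumerate(trace):
    print(f"after cycle {cycle}: PC={s.pc:#05x} CC={s.cc} "
          f"%rbx={s.regs['%rbx']:#x} %rdx={s.regs['%rdx']:#x}")
```

In this sketch, `trace[2]` corresponds to the start of cycle 3 and `trace[3]` to the start of cycle 4.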


Why Pipelining?


Fundamental Execution Cycle in Y86-64

PC Update: Update program counter
Write back: Write program registers
Mem Access: Read or write data in data memory or stack
Execute: Compute value or address
Decode: Read registers
Fetch: Obtain instruction from instruction memory


How to Improve Performance

 Basic idea is to reduce the execution time
 Increase the clock frequency
 Work in parallel on multiple data: Parallelism
 Serialize the operations like an assembly line: Pipelining


Example: Car Assembly Line

[Figure: one worker performs tasks T1 T2 T3 T4 T5 T6 in sequence along the time axis]

• One worker doing all the work
• Latency: 6 time-units
• Throughput: 1 car every 6 time-units


How to increase the production?

Dedicate one worker for each elementary task


[Figure: tasks T1 T2 T3 T4 T5 T6, each handled by its own worker, overlapping along the time axis]

• Work divided among 6 workers
• Latency: 6 time-units
• Throughput: 1 car every 1 time-unit
• Length of the time-unit?
• Depends on the slowest worker
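
The latency and throughput figures for the two arrangements can be checked with a few lines of arithmetic. This is a minimal Python sketch; `one_worker` and `assembly_line` are hypothetical helper names, and each task T1 to T6 is assumed to take exactly one time-unit as on the slides.

```python
from fractions import Fraction


def one_worker(task_times):
    """A single worker performs every task, so cars never overlap."""
    latency = sum(task_times)
    return latency, Fraction(1, latency)  # (latency, cars per time-unit)


def assembly_line(task_times):
    """One worker per task; every station advances once per time-unit."""
    time_unit = max(task_times)           # set by the slowest worker
    latency = time_unit * len(task_times)
    return latency, Fraction(1, time_unit)


tasks = [1, 1, 1, 1, 1, 1]  # T1..T6, one time-unit each (assumed)
print(one_worker(tasks))
print(assembly_line(tasks))
```

Changing one entry of `tasks` to a larger value shows how a single slow worker stretches the time-unit for the whole line.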


[Pipeline diagram: each row is one time step; time increases downward]

IF   ID   EX   MEM  WB   PU
I1
I2   I1
I3   I2   I1
I4   I3   I2   I1
I5   I4   I3   I2   I1
I6   I5   I4   I3   I2   I1
     I6   I5   I4   I3   I2
          I6   I5   I4   I3
               I6   I5   I4
                    I6   I5
                         I6
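
A stage-occupancy diagram like the one above can be generated mechanically. This is a minimal Python sketch, assuming an ideal pipeline with no stalls in which one new instruction enters IF on every time step; `occupancy` is a hypothetical helper name.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB", "PU"]


def occupancy(num_instructions, stages=STAGES):
    """Return one row per time step; row[s] is the instruction in stage s."""
    rows = []
    for t in range(num_instructions + len(stages) - 1):
        row = []
        for s in range(len(stages)):
            i = t - s  # index of the instruction that entered s steps ago
            row.append(f"I{i + 1}" if 0 <= i < num_instructions else "")
        rows.append(row)
    return rows


rows = occupancy(6)
print(" ".join(f"{name:<4}" for name in STAGES))
for row in rows:
    print(" ".join(f"{cell:<4}" for cell in row).rstrip())
```

With 6 instructions and 6 stages the sketch produces 11 rows: 6 to fill the pipeline and 5 more to drain it.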

Real-World Pipelines: Car Washes

[Figure: three car-wash layouts labeled Sequential, Parallel, and Pipelined]

 Idea
 Divide process into independent stages
 Move objects through stages in sequence
 At any given time, multiple objects being processed


How to Implement a Pipeline?


Computational Example

[Figure: combinational logic (300 ns) feeding a register (20 ns), driven by Clock]
Delay = 320 ns
Throughput = 3.125 MIPS

 Unpipelined System
 Computation requires total of 300 nanoseconds (ns)
 Additional 20 ns to save result in register
 Must have clock cycle of at least 320 ns


3-Way Pipelined Version

[Figure: three stages in series, each with combinational logic (A, B, C; 100 ns each) followed by a register (20 ns), all driven by Clock]
Delay = 360 ns
Throughput = 8.33 MIPS

 Pipelined System
 Divide combinational logic into 3 blocks of 100 ns each
 Can begin new operation as soon as previous one passes through stage A.
 Begin new operation every 120 ns
 Overall latency increases
 360 ns from start to finish
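
The delay and throughput numbers for the unpipelined and 3-way pipelined versions follow from the stage delays. This is a minimal Python sketch; `pipeline_metrics` is a hypothetical helper, and it assumes the clock period equals the slowest stage's logic delay plus the register delay.

```python
def pipeline_metrics(stage_delays_ns, register_delay_ns=20):
    """Return (clock period in ns, latency in ns, throughput in MIPS)."""
    period = max(stage_delays_ns) + register_delay_ns  # slowest stage + register
    latency = period * len(stage_delays_ns)            # one period per stage
    throughput_mips = 1000 / period                    # one op per period; 1 op/ns = 1000 MIPS
    return period, latency, throughput_mips


for name, delays in [("unpipelined", [300]), ("3-way", [100, 100, 100])]:
    period, latency, mips = pipeline_metrics(delays)
    print(f"{name}: period={period} ns, delay={latency} ns, throughput={mips:.3f} MIPS")
```

The same function applies to any other partitioning of the logic by passing a different list of stage delays.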

Pipeline Diagrams

 Unpipelined

Instruction   Completes at
I1            320 ns
I2            640 ns
I3            960 ns

 Cannot start new instruction until previous one completes

 3-Way Pipelined

Time (ns)   0-120   120-240   240-360   360-480   480-600
I1          A       B         C
I2                  A         B         C
I3                            A         B         C

I1 completes at 360 ns, I2 at 480 ns, I3 at 600 ns

 Up to 3 operations/instructions in process simultaneously
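
The completion times in both diagrams follow one formula: the first operation finishes after one period per stage, and each later operation finishes one period after its predecessor. This is a minimal Python sketch; `completion_times` and `in_flight` are hypothetical helper names, and the unpipelined system is modeled as a single stage with a 320 ns period.

```python
def completion_times(num_ops, num_stages, period_ns):
    """Completion time of each operation when a new one starts every period."""
    first = num_stages * period_ns
    return [first + k * period_ns for k in range(num_ops)]


def in_flight(t_ns, num_ops, num_stages, period_ns):
    """Number of operations that have started but not yet completed at t_ns."""
    starts = [k * period_ns for k in range(num_ops)]
    done = completion_times(num_ops, num_stages, period_ns)
    return sum(1 for s, d in zip(starts, done) if s <= t_ns < d)


print(completion_times(3, 1, 320))  # unpipelined: I1, I2, I3
print(completion_times(3, 3, 120))  # 3-way pipelined: I1, I2, I3
print(in_flight(250, 3, 3, 120))    # operations in process at t = 250 ns
```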


Operating a Pipeline # 1

Time = 239 ns

Time (ns)   0-120   120-240   240-360   360-480   480-600
I1          A       B         C
I2                  A         B         C
I3                            A         B         C

[Figure: three-stage pipeline hardware (Comb. logic A, B, C at 100 ns each, each followed by a 20 ns register, shared Clock) shown at time 239 ns]


Operating a Pipeline # 2

Time = 241 ns

Time (ns)   0-120   120-240   240-360   360-480   480-600
I1          A       B         C
I2                  A         B         C
I3                            A         B         C

[Figure: three-stage pipeline hardware (Comb. logic A, B, C at 100 ns each, each followed by a 20 ns register, shared Clock) shown at time 241 ns]


Operating a Pipeline # 3

Time = 300 ns

Time (ns)   0-120   120-240   240-360   360-480   480-600
I1          A       B         C
I2                  A         B         C
I3                            A         B         C

[Figure: three-stage pipeline hardware (Comb. logic A, B, C at 100 ns each, each followed by a 20 ns register, shared Clock) shown at time 300 ns]


Operating a Pipeline # 4

Time = 359 ns

Time (ns)   0-120   120-240   240-360   360-480   480-600
I1          A       B         C
I2                  A         B         C
I3                            A         B         C

[Figure: three-stage pipeline hardware (Comb. logic A, B, C at 100 ns each, each followed by a 20 ns register, shared Clock) shown at time 359 ns]


Limitations: Nonuniform Delays

[Figure: three stages with comb. logic A = 50 ns, B = 150 ns, C = 100 ns, each followed by a 20 ns register, driven by Clock]
Delay = 510 ns
Throughput = 5.88 MIPS

[Pipeline diagram: I1, I2, I3 each pass through stages A, B, C along the time axis]

 Throughput limited by slowest stage
 Other stages sit idle for much of the time
 Challenging to partition system into balanced stages
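
The cost of unbalanced stages can be quantified per stage. This is a minimal Python sketch; `idle_per_cycle` is a hypothetical helper, and it assumes every stage is clocked at the period of the slowest stage plus the register delay.

```python
def idle_per_cycle(stage_delays_ns, register_delay_ns=20):
    """Return (clock period in ns, idle ns per cycle for each stage's logic)."""
    slowest = max(stage_delays_ns)
    period = slowest + register_delay_ns
    idle = [slowest - d for d in stage_delays_ns]  # time spent waiting for the slowest stage
    return period, idle


period, idle = idle_per_cycle([50, 150, 100])  # stages A, B, C from the slide
print(f"period = {period} ns, delay = {3 * period} ns, "
      f"throughput = {1000 / period:.2f} MIPS")
for name, wait in zip("ABC", idle):
    print(f"stage {name} idles {wait} ns of every {period} ns cycle")
```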


Limitations: Register Overhead

 As we try to deepen the pipeline, the overhead of loading registers becomes more significant
 Percentage of clock cycle spent loading register:
 1-stage pipeline: 6.25%
 3-stage pipeline: 16.67%
 6-stage pipeline: 28.57%
 High speeds of modern processor designs obtained through very deep pipelining

[Figure: six stages in series, comb. logic A to F at 50 ns each, each followed by a 20 ns register, driven by Clock]
Delay = 420 ns, Throughput = 14.29 MIPS
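
The overhead percentages above can be reproduced directly. This is a minimal Python sketch; `register_overhead_percent` is a hypothetical helper, and it assumes the 300 ns of combinational logic splits evenly across the stages with a 20 ns register after each one.

```python
def register_overhead_percent(total_logic_ns, num_stages, register_delay_ns=20):
    """Percentage of each clock cycle spent loading the pipeline register."""
    period = total_logic_ns / num_stages + register_delay_ns
    return 100 * register_delay_ns / period


for stages in (1, 3, 6):
    pct = register_overhead_percent(300, stages)
    print(f"{stages}-stage pipeline: {pct:.2f}%")
```

Raising `num_stages` further shows the diminishing returns: the logic share of the cycle keeps shrinking while the 20 ns register cost stays fixed.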



Data and Control Dependencies

[Figure: combinational logic feeding a register, driven by Clock; unpipelined timeline with I1 completing at 320 ns, I2 at 640 ns, I3 at 960 ns]

 System
 Each operation depends on result from preceding one

Data Dependencies in Processors


 Result from one instruction used as operand for another
 Read-After-Write (RAW) dependency
 Very common in actual programs
 Must make sure our pipeline handles these properly
 Get correct results
 Minimize performance impact

I1 irmovq $50, %rax
I2 addq %rax, %rbx
I3 mrmovq 100(%rbx), %rdx
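
The RAW dependencies in this three-instruction sequence can be found by tracking the most recent writer of each register. This is a minimal Python sketch; `PROGRAM` and `raw_dependencies` are hypothetical names, and the read and write sets are written out by hand rather than decoded from the instructions.

```python
# (label, text, registers read, register written): operand sets listed by hand
PROGRAM = [
    ("I1", "irmovq $50, %rax", set(), "%rax"),
    ("I2", "addq %rax, %rbx", {"%rax", "%rbx"}, "%rbx"),
    ("I3", "mrmovq 100(%rbx), %rdx", {"%rbx"}, "%rdx"),
]


def raw_dependencies(program):
    """List (producer, consumer, register) for each read-after-write pair."""
    deps = []
    last_writer = {}  # register -> label of the most recent instruction writing it
    for label, _text, reads, written in program:
        for reg in sorted(reads):
            if reg in last_writer:
                deps.append((last_writer[reg], label, reg))
        if written is not None:
            last_writer[written] = label
    return deps


for producer, consumer, reg in raw_dependencies(PROGRAM):
    print(f"{consumer} reads {reg}, which {producer} writes")
```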


Side Effects of Pipelining: Data Hazards

 Result does not feed back around in time for next operation
 Pipelining has changed behavior of system

[Figure: three-stage pipeline hardware (Comb. logic A, B, C, each followed by a register, shared Clock)]

Time (ns)                    0-120   120-240   240-360   360-480   480-600   600-720
I1 irmovq $50, %rax          A       B         C
I2 addq %rax, %rbx                   A         B         C
I3 mrmovq 100(%rbx), %rdx                      A         B         C
I4                                                       A         B         C
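
The timing problem can be stated as a simple comparison. This is a minimal Python sketch under a strong simplifying assumption that is not taken from the slides: an operation needs its operands when it enters stage A, and its result becomes available only when it leaves stage C. `hazards` and the index-based dependency map are hypothetical.

```python
PERIOD_NS, NUM_STAGES = 120, 3


def start_time(k):
    """Time at which operation k (0-based) enters stage A."""
    return k * PERIOD_NS


def result_time(k):
    """Time at which operation k leaves stage C (assumed result availability)."""
    return start_time(k) + NUM_STAGES * PERIOD_NS


def hazards(depends_on):
    """depends_on maps consumer index -> producer index; return violated pairs."""
    return [(p, c) for c, p in sorted(depends_on.items())
            if start_time(c) < result_time(p)]


# I2 (index 1) needs I1's %rax; I3 (index 2) needs I2's %rbx.
print(hazards({1: 0, 2: 1}))
# Under the same assumption, an operation three slots later would be safe.
print(hazards({3: 0}))
```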

Control Dependencies in Processors


 Result from one instruction (say jump) is used to determine
the control flow in your program
 Called Control/Branch Dependency
 Again, very common in actual programs

I1 loop:
I2 subq %rdx,%rbx
I3 jne target
I4 irmovq $10,%rdx
I5 jmp loop
I6 target:
I7 halt


Side Effects of Pipelining: Control Hazards

 Result does not feed back around in time for next operation
 Pipelining has changed behavior of system

[Figure: three-stage pipeline hardware (Comb. logic A, B, C, each followed by a register, shared Clock)]

Time (ns)            0-120   120-240   240-360   360-480   480-600   600-720
I1 jne target        A       B         C
I2 irmovq $10,%rdx           A         B         C
I3 jmp loop                            A         B         C
I4                                               A         B         C
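
The number of instructions fetched before a branch outcome is known depends on where the branch resolves. This is a minimal Python sketch assuming, for illustration only, that the outcome of `jne` is known when it leaves stage C; `fetched_before_resolution` is a hypothetical helper.

```python
PERIOD_NS, NUM_STAGES = 120, 3

STREAM = ["jne target", "irmovq $10,%rdx", "jmp loop"]  # fetch order on the slide


def fetched_before_resolution(stream, branch_index):
    """Instructions that enter stage A before the branch leaves stage C."""
    resolved_at = (branch_index + NUM_STAGES) * PERIOD_NS
    return [text for k, text in enumerate(stream)
            if k > branch_index and k * PERIOD_NS < resolved_at]


speculative = fetched_before_resolution(STREAM, 0)
print(speculative)
```

If the branch turns out to be taken, the instructions in `speculative` should not have entered the pipeline at all.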

A Common Query from the Last Few Lectures


SEQ Processor: Processing Steps

Stage        Signal        Inst xyz
Fetch        icode, ifun   Read instruction byte
             rA, rB        Read register byte
             valC          Read constant word
             valP          Compute next PC
Decode       valA, srcA    Read operand A
             valB, srcB    Read operand B
Execute      valE          Perform ALU operation
             Cond code     Set/use cond. code reg.
Memory       valM          Memory read/write
Write back   dstE          Write back ALU result
             dstM          Write back memory result
PC update    PC            Update PC

 All instructions follow same general pattern
 Differ in what gets computed on each step


SEQ Processor: Processing Steps

Stage        Signal        OPq rA, rB             irmovq V, rB           popq rA
Fetch        icode, ifun   icode:ifun ← M1[PC]    icode:ifun ← M1[PC]    icode:ifun ← M1[PC]
             rA, rB        rA:rB ← M1[PC+1]       rA:rB ← M1[PC+1]       rA:rB ← M1[PC+1]
             valC                                 valC ← M8[PC+2]
             valP          valP ← PC+2            valP ← PC+10           valP ← PC+2
Decode       valA, srcA    valA ← R[rA]                                  valA ← R[%rsp]
             valB, srcB    valB ← R[rB]                                  valB ← R[%rsp]
Execute      valE          valE ← valB OP valA    valE ← valC + 0        valE ← valB + 8
             Cond code     Set CC
Memory       valM                                                        valM ← M8[valA]
Write back   dstE          R[rB] ← valE           R[rB] ← valE           R[%rsp] ← valE
             dstM                                                        R[rA] ← valM
PC update    PC            PC ← valP              PC ← valP              PC ← valP

Instruction encodings:

OPq rA, rB      6 fn rA rB
irmovq V, rB    3 0 F rB V
popq rA         B 0 rA F
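
The per-stage computations for these three instructions translate almost line for line into code. This is a minimal Python sketch, not the course's simulator: `step` and the byte-array memory are hypothetical, condition-code updates are omitted, and register IDs follow the Y86-64 numbering (%rax = 0, %rdx = 2, %rbx = 3, %rsp = 4).

```python
MASK = (1 << 64) - 1
RSP = 4  # register ID of %rsp in Y86-64
OPS = {0: lambda b, a: b + a, 1: lambda b, a: b - a,
       2: lambda b, a: b & a, 3: lambda b, a: b ^ a}  # addq, subq, andq, xorq


def step(mem, regs, pc):
    """Run one OPq, irmovq, or popq instruction; return the new PC."""
    # Fetch
    icode, ifun = mem[pc] >> 4, mem[pc] & 0xF
    rA, rB = mem[pc + 1] >> 4, mem[pc + 1] & 0xF
    if icode == 0x6:    # OPq rA, rB
        valP = pc + 2
        valA, valB = regs[rA], regs[rB]          # Decode
        valE = OPS[ifun](valB, valA) & MASK      # Execute (CC update omitted)
        regs[rB] = valE                          # Write back
    elif icode == 0x3:  # irmovq V, rB
        valC = int.from_bytes(mem[pc + 2:pc + 10], "little")
        valP = pc + 10
        valE = (valC + 0) & MASK                 # Execute
        regs[rB] = valE                          # Write back
    elif icode == 0xB:  # popq rA
        valP = pc + 2
        valA, valB = regs[RSP], regs[RSP]        # Decode
        valE = (valB + 8) & MASK                 # Execute
        valM = int.from_bytes(mem[valA:valA + 8], "little")  # Memory
        regs[RSP] = valE                         # Write back
        regs[rA] = valM
    else:
        raise ValueError(f"unsupported icode {icode:#x}")
    return valP                                  # PC update


mem = bytearray(128)
mem[0:10] = bytes([0x30, 0xF0]) + (5).to_bytes(8, "little")   # irmovq $5,%rax
mem[10:20] = bytes([0x30, 0xF3]) + (7).to_bytes(8, "little")  # irmovq $7,%rbx
mem[20:22] = bytes([0x60, 0x03])                              # addq %rax,%rbx
mem[22:24] = bytes([0xB0, 0x2F])                              # popq %rdx
mem[64:72] = (99).to_bytes(8, "little")                       # value on the stack
regs = [0] * 15
regs[RSP] = 64
pc = 0
for _ in range(4):
    pc = step(mem, regs, pc)
print(pc, regs[0], regs[3], regs[2], regs[RSP])
```

Writing `R[%rsp]` before `R[rA]` in the popq branch mirrors the order of the dstE and dstM rows in the table.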


Reminder: Quiz 5 on Friday

Academic Block, MCB Auditorium


6:00pm – 8:00pm


Quiz 6: Wednesday, Nov. 28, 2023

 Syllabus:
 Lect. 22 (Slide#14 onwards) to Lect. 25 (Friday's lecture)
 Topics covered:
 HCL Logic for Y86-64 Processor Sequential Implementation
 Basics of Pipelining
 Pipelined Y86-64 Implementation
 4.3.2, 4.3.3, 4.3.4, 4.4, 4.5 (up to Lect. 25)
 There could be problem solving and/or objective type questions (TFs/MCQs)
