
CS F342: Computer Architecture

(First Semester 2022-23)

Lect 3: Performance, MIPS Instruction


Dr. Nikumani Choudhury
Asst. Prof., Dept. of Computer Sc. & Information Systems
nikumani@hyderabad.bits-pilani.ac.in
BITS Pilani
Hyderabad Campus
CPI in More Detail

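If different instruction classes take different numbers of cycles

$$\text{Clock Cycles} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \text{Instruction Count}_i \right)$$

Weighted average CPI

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)$$

– The ratio Instruction Count_i / Instruction Count is the relative frequency of class i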
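
A minimal Python sketch of the weighted-average CPI calculation. The instruction classes, per-class CPI values, and instruction counts are hypothetical illustration values, not figures from the lecture.

```python
def total_cycles(cpi_by_class, count_by_class):
    """Clock cycles = sum over classes of CPI_i * instruction count_i."""
    return sum(cpi_by_class[c] * count_by_class[c] for c in cpi_by_class)


def average_cpi(cpi_by_class, count_by_class):
    """Weighted average CPI = clock cycles / total instruction count."""
    total_instructions = sum(count_by_class.values())
    return total_cycles(cpi_by_class, count_by_class) / total_instructions


# Hypothetical instruction mix (illustrative values only).
cpi = {"alu": 1, "load_store": 2, "branch": 3}
counts = {"alu": 50, "load_store": 30, "branch": 20}

cycles = total_cycles(cpi, counts)  # 1*50 + 2*30 + 3*20
avg = average_cpi(cpi, counts)      # cycles divided by 100 instructions
print(cycles, avg)
```

Classes with a high CPI raise the average in proportion to how often they occur, which is why the relative frequency appears as the weight.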

2 CS F342 BITS Pilani, Hyderabad Campus


Performance Summary

The BIG Picture

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

Performance depends on
– Algorithm: affects IC, possibly CPI
– Programming language: affects IC, CPI
– Compiler: affects IC, CPI
– Instruction set architecture: affects IC, CPI, clock cycle time (Tc)
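
The CPU time equation can be sketched in Python as a product of its three factors. The function name and the sample numbers (instruction count, CPI, clock period) are assumptions chosen for illustration.

```python
def cpu_time(instruction_count, cpi, clock_period_s):
    """CPU time in seconds = instructions x cycles/instruction x seconds/cycle."""
    return instruction_count * cpi * clock_period_s


# Hypothetical program: 1e9 instructions, CPI of 2.0, 2 GHz clock (Tc = 0.5 ns).
t = cpu_time(1_000_000_000, 2.0, 0.5e-9)
print(t)  # about one second
```

Halving any one factor (IC, CPI, or Tc) halves the CPU time, provided the change does not make the other two worse.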



Uniprocessor Performance

Constrained by power, instruction-level parallelism, memory latency
Multicore microprocessors

Multicore microprocessors
– More than one processor per chip

Requires explicitly parallel programming


– Compare with instruction level parallelism
• Hardware executes multiple instructions at once
• Hidden from the programmer
– Hard to do
• Programming for performance
• Load balancing
• Optimizing communication and synchronization



SPEC CPU Benchmark
Programs used to measure performance
– Supposedly typical of actual workload
Standard Performance Evaluation Corp (SPEC)
– Develops benchmarks for CPU, I/O, Web, …
SPEC CPU2017
– Elapsed time to execute a selection of programs
• Negligible I/O, so focuses on CPU performance
– Set of 10 integer benchmarks and 13 floating point benchmarks
– Summarize as geometric mean of performance ratios

$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$
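
A short Python sketch of the geometric-mean summary. The ratio values are hypothetical, not actual SPEC results, and `geometric_mean` is a name chosen for this example.

```python
import math


def geometric_mean(ratios):
    """n-th root of the product of n execution-time ratios."""
    return math.prod(ratios) ** (1.0 / len(ratios))


# Hypothetical performance ratios for three benchmarks.
ratios = [2.0, 8.0, 4.0]
gm = geometric_mean(ratios)  # cube root of 2 * 8 * 4 = 64
print(gm)
```

A useful property of the geometric mean is that comparing two machines gives the same answer whichever reference machine the ratios were normalized to.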



SPEC Power Benchmark

Power consumption of server at different workload levels


– Performance: ssj_ops/sec
– Power: Watts (Joules/sec)

$$\text{Overall ssj\_ops per Watt} = \left( \sum_{i=0}^{10} \text{ssj\_ops}_i \right) \Big/ \left( \sum_{i=0}^{10} \text{power}_i \right)$$
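
The overall metric is a ratio of sums, not an average of per-level ratios. A minimal Python sketch follows; the measurements are hypothetical and only three load levels are shown for brevity.

```python
def overall_ssj_ops_per_watt(ssj_ops, power_watts):
    """Sum of ssj_ops over all load levels divided by the sum of power."""
    if len(ssj_ops) != len(power_watts):
        raise ValueError("need one power reading per load level")
    return sum(ssj_ops) / sum(power_watts)


# Hypothetical readings at full load, half load, and idle.
ops = [300_000, 150_000, 0]
power = [250, 150, 100]

score = overall_ssj_ops_per_watt(ops, power)  # 450000 / 500
print(score)
```

Because the idle level contributes power but no operations, a server that draws a lot of power when idle scores lower overall.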



Pitfall: Amdahl’s Law
Improving an aspect of a computer and expecting a proportional
improvement in overall performance

$$T_{\text{improved}} = \frac{T_{\text{affected}}}{\text{improvement factor}} + T_{\text{unaffected}}$$

Example: multiply accounts for 80s/100s
– How much improvement in multiply performance to get 5× overall?

$$20 = \frac{80}{n} + 20$$

– Can't be done!
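
The multiply example can be checked numerically. In this Python sketch, `improved_time` is a name chosen for illustration; the 80 s and 20 s figures come from the slide, while the trial improvement factors are arbitrary.

```python
def improved_time(t_affected, t_unaffected, improvement_factor):
    """Execution time after speeding up only the affected part."""
    return t_affected / improvement_factor + t_unaffected


# Slide example: multiply takes 80 s of a 100 s run, so 20 s is unaffected.
target = 100 / 5  # a 5x overall speedup would need a 20 s total

for n in (2, 10, 1000, 1_000_000):
    print(n, improved_time(80, 20, n))

# Even a very large factor leaves the total above the 20 s target.
still_too_slow = improved_time(80, 20, 1_000_000) > target
```

The unaffected 20 s is a floor on the total time, so the 5× target would require the multiply time to drop to exactly zero.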



Amdahl’s law
In computer architecture, Amdahl’s law (or Amdahl’s Argument) is a formula which gives the theoretical speedup in
latency of the execution of a task at fixed workload that can be expected of a system whose resources are improved. It is
named after the computer scientist Gene Amdahl.

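The speedup formula is given by:

$$\text{Speedup} = \frac{\text{Execution time without enhancement}}{\text{Execution time with enhancement}} = \frac{\text{Execution Time}_{\text{old}}}{\text{Execution Time}_{\text{new}}}$$

$$\text{Execution Time}_{\text{new}} = \text{Execution Time}_{\text{old}} \times \left( (1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} \right)$$

$$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$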



Example
a) Given: Speedup_enhanced = 20
Fraction_enhanced = 0.5
overall speedup = ?
overall speedup = 1/((1 - 0.5) + 0.5/20) = 1/0.525 ≈ 1.905

b) Given: Speedup_enhanced = 16
Fraction_enhanced = 0.6
overall speedup = ?
overall speedup = 1/((1 - 0.6) + 0.6/16) = 1/0.4375 ≈ 2.286
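
A small Python sketch that evaluates the overall-speedup formula for the two cases above. The function name `overall_speedup` is an assumption made for this example.

```python
def overall_speedup(fraction_enhanced, speedup_enhanced):
    """Amdahl's law: 1 / ((1 - F) + F / S)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


a = overall_speedup(0.5, 20)  # 1 / (0.5 + 0.025)  = 1 / 0.525
b = overall_speedup(0.6, 16)  # 1 / (0.4 + 0.0375) = 1 / 0.4375
print(round(a, 3), round(b, 3))
```

In case (a) the enhanced part runs 20 times faster, yet the overall speedup stays below 2 because half of the execution time is untouched.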



13/09/22
Instruction Set

The repertoire of instructions of a computer


Different computers have different instruction sets
– But with many aspects in common
Early computers had very simple instruction sets
– Simplified implementation
Many modern computers also have simple instruction sets



The MIPS Instruction Set

Stanford MIPS commercialized by MIPS Technologies (www.mips.com)
Typical of many modern ISAs
– See MIPS Reference Data tear-out card, and Appendixes B and E
Similar ISAs have a large share of embedded core market
– Applications in consumer electronics, network/storage equipment,
cameras, printers, …



Arithmetic Operations

Add and subtract, three operands


– Two sources and one destination
add a, b, c # a gets b + c
All arithmetic operations have this form
Design Principle 1: Simplicity favors regularity
– Regularity makes implementation simpler
– Simplicity enables higher performance at lower cost



Arithmetic Example

C code:

f = (g + h) - (i + j);
Compiled MIPS code:

add t0, g, h # temp t0 = g + h
add t1, i, j # temp t1 = i + j
sub f, t0, t1 # f = t0 - t1



Register Operands

Arithmetic instructions use register operands
MIPS has a 32 × 32-bit register file
– Use for frequently accessed data
– Numbered 0 to 31
– 32-bit data called a “word”
Assembler names
– $t0, $t1, …, $t9 for temporary values
– $s0, $s1, …, $s7 for saved variables
Design Principle 2: Smaller is faster
– cf. main memory: millions of locations



Register Operand Example

C code:
f = (g + h) - (i + j);
– f, …, j in $s0, …, $s4
Compiled MIPS code:
add $t0, $s1, $s2
add $t1, $s3, $s4
sub $s0, $t0, $t1
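
To trace the three instructions by hand, the register file can be modeled as a dictionary. This Python sketch is a toy model, not a MIPS simulator: the initial values of g, h, i, j are hypothetical, and results simply wrap to 32 bits (a simplification, since the real `add` and `sub` raise an exception on signed overflow).

```python
MASK32 = 0xFFFFFFFF  # keep results within a 32-bit word

# f..j live in $s0..$s4; g=5, h=7, i=2, j=3 are hypothetical values.
regs = {"$s0": 0, "$s1": 5, "$s2": 7, "$s3": 2, "$s4": 3, "$t0": 0, "$t1": 0}


def add(rd, rs, rt):
    """rd = rs + rt (wrapping; overflow trap not modeled)."""
    regs[rd] = (regs[rs] + regs[rt]) & MASK32


def sub(rd, rs, rt):
    """rd = rs - rt (wrapping; overflow trap not modeled)."""
    regs[rd] = (regs[rs] - regs[rt]) & MASK32


add("$t0", "$s1", "$s2")  # $t0 = g + h
add("$t1", "$s3", "$s4")  # $t1 = i + j
sub("$s0", "$t0", "$t1")  # f = $t0 - $t1

f = regs["$s0"]
print(f)
```

The temporaries $t0 and $t1 hold the two partial sums so that each instruction still has exactly two sources and one destination.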



Memory Operands

Main memory used for composite data


– Arrays, structures, dynamic data
To apply arithmetic operations
– Load values from memory into registers
– Store result from register to memory
Memory is byte addressed
– Each address identifies an 8-bit byte
Words are aligned in memory
– Address must be a multiple of 4
MIPS is Big Endian
– Most-significant byte at least address of a word
– cf. Little Endian: least-significant byte at least address
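
Byte order and alignment can be illustrated with Python's built-in `int.to_bytes`. The helper names and the sample word 0x12345678 are assumptions for this sketch; each returned list gives the bytes in increasing address order.

```python
def word_bytes(value, byteorder):
    """Bytes of a 32-bit word, listed from lowest to highest address."""
    return list(value.to_bytes(4, byteorder))


def is_word_aligned(address):
    """A word address must be a multiple of 4."""
    return address % 4 == 0


big = word_bytes(0x12345678, "big")        # most-significant byte 0x12 first
little = word_bytes(0x12345678, "little")  # least-significant byte 0x78 first

print([hex(b) for b in big])
print([hex(b) for b in little])
print(is_word_aligned(32), is_word_aligned(34))
```

The stored value is the same in both cases; only the mapping from byte position within the word to memory address differs.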



Memory Operand Example 1

C code:
g = h + A[8];
– g in $s1, h in $s2, base address of A in $s3
Compiled MIPS code:
– Index 8 requires offset of 32
• 4 bytes per word
lw $t0, 32($s3) # load word
add $s1, $s2, $t0



Memory Operand Example 2

C code:
A[12] = h + A[8];
– h in $s2, base address of A in $s3
Compiled MIPS code:
– Index 8 requires offset of 32

lw $t0, 32($s3) # load word
add $t0, $s2, $t0
sw $t0, 48($s3) # store word
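
The offsets 32 and 48 are the array indices multiplied by the 4-byte word size. The Python sketch below models memory as a dictionary keyed by byte address to trace the load, add, and store. The base address 0x1000 and the values of h and A[8] are hypothetical, and `lw`/`sw` here are toy helper functions, not a real simulator API.

```python
WORD_SIZE = 4  # bytes per word


def lw(memory, regs, rt, offset, base):
    """Load the word at address regs[base] + offset into register rt."""
    regs[rt] = memory[regs[base] + offset]


def sw(memory, regs, rt, offset, base):
    """Store register rt to the word at address regs[base] + offset."""
    memory[regs[base] + offset] = regs[rt]


base_addr = 0x1000  # hypothetical base address of A
memory = {base_addr + WORD_SIZE * i: 0 for i in range(16)}
memory[base_addr + WORD_SIZE * 8] = 40  # hypothetical A[8]
regs = {"$s2": 2, "$s3": base_addr, "$t0": 0}  # hypothetical h = 2

lw(memory, regs, "$t0", 8 * WORD_SIZE, "$s3")   # lw  $t0, 32($s3)
regs["$t0"] = regs["$s2"] + regs["$t0"]         # add $t0, $s2, $t0
sw(memory, regs, "$t0", 12 * WORD_SIZE, "$s3")  # sw  $t0, 48($s3)

a12 = memory[base_addr + 48]
print(a12)
```

Arithmetic never operates on memory directly: the value is loaded into a register, combined there, and the result is stored back.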

