You are on page 1of 37

Chapter 1 — Computer Abstractions & Technology September 3, 2022

Computer Architecture
Chapter 1: Computer Abstractions and Technology

Adapted from Computer Organization the Hardware/Software Interface – 5th

Computer Engineering – CSE – HCMUT

The Computer Revolution


• The third revolution along with agriculture and
industry
• Progress in computer technology
– Underpinned by Moore’s Law
• Makes novel applications feasible
– Computers in automobiles
– Cell phones
– Human genome project
– World Wide Web
– Search Engines
• Computers are pervasive

Chapter 1: Computer Technology 2

Computer Architecture 1
Chapter 1 — Computer Abstractions & Technology September 3, 2022

Industrial Revolutions

Chapter 1: Computer Technology 3

The Computer Revolution


• The third revolution along with agriculture and
industry
• Progress in computer technology
– Underpinned by Moore’s Law
• Makes novel applications feasible
– Computers in automobiles
– Cell phones
– Human genome project
– World Wide Web
– Search Engines
• Computers are pervasive

Chapter 1: Computer Technology 4

Computer Architecture 2
Chapter 1 — Computer Abstractions & Technology September 3, 2022

The Moore’s Law

Chapter 1: Computer Technology 5

Processors & Chips


• World record, in terms of the number of transistors integrated into a chip:
– Altera FPGA device: 30+ Billions
• Apple M1 launched 2020
– octa-core 64-bit ARM64 SoC,
SIMD, caches
– 5 nm technology
– 119 mm2
– 16B transistors

Chapter 1: Computer Technology 6

Computer Architecture 3
Chapter 1 — Computer Abstractions & Technology September 3, 2022

The First “Computer”

Source: Internet
Chapter 1: Computer Technology 7

The First “Computer” (cont.)

The ENIAC Computer, source: US Army photo


Chapter 1: Computer Technology 8

Computer Architecture 4
Chapter 1 — Computer Abstractions & Technology September 3, 2022

The ENIAC Computer


• 30+ tons
• 1,500+ square feet (140 square meter)
• 18,000+ vacuum tubes
• 140+ KW power
• 5,000+ additions per second

Chapter 1: Computer Technology 9

A Brief History of Computers


• The first generation
– Vacuum tubes
– 1946 – 1955
• The second generation
– Transistors
– 1955 – 1965
• The third generation
– 1965 – 1980
– Integrated circuits
• The current generation
– 1980 - …
– Personal computers
• What’s the next?
– Quantum computers?
– Memristor?

Chapter 1: Computer Technology 10

Computer Architecture 5
Chapter 1 — Computer Abstractions & Technology September 3, 2022

Classes of Computers
• Personal computers
– General purpose, variety of software
– Subject to cost/performance tradeoff

• Server computers
– Network based
– High capacity, performance, reliability
– Range from small servers to building sized

Chapter 1: Computer Technology 11

Classes of Computers
• Supercomputers
– High-end scientific and engineering calculations
– Highest capability but represent a small fraction of
the overall computer market

• Embedded computers
– Hidden as components of systems
– Stringent power/performance/cost constraints

Chapter 1: Computer Technology 12

Computer Architecture 6
Chapter 1 — Computer Abstractions & Technology September 3, 2022

The PostPC Era

source: BusinessInsider
Chapter 1: Computer Technology 13

The PostPC Era


• Personal Mobile Device (PMD)
– Battery operated
– Connects to the Internet
– Hundreds of dollars
– Smart phones, tablets, electronic glasses,…
• Clouding computing
– Warehouse Scale Computers (WSC)
– Software as a Service (SaaS)
– Portion of software run on a PMD and a portion run in
the Cloud
– Amazon and Google
Chapter 1: Computer Technology 14

Computer Architecture 7
Chapter 1 — Computer Abstractions & Technology September 3, 2022

Understanding Performance
• Algorithm
– Determines number of operations executed
• Programming language, compiler, architecture
– Determine number of machine instructions executed per
operation
• Processor and memory system
– Determine how fast instructions are executed
• I/O system (including OS)
– Determines how fast I/O operations are executed

Chapter 1: Computer Technology 15

Quiz
1. The number of embedded processors sold every year greatly
outnumbers the number of PC and even PostPC processors. Can
you confirm or deny this insight based on your own experience?
Try to count the number of embedded processors in your home.
How does it compare with the number of conventional computers
in your home?
2. As mentioned earlier, both the software and hardware affect the
performance of a program. Can you think of examples where each
of the following is the right place to look for a performance
bottleneck?
– The algorithm chosen
– The programming language or compiler
– The operating system
– The processor
– The I/O system and devices

Chapter 1: Computer Technology 16

Computer Architecture 8
Chapter 1 — Computer Abstractions & Technology September 3, 2022

Below Your Program


• Application software
– Written in high-level language
• System software
– Compiler: translates HLL code to
machine code
– Operating System: service code
• Handling input/output
• Managing memory and storage
• Scheduling tasks & sharing resources
• Hardware
– Processor, memory, I/O controllers

Chapter 1: Computer Technology 17

Levels of Program Code


• High-level language
– Level of abstraction closer
to problem domain
– Provides for productivity
and portability
• Assembly language
– Textual representation of
instructions
• Hardware representation
– Binary digits (bits)
– Encoded instructions and
data

Chapter 1: Computer Technology 18

Computer Architecture 9
Chapter 1 — Computer Abstractions & Technology September 3, 2022

Components of a Computer
• Same components for
The BIG Picture
all kinds of computer
– Desktop, server,
embedded
• Input/output includes
– User-interface devices
• Display, keyboard, mouse
– Storage devices
• Hard disk, CD/DVD, flash
– Network adapters
• For communicating with
other computers

Chapter 1: Computer Technology 19

Touchscreen
• PostPC device
• Replace keyboard and
mouse
• Resistive and Capacitive
types
– Most tablets, smart
phones use capacitive
– Capacitive allows
multiple touches
simultaneously

Chapter 1: Computer Technology 20

Computer Architecture 10
Chapter 1 — Computer Abstractions & Technology September 3, 2022

Through the Looking Glass


• LCD screen: picture elements (pixels)
– Mirrors content of frame buffer memory

Chapter 1: Computer Technology 21

Opening the Box (Apple iPad 2)


Capacitive multitouch LCD screen

3.8 V, 25 Watt-hour battery

Computer board

Chapter 1: Computer Technology 22

Computer Architecture 11
Chapter 1 — Computer Abstractions & Technology September 3, 2022

Inside the Processor (CPU)


• Datapath: performs operations on data
• Control: sequences datapath, memory, ...
• Cache memory
– Small fast SRAM memory for immediate access to
data

Chapter 1: Computer Technology 23

Inside the Processor


• Apple A5

Chapter 1: Computer Technology 24

Computer Architecture 12
Chapter 1 — Computer Abstractions & Technology September 3, 2022

A Safe Place for Data


• Volatile main memory
– Loses instructions and data when power off
• Non-volatile secondary memory
– Magnetic disk
– Flash memory
– Optical disk (CDROM, DVD)

Chapter 1: Computer Technology 25

Networks
• Communication, resource sharing, nonlocal
access
• Local area network (LAN): Ethernet
• Wide area network (WAN): the Internet
• Wireless network: WiFi, Bluetooth

Chapter 1: Computer Technology 26

Computer Architecture 13
Chapter 1 — Computer Abstractions & Technology September 3, 2022

Abstractions
The BIG Picture
• Abstraction helps us deal with complexity
– Hide lower-level detail
• Instruction set architecture (ISA)
– The hardware/software interface
• Application binary interface
– The ISA plus system software interface
• Implementation
– The details underlying and interface
Chapter 1: Computer Technology 27

The Instruction Set Architecture (ISA)

software

instruction set architecture

hardware

The interface description separating the


software and hardware

Chapter 1: Computer Technology 28

Computer Architecture 14
Chapter 1 — Computer Abstractions & Technology September 3, 2022

Instruction Set Architecture (ISA)


• ISA, or simply architecture – the abstract interface between
the hardware and the lowest level software that
encompasses all the information necessary to write a
machine language program, including instructions,
registers, memory access, I/O, …
– Enables implementations of varying cost and performance to
run identical software

• The combination of the basic instruction set (the ISA) and


the operating system interface is called the application
binary interface (ABI)
– ABI – The user portion of the instruction set plus the operating
system interfaces used by application programmers. Defines a
standard for binary portability across computers.

Chapter 1: Computer Technology 29

ISA Type Sales


ARM
1400 IA-32
Millions of Processor

1200
MIPS
1000
Motorola 68K
800
PowerPC
600
Hitachi SH
400
SPARC
200
Other
0
1998 1999 2000 2001 2002

Chapter 1: Computer Technology 30

Computer Architecture 15
Chapter 1 — Computer Abstractions & Technology September 3, 2022

How Do the Pieces Fit Together?


Applications
Operating
System
Instruction Set Compiler Firmware
Architecture Memory I/O system network
Processor
system
Datapath & Control
Digital Design
Circuit Design

• Coordination of many levels of abstraction


• Under a rapidly changing set of forces
• Design, measurement, and evaluation

Chapter 1: Computer Technology 31

Technology Trends
• Electronics technology
continues to evolve
– Increased capacity and
performance
– Reduced cost
DRAM capacity

Year Technology Relative performance/cost


1951 Vacuum tube 1
1965 Transistor 35
1975 Integrated circuit (IC) 900
1995 Very large scale IC (VLSI) 2,400,000
2013 Ultra large scale IC 250,000,000,000

Chapter 1: Computer Technology 32

Computer Architecture 16
Chapter 1 — Computer Abstractions & Technology September 3, 2022

Semiconductor Technology
• Silicon: semiconductor
• Add materials to transform properties:
– Conductors
– Insulators
– Switch

Chapter 1: Computer Technology 33

Manufacturing ICs

• Yield: proportion of working dies per wafer


Chapter 1: Computer Technology 34

Computer Architecture 17
Chapter 1 — Computer Abstractions & Technology September 3, 2022

Intel Core i7 Wafer

• 300mm wafer, 280 chips, 32nm technology


• Each chip is 20.7 x 10.5 mm
Chapter 1: Computer Technology 35

AMD Opteron X2 Wafer


• X2: 300mm wafer, 117 chips/wafer, 90nm technology
• X4: 45nm technology

Chapter 1: Computer Technology 36

Computer Architecture 18
Chapter 1 — Computer Abstractions & Technology September 3, 2022

Integrated Circuit Cost


Cost per wafer
Cost per die =
Dies per wafer  Yield
Dies per wafer  Wafer area Die area
1
Yield =
(1+ (Defects per area  Die area/ ))

• Nonlinear relation to area and defect rate


– Wafer cost and area are fixed
– Defect rate determined by manufacturing process
– Die area determined by architecture and circuit design

Chapter 1: Computer Technology 37

Quiz
• A key factor in determining the cost of an integrated circuit
is volume. Which of the following are reasons why a chip
made in high volume should cost less?
1. With high volumes, the manufacturing process can be tuned
to a particular design, increasing the yield.
2. It is less work to design a high-volume part than a low-volume
part.
3. The masks used to make the chip are expensive, so the cost
per chip is lower for higher volumes.
4. Engineering development costs are high and largely
independent of volume; thus, the development cost per die is
lower with high-volume parts.
5. High-volume parts usually have smaller die sizes than low-
volume parts and therefore have higher yield per wafer

Chapter 1: Computer Technology 38

Computer Architecture 19
Chapter 1 — Computer Abstractions & Technology September 3, 2022

Defining Performance
• Which airplane has the best performance?
Boeing 777 Boeing 777

Boeing 747 Boeing 747

BAC/Sud BAC/Sud
Concorde Concorde
Douglas DC- Douglas DC-
8-50 8-50

0 100 200 300 400 500 0 2000 4000 6000 8000 10000

Passenger Capacity Cruising Range (miles)

Boeing 777 Boeing 777

Boeing 747 Boeing 747

BAC/Sud BAC/Sud
Concorde Concorde
Douglas DC- Douglas DC-
8-50 8-50

0 500 1000 1500 0 100000 200000 300000 400000

Cruising Speed (mph) Passengers x mph

Chapter 1: Computer Technology 39

Response Time and Throughput


• Response time
– How long it takes to do a task
• Throughput
– Total work done per unit time
• e.g., tasks/transactions/… per hour
• How are response time and throughput affected by
– Replacing the processor with a faster version?
– Adding more processors?
• We’ll focus on response time for now…

Chapter 1: Computer Technology 40

Computer Architecture 20
Chapter 1 — Computer Abstractions & Technology September 3, 2022

Relative Performance
• Define Performance = 1/Execution Time
• “X is n time faster than Y”
Performance X Performance Y
= Execution time Y Execution time X = n
• Example: time taken to run a program
– 10s on A, 15s on B
– Execution TimeB / Execution TimeA
= 15s / 10s = 1.5
– So A is 1.5 times faster than B
Chapter 1: Computer Technology 41

Measuring Execution Time


• Elapsed time
– Total response time, including all aspects
• Processing, I/O, OS overhead, idle time
– Determines system performance
• CPU time
– Time spent processing a given job
• Discounts I/O time, other jobs’ shares
– Comprises user CPU time and system CPU time
– Different programs are affected differently by CPU
and system performance

Chapter 1: Computer Technology 42

Computer Architecture 21
Chapter 1 — Computer Abstractions & Technology September 3, 2022

CPU Clocking
• Operation of digital hardware governed by a
constant-rate clock
Clock period
Clock (cycles)
Data transfer
and computation
Update state

• Clock period: duration of a clock cycle


– e.g., 250ps = 0.25ns = 250×10–12s
• Clock frequency (rate): cycles per second
– e.g., 4.0GHz = 4000MHz = 4.0×109Hz

Chapter 1: Computer Technology 43

CPU Time
CPU Time = CPU Clock Cycles Clock Cycle Time
CPU Clock Cycles
=
Clock Rate
• Performance improved by
– Reducing number of clock cycles
– Increasing clock rate
– Hardware designer must often trade off clock rate
against cycle count

Chapter 1: Computer Technology 44

Computer Architecture 22
Chapter 1 — Computer Abstractions & Technology September 3, 2022

CPU Time Example


• Computer A: 2GHz clock, 10s CPU time
• Designing Computer B
– Aim for 6s CPU time
– Can do faster clock, but causes 1.2 × clock cycles
• How fast must Computer B clock be?
Clock CyclesB 1.2  Clock CyclesA
Clock RateB = =
CPU TimeB 6s
Clock CyclesA = CPU Time A  Clock RateA
= 10s  2GHz = 20  109
1.2  20  109 24  109
Clock RateB = = = 4GHz
6s Technology 6s
Chapter 1: Computer 45

Instruction Count and CPI


Clock Cycles = Instruction Count  Cycles per Instruction
CPU Time = Instruction Count  CPI  Clock Cycle Time
Instruction Count  CPI
=
Clock Rate
• Instruction Count for a program
– Determined by program, ISA and compiler
• Average cycles per instruction
– Determined by CPU hardware
– If different instructions have different CPI
• Average CPI affected by instruction mix

Chapter 1: Computer Technology 46

Computer Architecture 23
Chapter 1 — Computer Abstractions & Technology September 3, 2022

CPI Example
• Computer A: Cycle Time = 250ps, CPI = 2.0
• Computer B: Cycle Time = 500ps, CPI = 1.2
• Same ISA, compiler
• Which is faster, and by how much?
CPU Time = Instruction Count  CPI  Cycle Time
A A A
= I  2.0  250ps = I  500ps A is faster…
CPU Time = Instruction Count  CPI  Cycle Time
B B B
= I  1.2  500ps = I  600ps

B = I  600ps = 1.2
CPU Time
…by this much
CPU Time I  500ps
A
Chapter 1: Computer Technology 47

CPI in More Detail


• If different instruction classes take different
numbers of cycles
n
Clock Cycles =  (CPIi  Instruction Counti )
i=1

• Weighted average CPI


Clock Cycles n
 Instruction Counti 
CPI = =   CPIi  
Instruction Count i=1  Instruction Count 

Relative frequency
Chapter 1: Computer Technology 48

Computer Architecture 24
Chapter 1 — Computer Abstractions & Technology September 3, 2022

CPI Example
• Alternative compiled code sequences using
instructions in classes A, B, C
Class A B C
CPI for class 1 2 3
IC in sequence 1 2 1 2
IC in sequence 2 4 1 1
◼ Sequence 1: IC = 5 ◼ Sequence 2: IC = 6
◼ Clock Cycles ◼ Clock Cycles
= 2×1 + 1×2 + 2×3 = 4×1 + 1×2 + 1×3
= 10 =9
◼ Avg. CPI = 10/5 = 2.0 ◼ Avg. CPI = 9/6 = 1.5
Chapter 1: Computer Technology 49

Exercise
• A program consists of 1000 instructions in which:
– 30% load/store instructions, CPI = 2.5
– 10% jump instructions, CPI = 1
– 20% branch instructions, CPI = 1.5
– The rest are arithmetic instructions, CPI = 2.0
• The program is executed on a 2 GHz CPU
a) What is execution time (CPU time) of the program?
b) What is the weight average CPI of the program?
c) If load/store instructions are improved so that their
execution time is reduced by a factor of 2, what is the
speed-up of the system?

Chapter 1: Computer Technology 50

Computer Architecture 25
Chapter 1 — Computer Abstractions & Technology September 3, 2022

Performance Summary
The BIG Picture
Instructio ns Clock cycles Seconds
CPU Time =  
Program Instructio n Clock cycle

• Performance depends on
– Algorithm: affects IC, possibly CPI
– Programming language: affects IC, CPI
– Compiler: affects IC, CPI
– Instruction set architecture: affects IC, CPI, Tc

Chapter 1: Computer Technology 51

Quiz
• A given application written in Java runs 15
seconds on a desktop processor. A new Java
compiler is released that requires only 0.6 as
many instructions as the old compiler.
Unfortunately, it increases the CPI by 1.1. How
fast can we expect the application to run using
this new compiler?
– 15 x 0.6 x 1.1 = 9.9 sec

Chapter 1: Computer Technology 52

Computer Architecture 26
Chapter 1 — Computer Abstractions & Technology September 3, 2022

Power Trends
10000 3600 120
3900
2000 2667 3300 3400
100
1000
frequency 103
95
Frequency (MHz)

200 87 80

Power (W)
66 75.3 77
100 65 60
25
16 power
12.5 40
29.1
10
10.1 20
3.3 4.1 4.9
1 0

Pentium Pro

Pentium 4
Willamette

Core i5 Ivy
Pentium 4

Prescott

Skylake
Core i5
Kentsfield
Pentium

Clarkdal
e (2010)
(2004)

(2015)
Core i5

Bridge
(1982)

(1985)

(1989)

(1993)

(1997)

(2001)

(2012)
80286

80486
80386

Core 2

(2007)
• In CMOS IC technology
𝑃𝑜𝑤𝑒𝑟𝑑𝑦𝑛𝑎𝑚𝑖𝑐 = Capacitive load × Voltage2 × Frequency

×30 5V → 1V ×1000
Chapter 1: Computer Technology 53

Define and Quantify Power


• For mobile devices, energy better metric
𝐸𝑛𝑒𝑟𝑔𝑦𝑑𝑦𝑛𝑎𝑚𝑖𝑐 = Capacitive load × Voltage2

• For a fixed task, slowing clock rate (frequency


switched) reduces power, but not energy
• Capacitive load a function of number of transistors
connected to output and technology, which determines
capacitance of wires and transistors
• Dropping voltage helps both, so went from 5V to 1V
• To save energy & dynamic power, most CPUs now turn
off clock of inactive modules (e.g. Fl. Pt. Unit)

Chapter 1: Computer Technology 54

Computer Architecture 27
Chapter 1 — Computer Abstractions & Technology September 3, 2022

Reducing Power
• Suppose a new CPU has
– 85% of capacitive load of old CPU
– 15% voltage and 15% frequency reduction
Pnew Cold  0.85 (Vold  0.85)2  Fold  0.85
= = 0.854 = 0.52
Cold  Vold  Fold
2
Pold

• The power wall


– We can’t reduce voltage further
– We can’t remove more heat
• How else can we improve performance?

Chapter 1: Computer Technology 55

Define and Quantify power


• Because leakage current flows even when a transistor is off,
now static power important too
𝑃𝑜𝑤𝑒𝑟𝑠𝑡𝑎𝑡𝑖𝑐 = 𝐶𝑢𝑟𝑟𝑒𝑛𝑡𝑠𝑡𝑎𝑡𝑖𝑐 × 𝑉𝑜𝑙𝑡𝑎𝑔𝑒

• Leakage current increases in processors with smaller


transistor sizes
• Increasing the number of transistors increases power even
if they are turned off
• In 2006, goal for leakage is 25% of total power
consumption; high performance designs at 40%
• Very low power systems even gate voltage to inactive
modules to control loss due to leakage

Chapter 1: Computer Technology 56

Computer Architecture 28
Chapter 1 — Computer Abstractions & Technology September 3, 2022

Uniprocessor Performance

Constrained by power, instruction-level parallelism,


memory latency

Chapter 1: Computer Technology 57

Multiprocessors
• Multicore microprocessors
– More than one processor per chip
• Requires explicitly parallel programming
– Compare with instruction level parallelism
• Hardware executes multiple instructions at once
• Hidden from the programmer
– Hard to do
• Programming for performance
• Load balancing
• Optimizing communication and synchronization
Chapter 1: Computer Technology 58

Computer Architecture 29
Chapter 1 — Computer Abstractions & Technology September 3, 2022

SPEC CPU Benchmark


• Usually rely on benchmarks vs. real workloads
• Programs used to measure performance
– Supposedly typical of actual workload
• To increase predictability, collections of benchmark applications,
called benchmark suites, are popular
– Standard Performance Evaluation Corp (SPEC)
– Develops benchmarks for CPU, I/O, Web, …
• SPECCPU: popular desktop benchmark suite
– CPU only, split between integer and floating point programs
– SPECint2000 has 12 integer, SPECfp2000 has 14 integer programs
• SPEC CPU2006
– SPECCPU2006 to be announced Spring 2006
– Elapsed time to execute a selection of programs
– Negligible I/O, so focuses on CPU performance

Chapter 1: Computer Technology 59

How To Summarize Suite Performance


• Arithmetic average of execution time of all programs?
– But they vary by 4X in speed, so some would be more
important than others in arithmetic average
• Could add a weights per program, but how pick
weight?
– Different companies want different weights for their
products
• SPECRatio: Normalize execution times to reference
computer, yielding a ratio
𝑅𝑎𝑡𝑖𝑜 𝑝𝑟𝑜𝑝𝑜𝑟𝑡𝑖𝑜𝑛𝑎𝑙 𝑡𝑜 𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒
𝑡𝑖𝑚𝑒 𝑜𝑛 𝑟𝑒𝑓𝑒𝑟𝑒𝑛𝑐𝑒 𝑐𝑜𝑚𝑝𝑢𝑡𝑒𝑟
=
𝑡𝑖𝑚𝑒 𝑜𝑛 𝑐𝑜𝑚𝑝𝑢𝑡𝑒𝑟 𝑏𝑒𝑖𝑛𝑔 𝑟𝑎𝑡𝑒𝑑

Chapter 1: Computer Technology 60

Computer Architecture 30
Chapter 1 — Computer Abstractions & Technology September 3, 2022

How To Summarize Suite Performance


• If program SPECRatio on Computer A is 1.25 times bigger
than Computer B, then
ExecutionTimereference
SPECRatioA ExecutionTimeA
1.25 = =
SPECRatioB ExecutionTimereference
ExecutionTimeB
ExecutionTimeB PerformanceA
= =
ExecutionTimeA PerformanceB
• Note that when comparing 2 computers as a ratio,
execution times on the reference computer drop out, so
choice of reference computer is irrelevant

Chapter 1: Computer Technology 61

How To Summarize Suite Performance


• Since ratios, proper mean is geometric mean
(SPECRatio unitless, so arithmetic mean meaningless)
n
GeometricMean = n  SPECRatio
i =1
i

1. Geometric mean of the ratios is the same as the ratio of


the geometric means
2. Ratio of geometric means
= Geometric mean of performance ratios
 choice of reference computer is irrelevant!
• These two points make geometric mean of ratios attractive
to summarize performance

Chapter 1: Computer Technology 62

Computer Architecture 31
Chapter 1 — Computer Abstractions & Technology September 3, 2022

CINT2006 for Intel Core i7 920

Chapter 1: Computer Technology 63

SPEC Power Benchmark


• Power consumption of server at different
workload levels
– Performance: ssj_ops/sec
– Power: Watts (Joules/sec)

 10   10 
Overall ssj_ops per Watt =   ssj_opsi    poweri 
 i =0   i =0 

Chapter 1: Computer Technology 64

Computer Architecture 32
Chapter 1 — Computer Abstractions & Technology September 3, 2022

SPECpower_ssj2008 for Xeon X5650

Chapter 1: Computer Technology 65

Pitfall: Amdahl’s Law


• Improving an aspect of a computer and expecting
a proportional improvement in overall
performance
T𝑎𝑓𝑓𝑒𝑐𝑡𝑒𝑑 𝑏𝑦 𝑖𝑚𝑝𝑟𝑜𝑣𝑒𝑚𝑒𝑛𝑡
T𝑎𝑓𝑡𝑒𝑟 𝑖𝑚𝑝𝑟𝑜𝑣𝑒𝑚𝑒𝑛𝑡 = + Tunaffected
𝑖𝑚𝑝𝑟𝑜𝑣𝑒𝑚𝑒𝑛𝑡 𝑓𝑎𝑐𝑡𝑜𝑟

• Example: multiply accounts for 80s/100s


– How much improvement in multiply performance to
get 5× overall?
80
– Can’t be done: 20 = + 20
n

• Corollary: make the common case fast


Chapter 1: Computer Technology 66

Computer Architecture 33
Chapter 1 — Computer Abstractions & Technology September 3, 2022

Amdahl’s Law
• “X is N% faster than Y.”
Execution Time of Y N
= 1+
Execution Time of X 100
• Assume enhancement E accelerates a fraction F of the task by a factor S
– F: The fraction enhanced
– S: The speedup of the enhanced fraction

1–F F 1–F F/S Taffected


Timproved = Tunaffected +
improvement factor

Improvement is limited by the  F


Texe (with E) = Texe (without E) * (1 − F ) + 
 S
fraction of the original execution
Texe (without E) 1
time that can’t be enhanced Overall Speedup = =
F
Texe (with E) (1 - F ) +
1 S
Speedup max =
(1 - F )

Chapter 1: Computer Technology 67

Amdahl’s Law Example


• New CPU 10x faster
• I/O bound server, so 60% of time waiting for
I/O
– F = 40% S = 10
1 1
Overall Speedup = = = 1.56
F 0.4
(1 − F ) + (1 − 0.4 ) +
S 10

• Apparently, its human nature to be attracted


by 10x faster, vs. keeping in perspective its just
1.6x faster
Chapter 1: Computer Technology 68

Computer Architecture 34
Chapter 1 — Computer Abstractions & Technology September 3, 2022

Quiz
• Overall speedup if we make 90% of a program
run 10 times faster.
– F = 0.9 S = 10
1
Overall Speedup = = 5.26
𝐹
(1 − 𝐹) +
𝑆

• Overall speedup if we make 80% of a program


run 20% faster
– F = 0.8 S = 1.2
1
Overall Speedup = = 1.153
𝐹
(1 − 𝐹) +
𝑆

Chapter 1: Computer Technology 69

Fallacy: Low Power at Idle


• Look back at i7 power benchmark
– At 100% load: 258W
– At 50% load: 170W (66%)
– At 10% load: 121W (47%)
• Google data center
– Mostly operates at 10% – 50% load
– At 100% load less than 1% of the time
• Consider designing processors to make power
proportional to load
Chapter 1: Computer Technology 71

Computer Architecture 35
Chapter 1 — Computer Abstractions & Technology September 3, 2022

Pitfall: MIPS as a Performance Metric

• MIPS: Millions of Instructions Per Second


– Doesn’t account for
• Differences in ISAs between computers
• Differences in complexity between instructions
Instruction count
MIPS =
Execution time  106
Instruction count Clock rate
= =
Instruction count  CPI CPI  106
 106
Clock rate
• CPI varies between programs on a given CPU
Chapter 1: Computer Technology 72

Quiz

Chapter 1: Computer Technology 73

Computer Architecture 36
Chapter 1 — Computer Abstractions & Technology September 3, 2022

Concluding Remarks
• Cost/performance is improving
– Due to underlying technology development
• Hierarchical layers of abstraction
– In both hardware and software
• Instruction set architecture
– The hardware/software interface
• Execution time: the best performance measure
• Power is a limiting factor
– Use parallelism to improve performance

Chapter 1: Computer Technology 74

Computer Architecture 37

You might also like