The First “Computer” The First “Computer” (cont.)
Classes of Computers
• Personal computers
– General purpose, variety of software
– Subject to cost/performance tradeoff
• Server computers
– Network based
– High capacity, performance, reliability
– Range from small servers to building sized
• Supercomputers
– High-end scientific and engineering calculations
– Highest capability but represent a small fraction of the overall computer market
• Embedded computers
– Hidden as components of systems
– Stringent power/performance/cost constraints
Understanding Performance
• Algorithm
– Determines number of operations executed
• Programming language, compiler, architecture
– Determine number of machine instructions executed per operation
• Processor and memory system
– Determine how fast instructions are executed
• I/O system (including OS)
– Determines how fast I/O operations are executed

Below Your Program
• Application software
– Written in high-level language
• System software
– Compiler: translates HLL code to machine code
– Operating System: service code
  • Handling input/output
  • Managing memory and storage
  • Scheduling tasks & sharing resources
• Hardware
– Processor, memory, I/O controllers
Touchscreen
• PostPC device
• Supersedes keyboard and mouse
• Resistive and capacitive types
– Most tablets, smart phones use capacitive
– Capacitive allows multiple touches simultaneously

Through the Looking Glass
• LCD screen: picture elements (pixels)
– Mirrors content of frame buffer memory
Inside the Processor
• Apple A5

Abstractions (The BIG Picture)
• Abstraction helps us deal with complexity
– Hide lower-level detail
• Instruction set architecture (ISA)
– The hardware/software interface
• Application binary interface
– The ISA plus system software interface
• Implementation
– The details underlying an interface
Chapter 1: Computer Technology 21 Chapter 1: Computer Technology 22
Technology Trends
• Electronics technology continues to evolve
– Increased capacity and performance
– Reduced cost
[Figure: DRAM capacity]

Year   Technology                   Relative performance/cost
1951   Vacuum tube                  1
1965   Transistor                   35
1975   Integrated circuit (IC)      900
1995   Very large scale IC (VLSI)   2,400,000
2013   Ultra large scale IC         250,000,000,000

Semiconductor Technology
• Silicon: semiconductor
• Add materials to transform properties:
– Conductors
– Insulators
– Switch
Integrated Circuit Cost

$$\text{Cost per die} = \frac{\text{Cost per wafer}}{\text{Dies per wafer} \times \text{Yield}}$$

$$\text{Dies per wafer} \approx \frac{\text{Wafer area}}{\text{Die area}}$$

$$\text{Yield} = \frac{1}{\left(1 + (\text{Defects per area} \times \text{Die area}/2)\right)^{2}}$$

• Nonlinear relation to area and defect rate
– Wafer cost and area are fixed
– Defect rate determined by manufacturing process
– Die area determined by architecture and circuit design

Defining Performance
• Which airplane has the best performance?
[Figure: four bar charts comparing the Boeing 777, Boeing 747 and BAC/Sud Concorde]
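The die cost relations can be tried out numerically. The following is a minimal sketch, assuming the yield model with the squared denominator; the wafer cost, wafer diameter, die areas and defect density are hypothetical values chosen only for illustration.

```python
import math


def dies_per_wafer(wafer_area_cm2, die_area_cm2):
    # Simple area ratio; ignores partial dies lost at the wafer edge.
    return wafer_area_cm2 / die_area_cm2


def die_yield(defects_per_cm2, die_area_cm2):
    # Yield = 1 / (1 + defects per area * die area / 2)^2
    return 1.0 / (1.0 + defects_per_cm2 * die_area_cm2 / 2.0) ** 2


def cost_per_die(cost_per_wafer, wafer_area_cm2, die_area_cm2, defects_per_cm2):
    good_dies = dies_per_wafer(wafer_area_cm2, die_area_cm2) * die_yield(
        defects_per_cm2, die_area_cm2
    )
    return cost_per_wafer / good_dies


# Hypothetical inputs: 30 cm wafer costing 5000, 0.5 defects per cm^2.
wafer_area = math.pi * (30.0 / 2.0) ** 2
small = cost_per_die(5000.0, wafer_area, 1.0, 0.5)
large = cost_per_die(5000.0, wafer_area, 2.0, 0.5)
print(f"1 cm^2 die: {small:.2f}  2 cm^2 die: {large:.2f}  ratio: {large / small:.2f}")
```

With these assumed numbers, doubling the die area more than doubles the cost per die, which is the nonlinear relation the slide points out.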
Measuring Execution Time
• Elapsed time
– Total response time, including all aspects
  • Processing, I/O, OS overhead, idle time
– Determines system performance
• CPU time
– Discounts I/O time, other jobs’ shares
– Comprises user CPU time and system CPU time
– Different programs are affected differently by CPU and system performance

CPU Clocking
• Operation of digital hardware governed by a constant-rate clock
[Figure: clock signal (cycles) showing the clock period, with data transfer and computation in each cycle]
• Clock period: duration of a clock cycle
– e.g., 250 ps = 0.25 ns = 250×10^-12 s
• Clock frequency (rate): cycles per second
– e.g., 4.0 GHz = 4000 MHz = 4.0×10^9 Hz
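Clock period and clock rate are reciprocals of each other. A minimal sketch of the conversion, using the slide's own numbers (the function names are illustrative only):

```python
def clock_rate_hz(period_s):
    # Clock rate (cycles per second) is the reciprocal of the clock period.
    return 1.0 / period_s


def clock_period_s(rate_hz):
    # Clock period (seconds per cycle) is the reciprocal of the clock rate.
    return 1.0 / rate_hz


period = 250e-12  # 250 ps
rate = clock_rate_hz(period)
print(f"{period * 1e12:.0f} ps period -> {rate / 1e9:.1f} GHz")
print(f"{rate / 1e9:.1f} GHz -> {clock_period_s(rate) * 1e12:.0f} ps period")
```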
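Instruction Count and CPI

$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$

$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$

• Instruction Count for a program
– Determined by program, ISA and compiler
• Average cycles per instruction
– Determined by CPU hardware
– If different instructions have different CPI
  • Average CPI affected by instruction mix

CPI Example
• Computer A: Cycle Time = 250ps, CPI = 2.0
• Computer B: Cycle Time = 500ps, CPI = 1.2
• Same ISA, compiler
• Which is faster, and by how much?

$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps} \quad \text{(A is faster…)}$$

$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$

$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2 \quad \text{(…by this much)}$$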
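The comparison of computers A and B can be checked with a few lines of code. This is a small sketch; the instruction count is a hypothetical placeholder, since it cancels in the ratio when both machines run the same program with the same ISA and compiler.

```python
def cpu_time(instruction_count, cpi, cycle_time_s):
    # CPU Time = Instruction Count x CPI x Clock Cycle Time
    return instruction_count * cpi * cycle_time_s


I = 1_000_000  # hypothetical instruction count, identical on A and B

time_a = cpu_time(I, 2.0, 250e-12)  # Computer A
time_b = cpu_time(I, 1.2, 500e-12)  # Computer B
ratio = time_b / time_a

print(f"A: {time_a * 1e6:.0f} us, B: {time_b * 1e6:.0f} us, B/A = {ratio:.1f}")
```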
Exercise
• A program consists of 1000 instructions in which:
– 30% load/store instructions, CPI = 2.5
– 10% jump instructions, CPI = 1
– 20% branch instructions, CPI = 1.5
– The rest are arithmetic instructions, CPI = 2.0
• The program is executed on a 2 GHz CPU
a) What is the execution time (CPU time) of the program?
b) What is the weighted average CPI of the program?
c) If load/store instructions are improved so that their execution time is reduced by a factor of 2, what is the speed-up of the system?

Performance Summary (The BIG Picture)

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

• Performance depends on
– Algorithm: affects IC, possibly CPI
– Programming language: affects IC, CPI
– Compiler: affects IC, CPI
– Instruction set architecture: affects IC, CPI, Tc
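One way to apply the CPU time relation to the exercise is sketched below. It is not an official solution: part (c) is read here as "the CPI of load/store instructions is halved", which is an assumption about what the question intends, and the variable names are illustrative.

```python
def weighted_cpi(mix):
    # mix maps an instruction class to (fraction of instructions, CPI).
    return sum(fraction * cpi for fraction, cpi in mix.values())


def cpu_time(instruction_count, cpi, clock_rate_hz):
    # CPU Time = Instruction Count x CPI / Clock Rate
    return instruction_count * cpi / clock_rate_hz


mix = {
    "load/store": (0.30, 2.5),
    "jump": (0.10, 1.0),
    "branch": (0.20, 1.5),
    "arithmetic": (0.40, 2.0),  # the remaining 40%
}

avg_cpi = weighted_cpi(mix)
exec_time = cpu_time(1000, avg_cpi, 2e9)

# Assumed reading of part (c): load/store CPI drops from 2.5 to 1.25.
improved = dict(mix)
improved["load/store"] = (0.30, 2.5 / 2)
speedup = exec_time / cpu_time(1000, weighted_cpi(improved), 2e9)

print(f"average CPI = {avg_cpi:.2f}")
print(f"CPU time = {exec_time * 1e9:.0f} ns")
print(f"speed-up = {speedup:.3f}")
```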
Power Trends
[Figure: clock frequency (MHz) and power (W) for Intel processors: 80286 (1982), 80386 (1985), 80486 (1989), Pentium (1993), Pentium Pro (1997), Pentium 4 Willamette (2001), Pentium 4 Prescott (2004), Core 2 Kentsfield (2007), Core i5 Clarkdale (2010), Core i5 Ivy Bridge (2012), Core i5 Skylake (2015)]

Reducing Power
• Suppose a new CPU has
– 85% of capacitive load of old CPU
– 15% voltage and 15% frequency reduction

$$\frac{P_{new}}{P_{old}} = \frac{C_{old} \times 0.85 \times (V_{old} \times 0.85)^{2} \times F_{old} \times 0.85}{C_{old} \times V_{old}^{2} \times F_{old}} = 0.85^{4} = 0.52$$
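The power ratio follows from scaling capacitive load, voltage and frequency in the relation Power = Capacitive load × Voltage² × Frequency, which is the form the ratio on the slide uses. A minimal sketch, with the old CPU normalized to 1.0 in each quantity (an arbitrary choice, since the units cancel):

```python
def dynamic_power(capacitive_load, voltage, frequency):
    # Power = Capacitive load x Voltage^2 x Frequency
    return capacitive_load * voltage**2 * frequency


# Old CPU normalized to 1.0; the new CPU scales each quantity by 0.85.
p_old = dynamic_power(1.0, 1.0, 1.0)
p_new = dynamic_power(0.85, 0.85, 0.85)
power_ratio = p_new / p_old

print(f"P_new / P_old = {power_ratio:.2f}")
```

Voltage counts twice because it is squared, so the three 15% reductions together give a factor of 0.85 to the fourth power.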
Uniprocessor Performance
[Figure: uniprocessor performance over time. Constrained by power, instruction-level parallelism, memory latency]

Multiprocessors
• Multicore microprocessors
– More than one processor per chip
• Requires explicitly parallel programming
– Compare with instruction level parallelism
  • Hardware executes multiple instructions at once
  • Hidden from the programmer
– Hard to do
  • Programming for performance
  • Load balancing
  • Optimizing communication and synchronization
SPEC Power Benchmark

$$\text{Overall ssj\_ops per Watt} = \frac{\sum_{i=0}^{10} \text{ssj\_ops}_i}{\sum_{i=0}^{10} \text{power}_i}$$

SPECpower_ssj2008 for Xeon X5650
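The overall metric is a ratio of two sums over the eleven measurement levels (i = 0 to 10), not an average of per-level ratios. The sketch below shows the calculation; the throughput and power figures are made-up placeholders, not measurements for the Xeon X5650 or any other system.

```python
def overall_ssj_ops_per_watt(ssj_ops, power_watts):
    # Sum of throughput over all levels divided by sum of power over all levels.
    if len(ssj_ops) != len(power_watts):
        raise ValueError("need one power reading per load level")
    return sum(ssj_ops) / sum(power_watts)


# Hypothetical data for 11 levels, from full load down to idle.
# At idle the throughput is zero but the server still draws power.
ssj_ops = [1000 * k for k in range(10, -1, -1)]
power_watts = [100 + 15 * k for k in range(10, -1, -1)]

overall = overall_ssj_ops_per_watt(ssj_ops, power_watts)
print(f"overall ssj_ops per Watt = {overall:.2f}")
```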
Pitfall: MIPS as a Performance Metric
• MIPS: Millions of Instructions Per Second
– Doesn’t account for
  • Differences in ISAs between computers
  • Differences in complexity between instructions

$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^{6}} = \frac{\text{Instruction count}}{\dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^{6}} = \frac{\text{Clock rate}}{\text{CPI} \times 10^{6}}$$

– CPI varies between programs on a given CPU

Concluding Remarks
• Cost/performance is improving
– Due to underlying technology development
• Hierarchical layers of abstraction
– In both hardware and software
• Instruction set architecture
– The hardware/software interface
• Execution time: the best performance measure
• Power is a limiting factor
– Use parallelism to improve performance
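The MIPS pitfall can be seen by evaluating the rating for two programs on the same CPU. This is a small sketch: the 4 GHz clock rate, the instruction count and the two CPI values are hypothetical, picked only to show that the rating moves with CPI even though the hardware is unchanged.

```python
def mips_from_time(instruction_count, execution_time_s):
    # MIPS = Instruction count / (Execution time x 10^6)
    return instruction_count / (execution_time_s * 1e6)


def mips_from_cpi(clock_rate_hz, cpi):
    # Equivalent form: MIPS = Clock rate / (CPI x 10^6)
    return clock_rate_hz / (cpi * 1e6)


clock_rate = 4.0e9  # hypothetical 4 GHz CPU

# Two hypothetical programs with different average CPI on that CPU.
mips_program_1 = mips_from_cpi(clock_rate, 1.0)
mips_program_2 = mips_from_cpi(clock_rate, 2.0)
print(f"program 1: {mips_program_1:.0f} MIPS, program 2: {mips_program_2:.0f} MIPS")

# Both forms agree: 1e9 instructions at CPI 2.0 take 0.5 s at 4 GHz.
exec_time = 1e9 * 2.0 / clock_rate
print(f"from execution time: {mips_from_time(1e9, exec_time):.0f} MIPS")
```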