Professional Documents
Culture Documents
Introduction
1.1 Introduction 1.2 The Task of a Computer Designer 1.3 Technology and Computer Usage Trends 1.4 Cost and Trends in Cost 1.5 Measuring and Reporting Performance 1.6 Quantitative Principles of Computer Design
SOFTWARE
Workloads
Trends
Gordon Moore (Founder of Intel) observed in 1965 that the number of transistors that could be crammed on a chip doubles every year. This has CONTINUED to be true since then.
Transistors Per Chip
1.E+08 Pentium 3 Pentium Pro 1.E+07 Pentium Pentium II Power PC G3
486
Power PC 601
8086 1.E+04
4004 1.E+03 1970 1975 1980 1985 1990 1995 2000 2005
Trends
Processor performance, as measured by the SPEC benchmark has also risen dramatically.
Alpha 6/833
DEC Alpha 5/500 DEC AXP/ DEC Alpha 21264/600 500 DEC Alpha 4/266
87
88
89
90
91
92
93
94
95
96
97
98
99
2000
8
Trends
Memory Capacity (and Cost) have changed dramatically in the last 20 years.
size
1000000000
100000000
10000000
1000000
100000
10000
size(Mb) cyc time 0.0625 250 ns 0.25 220 ns 1 190 ns 4 165 ns 16 145 ns 64 120 ns 256 100 ns
Bits
Trends
Based on SPEED, the CPU has increased dramatically, but memory and disk have increased only a little. This has led to dramatic changed in architecture, Operating Systems, and Programming practices.
Capacity
Logic DRAM 2x in 3 years 4x in 3 years
Speed (latency)
2x in 3 years 2x in 10 years
Disk
4x in 3 years
2x in 10 years
10
11
Metrics
Plane DC to Paris Speed Passengers Throughput (pmph) 286,700 Boeing 747 6.5 hours 610 mph 470
BAD/Sud Concodre
3 hours
1350 mph
132
178,200
Metrics - Comparisons
"X is n times faster than Y" means
ExTime(Y) --------ExTime(X) Performance(X) --------------Performance(Y)
Speed of Concorde vs. Boeing 747 Throughput of Boeing 747 vs. Concorde
13
Metrics - Comparisons
Pat has developed a new product, "rabbit" about which she wishes to determine performance. There is special interest in comparing the new product, rabbit to the old product, turtle, since the product was rewritten for performance reasons. (Pat had used Performance Engineering techniques and thus knew that rabbit was "about twice as fast" as turtle.) The measurements showed: Performance Comparisons
Product Turtle Rabbit Transactions / second 30 60 Seconds/ transaction 0.0333 0.0166 Seconds to process transaction 3 1
Which of the following statements reflect the performance comparison of rabbit and turtle? o Rabbit is 100% faster than turtle. o Rabbit is twice as fast as turtle. o Rabbit takes 1/2 as long as turtle. o Rabbit takes 1/3 as long as turtle. o Rabbit takes 100% less time than turtle. o Rabbit takes 200% less time than turtle. o Turtle is 50% as fast as rabbit. o Turtle is 50% slower than rabbit. o Turtle takes 200% longer than rabbit. o Turtle takes 300% longer than rabbit.
14
Metrics - Throughput
Application Programming Language Compiler
ISA
(millions) of Instructions per second: MIPS (millions) of (FP) operations per second: MFLOP/s
Megabytes per second Cycles per second (clock rate)
15
16
Benchmarks
SPEC: System Performance Evaluation Cooperative
First Round 1989 10 programs yielding a single number (SPECmarks) Second Round 1992 SPECInt92 (6 integer programs) and SPECfp92 (14 floating point programs) Compiler Flags unlimited. March 93 of DEC 4000 Model 610: spice: unix.c:/def=(sysv,has_bcopy,bcopy(a,b,c)= memcpy(b,a,c) wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200 nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas
Third Round 1995 new set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating point) benchmarks useful for 3 years Single flag setting for all programs: SPECint_base95, SPECfp_base95
17
Benchmarks
CINT2000 (Integer Component of SPEC CPU2000):
Program
164.gzip 175.vpr 176.gcc 181.mcf 186.crafty 197.parser 252.eon 253.perlbmk 254.gap 255.vortex 256.bzip2 300.twolf C C C C C C C++ C C C C C
Language
What Is It
Compression FPGA Circuit Placement and Routing C Programming Language Compiler Combinatorial Optimization Game Playing: Chess Word Processing Computer Visualization PERL Programming Language Group Theory, Interpreter Object-oriented Database Compression Place and Route Simulator
http://www.spec.org/osg/cpu2000/CINT2000/
18
Benchmarks
CFP2000 (Floating Point Component of SPEC CPU2000):
Program 168.wupwise 171.swim 172.mgrid 173.applu 177.mesa 178.galgel 179.art 183.equake 187.facerec 188.ammp 189.lucas 191.fma3d 200.sixtrack 301.apsi Language Fortran 77 Fortran 77 Fortran 77 Fortran 77 C Fortran 90 C C Fortran 90 C Fortran 90 Fortran 90 Fortran 77 Fortran 77 What Is It Physics / Quantum Chromodynamics Shallow Water Modeling Multi-grid Solver: 3D Potential Field Parabolic / Elliptic Differential Equations 3-D Graphics Library Computational Fluid Dynamics Image Recognition / Neural Networks Seismic Wave Propagation Simulation Image Processing: Face Recognition Computational Chemistry Number Theory / Primality Testing Finite-element Crash Simulation High Energy Physics Accelerator Design Meteorology: Pollutant Distribution
19
Benchmarks
Benchmarks Base Base Base Ref Time Run Time Ratio
http://www.spec.org/osg/cpu2000/results/res2000q3/cpu2000-20000718-00168.asc
164.gzip 1400 175.vpr 1400 176.gcc 1100 181.mcf 1800 186.crafty 1000 197.parser 1800 252.eon 1300 253.perlbmk 1800 254.gap 1100 255.vortex 1900 256.bzip2 1500 300.twolf 3000 SPECint_base2000 SPECint2000
277 419 275 621 191 500 267 302 249 268 389 784
505* 334* 399* 290* 522* 360* 486* 596* 442* 710* 386* 382* 438
1400 1400 1100 1800 1000 1800 1300 1800 1100 1900 1500 3000
270 417 272 619 191 499 267 302 248 264 375 776
518* 336* 405* 291* 523* 361* 486* 596* 443* 719* 400* 387*
442
20
Benchmarks
Performance Evaluation
For better or worse, benchmarks shape a field Good products created when have: Good benchmarks Good ways to summarize performance Given sales is a function in part of performance relative to competition, investment in improving product as reported by performance summary If benchmarks/summary inadequate, then choose between improving product for real programs vs. improving product to get more sales; Sales almost always wins! Execution time is the measure of computer performance!
21
Benchmarks
How to Summarize Performance
Management would like to have one number. Technical people want more: 1. They want to have evidence of reproducibility there should be enough information so that you or someone else can repeat the experiment. 2. There should be consistency when doing the measurements multiple times.
1001
110
40
22
23
Quantitative Design
Amdahl's Law
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected
24
Quantitative Design
Amdahl's Law
Speedupenhanced
1 = (1 - Fractionenhanced) + Fractionenhanced Speedupenhanced
Speedupoverall =
ExTimeold ExTimenew
Quantitative Design
Amdahl's Law
Floating point instructions improved to run 2X; but only 10% of actual instructions are FP
Speedupoverall =
1 0.95
1.053
26
Quantitative Design
CPI = (CPU Time * Clock Rate) / Instruction Count = Cycles / Instruction Count
CPI CPI i * Fi
i 1
where
Fi
Ii Instruction _ Count
Quantitative Design
Suppose we have a machine where we can count the frequency with which instructions are executed. We also know how many cycles it takes for each instruction type.
Base Machine (Reg / Reg) Op Freq Cycles CPI(i) ALU 50% 1 .5 Load 20% 2 .4 Store 10% 2 .2 Branch 20% 2 .4 Total CPI 1.5 How do we get CPI(I)? How do we get %time?
28
Quantitative Design
Locality of Reference
Programs access a relatively small portion of the address space at any instant of time.
There are two different types of locality: Temporal Locality (locality in time): If an item is referenced, it will tend to be referenced again soon (loops, reuse, etc.) Spatial Locality (locality in space/location): If an item is referenced, items whose addresses are close by tend to be referenced soon (straight line code, array access, etc.)
29