Professional Documents
Culture Documents
210 1,024 K Kb KB
220 1,048,576 M Mb MB
230 1,073,741,824 G Gb GB
240 1,099,511,627,776 T Tb TB
Elapsed time
of elapsed time
in seconds
90.7 + 12.9
─────── × 100 = 65%
159
Time(ref)
Relative MIPS = ────── × MIPS(ref)
Time
http://en.wikipedia.org/wiki/Supercomputer
Petaflops
Teraflops
Gigaflops
Megaflops
http://en.wikipedia.org/wiki/Supercomputer
or
n
Av. execution time = [ ∏ Execution time (program i ) ]1/n
i =1
spice
docuc
matrix300
tomcatv
nasa7
eqntott
espresso
li
fpppp
https://www.spec.org/cpu2000/docs/readme1st.html#Q8
Spr 2015, Apr 13 . . . ELEC 5200-001/6200-001 Lecture 10 30
CFP2000: Fourteen Programs
(6 Fortran 77, 4 Fortran 90, 4 C)
Name Ref Time Remarks
168.wupwise 1600 Quantum chromodynamics
171.swim 3100 Shallow water modeling
172.mgrid 1800 Multi-grid solver in 3D potential field
173.applu 2100 Parabolic/elliptic partial differential equations
177.mesa 1400 3D Graphics library
178.galgel 2900 Fluid dynamics: analysis of oscillatory instability
179.art 2600 Neural network simulation; adaptive resonance theory
183.equake 1300 Finite element simulation; earthquake modeling
187.facerec 1900 Computer vision: recognizes faces
188.ammp 2200 Computational chemistry
189.lucas 2000 Number theory: primality testing
191.fma3d 2100 Finite element crash simulation
200.sixtrack 1100 Particle accelerator model
301.apsi 2600 Solves problems regarding temperature, wind,
velocity and distribution of pollutants
https://www.spec.org/cpu2000/docs/readme1st.html#Q8
Spr 2015, Apr 13 . . . ELEC 5200-001/6200-001 Lecture 10 31
Reference CPU: Sun Ultra 5_10
300MHz Processor
3500
3000
CPU seconds
2500
2000
1500 CINT2000
1000 CFP2000
500
0
wupwise
twolf
galgel
swim
mgrid
gzip
applu
apsi
crafty
perlbmk
bzip2
lucas
sixtrack
parser
mcf
art
eon
gap
ammp
fma3d
gcc
mesa
equake
facerec
vpr
vortex
1500
0
vpr
gcc
crafty
parser
perlbmk
vortex
bzip2
mcf
twolf
gzip
eon
gap
Source: www.spec.org
crafty
parser
perlbmk
vortex
bzip2
mcf
twolf
gzip
eon
gap
Source: www.spec.org
mesa
lucas
sixtrack
wupwise
equake
facerec
art
mgrid
apsi
applu
galgel
ammp
fma3d
Source: www.spec.org
i =1
where Efficiencyi is the efficiency for program i.
Relative efficiency:
Efficiency of a computer
Relative efficiency = ─────────────────
computer Eff. of reference
5
Pentium M
4 @1.6/0.6GHz Energy-
3 efficient procesor
Pentium 4-M
2 @2.4GHz (Reference)
1
Pentium III-M
0 @1.2GHz
SPECINT2000
SPECFP2000
SPECINT2000
SPECFP2000
SPECINT2000
SPECFP2000
1800 15
1.2 V 1.0 120 Joules 120W
megacycles/s megacycles/joule
277 70
0.6 V 6.5 25.7 Joules 3.96W
megacycles/s megacycles/joule
54.5 660
200 mV 33 2.73 Joules 0.083W
megacycles/s megacycles/joule
P P
P M P
P P
T(1) 1 N
Speedup = ——— = —————— = ———————
T(N) 1 – α + α/N (1 – α)N + α
α
T(N) = 1 – α + α/N
α/N
1–α
1 2 3 4 5 6 7
Number of processors (N)
Spr 2015, Apr 13 . . . ELEC 5200-001/6200-001 Lecture 10 52
Speedup
6
5
Ideal, N
Speedup, T(1)/T(N)
(α = 1)
4
3
N
(1 – α)N + α
2
1
1 2 3 4 5 6
Number of processors (N)
Spr 2015, Apr 13 . . . ELEC 5200-001/6200-001 Lecture 10 53
Example
10% memory accesses, i.e., α = 0.9
Maximum speedup= 1/(1 – a)
= 1.0/0.1 = 10,
when N = ∞
What is the speedup with 10 processors?
1 N
Speedup = ———————— = ———————
(1 – α)N + α/N (1 – α)N2 + α
1–α
1 2 3 4 5 6 7
Number of processors (N)
Spr 2015, Apr 13 . . . ELEC 5200-001/6200-001 Lecture 10 56
Minimum Run Time
Minimize N processor run time,
T(N) = (1 – α)N + α/N
∂T(N)/∂N = 0
1 – α – α/N2 = 0, N = [α/(1 – α)]½
Min. T(N) = 2[α(1 – α)]½, because ∂2T(N)/∂N2 > 0.
Maximum speedup = 1/T(N) = 0.5[α(1 – α)]-½
Example: α = 0.9
Maximum speedup = 1.67, when N = 3
5
Speedup, T(1)/T(N)
Ideal, N
4
N
2 (1 – α)N2 + α
1
1 2 3 4 5 6
Number of processors (N)
Spr 2015, Apr 13 . . . ELEC 5200-001/6200-001 Lecture 10 58
Parallel Processors: Distributed Memory
M P P M
Inter-
connectio P M
M P n
network
M P P M
1 N
Speedup = ———————— = ———————
β(N – 1) + 1/N βN(N – 1) + 1
β(N – 1)
1/N
0
1 10 20 30
10
Speedup, T(1)/T(N)
Ideal, N
8
N
6 βN(N – 1) + 1
2
2 4 6 8 10 12
Number of processors (N)
Spr 2015, Apr 13 . . . ELEC 5200-001/6200-001 Lecture 10 63
Further Reading
G. M. Amdahl, “Validity of the Single Processor Approach to Achieving Large-
Scale Computing Capabilities,” Proc. AFIPS Spring Joint Computer Conf.,
Atlantic City, NJ, Apr. 1967, pp. 483-485.
J. L. Gustafson, “Reevaluating Amdahl’s Law,” Comm. ACM, vol. 31, no. 5, pp.
532-533, May 1988.
M. D. Hill and M. R. Marty, “Amdahl’s Law in the Multicore Era,” Computer, vol.
41, no. 7, pp. 33-38, July 2008.
D. H. Woo and H.-H. S. Lee, “Extending Amdahl’s Law for Energy-Efficient
Computing in the Many-Core Era,” Computer, vol. 41, no. 12, pp. 24-31, Dec.
2008.
S. M. Pieper, J. M. Paul and M. J. Schulte, “A New Era of Performance
Evaluation,” Computer, vol. 40, no. 9, pp. 23-30, Sep. 2007.
S. Gal-On and M. Levy, “Measuring Multicore Performance,” Computer, vol. 41,
no. 11, pp. 99-102, November 2008.
S. Williams, A. Waterman and D. Patterson, “Roofline: An Insightful Visual
Performance Model for Multicore Architectures,” Comm. ACM, vol. 52, no. 4, pp.
65-76, Apr. 2009.
U. Vishkin, “Is Multicore Hardware for General-Purpose Parallel Processing
Broken?” Comm. ACM, vol. 57, no. 4, pp. 35-39, Apr. 2014.
Spr 2015, Apr 13 . . . ELEC 5200-001/6200-001 Lecture 10 64