Speeduplaws PPT
• Amdahl’s Law
• Gustafson-Barsis’ Law
• Karp-Flatt metric
• Isoefficiency metric
• Speedup: $\psi(n, p)$
• (n,p) as a function of p
• Speedup as a function of p
• elbowing out
$\psi \le \dfrac{1}{f + (1 - f)/p}$
2010@FEUP Parallel performance analysis 6
Example
$\psi \le \dfrac{1}{0.05 + (1 - 0.05)/8} \approx 5.9$
$\lim_{p \to \infty} \dfrac{1}{0.2 + (1 - 0.2)/p} = \dfrac{1}{0.2} = 5$
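The two calculations above can be checked numerically. This is a minimal sketch, assuming f is the serial fraction and p the processor count as in Amdahl's Law; the function names `amdahl_speedup` and `amdahl_limit` are illustrative and do not come from the slides.

```python
def amdahl_speedup(f: float, p: int) -> float:
    """Upper bound on speedup for serial fraction f on p processors."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be in [0, 1]")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (f + (1.0 - f) / p)


def amdahl_limit(f: float) -> float:
    """Limit of the bound as p grows without bound: 1 / f."""
    if not 0.0 < f <= 1.0:
        raise ValueError("f must be in (0, 1]")
    return 1.0 / f


if __name__ == "__main__":
    # f = 0.05 on 8 processors, as in the Example
    print(round(amdahl_speedup(0.05, 8), 1))
    # f = 0.2 with an unbounded number of processors
    print(amdahl_limit(0.2))
```

Adding processors beyond a certain point barely helps: with f = 0.2 the bound never exceeds 5, whatever p is.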
Question
[Figure: curves for n = 100 and n = 1,000 plotted against the number of processors]
Another perspective
• We often use more processors to solve
larger problem instances
• Let’s treat time as a constant and allow
problem size to increase with the number
of processors
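Consider a parallel program solving a problem of size n
using p processors. Let s be the fraction of parallel execution time spent in
sequential computations. Hence $s = \sigma(n) / (\sigma(n) + \varphi(n)/p)$,
where $\sigma(n)$ is the sequential work and $\varphi(n)$ the parallelizable work,
and the scaled speedup is $\psi \le p + (1 - p)s$.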
• An application running on 10
processors spends 3% of its time in
serial code. What is the scaled
speedup of the application?
$\psi = 10 + (1 - 10)(0.03) = 10 - 0.27 = 9.73$
$7 = 8 + (1 - 8)s \Rightarrow s = 1/7 \approx 0.14$
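Both scaled-speedup calculations follow the same formula, p + (1 - p)s, used forwards and backwards. A small sketch, assuming s is the serial fraction of the parallel execution time; the helper names are hypothetical.

```python
def scaled_speedup(p: int, s: float) -> float:
    """Gustafson-Barsis scaled speedup: p + (1 - p) * s."""
    if p < 1:
        raise ValueError("p must be at least 1")
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must be in [0, 1]")
    return p + (1 - p) * s


def max_serial_fraction(p: int, target: float) -> float:
    """Largest s that still reaches `target` scaled speedup on p processors.

    Obtained by solving target = p + (1 - p) * s for s.
    """
    if p < 2:
        raise ValueError("p must be at least 2")
    return (p - target) / (p - 1)


if __name__ == "__main__":
    print(round(scaled_speedup(10, 0.03), 2))      # 10 processors, 3% serial
    print(round(max_serial_fraction(8, 7), 2))     # speedup 7 on 8 processors
```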
$e = \dfrac{\sigma(n) + \kappa(n,p)}{\sigma(n) + \varphi(n)}$
$e = \dfrac{1/\psi - 1/p}{1 - 1/p}$
Experimentally determined serial fraction
• Takes into account parallel overhead
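The experimentally determined serial fraction can be computed directly from a measured speedup. The sketch below assumes the Karp-Flatt form e = (1/ψ - 1/p) / (1 - 1/p); `karp_flatt` is an illustrative name, and the speedup used in the demo is generated from Amdahl's Law rather than measured.

```python
def karp_flatt(speedup: float, p: int) -> float:
    """Experimentally determined serial fraction e for speedup on p processors."""
    if p < 2:
        raise ValueError("the metric is defined for p >= 2")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)


if __name__ == "__main__":
    # Synthetic check: a speedup that follows Amdahl's Law exactly with
    # serial fraction 0.05 should give e = 0.05 back.
    f, p = 0.05, 8
    ideal = 1.0 / (f + (1.0 - f) / p)
    print(round(karp_flatt(ideal, p), 4))
```

If e stays constant as p grows, the limited speedup comes from the serial fraction; if e increases with p, parallel overhead is the more likely cause.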
[Table: columns for p = 2, 3, 4, 5, 6, 7, 8]
Isoefficiency Relation
$T(n,1) \ge C\,T_0(n,p)$, where $C = \dfrac{\varepsilon(n,p)}{1 - \varepsilon(n,p)}$
In order to maintain the same efficiency as p increases, n
must be increased in order to satisfy the above
inequality
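The relation can be derived in a few steps. This is a sketch under the usual assumptions, not taken from the slides: $\sigma(n)$ is the inherently sequential work, $\varphi(n)$ the parallelizable work, $\kappa(n,p)$ the communication overhead, and $T_0(n,p)$ the total time all processors spend on work the sequential algorithm does not do.

```latex
\begin{align*}
T(n,1) &= \sigma(n) + \varphi(n), \qquad
T_0(n,p) = (p-1)\,\sigma(n) + p\,\kappa(n,p) \\
\varepsilon(n,p) &\le \frac{\sigma(n) + \varphi(n)}{p\,\sigma(n) + \varphi(n) + p\,\kappa(n,p)}
  = \frac{T(n,1)}{T(n,1) + T_0(n,p)} \\
\Rightarrow \quad T(n,1) &\ge \frac{\varepsilon(n,p)}{1 - \varepsilon(n,p)}\,T_0(n,p) = C\,T_0(n,p)
\end{align*}
```

The second line uses $p\,\sigma(n) + \varphi(n) + p\,\kappa(n,p) = T(n,1) + T_0(n,p)$; the third rearranges $\varepsilon\,(T(n,1) + T_0(n,p)) \le T(n,1)$.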
Scalability function
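• Suppose the isoefficiency relation is $n \ge f(p)$
• Let $M(n)$ denote the memory required for a problem of size n; the scalability function $M(f(p))/p$ shows how memory per processor must grow to maintain the same efficiency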
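[Figure: scalability functions $Cp$, $C \log p$, and $C$ compared against the available memory size, with regions labeled "Cannot maintain efficiency" and "Can maintain efficiency"]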
Example 1: Reduction
• Sequential algorithm complexity
• $T(n,1) = \Theta(n)$
• Parallel algorithm
• Computational complexity = $\Theta(n/p)$
• Communication complexity = $\Theta(\log p)$
• Parallel overhead
• $T_0(n,p) = \Theta(p \log p)$
• Memory usage:
• M(n) = n
$M(Cp \log p)/p = Cp \log p / p = C \log p$
• The system has good scalability
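The reduction example can be checked numerically. The sketch below assumes unit constants, so that T(n,1) = n and T0(n,p) = p log2 p exactly, and grows the problem as n = C p log2 p; the constant C = 4 and all function names are hypothetical.

```python
import math


def efficiency(n: float, p: int) -> float:
    """Efficiency T(n,1) / (T(n,1) + T0(n,p)) with T(n,1) = n, T0 = p * log2(p)."""
    if p < 2:
        raise ValueError("model is only meaningful for p >= 2")
    overhead = p * math.log2(p)
    return n / (n + overhead)


def isoefficient_size(c: float, p: int) -> float:
    """Problem size n = C * p * log2(p) that keeps efficiency constant."""
    return c * p * math.log2(p)


def memory_per_processor(c: float, p: int) -> float:
    """Scalability function M(f(p)) / p with M(n) = n, which equals C * log2(p)."""
    return isoefficient_size(c, p) / p


if __name__ == "__main__":
    c = 4.0  # hypothetical constant
    for p in (2, 8, 64, 1024):
        n = isoefficient_size(c, p)
        print(p, round(efficiency(n, p), 3), memory_per_processor(c, p))
```

Under these assumptions the efficiency stays at C / (C + 1) for every p, while memory per processor grows only logarithmically.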
Example 2: Floyd’s Algorithm
• Sequential time complexity:
• $\Theta(n^3)$
• Parallel overhead:
• $T_0(n,p) = \Theta(p n^2 \log p)$
• Memory usage:
• $M(n) = n^2$
• Parallel overhead:
• $\Theta(n \sqrt{p})$
• Memory usage:
• $M(n) = n^2$
$M(C\sqrt{p})/p = C^2 p / p = C^2$
• This algorithm is perfectly scalable
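A short numeric check of the last result. It assumes the isoefficiency function is f(p) = C√p (the square root is reconstructed from M(n) = n² and the result C², since the symbol did not survive in the slide text); the constant C = 3 and the helper name `scalability` are hypothetical.

```python
import math
from typing import Callable


def scalability(memory: Callable[[float], float],
                iso: Callable[[int], float],
                p: int) -> float:
    """Scalability function M(f(p)) / p: memory needed per processor."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return memory(iso(p)) / p


if __name__ == "__main__":
    c = 3.0  # hypothetical constant
    for p in (1, 4, 16, 64):
        per_proc = scalability(lambda n: n ** 2, lambda q: c * math.sqrt(q), p)
        print(p, per_proc)
```

Memory per processor stays at C² regardless of p, which is what "perfectly scalable" means here.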