4/12/2017 1
What is Parallel Computing?
Why Parallel Computing Now?
Technology Trends: Microprocessor Capacity
Moore's Law
[Figure: transistors per chip vs. year (log scale), 1970–2005 — growth from the i4004 and i8080 through the i8086, i80286, i80386, Pentium, and MIPS R2000/R3000/R10000, with transistor counts rising from thousands to over 100,000,000.]
[Figure: power density (W/cm²) vs. year (log scale) — chip power density approaching that of a nuclear reactor and a rocket nozzle.]
Power = Capacitance × Voltage² × Frequency (C·V²·F)
Performance = Cores × Frequency
Doubling the cores while halving voltage and frequency:
Power = 2C · (V/2)² · (F/2) = (C·V²·F)/4
Performance = 2·Cores · (F/2) = Cores·F
• Using additional cores
– Increase density (= more transistors = more
capacitance)
– Can increase cores (2x) and performance (2x)
– Or increase cores (2x), but decrease frequency (1/2):
same performance at ¼ the power
• Additional benefits
– Small/simple cores give more predictable performance
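The core/power arithmetic above can be checked with a short sketch; the values of C, V, and F below are illustrative placeholders, not real chip parameters:

```python
# Dynamic power model: P = cores * C * V^2 * F
def power(cores, C, V, F):
    return cores * C * V**2 * F

# Simple performance model: performance scales with cores * frequency
def performance(cores, F):
    return cores * F

C, V, F = 1.0, 1.0, 1.0                 # baseline single core (made-up units)
base_power = power(1, C, V, F)
base_perf = performance(1, F)

# Two cores at half the voltage and half the frequency
new_power = power(2, C, V / 2, F / 2)
new_perf = performance(2, F / 2)

print(new_perf / base_perf)             # 1.0  -> same performance
print(new_power / base_power)           # 0.25 -> one quarter the power
```

This is exactly the "2x cores, F/2 frequency" bullet: halving frequency allows halving voltage, and the V² term is what delivers the 4x power savings.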
Limit #2: Hidden Parallelism Tapped Out
Application performance was increasing by 52% per year, as measured by the SpecInt benchmarks shown here
[Figure: SpecInt performance vs. year, 1978–2006 (log scale), showing 52%/year growth through 2002 and an uncertain (??%/year) rate afterward. From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, 2006.]
• VAX : 25%/year 1978 to 1986
• RISC + x86: 52%/year 1986 to 2002
Limit #2: Hidden Parallelism Tapped Out
• Superscalar (SS) designs were the state of the art;
many forms of parallelism were not visible to the programmer:
• multiple instruction issue
• dynamic scheduling: hardware discovers parallelism
between instructions
• speculative execution: look past predicted branches
• non-blocking caches: multiple outstanding memory ops
• Unfortunately, these sources have been used up
Performance Comparison
[Figure: performance relative to the VAX-11/780 vs. year (log scale) — 52%/year through 2002, then perhaps 2x every 5 years, leaving roughly a 3x gap by 2006. From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, 2006.]
Thought experiment: a 1 Tflop/s, 1 Tbyte sequential machine must fit within a radius of r = 0.3 mm, since data must reach the processor within one operation time at the speed of light.
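The r = 0.3 mm figure follows directly from the speed of light: at 10¹² sequential operations per second, data can travel at most c/10¹² per operation. A quick check, assuming nothing beyond c ≈ 3×10⁸ m/s:

```python
c = 3.0e8                  # speed of light, m/s
ops_per_sec = 1.0e12       # 1 Tflop/s, executed sequentially
r = c / ops_per_sec        # max distance data can travel per operation, m
print(r * 1000)            # 0.3 (mm)
```

Packing a terabyte of memory into a sphere that small is hopeless, which is the point: physics, not engineering, caps sequential performance.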
Processors/chip      2    2    2      8
Threads/Processor    1    2    2     16
Threads/chip         2    4    4    128
Why writing (fast) parallel programs is hard
Principles of Parallel Computing
• Finding enough parallelism (Amdahl’s Law)
• Granularity
• Locality
• Load balance
• Coordination and synchronization
• Performance modeling
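The first item, Amdahl's Law, can be made concrete: if a fraction s of the work is inherently serial, the speedup on p processors is bounded by 1 / (s + (1 − s)/p). A minimal sketch:

```python
def amdahl_speedup(p, serial_fraction):
    """Upper bound on speedup with p processors when a fixed
    fraction of the work cannot be parallelized (Amdahl's Law)."""
    s = serial_fraction
    return 1.0 / (s + (1.0 - s) / p)

# Even with only 1% serial work, speedup saturates near 100x:
print(amdahl_speedup(100, 0.01))     # ~50.3
print(amdahl_speedup(10_000, 0.01))  # ~99.0
```

This is why "finding enough parallelism" leads the list: no amount of hardware overcomes a serial bottleneck.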
Overhead of Parallelism
• Given enough parallel work, this is the biggest barrier to
getting desired speedup
• Parallelism overheads include:
• cost of starting a thread or process
• cost of communicating shared data
• cost of synchronizing
• extra (redundant) computation
• Each of these can be in the range of milliseconds
(=millions of flops) on some systems
• Tradeoff: an algorithm needs sufficiently large units of work
to run fast in parallel (i.e., large granularity), but not so
large that there is not enough parallel work
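The granularity tradeoff above can be sketched with a toy cost model; the work size, processor count, and per-task overhead below are made-up illustrative numbers, not measurements:

```python
import math

def parallel_time(total_work, grain, procs, overhead):
    """Toy model: total_work is split into tasks of `grain` work units;
    each task pays a fixed `overhead` (startup, communication, sync).
    Tasks are spread as evenly as possible over `procs` processors."""
    n_tasks = math.ceil(total_work / grain)
    tasks_per_proc = math.ceil(n_tasks / procs)
    return tasks_per_proc * (grain + overhead)

W, P, o = 1_000_000, 100, 1_000   # work units, processors, overhead/task
for grain in (10, 1_000, 100_000, 1_000_000):
    print(f"grain={grain:>9}: time={parallel_time(W, grain, P, o)}")
# Tiny tasks drown in overhead; one huge task leaves 99 processors idle;
# a mid-sized grain is fastest.
```

With these numbers, grain=10 is slower than running serially (overhead dominates), grain=1,000,000 uses only one processor, and grain=1,000 hits the sweet spot in between.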
Locality and Parallelism
[Diagram: conventional storage hierarchy — multiple processors, each with its own cache and L2 cache, with L3 caches joined by potential interconnects.]