Professional Documents
Culture Documents
CS 258, Spring 99
David E. Culler
Computer Science Division
U.C. Berkeley
Today’s Goal:
Workstations
Personal Computers
• NO!
• Book provides a framework and complete
background, so lectures can be more interactive.
– You do the reading
– We’ll discuss it
• Projects will go “beyond”
Parallelism:
• Provides alternative to faster clock for performance
• Applies at all levels of system design
• Is a fascinating perspective from which to view architecture
• Is increasingly central in information processing
New Applications
More Performance
• Range of performance demands
– Need range of system performance with progressively increasing
cost
15,000
10,000
5,000
0
0 20 40 60 80 100 120
Number of processors
• Parallelism is pervasive
• Small to moderate scale parallelism very important
• Difficult to obtain snapshot to compare across vendor
platforms
02/21/22 CS258 S99 13
Scientific Computing Demand
100
Supercomputers
10
Performance
Mainframes
Microprocessors
Minicomputers
1
0.1
1965 1970 1975 1980 1985 1990 1995
i80386
10 i80286
Transistors
i8086 i80386
i80286 R3000
100,000 R2000
i8080
1 i8086
i8008 10,000
i4004 i8080
i8008
i4004
0.1 1,000
1970 1980 1990 2000 1970 1980 1990 2000
1975 1985 1995 2005 1975 1985 1995 2005
10,000,000
R10000
1,000,000
Pentium
Tra ns is tors
i80386
i80286 R3000
100,000
R2000
i8086
10,000
i8080
i8008
i4004
1,000
1970 1975 1980 1985 1990 1995 2000 2005
20 2
Speedup
15 1.5
10 1
5 0.5
0 0
0 1 2 3 4 5 6+ 0 5 10 15
Number of instructions issued Instructions issued per cycle
MEM
CRAY CS6400
Sun
60 E10000
50
Number of processors
40
SGI Challenge
AS8400
Sequent B8000 Symmetry21 SE10
10 SE30
Power SS1000 SS1000E
Sun E10000
10,000
SGI
Sun E6000
PowerCh
Shared bus bandwidth (MB/s)
XL AS8400
SGI Challenge CS6400
1,000 HPK400
SC2000E
SC2000 AS2100
P-Pro
SS1000E
SS1000 SS20
SS690MP 120
SS10/ SE70/SE30
SS690MP 140 SE10/
Symmetry81/21 SE60
100
SGI PowerSeries Power
Sequent B2100
Sequent
B8000
10
1984 1986 1988 1990 1992 1994 1996 1998
02/21/22 CS258 S99 30
What about Storage Trends?
• Divergence between memory capacity and speed even more
pronounced
– Capacity increased by 1000x from 1980-95, speed only 2x
– Gigabit DRAM by c. 2000, but gap with processor speed much greater
• Larger memories are slower, while processors get faster
– Need to transfer more data in parallel
– Need deeper cache hierarchies
– How to organize caches?
• Parallelism increases effective size of each level of hierarchy,
without increasing access time
• Parallelism and locality within memory systems too
– New designs fetch many bits within memory chip; follow with fast
pipelined transfer across narrower interface
– Buffer caches most recently accessed data
• Disks too: Parallel disks plus caching
CRAY n = 1,000
CRAY n = 100
Micro n = 1,000
Micro n = 100
1,000
T94
C90
LINPACK (MFLOPS)
DEC 8200
Ymp
Xmp/416
IBM Power2/990
100
MIPS R4400
Xmp/14se
DEC Alpha
HP9000/735
DEC Alpha AXP
CRAY 1s
HP 9000/750
IBM RS6000/540
10
MIPS M/2000
MIPS M/120
Sun 4/260
1
1975 1980 1985 1990 1995 2000
Paragon XP/S MP
(1024)
T3D
CM-5
100
T932(32)
Paragon XP/S
CM-200
CM-2 C90(16)
10 Delta
iPSC/860
nCUBE/2(1024)
Ymp/832(8)
1
Xmp /416(4)
0.1
1985 1987 1989 1991 1993 1995 1996
250
MPP
200 PVP
198
187 SMP
150
110 106
100
106
50 73
63
0
11/93 11/94 11/95 11/96
Application Software
Systolic System
Arrays Software SIMD
Architecture
Message Passing
Dataflow
Shared Memory
• Defines
– Critical abstractions, boundaries, and primitives (interfaces)
– Organizational structures that implement interfaces (hw or sw)
Compilation
or library Communication abstraction
User/system boundary
Operating systems support
Hardware/software boundary
Communication hardware
Physical communication medium
http://www.cs.berkeley.edu/~culler/cs258-s99/schedule.html