
The Earth Simulator

Presented by Jin Soon Lim for CS 566


Outline
• Brief history of Supercomputing
• The architecture of the Earth Simulator
– Processor node
– Arithmetic processor
– Interconnection Network
– Inter-node communication mechanism
– Inter-node synchronization
• Parallel programming environment
– MPI
• Performance
• Projects
Historical Background of Supercomputing
• Modern scalar processors
• The first modern scalar supercomputer: CDC 6600 in 1964
• Designed by Seymour Cray

• Vector Processors
• Cray-1 in 1975

• 1980s
• Supercomputing became one of the most important research
tools
• In 1989, the USA had 167 supercomputers installed, Japan 90,
and Europe 92

• 1990s
• Parallel supercomputers with tightly coupled CPUs
• Clusters and grid systems
Earth Simulator
• The “Earth Simulator project” was started in 1997 by the Japanese
government

• For simulating global environmental change problems

• System design proposed by NEC Corporation

• Construction was completed in February 2002, and operation
started in March 2002

• Cost = about $350 million

• Fastest supercomputer in the world from 2002 to 2004

• No. 7 on the Top500 list of November 2005


Earth Simulator Facilities
• Located in Yokohama, Japan

• Inter-node cables: 640 x 130 = 83,200
(130 per node: 128 to the crossbar switches and 2 to the control units)
• Total cable length: about 2,400 km
System Overview
• Highly parallel vector supercomputer
• 640 processor nodes
• Crossbar interconnection network
• 8 arithmetic processors per node (Total 640x8 = 5120)
• Distributed shared memory
• System disk 415 TB, user disk 225 TB.
System Overview
• Three architectural features for high-performance and
high-efficiency
– Vector processor
– Shared Memory
– High-bandwidth and non-blocking interconnection
crossbar network

• Three levels of parallelism (a minimal code sketch follows below)
– Vector processing on a processor
– Parallel processing with shared memory within a node
– Parallel processing among distributed nodes via the
interconnection network
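
A minimal sketch (not from the slides) of how the three levels might appear in application code: MPI distributes work across nodes, OpenMP threads share a node's memory (one thread per AP), and the stride-1 inner loop is the part a vector unit can pipeline. The array size and loop body are illustrative only.

/* Hypothetical illustration of the three levels of parallelism:
 * inter-node MPI, intra-node OpenMP, and a vectorizable inner loop.
 * Build with an MPI compiler and OpenMP enabled, e.g. mpicc -fopenmp */
#include <mpi.h>
#include <omp.h>
#include <stdlib.h>

#define N_LOCAL (1 << 20)            /* elements owned by each MPI process */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *a = malloc(N_LOCAL * sizeof(double));
    double local_sum = 0.0, global_sum = 0.0;

    /* Intra-node level: OpenMP threads share the node's memory.
     * Processor level: the stride-1 inner loop is what gets vectorized. */
    #pragma omp parallel for reduction(+:local_sum)
    for (long i = 0; i < N_LOCAL; i++) {
        a[i] = (double)(rank + 1) * i;   /* simple vectorizable body */
        local_sum += a[i];
    }

    /* Inter-node level: nodes combine partial results over the network via MPI. */
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    free(a);
    MPI_Finalize();
    return 0;
}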
Processor Node
• Shared memory parallel vector supercomputer
• 8 arithmetic processors (8Gflops per AP)
• Peak performance: 64Gflops
• Data transfer rate between each AP and main memory: 32GB/s
• Aggregate memory bandwidth per node: 256GB/s
Arithmetic Processor (AP)
• One-chip LSI
• 8Gflops peak
• 500MHz clock (some circuits operate at 1GHz)

• Vector Unit
• 6 types of vector pipelines
• 72 vector registers
(72 x 256 x 64 bits = 144KB)

• Scalar Unit
• 4-way superscalar
• 128 scalar registers
Interconnection Network
• 640 x 640 single-stage non-blocking crossbar switch
• Global addressing and synchronization
• 2 control units (XCT)
• 128 crossbar switches (XSW)
• Data transfer rate between any two nodes: 12.3GB/s x 2 (bidirectional)
Inter-node communication mechanism
Inter-node synchronization
• Hardware support for fast barrier synchronization among nodes
– Global Barrier Counter (GBC)
– Global Barrier Flag (GBF)
Parallel Programming Environment
• Operating System
– UNIX-based system (SUPER-UX for the NEC SX series)
• Hybrid parallel programming environment
– MPI across nodes, OpenMP within a node, and vectorization on each AP
MPI for Earth Simulator
• MPI/ES
• Supports the full MPI-2 Standard
• Optimized to achieve the highest communication performance
on the ES architecture
• Communication modes (illustrated in the sketch below)
– Point-to-point
– One-sided
– Collective
• Parallel I/O
• Dynamic process management
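
Since MPI/ES supports the full MPI-2 standard, the communication modes above can be exercised with ordinary MPI-2 calls. The following minimal sketch is not from the slides; ranks, buffer sizes, and tags are arbitrary examples. One-sided operations are sketched separately further below.

/* Illustrative use of the communication modes listed above.
 * Ordinary MPI-2 calls; run with at least 2 ranks. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double buf[1024] = {0};

    /* Point-to-point: rank 0 sends a message to rank 1. */
    if (size >= 2) {
        if (rank == 0)
            MPI_Send(buf, 1024, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(buf, 1024, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }

    /* Collective: every rank receives rank 0's buffer. */
    MPI_Bcast(buf, 1024, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* One-sided transfers (MPI_Put / MPI_Get / MPI_Accumulate) are shown
     * in a separate sketch below. */

    MPI_Finalize();
    return 0;
}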
Performance of MPI libraries
• Memory space of a process on the ES
– Local memory (LMEM)
– Global memory (GMEM)
• Both can be assigned to buffers of MPI functions
• GMEM is addressed globally across nodes and can be
accessed by MPI processes allocated to different nodes
• The behavior of MPI communication differs depending on
the memory area in which the buffers reside (see the
allocation sketch below)
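
The slides do not say how a buffer ends up in GMEM rather than LMEM. Published descriptions of MPI/ES associate the MPI-2 routine MPI_Alloc_mem with globally addressable memory, so the sketch below assumes that mapping; treat it as an illustration, not the definitive MPI/ES interface.

/* Sketch of the two buffer placements discussed above, assuming that
 * MPI_Alloc_mem returns globally addressable memory (GMEM) while an
 * ordinary malloc'd buffer lives in the process's local memory (LMEM). */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    const MPI_Aint nbytes = 1024 * 1024;

    /* LMEM buffer: normal heap allocation, local to the process. */
    double *lmem_buf = malloc(nbytes);

    /* GMEM buffer (assumed mapping): MPI-2 memory allocation call. */
    double *gmem_buf;
    MPI_Alloc_mem(nbytes, MPI_INFO_NULL, &gmem_buf);

    /* Either pointer can be passed as an MPI communication buffer; the
     * placement (Cases 1-4 below) determines which transfer path is taken. */

    MPI_Free_mem(gmem_buf);
    free(lmem_buf);
    MPI_Finalize();
    return 0;
}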
Performance of MPI libraries
• Case 1: Data stored in LMEM of a process A are
transferred to LMEM of another process B in the same
node.
• Case 2: Data stored in LMEM of process A are
transferred to LMEM of a process C running on a different
node.
• Case 3: Data stored in GMEM of process A are
transferred to GMEM of process B in the same node.
• Case 4: Data stored in GMEM of process A are
transferred to GMEM of process C running on a different
node (see the ping-pong timing sketch below).
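
The four cases are the kind of thing a ping-pong benchmark measures. A minimal sketch follows, assuming two ranks; whether it exercises Case 1/2 or 3/4 depends on how the buffer is allocated (LMEM vs. GMEM, see the previous sketch) and on whether the two ranks are placed on the same node, both of which are set by the job configuration rather than by the code. Message size and repetition count are illustrative.

/* Simple ping-pong timing sketch for point-to-point bandwidth. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) { MPI_Finalize(); return 0; }   /* needs 2 ranks */

    const int nbytes = 8 * 1024 * 1024;   /* 8 MB message */
    const int reps = 100;
    char *buf = malloc(nbytes);           /* LMEM; use MPI_Alloc_mem for GMEM */

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0) {
        /* Two messages per iteration; this gives the one-way bandwidth. */
        double gbps = (2.0 * reps * nbytes) / (t1 - t0) / 1e9;
        printf("bandwidth: %.2f GB/s\n", gbps);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}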
Performance of MPI libraries (MPI_Send)
• Case 3: maximum 14.8GB/s
• Case 4: maximum 11.8GB/s
Performance of MPI libraries (one-sided operations)
• MPI_Get: max 11.62GB/s, MPI_Put: max 11.62GB/s
• MPI_Accumulate: max 3.16GB/s
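
A minimal sketch of the one-sided operations reported above (MPI_Put, MPI_Get, MPI_Accumulate), using the standard MPI-2 window calls; window size, ranks, and values are illustrative and not taken from the ES benchmarks.

/* One-sided MPI-2 operations against a window exposed by rank 0. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 1024;
    double local[1024];
    double target[1024];
    MPI_Win win;

    /* Every rank exposes "target" as a window; transfers go to rank 0's copy. */
    MPI_Win_create(target, n * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    if (rank == 1 && size > 1) {
        for (int i = 0; i < n; i++) local[i] = i;
        MPI_Put(local, n, MPI_DOUBLE, 0, 0, n, MPI_DOUBLE, win);      /* write  */
    }
    MPI_Win_fence(0, win);
    if (rank == 1 && size > 1)
        MPI_Get(local, n, MPI_DOUBLE, 0, 0, n, MPI_DOUBLE, win);      /* read   */
    MPI_Win_fence(0, win);
    if (rank == 1 && size > 1)
        MPI_Accumulate(local, n, MPI_DOUBLE, 0, 0, n, MPI_DOUBLE,
                       MPI_SUM, win);                                  /* update */
    MPI_Win_fence(0, win);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}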
Performance of MPI libraries (MPI_Barrier)
• Scalability of MPI_Barrier (a timing sketch follows below)
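
The slides show scalability plots without the measurement code. A common way to measure barrier latency is to time a long loop of MPI_Barrier calls and repeat the run with increasing rank counts; the sketch below assumes that method, and the repetition count is arbitrary.

/* Barrier-latency measurement: time many MPI_Barrier calls and report the
 * mean; scalability is observed by rerunning with more ranks. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int reps = 10000;
    MPI_Barrier(MPI_COMM_WORLD);          /* warm-up / align start times */

    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++)
        MPI_Barrier(MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("%d ranks: %.3f us per barrier\n",
               size, (t1 - t0) / reps * 1e6);

    MPI_Finalize();
    return 0;
}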
Performance of ES
• Peak performance / AP: 8Gflops
• Peak performance / PN: 64Gflops
• Total peak performance: 40Tflops (64Gflops x 640)
• Memory bandwidth / AP: 32GB/s
• Memory bandwidth / PN: 256GB/s
• Main memory / PN: 16GB
• Total main memory: 10TB (16GB x 640)
• Total memory throughput: 160TB/s
LINPACK performance

• Achieved 35.86 Tflops
• More than 85% of peak performance (35.86 of 40.96 Tflops, about 87.5%)
Projects using ES in 2005
• Ocean & Atmosphere (12)
– Future Climate Change Projection using a High-
Resolution Coupled Ocean-Atmosphere Climate
Model
• Solid Earth (9)
– Numerical simulation of the mantle convection
• Computer Science (1)
– Development of Micro-Macro Interaction Simulation
Algorithm
• Epoch-making Simulation (22)
– Nano-simulation of electrode reaction in fuel cells
The use of ES
• Conditions for an application to run on the ES
• By default, the number of PNs for a job must be less than
or equal to 10
• This can be extended to up to 512 PNs if the vectorization
ratio is > 95% and the parallel efficiency is > 0.5 (a sketch
of these metrics follows below)
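
The slide states the thresholds but not the formulas. The sketch below uses common textbook definitions (vectorization ratio as the fraction of vector operations, parallel efficiency as speedup divided by the number of PNs) with illustrative numbers; both the helper names and the formulas are assumptions, not the ES center's official definitions.

/* Assumed definitions of the two admission metrics:
 *   vectorization ratio = vector operations / total operations
 *   parallel efficiency = T(1) / (n * T(n))   (speedup / PN count) */
#include <stdio.h>

static double vectorization_ratio(double vector_ops, double total_ops)
{
    return vector_ops / total_ops;
}

static double parallel_efficiency(double t_serial, double t_parallel, int n_pn)
{
    return t_serial / (n_pn * t_parallel);
}

int main(void)
{
    /* Illustrative numbers only. */
    double vr = vectorization_ratio(9.7e12, 1.0e13);      /* 97% vectorized */
    double pe = parallel_efficiency(6400.0, 20.0, 512);   /* 512-PN job     */

    printf("vectorization ratio: %.3f (needs > 0.95)\n", vr);
    printf("parallel efficiency: %.3f (needs > 0.5)\n", pe);
    return 0;
}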
Summary
• Highly parallel vector supercomputer system
• Distributed shared memory
• High-bandwidth and non-blocking crossbar
interconnection network
• Three levels of parallel programming
– Inter-node: Message Passing (MPI)
– Intra-node: Shared Memory (OpenMP)
– AP: Vectorization
• Promotes research on environmental problems.
References
• The Earth Simulator Center. <http://www.es.jamstec.go.jp/esc/eng/ES/index.html>
• S. Habata, M. Yokokawa, and S. Kitawaki, "The Earth Simulator System", NEC Res. & Develop., Vol. 44, No. 1, pp. 21-26, January 2003. <http://www.nec.co.jp/techrep/en/r_and_d/r03/r03-no1/rd06.pdf>
• T. Sato, S. Kitawaki, and M. Yokokawa, "Earth Simulator Running", ISC, June 2002. <http://www.ultrasim.info/sato.pdf>
• J. Dongarra, "The Earth Simulator", WTEC Panel Report, December 2004. <http://www.wtec.org/hec/report/02-Earth.pdf>
• C. Lazou, "Historical Perspective of Supercomputing", NEC HPCE, June 2002. <http://www.hpce.nec.com/typo3conf/ext/nf_downloads/pi1/passdownload.php?downloaddata=26>
