
Supercomputer “Fugaku”

Formerly known as Post-K

Copyright 2019 FUJITSU LIMITED


Supercomputer Fugaku Project
RIKEN and Fujitsu are developing Japan's next-generation flagship supercomputer, the successor to the K computer, with the aim of making it the most advanced general-purpose supercomputer in the world.
[Figure: lineage from the K computer and PRIMEHPC FX10 / FX100 to Supercomputer Fugaku (© RIKEN); labels: No.1 (2017), Finalist (2016), No.1 (2018)]

• RIKEN and Fujitsu announced that manufacturing started in March 2019
• RIKEN announced on May 23, 2019 that the supercomputer is named “Fugaku”
*Formerly known as Post-K



Goals and Approaches for Fugaku
• Goals
  - High application performance
  - Good usability and wide range of uses
  - Keeping application compatibility

RIKEN announced predicted performance:
• More than 100x faster than the K computer for GENESIS and NICAM+LETKF
• Geometric mean of speedup over the K computer in the 9 Priority Issues is greater than 37x
https://postk-web.r-ccs.riken.jp/perf.html
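The 37x figure summarizes nine applications with a geometric mean of per-application speedups. A minimal sketch of that calculation follows, assuming each speedup is the ratio of K computer time to Fugaku time; the numbers in the usage line are hypothetical placeholders, not RIKEN's published values.

```python
import math

def geometric_mean(speedups):
    """Geometric mean of positive speedup ratios (e.g. K computer time / Fugaku time)."""
    if not speedups or any(s <= 0 for s in speedups):
        raise ValueError("speedups must be a non-empty sequence of positive numbers")
    # Average in log space, which avoids overflow when many large ratios are multiplied.
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))

# Hypothetical per-application speedups, NOT RIKEN's published values.
example = [100.0, 20.0, 5.0]
print(round(geometric_mean(example), 1))
```

A geometric mean is the usual choice for ratios because inverting every ratio inverts the mean, so the summary does not depend on which machine is treated as the baseline.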




• Approaches
  Develop:
  1. High-performance Arm CPU A64FX in HPC and AI areas
  2. Cutting-edge hardware design
  3. System software stack
  Achieve:
  - High performance in real applications
  - High efficiency in key features for AI applications



1. High-Performance Arm CPU A64FX in HPC and AI Areas
• Architecture features
  ISA           Armv8.2-A (AArch64 only), SVE (Scalable Vector Extension)
  SIMD width    512-bit
  Precision     FP64/32/16, INT64/32/16/8
  Cores         48 computing cores + 4 assistant cores (4 CMGs)
  Memory        HBM2: peak B/W 1,024 GB/s
  Interconnect  TofuD: 28 Gbps x 2 lanes x 10 ports
  I/O           PCIe Gen3, 16 lanes

  CMG (Core Memory Group) specification
  Cores         13
  L2 cache      8 MiB
  Memory        8 GiB, 256 GB/s

[Figure: A64FX block diagram: four CMGs, each attached to an HBM2 stack, connected through a network on chip to the TofuD controller and the PCIe controller]
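The chip-level figures follow from the per-CMG specification multiplied by four CMGs. The sketch below does that arithmetic; the split of each CMG's 13 cores into 12 computing cores and 1 assistant core is an assumption inferred from "48 computing cores + 4 assistant cores (4 CMGs)", and the `CMG` class and `chip_totals` helper are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CMG:
    """One Core Memory Group, using the per-CMG figures from the slide."""
    cores: int = 13            # total cores per CMG
    computing_cores: int = 12  # assumption: 13 = 12 computing + 1 assistant
    l2_mib: int = 8            # shared L2 cache per CMG
    hbm2_gib: int = 8          # HBM2 capacity per CMG
    hbm2_gbs: int = 256        # HBM2 bandwidth per CMG (GB/s)

def chip_totals(cmg: CMG, cmgs_per_chip: int = 4) -> dict:
    """Scale the per-CMG figures up to a whole A64FX chip."""
    return {
        "computing_cores": cmg.computing_cores * cmgs_per_chip,
        "assistant_cores": (cmg.cores - cmg.computing_cores) * cmgs_per_chip,
        "l2_mib": cmg.l2_mib * cmgs_per_chip,
        "memory_gib": cmg.hbm2_gib * cmgs_per_chip,
        "memory_gbs": cmg.hbm2_gbs * cmgs_per_chip,
    }

print(chip_totals(CMG()))
```

The 48 + 4 cores and 1,024 GB/s match the architecture table; the L2 and memory-capacity totals are derived by the same multiplication rather than stated on the slide.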
• Peak performance (chip level, TOPS)
  Element size               64 bits   32 bits   16 bits   8 bits
                             (HPC)     (HPC)     (AI)      (AI)
  A64FX (Fugaku)             2.7+      5.4+      10.8+     21.6+
  SPARC64 VIIIfx (K computer) 0.128    0.128     N/A       N/A
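The A64FX row doubles each time the element size halves, because a 512-bit register holds twice as many elements. The sketch below reproduces that scaling. It assumes 48 computing cores, the "2x 512-bit wide SIMD pipelines per core" stated on the AI slide (assumed here to apply to every element size), one fused multiply-add per lane per cycle counted as 2 operations, and a clock of 1.8 GHz, which is an assumption since the slides give no frequency.

```python
COMPUTING_CORES = 48      # from the architecture table
SIMD_BITS = 512           # SIMD width from the architecture table
PIPES_PER_CORE = 2        # "2x 512-bit wide SIMD pipelines per core" (AI slide)
OPS_PER_FMA = 2           # a fused multiply-add counts as 2 operations
CLOCK_HZ = 1.8e9          # ASSUMPTION: clock frequency is not given on the slides

def peak_ops_per_s(element_bits: int, clock_hz: float = CLOCK_HZ) -> float:
    """Theoretical chip peak for one element size: every lane does one multiply-add per cycle.

    For INT8 the slide counts a 4-way partial dot product; treating each 8-bit
    lane as one multiply-add per cycle gives the same operation count.
    """
    lanes = SIMD_BITS // element_bits
    return COMPUTING_CORES * PIPES_PER_CORE * lanes * OPS_PER_FMA * clock_hz

for bits in (64, 32, 16, 8):
    print(f"{bits:>2}-bit: {peak_ops_per_s(bits) / 1e12:.2f} TOPS")
```

Under these assumptions each result lands just above the corresponding "+" figure in the chart, and the 8x ratio between 8-bit and 64-bit throughput holds regardless of the clock chosen.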



[Figure: A64FX die layout: four CMGs, each a block of cores sharing an L2 cache, four HBM2 interfaces, the TofuD and PCIe interfaces, and a ring bus connecting the CMGs]



2. Cutting-edge Hardware Design
• 1 PFlops with Fugaku vs. the K computer
                 Fugaku                     K computer
  Configuration  1x rack including SSDs     80x compute racks & 20x disk racks
  Nodes          384                        8,160
  Footprint      1.1 m² (0.8 m x 1.4 m)     128 m² (4 m x 32 m)
  (K computer photo © RIKEN)
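The comparison can be checked by multiplying node counts by per-node peak. The sketch assumes one CPU per node in both systems and takes per-node peaks from the chip-level chart (2.7 TFlops for A64FX, 0.128 TFlops for SPARC64 VIIIfx); the `System` class is a hypothetical helper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class System:
    name: str
    nodes: int
    node_tflops: float   # per-node FP64 peak (assumed: one CPU per node)
    width_m: float
    depth_m: float

    @property
    def peak_tflops(self) -> float:
        return self.nodes * self.node_tflops

    @property
    def footprint_m2(self) -> float:
        return self.width_m * self.depth_m

fugaku = System("Fugaku (1 rack)", 384, 2.7, 0.8, 1.4)
k_computer = System("K computer", 8160, 0.128, 4.0, 32.0)

for s in (fugaku, k_computer):
    print(f"{s.name}: {s.peak_tflops:.0f} TFlops peak on {s.footprint_m2:.2f} m2")
print(f"footprint ratio: {k_computer.footprint_m2 / fugaku.footprint_m2:.0f}x")
```

Both configurations come out at roughly 1 PFlops, while the K computer needs a floor area more than 100 times larger.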

 Scalable design  CMU(CPU Memory Unit)


IN OUT Water
 100% direct water coupler
cooling
PCIe
 3x QSFP for AOC(Active connector
Optical Cables) TofuD
connector
CPU CMU BoB Shelf Rack Fugaku  Single-sided blind

QSFP28 (X)
QSFP28 (Y)
QSFP28 (Z)
mate connectors for
Nodes 1 2 16 48 384 150k+ electrical signals and
water TofuD
Performance cables

AOC
AOC
AOC
[Flops] 2.7 T+ 5.4 T+ 43 T+ 129 T+ 1 P+ 400 P+
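Each level of the scalable-design table is a whole number of the level below it, and its performance is the node count times the per-node peak. The sketch derives both from the table's node counts, assuming 2.7 TFlops per node; because "150k+" is only a lower bound, the last fan-out is not an exact rack count.

```python
NODE_TFLOPS = 2.7   # per-CPU (per-node) peak from the table, "2.7 T+"

# (level, nodes) pairs copied from the scalable-design table; 150k+ is a lower bound.
LEVELS = [("CPU", 1), ("CMU", 2), ("BoB", 16), ("Shelf", 48),
          ("Rack", 384), ("Fugaku", 150_000)]

def peak_tflops(nodes: int, node_tflops: float = NODE_TFLOPS) -> float:
    """Aggregate peak of a level: nodes x per-node peak."""
    return nodes * node_tflops

def fan_out(levels):
    """How many units of the previous level each level contains."""
    return [(name, nodes / prev) for (_, prev), (name, nodes) in zip(levels, levels[1:])]

for name, nodes in LEVELS:
    print(f"{name:<7}{nodes:>8} nodes {peak_tflops(nodes):>10.1f} TFlops")
for name, ratio in fan_out(LEVELS):
    print(f"{name:<7} = {ratio:g} x previous level")
```

The fan-outs (2 CPUs per CMU, 8 CMUs per BoB, 3 BoBs per shelf, 8 shelves per rack) show how the same building block repeats from one CPU up to the full system.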



3. System Software Stack
• Fujitsu is developing the system software in collaboration with RIKEN
• Fujitsu Technical Computing Suite implements development and execution environments with high usability on large-scale systems

  Fugaku applications

  Fujitsu Technical Computing Suite / system software under development by RIKEN
    Management software
      - System management for high availability & power-saving operation
      - Job management for higher system utilization & power efficiency
    File system
      - FEFS: Lustre-based distributed file system
      - LLIO: NVM-based file I/O accelerator
    Programming environment
      - XcalableMP, OpenMP, Coarray
      - MPI (Open MPI, MPICH)
      - Compilers (C, C++, Fortran)
      - Math. libs.
      - Debugging and tuning tools

  Linux OS / McKernel (lightweight kernel)

  Fugaku system hardware



Compiler highlights:
• Exploits hardware performance through compiler optimizations such as SVE vectorization
• Supports new programming language standards and the FP16 data type
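SVE code is vector-length agnostic: a loop advances by whatever the hardware vector length is, and per-lane predicates handle both the loop tail and IF statements inside the loop. The Python model below illustrates that idea only; it is not SVE code, and the helper names (`whilelt`, `predicated_loop`) are hypothetical, loosely named after SVE's WHILELT and SEL operations. With 512-bit SIMD, an FP64 loop would use `vl=8`.

```python
def whilelt(base, n, vl):
    """Loop-tail predicate: lane k is active while base + k < n (models WHILELT)."""
    return [base + k < n for k in range(vl)]

def scalar_loop(a):
    """Reference scalar loop containing an IF statement."""
    out = []
    for x in a:
        if x > 0.0:
            out.append(2.0 * x)
        else:
            out.append(-x)
    return out

def predicated_loop(a, vl):
    """Same loop processed vl lanes at a time, with the IF turned into a predicate."""
    n = len(a)
    out = [0.0] * n
    for base in range(0, n, vl):
        pg = whilelt(base, n, vl)
        # Inactive lanes read as 0.0 (models a zeroing predicated load).
        chunk = [a[base + k] if pg[k] else 0.0 for k in range(vl)]
        cond = [pg[k] and chunk[k] > 0.0 for k in range(vl)]   # IF condition per lane
        then_v = [2.0 * x for x in chunk]                       # both branches computed
        else_v = [-x for x in chunk]
        for k in range(vl):
            if pg[k]:                                           # predicated store
                out[base + k] = then_v[k] if cond[k] else else_v[k]  # per-lane select
    return out

data = [3.0, -1.0, 0.0, 2.5, -4.0, 7.0, -0.5, 1.0, 0.0, -2.0, 9.0]
print(predicated_loop(data, vl=8) == scalar_loop(data))
```

The same result is produced for any `vl`, which is the property that lets one binary run on any SVE vector length; computing both branches and selecting per lane is how a loop with an IF becomes vectorizable.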



Achieve High Performance in Real Applications
• WRF: Weather Research and Forecasting model (v3.8.1)
  - Vectorizing loops that contain IF statements is the key optimization
• Himeno Benchmark (Fortran90, size: XL)
  - Stencil calculation that solves Poisson's equation by the Jacobi method
• High memory B/W and long SIMD length work effectively
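A Jacobi stencil updates every grid point from its neighbors' previous values, so each sweep streams whole arrays through memory; this is why memory bandwidth and long SIMD vectors matter. The sketch below is a simplified 2D, 5-point Jacobi solver for -(u_xx + u_yy) = f with zero boundary values; it illustrates the method only, since the Himeno benchmark itself uses a 3D stencil with coefficient arrays. Grid size and iteration count are arbitrary illustrative choices.

```python
def jacobi_poisson_2d(f, h, iters):
    """Jacobi iterations for -(u_xx + u_yy) = f on a square grid, u = 0 on the boundary."""
    n = len(f)
    u = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        new = [row[:] for row in u]          # Jacobi: read old u, write a new copy
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                    + u[i][j - 1] + u[i][j + 1]
                                    + h * h * f[i][j])
        u = new
    return u

def max_residual(u, f, h):
    """Largest |f - (-Laplacian u)| over interior points, using the same 5-point stencil."""
    n = len(u)
    r = 0.0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            lap = (4 * u[i][j] - u[i - 1][j] - u[i + 1][j]
                   - u[i][j - 1] - u[i][j + 1]) / (h * h)
            r = max(r, abs(f[i][j] - lap))
    return r

n = 9
h = 1.0 / (n - 1)
f = [[1.0] * n for _ in range(n)]
u = jacobi_poisson_2d(f, h, iters=500)
print(max_residual(u, f, h))
```

Every interior update is independent of the others within a sweep, so the inner loop vectorizes directly; the cost per point is a handful of flops against several memory accesses.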

WRF v3.8.1 (48-hour, 12 km, CONUS) on 48 cores: performance ratio* (higher is better)
  Skylake (Xeon Platinum 8168), 2 CPUs          1.00
  A64FX, 1 CPU, with source tuning              1.56

Himeno Benchmark (Fortran90): GFlops (higher is better)
  Skylake (Xeon Platinum 8168), 2 CPUs          85
  FX100, 1 CPU                                  103
  A64FX, 1 CPU                                  346
  SX-Aurora TSUBASA, 1 VE†                      286
  Tesla V100, 1 GPU†                            305

* Normalized by the average elapsed time per timestep on Skylake
† From "Performance evaluation of a vector supercomputer SX-Aurora TSUBASA", https://dl.acm.org/citation.cfm?id=3291728



Achieve High Efficiency in Key Features for AI Applications
• High FP16 & INT8 peak performance and high peak memory B/W
  FP16 performance: 10.8+ TOPS, > 90% efficiency in HGEMM
  INT8 performance: 21.6+ TOPS in partial dot product
  Memory B/W: 1,024 GB/s, > 80% efficiency in STREAM Triad
[Figure: INT8 partial dot product: four 8-bit elements A0 to A3 are multiplied by four 8-bit elements B0 to B3, and the products are summed into one 32-bit result]
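The INT8 partial dot product sums four 8-bit products into each 32-bit lane, so one 512-bit vector of 64 int8 values feeds 16 accumulators. Below is a minimal pure-Python model of that operation; the name `int8_partial_dot` is hypothetical, and wrapping the accumulator to signed 32 bits is an assumption about overflow behavior made for illustration.

```python
INT8_MIN, INT8_MAX = -128, 127

def int8_partial_dot(acc, a, b):
    """Model of a 4-way INT8 partial dot product (hypothetical helper).

    acc:  32-bit accumulators, one per 32-bit lane
    a, b: int8 values, four per lane (len == 4 * len(acc))
    """
    if len(a) != 4 * len(acc) or len(b) != len(a):
        raise ValueError("need exactly four int8 pairs per 32-bit lane")
    if any(not INT8_MIN <= x <= INT8_MAX for x in a + b):
        raise ValueError("inputs must fit in signed 8 bits")
    out = []
    for lane, s in enumerate(acc):
        base = 4 * lane
        s += sum(a[base + k] * b[base + k] for k in range(4))
        # Wrap to signed 32 bits (assumed modular integer accumulation).
        out.append((s + 2**31) % 2**32 - 2**31)
    return out

# A 512-bit vector holds 64 int8 values, i.e. 16 lanes of 32 bits.
a = list(range(-32, 32))
b = [1] * 64
print(int8_partial_dot([0] * 16, a, b))
```

Accumulating in 32 bits keeps the 8-bit inputs compact while avoiding the overflow that an 8-bit or 16-bit running sum would hit almost immediately.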

• Functions contributing to key features in AI fields
  A64FX CPU
  • 2x 512-bit wide SIMD pipelines per core for FP16 and INT8
  • High memory B/W and calculation throughput
  Compilers & libraries
  • Vectorization and software pipelining
  • FP16 as a data type of the programming language (e.g., real(kind=2) in Fortran)
  • Mathematical library for HGEMM
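FP16 (IEEE 754 binary16) keeps only 11 significant bits, so values such as 0.1 are stored approximately. The sketch uses Python's standard `struct` module, whose `'e'` format packs a float as binary16, to round values to FP16 and to model a naive HGEMM in which inputs and each multiply-add result are rounded to FP16. `to_fp16` and `hgemm_model` are hypothetical helpers; real HGEMM libraries may accumulate differently, for example in higher precision.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 binary16 and back (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def hgemm_model(a, b):
    """Naive C = A x B where inputs and every multiply-add result are rounded to FP16."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s = to_fp16(s + to_fp16(a[i][p]) * to_fp16(b[p][j]))
            c[i][j] = s
    return c

print(to_fp16(0.1))
print(hgemm_model([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]))
```

Small integers are exact in FP16, so the matrix product above is unaffected by rounding, while 0.1 comes back slightly below its decimal value.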
Future Plans
• Supercomputer Fugaku: operations starting around CY2021
[Figure: timeline from CY2011 to CY2021 with a "Today" marker.
 Fugaku: basic design; design and implementation; manufacturing, installation and tuning; operations.
 K computer (© RIKEN): development; operations, ending Aug. 2019.]
• Fujitsu HPC products: Fujitsu will begin global sales of supercomputers based on the Supercomputer Fugaku technology in the 2nd half of FY2019