Professional Documents
Culture Documents
in HPC
CIUK 2018, Manchester
• No SVE vectorisation
• 128-bit NEON vectorisation
5 © 2018 Arm Limited
Fujitsu A64FX
• Chip designed for RIKEN POST-K machine
• 32 GB HBM2
• No DDR
• 1 TB/s bandwidth
• TOFU 3 interconnect
Fulhame Catalyst system at EPCC • OS: SUSE Linux Enterprise Server for HPC Leicester: Data-intensive apps, genomics,
MOAB Torque, DiRAC collaboration
OpenHPC is a community effort to provide a common, Base OS CentOS 7.4, SLES 12 SP3
verified set of open source packages for HPC deploymentsAdministrative Conman, Ganglia, Lmod, LosF, Nagios, pdsh, pdsh-
Tools mod-slurm, prun, EasyBuild, ClusterShell, mrsh,
Genders, Shine, test-suite
Arm and partners actively involved:
Provisioning Warewulf
• Arm is a silver member of OpenHPC Resource Mgmt. SLURM, Munge
• Linaro is on Technical Steering Committee I/O Services Lustre client (community version)
• Arm-based machines in the OpenHPC build Numerical/Scientific Boost, GSL, FFTW, Metis, PETSc, Trilinos, Hypre,
Libraries SuperLU, SuperLU_Dist,Mumps, OpenBLAS,
infrastructure Scalapack, SLEPc, PLASMA, ptScotch
Status: 1.3.4 release out now I/O Libraries HDF5 (pHDF5), NetCDF (including C++ and Fortran
interfaces), Adios
• Packages built on Armv8-A for CentOS and SUSE Compiler Families GNU (gcc, g++, gfortran), LLVM
Silicon partners can provide tuned micro kernels for their SoCs
• Partners can contribute directly through open source routes
Validated with
NAG test suite • Parallel tuning within our library increases overall application performance
https://github.com/UoB-HPC/benchmarks
20 © 2018 Arm Limited
ArmPL: DGEMM performance
Excellent serial and parallel performance
Achieving very high performance at the node Arm PL 19.0 DGEMM parallel performance improvement
for Cavium ThunderX2 using 56 threads
level leveraging high core counts and large
100
memory bandwidth
90
80
Single core performance at
60
problem sizes 30
20
10
0
0 1000 2000 3000 4000 5000 6000 7000 8000
Problem size
ArmPL 19.0
– Powers of primes 0
– 143, 187 and 198 times tables 1 101 201 301 401 501
Length of side for FFTW transform, size NxNxN
22 © 2018 Arm Limited
Math Routine Performance
Aarch64 optimised GLIBC maths routines
0
WRF Cloverleaf OpenMX Branson
GCC Arm Arm + libamath
• By more vendors • Large scale and small scale • Neoverse – IP roadmap for
• Marvell, Ampere, Amazon, deployments silicon vendors
HiSilicon, Fujitsu
• Increased exposure to the • Investment in software
• Targeting different market architecture ecosystem
segments • E.g. F18
www.arm.com/company/policies/trademarks