Professional Documents
Culture Documents
Pietro Bonfà
Department of Mathematical, Physical and Computer Sciences, University of Parma, Italy
14 Jan. 2020 - Jülich
✓ Domain Specific Libraries, that you may want to check for your own codes.
https://gitlab.com/QEF/q-e-gpu
● Berry’s
Structural Optimization:
phase polarization; ● K−, L1 and L2,3-edge X-ray Absorption Spectra (XSpectra package);
● Time-Dependent Density Functional Perturbation Theory (TurboTDDFT package);
●● Spin-orbit
GDIIS coupling
with quasi-Newton BFGSand noncollinear magnetism.
preconditioning; ● Electronic excitations with Many-Body Perturbation Theory, using the YAMBO package;
● Damped dynamics. ● Electronic excitations with Many-Body Perturbation Theory (GWL package).
● Large scientific user base, vehicle for new methods, new theories and new
science.
○ 600k lines of Fortran Posts/month in ML
○ V6.5 -> 10k downloads
○ >50 contributors
○ 1600+ registered users
○ …
Matrix
multiplication
size
Not a single "portion" of the code takes the majority of the wall-time
Matrix
Depending on the input, 3D-FFT dominant or LAmultiplication
dominant
size
2013
2012
✓ Good performance
✓ Good performance
✓ Good performance
Kernel Interface
✗ Boilerplate code
Reduction automatically detected (only scalars)
Only loop based constructs (possibly nested)
✗ Code duplication:
Performance (eg. preserve cache blocking optimizations)
Preserve both CPU and GPU version.
● Unit testing:
LAXLib
Available at https://gitlab.com/max-centre/components
Available at https://gitlab.com/max-centre/components
8 bands
Scatter
Alltoall
4 bands 2D FFT
Alltoall
4 bands 2D FFT
GPU Total Energy Forces Stress Collinear Non-collinear Gamma US PP & EXX DFT+U All other
version (K points) Magnetism magnetism trick PAW functionalities
v6.3 A W W A A A A W W W
V6.5 A A W A A A A A A W
Galileo @ CINECA
Processors: 2*8-cores Intel Haswell 2.40 GHz
Accelerators: 2 NVIDIA K80
RAM: 128 GB/node
500
~ 18x speedup
Timer 'electrons' (s)
400
4 nodes
300
8 nodes
200 16 nodes
100
0
CPU – XL CPU – GNU GPU
● Directive based