Joost VandeVondele
Exam date and format
June 17, 2016: oral exam, 25+5 min each. Use the doodle link below to register;
first come, first served.
DFT: computational aspects (II)
Single-particle orbitals are expanded in basis functions, which can be non-orthogonal.
This naturally leads to a generalized eigenvalue equation of the form:
$H^{KS} C_i = S C_i \epsilon_i$
where the $N_{el}$ lowest eigenvalues and eigenvectors are needed.
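As an illustration of the generalized eigenvalue equation above, here is a minimal NumPy sketch that reduces it to an ordinary symmetric eigenproblem by symmetric (Loewdin) orthogonalization. The function name, the random test matrices and the choice of orthogonalization are assumptions made for this example, not something prescribed by the slides.

```python
import numpy as np

def solve_generalized_eigenproblem(H, S, n_el):
    """Return the n_el lowest eigenvalues and S-orthonormal eigenvectors of H C = S C eps."""
    # Symmetric (Loewdin) orthogonalization: build S^(-1/2) from the eigendecomposition of S.
    s_vals, s_vecs = np.linalg.eigh(S)
    S_inv_sqrt = (s_vecs / np.sqrt(s_vals)) @ s_vecs.T
    # Transform to an ordinary symmetric eigenproblem in the orthogonalized basis.
    H_ortho = S_inv_sqrt @ H @ S_inv_sqrt
    eps, C_ortho = np.linalg.eigh(H_ortho)  # eigenvalues come back in ascending order
    # Back-transform the eigenvectors to the non-orthogonal basis.
    C = S_inv_sqrt @ C_ortho
    return eps[:n_el], C[:, :n_el]

# Hypothetical 4x4 test case: symmetric H, symmetric positive definite S.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
H = 0.5 * (M + M.T)
B = rng.standard_normal((4, 4))
S = np.eye(4) + 0.1 * (B @ B.T)

eps, C = solve_generalized_eigenproblem(H, S, n_el=2)
residual = np.linalg.norm(H @ C - (S @ C) * eps)   # ||H C - S C eps||
overlap = C.T @ S @ C                              # should be the 2x2 identity
```

The dense diagonalizations used here scale as O(N^3), which is the cost the linear scaling methods aim to avoid.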
SCF
Traditional approaches to solving the self-consistent field (SCF) equations are O(N^3),
which significantly limits system size (illustrated system: 4 nm).
New algorithms are O(N), allowing far larger systems to be studied (illustrated system: 22 nm).
Bulk liquid water: traditional diagonalization vs. linear scaling algorithms, $\epsilon = 10^{-5}$
The typical crossover point (vs. OT) is still at a few thousand atoms.
Some properties of P
Orthonormality of the C's is equivalent to idempotency of P
$C_i^T S C_j = \delta_{ij} \Leftrightarrow PS\,PS = PS$
P is a projector onto the subspace spanned by the $C_i$
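The properties above can be checked numerically. The sketch below assumes the usual definition $P = \sum_i C_i C_i^T$ (not written out on the slide) and uses a hypothetical random overlap matrix; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_basis, n_occ = 6, 3

# Hypothetical overlap matrix S (symmetric positive definite).
B = rng.standard_normal((n_basis, n_basis))
S = np.eye(n_basis) + 0.1 * (B @ B.T)

# Build S-orthonormal coefficient vectors: C = S^(-1/2) U with orthonormal columns U.
s_vals, s_vecs = np.linalg.eigh(S)
S_inv_sqrt = (s_vecs / np.sqrt(s_vals)) @ s_vecs.T
U, _ = np.linalg.qr(rng.standard_normal((n_basis, n_occ)))
C = S_inv_sqrt @ U

P = C @ C.T   # assumed definition of the density matrix
PS = P @ S

orthonormality_error = np.linalg.norm(C.T @ S @ C - np.eye(n_occ))
idempotency_error = np.linalg.norm(PS @ PS - PS)
projection_error = np.linalg.norm(PS @ C - C)   # PS leaves the span of the C_i unchanged
trace_PS = np.trace(PS)                         # equals the number of vectors
```

In this setting the trace of PS counts the vectors spanning the subspace, which is why trace constraints appear in purification schemes.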
Model Hamiltonian H (matrix with entries 0, 1 and c): metal (c = 1) vs. insulator (c = 0.5)
# Atoms: 13'846
# Basis functions: 133'214
Basis quality: DZVP
At a threshold of $10^{-5}$, the percentage of non-zero elements is:
H, S : 2%
P : 15%
Inv(S) : 20%
$\psi_k(r) = \sum_{i=1}^{N_{el}} \psi_i(r) Q_{ik}$
$\sum_k Q_{ik} Q_{jk} = \delta_{ij}$
$Q^T Q = 1 = Q Q^T$
$Q = e^{A}$
$A^T = -A$
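A short numerical check that the exponential of an antisymmetric matrix is orthogonal. NumPy has no built-in matrix exponential, so this sketch evaluates it through the Hermitian matrix iA; the helper name `exp_antisymmetric` and the random test matrix are assumptions for illustration.

```python
import numpy as np

def exp_antisymmetric(A):
    """Matrix exponential of a real antisymmetric A, computed via the Hermitian matrix iA."""
    # iA is Hermitian, so iA = V diag(lam) V^H with real lam and unitary V,
    # hence exp(A) = exp(-i * (iA)) = V diag(exp(-i * lam)) V^H.
    lam, V = np.linalg.eigh(1j * A)
    Q = (V * np.exp(-1j * lam)) @ V.conj().T
    return Q.real  # the imaginary part vanishes up to rounding for real A

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
A = M - M.T                      # antisymmetric: A^T = -A
Q = exp_antisymmetric(A)

orthogonality_error = np.linalg.norm(Q.T @ Q - np.eye(5))
```

Parametrizing Q by the independent elements of A turns a constrained search over orthogonal matrices into an unconstrained one, which is convenient when minimizing a localization functional.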
N el
Ω=∑i=1 (〈 ψi∣x ∣ψi 〉−〈 ψi∣x∣ψi 〉 )+.
2 2
2 2
(〈 ψi∣ y ∣ψi 〉−〈 ψi∣ y∣ψi 〉 )+.
(〈 ψi∣z 2∣ψi 〉−〈 ψi∣z∣ψi 〉2 )
http://www.psi-k.org/newsletters/News_57/Highlight_57.pdf
Wannier centers and orbitals
$PS = \left[ 1 + \exp\left( \frac{S^{-1}H - \mu I}{kT} \right) \right]^{-1}$ (Fermi function)
$PS = \frac{1}{2} \left( 1 - \mathrm{sign}(S^{-1}H - \mu I) \right)$
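The two expressions for PS can be compared on a small example. This sketch evaluates both matrix functions by plain diagonalization (so it is a check of the formulas, not a linear scaling method); the test Hamiltonian with a prescribed spectrum, the helper `matrix_function`, and the values of mu and kT are assumptions chosen so that the gap is much larger than kT.

```python
import numpy as np

def matrix_function(M, f):
    """Apply a scalar function f to a diagonalizable matrix M with real eigenvalues."""
    vals, vecs = np.linalg.eig(M)
    return ((vecs * f(vals.real)) @ np.linalg.inv(vecs)).real

rng = np.random.default_rng(3)
n, n_occ = 6, 3

# Hypothetical overlap matrix and S-orthonormal eigenvectors C.
B = rng.standard_normal((n, n))
S = np.eye(n) + 0.1 * (B @ B.T)
s_vals, s_vecs = np.linalg.eigh(S)
S_inv_sqrt = (s_vecs / np.sqrt(s_vals)) @ s_vecs.T
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
C = S_inv_sqrt @ U

# Hamiltonian with prescribed generalized eigenvalues: H C = S C eps.
eps = np.array([-1.0, -0.8, -0.6, 0.6, 0.8, 1.0])
H = S @ (C * eps) @ C.T @ S

mu, kT = 0.0, 0.01               # chemical potential inside the gap, kT << gap
M = np.linalg.solve(S, H) - mu * np.eye(n)   # S^-1 H - mu I

PS_sign = 0.5 * (np.eye(n) - matrix_function(M, np.sign))
PS_fermi = matrix_function(M, lambda x: 1.0 / (1.0 + np.exp(x / kT)))
PS_ref = C[:, :n_occ] @ C[:, :n_occ].T @ S   # built directly from the occupied vectors
```

For a gapped system at low temperature the Fermi function is numerically indistinguishable from the step function, so all three matrices agree.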
If we can exploit the sparsity of H, S and P when computing these matrix
functions, we can be more efficient than diagonalization-based approaches.
Almost every way to compute a matrix function can be (and has been) tried:
Chebyshev expansions, contour integrals, recursions, minimizations.
The matrix sign function
(Plot: the scalar sign function, taking the values +1 and -1, with the step at 0)
Sign matrix iterations
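Various iterative schemes exist to compute sign(A); the simplest is the Newton-Schulz iteration,
$X_{n+1} = \frac{1}{2} X_n (3I - X_n^2)$, which needs only matrix multiplications.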
In exact arithmetic, convergence is quadratic:
the number of correct digits of $X_n$ doubles with each iteration.
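A minimal sketch of one such scheme, the Newton-Schulz iteration $X_{n+1} = \frac{1}{2} X_n (3I - X_n^2)$, using dense NumPy matrices. The scaling of the starting guess by the spectral norm, the stopping criterion and the test matrix are assumptions for this example.

```python
import numpy as np

def sign_newton_schulz(A, tol=1e-12, max_iter=100):
    """Approximate sign(A) with the Newton-Schulz iteration; returns X and the error history."""
    I = np.eye(A.shape[0])
    # Scale so that all eigenvalues lie in [-1, 1]; the spectral norm bounds them.
    X = A / np.linalg.norm(A, 2)
    errors = []
    for _ in range(max_iter):
        X = 0.5 * X @ (3.0 * I - X @ X)
        err = np.linalg.norm(X @ X - I)   # sign(A)^2 = I at convergence
        errors.append(err)
        if err < tol:
            break
    return X, errors

# Hypothetical symmetric test matrix with known eigenvalues (none equal to zero).
rng = np.random.default_rng(4)
U, _ = np.linalg.qr(rng.standard_normal((6, 6)))
evals = np.array([-2.0, -1.0, -0.5, 0.3, 1.0, 1.5])
A = (U * evals) @ U.T

X, errors = sign_newton_schulz(A)
sign_ref = (U * np.sign(evals)) @ U.T     # reference from the eigendecomposition
```

Printing `errors` shows the error exponent roughly doubling in the last few steps, which is the quadratic convergence mentioned above. Because only matrix products appear, the iteration maps directly onto sparse matrix multiplication.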
Convergence illustration
Matrix inversion from sign iterations
Cost per SCF step for a DFTB calculation on 6912 water molecules as a function of the filtering threshold for the matrix multiplication. The dashed
line represents a fit using the functional form $a\epsilon^{-1/3}$. The error in the trace and the total energy at full SCF convergence is linear in the threshold.
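To illustrate what a filtering threshold does, the sketch below multiplies a matrix with exponentially decaying elements and drops small entries before and after the product. Element-wise filtering on dense arrays is a simplification assumed here for brevity (a sparse library would typically filter blocks), and the test matrix and thresholds are hypothetical.

```python
import numpy as np

def filter_matrix(M, eps):
    """Zero out elements with magnitude below eps (element-wise filtering assumed)."""
    return np.where(np.abs(M) >= eps, M, 0.0)

def filtered_multiply(A, B, eps):
    """Multiply filtered inputs and filter the result with the same threshold."""
    return filter_matrix(filter_matrix(A, eps) @ filter_matrix(B, eps), eps)

# Hypothetical test matrix with exponentially decaying off-diagonal elements.
n = 200
idx = np.arange(n)
A = np.exp(-0.5 * np.abs(idx[:, None] - idx[None, :]))

exact = A @ A
results = {}
for eps in (1e-3, 1e-5, 1e-7):
    C = filtered_multiply(A, A, eps)
    fill = np.count_nonzero(C) / C.size     # fraction of retained elements
    error = np.max(np.abs(C - exact))       # largest element-wise error
    results[eps] = (fill, error)
```

A tighter threshold retains more elements (higher cost) and gives a smaller error, which is the trade-off shown in the figure.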
More advanced SCF methods
Available in CP2K
TRS4: trace resetting purification (Niklasson et al., JCP (2003))
Automatically determines the chemical potential
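To show how a purification scheme can fix the electron count without knowing the chemical potential, here is a sketch of the simpler second-order trace-correcting scheme (often called TC2), a relative of TRS4. This is explicitly not the TRS4 algorithm itself. It assumes an orthogonal basis, dense matrices, Gershgorin spectral bounds and a hypothetical test Hamiltonian with a gap.

```python
import numpy as np

def tc2_purification(H, n_occ, tol=1e-10, max_iter=100):
    """Density matrix for n_occ occupied states of a symmetric H (orthogonal basis assumed)."""
    # Cheap spectral bounds from Gershgorin circles (no diagonalization needed).
    diag = np.diag(H)
    radii = np.sum(np.abs(H), axis=1) - np.abs(diag)
    e_min, e_max = np.min(diag - radii), np.max(diag + radii)
    # Map the spectrum into [0, 1], with low energies close to 1 (occupied).
    X = (e_max * np.eye(H.shape[0]) - H) / (e_max - e_min)
    for iteration in range(max_iter):
        X2 = X @ X
        if np.linalg.norm(X2 - X) < tol:   # idempotent: converged
            break
        if np.trace(X) > n_occ:
            X = X2                         # x -> x^2 lowers the trace
        else:
            X = 2.0 * X - X2               # x -> 2x - x^2 raises the trace
    return X, iteration

# Hypothetical Hamiltonian with 4 states below and 4 states above a gap.
rng = np.random.default_rng(5)
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
evals = np.array([-1.0, -0.8, -0.6, -0.5, 0.5, 0.6, 0.8, 1.0])
H = (Q * evals) @ Q.T

P, n_iter = tc2_purification(H, n_occ=4)
P_ref = Q[:, :4] @ Q[:, :4].T              # projector on the 4 lowest eigenvectors
```

Only the electron count enters the iteration; the chemical potential is never supplied, and the choice between the two polynomial branches steers the trace towards the target.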
(Figure legend: GPU, CPU)
13846 atoms and 39560 electrons (cell 53.84 Å), 133214 basis functions.
At full scale-out on the XC30, one multiplication takes less than 0.5 s on average and one SCF step takes 24 s.
Towards O(1): constant walltime with proportional resources
Stringent test:
Small blocks, large overhead
Very sparse matrices
Running with 200 atoms / MPI task
(Figure legend: total time, local multiplies, overhead, communication)
Local multiplies: constant (OK!).
Overhead & communication: grow with sqrt(N).
Needs a replacement for Cannon.
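For readers unfamiliar with Cannon's algorithm, the following serial sketch simulates it on a q x q grid of "processes", each owning one block of A, B and C. The function name and the dense NumPy blocks are assumptions for illustration; a real implementation would exchange blocks with MPI.

```python
import numpy as np

def cannon_multiply(A, B, q):
    """Simulate Cannon's algorithm on a q x q process grid (serial sketch)."""
    n = A.shape[0]
    assert n % q == 0, "matrix size must be divisible by the grid size"
    b = n // q
    # Block (i, j) of each matrix lives on 'process' (i, j).
    a_blk = [[A[i*b:(i+1)*b, j*b:(j+1)*b] for j in range(q)] for i in range(q)]
    b_blk = [[B[i*b:(i+1)*b, j*b:(j+1)*b] for j in range(q)] for i in range(q)]
    c_blk = [[np.zeros((b, b)) for _ in range(q)] for _ in range(q)]
    # Initial skew: row i of A shifts left by i, column j of B shifts up by j.
    a_blk = [[a_blk[i][(j + i) % q] for j in range(q)] for i in range(q)]
    b_blk = [[b_blk[(i + j) % q][j] for j in range(q)] for i in range(q)]
    for _ in range(q):
        # Local multiply on every process.
        for i in range(q):
            for j in range(q):
                c_blk[i][j] += a_blk[i][j] @ b_blk[i][j]
        # Shift A blocks left by one and B blocks up by one (nearest-neighbour exchange).
        a_blk = [[a_blk[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        b_blk = [[b_blk[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    return np.block(c_blk)

rng = np.random.default_rng(6)
A = rng.standard_normal((6, 6))
B = rng.standard_normal((6, 6))
C = cannon_multiply(A, B, q=3)
```

The number of shift steps equals the grid dimension q, the square root of the process count. With resources proportional to system size, this is one way to see why the communication volume per process grows with sqrt(N) even when the local work stays constant.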
(Figure: block-wise matrix multiplication C = C + A × B over atomic blocks labeled Atom 1 to Atom 4, e.g. Atom 2 = Atom 2 + ∑ Atom 2 × Atom 3)
Recursive multiplication of
matrices to reuse cache
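The idea of recursive multiplication can be sketched as follows: the largest of the three dimensions (rows M, middle index K, columns N) is halved until the sub-problem is small enough to fit in cache. The leaf size, the function name and the use of dense NumPy arrays are assumptions for this example.

```python
import numpy as np

def recursive_matmul(A, B, C, leaf=32):
    """Accumulate A @ B into C in place, recursively splitting the largest dimension."""
    m, k = A.shape
    n = B.shape[1]
    if max(m, k, n) <= leaf:
        C += A @ B                  # small enough: multiply directly
        return
    if m >= k and m >= n:           # split the rows of A and C
        h = m // 2
        recursive_matmul(A[:h], B, C[:h], leaf)
        recursive_matmul(A[h:], B, C[h:], leaf)
    elif n >= k:                    # split the columns of B and C
        h = n // 2
        recursive_matmul(A, B[:, :h], C[:, :h], leaf)
        recursive_matmul(A, B[:, h:], C[:, h:], leaf)
    else:                           # split the middle (K) dimension
        h = k // 2
        recursive_matmul(A[:, :h], B[:h], C, leaf)
        recursive_matmul(A[:, h:], B[h:], C, leaf)

rng = np.random.default_rng(7)
A = rng.standard_normal((70, 50))
B = rng.standard_normal((50, 90))
C = np.zeros((70, 90))
recursive_matmul(A, B, C, leaf=16)
```

The slices are views, so every leaf multiply writes straight into the right part of C. In Python the recursion itself brings no speed-up; the sketch only shows the traversal order that lets a compiled kernel reuse data already in cache.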
(Figure labels: C, A, B; Mid: K; CSR)
Two-stage processing: GPUs
Double buffering on host and device: async for MPI and device
First CPU-GPU comparison
ACN solvent treated with the Kim-Gordon DFT model: naturally suited for linear scaling
With explicit MD-based equilibration