JCompChem2017 Cholesky SOC

WWW.C-CHEM.ORG FULL PAPER
A Full-Pivoting Algorithm for the Cholesky Decomposition of Two-Electron Repulsion and Spin-Orbit Coupling Integrals
Matteo Piccardo and Alessandro Soncini
A significant reduction in the computational effort for the evaluation of the electronic repulsion integrals (ERI) in ab initio quantum chemistry calculations is obtained by using Cholesky decomposition (CD), a numerical procedure that can remove the zero or small eigenvalues of the ERI positive (semi)definite matrix, while avoiding the calculation of the entire matrix. Conversely, due to its antisymmetric character, CD cannot be directly applied to the matrix representation of the spatial part of the two-electron spin-orbit coupling (2e-SOC) integrals. Here, we present a computational strategy to achieve a Cholesky representation of the spatial part of the 2e-SOC integrals, and propose a new efficient CD algorithm for both ERI and 2e-SOC integrals. The proposed algorithm differs from previous CD implementations by the extensive use of a full-pivoting design, which allows a univocal definition of the Cholesky basis, once the CD δ threshold is made explicit. We show that 2δ is the upper limit for the errors affecting the reconstructed 2e-SOC integrals. The proposed strategy was implemented in the ab initio program Computational Emulator of Rare Earth Systems (CERES), and tested for computational performance on both the ERI and 2e-SOC integrals evaluation. © 2017 Wiley Periodicals, Inc.

DOI: 10.1002/jcc.25062
M. Piccardo, A. Soncini
School of Chemistry, The University of Melbourne, Australia
E-mail: asoncini@unimelb.edu.au
Contract grant sponsor: Australian Research Council Discovery Project; Contract grant number: DP150103254

Journal of Computational Chemistry 2017, 38, 2775–2783

Introduction

The electron repulsion integrals (ERI) and the two-electron spin-orbit coupling (2e-SOC) integrals are of great importance in quantum chemistry, the former representing the Coulomb electron–electron interactions,[1] the latter describing one of the main spin-dependent relativistic corrections to the Born–Oppenheimer Hamiltonian.[2–4] However, their calculation still represents a computationally expensive step in quantum chemistry codes based on atomic basis sets. Moreover, the 2e-SOC integrals, which are usually rewritten in terms of ERI derivatives,[3] require significantly more expensive computations than ERI integrals, due to both the higher angular momenta accessed via differentiation of the basis, and the lower permutational symmetry of the basis function indices entering the integrals.

To reduce the computational effort for the calculation of the 2e-SOC integrals, a number of approximations have been proposed in the literature. One of the most widely used is the spin-orbit mean field (SOMF) operator derived and implemented by Hess and coworkers,[5] which has gained popularity through Schimmelpfennig's atomic mean field interaction (AMFI) implementation.[6] Within this approximation, only the one-center two-electron integrals are computed. Less restricted approximations have been explored by Neese,[7] like the use of a semi/fully numerical-grid based evaluation, or the use of the very popular resolution of the identity (RI), owing to its high efficiency.[8–13] In his conclusions, Neese affirms the need to include all the significant multicenter two-electron SOC integrals to achieve a good accuracy, and proposes the RI method as the best compromise between integral accuracy and computational time. However, it is well known that the accuracy of the RI approximation depends on the ability of the auxiliary basis set to describe the overlap distributions of the original set, which is not an easily controllable parameter, and that different wave function models usually need specifically optimized auxiliary basis sets.[14]

Significant reductions in the computational effort, as well as storage requirements, are attainable in electronic structure theory using the Cholesky decomposition (CD).[13,15–21] CD is the only numerical procedure that can remove the zero or small eigenvalues of a positive (semi)definite matrix without calculating the entire matrix. In the literature, CD has successfully been used to reduce the rank of the ERI matrix by elimination of the elements of no numerical significance, that is, smaller than a specified threshold δ. Unlike RI, CD is not subject to the problem of defining an auxiliary basis for obtaining a fixed accuracy in a calculation, since the accuracy simply derives from the choice of the δ threshold. For this reason, in the last 10 years the CD applied to the Coulomb integrals has been shown to give large computational savings for large basis sets with a variety of theoretical methods, from Hartree–Fock (HF) and density functional theory (DFT),[13] to second order Møller–Plesset perturbation theory (MP2) and multiconfigurational self-consistent field (MCSCF) methods,[22,23] and it is currently implemented in modern packages for quantum chemistry, such as DALTON and MOLCAS.[24,25] Unfortunately, since the Cholesky approach can only be applied to positive (semi)definite matrices, CD is not directly applicable to the matrix representation of the spatial part of the two-electron SOC integrals, as this is an antisymmetric matrix.[3]

In this article, we (i) present a strategy which allows for the Cholesky representation of the two-electron SOC integrals, (ii) present a new algorithm for the CD implementation, and (iii) test the performances of the new algorithm on the
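decomposition of both two-electron ERI and SOC integrals. In particular, the proposed implementation differs from previous implementations by the use of a full-pivoting strategy, which allows a clear and univocal definition of the Cholesky basis, showing improved performances in computational time and disk space consumption.

Two-Electron SOC Integrals

The spatial part of the two-electron SOC integrals in a real basis {p} is[2]

\[
\left\langle p(1)\,r(2) \left| \frac{\mathbf{r}_{12}}{r_{12}^{3}} \times \nabla_{1} \right| q(1)\,s(2) \right\rangle \tag{1}
\]

defining the elements of an antisymmetric matrix. The usual way to evaluate integral eq. (1) is to express the operator $\mathbf{r}_{12}/r_{12}^{3}$ as a gradient of the inverse electronic distance with respect to the coordinate of electron 1, using

\[
\frac{\mathbf{r}_{12}}{r_{12}^{3}} = -\nabla_{1} \frac{1}{r_{12}} \tag{2}
\]

where $\nabla_{1} = (\nabla_{1x}, \nabla_{1y}, \nabla_{1z})$. Using this relation, the α component (α = x, y, z) of eq. (1) can be expressed as

\[
\begin{aligned}
\langle pr | -\varepsilon_{\alpha\beta\gamma} (\nabla_{1\beta} r_{12}^{-1}) \nabla_{1\gamma} | qs \rangle
&= -\varepsilon_{\alpha\beta\gamma} \langle pr | \left[ \nabla_{1\beta} ( r_{12}^{-1} \nabla_{1\gamma} q ) - r_{12}^{-1} \nabla_{1\beta} \nabla_{1\gamma} q \right] s \rangle \\
&= \varepsilon_{\alpha\beta\gamma} \left[ \langle \nabla_{1\beta} p \, r | \nabla_{1\gamma} q \, s \rangle + \langle pr | \nabla_{1\beta} \nabla_{1\gamma} q \, s \rangle \right] \\
&= \varepsilon_{\alpha\beta\gamma} ( \nabla_{\beta} p \, \nabla_{\gamma} q | rs )
\end{aligned} \tag{3}
\]

where $\varepsilon_{\alpha\beta\gamma}$ is the Levi–Civita symbol, the $\langle pr | \nabla_{1\beta} [ (1/r_{12}) \nabla_{1\gamma} q ] \, s \rangle$ term has been manipulated by integration by parts, and the $\langle pr | \nabla_{1\beta} \nabla_{1\gamma} q \, s \rangle$ term cancels to zero because of the β ↔ γ permutational symmetry. $\langle 12|12 \rangle$ and $(11|22)$ are the ERIs in shorthand physicists' and chemists' (Mulliken) notation, respectively. In the literature, the SOC integrals are usually calculated as combinations of ERI derivatives by the use of eq. (3) (see e.g., Helgaker and Taylor[26]). Equation (3) shows that the 2e-SOC integrals can be expressed as linear combinations of ERI integrals, in which the overlap distributions {pq} from the original basis {p} need to be differentiated twice.

A different strategy we propose, here, to achieve an efficient Cholesky decomposition algorithm for the 2e-SOC integrals, is obtained by expressing the operator $\mathbf{r}_{12}/r_{12}^{3}$ as the gradient of the inverse electronic distance with respect to the coordinate of the second electron

\[
\frac{\mathbf{r}_{12}}{r_{12}^{3}} = \nabla_{2} \frac{1}{r_{12}} \tag{4}
\]

thus, obtaining, when eq. (4) is plugged in eq. (1), a linear combination of ERI integrals in which the overlap distributions of both electrons are differentiated only once:

\[
\begin{aligned}
\langle pr | \varepsilon_{\alpha\beta\gamma} (\nabla_{2\beta} r_{12}^{-1}) \nabla_{1\gamma} | qs \rangle
&= \varepsilon_{\alpha\beta\gamma} \langle pr | \nabla_{1\gamma} q \left[ \nabla_{2\beta} ( r_{12}^{-1} s ) - r_{12}^{-1} \nabla_{2\beta} s \right] \rangle \\
&= -\varepsilon_{\alpha\beta\gamma} \left[ \langle p \, \nabla_{2\beta} r | \nabla_{1\gamma} q \, s \rangle + \langle pr | \nabla_{1\gamma} q \, \nabla_{2\beta} s \rangle \right] \\
&= -\varepsilon_{\alpha\beta\gamma} ( p \nabla_{\gamma} q | \nabla_{\beta} r \, s ) - \varepsilon_{\alpha\beta\gamma} ( p \nabla_{\gamma} q | r \nabla_{\beta} s )
\end{aligned} \tag{5}
\]

where now $\langle pr | \nabla_{1\gamma} q \, \nabla_{2\beta} s \rangle$ is not symmetric with respect to the β ↔ γ permutation.

Regardless of whether eq. (3) or (5) is used, it is clear that the spatial part of the 2e-SOC integrals on the {q} basis can be rewritten as linear combinations of ERIs calculated on the expanded basis $\{v_p\} = \{p\} + \{\nabla p\}$. Since the space spanned by the expanded overlap distribution basis $\{v_p v_q\}$ is an inner product space with respect to the positive definite $r_{12}^{-1}$ metric, the terms entering eqs. (3) and (5) fulfil the Cauchy–Schwarz inequalities

\[
| ( \nabla_{\alpha} p \, \nabla_{\beta} q | rs ) |^{2} \le ( \nabla_{\alpha} p \, \nabla_{\beta} q | \nabla_{\alpha} p \, \nabla_{\beta} q ) \, ( rs | rs ) \tag{6}
\]

\[
| ( p \nabla_{\alpha} q | r \nabla_{\beta} s ) |^{2} \le ( p \nabla_{\alpha} q | p \nabla_{\alpha} q ) \, ( r \nabla_{\beta} s | r \nabla_{\beta} s ) \tag{7}
\]

Thus, our first result is that either eq. (6) or (7) can be used to prescreen each ERI component of the 2e-SOC integrals appearing in eqs. (3) and (5) separately, and in fact we will argue in the following section that eq. (7) leads to a more efficient algorithm to implement such prescreening within a CD algorithm.

Cholesky Representation

In the light of the results presented in the previous section, we observe that the integrals entering eq. (3) lead to the definition of the positive (semi)definite matrix

\[
\begin{array}{l|lllllll}
 & pp & pq & \cdots & \nabla_x p \nabla_x p & \cdots & \nabla_x p \nabla_y q & \cdots \\ \hline
pp & (pp|pp) & \cdots & & & & & \\
pq & (pq|pp) & (pq|pq) & \cdots & & & & \\
\vdots & \vdots & \vdots & \ddots & & & & \\
\nabla_x p \nabla_x p & (\nabla_x p \nabla_x p|pp) & (\nabla_x p \nabla_x p|pq) & \cdots & (\nabla_x p \nabla_x p|\nabla_x p \nabla_x p) & \cdots & & \\
\vdots & \vdots & \vdots & & \vdots & \ddots & & \\
\nabla_x p \nabla_y q & (\nabla_x p \nabla_y q|pp) & (\nabla_x p \nabla_y q|pq) & \cdots & (\nabla_x p \nabla_y q|\nabla_x p \nabla_x p) & \cdots & (\nabla_x p \nabla_y q|\nabla_x p \nabla_y q) & \cdots \\
\vdots & \vdots & & & & & \vdots & \ddots
\end{array} \tag{8}
\]

which has maximal rank $[n(n+1)/2 + 3n(3n+1)/2] = 5n^{2} + 2n$ (note the permutational symmetry of the indices), where n is the number of atomic basis functions. Conversely, the integrals entering eq. (5) lead to the definition of a smaller positive (semi)definite matrix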
\[
\begin{array}{l|lllllll}
 & p \nabla_x p & p \nabla_x q & \cdots & q \nabla_x p & \cdots & p \nabla_y p & \cdots \\ \hline
p \nabla_x p & (p \nabla_x p|p \nabla_x p) & \cdots & & & & & \\
p \nabla_x q & (p \nabla_x q|p \nabla_x p) & (p \nabla_x q|p \nabla_x q) & \cdots & & & & \\
\vdots & \vdots & \vdots & \ddots & & & & \\
q \nabla_x p & (q \nabla_x p|p \nabla_x p) & (q \nabla_x p|p \nabla_x q) & \cdots & (q \nabla_x p|q \nabla_x p) & \cdots & & \\
\vdots & \vdots & \vdots & & \vdots & \ddots & & \\
p \nabla_y p & (p \nabla_y p|p \nabla_x p) & (p \nabla_y p|p \nabla_x q) & \cdots & (p \nabla_y p|q \nabla_x p) & \cdots & (p \nabla_y p|p \nabla_y p) & \cdots \\
\vdots & \vdots & & & & & \vdots & \ddots
\end{array} \tag{9}
\]

which has maximal rank $3n^{2}$. A positive (semi)definite matrix can be decomposed by CD, which allows one to determine its elements of numerical significance without calculating the entire matrix. It is clear at this point that the matrix in eq. (9) has to be preferred to the matrix in eq. (8), both due to its smaller maximal rank, and, especially, due to the absence of the $(\nabla p \nabla q | \nabla r \nabla s)$ integrals, involving the computationally expensive quadruple differentiations. We then propose, here, to evaluate the terms entering the spatial part of the two-electron SOC integrals from the Cholesky representation of the matrix eq. (9). Note that our proposed approach bears similarities to the attempt made by O'Neal and Simons at treating the geometrical derivatives of ERIs with the CD technique.[18]

As is well known from the CD theory,[13,15–21] the two-electron integrals $(p\tilde{q}|r\tilde{s})$, where now $\tilde{q}$ can be either the q function or its derivative, can be represented by

\[
(p\tilde{q}|r\tilde{s}) \equiv (I|J) \approx \sum_{K=1}^{N} L_{I}^{K} L_{J}^{K} \tag{10}
\]

where I and J are, respectively, the super indices corresponding to the $(p, \tilde{q})$ and $(r, \tilde{s})$ overlap distributions, $L^{K}$ the Kth Cholesky vector, and N the total number of Cholesky vectors. $L^{K}$ has elements,

\[
L_{I}^{I} = \sqrt{ M_{II} - \sum_{K=1}^{I-1} \left( L_{I}^{K} \right)^{2} } \tag{11}
\]

\[
L_{I}^{J} = \frac{1}{L_{J}^{J}} \left( M_{IJ} - \sum_{K=1}^{J-1} L_{I}^{K} L_{J}^{K} \right) \quad \text{with } I > J \tag{12}
\]

where $M_{IJ} = (I|J)$. From eqs. (11) and (12), the residual matrix after the Kth vector calculation has elements

\[
D_{IJ}^{K} = M_{IJ} - \sum_{T=1}^{K} L_{I}^{T} L_{J}^{T} \quad \text{with } I, J > K \tag{13}
\]

with the properties (Cauchy–Schwarz inequality),[16]

\[
|M_{IJ}|^{2} \le M_{II} M_{JJ} \tag{14}
\]

\[
|D_{IJ}^{K}|^{2} \le D_{II}^{K} D_{JJ}^{K} \tag{15}
\]

If carried through to completion, the decomposition of the M matrix into N = rank(M) Cholesky vectors would be computationally more expensive than the evaluation of the full M matrix by conventional means. However, within a given threshold for the integral numerical precision, the effective rank of M is smaller than its full dimension, due to the physical content of the basis, like near redundancy and saturation.[16] This means that the number of Cholesky vectors needed to numerically represent all the M elements (the effective rank) is significantly smaller than rank(M). The CD is a recursive procedure in which the Cholesky vector at the Kth recursion is calculated from the residual matrix of the (K−1)th recursion. The key is to carry on the decomposition taking care that at any step in the calculation (say the Kth step), $D_{KK}^{K-1}$ is the largest remaining element in the residual matrix. The decomposition can be stopped when

\[
D_{KK}^{K-1} \le \delta \tag{16}
\]

where δ is a given threshold, after the computation of N = K−1 Cholesky vectors. In this representation, the $M_{IJ}$ integrals with I, J ≤ N are represented exactly (within the machine precision), while the integrals with I, J > N are represented with absolute accuracy δ. This means that the largest integrals (in absolute value) are represented exactly, which has been shown to be the main reason why the CD technique gives accurate results for ERI evaluations even with rather large values of the decomposition threshold.[21,27]

Moreover, the $D_{IK}^{K}$ residuals can be estimated by eq. (15) (i.e., by just looking at the residual diagonals) before their explicit calculation, then discarded from the decomposition of the Kth and subsequent vectors if their absolute values are smaller than δ. As a consequence, the effective dimension of the Cholesky vectors, that is, the number of significant elements, decreases during the decomposition, and only a fraction of the original integrals needs to be computed to represent the integrals with δ accuracy.

Finally, in addition to savings in integral evaluation time and storage, the CD can give major savings in the transformation of two-electron integrals from the atomic (AO) to the molecular

(MO) basis. Because each integral is expressed by eq. (10), the transformation of the I and J indices to the MO basis set (here s and r indices) can be realized by[24]

\[
L_{s\tilde{r}}^{K} = \sum_{p=1}^{n} \sum_{q=1}^{n} C_{ps} \, L_{p\tilde{q}}^{K} \, C_{qr} \tag{17}
\]

where $C_{pq}$ are the LCAO-MO expansion coefficients.[28] Note that the above transformation needs to be performed separately for each x, y, and z component, if $\tilde{q}$ is the derivative of the qth function (i.e., to transform the two-electron SOC integrals to the MO basis).

It is straightforward to estimate the error affecting the SOC integrals when estimated by the procedure proposed here. Assuming the error affecting each of the four integrals entering eq. (5) is ≤ δ, the error $\Delta_{SOC}$ affecting the overall two-electron SOC integral is given by

\[
\Delta_{SOC} \le \sqrt{4\delta^{2}} = 2\delta \tag{18}
\]

Note that, even if the largest $(p \nabla_{\alpha} q | r \nabla_{\beta} s)$ integrals (in absolute value) are represented exactly by their Cholesky representation, this is not strictly true for the SOC integrals, due to the summations and subtractions entering eq. (5).

While we will only discuss, here, the application of our proposed CD algorithm to the complete representation of the 2e-SOC integrals, our strategy can easily be coupled with approximations involving the calculation of a reduced number of 2e-SOC integrals, for example the SOC one-center approximation[29] or the one-center CD (1C-CD) methodology,[20] where only one-center or two-center integrals, respectively, would be decomposed. Finally, we remark that any computational model which utilizes density fitting (DF) or resolution-of-identity (RI) of the two-electron integrals[8,11,12] can easily be adapted to an approach with Cholesky decomposition,[21,22,30] and that the Cholesky representation of the two-electron integrals can be of particular interest within the DF approximation as generator of auxiliary basis sets.[30]

Full-Pivoting Algorithm

A CD algorithm for both ERIs and ERI derivatives matrices, as presented in eq. (9), has been implemented in the Computational Emulator of Rare Earth Systems (CERES) code.[31] The implementation is written in C++11, taking advantage of object-oriented programming. Particular attention has been dedicated to the parallelization of the CD code where appropriate, by use of the OpenMP application programming interface (API) specification for parallel programming.

In the CD, the matrix of the integrals is iteratively decomposed column by column (i.e., Cholesky–Crout algorithm), and a decomposition step corresponds to the transformation of one column of integrals into a new Cholesky vector. To perform the CD in the way which exploits the rank reduction, all the residual diagonals [see eq. (13)] are evaluated and sorted in their nonincreasing order at the end of each step, and the elements in the Cholesky matrix accordingly reorganized. The main problems dealing with CD are then three: (i) the integrals calculation, (ii) the ongoing reorganizations of the rows and columns of the Cholesky matrix, (iii) the calculation of the residuals, with the associated reading/saving of the elements of numerical significance from/to the Cholesky vectors calculated in the previous/current iterations.

i. The integrals calculation is performed by the use of the Libcint library, a general and efficient integral library for Gaussian basis functions.[32] The code for the $(pq|rs)$ and $(p \nabla_{\alpha} q | r \nabla_{\beta} s)$ integrals was built by the use of the Libcint built-in symbolic algebra tool for the integrals implementation. The calculation is atomic shell-based for computational efficiency, which means that the entire set of integrals $(PQ|RS)$ is computed at once, with P, Q, R, S shell indices. This does not perform well with the Cholesky procedure, as the latter requires individual columns of integrals $(pq|\cdot\cdot)$ at each step, instead of the set of columns $(PQ|\cdot\cdot)$, where $(PQ|$ is the shell pair to which (p,q) belongs. This can lead to a massive discarding and recalculation of the integrals, due to the need of decomposing the columns in the set $(PQ|\cdot\cdot)$ in nonconsecutive steps, while following the nonincreasing order of the residual diagonals. To avoid this overhead, the strategy that has been adopted to date consists of generating new Cholesky vectors from the calculated shell pair set as long as the new residual diagonal elements (within the shell pair) are at most a span factor smaller than the largest residual diagonal element of the shell pair itself.[19,24] This approach has two consequences: first, the total number of Cholesky vectors depends weakly on the chosen span factor and, second, a number of shell pair integral columns may be calculated more than once.

In our implementation, we adopted a different strategy. We rigorously decompose at any step the column under the largest remaining diagonal, thus implementing a full-pivoting algorithm. The integral recalculation is avoided by the use of both the main memory and the disk. After the calculation of the $(PQ|\cdot\cdot)$ set, the integrals belonging to the right-most nonsignificant columns in the Cholesky matrix (i.e., under the diagonals smaller than δ) are directly discarded, while the significant integrals are temporarily ordered in memory in their columns. Then, all the columns except the one under decomposition are written to a scratch binary file, ready to be quickly reloaded in the subsequent steps, by means of random file access directly to their positions. These file positions are in fact stored as pointers on specific attributes of the C++ objects describing the diagonal elements (see below). This strategy avoids the integral recalculation, while the overhead due to the I/O operations is small.

ii. The use of the C++ object-oriented design allows a clear code structure, and an easy management of the rows and columns reordering. Key elements of the CD, the diagonals are implemented as C++ objects collecting all their specifications as attributes (e.g., integral value, functions indices, shell indices, pointer to position on the disk of the corresponding column of integrals). The latter are collected in a vector (i.e., C++ object composition), which is in turn an attribute of a more complex object collecting all the information and

methods concerning the Cholesky data management (e.g., sorting and Cauchy–Schwarz evaluation methods). In this setup, the in-memory position of the elements in the calculated Cholesky vectors is independent from the order of the diagonal elements (as these positions are stored as attributes of the diagonal objects), allowing for minimal reordering of the memory.

iii. The calculation of the residuals by eq. (13) follows the calculation/reloading of the integrals. Whether the residuals estimation is more or less expensive than the integral calculation depends on the basis set (i.e., number of primitive Gaussians and maximum angular momentum quantum numbers), the number of the Cholesky vectors already calculated, and the value chosen for the decomposition threshold, which determines the number of the significant rows in the new vector, as pointed out in the previous section. Noting that the off-diagonal elements in the calculated Cholesky vectors enter the CD of the new vectors always via the corresponding rows [see eq. (12)], the former become of no use in the decomposition procedure as soon as the corresponding rows in the new Cholesky vectors are classified as no longer significant. Our algorithm keeps the significant elements of the calculated Cholesky vectors in the main memory as long as the amount of allocated memory allows. At each step, once a Cholesky vector is completed, the residual diagonals are updated and reordered, the rows and the columns in the Cholesky matrix are re-evaluated as significant or nonsignificant by eq. (15), and the elements belonging to the no longer significant rows are discarded after having been pushed at the end of their Cholesky vectors, freeing the associated memory. This dynamical reordering ensures the use of the smallest amount of memory at any step, while the overhead due to the reordering is negligible.

At the end of each decomposition step, the new Cholesky vector is written on an output file saving just its nonzero elements, contiguously ordered on the disk by I super index mapped on the order of the functions in the basis set. The number of Cholesky vectors spanned by each row is also saved on disk, in a dedicated vector of integers (guide vector), which is updated at any step. This strategy allows for an easy reconstruction of the Cholesky vectors, where the position of the nonzero elements is determined by the guide vector, and for a minimal amount of stored data. Moreover, it allows a fast transformation when using eq. (17). Note that the AO to MO basis transformation is executed vector by vector, reordering the elements $L_{I}^{K}$ of the Kth vector in their matrix form $L_{p\tilde{q}}^{K}$, then contracting them with the LCAO coefficients matrix. Since the position of the nonzero elements in each vector is explicit, the first contraction is executed just on the rows and columns of the reconstructed matrix which contain at least one nonzero element, with a significant reduction of the number of multiplications when dealing with the right-most vectors.

An on-the-fly evaluation of the memory usage can switch the calculation to the disk-supported mode when the available memory is not sufficient. In this mode, the last calculated vectors are recovered at each iteration from the disk, while the algorithm continues the reordering and deallocation of the nonsignificant elements in the vectors stored in memory. The vectors read from disk are reloaded and kept in memory as soon as space becomes available.

Computational Considerations

Some considerations must be taken into account when evaluating the computational performances of the strategy proposed in the previous sections for the 2e-SOC integrals calculation, which involves the use of eq. (5) coupled with the Cholesky representation of the matrix in eq. (9), with respect to their complete evaluation performed by eq. (3).

First, we note that the integrals $(p \nabla_{\alpha} q | r \nabla_{\alpha} s)$ (i.e., with the same cartesian component of the derivatives for electron 1 and 2), while present in the matrix eq. (9), are not needed to evaluate eq. (5). However, given the shell-based organisation of the integral calculation in any modern code, evaluation of these additional integrals does not introduce computational overhead, as the code makes all the integrals $(P \nabla Q | R \nabla S)$ available at once (i.e., evaluates the integrals for all the functions belonging to the four shells, and for all nine possible combinations of the two derivatives).[2]

More importantly, we note that the number of ERI that needs to be computed to evaluate the 2e-SOC integrals via eq. (5) is larger than the number of ERI required if the same evaluation is carried out via eq. (3), due to the lower permutational symmetry of $(P \nabla Q | R \nabla S)$ with respect to $(\nabla P \nabla Q | RS)$. This indicates that eq. (3) is a more efficient strategy to perform the full unscreened evaluation of the 2e-SOC integrals. Conversely, the CD of the 2e-SOC integrals via eqs. (9) and (5) will become computationally convenient with respect to the direct unscreened evaluation if the rank reduction achieved by CD is large.

It is possible to make a conservative estimate of the number of Cholesky vectors below which the maximal number of integrals that needs to be calculated during a CD becomes smaller than the number of integrals calculated in a direct unscreened evaluation of the 2e-SOC using eq. (3). To pursue the direct unscreened calculation of the 2e-SOC integrals via eq. (3) we need to evaluate all the $(\nabla P \nabla Q | RS)$ shell combinations, amounting to:

\[
\frac{m(m+1)}{2} \cdot \frac{m(m+1)}{2} = \frac{1}{4} \left( m^{2} + 2m + 1 \right) m^{2} \tag{19}
\]

where m is the number of shells partitioning the basis set. If we follow instead the proposed CD strategy using eqs. (5) and (9), conservatively assuming that (i) the row-discarding strategy described in the previous section is not pursued and (ii) the first N columns belong to N different shell sets, then at most $N m^{2}$ shell-integral calculations will be necessary to evaluate the first N Cholesky vectors.

Hence, we can obtain a threshold number of significant Cholesky vectors $N_{th}$ (related to a threshold accuracy $\delta_{th}$), for which the number of shell-based integral calculations required for the direct unscreened evaluation of the 2e-SOC integrals reported in eq. (19) will be matched by the maximal

Table 1. Cholesky decomposition (CD) of ERIs for different molecular systems.

                                 DIRECT            Traditional CD (MOLCAS)                    Full-pivoting CD (CERES)
System  Basis size  Full rank    Time              Shells  Vectors         Time   t%(ints)    Shells  Vectors         Time         t%(ints)
1.      264         34,980       78 s (4.8 Gb)     42      3248 (0.8 Gb)   115 s  18%         90      3034 (0.5 Gb)   75 (85) s    8%
2.      340         57,970       215 s (13.9 Gb)   36      3746 (1.5 Gb)   251 s  25%         96      3575 (1.0 Gb)   174 (226) s  7%
3.      356         63,546       44 m (15.9 Gb)    77      2848 (1.4 Gb)   30 m   86%         77      2721 (0.8 Gb)   20 m         91%
4.      433         93,961       660 s (35.5 Gb)   77      3434 (1.8 Gb)   10 m   40%         213     3275 (0.8 Gb)   118 (276) s  17%

The results in the column DIRECT have been obtained calculating and saving all the unique shell-quartets of indices for ERIs. The values between parentheses in the "Time" column are obtained using the nonoptimized general contractions form of the basis sets. The CD threshold is δ = 10^-8 for 1. and 2., δ = 10^-6 for 3. and 4.
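
The full-pivoting decomposition benchmarked in Table 1 can be illustrated on a small dense matrix. The following is only a minimal sketch in Python, not the CERES implementation: the function names and the toy matrix are hypothetical, the whole matrix is held in memory, and neither the Cauchy–Schwarz row screening nor the shell-based integral handling and disk storage are modelled. At each step it selects the largest residual diagonal, stops when that diagonal is at most δ [eq. (16)], and builds the new vector from the residual column [eqs. (12) and (13)].

```python
import math

def full_pivoting_cholesky(M, delta):
    """Return (vectors, pivots) for a symmetric positive semidefinite M."""
    n = len(M)
    diag = [M[i][i] for i in range(n)]   # residual diagonals D_II
    vectors, pivots = [], []
    while len(vectors) < n:
        # Full pivoting: always take the largest remaining residual diagonal.
        piv = max(range(n), key=lambda i: diag[i])
        if diag[piv] <= delta:           # stopping criterion, eq. (16)
            break
        l_pp = math.sqrt(diag[piv])
        # Residual column under the pivot, eq. (13), scaled as in eq. (12).
        new = [(M[i][piv] - sum(v[i] * v[piv] for v in vectors)) / l_pp
               for i in range(n)]
        for i in range(n):
            diag[i] -= new[i] ** 2       # update the residual diagonals
        diag[piv] = 0.0                  # the pivot is now represented exactly
        vectors.append(new)
        pivots.append(piv)
    return vectors, pivots

def max_reconstruction_error(M, vectors):
    """Largest |M_IJ - sum_K L^K_I L^K_J| over all elements, cf. eq. (10)."""
    n = len(M)
    return max(abs(M[i][j] - sum(v[i] * v[j] for v in vectors))
               for i in range(n) for j in range(n))

# Hypothetical 4 x 4 test matrix of rank 2 (not taken from the paper).
M = [[1.0, 0.0, 1.0, 2.0],
     [0.0, 1.0, 1.0, -1.0],
     [1.0, 1.0, 2.0, 1.0],
     [2.0, -1.0, 1.0, 5.0]]
vectors, pivots = full_pivoting_cholesky(M, 1e-10)
```

For this toy matrix the sketch stops after two vectors, with pivots taken in nonincreasing order of the residual diagonals, which mirrors how the number of vectors measures the effective rank for a given δ.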
number of shell-based integral calculations necessary to perform a CD via the strategy proposed here, by solving:

\[
m^{2} N_{th} = \frac{1}{4} \left( m^{2} + 2m + 1 \right) m^{2} \tag{20}
\]

which gives $N_{th} = (m^{2} + 2m + 1)/4$.

This indicates that to achieve an efficient CD of the 2e-SOC integrals via the strategy proposed here, basis sets including a large number of shells, each containing a small number of functions (e.g., one or two contractions per shell) will perform better than basis sets collecting a small number of shells, each containing a large number of functions (i.e., multiple contractions per shell). We also note that this observation holds true for an efficient CD of the ERIs, because the shell-driven estimation of the integrals also in this case can lead to the computation of a large number of nonsignificant columns of integrals.

Finally, we highlight that $N_{th}$ indicates savings in the integral evaluations when compared with a higher number of calculated Cholesky vectors, but cannot give clear information when correlated with a smaller number of calculated Cholesky vectors, because of the assumptions made in its definition.

Examples and Performance

In this section, we present some examples of CD for both ERIs and 2e-SOC integrals, carried out using the algorithm proposed above. Our code CERES has been compiled with the GCC 4.8.4 compiler at -O2 level of optimization, and all computations run on a single CPU core Intel Core(TM) i7-4790 at 3.60 GHz, DDR3 memory at 1600 MHz, SSHD at 5400 rpm rotation rate, and SATA 3.1 (6.0 Gb/s) connection. At first, the full-pivoting strategy has been tested for the ERIs decomposition for different molecular systems, comparing its computational performances with the traditional algorithm, as implemented in the MOLCAS package.[24] MOLCAS has been compiled with GNU Fortran, at the same optimization level as our CERES code.

The systems studied are benzene (1.) and cis-acrolein (2.), coupled with the Dunning correlation-consistent valence cc-pVTZ and cc-pVQZ basis sets,[33,34] respectively, and the [Dy(acac)3(H2O)2] complex, coupled with both the multiple-contracted ANO-RCC basis[35] (3.), and the segmented all-electron relativistically contracted (SARC) (4.) basis set. The ANO-RCC set is contracted to [8s7p4d3f2g1h] for Dy, [3s2p] for C and O, and [2s1p] for H. The SARC2-QZVP-DKH set[36] for Dy has been used in conjunction with the valence double-zeta 6–31G basis[37,38] for H, C, and O (hereafter just labeled as SARC2). The geometries for benzene and cis-acrolein come from the CCse set of geometries reported in Ref. [39], while the geometry for the Dy complex comes from its crystallographic structure, reported in Ref. [40].

The CD timing and number of Cholesky vectors obtained from the decomposition of the ERIs are reported in Table 1. The quite conservative δ = 10^-8 for benzene and cis-acrolein, and δ = 10^-6 for the [Dy(acac)3(H2O)2] complex, have been selected as CD thresholds (see Refs. [20] and [41] for more details on the δ influence on the energy accuracy). The implementation proposed here shows better performance than what is reported in the literature, with respect to both timing and disk usage. The number of Cholesky vectors calculated with the full-pivoting algorithm, which is now the true effective rank of the ERI matrix or the dimension of the Cholesky basis for the specified threshold δ, is in all cases significantly smaller than those calculated with the traditional algorithm. Moreover, the percentages of time dedicated to the integral calculations reported in Table 1 indicate that MOLCAS' algorithm spends a longer time on the integral calculation, mainly due to the shells-partitioning of the basis functions, discussed above, and the integral recalculations (i.e., 21% of the calculated shell-pairs are repeated for CD of 1., 24% for 2., 13% for 3., and 30% for 4.). The avoided integral recalculation in the algorithm proposed here, together with the smaller number of Cholesky vectors obtained from the decomposition, leads to a significant reduction of the decomposition times. Moreover, our algorithm shows better computational performances even completely neglecting the integral timings (i.e., 94, 188, 252, and 360 s for traditional algorithm decompositions vs. 69, 161, 108, and 98 s for full-pivoting decompositions, respectively), and it is noteworthy that CD is always less time consuming than the complete evaluations of the ERIs (see column DIRECT).

In Table 1, the full rank of the ERI matrix is reported together with the size of the basis set, and the partitioning in shells. As pointed out in the previous section, the partition into shells plays an important role in the speed up of the decomposition. The reduced form of the basis sets obtained by optimizing the general contractions (see Ref. [42]), with the functions grouped in a larger number of shells, has to be preferred to the

Table 2. Cholesky decomposition (CD) for 2e-SOC integrals.

                            DIRECT              CD
System  Basis size  Shells  Time                Full rank  Nth     δ       Vectors          Time       t%(ints)
1.      264         90      7.7 min (28 Gb)     209,088    2070    10^-4   1749 (1.3 Gb)    3.3 min    18%
                                                                   10^-5   2321 (1.8 Gb)    5.3 min    16%
                                                                   10^-6   3106 (2.6 Gb)    8.9 min    14%
                                                                   10^-7   3831 (3.5 Gb)    14.3 min   12%
                                                                   10^-8   4689 (4.5 Gb)    22.7 min   10%
2.      340         96      21.9 min (77 Gb)    346,800    2352    10^-4   2319 (2.8 Gb)    12.2 min   17%
                                                                   10^-5   2900 (4.0 Gb)    20.0 min   14%
                                                                   10^-6   3659 (5.3 Gb)    28.0 min   13%
                                                                   10^-7   4317 (6.8 Gb)    41.1 min   12%
                                                                   10^-8   4705 (8.3 Gb)    51.9 min   12%
3.      356         77      8.0 h (94 Gb)       380,208    1521    10^-4   2942 (3.5 Gb)    4.4 h      93%
                                                                   10^-5   3793 (5.6 Gb)    5.6 h      90%
                                                                   10^-6   5028 (8.1 Gb)    7.0 h      88%
                                                                   10^-7   6242 (10.9 Gb)   14.7 h     92%
                                                                   10^-8   6427 (13.5 Gb)   18.9 h     92%
4.      433         213     63.2 min (200 Gb)   562,467    11,449  10^-4   3721 (3.8 Gb)    18.3 min   27%
                                                                   10^-5   4774 (5.9 Gb)    29.7 min   23%
                                                                   10^-6   6196 (8.4 Gb)    48.0 min   19%
                                                                   10^-7   7919 (11.4 Gb)   77.7 min   16%
                                                                   10^-8   8510 (14.9 Gb)   114.2 min  13%

The results in the column DIRECT have been obtained by calculating and saving all the unique shell-quartets of indices for the integrals given by eq. (3).
Nth is defined in eq. (20).
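As an aside on how the threshold in Table 2 controls the number of Cholesky vectors, the following is a minimal sketch of a textbook diagonal-pivoted incomplete Cholesky decomposition in plain Python. It is not the shell-driven full-pivoting algorithm implemented in CERES: the function names `pivoted_cholesky` and `reconstruction_error`, the on-demand `column` callback, and the Gaussian-kernel test matrix are all illustrative assumptions. For a positive semidefinite matrix, stopping once the largest residual diagonal element falls below the threshold bounds every element of the reconstruction error by that threshold (Cauchy-Schwarz inequality on the residual), and lowering the threshold can only add vectors.

```python
import math


def pivoted_cholesky(diag, column, delta):
    """Diagonal-pivoted incomplete Cholesky of a symmetric PSD matrix M.

    diag   -- list of the diagonal elements M[i][i]
    column -- callable j -> list holding column j of M, evaluated on demand
    delta  -- stop when the largest residual diagonal element is <= delta
    Returns (vectors, pivots); vectors[k][i] is element i of Cholesky vector k.
    """
    n = len(diag)
    resid = list(diag)                  # residual diagonal, updated in place
    vectors, pivots = [], []
    while len(vectors) < n:
        # Pivot on the largest residual diagonal element.
        p = max(range(n), key=resid.__getitem__)
        if resid[p] <= delta:
            break
        col = column(p)                 # only pivot columns of M are requested
        scale = 1.0 / math.sqrt(resid[p])
        # Subtract the contribution of the previous vectors, then normalize.
        new = [(col[i] - sum(v[i] * v[p] for v in vectors)) * scale
               for i in range(n)]
        for i in range(n):
            resid[i] -= new[i] * new[i]
        vectors.append(new)
        pivots.append(p)
    return vectors, pivots


def reconstruction_error(matrix, vectors):
    """Largest absolute element of M - sum_k L_k L_k^T."""
    n = len(matrix)
    return max(abs(matrix[i][j] - sum(v[i] * v[j] for v in vectors))
               for i in range(n) for j in range(n))


# Toy PSD matrix (assumption): Gaussian kernel on 30 equally spaced points.
# The full matrix is built here only to check the reconstruction error.
n = 30
points = [i / (n - 1) for i in range(n)]
M = [[math.exp(-(x - y) ** 2) for y in points] for x in points]

for delta in (1e-2, 1e-4, 1e-6):
    vecs, piv = pivoted_cholesky([M[i][i] for i in range(n)],
                                 lambda j: [M[i][j] for i in range(n)],
                                 delta)
    print(delta, len(vecs), reconstruction_error(M, vecs))
```

The loop prints, for each threshold, the number of vectors and the largest reconstruction error, which mirrors the δ and Vectors columns of Table 2 on a toy scale.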
generally contracted form,[43–45] leading to a smaller number of shells. As evident from the number of shells reported in Table 1, the MOLCAS code uses the generally contracted forms. The best performance of our code for the ERI decomposition was achieved for [Dy(acac)3(H2O)2] using the SARC2 basis set, in which case CERES takes about 2.5 min using 433 functions partitioned in 213 shells, while MOLCAS takes about 10 min using 433 functions grouped in 77 shells. In all cases explored here, our code performs faster decompositions than MOLCAS, even when using the generally contracted form of the basis. The latter leads to a slightly different number of vectors for cis-acrolein (3566). It is also noteworthy that our proposed algorithm performs better for [Dy(acac)3(H2O)2] when using the ANO-RCC basis set, where both MOLCAS and CERES use the same shell partitioning of the basis functions. Finally, our proposed algorithm makes a more optimized use of disk space when storing the Cholesky vectors, leading to time savings for I/O operations, due both to the smaller number of calculated vectors as a consequence of full pivoting, and to the storage strategy proposed here.
The numbers of Cholesky vectors generated by decomposition of the matrices collecting the (p∇_a q|r∇_b s) integrals are shown in Table 2. Different thresholds have been chosen to study the computational performance of the proposed CD with respect to the complete calculation of the 2e-SOC integrals by eq. (3) (hereafter called direct evaluation). Here, the rank reduction obtained by CD is huge with respect to that achieved for the ERIs using the same threshold. The number of Cholesky vectors for the representation of the 2e-SOC integrals is never larger than twice the number of Cholesky vectors for the ERIs, while the increase in the full rank is about sixfold with respect to the rank of the ERI matrix. The CD of the 2e-SOC integrals is about 20–30 times more time consuming than the CD of ERIs.
Good computational performance is obtained when dealing with a large number of shells, as shown by the decompositions of the [Dy(acac)3(H2O)2] integrals using the SARC2 basis set. For this system, CD is computationally less expensive than the complete direct evaluation (64 min) when using thresholds higher than 10^-7. The number of calculated Cholesky vectors is always much smaller than the threshold number of columns Nth estimated via eq. (20) (11,449), which indicates that a significantly smaller number of integrals has been calculated by CD with respect to the integrals calculated by their complete evaluation using eq. (3). This is confirmed by the timings for the integral evaluations within CDs, reported in Table 2 as percentages of the whole CD timings, which are smaller than 22% of the direct evaluation time for all the δ thresholds shown in the table (e.g., about 14 min for δ = 10^-8 vs. 64 min).
The situation is different for the heavily contracted ANO-RCC basis set, where the number of Cholesky vectors (5028) is more than three times the threshold number of columns given by eq. (20) (1521). In this case, Nth cannot give clear information about possible savings in the integral calculations. From the numerical evaluation, the integral calculations performed during CD take about 4, 5, and 6 h for δ equal to 10^-4, 10^-5, and 10^-6, respectively, which are 50%, 63%, and 86% of the timings for the direct computation of the 2e-SOC integrals. These percentages still indicate savings in the integral calculations, but they are not as good as those obtained for system 4. Here, the vector decomposition takes less than 10% of the whole decomposition time, which means that the better performance shown by CD with respect to the direct evaluation by eq. (3), when using δ = 10^-4, 10^-5, and 10^-6, mainly comes from the savings in the integral computations. Conversely, CDs performed with smaller

thresholds compute an excessively large number of integrals, resulting in highly time-demanding decompositions.
The number of calculated Cholesky vectors is comparable to or larger than the Nth value for the results on the Dunning basis sets, which again does not indicate a clear saving in the integral evaluations. Differently from system 3., and similarly to system 4., here the integral calculations take less than 20% of the total CD timings, meaning that they only marginally influence the computational performance, and that the bottleneck of the decomposition is the calculation of the vectors. CD shows slightly smaller timings than the direct 2e-SOC evaluation when using δ = 10^-4 and 10^-5, while the decompositions are significantly more time consuming when adopting smaller thresholds.
Unfortunately, this preliminary study on the possible application of our proposed CD of 2e-SOC integrals, as currently implemented, does not show significant computational savings in a consistent manner. Partly, this is due to the fact that, in comparison to the ERI case, where the Cholesky matrix is defined by the same integrals that are needed for the direct evaluation of ERIs, the two Cholesky matrices for the 2e-SOC defined here contain more integrals than those needed for the direct calculation of 2e-SOC integrals, which can obviously lead to small time savings of the Cholesky approach with respect to direct evaluation using eq. (3). In fact, as discussed in the Computational Considerations section, for cases in which the rank reduction is not overwhelmingly favourable, there even exists the possibility for the Cholesky algorithm to compute a number of integrals that exceeds the total number of 2e-SOC integrals directly calculated via eq. (3). However, we still note that only the CD representation of SOC integrals allows (i) the storage of the smallest amount of data needed to numerically represent all the SOC integrals with absolute error smaller than 2δ, and (ii) the possibility of efficient AO to MO transformations by eq. (17).

Conclusions

In this article, we presented a new algorithm for the CD of two-electron integrals based on a full-pivoting design, and tested a newly proposed strategy to compute two-electron ERI and SOC integrals using the Cholesky representation.
The new algorithm shows very good performance in both time and storage savings for the case of ERIs. Full pivoting achieves efficiency by avoiding the recalculation of integrals that are usually discarded in CD algorithms due to the shell-driven integral evaluation, and leads to a univocal definition of the Cholesky basis once the CD threshold δ is fixed. The results for the CD of the ERIs achieved with the new algorithm show a significant reduction of the number of Cholesky vectors with respect to previous approaches, amounting to the true effective rank of the ERI matrix.
We also proposed a strategy to perform CD of 2e-SOC integrals, based on a previously unexplored expansion of the 2e-SOC integrals in terms of ERIs involving the derivatives of overlap distribution functions of both electrons, and showed that this strategy is expected to be efficient if the effective rank achieved via CD is lower than a threshold rank Nth, for which we report a rough estimate. We showed that 2δ is the upper limit for the errors affecting the reconstructed 2e-SOC integrals. The first tests of our algorithm show that CD of the 2e-SOC integrals is about 20–30 times more time consuming than CD of ERIs, and our analysis does not indicate that significant computational savings can always be achieved.
Finally, our results illustrate that the shell partitioning of the basis functions plays an important role in the speed-up of the decomposition, and that low-contracted shells have to be preferred. On this point, we noted significant differences in computational times when performing CD on integrals represented on optimized general rather than generally contracted basis sets.

Keywords: Cholesky · spin-orbit coupling · electron repulsion integrals · full-pivoting · decomposition

How to cite this article: M. Piccardo, A. Soncini. J. Comput. Chem. 2017, 38, 2775–2783. DOI: 10.1002/jcc.25062

[1] T. Helgaker, P. Jørgensen, J. Olsen, Molecular Electronic-Structure Theory; Wiley: New York, 2000.
[2] C. M. Marian, In Reviews in Computational Chemistry; Wiley, 2001; chapter 3, John Wiley & Sons, Inc., pp. 99–204.
[3] K. G. Dyall, K. Fægri, Introduction to Relativistic Quantum Chemistry; Oxford University Press, 2007.
[4] M. Reiher, A. Wolf, Relativistic Quantum Chemistry: The Fundamental Theory of Molecular Science; Wiley-VCH: Germany, 2015.
[5] B. A. Heß, C. M. Marian, U. Wahlgren, O. Gropen, Chem. Phys. Lett. 1996, 251, 365.
[6] B. Schimmelpfennig, AMFI - An Atomic Mean Field Integral Program, University of Stockholm, Stockholm, Sweden, 1996.
[7] F. Neese, J. Chem. Phys. 2005, 122, 034107.
[8] J. L. Whitten, J. Chem. Phys. 1973, 58, 4496.
[9] E. Baerends, D. Ellis, P. Ros, Chem. Phys. 1973, 2, 41.
[10] T. A. Pakkanen, J. L. Whitten, J. Chem. Phys. 1978, 69, 2168.
[11] M. Feyereisen, G. Fitzgerald, A. Komornicki, Chem. Phys. Lett. 1993, 208, 359.
[12] O. Vahtras, J. Almlöf, M. Feyereisen, Chem. Phys. Lett. 1993, 213, 514.
[13] F. Aquilante, T. B. Pedersen, R. Lindh, J. Chem. Phys. 2007, 126, 194106.
[14] F. Aquilante, L. Boman, J. Boström, H. Koch, R. Lindh, A. S. de Merás, T. B. Pedersen, In Linear-Scaling Techniques in Computational Chemistry and Physics; Springer: Netherlands, Dordrecht, 2011; pp. 301–343.
[15] Note Sur Une Méthode de Résolution des équations Normales Provenant de L'Application de la Méthode des Moindres Carrés a un Système D'équations Linéaires en Nombre Inférieur a Celui des Inconnues. Application de la Méthode a la Résolution D'un Système D. Bulletin Géodésique, 1924, 2, 67.
[16] N. H. F. Beebe, J. Linderberg, Int. J. Quantum Chem. 1977, 12, 683.
[17] I. Røeggen, E. Wisløff-Nilssen, Chem. Phys. Lett. 1986, 132, 154.
[18] D. W. O'Neal, J. Simons, Int. J. Quantum Chem. 1989, 36, 673.
[19] H. Koch, A. Sanchez de Meras, T. B. Pedersen, J. Chem. Phys. 2003, 118, 9481.
[20] F. Aquilante, R. Lindh, T. Bondo Pedersen, J. Chem. Phys. 2007, 127, 114107.
[21] I. Røeggen, T. Johansen, J. Chem. Phys. 2008, 128, 194107.
[22] F. Aquilante, T. B. Pedersen, R. Lindh, B. O. Roos, A. Sanchez de Meras, H. Koch, J. Chem. Phys. 2008, 129, 024113.
[23] F. Aquilante, P.-Å. Malmqvist, T. B. Pedersen, A. Ghosh, B. O. Roos, J. Chem. Theory Comput. 2008, 4, 694.
[24] F. Aquilante, L. De Vico, N. Ferré, G. Ghigo, P. Å. Malmqvist, P. Neogrády, T. B. Pedersen, M. Pitoňák, M. Reiher, B. O. Roos, L. Serrano-Andrés, M. Urban, V. Veryazov, R. Lindh, J. Comput. Chem. 2010, 31, 224.

[25] K. Aidas, C. Angeli, K. L. Bak, V. Bakken, R. Bast, L. Boman, O. Christiansen, R. Cimiraglia, S. Coriani, P. Dahle, E. K. Dalskov, U. Ekström, T. Enevoldsen, J. J. Eriksen, P. Ettenhuber, B. Fernández, L. Ferrighi, H. Fliegl, L. Frediani, K. Hald, A. Halkier, C. Hättig, H. Heiberg, T. Helgaker, A. C. Hennum, H. Hettema, E. Hjertenæs, S. Høst, I.-M. Høyvik, M. F. Iozzi, B. Jansík, H. J. A. Jensen, D. Jonsson, P. Jørgensen, J. Kauczor, S. Kirpekar, T. Kjærgaard, W. Klopper, S. Knecht, R. Kobayashi, H. Koch, J. Kongsted, A. Krapp, K. Kristensen, A. Ligabue, O. B. Lutnæs, J. I. Melo, K. V. Mikkelsen, R. H. Myhre, C. Neiss, C. B. Nielsen, P. Norman, J. Olsen, J. M. H. Olsen, A. Osted, M. J. Packer, F. Pawlowski, T. B. Pedersen, P. F. Provasi, S. Reine, Z. Rinkevicius, T. A. Ruden, K. Ruud, V. Rybkin, P. Sałek, C. C. M. Samson, A. S. de Merás, T. Saue, S. P. A. Sauer, B. Schimmelpfennig, K. Sneskov, A. H. Steindal, K. O. Sylvester-Hvid, P. R. Taylor, A. M. Teale, E. I. Tellgren, D. P. Tew, A. J. Thorvaldsen, L. Thøgersen, O. Vahtras, M. A. Watson, D. J. D. Wilson, M. Ziolkowski, H. Ågren, Wiley Interdiscip. Rev. Comput. Mol. Sci. 2014, 4, 269.
[26] T. Helgaker, P. R. Taylor, In Modern Electronic Structure Theory; D. R. Yarkony, Ed.; Vol. 2 of Advanced Series in Physical Chemistry; World Scientific Publishing Company: London, UK, 1995; Chapter 12.
[27] F. Aquilante, L. Gagliardi, T. B. Pedersen, R. Lindh, J. Chem. Phys. 2009, 130, 154107.
[28] F. Jensen, Introduction to Computational Chemistry; Wiley: Oxford, UK, 2016.
[29] F. Neese, J. Chem. Phys. 2005, 122, 034107.
[30] T. B. Pedersen, F. Aquilante, R. Lindh, Theor. Chem. Acc. 2009, 124, 1.
[31] A. Soncini, S. Calvello, M. Piccardo, S. V. Rao, CERES, An Ab Initio Quantum Chemistry Package for the Electronic Structure and Magnetic Properties Of Lanthanide Complexes, 2017, see also S. Calvello, M. Piccardo, S. V. Rao, A. Soncini, submitted.
[32] Q. Sun, J. Comput. Chem. 2015, 36, 1664.
[33] T. H. Dunning, J. Chem. Phys. 1989, 90, 1007.
[34] R. A. Kendall, T. H. Dunning, R. J. Harrison, J. Chem. Phys. 1992, 96, 6796.
[35] B. O. Roos, R. Lindh, P.-Å. Malmqvist, V. Veryazov, P.-O. Widmark, A. C. Borin, J. Phys. Chem. A 2008, 112, 11431.
[36] D. Aravena, F. Neese, D. A. Pantazis, J. Chem. Theory Comput. 2016, 12, 1148.
[37] R. Ditchfield, W. J. Hehre, J. A. Pople, J. Chem. Phys. 1971, 54, 724.
[38] W. J. Hehre, R. Ditchfield, J. A. Pople, J. Chem. Phys. 1972, 56, 2257.
[39] M. Piccardo, E. Penocchio, C. Puzzarini, M. Biczysko, V. Barone, J. Phys. Chem. A 2015, 119, 2058.
[40] S.-D. Jiang, B.-W. Wang, G. Su, Z.-M. Wang, S. Gao, Angew. Chem. Int. Ed. 2010, 49, 7448.
[41] F. Aquilante, R. Lindh, T. B. Pedersen, J. Chem. Phys. 2008, 129, 034106.
[42] T. Hashimoto, K. Hirao, H. Tatewaki, Chem. Phys. Lett. 1995, 243, 190.
[43] R. C. Raffenetti, J. Chem. Phys. 1973, 58, 4452.
[44] S. Huzinaga, M. Klobukowski, H. Tatewaki, Can. J. Chem. 1985, 63, 1812.
[45] S. Huzinaga, M. Klobukowski, Chem. Phys. Lett. 1985, 120, 509.

Received: 6 June 2017
Revised: 11 August 2017
Accepted: 24 August 2017
Published online on 25 September 2017
