VARIATIONAL BASIS OF UPDATE HESSIAN MATRICES

Greenstadt10 was the first to apply a variational principle to obtain update formulae, but he applied it to the H matrix rather than the B matrix. Here we briefly summarize this variational principle, but we apply it only to the B matrix.11 The QN condition is defined as1,16

$$\Delta g = (B + E)\,\Delta x = B'\Delta x. \qquad (6)$$

This condition is an approximation based on the first-order Taylor series expansion of $g_k$ with respect to the step vector $\Delta x$, conveniently rearranged. Greenstadt10 evaluated the $E$ matrix by minimizing the following weighted Euclidean norm,

$$N_W(E) = \mathrm{Tr}(W E W E^T), \qquad (7)$$

subject to the QN condition, eq. (6), rearranged as

$$j = E\,\Delta x = \Delta g - B\,\Delta x, \qquad (8)$$

and the symmetry condition $E - E^T = 0$. In eq. (7), Tr indicates the trace and $W$ is a symmetric and positive definite matrix. The problem is formulated using the following Lagrangian function,

$$F(E, p, L, W) = \tfrac{1}{2}\,\mathrm{Tr}(W E W E^T) + \mathrm{Tr}\big[(E\,\Delta x - j)\,p^T\big] + \mathrm{Tr}\big[L\,(E - E^T)\big], \qquad (9)$$

where the $p$ vector and the $L$ matrix are the Lagrangian multipliers. With these considerations the formula to update the $B'$ matrix is

$$B' = B + E^* = B + j u^T + u j^T - (j^T\Delta x)\,u u^T = B + [\,j \;\; u\,]\begin{pmatrix} 0 & 1 \\ 1 & -(j^T\Delta x) \end{pmatrix}[\,j \;\; u\,]^T, \qquad (10)$$

where the $E^*$ matrix is the solution of eq. (9), $u = (\Delta x^T M \Delta x)^{-1} M \Delta x$, and $M = W^{-1}$. In Appendix A we give a derivation of the most general inverse to eq. (10). It is important to emphasize that this variational principle says nothing about the form of the $W$ matrix, except for its symmetric and positive definite character. However, given suitable forms of the $W$ matrix, one can derive many update Hessian formulae, such as the Powell,12 MSP,11,13,14 and others.10,11

Equation (10) indicates that, from a variational point of view, the best general correction to the $B$ matrix is a rank-two matrix. Applied to a quadratic function, because the vector $j$ is conjugate to the previous directions, the $B'$ matrix will preserve the conjugacy condition if the $u$ vector is also conjugate to the previous directions.1,16 Equation (7) is a measure of the difference between $B'$ and $G'$. The value of this trace at the stationary point, $N_W(E^*)$, is11

$$N_W(E^*) = \frac{j^T W j}{\Delta x^T M \Delta x}\left[2 - \frac{(\Delta x^T j)^2}{\Delta x^T M \Delta x\; j^T W j}\right]. \qquad (11)$$

In many situations the $W$ matrix and, consequently, the $M$ matrix depend on a parameter,11 say $\phi$. The optimum value of this parameter is the one that minimizes $N_W(E)$. It can be obtained by differentiating eq. (9) with respect to $\phi$ at the stationary point $E^*, p^*, L^*$ and imposing the stationary condition; that is,

$$\frac{dN_W(E^*)}{d\phi} = \frac{dF\big(E(\phi)^*, p(\phi)^*, L(\phi)^*, W(\phi)\big)}{d\phi} = \frac{\partial E^T}{\partial\phi}\nabla_E F + \frac{\partial p^T}{\partial\phi}\nabla_p F + \frac{\partial L^T}{\partial\phi}\nabla_L F + \frac{\partial F}{\partial\phi} = 0. \qquad (12)$$

$F$ is stationary with respect to $E$, $p$, and $L$ at the point $E^*$, $p^*$, and $L^*$. Using this fact, one gets, after some rearrangement,

$$j^T W M' W j\,(\Delta x^T M \Delta x)^2 + \big[\Delta x^T M \Delta x\; j^T W j - (\Delta x^T j)^2\big]\,\Delta x^T M' \Delta x = 0, \qquad (13)$$

where $M'$ is the derivative of the $M$ matrix with respect to the $\phi$ parameter. The derivation of expression (13) uses the derivative of $WM = I$. Note that the term in brackets is greater than or equal to zero by the Cauchy–Schwarz inequality.

Assume that $\Delta x^T M \Delta x$ and $j^T W j$ are different from zero. If $[\Delta x^T M \Delta x\, j^T W j - (\Delta x^T j)^2] \neq 0$ and the $M'$ matrix is positive definite, the stationary condition is satisfied by the condition $j^T W M' W j = \Delta x^T M' \Delta x = 0$. On the other hand, if the term in brackets is zero, then the stationary condition must be satisfied by the single condition $j^T W M' W j = 0$. However, as will be seen below, these results have limited practical value.

DIRECT VARIATIONAL DEDUCTION OF BFGS FORMULA TO UPDATE B MATRIX

To the author's knowledge, all variational deductions of the BFGS formula to update the $B$ matrix were obtained indirectly. First the correction to the $H$ matrix is obtained using a formula like (10) for inverse updating, and after that one
evaluates its inverse.1,24,25 Here we present a direct variational deduction based on expression (10). Because the $E$ matrix represents a variation of the $B$ matrix, the best $W$ matrix should be an approximation to the following average integral,24

$$W = \frac{\int_0^1 G^{-1}(x_k + \alpha\,\Delta x)\,d\alpha}{\int_0^1 d\alpha}. \qquad (14)$$

A reasonable approximation to this average integral uses the $H$ and $H'$ matrices rather than the full variation of the true inverse Hessian matrix, $G^{-1}$, and replaces the integral by a discrete summation,

$$W = \frac{aH' + bH}{a + b}, \qquad a \ge 0,\; b \ge 0,\; a + b = 1. \qquad (15)$$

In this way the $M$ matrix should be an approximation of the average integral of the true Hessian matrix $G$; that is,

$$M = aB' + bB, \qquad a \ge 0,\; b \ge 0,\; a + b = 1. \qquad (16)$$

Note that the $M$ matrix is positive definite if $B$ and $B'$ are positive definite. In order to evaluate the parameters $a$ and $b$, we impose the following condition:

$$\Delta x^T M \Delta x = a\,\Delta x^T\Delta g + b\,\Delta x^T B \Delta x = (\Delta x^T\Delta g\;\Delta x^T B \Delta x)^{1/2}, \qquad a \ge 0,\; b \ge 0,\; a + b = 1, \qquad (17)$$

where eq. (6) has been used. Now from eq. (17) and the fact that $a + b = 1$, we get

$$a = \frac{1}{1 + \left(\dfrac{\Delta x^T\Delta g}{\Delta x^T B \Delta x}\right)^{1/2}} = \frac{(\Delta x^T B \Delta x)^{1/2}}{(\Delta x^T\Delta g)^{1/2} + (\Delta x^T B \Delta x)^{1/2}}, \qquad (18a)$$

$$b = \frac{1}{1 + \left(\dfrac{\Delta x^T B \Delta x}{\Delta x^T\Delta g}\right)^{1/2}} = \frac{(\Delta x^T\Delta g)^{1/2}}{(\Delta x^T\Delta g)^{1/2} + (\Delta x^T B \Delta x)^{1/2}}. \qquad (18b)$$

Substituting eqs. (18) into eq. (16) and the result into eq. (10), we obtain eq. (3) after some algebraic manipulation. From this deduction we can easily analyze the behavior of the BFGS formula. Because the $M$ matrix should be positive definite, eq. (16) requires that the $B$ and $B'$ matrices both be positive definite too. This means that the optimum situation for BFGS updating of the Hessian matrix is when the QNR algorithm is applied to minimize a function. In other conditions, such as in the optimization of a TS, the BFGS update formula degrades because in this case the $M$ matrix is not positive definite. Consequently, according to the present variational theory, eq. (3) cannot be used directly to optimize a TS.

However, other important conclusions can be drawn from this deduction. First, BFGS is based on an $M$ or $W$ matrix that is very close to the correct average of the variation of the $G^{-1}$ matrix. Second, the parameters $a$ and $b$ are functions of the ratio of two Rayleigh quotients, namely $\Delta x^T B \Delta x / \Delta x^T \Delta x$ and $\Delta x^T B' \Delta x / \Delta x^T \Delta x = \Delta x^T \Delta g / \Delta x^T \Delta x$, which take into account the behavior of $B$ and $B'$. These two points explain the good performance generally observed with the BFGS update formula when it is applied to the optimization of a minimum.

Finally, if one forces $a = 1$ in eq. (16), then $M = B'$, and substituting it in eq. (10) gives the DFP formula.10,11 On the other hand, forcing $b = 1$ gives $M = B$, which yields the dual Greenstadt formula.10,11 From these results we conclude that the BFGS formula is an average of the DFP and dual Greenstadt update formulae.

TS-BFGS UPDATE HESSIAN FORMULA

[…] modifying this formula to apply it to saddle point optimizations. The first serious, and perhaps the only, problem is that the $M$ matrix is not positive definite, as pointed out before. In BFGS the $u$ vector is

$$u = \frac{1}{(\Delta g^T\Delta x\;\Delta x^T B \Delta x)^{1/2}}\,(a\,\Delta g + b\,B\Delta x), \qquad (19)$$

which is clearly a linear combination of the $\Delta g$ and $B\Delta x$ vectors. Now, if we define the $M$ matrix in the following way,

$$M = (1 - \phi)\,|B| + \phi\,\Delta g\,\Delta g^T, \qquad (20)$$
where $|B|$ is the $B$ matrix with all its eigenvalues forced to be positive, that is,

$$|B| = \sum_{i=1}^{n} |\lambda_i|\, v_i v_i^T, \qquad (21)$$

where $\{\lambda_i, v_i\}_{i=1}^{n}$ are the eigenpairs of the $B$ matrix. Equation (20) can be seen as a rank-one correction to the $(1 - \phi)|B|$ matrix. Note that the $M$ matrix defined in this way is positive definite if $1 \ge \phi \ge 0$. The corresponding $W$ matrix is

$$W = \frac{1}{1 - \phi}\left[\,|H| - \frac{\phi}{1 - (1 - \Delta g^T |H| \Delta g)\,\phi}\;|H|\,\Delta g\,\Delta g^T |H|\,\right], \qquad (22)$$

where $|H| = (|B|)^{-1}$. Now the $u$ vector takes the following form:

$$u = \frac{(1 - \phi)\,|B|\,\Delta x + \phi\,(\Delta x^T\Delta g)\,\Delta g}{\Delta x^T |B| \Delta x + \big[(\Delta x^T\Delta g)^2 - \Delta x^T |B| \Delta x\big]\,\phi}. \qquad (23)$$

As in the BFGS formula, this $u$ is a linear combination of two vectors: the $B\Delta x$ vector, with $B$ forced to be positive definite (that is, $|B|\Delta x$), and the $\Delta g$ vector. Substituting eq. (23) into eq. (10) gives a new update formula, labeled $B'_{\text{TS-BFGS}}$; that is,

$$\begin{aligned} B'_{\text{TS-BFGS}} = B &+ \frac{(\Delta g - B\Delta x)\,\big[(1-\phi)|B|\Delta x + \phi(\Delta x^T\Delta g)\Delta g\big]^T}{\Delta x^T|B|\Delta x + \big[(\Delta x^T\Delta g)^2 - \Delta x^T|B|\Delta x\big]\phi} \\ &+ \frac{\big[(1-\phi)|B|\Delta x + \phi(\Delta x^T\Delta g)\Delta g\big](\Delta g - B\Delta x)^T}{\Delta x^T|B|\Delta x + \big[(\Delta x^T\Delta g)^2 - \Delta x^T|B|\Delta x\big]\phi} \\ &- \frac{\Delta x^T(\Delta g - B\Delta x)}{\big\{\Delta x^T|B|\Delta x + \big[(\Delta x^T\Delta g)^2 - \Delta x^T|B|\Delta x\big]\phi\big\}^2}\,\big[(1-\phi)|B|\Delta x + \phi(\Delta x^T\Delta g)\Delta g\big]\big[(1-\phi)|B|\Delta x + \phi(\Delta x^T\Delta g)\Delta g\big]^T. \end{aligned} \qquad (24)$$

To obtain the best $\phi$ parameter, one should substitute eqs. (20) and (22) and the derivative of the $M$ matrix with respect to $\phi$ into eq. (13). However, this gives a very complicated equation in $\phi$. In addition, the resulting value of $\phi$ sometimes does not fall in the domain $1 \ge \phi \ge 0$. Instead, we selected the parameter $\phi$ as the squared cosine of the angle between the $\Delta x$ and $j$ vectors, that is, $\phi = (\Delta x^T j)^2 / (\Delta x^T\Delta x\; j^T j)$. In this way the $\phi$ parameter falls in the correct domain.

TABLE I.
Starting and Final Geometrical Parameters for First Transition Structure of Reaction CH3CH2F + OH− → CH2CH2 + H2O + F−.

[Structure diagram: atoms numbered H1, O2, H3, C4, C5, F6, H7, H8, H9, H10]

Parameter      Starting    Final
H1O2            0.944      0.943
O2H3            1.500      1.559
H3C4            1.254      1.222
C4C5            1.472      1.503
C5F6            1.600      1.393
C5H7            1.112      1.126
C5H8            1.112      1.126
C4H9            1.110      1.111
C4H10           1.110      1.111
H3O2H1        101.8      101.8
C4H3O2        177.3      178.8
C5C4H3        108.2      108.4
F6C5C4        111.2      114.0
H7C5C4        114.0      109.9
H8C5C4        114.0      109.9
H9C4C5        112.2      111.2
H10C4C5       112.2      111.2
H1O2H3C4        0.0        5.0
O2H3C4C5        4.8      162.0
H3C4C5F6      180.0      180.0
H7C5C4F6      115.0      120.6
H8C5C4F6     −115.0     −120.6
H9C4C5H3      118.2      119.3
H10C4C5H3    −118.2     −119.2
Bond lengths in angstroms and bond angles and dihedrals in degrees.

PRACTICAL IMPLEMENTATION OF TS-BFGS UPDATE HESSIAN FORMULA

The update Hessian formula just derived is adequate for any algorithm of the type QA/TRIM or RQNR, briefly described above, used with the eigenvector following method.20 This is because one needs the full diagonalization of the current $B$ matrix. Rather than evaluating $B'_{\text{TS-BFGS}}$ with eq. (24), we employ the following algorithm:

Reset the vector $u = 0$.
Compute $u \leftarrow \phi\,\Delta g\,(\Delta g^T\Delta x)$ and after that

$$u \leftarrow u + (1 - \phi)\sum_{i=1}^{n} |\lambda_i|\, v_i\,(v_i^T\Delta x). \qquad (25)$$
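The steps above can be sketched in NumPy as follows. The sketch makes these assumptions:

- $B$ is symmetric with no zero eigenvalues.
- $\phi$ is the squared cosine between $\Delta x$ and $j$, as selected above.
- The accumulated vector of eq. (25) is divided by $u^T\Delta x$. That sum equals the denominator of eq. (23), so after the division the vector matches eq. (23).

The function name `ts_bfgs_update` and the small-norm guard are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def ts_bfgs_update(B, dx, dg, tiny=1e-12):
    """Return the TS-BFGS updated Hessian (hypothetical helper, not the authors' code).

    B  : current symmetric Hessian approximation, shape (n, n)
    dx : step Delta x, shape (n,)
    dg : gradient change Delta g, shape (n,)
    """
    j = dg - B @ dx                                   # eq. (8)
    jj, xx = j @ j, dx @ dx
    if jj < tiny or xx < tiny:                        # QN condition already (nearly) satisfied
        return B.copy()
    phi = (dx @ j) ** 2 / (xx * jj)                   # squared cosine of angle(dx, j)

    # Full diagonalization of B; |B| of eq. (21) uses |lambda_i|.
    lam, V = np.linalg.eigh(B)

    # Algorithm of eq. (25): accumulate M dx with M from eq. (20).
    u = phi * dg * (dg @ dx)
    u = u + (1.0 - phi) * (V @ (np.abs(lam) * (V.T @ dx)))

    # Divide by dx^T M dx (= u^T dx) so that u matches eq. (23).
    u = u / (u @ dx)

    # Rank-two correction of eq. (10).
    return B + np.outer(j, u) + np.outer(u, j) - (j @ dx) * np.outer(u, u)
```

The sketch reuses eq. (10) unchanged, so the result is symmetric and satisfies the QN condition $B'\Delta x = \Delta g$ for any inertia of $B$. Only the choice of $u$ distinguishes it from BFGS.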
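Equation (22) is the Sherman–Morrison inverse of the rank-one-corrected $M$ of eq. (20). The short check below confirms that $MW = I$. It uses a hypothetical helper name and a random positive definite matrix standing in for $|B|$. It takes a value of $\phi$ strictly between 0 and 1, because at $\phi = 1$ the prefactor $1/(1-\phi)$ is undefined.

```python
import numpy as np


def w_matrix_eq22(absB, dg, phi):
    """W = M^{-1} for M = (1 - phi)|B| + phi dg dg^T, written as in eq. (22)."""
    absH = np.linalg.inv(absB)                    # |H| = (|B|)^{-1}
    Hg = absH @ dg                                # |H| dg
    denom = 1.0 - (1.0 - dg @ Hg) * phi
    return (absH - (phi / denom) * np.outer(Hg, Hg)) / (1.0 - phi)


rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
absB = A @ A.T + 4.0 * np.eye(4)                  # stand-in positive definite |B|
dg = rng.standard_normal(4)
phi = 0.3
M = (1.0 - phi) * absB + phi * np.outer(dg, dg)   # eq. (20)
W = w_matrix_eq22(absB, dg, phi)
print(np.allclose(M @ W, np.eye(4)))              # True
```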
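The variational statements behind eqs. (10) and (11) can also be checked numerically. The sketch below uses hypothetical helper names and makes three checks for a random positive definite $M = W^{-1}$ and a random symmetric $B$:

- It builds $E^*$ from eq. (10) and confirms the symmetry and QN conditions.
- It compares $\mathrm{Tr}(WE^*WE^{*T})$ with the closed form of eq. (11).
- It confirms that adding any symmetric perturbation $D$ with $D\Delta x = 0$ does not lower the norm of eq. (7). Such a $D$ keeps eqs. (6) and (8) satisfied.

```python
import numpy as np


def variational_correction(B, dx, dg, M):
    """E* of eq. (10) for a symmetric positive definite M = W^{-1}."""
    j = dg - B @ dx                                   # eq. (8)
    u = (M @ dx) / (dx @ M @ dx)                      # u = (dx^T M dx)^{-1} M dx
    return np.outer(j, u) + np.outer(u, j) - (j @ dx) * np.outer(u, u)


def weighted_norm(E, W):
    """N_W(E) = Tr(W E W E^T), eq. (7)."""
    return np.trace(W @ E @ W @ E.T)


def norm_eq11(B, dx, dg, M):
    """Closed form of N_W(E*) from eq. (11)."""
    j = dg - B @ dx
    W = np.linalg.inv(M)
    c, w = dx @ M @ dx, j @ W @ j
    return (w / c) * (2.0 - (dx @ j) ** 2 / (c * w))


rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n))
M = A @ A.T + n * np.eye(n)                           # random SPD M
W = np.linalg.inv(M)
S = rng.standard_normal((n, n))
B = (S + S.T) / 2.0
dx, dg = rng.standard_normal(n), rng.standard_normal(n)

E = variational_correction(B, dx, dg, M)
print(np.allclose((B + E) @ dx, dg), np.allclose(E, E.T))     # QN condition, symmetry
print(np.isclose(weighted_norm(E, W), norm_eq11(B, dx, dg, M)))  # eq. (11)

# Feasible perturbation: symmetric D with D dx = 0 (projected onto the dx-orthogonal space).
P = np.eye(n) - np.outer(dx, dx) / (dx @ dx)
R = rng.standard_normal((n, n))
D = P @ (R + R.T) @ P
print(weighted_norm(E + D, W) >= weighted_norm(E, W))          # E* is the constrained minimum
```

The last check works because the cross term $\mathrm{Tr}(WE^*WD)$ vanishes for every such $D$. Hence $N_W(E^* + D) = N_W(E^*) + N_W(D)$.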
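The statement that eqs. (16) to (18) inserted into eq. (10) reproduce the BFGS formula can be checked in the same way. This section does not reproduce eq. (3), so the sketch assumes it is the standard BFGS Hessian update, $B' = B + \Delta g\Delta g^T/(\Delta g^T\Delta x) - B\Delta x\Delta x^T B/(\Delta x^T B\Delta x)$. It also assumes $\Delta x^T\Delta g > 0$ and $\Delta x^T B\Delta x > 0$, so that the square roots in eqs. (18) are real. The function names are hypothetical.

```python
import numpy as np


def bfgs_from_eq10(B, dx, dg):
    """Eq. (10) with M = a B' + b B and the weights of eqs. (18a)-(18b)."""
    p = dx @ dg                                       # dx^T dg = dx^T B' dx (eq. 6)
    q = dx @ B @ dx                                   # dx^T B dx
    a = np.sqrt(q) / (np.sqrt(p) + np.sqrt(q))        # eq. (18a)
    b = np.sqrt(p) / (np.sqrt(p) + np.sqrt(q))        # eq. (18b)
    u = (a * dg + b * (B @ dx)) / np.sqrt(p * q)      # eq. (19)
    j = dg - B @ dx                                   # eq. (8)
    return B + np.outer(j, u) + np.outer(u, j) - (j @ dx) * np.outer(u, u)


def bfgs_standard(B, dx, dg):
    """Textbook BFGS Hessian update (assumed to be the paper's eq. 3)."""
    Bdx = B @ dx
    return B + np.outer(dg, dg) / (dg @ dx) - np.outer(Bdx, Bdx) / (dx @ Bdx)


rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
B = A @ A.T + np.eye(4)                  # positive definite B
G = A.T @ A + 2.0 * np.eye(4)            # positive definite stand-in for the true Hessian
dx = rng.standard_normal(4)
dg = G @ dx                              # guarantees dx^T dg > 0
print(np.allclose(bfgs_from_eq10(B, dx, dg), bfgs_standard(B, dx, dg)))  # True
```

Algebraically, the weights of eqs. (18) make the cross terms of eq. (10) cancel. What remains is $\Delta g\Delta g^T/\Delta x^T\Delta g - B\Delta x\Delta x^T B/\Delta x^T B\Delta x$.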
TABLE II.
Behavior of TS-BFGS, MSP, and Powell Update Formulae along Optimization Process of First Transition Structure for Reaction CH3CH2F + OH− → CH2CH2 + H2O + F−.

[…] where $\phi = a$. Now the problem is to find the best $\phi$ parameter, that is, the $\phi$ parameter that minimizes $N_W(E^*)$. We evaluate the inverse matrix $W$ and the derivative $M'$ with respect to $\phi$ for the $M$ matrix defined in eq. (26). Substituting the results into eq. (13), we get

$$\phi\,\frac{(\Delta x^T j)^2}{\Delta x^T\Delta x\; j^T j} = \phi\cos^2\theta = 0, \qquad 1 \ge \phi \ge 0, \qquad (27)$$

where $\theta$ is the angle formed between the vectors $j$ and $\Delta x$. Equation (27) tells us that when $\cos^2\theta \neq 0$, the optimum $\phi$ is $\phi = 0$, which corresponds to the MS update.11 On the other hand, if $\cos^2\theta = 0$, any $\phi$ between 0 and 1 is, in principle, optimum. This latter situation very often occurs near convergence, where the MS update is not the optimum.1 Consequently, one should take any $\phi$ such that $1 \ge \phi > 0$. From a practical point of view, when $\cos^2\theta \le \epsilon$, $\epsilon$ being a small number, the parameter $\phi$ is selected as $\phi = (1 - \cos^2\theta)^{1/2}$.11,13 A suitable choice for $\epsilon$ is 0.05. Obviously, if $\cos^2\theta \le \epsilon$ along the optimization process, the MSP update coincides with the Powell update, because in this case $\phi \approx 1$ and, consequently, $a \approx 1$.

Now we present the behavior and performance of the BFGS, TS-BFGS, MSP, and Powell update formulae in the location and optimization of TSs of several reactions. The calculations were performed at the AM1 (ref. 26) and ab initio Hartree–Fock levels. The optimizations were carried out in internal coordinates. The algorithms used to optimize the TSs were the RQNR described in ref. 13 for the AM1 calculations and the QA/TRIM algorithm18 […]

[Structure diagram: atoms numbered H3, C4, C5, F6, H7, H8, H9, H10]

Parameter      Starting    Final
H1O2            0.962      0.963
O2H3            0.966      0.965
H3C4            2.433      2.534
C4C5            1.334      1.336
C5F6            2.500      2.420
C5H7            1.100      1.100
C5H8            1.100      1.104
C4H9            1.097      1.096
C4H10           1.096      1.096
H3O2H1        102.5      102.4
C4H3O2        111.5      106.2
C5C4H3         81.3       74.5
F6C5C4        137.7      129.9
H7C5C4        123.5      122.4
H8C5C4        123.7      124.0
H9C4C5        122.8      123.0
H10C4C5       122.7      122.4
H1O2H3C4        0.0       −5.0
O2H3C4C5      179.3      156.5
H3C4C5F6      178.7      144.9
H7C5C4F6       87.6       99.9
H8C5C4F6      −85.8      −74.2
H9C4C5H3       92.6      116.2
H10C4C5H3     −91.4      −68.5
Bond lengths in angstroms and bond angles and dihedrals in degrees.
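The practical rule for the MSP parameter stated with eq. (27) can be sketched in plain Python:

- If $\cos^2\theta \le \epsilon$ (with $\epsilon = 0.05$), take $\phi = (1 - \cos^2\theta)^{1/2}$.
- Otherwise take $\phi = 0$ (the MS update), which is our reading of eq. (27).

The function name `msp_phi` is hypothetical. The sketch stops at $\phi$ because this section does not reproduce the $M$ matrix of eq. (26).

```python
import math


def msp_phi(dx, j, eps=0.05):
    """Select phi following the practical rule described with eq. (27).

    dx : step vector Delta x (sequence of floats)
    j  : vector j = Delta g - B Delta x (sequence of floats)
    eps: threshold on cos^2(theta); 0.05 is the value suggested in the text
    """
    dot = sum(a * b for a, b in zip(dx, j))
    norm2_dx = sum(a * a for a in dx)
    norm2_j = sum(b * b for b in j)
    cos2 = dot * dot / (norm2_dx * norm2_j)   # cos^2 of the angle between j and dx
    if cos2 <= eps:
        # j nearly orthogonal to dx: phi close to 1 (Powell-like behavior)
        return math.sqrt(1.0 - cos2)
    # otherwise eq. (27) gives phi = 0, i.e., the MS update
    return 0.0


print(msp_phi([1.0, 0.0], [0.0, 2.0]))   # 1.0  (cos^2 = 0)
print(msp_phi([1.0, 0.0], [3.0, 0.0]))   # 0.0  (cos^2 = 1)
```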
[…] the absolute value of the energy difference between two consecutive iterations, $|f(x_{k+1}) - f(x_k)|$. The threshold values were $1.8 \cdot 10^{-3}$ Å, $1.2 \cdot 10^{-3}$ Å, $1.5 \cdot 10^{-1}$ kcal/(mol Å), $7.5 \cdot 10^{-2}$ kcal/(mol Å), and $5.0 \cdot 10^{-4}$ kcal/mol, respectively, in the units used by the MOPAC program. For the ab initio optimizations, the standard convergence criteria of the GAMESS program were used. In the GAMESS program a value of $7 \cdot 10^{-3}$ was taken for the $\epsilon$ parameter.

ELIMINATION REACTION CH3CH2F + OH− → CH2=CH2 + H2O + F−

This reaction was studied by Cummins and Gready29 with the AM1 method.26 The wave function employed was restricted Hartree–Fock. The overall reaction occurs through two transition structures, as found for other elimination reactions.30 In Table I we present the geometrical parameters of the initial and optimized molecular geometries calculated for the first TS. In Table II we compare the behavior of the optimization process using different update formulae. With the TS-BFGS update Hessian formula, the algorithm converges within 28 iterations. With the Powell update formula, the algorithm converges within 14 iterations; however, the stationary point reached has two imaginary frequencies, 331.3i cm−1 and 39.5i cm−1. On the other hand, with the MSP update formula the algorithm needs 31 iterations to converge to the correct TS. The correct optimized geometrical parameters are those shown in Table I. At the final stationary point, the imaginary frequency 305.3i cm−1 is associated with the true transition vector. Finally, we note that with the standard BFGS formula, eq. (3), the algorithm diverges.
TABLE IV.
Behavior of TS-BFGS, MSP, and Powell Update Formulae along Optimization Process of Second Transition Structure for Reaction CH3CH2F + OH− → CH2CH2 + H2O + F−.

TABLE V.
Starting and Final Geometrical Parameters of Transition Structure for Reaction CH3OOH → CH3 + OOH.

[Structure diagram with atom numbering]

Parameter      Starting    Final
H1O2            1.010      1.009
O2O3            1.177      1.178
O3C4            2.800      2.683
C4H5            1.086      1.087
C4H6            1.086      1.086
C4H7            1.088      1.086
H1O2O3        112.6      112.5
O2O3C4        119.5      117.3
O3C4H5         70.0       68.8
O3C4H6         92.3       75.9
O3C4H7        107.8      125.0
H1O2O3C4       88.2       90.4
O2O3C4H5      174.5      177.5
H5O3C4H6     −120.9     −127.5
H5O3C4H7      116.8      116.0
Bond lengths in angstroms and bond angles in degrees.

TABLE VI.
Behavior of TS-BFGS, MSP, and Powell Update Formulae along Optimization Process of Transition Structure for Reaction CH3OOH → CH3 + OOH.

DISSOCIATION REACTION OF H2CO → H2 + CO

This reaction was studied by Cerjan and Miller31 at the Hartree–Fock ab initio level with the STO-2G […]

[…] formula, the Hessian matrix does not have the desired eigenvalue spectrum in the first four iterations. For the MSP update, the Hessian matrix is positive definite in the first four iterations and at iteration 11. Also, with the Powell update the Hessian matrix does not present the correct structure in the first four iterations. With all three update formulae the optimization process converges to the same molecular geometry, with insignificant differences. Finally, with the standard BFGS formula the process does not converge.

METHOXY RADICAL ISOMERIZATION, CH3O → CH2OH

This reaction was studied by Culot et al.18b at the Hartree–Fock ab initio level. Here we report the results for the same reaction within Cs symmetry using the unrestricted Hartree–Fock wave function with the STO-3G basis set. In Table IX we show the geometry and in Table X the behavior of the three updated Hessians. The initial Hessian matrix is positive definite, but just after the first correction the Hessian achieves the correct eigenvalue spectrum in all three cases. The three update formulae converge to the same molecular geometry. With the standard BFGS formula, eq. (3), the algorithm diverges.

Concluding Remarks

We presented a BFGS-like updated Hessian formula to locate TSs. This TS-BFGS formula is a modification of the standard BFGS formula in the following sense. The $u$ vector, which is present in any updating Hessian formula, is here a function of both the $B\Delta x$ and $\Delta g$ vectors, with the $B$ matrix forced to be positive definite as in the standard BFGS formula. This update is reasonably stable and efficient and is quite competitive with the usual update formulae used to locate TSs, such as the Powell and MSP formulae.

Finally, we make the following consideration. The BFGS update formula is the best formula to update Hessian matrices in a minimization algorithm, so the TS-BFGS formula can be seen as its analogue for optimizing saddle points. Since the TS-BFGS formula shows the same performance as the Powell and MSP update formulae, it is likely that the Powell and MSP formulae are the best updates that can be formulated to optimize saddle points.
[Structure diagram: atoms numbered C1, O2, H3, H4, H5]

TABLE VIII.
Behavior of TS-BFGS, MSP, and Powell Update Formulae along Optimization Process of Transition Structure for Reaction H2CO → H2 + CO.

TABLE X.
Behavior of TS-BFGS, MSP, and Powell Update Formulae along Optimization Process of Transition Structure for Reaction CH3O → CH2OH.