A Statistical Analysis of the Precision of Reweighting-Based Simulations
Various advanced simulation techniques, which are used to sample the statistical ensemble of
systems with complex Hamiltonians, such as those found in condensed matter and biomolecular
systems, rely heavily on successfully reweighting the sampled configurations. The sampled points of
a system at an elevated temperature or with a modified Hamiltonian are reused with
different statistical weights to evaluate its properties at the desired temperature or for the
original Hamiltonian. Often, the decrease in accuracy induced by this procedure is ignored and the
final results can be far from what is expected. We have addressed the reasons behind this
phenomenon and have provided a quantitative method to estimate the number of sampled points
required in the crucial reweighting step of these advanced simulation methods. We also provide
examples from temperature histogram reweighting and accelerated molecular dynamics reweighting
to illustrate this idea, which can be generalized to dynamic reweighting as well. The study shows
that this analysis may provide a priori guidance for setting up the parameters of
advanced simulations before a lengthy one is carried out. The method can therefore provide insights
for optimizing the parameters of high-accuracy simulations with a finite amount of computational
resources. © 2008 American Institute of Physics. [DOI: 10.1063/1.2944250]
$[\ln(N + \sqrt{N}) - \ln(N)]$. Finally, reversing the above equation, the average sampling number
$N$ required for the desired accuracy of measurement is expressed as

$$N = (e^{\epsilon} - 1)^{-2}. \qquad (1)$$

Ideally, the actual sampling number should be much larger than $N$ to satisfy the desired accuracy
$\epsilon$. Evidently, for a profile $N(r)$, the bottleneck of the profile has the smallest $N$ and
thus the highest free energy; quite often, that location of $r$ indicates the transition state of
the profile.

Why might reweighting increase the error of a simulation? Reweighting tags each of the initially
equally weighted data points with a different weight. As a result, some points have very large
weights, whereas others may have small weights. The effective number of independent measures thus
decreases because many points with small weights are no longer relevant, and the results are
dictated by the points with large weights. A reweighting procedure changes a collection of sample
points: the $i$th sample point is weighted by $w_i$. This situation is shown schematically in
Fig. 1. An altered dynamics simulation is performed to obtain the statistics fol-

This expression is the first central result of this article. The idea of using
$N_x \times p(s) = 1$ to identify the extreme $s$ has the same origin as the cutoff of the energy
population of the REM (Ref. 11) and the onset of the glass temperature for a complex system
(Refs. 12 and 13).

A shorthand notation $\delta s := \Delta \times \sqrt{\ln[N_x^2 / (2\Delta^2)]}$ is adopted; note
that it explicitly depends on $N_x$ as well. Factors $s_i$ that are much lower than $s_h$ carry
very little relevance compared with those close to $s_h$, and thus contribute little to the
effective number of data points $N_e$ that remain useful after reweighting. This point is
illustrated in Fig. 2. Thus the second essential aspect of the article is to define this effective
number $N_e$. The effective number at a specific location or bin is quantitatively expressed as

$$N_e = N_x \times \int_{s_l}^{s_h} p(s)\, \exp(s - s_h)\, ds, \qquad (3)$$

which can be further reduced to error functions in the case of a Gaussian $p(s)$, that is,
034103-3 Reweighting-based simulations J. Chem. Phys. 129, 034103 (2008)
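$$\frac{N_e}{N_x} = \frac{1}{2}\, e^{\Delta^2/2 - \delta s} \left[ \operatorname{erf}\!\left( \frac{\delta s - \Delta^2}{\sqrt{2}\,\Delta} \right) + \operatorname{erf}\!\left( \frac{\delta s + \Delta^2}{\sqrt{2}\,\Delta} \right) \right]. \qquad (4)$$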
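As a rough numerical sketch of Eqs. (1), (3), and (4), the Python snippet below evaluates the closed-form ratio $N_e / N_x$ of Eq. (4) and compares it with a direct trapezoidal integration of Eq. (3) for a Gaussian $p(s)$. The function names, the symmetric limits $s_l = \bar{s} - \delta s$ and $s_h = \bar{s} + \delta s$, and the sample values of $\Delta$ and $\delta s$ are illustrative assumptions, not values taken from the article; $\delta s$ is passed in directly rather than computed from $N_x$.

```python
import math


def required_samples(eps):
    """Eq. (1): sampling number N needed for accuracy eps, N = (e^eps - 1)^-2."""
    return math.expm1(eps) ** -2


def gaussian_pdf(s, mean, width):
    """Normalized Gaussian p(s) with mean `mean` and standard deviation `width`."""
    z = (s - mean) / width
    return math.exp(-0.5 * z * z) / (width * math.sqrt(2.0 * math.pi))


def ratio_closed_form(width, delta_s):
    """Eq. (4): N_e / N_x for Gaussian p(s).

    Assumes symmetric limits s_l = mean - delta_s and s_h = mean + delta_s.
    """
    root2w = math.sqrt(2.0) * width
    prefactor = 0.5 * math.exp(0.5 * width ** 2 - delta_s)
    return prefactor * (
        math.erf((delta_s - width ** 2) / root2w)
        + math.erf((delta_s + width ** 2) / root2w)
    )


def ratio_numerical(width, delta_s, mean=0.0, n_steps=20000):
    """Eq. (3) divided by N_x, via trapezoidal integration of p(s) exp(s - s_h)."""
    s_l, s_h = mean - delta_s, mean + delta_s
    h = (s_h - s_l) / n_steps

    def integrand(s):
        return gaussian_pdf(s, mean, width) * math.exp(s - s_h)

    total = 0.5 * (integrand(s_l) + integrand(s_h))
    for k in range(1, n_steps):
        total += integrand(s_l + k * h)
    return total * h


if __name__ == "__main__":
    print("N for eps = 0.1:", required_samples(0.1))
    for width in (0.5, 1.0, 2.0):
        delta_s = 3.0 * width  # illustrative cutoff, not from the article
        print(width, ratio_closed_form(width, delta_s), ratio_numerical(width, delta_s))
```

With these illustrative inputs, the two evaluations of the ratio should agree closely, and the ratio shrinks as the width $\Delta$ grows.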
Normally $\operatorname{erf}[(\delta s + \Delta^2) / (\sqrt{2}\,\Delta)] \approx 1$ when the effect
of the lower bound $s_l$ can be ignored. Finally, the sampling length of the whole simulation, $N$,
in terms of the number of time steps, is related to $n_r$ and the accuracy $\epsilon$ as

$$\frac{N}{n_r} \times \exp(-\Delta F) \times \int_{s_l}^{s_h} p(s)\, e^{s - s_h}\, ds = (e^{\epsilon} - 1)^{-2}. \qquad (5)$$

Here, $n_r$ is the inverse of the data collection frequency and should be large enough to ensure
the independence of the sample points. Evidently, the more spread out the distribution $p(s)$ is,
the smaller the effective number of sample points will be. Only an ideal $\delta$-function
distribution $p(s) = \delta(s - \bar{s})$ keeps the original number of data points.

Figure 3 plots the relation between the effective number and the original number before
reweighting for several sets of parameters of the distribution $p(s)$. We found that, in general,
for a Gaussian distribution $p(s)$, the effective number $N_e$ increases with the total input
number $N_x$. However, the ratio $N_e / N_x$ decreases with increasing $N_x$. Thus, for such a
situation, it can be concluded that although increasing the total number of sample points improves
the precision for the target system, the increase is sublinear. Note that the sublinear behavior
arises from the nature of the Gaussian distribution. When one samples the conformations of the
system very thoroughly, one will eventually find a deviation of the realistic $p(s)$ from an ideal
Gaussian distribution, and that deviation will cause the distribution to be more tightly bounded
than a Gaussian. The sampling efficiency will then be linear in the number of input points. Still,
in practice, for a large complex system, a Gaussian distribution of the reweighted scaling factors
is always encountered unless an extremely long sampling or a very subtle perturbation is used.

FIG. 3. (a) The effective number $N_e$ as a function of the input number $N_x$ for three values of
$\Delta$ = 0.5, 1.0, 2.0. (b) The ratio $N_e / N_x$ as a function of $\Delta$ for three values of
$N_x$.

III. EXAMPLES AND DISCUSSIONS

The examples used in this article are the sampling of conformations of a small peptide, alanine
dipeptide, which has been rigorously studied by various computational methods over the last three
decades. This system was chosen because it is simple, yet nontrivial. Any accurate calculation on a
practical model requires both good sampling and an accurate Hamiltonian (the so-called molecular
force field). We stress that we are only concerned with the efficiency of different sampling
methods, rather than whether the subtle features presented under sufficient sampling of this
particular model Hamiltonian [AMBER 8.0 (Ref. 14), PARM99 (Ref. 15), and generalized Born solvation
(Ref. 16)] reflect results from experiments. Generally speaking, when the sampling problem that
clouds simulations of complex systems has been lifted, one can then focus on the force-field
problem. We also want to stress that the peptide system was picked for the error analysis not
because of a tremendous sampling problem in the peptide simulation field but because it is probably
one of the systems that is most accurately sampled and most directly comparable to experiments in
the biomolecular simulation field (Ref. 17). This makes it easy to compare the results of advanced
methods with those of the direct method, which is difficult for even more complicated and
sampling-problem-ridden systems. We would also like to stress that the levels of perturbation used
in all the examples are relatively small. As a result, we have very good energy overlap between the
target and original Hamiltonians.

The property calculated here is the two-dimensional (2D) free energy profile of the protein
backbone torsional
034103-4 T. Shen and D. Hamelberg J. Chem. Phys. 129, 034103 (2008)
FIG. 5. (Color) The distribution of the reweighting factor $p(s)$ for four sets of energy
reweighting and two sets of TR simulations.

tate the exploration of the phase space of the system but at the same time suffers much more during
the reweighting step. The main reason is that many logged visits of the phase space do not count as
much as the few dominating points, which have high weights.

From these ideas, a model was built to define the effective number after reweighting. This article
presents a method to estimate the level of accuracy of free energy calculations based on
perturbation calculations and reweighting procedures, using the distribution of the reweighting
factor $p(s)$. This distribution is itself typically much easier to obtain, making the method
potentially suitable for the a priori design of a long simulation.

Though only demonstrated in two types of situations for the calculation of equilibrium properties,
namely reweighting of an altered energy function and of temperature, it has been clearly pointed
out that the analysis is equally suited to a more general situation, such as the reweighting of
dynamical properties. We also discussed a related issue: the free energy perturbation calculation
by the cumulant expansion method. It is becoming increasingly evident that these reweighting issues
are going to be an indispensable part of many simulation (or even some experimental) methods.

ACKNOWLEDGMENTS

We would like to thank Dr. J. A. McCammon, Dr. S. Gnanakaran, and Dr. P. G. Wolynes for their kind
support and encouragement, and M. Fajer and Dr. C. Zong for reading the manuscript.

1 D. Frenkel and B. Smit, Understanding Molecular Simulation: From Algorithms to Applications, 2nd ed. (Academic, San Diego, 2002).
2 D. P. Landau and K. Binder, A Guide to Monte Carlo Simulations in Statistical Physics, 2nd ed. (Cambridge University Press, Cambridge, 2005).
3 M. P. Allen and D. J. Tildesley, Computer Simulation of Liquids (Oxford University Press, Oxford, 1997).
4 J. A. McCammon and S. C. Harvey, Dynamics of Proteins and Nucleic Acids (Cambridge University Press, Cambridge, 1987).
5 H. Grubmüller, Phys. Rev. E 52, 2893 (1995).
6 A. F. Voter, Phys. Rev. Lett. 78, 3908 (1997).
7 U. H. E. Hansmann and Y. Okamoto, Phys. Rev. E 56, 2228 (1997).
8 K. K. Bhattacharya and J. P. Sethna, Phys. Rev. E 57, 2553 (1998).
9 D. Hamelberg, J. Mongan, and J. A. McCammon, J. Chem. Phys. 120, 11919 (2004).
10 Y. Q. Gao, J. Chem. Phys. 128, 064105 (2008).
11 B. Derrida, Phys. Rev. Lett. 45, 79 (1980).
12 J. D. Bryngelson and P. G. Wolynes, Proc. Natl. Acad. Sci. U.S.A. 84, 7524 (1987).
13 K. K. Koretke, Z. Luthey-Schulten, and P. G. Wolynes, Proc. Natl. Acad. Sci. U.S.A. 95, 2932 (1998).
14 D. A. Pearlman, D. A. Case, J. W. Caldwell, W. S. Ross, T. E. Cheatham, S. Debolt, D. Ferguson, G. Seibel, and P. Kollman, Comput. Phys. Commun. 91, 1 (1995).
15 W. D. Cornell, P. Cieplak, C. I. Bayly, I. R. Gould, K. M. Merz, D. M. Ferguson, D. C. Spellmeyer, T. Fox, J. W. Caldwell, and P. A. Kollman, J. Am. Chem. Soc. 117, 5179 (1995).
16 A. Onufriev, D. Bashford, and D. A. Case, J. Phys. Chem. B 104, 3712 (2000).
17 S. Gnanakaran, H. Nymeyer, J. Portman, K. Y. Sanbonmatsu, and A. E. Garcia, Curr. Opin. Struct. Biol. 13, 168 (2003).
18 B. Silverman, Density Estimation for Statistics and Data Analysis (Chapman and Hall, New York, 1986).
19 T. E. Holy, Phys. Rev. Lett. 79, 3545 (1997).
20 M. M. Steiner, P. A. Genilloud, and J. W. Wilkins, Phys. Rev. B 57, 10236 (1998).
21 S. Pal and K. A. Fichthorn, Chem. Eng. J. 74, 77 (1999).
22 J. Rahman and J. C. Tully, J. Chem. Phys. 116, 8750 (2002).
23 L. Yang, M. P. Grubb, and Y. Q. Gao, J. Chem. Phys. 126, 125102 (2007).
24 D. Hamelberg, T. Shen, and J. A. McCammon, J. Chem. Phys. 122, 241103 (2005).
25 D. Hamelberg, T. Shen, and J. A. McCammon, J. Chem. Phys. 125, 094905 (2006).
26 C. Xing and I. Andricioaei, J. Chem. Phys. 124, 034110 (2006).
27 J. Xing, Phys. Rev. Lett. 99, 168103 (2007).
28 T. Shen, D. Hamelberg, and J. A. McCammon, Phys. Rev. E 73, 041908 (2006).
29 T. Y. Shen, K. Tai, and J. A. McCammon, Phys. Rev. E 63, 041902 (2001).
30 L. S. Schulman, Techniques and Applications of Path Integration (Dover, New York, 2005).
31 H. Kleinert, Path Integrals in Quantum Mechanics, Statistics, Polymer Physics, and Financial Markets, 3rd ed. (World Scientific, Singapore, 2003).
32 L. Y. Chen and N. J. M. Horing, J. Chem. Phys. 126, 224103 (2007).
33 R. D. Astumian, Am. J. Phys. 74, 683 (2006).
34 C. Jarzynski, Phys. Rev. Lett. 78, 2690 (1997).
35 G. N. Bochkov and Y. E. Kuzovlev, Physica A 106, 443 (1981).
36 M. P. Eastwood, C. Hardin, Z. Luthey-Schulten, and P. G. Wolynes, J. Chem. Phys. 117, 4602 (2002).
37 S. Park, F. Khalili-Araghi, E. Tajkhorshid, and K. Schulten, J. Chem. Phys. 119, 3559 (2003).