
Centre for Molecular Design

DASH Molecular Dynamics Analysis


Release 1.0, June 2008

Matt Ellis, Brian Hudson, David Salt and David Whitley

1 Overview
DASH is a technique for analysing molecular dynamics (MD) simulations based on the torsion angles of rotatable bonds. It extracts the major features for each torsion angle and combines these into states that describe the conformation of the whole molecule. By ignoring high-frequency oscillations within the DASH states, a compressed representation of an MD trajectory is obtained as the sequence of DASH states visited during the simulation, together with the length of time spent in each state.

The latest version of the DASH algorithm is implemented here as a C++ program dash. A companion program dot regenerates pseudo-trajectories from DASH-compressed trajectories by adding random fluctuations to the DASH states.

The DASH algorithm takes care to treat the circular nature of the torsion angle data correctly and uses circular statistics to describe the DASH states. The main steps in the DASH algorithm are as follows.

To remove some of the high-frequency noise, the initial torsion angles are replaced by moving average values. These are calculated over a window of user-specified length, typically 11 time steps.

A discrete set of states for the individual torsion angles is identified by examining their histograms. Local maxima in the distribution of each torsion angle are located and states are defined by placing cut-points midway between adjacent maxima. The trajectory of each torsion angle is then represented by the sequence of states visited at each time step.

A smoothing process is applied to eliminate short-lived states. The state sequence for each torsion angle is considered as a series of bouts in which the trajectory remains in a single state. Bouts of short duration (less than the bout length parameter) are removed by dividing them in half and assigning the two halves to the preceding and following bouts.

The torsion angle state sequences are then combined into state sequences for the whole molecule. For example, if torsion T1 has state sequence 1, 1, 2, 1, 1, 1, 2, ... and torsion T2 has state sequence 1, 1, 3, 2, 1, 1, 1, ... then the combined sequence is (1,1), (1,1), (2,3), (1,2), (1,1), (1,1), (2,1), ... . Trajectories typically remain in the same state for long periods, leading to long constant runs, so the sequences are compressed by writing them in the form of bouts (s1, b1), (s2, b2), ..., where the ith bout consists of bi time steps in state si.

The combined sequences are smoothed to remove short-lived states in the same way as the individual torsion sequences. This determines a final set of DASH states, but tends to remove too much dynamic detail, as it removes all bouts shorter than the smoothing level. Therefore a final roughening process is applied, using a bout length (the roughening level) smaller than the smoothing level, in which bouts of the DASH states with length between the roughening level and the smoothing level are reinstated.

The dot program generates approximate torsion angle trajectories from a DASH-compressed trajectory. At each time step a reconstructed configuration of the molecule is obtained by adding a normally distributed random perturbation to the mean torsion angles for the current state. Each reconstructed torsion angle is defined as θ = μ + σr, where μ and σ are the circular mean and standard deviation of the torsion angle in the current DASH state and r is a random number drawn from a normal distribution with zero mean and unit variance.

For further details see:

D. W. Salt, B. D. Hudson, L. Banting, M. J. Ellis and M. G. Ford. DASH: A novel analysis method for molecular dynamics simulation data. Analysis of ligands of PPAR-gamma. J. Med. Chem., 48, 3214-3220, 2005.

M. J. Ellis, D. W. Salt and B. D. Hudson. Data compression of molecular dynamics simulations for dissemination of trajectories over the world wide web. (Submitted for publication.)
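As a rough illustration of the combination, bout compression and smoothing steps described in this overview, the following Python sketch may help. It is not the dash implementation. The function names are hypothetical, and the handling of odd bout lengths, of bouts at either end of the trajectory and of the order in which short bouts are removed are assumptions, since the text does not specify them.

```python
from itertools import groupby


def combine(*torsion_states):
    """Zip per-torsion state sequences into whole-molecule state tuples."""
    return list(zip(*torsion_states))


def to_bouts(states):
    """Run-length encode a state sequence as (state, bout_length) pairs."""
    return [(state, len(list(run))) for state, run in groupby(states)]


def merge_adjacent(bouts):
    """Merge neighbouring bouts that share the same state."""
    merged = []
    for state, length in bouts:
        if merged and merged[-1][0] == state:
            merged[-1] = (state, merged[-1][1] + length)
        else:
            merged.append((state, length))
    return merged


def smooth(bouts, min_length):
    """Remove bouts shorter than min_length by splitting them between neighbours.

    Assumptions (not specified in the text): the shortest bout is removed
    first, an odd length gives the extra step to the following bout, and a
    bout at either end of the trajectory goes entirely to its only neighbour.
    """
    bouts = merge_adjacent(bouts)
    while len(bouts) > 1:
        i = min(range(len(bouts)), key=lambda k: bouts[k][1])
        length = bouts[i][1]
        if length >= min_length:
            break
        if i == 0:
            bouts[1] = (bouts[1][0], bouts[1][1] + length)
        elif i == len(bouts) - 1:
            bouts[i - 1] = (bouts[i - 1][0], bouts[i - 1][1] + length)
        else:
            half = length // 2
            bouts[i - 1] = (bouts[i - 1][0], bouts[i - 1][1] + half)
            bouts[i + 1] = (bouts[i + 1][0], bouts[i + 1][1] + length - half)
        del bouts[i]
        bouts = merge_adjacent(bouts)
    return bouts


# The first seven time steps of the T1 and T2 example from the text.
t1 = [1, 1, 2, 1, 1, 1, 2]
t2 = [1, 1, 3, 2, 1, 1, 1]
combined = combine(t1, t2)
bouts = to_bouts(combined)
print(combined)
print(bouts)
print(smooth(bouts, min_length=2))
```

Smoothing preserves the total number of time steps, so the bout lengths before and after always sum to the trajectory length.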

2 Running Dash and Dot


The syntax for the dash program is:

    dash -N nstep -T ntor [options] < input_file > output_file

The input file contains an nstep rows by ntor columns matrix of ntor torsion angles at nstep time steps. The options are:

    -h              Display a help message.
    -v              Display the program version number.
    -w win_size     Set the moving average window size [default 11].
    -b bin_size     Set the bin size for the torsion angle distributions
                    [default 4].
    -u run_length   Set the run length defining torsion angle maxima
                    [default 3]. A local maximum is preceded by at least
                    run_length increasing bin values and followed by at
                    least run_length decreasing values.
    -f fmax         Set the minimum value for a torsion angle maximum,
                    specified as a percentage of nstep [default 2.4].
    -m smin         Set the minimum distance allowed between torsion angle
                    maxima [default 48]. Local maxima closer than this are
                    merged into a single pseudo-maximum midway between them.
    -l bout_length  Set the minimum bout length defining a torsion angle
                    state [default 20].
    -s smooth       Set the smoothing level [default 40].
    -r rough        Set the roughening level [default 20].
    -a small        Set the minimum angle [default -180].
    -z large        Set the maximum angle [default +180]. Change these if
                    the torsion angles are in the range [0,360] rather than
                    [-180,+180].
    -H              Print histograms of the torsion angle distributions.
    -M              Print the moving average torsion angles. This produces
                    copious output and is intended only for debugging.
    -C              Print the combined state details.
    -S              Print the smoothed state details.
    -D              Print the dash state sequence. This produces copious
                    output and is intended only for debugging.
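To make the roles of the bin size, run length and fmax options more concrete, here is a minimal Python sketch of histogram construction and local maximum detection. It is only one reading of the option descriptions above: the function names are hypothetical, and the strict inequalities and the wrap-around of the histogram at the ends of the angle range are assumptions. The merging of close maxima controlled by smin is not covered.

```python
def histogram(angles, bin_size=4, small=-180, large=180):
    """Count angles in bins [small, small + bin_size), ... up to large."""
    nbins = (large - small) // bin_size
    counts = [0] * nbins
    for angle in angles:
        # An angle equal to `large` wraps around into the first bin.
        counts[int((angle - small) // bin_size) % nbins] += 1
    return counts


def local_maxima(counts, run_length=3, min_count=0):
    """Return indices of bins that are local maxima of a circular histogram.

    A bin qualifies if it holds at least min_count values, is preceded by
    run_length strictly increasing bins and is followed by run_length
    strictly decreasing bins (indices wrap around the circle).
    """
    n = len(counts)
    maxima = []
    for i in range(n):
        if counts[i] < min_count:
            continue
        rising = all(counts[(i - k - 1) % n] < counts[(i - k) % n]
                     for k in range(run_length))
        falling = all(counts[(i + k) % n] > counts[(i + k + 1) % n]
                      for k in range(run_length))
        if rising and falling:
            maxima.append(i)
    return maxima


# fmax is given as a percentage of the number of time steps.
nstep = 25000
min_count = round(2.4 / 100 * nstep)

# A small made-up histogram with two peaks, for illustration only.
toy_counts = [0, 1, 3, 7, 3, 1, 0, 0, 2, 5, 9, 5, 2, 0]
print(local_maxima(toy_counts, run_length=2))
```

With the default fmax of 2.4 and 25,000 time steps the threshold works out at 600 counts.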

The syntax for the dot program is simply:

    dot input_file > output_file

There are no options for dot.

3 File Formats
3.1 Examples
The distribution includes the following example files for a 25 nanosecond simulation of the PPAR-gamma agonist rosiglitazone.

    tzd01_25ns.txt
        The values of 8 torsion angles at 25,000 time steps.
    tzd01_25ns_dash_output.txt
        The output from the command dash < tzd01_25ns.txt.
    tzd01_25ns_dot_output.txt
        The output from the command dot tzd01_25ns_dash_output.txt.
    tzd01_25ns_reg_dash_output.txt
        The dash output for the regenerated trajectory contained in
        tzd01_25ns_dot_output.txt.

3.2 Dash Input Files


The input files for dash are whitespace-separated ASCII text files containing the values of ntor torsion angles, in degrees, at nstep time steps, with the values for each torsion angle in a single column and the values for each time step in a single row.
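The input layout can be checked before running dash with a short script. The following Python sketch is an illustration only: the function name read_torsions is hypothetical and the sample values are a made-up three-torsion excerpt. It parses the text into rows and reports the values that would be passed as nstep and ntor.

```python
def read_torsions(text):
    """Parse whitespace-separated rows of torsion angles (one row per time step)."""
    rows = [[float(value) for value in line.split()]
            for line in text.splitlines() if line.strip()]
    if not rows:
        raise ValueError("no torsion angle data found")
    ntor = len(rows[0])
    if any(len(row) != ntor for row in rows):
        raise ValueError("every time step must list the same number of torsion angles")
    return rows


# Illustrative sample: 3 time steps (rows) of 3 torsion angles (columns).
sample = """\
-59.21  172.18 -114.26
-61.43 -153.35   98.02
-69.15  175.52   41.08
"""

rows = read_torsions(sample)
nstep, ntor = len(rows), len(rows[0])
print("dash -N", nstep, "-T", ntor)
```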

3.3 Dash Output Files


The output from dash is divided into blocks separated by tags in xml-style angle brackets. The first section contains the dash version number, a timestamp, the dash parameters and the number of time steps and torsion angles in the input trajectory.

    DASH, Version 2.6
    Tue Aug 28 15:25:29 2007
    <Parameters>
    window size = 11
    bin size = 4
    run length = 3
    fmax = 600 (2.4%)
    smin = 48
    bout length = 20
    smoothing level = 40
    roughening level = 20
    small = -180
    large = 180
    </Parameters>
    <Trajectory>
    torsion angles = 8
    time steps = 25000
    </Trajectory>

The second section describes the states for the individual torsion angles. For each torsion, dash prints the list of local maxima in the torsion angle distribution; the torsion angle states, numbered sequentially from 1; and the number of state transitions during the trajectory. For example, the following extract shows a single state for torsion angle T1 and 3 states for T2, with one state wrapping around +/-180.

    <T1>
    maxima: -62
    states: 1 = [-180, 180)
    transitions: 1
    </T1>
    <T2>
    maxima: -178, -58, 66
    states: 1 = [-180, -118), 2 = [-118, 4), 3 = [4, 124), 1 = [124, 180)
    transitions: 27
    </T2>

If the -H flag is used, a histogram of the distribution of angles is printed within each torsion angle block. The following example shows a case where the torsion angle has frequencies 2330 in the [-180,-176) bin, 1466 in the [-176,-172) bin, etc.

    -180  2330
    -176  1466
    -172   408
    ...
     168    54
     172   287
     176  1284

If the -M flag is used, the moving averages at each time step are printed for each torsion angle. This produces a large volume of output and is intended only for debugging. The states assigned at each time step before (raw state) and after (final state) the smoothing of individual torsion trajectories are also included in this section of output.

    time    moving    raw     final
    step    average   state   state
    1        72.89    3       1
    2        49.73    3       1
    3       100.64    3       1
    ...
    24998   -55.06    2       2
    24999   -53.50    2       2
    25000   -59.19    2       2

The final section dealing with individual torsion angles shows the number of torsion state assignments that are altered by smoothing the torsion angle trajectories. These are given as a simple count in the first row and as a percentage of the trajectory length in the second row.

    <TorsionErrorRates>
      T1    T2    T3    T4    T5    T6    T7    T8
       0     7   823     0   133   132   891     0
    0.00  0.03  3.29  0.00  0.53  0.53  3.56  0.00
    </TorsionErrorRates>
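The state boundaries reported for T2 follow from placing cut-points midway between adjacent maxima on the circle, including the pair that straddles +/-180. The Python sketch below reproduces that arithmetic. It is an illustration under stated assumptions, not the dash code: the names wrap, cut_points and make_classifier are hypothetical, angles are assumed to lie in [-180, 180), and states are numbered by the sorted order of their maxima.

```python
from bisect import bisect_right


def wrap(angle):
    """Map an angle in degrees into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def cut_points(maxima):
    """Place a cut-point midway between each pair of adjacent maxima on the circle."""
    peaks = sorted(maxima)
    if len(peaks) < 2:
        return []  # a single maximum gives a single state and no cut-points
    cuts = []
    for i, a in enumerate(peaks):
        b = peaks[(i + 1) % len(peaks)]
        gap = (b - a) % 360  # arc from a to the next maximum, wrapping at the end
        cuts.append(wrap(a + gap / 2))
    return sorted(cuts)


def make_classifier(maxima):
    """Return a function mapping an angle to its state number (from 1)."""
    peaks = sorted(maxima)
    cuts = cut_points(peaks)
    if not cuts:
        return lambda angle: 1

    def arc(angle):
        # Index of the arc between consecutive cut-points that contains the angle.
        return bisect_right(cuts, wrap(angle)) % len(cuts)

    # Each arc takes the number of the maximum that lies inside it.
    label = {arc(peak): number + 1 for number, peak in enumerate(peaks)}
    return lambda angle: label[arc(angle)]


t2_maxima = [-178, -58, 66]
print(cut_points(t2_maxima))
state_of = make_classifier(t2_maxima)
print([state_of(a) for a in (-150, -118, 0, 100, 150)])
```

For the T2 maxima this gives cut-points at -118, 4 and 124, and angles above 124 fall back into state 1, matching the wrapped interval shown in the extract.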

The next section summarizes the effects of the Dash algorithm, giving the number of combined states, the number of final states after the smoothing and roughening procedures, and two error rates. The smoothing and roughening errors are the number of time steps at which the smoothing and roughening processes alter the assigned state (expressed in parentheses as percentages of the trajectory length).

    <Summary>
    combined states = 82
    final states = 55
    smoothing error = 3887 (15.55%)
    roughening error = 1602 (6.41%)
    </Summary>

The Dash states and their frequencies in the trajectory are listed next. Each state is defined as a sequence of individual torsion angle states. In the following example, the first Dash state corresponds to state 2 in T6 and to state 1 in each of the other torsion angles.

    <DashStateDistribution>
    // state N = (T1,...,Tn) frequency
    state 1 = (1, 1, 1, 1, 1, 2, 1, 1) 265
    state 2 = (1, 1, 1, 1, 1, 3, 1, 1) 243
    state 3 = (1, 1, 1, 1, 1, 3, 2, 1) 46
    ...
    state 53 = (1, 3, 2, 1, 2, 1, 2, 1) 397
    state 54 = (1, 3, 2, 1, 2, 3, 2, 1) 361
    state 55 = (1, 3, 2, 1, 3, 3, 2, 1) 120
    </DashStateDistribution>

This is followed by two blocks of summary statistics: the circular means and standard deviations of the torsion angles in each Dash state. We emphasize that these are both circular statistics.

    <DashStateMeanAngles>
    state [1]   -62.18   179.86  -81.53  116.91  -63.46    76.14  -117.66   -8.01
    state [2]   -60.61  -175.35  -69.35  113.03  -64.25   179.49   -79.43   10.48
    state [3]   -60.41  -177.91  -72.71  115.27  -63.61  -178.24    97.16  -34.92
    ...
    state [53]  -67.23    67.36   79.28   64.36   60.81   -80.63   114.09   22.25
    state [54]  -67.30    67.56   78.05   63.26   61.85   174.99    78.67    0.48
    state [55]  -66.01    64.47   83.44   88.99  178.05  -178.88    62.02    8.06
    </DashStateMeanAngles>
    <DashStateStandardDeviations>
    state [1]   13.98  10.30  26.18  15.94  12.66  12.13  13.62  24.38
    state [2]   14.00  11.78  33.31  20.01  20.36  26.39  49.30  52.88
    state [3]   12.95  10.36  13.88  15.88  12.22  14.93  45.96  59.30
    ...
    state [53]  14.68  16.47  29.70  13.92   9.53  12.34  11.87  14.45
    state [54]  13.48  19.19  16.53  20.19  16.94  14.16  29.18  45.03
    state [55]  12.26  17.24  53.30  33.51  12.16  33.66  57.57  52.89
    </DashStateStandardDeviations>
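Because these are circular statistics, they differ from ordinary means and standard deviations whenever the angles straddle +/-180. The Python sketch below shows one common definition. Which definition dash uses is not stated here, so the choice of sqrt(-2 ln R) for the circular standard deviation (with R the mean resultant length) is an assumption, as is the function name.

```python
import math


def circular_mean_sd(angles_deg):
    """Return the circular mean and standard deviation of angles in degrees.

    The standard deviation uses the sqrt(-2 ln R) definition, where R is
    the mean resultant length; this choice is an assumption.
    """
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    mean = math.degrees(math.atan2(s, c))
    r = min(1.0, math.hypot(c, s))  # clamp rounding errors just above 1
    if r > 0.0:
        sd = math.degrees(math.sqrt(-2.0 * math.log(r)))
    else:
        sd = float("inf")  # angles spread evenly around the circle
    return mean, sd


# Two angles either side of +/-180: the arithmetic mean would be 0,
# which points in the opposite direction to both of them.
mean, sd = circular_mean_sd([170.0, -170.0])
print(mean, sd)
```

For this pair the circular mean lies at +/-180 (the sign depends on rounding) and the circular standard deviation is close to 10 degrees.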

The Dash-compressed trajectory is defined by two columns containing Dash state numbers and their corresponding bout lengths. This example shows a trajectory spending 23 time steps in state 21, followed by 43 time steps in state 10, 49 time steps in state 11, etc.

    <DashStateTrajectory>
    21 23
    10 43
    11 49
    2 70
    ...
    29 39
    30 23
    29 54
    30 98
    Number of transitions = 269
    </DashStateTrajectory>

The final section collects the bouts spent in each Dash state. Here there are two bouts, of length 192 and 75, in state 1, three bouts, of length 70, 105 and 68, in state 2, etc.

    <DashStateBouts>
    bouts[1] = 192 75
    bouts[2] = 70 105 68
    bouts[3] = 46
    bouts[4] = 92 71 25 288
    ...
    bouts[52] = 140 32 151 389
    bouts[53] = 69 95 233
    bouts[54] = 33 77 86 82 83
    bouts[55] = 40 80
    </DashStateBouts>

If the -C flag is used, dash prints the details of the initial combined states before the smoothing and roughening procedures take place. This output is placed in the following blocks, which have the same format as the corresponding blocks for the Dash states.

    <CombinedStates>
    <CombinedStateMeanAngles>
    <CombinedStateStandardDeviations>
    <CombinedStateTrajectory>
    <CombinedStateBouts>

If the -S flag is used, dash prints the details of the states obtained after the smoothing procedure. First, the frequency distributions of the combined states before and after smoothing are listed.

    <CombinedStateDistribution>
    // state N = (T1,...,Tn) combined_state_frequency smoothed_state_frequency
    state 1 = (1, 1, 1, 1, 1, 1, 2, 1) 22 0
    state 2 = (1, 1, 1, 1, 1, 2, 1, 1) 257 265
    state 3 = (1, 1, 1, 1, 1, 2, 2, 1) 4 0
    ...
    state 80 = (1, 3, 2, 1, 2, 3, 2, 1) 346 358
    state 81 = (1, 3, 2, 1, 3, 3, 1, 1) 70 0
    state 82 = (1, 3, 2, 1, 3, 3, 2, 1) 103 105
    </CombinedStateDistribution>

The combined states with zero frequency after smoothing are removed to leave a final set of states, which are renumbered at this point. The -S flag causes dash to print the following blocks for these smoothed states. Once again, these have the same format as the corresponding blocks for the Dash states.

    <SmoothedStates>
    <SmoothedStateMeanAngles>
    <SmoothedStateStandardDeviations>
    <SmoothedStateTrajectory>
    <SmoothedStateBouts>
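The trajectory and bouts blocks carry the same information in two arrangements: the trajectory lists (state, bout length) pairs in time order, and the bouts block groups the bout lengths by state. A short Python sketch of that relationship follows. The function names are hypothetical, and the second example trajectory is invented so that its bouts match the bouts[1] and bouts[2] lines shown earlier; the real order of those bouts in time is not given.

```python
from collections import defaultdict


def bouts_by_state(trajectory):
    """Group bout lengths by state, keeping the order in which they occur."""
    bouts = defaultdict(list)
    for state, length in trajectory:
        bouts[state].append(length)
    return dict(bouts)


def expand(trajectory):
    """Expand (state, bout_length) pairs into one state per time step."""
    return [state for state, length in trajectory for _ in range(length)]


# First four rows of the <DashStateTrajectory> example.
head = [(21, 23), (10, 43), (11, 49), (2, 70)]
print(len(expand(head)))

# Hypothetical ordering of the bouts listed for states 1 and 2.
toy = [(2, 70), (1, 192), (2, 105), (1, 75), (2, 68)]
print(bouts_by_state(toy))
```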

3.4 Dot Input Files


The dot program reads dash output files (created with any combination of options). However, only 4 sections of the dash output are required by dot:

    <Summary>
    <DashStateMeanAngles>
    <DashStateStandardDeviations>
    <DashStateTrajectory>
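For readers who want to pull the same four sections out of a dash output file in their own scripts, a minimal Python sketch is given below. It is not how dot itself parses the file. The function names are hypothetical, the sample is abbreviated to two torsion angles and two states, and the assumption that each state occupies one line is a guess at the layout.

```python
import re

REQUIRED = ("Summary", "DashStateMeanAngles",
            "DashStateStandardDeviations", "DashStateTrajectory")


def extract_blocks(text, tags=REQUIRED):
    """Return the text between <Tag> and </Tag> for each requested tag."""
    blocks = {}
    for tag in tags:
        pattern = r"<{0}>(.*?)</{0}>".format(re.escape(tag))
        match = re.search(pattern, text, re.DOTALL)
        if match is None:
            raise ValueError("missing <%s> block" % tag)
        blocks[tag] = match.group(1).strip()
    return blocks


def parse_state_rows(block):
    """Parse lines of the form 'state [n] v1 v2 ...' into {n: [v1, v2, ...]}."""
    rows = {}
    for line in block.splitlines():
        match = re.match(r"\s*state\s*\[(\d+)\]\s+(.*)", line)
        if match:
            rows[int(match.group(1))] = [float(v) for v in match.group(2).split()]
    return rows


# Abbreviated, illustrative dash output (two torsions, two states).
sample = """\
<Summary>
combined states = 82
final states = 55
</Summary>
<DashStateMeanAngles>
state [1] -62.18 179.86
state [2] -60.61 -175.35
</DashStateMeanAngles>
<DashStateStandardDeviations>
state [1] 13.98 10.30
state [2] 14.00 11.78
</DashStateStandardDeviations>
<DashStateTrajectory>
21 23
10 43
</DashStateTrajectory>
"""

blocks = extract_blocks(sample)
means = parse_state_rows(blocks["DashStateMeanAngles"])
print(means)
```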

3.5 Dot Output Files


The structure of a dot output file is shown below. First, the sections required from the dash output file are printed. This is followed by a block containing the regenerated trajectory, in the same format as the original dash input files: a matrix of ntor torsion angles at nstep time steps in an nstep by ntor array.

    <Summary>
    time steps = 25000
    torsion angles = 8
    final states = 55
    </Summary>
    <DashStateMeanAngles>
    <DashStateStandardDeviations>
    <DashStateTrajectory>
    <DotTrajectory>
    -59.21   172.18  -114.26  114.38  -155.29   147.60  117.32  -44.51
    -61.43  -153.35    98.02   42.79   155.97  -176.17  -14.70  -34.16
    -69.15   175.52    41.08   59.32  -166.56   170.19  -93.85  -63.28
    ...
    -76.55   -79.81  -109.76   66.35    54.34  -169.47  125.40   16.78
    -69.39   -51.76  -107.45   88.35    55.42  -171.15   76.65   56.41
    -80.37   -52.24   -83.55   43.43    51.55  -171.81  108.39  -53.11
    </DotTrajectory>
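The regeneration step that produces the trajectory block can be sketched in Python as follows. This is an illustration of the idea only, not the dot program: dot uses a GSL random number generator, whereas this sketch uses Python's random module, the function names are hypothetical, and wrapping the result back into [-180, 180) is an assumption.

```python
import random


def wrap(angle):
    """Map an angle in degrees into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def regenerate(trajectory, means, sds, rng=None):
    """Rebuild one row of torsion angles per time step.

    trajectory: list of (state, bout_length) pairs
    means, sds: dicts mapping state -> list of per-torsion values (degrees)
    Each angle is mean + sd * r, with r drawn from a standard normal
    distribution, then wrapped into [-180, 180) (the wrapping is an assumption).
    """
    if rng is None:
        rng = random.Random()
    rows = []
    for state, length in trajectory:
        for _ in range(length):
            rows.append([wrap(mu + sd * rng.gauss(0.0, 1.0))
                         for mu, sd in zip(means[state], sds[state])])
    return rows


# Illustrative two-torsion excerpt of the state statistics.
means = {1: [-62.18, 179.86], 2: [-60.61, -175.35]}
sds = {1: [13.98, 10.30], 2: [14.00, 11.78]}
rows = regenerate([(1, 3), (2, 2)], means, sds, rng=random.Random(0))
for row in rows:
    print(" ".join("%8.2f" % angle for angle in row))
```

Passing a seeded generator makes the output repeatable, which is convenient when comparing a regenerated trajectory with the original.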

4 Installation
4.1 Obtaining the software
The software is distributed under the terms of the GNU General Public License (see the file COPYING for details), and can be downloaded from http://www.port.ac.uk/research/cmd in three formats:

    dash-x.y.tar.gz
    dash-x.y.tar.bz2
    dash-x.y.zip

Use one of the commands

    gzip -cd dash-x.y.tar.gz | tar xf -
    bzip2 -cd dash-x.y.tar.bz2 | tar xf -
    unzip dash-x.y.zip

as appropriate to unpack the archive into a directory dash-x.y.

4.2 Compiling the programs


The programs are written in standard C++; any recent C++ compiler should be able to build them. Dot uses a pseudo-random number generator from the GNU Scientific Library (GSL), so this must be installed before compiling Dot. Pre-built GSL packages are available for the major GNU/Linux distributions. Both the GSL header files (often in a development package) and the GSL libraries (often in a runtime package) are required. The GSL can also be obtained from the GSL home page (http://www.gnu.org/software/gsl/).

To build the programs, edit the Makefile to suit your C++ compiler, change the CPPFLAGS and LDFLAGS variables to point to the GSL header files and libraries if these are not installed in a standard location, then run make.

4.3 Installing the programs


To install the programs, just move the executables dash and dot to somewhere in your path.

4.4 Documentation
The files dash.html, dash.pdf and dash.info contain documentation in the indicated formats.

4.5 Contact
Any comments, suggestions or bug reports should be addressed to: Dr. Brian Hudson Centre for Molecular Design, University of Portsmouth, Mercantile House, Hampshire Terrace, Portsmouth PO1 2EG. Email: brian.hudson@port.ac.uk