Molecular Dynamics Group, University of Queensland
Molecular Dynamics Group, University of Groningen



Step 4: Analysis

The initial trial simulation was only 50ps. This is too short to see any significant changes in the structure of the protein. So that you have a longer period to analyse a 1 ns simulation of this system will need to be be performed.

A parameter file that may be used to run the 1ns simulation (fullmd_sol_1ns.mdp) can copied using:

cp /home/MDEducation/mdcourse/fullmd_sol_1ns.mdp ./

The file can also be downloaded from here.

Note, there are a number of differences between this parameter file and the parameter file used to perform the inital (starting) simulation. In particular as this is a continuation run we do not want to generate new velocities. Instead the velocities will be taken from the final configuration of the previous run.

Use the preprocessor grompp to obtain the input fie for mdrun (.tpr):

grompp -f md_sol_1ns_cont.mdp -c md_final.gro -p -o md_sol_1ns_cont.tpr

After grompp has finished the simulation can begin:

mdrun -nice 0 -v -s md_sol_1ns_cont.tpr -o md_traj_1ns.trr -x md_traj_1ns.xtc -c md_final_1ns.gro -e md_ener_1ns.edr

As an alternative the output files of a simulation that has been performed previously can be copied to you own directory using the following command:

cp /home/MDEducation/mdcourse/fullmd_sol_1ns/* ./

This will copy all files (denoted by '*') in the directory ~mdcourse/fullmd_sol_1ns/ to your directory.

For external users the files can be downloaded here. Note, as the .tar.gz file is 37 Mb, people are requested to download it once to a central computer and distribute the files locally or better still to perform the simulation themselves.

Usually, the first step in analyzing a simulation is to attempt to determine if the system has reached equilibium. This is often done by looking at quantities such as the total energy, the temperature and the pressure of the system.

Use g_energy to examine the total energy of your system as a function of time.

g_energy -f md_ener_1ns.edr -o md_ener.xvg

Select 12 and type 0 to quit. It is also possible to select more than one option, e.g. 10 11 12 0 <enter> will generate a graph of the potential, kinetic and total energy.

xmgrace -nxy md_ener.xvg

Notice what happens to the total energy.

Use g_energy to check the temperature and the pressure of the system.

g_energy -f md_ener_1ns.edr -o md_T.xvg

Select 13 14 and type 0 to quit. It is also possible to select other options, e.g. 61 63 0 will generate a graph of the temperature of the solute and of the solvent. To visualize it:

xmgrace -nxy md_T.xvg

Compare the average temperature and pressure to the reference values (see fullmd_sol_1ns.mdp)

Question 5. Based on the total energy does it appear that the simulation has equilibrated? Examine other quantities from the simulation stored in the energy file. Have all quantities converged within 1 ns? List some criteria might you use to determine if a simulation has reached equilibrium? Is it possible to define equilibrium for a single molecule in solution?

Assuming that the system has equilibrated we can compare the final structure obtained in the simlation to that downloaded from the Protein Data Bank. To do this, we will use the program in the Gromacs package called 'g_confrms'. To obtain a brief description of this program type:

g_confrms -h

In essence this program takes two structures and performs a least squares fit. The portion of the molecule to be used for the fit can be specified by supplying an index (.ndx) file. Alternatively, the program will ask you to choose from default groups. To compare the structures type:

g_confrms -f1 1AKI.pdb -f2 md_final_1ns.gro

Note, that the program takes both .pdb and .gro files as input. When asked to 'Select a group:' choose '4' and press Enter. Choose '4' again the second time. The root mean square deviation (RMSD) indicates how much the structures differ.

It is possible to view the two structures that have been superimposed. The structures are stored in the file fit.pdb. Open the output file generated by g_confrms (fit.pdb) in the graphics package RasMol using the following command:

rasmol fit.pdb

You can rotate the structures using the mouse. To display only the protein, type the command 'Restrict protein' in the teminal window. 'Under 'Colours', select 'Chain' and under 'Display', select 'Backbone'. This allows you to display just the backbone of the structures that have been superimposed.

The program g_rms can be used examine the variation the RMSD from the original structure as a function of time during the simulation. Type:

g_rms -h

This program requires the input file for the run (fullmd_1ns.tpr) and the output trajectory file (md_traj_1ns.xtc). To calculate the RMSD, all of the structures generated during the simulation must first be fitted to a reference structure. The program will initially ask for a group to perform this (least squares) fit.

g_rms -s fullmd_1ns.tpr -f md_traj_1ns.xtc

Select group '4' (backbone) for least squares fit. Compare 1 group. Select '4'. A graph is now generated (rmsd.xvg), showing the root mean square deviation over time. Open this graph with xmgrace:

xmgrace -nxy rmsd.xvg

You will see that after a given time the RMSD values seem to reach a plateau (a constant value). You can extract a range of structures of your protein from the trajectory using the program trjconv:

trjconv -h

trjconv can be used to either transform a trajectory file into another format (for example into the pdb format) or to extract specific conformations of interest from a trajectory. It requires the input file to the run (fullmd_1ns.tpr) and the trajectory file (md_traj_1ns.xtc). The option -b defines the time (in ps) at which trjconv should begin to read the trajectory. Using the option -fit each structure of the trajectory will be fitted onto the starting structure. The option -dt can be used to select the time interval between sucsesive frames. The program will initially ask for a group to perform the fit (select 4) and then for a group to be written in the output (select 1). For example to extract a configuration every 10 time steps from the trajectory after skipping the first 100 ps you would use the command:

trjconv -s fullmd_1ns.tpr -f md_traj_1ns.xtc -dt 20 -b 100 -o last.pdb -fit rot+trans

Examine the structures saved in last.pdb ( these structure have similar RMSD value with the xray structure), open the output file (last.pdb) in RasMol:

rasmol last.pdb

Question 6. Pick a region of the trajectory after which you feel the simulation has equilibrated. Extract these conformations using trjconv and examine the structures in RasMol. Do you observe a unique structure? If not, why not? Is 1 ns sufficient for the system converge? Would you ever expect to be able to represent the structure of a protein by a single conformation?

Use VMD to visualize the extented trajectory. You may wish to explicitly animate the residues involved in the active site (Glu 35, Asp 52, Gln 57 and Trp 108) or to display the hydrogen bonds during the animation.

Changes in the secondary structure as a function of time can be examined using the program do_dssp.

do_dssp -h

The program can be run using the same input as g_rms:

do_dssp -s fullmd_1ns.tpr -f md_traj_1ns.xtc

This produces an image of the format .xpm. It can be converted to a postcript (.eps) file and viewed as follows:

xpm2ps -f ss.xpm -o ss.eps -bx 1 -by 1

using a postcript reader such as

ghostview ss.eps


evince ss.eps

There are many other properties that could be investigated. What type of scientific questions can be addressed, will depend on the time scale of the simulation, the accuracy of the force field, the chioce of boundary conditions and the experimental data against which these parameters can be verified.

Getting Started
MD Theory
MD Practice

©2005/2007 T.A.Wassenaar/A.E.Mark