NOTE: This tutorial was intended to be used with Gromacs version 3.1.4 and may not be compatible with newer versions. The tutorial will be revised in the near future. Note, a brief tutorial on Free Energy calculation with Gromacs 3.3.x can be found at the DillGroup Wiki.

Solvation free energy

Peptide and protein folding, binding of ligands to receptors are, among others, some of the processes one might be interested in when looking at biomolecular systems. One important aspect of all these processes is the partitioning of specific functional groups between different environments. Whether it is during folding or binding to a receptor, amino acid groups must choose between a highly polarizable aqueous environment or the usually more hydrophobic interior of a protein. Being able to calculate the solvation free energy of molecules or group of atoms is therefore of primary importance whether it is to validate the behavior of residues in a force field or simply to determine the stabilization/destabilization effect of the environment on specific groups of atoms.


Hydration free energy of toluene

The hydration free energy can be calculated using the following thermodynamic cycle:

thermodynamic cycle

The atoms of the molecule of toluene are mutated into dummy atoms along the processes ΔG1 and ΔG3 of the cycle (which are therefore nonphysical processes).
Dummy atoms are atoms that do not have non-bonded interactions, which basically means that they do not interact with their environment. In practice these atoms are not charged and do not have any Van der Waals interactions with their environment.

ΔGhyd: hydration free energy.
ΔG1: free energy associated to the mutation of toluene into dummy atoms in water.
ΔG2: same in vacuo.
ΔG3: can be seen as the hydration free energy of dummy atoms. This term is equal to 0 since dummies do not have non-bonded interactions and bonded interactions remain the same.

ΔGhyd = ΔG1 - ΔG3 - ΔG2
ΔGhyd = ΔG1 - ΔG3

The physics behind these nonphysical processes is that the internal potentials of the molecule of toluene are switched off in ΔG1 and that you have to account for the energy associated to that phenomena (which is done by subtracting ΔG2).

Hydration free energy calculations using Gromacs:

In this first example, we will calculate the hydration free energy of a molecule of toluene using the MD software package Gromacs (http://www.gromacs.org).Gromacs is a widely used molecular dynamics simulation package developed at the University of Groningen.

An outline of the procedure you will have to follow to perform MD simulations is given by the Gromacs flow chart:http://www.gromacs.org/documentation/reference_3.0/online/flow.html.

During this exercise you will:

1. Convert the pdb-file of a molecule of toluene into a Gromacs structure file and a Gromacs topology file

2. Edit the topology file to make it suit your needs

3. Put the molecule in a simulation box and solvate it

4. Perform an energy minimization of the system

5. Perform an equilibration of the system

6. perform the free energy calculations

7. Analyze the results

As seen previously, to obtain the free energy hydration, you will need to carry out the simulations both in vacuo and in water.

Gromacs is not a single program but a package of programs which operate using a series of files.  A short description of the primary files used by the package is given below:

Molecular Topology file (.top)
The molecular topology file contains a complete description of all the interactions within a particular molecule. A molecular topology file for a peptide or protein can be generated using the program pdb2gmx. pdb2gmx uses the sequence information from the pdb structure file together with force generic field information to build an appropriate molecular topology file.

Molecular Structure file (.gro, .pdb)
The Gromacs molecular structure file contains information on the coordinates and velocities for all atoms in the system. The program pdb2gmx program can be used to translate a PDB structure file (.pdb file) to a gromos structure file (.gro file). The main difference between a pdb file and a gromos file is that a .gro file may contain coordinates and velocities. However, most Gromacs programs that do not require velocities also read pdb files.

Molecular Dynamics parameter file (.mdp)
The Molecular Dynamics Parameter (.mdp) file contains the parameters for the Molecular Dynamics simulation itself e.g. time-step, number of steps, temperature, pressure etc. The easiest way of generating this file is to adapt a sample .mdp file. Sample mdp file can be found online.

Index file (.ndx)
An index file can be used to restrict specific actions to groups of atoms (e.g. Temperature coupling, accelerations, constraints). Usually the default index groups are sufficient, and we will not consider the use of index files.

Run input file (.tpr)
The run file contains all information needed to start a simulation with GROMACS. It is generated by combining the molecular structure (.gro file), topology (.top file) MD-parameters (.mdp file) and (optionally) the index file (ndx). The grompp program processes all input files and generates the run input .tpr file.

Trajectory (.trr) and energy (.edr) files.
The trajectory file contains information on the coordinates, forces and velocities generated during the simulation. The energies are stored in the .edr file. These are generated by the program mdrun. The input file of mdrun the run input file (.tpr file). The output files of mdrun are the trajectory and energy files

First you should create a directory where you will perform all the simulations related to toluene: mkdir Toluene

Enter this directory (cd Toluene). We now need a coordinate file for toluene. Note, Gromacs is orientated toward simulation of biomolecular systems and there is no building block for toluene in the default distribution. We could build toluene by hand. However, toluene is nothing more than the side-chain of the Phenylalanine amino-acid where the C alpha is replaced by a H atom. Knowing this you can run the pdb2gmx command using Phenylalanine as a template and easily convert the side chain into toluene by hand.

To start, you only need a pdb file with one Phenylalanine side chain (phenylalanine-sc.pdb). To save the pdb file of the molecule, right click + save or follow the link and copy/paste in a text editor.

Step 1. Use pdb2gmx to convert the pdb file phenylalanine-sc.pdb to a Gromacs structure file and a Gromacs topology file.

pdb2gmx -f phenylalanine-sc.pdb  -p  toluene.top -o  toluene.gro -missing

Of course the program will complain about missing atoms (the backbone atoms of the Phe residue are missing since the pdb file only contains the side chain) but you can discard them (option -missing) as toluene is not supposed to contain them. There are many options within this program.  You are running the program in the default mode and so you are only required to select a force field.  Select 2.  This is the Gromos96 force field for protein simulations in water.  The program will continue and provide information on the variety of items being automatically set such as protonation states for certain residues such as the N- and C-termini.  It is also possible to set these manually.

To see the available options type:

pdb2gmx -h

The pdb2gmx program has generated a topology file toluene.top and a GROMACS structure file toluene.gro.  It has also (re)generated hydrogen positions as they are usually not present in pdb files of proteins. The -p and -o options with the filenames are optional; without them the files topol.top and conf.gro will be generated.  To list the files:

ls

To examine the .gro file type

less toluene.gro (press q to quit)

You will see a close resemblance to the pdb file. It contains the same information regarding the positions of the atoms. The layout of the file is however different and the units have been converted to nm.

Examine the topology file:

less toluene.top

This file contains the atom names, atom types, mass and charge information.  It also contains a description of bonds, angles, dihedrals, etc.
 

Step 2. Edit the topology file with your favorite text editor (vi/emacs/xedit...).

- Under the directive [atoms], change the following for the atom 1:

  CH2   ------->   CH3

14.027 -------> 15.035

Remember that the topology strictly corresponds to the side chain of Phe so you have to correct it to account for the differences with the molecule of toluene. Once this first modification is done, the topology file corresponds to the molecule of toluene and you can make a copy of it for a later use (cp toluene.top  toluene_ini.top ).

- Define the B state that will be used for the free energy calculation. For this, you need to specify the new atom type of each mutated atom and the set of parameters associated to it. In this case, all the atoms (represented by one single united atom group) are turned into dummy atoms with a charge of 0. The masses are kept identical, as well as the rest of the bonded parameters, but you still need to add a second (identical) entry for each of them. Entries for the [atoms] and [bonds] directives are provided below, but the [angles] and [dihedrals] directives also need to be edited.

[ atoms ]
;   nr   type  resnr residue  atom   cgnr     charge       mass  typeB    chargeB      massB
     1    CH3      1    PHE     CB      1          0     15.035   DUM        0        15.035

[ bonds ]
    1     2     2    gb_26   gb_26

 

Step 3. Put the toluene in a simulation box. The size of the box must be big enough for the distance between the molecule of toluene and its periodic image to be larger than the cut-off radius, but in this case it should not be a problem given the little size of the molecule of toluene.

editconf -f toluene.gro -o cell.gro -c -box 3.0 3.0 3.0

The options -c and -box are used to center the molecule in the box and specify its size respectively. We will use the resulting file (cell.gro) for the simulation in vacuo. But for the simulation in water you still need to fill the box with water molecules:

genbox -cs -cp cell.gro -o solv.gro -p toluene.top

The file solv.gro should contain the toluene and the water. The file toluene.top should also have been updated with the number of water molecules (indicated by SOL in the last line of the file).

You can check the result by transforming solv.gro in a pdb file and visualizing it with a molecule viewer (rasmol):

editconf -f solv.gro -o solv.pdb
rasmol solv.pdb

In rasmol, you can type in the command line:

set unitcell true           to make the box appear
select protein               select a group of atoms to which you can apply the different representations that you can find in the menu
exit                                  to quit the program
Press shift and the left mouse button to zoom in and out by moving the mouse.

Before moving to the next step, you should create a directory for the simulation in water and another one for the simulation in vacuo:
mkdir water vacuo

Then copy the files related to the simulation in water (solv.gro, toluene.top) to the water directory and do the same for the files related to the simulation in vacuo (cell.gro, toluene.top):

cp filename directory

Watch for the fact that toluene.top now also contains water molecules (last line of the file) and that you will need to remove them for the simulation in vacuo.

From now on we will focus on the simulation in water but the steps are exactly the same for the simulations in vacuo.

Simulation in water
Go to the water directory (cd water).
 

Step 4. Perform an energy minimization of the system

Now you have to perform an energy minimization of the structure to remove the local strain in the protein (due to generation of hydrogen positions) and to remove bad Van der Waals contacts (particles that are too close). This can be done with the mdrun program which is the MD program. Before you  can use the mdrun program however, you have to pre-process the topology file (toluene.top), the structure file (solv.gro) and a special parameter file (steep.mdp).

Create a directory for the minimization (mkdir EM), enter it and copy the parameter file in it.

Check the contents of this file

less steep.mdp

To generate a file that can be run, the input files are scanned with a pre-processor program called grompp. This reads the file steep.mdp (where the parameters for the minimization are given), the file solv.gro (which contains the coordinates) and toluene.top (topology file):

grompp  -f steep.mdp  -c ../solv.gro  -p ../toluene.top

You have made a run input file (topol.tpr) which serves as input for the mdrun program. Now you  can do the energy minimization:

mdrun  -s  topol.tpr  -c  minimized.gro 

In this command  the -c option sets the filename of the structure file after energy minimization. This file you will subsequently use as input for the MD run. The energy minimization takes some time, the amount depending on the CPU in your computer, the load of your computer, etc.

You can check the evolution of the potential energy using g_energy. The program can use the energy file (ener.edr) of the simulation to create a file (energy.xvg) containing the values you are interested in (potential energy,kinetic energy, temperature, pressure and many others).

g_energy [-f ener.edr]

You will see something like this:

Opened ener.edr as single precision energy file
Select the terms you want from the following list
End your selection with 0
   1=       G96Bond   2=      G96Angle   3=   Proper Dih.   4= Improper Dih.
   5=         LJ-14   6=    Coulomb-14   7=       LJ (SR)   8=  Coulomb (SR)
   9=     Potential  10=   Kinetic En.  11=  Total Energy  12=   Temperature
  13=Pressure (bar)  14=        Vir-XX  15=        Vir-XY  16=        Vir-XZ
  17=        Vir-YX  18=        Vir-YY  19=        Vir-YZ  20=        Vir-ZX
  21=        Vir-ZY  22=        Vir-ZZ  23= Pres-XX (bar)  24= Pres-XY (bar)
  25= Pres-XZ (bar)  26= Pres-YX (bar)  27= Pres-YY (bar)  28= Pres-YZ (bar)
  29= Pres-ZX (bar)  30= Pres-ZY (bar)  31= Pres-ZZ (bar)  32= #Surf*SurfTen
  33=          Mu-X  34=          Mu-Y  35=          Mu-Z  36=Coul-SR:Protein-Protein
  37=LJ:Protein-Protein  38=Coul-14:Protein-Protein  39=LJ-14:Protein-Protein  40=Coul-SR:Protein-rest
  41=LJ:Protein-rest  42=Coul-14:Protein-rest  43=LJ-14:Protein-rest  44=Coul-SR:rest-rest
  45=  LJ:rest-rest  46=Coul-14:rest-rest  47=LJ-14:rest-rest  48=        T-rest
  49=     Lamb-rest
 

Select 9 (potential energy). You examine the values of various energy and other terms calculated during the simulation.  Currently, our main interest is the total value of the potential energy.

Then type 0 (zero) to exit this option.

Now you have to make a graph with the software xmgr  (help for this software can be found at http://plasma-gate.weizmann.ac.il/Xmgr/doc/xmgr.html). Simply type:

xmgr energy.xvg

Note, this graph has an error: The label for the horizontal axis must be Step instead of Time (ps).
(To change it go to Plot (in the upper part of the xmgr window), select Tick labels/tick marks and in Axis label delete Time (ps) and write Step. Then click Accept and Close.)

To exit xmgr go to File and select Exit.
 

Step 5. Perform equilibration

Leave the current directory and go up in the directory tree:

cd ..

Create a new directory for the equilibration and enter it:

mkdir EQ

cd EQ

Copy the parameter file for the equilibration (equilibration.mdp).

Check the contents of this file:

less equilibration.mdp

Essentially this file has information on the process to do (MD), the information to write in the output files, the treatment of the long range interactions (Coulomb and Van der Waals interactions), the treatment of thermodynamic quantities (temperature and pressure), initial velocities with a Maxwellian distribution and constrains on the protein. The equilibration process mainly aims at getting a correct velocity distribution for the molecules in order to simulate a valid statistical ensemble and of course to put the system at equilibrium.

Pre-processing is done like previously with   grompp (Be careful with the name of the files):

grompp  -f equilibration.mdp  -c ../EM/minimized.gro  -p ../toluene.top

You now have made a run input file (topol.tpr) which serves as input for the mdrun program. Now you  can do the molecular dynamics calculation:

mdrun   -s  topol.tpr

The MD calculation takes some time, the amount depending on the CPU in your computer, the load of your computer, etc.
 

Step 6. Free energy calculation.

As usual, leave the current directory, create a new one and enter it:

cd ..  ;  mkdir SIMULATION  ;  cd SIMULATION

Copy the result of the equilibration (since we did not specify any particular name, it should be present under the default name, confout.gro) in the current directory and rename it to confstart.gro:

cp ../EQ/confout.gro confstart.gro

To get the mutation free energy of the toluene in dummy atoms, you need to run several simulations, one for each lambda value. For each simulation at a given λi, you will get (dG/dλ)λi.

Depending on the transformation, you usually need 15 and 25 points to get a smooth curve.

Here we will only run the simulation for a couple of lambda points:

mkdir lambda0.00 lambda0.02

cd lambda0.00

Get the parameter files for the equilibration (equ-l0.00.mdp) and the simulation (data-l0.00.mdp). Remember that when you run a simulation at a new lambda value, your starting structure is equilibrated but for a different lambda value, thus you need to equilibrate it again at the current lambda value.

Take a look at one of the parameter files, equ-l0.00.mdp for instance:

less equ-l0.00.mdp

You can see that free energy is enabled and also for which lambda value the simulation is performed.

To run the equilibration:

grompp  -f  equ-l0.00.mdp  -c  ../confstart.gro  -p  ../../toluene.top

mdrun -s  topol.tpr

To run the simulation (after the equilibration is finished):

grompp  -f  data-l0.00.mdp  -c  confout.gro  -p  ../../toluene.top

mdrun -s  topol.tpr

To run the simulation at a different lambda value, just go to the corresponding directory (create it if necessary), copy the parameter files and edit them to change the lambda value to the one you want. As starting structure you will use the output configuration of the simulation performed at the previous lambda value.
 

Step 7. Analysis.

You can can see that a file dgdl.xvg was created during the simulation. It contains (dG/dλ)λi values calculated at each step of the simulation. To calculate the average value and the error associated to it you can use g_analyse:

g_analyze  -f  dgdl.xvg  -ee

You should see the average value below... average and the error next to err.est.

You can then create a file containing all the results according to the following format:

lambda_value   dgdl_average   err.est

To do this, go in the upper directory and write the results according to the specified format in a file that you can call dhdl-ini.xvg.

You should do this for each lambda value and when you do, you should get something like the file dHdl.xvg.

To see the graph with the errors:
xmgr  -xydy  rmsd.xvg

To integrate the plot, go to the data menu with the mouse and select Transformations/integration. Choose "sum only" and accept.
What value do you get ? Remember that you still need the value in vacuo to Compare the result to the experimental value.

Simulation in vacuo.
To do the simulation in vacuo, you have to follow exactly the same procedure starting from Step 4, but you will need different parameter files:

Integrate the plot to obtain the mutation free energy in vacuo.

Results

Once you have the result in vacuo, you can finally calculate the hydration free energy of toluene:

ΔGhyd = ΔGmutation(vacuo) - ΔGmutation(water)

Compare it to the experimental value: ΔGhyd,exp(toluene) = -3.1 kJ.mol-1
 


<<- Back


Proceed ->>

This page was last modified on Jan. 19th 2004