Field of Science

Multiconformational MMGBSA Rescoring; Advancing On Mount Free Energy
Blogging has been a little slow lately mainly because there have been exciting new developments with one of the projects I have been involved in and I was in meetings related to this. One of the topics that was discussed at the conference I was at last week was the accurate prediction of free energies of binding, one of the holy grails of drug discovery. Free-energy perturbation (FEP) still remains the gold standard to get relative free energies of binding, but the procedure is very computer intensive and therefore can be carried out only with small changes in congeneric series of inhibitors. The goal remains elusive and extremely challenging.

A poor man's way of quickly obtaining such ∆Gs is MMGBSA (Molecular Mechanics Generalized Born Surface Area). The GBSA model is well-established as a continuum solvation model for taking solvation into account. What MMGBSA does is take a docked ligand structure and then calculate the free energy of binding as the difference between the bound and unbound states using a force field, including implicit solvation.

Therefore, it calculates
∆G (binding) = ∆G (protein-ligand complex) - ∆G (protein) - ∆G (ligand)
Clearly it has to calculate the energies of the free ligand and free protein. Much of the challenge lies in these two terms. For starters, one has to calculate the strain energy penalty that the protein has to pay in order to bind the ligand. The binding energy that we see experimentally emerges after the protein has paid this strain penalty. How much this strain energy can be has been a controversial topic recently and I will get into it in another post. Suffice it to say that it's a challenging calculation that is not always handled well by MMGBSA. This is because in calculating the ligand free energy, MMGBSA essentially uses a force field to relax the ligand from the bound conformation to the nearest local energy minimum. However, a complex ligand exists in several local energy minima in solution and this force field local minimum may not correspond to any of them. Thus, one has to consider the global strain penalty that the protein has to pay. For this the method also has to consider the multiple conformations that a ligand adopts in solution. Sadly there are very few techniques that will deconvolute the Boltzmann population of a ligand's real conformations in solution and give us the global minimum. This problem in calculating strain energies remains an important drawback of the method.

Calculating ∆G (protein) is also not a trivial matter. We need to consider the entropy of the protein. One can get this from time-consuming MD simulations but it's not certain if the force field is parametrized well and if conformational space has been sampled comprehensively. Another uncertain factor is the induced fit effects involved in binding. A lot of these effects can be subtle and may extend to second shell amino acid residues.

Given these drawbacks, MMGBSA has nonetheless been quite successful in improving agreement with experiment. One of the reasons it works so well is that when you are dealing with congeneric series of ligands for a given target, many of the terms like conformational entropy and protein reorganization energy are the same or very similar and cancel, although there can be surprises. It seems now that at least one of the problems in MMGBSA- not considering the multiple conformations of the ligand in solution- can be tackled. A simple way to get multiple conformations of a ligand in solution is to do a conformational search. Assuming that the search is "complete", one can then calculate the conformational entropy penalty that the ligand has to pay in order to sacrifice all conformations except one in which it binds to the protein. There has been an implicit way to take this into account- many docking programs include a penalty of 0.65 kcal/mol per frozen rotatable bond. But clearly this penalty may be quite less if there are hundreds of conformations in solution that would lead to a large conformational penalty.

Now a group from Amgen has done such multi-conformational MMGBSA rescoring for four important targets and their ligands- CDK2, Thrombin, Factor Xa and HIV-RT. They compare scores obtained with Schrodinger's GlideXP routine with experimental binding affinities. Then they compare scores obtained with MMGBSA rescoring either with a single ligand conformer representation or with a multiple conformer representation that takes ligand conformational entropy into account. The comparison between single and multiple conformers gives somewhat mixed results and sometimes the single conformer representation also does fairly well; however, one thing is strikingly clear, that MMGBSA rescoring can radically improve correlation with experimental affinities compared to simple GlideXP scoring. In some cases the correlation coefficient jumps from essentially 0.00 to a whopping (by current standards) 0.75. There is a lot of interesting methodology described in the paper worth taking a look at. But it's quite clear how including some of the explicit physical effects involved in protein-ligand binding can substantially improve correlation with experiment. In this case the extra effort expended is a fraction of the cost involved in FEP calculations and the methods can also tackle more diverse ligands.

Even if we are not close to conquering the free energy fort, at least we seem to be getting concrete footholds on it.

Guimaraes, C.R., Cardozo, M. (2008). MM-GB/SA Rescoring of Docking Poses in Structure-Based Lead Optimization. Journal of Chemical Information and Modeling, 48(5), 958-970. DOI: 10.1021/ci800004w


  1. Indeed, correct calculation of the conformational strain energy for a binding pose is not a trivial task. One needs to establish the baseline against solvent environment, preferably with explicit solvation model, because the implicit may have significant errors for specific H-bonding patterns, then consider entropic effects as well as intra-molecular interaction energies in addition to the basic steric strain. We are currently fine tuning our scoring function and testing the conformation strain minimization came accross a "gotcha" case in the CSD that seemed to defy logic first, see details here:

    To correctly compute the conformational strain energy, you also have to consider protonation state and possible change of it upon binding and the energy cost associated with such change, more about this topic here:

    So, there are still a lot of tough problems to solve in this area, but you are right, we are slowly inching closer to the summit of this mountain. :)

    Zsolt Zsoldos (aka ZZ)

  2. The issue of protonation state is a very interesting one; I hadn't thought of it. There naturally could be a very large energy cost associated with a change in protonation state

  3. My experience with MMPBSA--and I have a LOT of experience with free energy calculations (especially FEP)--is that it works well for certain congeneric series and not at all for others. You cannot determine, a priori, whether your system will be well suited for MMPBSA.

    In my experience, ANYTHING can beat Glide scores. Glide isn't bad for determining poses, but the scores themselves are crap. Which is not to single Glide out. The same is true for other docking programs, and they often are worse than Glide at determining poses.

  4. You are right; it's hard to determine prospectively whether it will work for you or not. While trusting Glide scores certainly will get you into a lot of trouble, they worked pretty well for my system; it turned out my protein was very similar to one of the proteins Glide was parametrized on. If it works it works. Since you have experience, I would be interested in knowing if you figured out why it did not work on some congeneric series. To me it seems like a lot depends on fortuitous cancellation of some energy terms, and there are going to be lots of cases where you will get a bad result if even one of them does not.


Markup Key:
- <b>bold</b> = bold
- <i>italic</i> = italic
- <a href="">FoS</a> = FoS