Field of Science

Dock dock, who is there?

Docking is one of the holy grails of computational chemistry and the pharmaceutical industry, and it remains a big unsolved problem. The term refers to placing an inhibitor or other small molecule in the active site of a protein and then assessing its interactions with the site, thereby reaching a conclusion about whether it will bind strongly or weakly. If perfected, this process would be very valuable for finding new leads, for testing as yet untested compounds against pharmaceutical targets, and, most importantly, for high-throughput screening. Two subprocedures have to be honed: first, there needs to be a way of placing the inhibitor in the site and exploring the various orientations in which this can be done; second, once the ligand is placed, there needs to be some way of evaluating whether its interaction with the site is 'good' or 'bad'.

The most popular way of doing this is to use a 'scoring function', which is simply a sum of different interaction energies: hydrogen bonding, electrostatics, van der Waals interactions, and hydrophobic interactions, to name a few. The number that comes out of this sum for a particular compound is anything but reliable, and scoring functions in general correlate very poorly with experimentally determined free energies of binding. The most reliable way of estimating free energies computationally is Free Energy Perturbation (FEP), but it is far more expensive. Scoring functions, on the other hand, can be reasonably good on a relative basis, and they offer the fastest way of doing an evaluation. What we are essentially trying to evaluate is the free energy of interaction, which is diabolically convoluted and consists of complicated enthalpy and entropy terms for the protein, the ligand, and the complex that is formed. Enormously complicating matters, both the protein and the ligand are solvated, so displacement of water and desolvation effects massively affect the net interaction of the ligand with the active site. In addition, the ligand changes conformation when it binds, and in many cases so does the protein. Needless to say, the general process of a ligand binding to a protein is extremely complicated to understand, let alone to evaluate computationally.
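To make the "sum of interaction energies" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the term names, the weights, and the numbers are made up, and it is not the scoring function of Glide or any other program. It only shows the shape of the calculation, a weighted sum per compound followed by a relative ranking.

```python
from dataclasses import dataclass


@dataclass
class InteractionTerms:
    """Per-compound interaction energies in kcal/mol (negative = favourable).

    The fields are illustrative placeholders, not any program's real terms.
    """
    hydrogen_bond: float
    electrostatic: float
    van_der_waals: float
    hydrophobic: float
    desolvation_penalty: float = 0.0  # positive = unfavourable


# Hypothetical weights. Real programs fit such weights to known complexes.
WEIGHTS = {
    "hydrogen_bond": 1.0,
    "electrostatic": 0.5,
    "van_der_waals": 0.2,
    "hydrophobic": 1.0,
    "desolvation_penalty": 1.0,
}


def score(terms: InteractionTerms) -> float:
    """Weighted sum of the interaction terms; lower means better."""
    return sum(weight * getattr(terms, name) for name, weight in WEIGHTS.items())


def rank(ligands: dict[str, InteractionTerms]) -> list[str]:
    """Order ligand names from best (most negative score) to worst."""
    return sorted(ligands, key=lambda name: score(ligands[name]))


if __name__ == "__main__":
    ligands = {
        "ligand_a": InteractionTerms(-2.0, -1.0, -5.0, -1.5, 1.0),
        "ligand_b": InteractionTerms(-1.0, 0.0, -5.0, 0.0, 0.0),
    }
    for name in rank(ligands):
        print(name, round(score(ligands[name]), 2))
```

Note what is missing: there is no entropy term and no explicit water, which is exactly why the absolute number should not be trusted as a free energy.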

And yet there are programs out there, such as Glide, DOCK, FlexX, and GOLD, to name a few, which have repeatedly been used to dock ligands into active sites. This whole enterprise has been a big saga, with what seems like one docking-related article published every week in J. Med. Chem. Many of these programs include scoring functions whose terms have been parametrized from experimental data and from observations related to basic physical principles of intermolecular interactions. The programs generally don't work very well, but they can work for the interaction of one inhibitor with homologous proteins, or for different inhibitors of the same protein (a more tenuous application). I have personally used only Glide, and in my specific project it has provided impressive results.

Any docking program needs to accomplish two goals:

1. Find the bioactive conformation of the ligand when supplied with the protein and ligand structure.
2. Evaluate whether similar/dissimilar ligands will show activity or not.

In practice, every docking run gives a list of different conformations of the ligand and protein, known as 'poses', ranked in descending order of efficacy based on their perceived free energies of interaction. Looking at just the top pose and concluding that it is the bioactive pose is a big mistake. Sometimes, if that pose is repeatedly found among the top ten results, one might hypothesize that in fact it may be the bioactive pose. In my case, that did turn out to be the case. However, it must also be noted that such programs can be parametrized for particular proteins and ligands, where the ligands are known to have very specific interactions. It is then relatively easy for the program to find similar ligands, but given the practically limitless ways in which ligands can bind to proteins, even this fails sometimes.
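One way to check whether a pose is "repeatedly found among the top ten" is to group the top-ranked poses by RMSD and see how large the biggest group is. The sketch below assumes that each pose is a list of ligand atom coordinates in the same atom order and in the same protein frame (so no superposition is needed), and it uses a 2.0 Å cutoff, a common convention that I am assuming here and not a setting taken from any particular program. The function names are hypothetical.

```python
import math

# A pose is a list of (x, y, z) ligand atom coordinates, same atom order in every pose.
Pose = list[tuple[float, float, float]]


def rmsd(a: Pose, b: Pose) -> float:
    """Root-mean-square deviation between two poses, without superposition."""
    if len(a) != len(b) or not a:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(squared / len(a))


def largest_cluster(poses: list[Pose], top_n: int = 10, cutoff: float = 2.0) -> list[int]:
    """Indices (in rank order) of the largest group of top-n poses lying
    within `cutoff` RMSD of a single reference pose."""
    top = poses[:top_n]
    best: list[int] = []
    for reference in top:
        members = [j for j, pose in enumerate(top) if rmsd(reference, pose) <= cutoff]
        if len(members) > len(best):
            best = members
    return best


if __name__ == "__main__":
    ranked_poses = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],    # rank 1
        [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)],    # rank 2, close to rank 1
        [(10.0, 0.0, 0.0), (11.0, 0.0, 0.0)],  # rank 3, elsewhere in the site
        [(0.0, 0.2, 0.0), (1.0, 0.2, 0.0)],    # rank 4, close to rank 1
    ]
    print(largest_cluster(ranked_poses))
```

A large cluster only supports the hypothesis that the pose is bioactive; it does not prove it.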

One of the big challenges of docking is to model protein conformational changes, the classic induced fit mechanism of biochemistry. Glide has an induced fit module which has again given me favourable results in many cases. As a general goal, however, induced fit docking remains elusive.

Solvation, as mentioned above, is probably the biggest problem. For a given protein and a series of ligands with known IC50 values, the Glide scoring function seldom reproduces the experimental ranking in terms of free energy of binding. However, the MM-GBSA model, which uses continuum solvation, gave me good results where neither regular nor induced fit docking did.
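Whether a scoring method "reproduces the ranking" can be put into a number with a rank correlation. The sketch below computes Spearman's rank correlation between docking scores and pIC50 values (pIC50 = -log10 of the IC50 in mol/L) using only the standard library. The scores and IC50 values are invented for illustration and are not results from Glide or MM-GBSA. Since lower scores and higher pIC50 values both mean "better", a perfect ranking gives a correlation of -1.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / (sx * sy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def pic50(ic50_molar: float) -> float:
    """Convert an IC50 in mol/L to pIC50."""
    return -math.log10(ic50_molar)


if __name__ == "__main__":
    docking_scores = [-9.1, -7.4, -8.2, -6.0]  # made-up scores, lower = better
    ic50_values = [2e-9, 5e-8, 4e-7, 3e-6]     # made-up IC50s in mol/L
    rho = spearman(docking_scores, [pic50(v) for v in ic50_values])
    print(f"Spearman rho = {rho:.2f}")
```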

Docking programs continue to be improved. The group at Schrödinger which developed Glide is doing some solid and impressive work. In their latest paper in J. Med. Chem., they discuss a further refinement of Glide called Extra Precision (XP) Glide. Essentially, the program works on the basis of 'penalties' and 'rewards' for bonds and interactions, depending on their nature. Not surprisingly, the main difference between succeeding versions of docking programs lies in attempts to improve the terms in the scoring functions by modification and addition, and in attempts to rigorously parametrize those terms using hundreds of known protein-ligand complexes as training sets. In this particular paper, the Schrödinger team has included some very realistic modifications to the hydrophobic and hydrogen bonding terms.

In general, how does one evaluate the energy of a hydrogen bond between a ligand atom and a protein atom, an evaluation that would obviously be crucial for assessing ligand-protein interaction? It depends on several factors, including the nature of the atoms and their charge, the nature of the binding cavity where the bonds are formed (polar or hydrophobic) as well as its exact geometry, and the relative propensity of water to form hydrogen bonds in that cavity. This last factor is particularly important. Hydrogen bonds will be favourable only if water does not form very favourable hydrogen bonds in the cavity, and if the desolvation penalty for the ligand is not excessive. The Glide team has come up with a protocol for assessing the relative ease of h-bond formation for both water and the ligand in the active site, and then deciding for which one it will be more favourable. H-bonds formed between ligand and protein will be especially favourable when water in the active site is not 'comfortable', that is, when the site is hydrophobic and the water cannot form its full complement of h-bonds. The group cites the program's ability to reproduce such a situation, one that contributes significantly to the extraordinary affinity of streptavidin for biotin, one of the strongest noncovalent protein-ligand interactions known. In this case, four correlated hydrogen bonds provide solid binding interactions as shown below. The group says that theirs is the first scoring function that has explained this unique experimental result.

[Image: hydrogen bonds between streptavidin and biotin; not available]

The other significant modification to the program is a better representation of the hydrophobic effect, which again is quite complicated and depends upon the binding of the ligand itself as well as the displacement of water. The hydrophobic effect is extremely important; I remember one case in which a ligand bound to HIV-1 protease showed great binding affinity without forming a single h-bond, purely on the basis of the hydrophobic nature of the binding site! The group has cleverly tried to include the effect not just of lipophilicity, but of the exact geometry of the hydrophobic site. A 'hydrophobic enclosure', as the group calls it, is particularly favourable for lipophilic parts of the ligand, and is rewarded in the scoring function. Balancing this is the desolvation penalty for the ligand, which is enthalpically unfavourable for it and for the water bound to it.
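To give a feel for how geometry, and not just lipophilicity, could enter such a term, here is a toy sketch. It is emphatically not the enclosure model from the XP Glide paper: the rule (a lipophilic ligand atom counts as enclosed if lipophilic protein atoms sit on roughly opposite sides of it), the 5 Å cutoff, the 150° angle, and the reward of -0.5 per atom are all my own assumptions for illustration.

```python
import math

Vec = tuple[float, float, float]


def _unit(origin: Vec, target: Vec) -> Vec:
    """Unit vector pointing from origin to target (the points must differ)."""
    length = math.dist(origin, target)
    return (
        (target[0] - origin[0]) / length,
        (target[1] - origin[1]) / length,
        (target[2] - origin[2]) / length,
    )


def is_enclosed(
    ligand_atom: Vec,
    lipophilic_protein_atoms: list[Vec],
    cutoff: float = 5.0,           # assumed distance cutoff in angstroms
    min_angle_deg: float = 150.0,  # assumed angle that counts as "opposite sides"
) -> bool:
    """True if two nearby lipophilic protein atoms flank the ligand atom."""
    directions = [
        _unit(ligand_atom, atom)
        for atom in lipophilic_protein_atoms
        if 0.0 < math.dist(ligand_atom, atom) <= cutoff
    ]
    cos_limit = math.cos(math.radians(min_angle_deg))
    for i in range(len(directions)):
        for j in range(i + 1, len(directions)):
            dot = sum(a * b for a, b in zip(directions[i], directions[j]))
            if dot <= cos_limit:  # angle between the two directions >= min_angle_deg
                return True
    return False


def enclosure_reward(
    lipophilic_ligand_atoms: list[Vec],
    lipophilic_protein_atoms: list[Vec],
    reward_per_atom: float = -0.5,  # hypothetical reward in kcal/mol
) -> float:
    """Sum a fixed (hypothetical) reward over all enclosed ligand atoms."""
    enclosed = sum(
        is_enclosed(atom, lipophilic_protein_atoms) for atom in lipophilic_ligand_atoms
    )
    return reward_per_atom * enclosed


if __name__ == "__main__":
    pocket = [(3.0, 0.0, 0.0), (-3.0, 0.0, 0.0)]  # protein atoms on both sides
    print(enclosure_reward([(0.0, 0.0, 0.0)], pocket))
```

The point of the toy is only that a ligand atom lying flat against one hydrophobic wall earns nothing, while one sandwiched between two walls does.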

The new modifications also seem to make accommodations for pi-pi stacking and cation-pi interactions, which can contribute significantly in certain cases.

Overall, the scoring functions and the program are getting better, as the group is parametrizing them on commonly occurring structural motifs and better interaction terms. The nice thing is that these modifications are in the end based on sound physical principles of ligand-protein binding, principles that are complicated to understand but rest on fundamentals of physical organic chemistry such as the hydrophobic effect, solvation/desolvation, hydrogen bonding, and other intermolecular interactions. In the end, it's the chemistry that is most important.

Docking may never serve as a solve-all technique, and may never work universally for all situations, but with this kind of experiment-based development going on, I feel confident that it will become a major guiding, if not predictive, tool in both academic labs and pharma. As usual, the goal remains to balance accuracy with speed, something which is invaluable for high-throughput screening. For more details, refer to the paper, which is detailed indeed.

Reference: Friesner, R. A.; Murphy, R. B.; Repasky, M. P.; Frye, L. L.; Greenwood, J. R.; Halgren, T. A.; Sanschagrin, P. C.; Mainz, D. T. "Extra Precision Glide: Docking and Scoring Incorporating a Model of Hydrophobic Enclosure for Protein-Ligand Complexes". J. Med. Chem. 2006, ASAP Article. DOI: 10.1021/jm051256o


  1. nice information, dude. Thanks. However, wish you put them in shorter bits and pieces.

  2. Nice post. I personally think that docking is a good tool to spur the imagination, it moves the ideas from the less intuitive 2D to the more intuitive 3D.

  3. Thanks. Yep, it does, and the algorithms are really getting better now, so that you don't just have pretty pictures. My rule, as usual, is: the prettier it looks, the more critical you should get!

  4. I just read this post. Nice post.

    I have a question ..

    "Sometimes, if that pose is repeatedly found among the top ten results, one might hypothesize that in fact it may be the bioactive pose. In my case, that did turn out to be the case."

    What do you do if the pose repeatedly found among the top ten results is not the bioactive pose? What other ways do you have to determine which the bioactive pose is?

