Field of Science "ranks billions of drug interactions"? Hold your horses.

Now here's a study that should make most seasoned molecular modelers cringe. Nature News reports on an effort by a website that docked 600,000 compounds to 7,000 protein targets and predicted which ones would show activity against these targets based on docking scores:

Predicting how untested compounds will interact with proteins in the body, as Drugable attempts to do, is more challenging. In setting up the website, Cardozo’s group selected about 600,000 molecules from PubChem and the European Bioinformatics Institute’s ChEMBL, which together catalogue millions of publicly available compounds. The group evaluated how strongly these molecules would bind to 7,000 structural ‘pockets’ on human proteins also described in the databases. Computing giant Google awarded the researchers the equivalent of more than 100 million hours of processor time on its supercomputers for the mammoth effort.

But mammoth computing resources do not translate to carefully constructed protocols or correct predictions. In its current incarnation, docking is best for finding the binding pose, that is, the orientation of a drug bound into a protein pocket. Ranking compounds is far more difficult, and predicting absolute binding affinities is a very distant, currently unachievable third goal.
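
To put those numbers in perspective, here is a back-of-the-envelope sketch. It assumes that every compound was docked against every pocket and that all of the donated time counts as core-hours spent purely on docking; the news piece confirms neither, so treat the result as an order-of-magnitude guess rather than a description of the actual protocol.

```python
# Rough compute budget per docking pair.
# Assumptions (NOT confirmed by the news piece): all-against-all docking,
# and all donated hours counted as core-hours spent on docking alone.

N_COMPOUNDS = 600_000       # compounds selected, per the news piece
N_POCKETS = 7_000           # protein pockets, per the news piece
CORE_HOURS = 100_000_000    # "more than 100 million hours", used as a lower bound


def docking_pairs(n_compounds: int, n_pockets: int) -> int:
    """Number of compound-pocket pairs in an all-against-all screen."""
    return n_compounds * n_pockets


def core_seconds_per_pair(core_hours: float, pairs: int) -> float:
    """Average core-seconds available for each pair."""
    return core_hours * 3600 / pairs


pairs = docking_pairs(N_COMPOUNDS, N_POCKETS)
budget = core_seconds_per_pair(CORE_HOURS, pairs)

print(f"{pairs:,} compound-pocket pairs")
print(f"about {budget:.0f} core-seconds per pair")
```

Under these assumptions that is 4.2 billion pairs and roughly 86 core-seconds per pair: enough for a quick docking run, but it says nothing about whether the resulting scores can be trusted.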

Anyone who has tried to run a hit-to-lead or lead optimization project based on docking scores would know how riddled with problems and qualifications any prediction based on these highly subjective numbers is. For starters, every modeling program gives you its own docking scores. Absolute values of these numbers (which ideally should reflect the free energy of binding but which seldom do) are almost always useless. If you are dealing with a congeneric series of molecules and are fairly confident about the binding orientation (usually confirmed by X-ray crystallography or some other technique) then maybe you could get some help from the scores in ranking the compounds, but even then mostly in terms of trends rather than quantitative differences.
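
One common way to check whether scores capture a trend is a rank correlation between docking scores and measured activities. The sketch below computes Spearman's rank correlation in plain Python; the function names are mine, and the scores and pIC50 values are invented purely for illustration and do not come from any real series or from the study discussed here.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


# Hypothetical congeneric series (made-up numbers).
docking_scores = [-9.1, -8.7, -8.5, -7.9, -7.2]  # more negative = predicted better
measured_pic50 = [7.8, 6.9, 7.4, 6.1, 6.3]       # higher = more potent

rho = spearman(docking_scores, measured_pic50)
print(f"Spearman rho = {rho:.2f}")
```

Because better docking scores are more negative while better pIC50 values are higher, a useful scoring function should give a strongly negative rho here. Even a good rho only tells you that the trend holds; it does not make the individual scores meaningful as binding free energies.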

Unfortunately the news piece says nothing about what method was used to generate the poses, whether there was any clustering or whether only the top pose was considered, what the false positive rate was, and most importantly, whether there was any experimental verification whatever of the ranking. The website is also not helpful in this regard. It also does not tell us if the protein structures used for docking were well-resolved or refined or whether they were homology models. In the absence of all this information, the ranking of the compounds is tenuous at best and useless at worst, and as it stands the study sounds little better than throwing darts in the dark and hoping some of them will stick. Ranking often fails even for similar compounds, so how well (or badly) it would work for 600,000 diverse compounds bound to 7,000 diverse protein targets is anyone's guess.

The report also compares the study to a similar activity prediction study by Brian Shoichet in which drug similarity was used to predict activity against unexpected targets. But that was a very different kettle of fish; it was a compound similarity - not docking - study so it did not have to deal with the complexities of error-ridden protein crystal structures or homology models, it verified a lot of the predictions using carefully constructed assays, and even then it gave a hit rate which did not exceed about 50%.

Either the study itself has failed to validate its predictions or the news report is woefully incomplete. Maybe I am wrong and in fact the study has laid the careful groundwork and validation that is necessary for trusting docking. As it stands however, the purpose of the report mainly seems to be to highlight the fact that Google generously donated 100 million hours of its computing power to the docking. This heightened, throw-technology-at-it sense of wonder and optimism is exactly what the field does not need. I would be the first one to welcome reliable predictions of drug-protein affinity based on wholesale docking of compounds to targets, but I don't think this work achieves that goal at all.


  1. As is the case for most "large numbers", I doubt that many people know how much computing 100 million core hours actually is. I'm not saying it's not a lot, but my lab, for instance, was awarded 105 million core hours for 2013 for about 6 projects and we have access to none of the world's top supercomputers. You can browse all of Canada's supercomputing allocations in core hours from this list to get a sense.

  2. Hello Ash,

    You raise some very valid points in your response to the Nature News article about Drugable. We look forward to answering your questions when we publish our methods. We will post links to all publications on In addition, we hope to set up an information resource for specific questions that may not have been answered by our papers. Please stay tuned and look for updates at



  3. It's really interesting to see ... thank you, it's well done :)

