Virtual screening (VS), in which a large number of compounds are screened either by docking against a protein target of interest or by similarity searching against a known active, is one of the most popular computational techniques in drug discovery. The practical goal of VS is to complement high-throughput screening (HTS); the ideal goal is to at least partly substitute for HTS in finding new hits.
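To make the similarity-searching flavor of VS concrete, here is a minimal sketch that ranks a tiny library against a known active using the Tanimoto coefficient, a common choice of similarity measure. The fingerprints are represented as plain sets of "on" bit positions, and the compound names and bit values are made up for illustration; a real campaign would compute fingerprints with a cheminformatics toolkit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of 'on' bits."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_by_similarity(query_fp, library):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: bit positions are invented, not real fingerprints.
known_active = {1, 4, 7, 9, 12}
library = {
    "cmpd_A": {1, 4, 7, 9, 13},   # shares 4 of 6 distinct bits with the query
    "cmpd_B": {2, 3, 5},          # shares nothing with the query
    "cmpd_C": {1, 4, 7, 9, 12},   # identical to the query
}

ranking = rank_by_similarity(known_active, library)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

The top of such a ranked list is what would be passed on for experimental testing; everything about where to draw the cutoff is a judgment call that this sketch does not address.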
But this goal is still far from being achieved. VS has yet to make a significant contribution to the discovery of a major drug, and typical hit rates range from a few tenths of a percent to perhaps a percent or two. VS has been intensively investigated for more than a decade. What do we know about its limitations, and where do we go from here?
Gisbert Schneider of ETH Zurich has some thoughts on VS in a recent review. Success in VS ultimately boils down to understanding the detailed structure and dynamics of protein-ligand complexes, a goal that we are still miles away from. We still struggle to realistically include entropy in any calculation, and we are still not completely clear about the role that buried water molecules play in dictating ligand binding. Nor can we yet take allosteric binding properly into account, let alone more complex cases like protein-protein interactions. Thus, as pointed out in a past post and article, maybe the correct question to ask is the "anti-question": why does VS work at all in spite of this supposedly woeful lack of understanding?
First of all, it is important to know what VS can do well and what it can't. As the article notes, VS is still best for negative selection, that is, for weeding out inactive molecules that are bad binders. Another goal of VS is to reproduce the correct protein-bound X-ray conformation of the ligand, and in this endeavor (termed pose prediction) VS seems to be succeeding much better than in its ultimate goal, which is to rank ligands in order of their free energy of binding to a protein. The article also suggests that the true binding interaction energy landscape for a protein might be more of a plateau; thus there may be a variety of protein-ligand contacts corresponding to a 'good' solution, rather than a single global optimum. Plus, one may end up modeling details that are not very relevant to the gist of the ligand binding event; in such a case productive contacts can be preserved with no great sacrifice of qualitative prediction.
Nonetheless, tiny details can sometimes radically shift the balance. No wonder that success in VS has depended more heavily on the target than on the computational algorithm. Nature continues to throw up surprises, as protein entropy, hydrophobic interactions and the subtle behavior of water molecules continue to be uncovered as powerful forces operating in particular protein-ligand complexes.
In the end, modeling the dynamic behavior of macromolecules is an absolute must for lending general utility to VS campaigns. In the absence of adequate modeling of entropy, it may be wise from a practical viewpoint to aim for ligand chemotypes whose binding is dominated more by enthalpic effects. It's interesting to note a set of studies I highlighted in the past, which suggested that it's really the enthalpy rather than the entropy that is rendered more favorable as a drug discovery project proceeds from hit to lead.
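The enthalpy-versus-entropy bookkeeping above rests on the standard relation ΔG = ΔH − TΔS. As a minimal sketch, the snippet below splits the binding free energy of two hypothetical ligands into its two terms and converts ΔG into a dissociation constant via ΔG = RT ln Kd. The ligand names and all thermodynamic values are invented for illustration, not taken from the review or from any measurement.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol*K)
T_KELVIN = 298.15   # room temperature, an assumption for this example


def binding_free_energy(delta_h, minus_t_delta_s):
    """Delta G = Delta H - T*Delta S, with the entropic term passed in as -T*Delta S (kcal/mol)."""
    return delta_h + minus_t_delta_s


def dissociation_constant(delta_g, temperature=T_KELVIN):
    """Kd in mol/L from Delta G = RT ln(Kd), relative to a 1 M standard state."""
    return math.exp(delta_g / (R_KCAL * temperature))


def dominant_term(delta_h, minus_t_delta_s):
    """Label binding by whichever term contributes the more favorable (more negative) energy."""
    return "enthalpy-driven" if delta_h < minus_t_delta_s else "entropy-driven"


# Hypothetical ligands: (Delta H, -T*Delta S) in kcal/mol. Values are made up.
ligands = {
    "ligand_X": (-10.0, 2.0),   # strong contacts, pays an entropic penalty
    "ligand_Y": (-2.0, -6.0),   # weak contacts, binding carried by entropy
}

for name, (dh, mtds) in ligands.items():
    dg = binding_free_energy(dh, mtds)
    kd = dissociation_constant(dg)
    print(f"{name}: dG = {dg:.1f} kcal/mol, Kd = {kd:.2e} M, {dominant_term(dh, mtds)}")
```

Both toy ligands end up with the same ΔG and hence the same affinity, which is the point: a method that handles enthalpy reasonably but entropy poorly has a much better chance of getting the first kind of ligand right than the second.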
Finally, the author makes an appeal to fields spread far and wide to come up with ideas that could be applied in VS and related approaches. It is likely that while incremental improvements will continue to be made through better understanding of protein-ligand interactions, only a novel idea will revolutionize the field. Such insights could come from unlikely quarters, including complexity theory, nonlinear dynamics, other aspects of physics and even engineering and architecture.
How this might happen is not at all clear, but it definitely calls for more multidisciplinary work and for more scientists from diverse fields to become interested in the problem. After all, VS is fundamentally an optimization problem, one of locating the optimal ligand energetic minimum in a multidimensional landscape of protein, ligand, ions and solvent. I can't see why any mathematician, physicist or engineer worth his or her salt wouldn't find it exciting.
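For readers from those other fields, here is a deliberately tiny caricature of the optimization problem: a random-restart local search for the lowest point on a rugged one-dimensional "energy" curve. The energy function, the search range and the restart count are all invented for illustration; a real VS landscape has many more dimensions and a far less trustworthy energy function, and this is not how any particular docking program works.

```python
import math
import random


def toy_energy(x):
    """Made-up rugged 1D 'energy': a shallow bowl with ripples (not a real scoring function)."""
    return 0.1 * x * x + math.sin(3.0 * x)


def local_descent(energy, x, step=0.5, tol=1e-6):
    """Walk downhill from x, halving the step whenever neither neighbor is lower."""
    e = energy(x)
    while step > tol:
        moved = False
        for candidate in (x - step, x + step):
            e_candidate = energy(candidate)
            if e_candidate < e:
                x, e = candidate, e_candidate
                moved = True
                break
        if not moved:
            step *= 0.5
    return x, e


def random_restart_search(energy, n_starts=200, low=-5.0, high=5.0, seed=0):
    """Run local descent from many random starting points and keep the best minimum found."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    best_x, best_e = None, math.inf
    for _ in range(n_starts):
        x, e = local_descent(energy, rng.uniform(low, high))
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e


best_x, best_e = random_restart_search(toy_energy)
print(f"lowest minimum found near x = {best_x:.3f} with energy {best_e:.3f}")
```

Any single descent gets trapped in whichever ripple it starts in, which is why the restarts are needed. That trapping, multiplied across thousands of coupled degrees of freedom, is a large part of what makes the real problem hard.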
Schneider, G. (2010). Virtual screening: an endless staircase? Nature Reviews Drug Discovery, 9 (4), 273-276 DOI: 10.1038/nrd3139