Field of Science

Molecular modeling: How far can physics take us?

Of all the scientists writing about modeling and simulation in drug discovery in the last decade or so, I have found Anthony Nicholls of OpenEye Scientific Software to be one of the most insightful. Not only has he written important papers emphasizing the role of rigorous statistics in generating and communicating modeling results, but he has also been a relentless proponent of the need for rigorous, unglamorous but essential experimental data to validate modeling protocols. In the phalanx of modelers pointing to a better future for their field, Anthony has been one of the torchbearers. I usually pay close attention when he writes so I think it's worth noting what he has to say in a recent article titled "The character of molecular modeling".

He starts by asking what the real advances in the field have been in the past 25 years and by observing a rather disconcerting fact about modeling, especially structure-based modeling - successes are still mainly anecdotal. He expresses his disappointment that most of the successful results in modeling are still of the "find protein pocket, fill pocket" type. The chief role of the crystallographer, it seems, is to supply pockets that the computational chemist can then fill. The problem according to Anthony is that chemists are not "abstracting principles of wide applicability; they are recognizing domains of expertise".

At this point let me interject and say that while Anthony's gloomy prognosis might be true, it's also true that "find pocket, fill pocket" (or "find pocket, kill pocket" if you are in a hunter-gatherer mood) campaigns are not as straightforward as we think. There can be unexpected effects on both protein conformation and ligand conformation, similar to the "activity cliffs" witnessed by medicinal chemists. Even if the binding orientation of the ligand stays constant upon small changes, the distribution of solution conformations of the modified ligand is likely quite different, leading to differing energetic penalties that have to be paid for binding. I am sure I am not alone in saying that small changes in ligand structure that alter binding affinity through ligand strain and conformation are uncomfortably frequent. But there's another dimension to the "find pocket, fill pocket" campaign; it can actually be quite satisfying to suggest changes to a medicinal chemist for filling the pocket that are borne out by further crystallography. Finding pockets may generate anecdotes, but chemistry is a more anecdotal science than, say, physics, and chemists often revel in these little successes and failures. Chemists are more frogs than eagles.

But the real sticking point for Anthony is not really the anecdotal success of structure-based modeling but the lack of general physics-based principles and laws for doing molecular modeling. Docking is an example. In the last several years there have been many attempts to use physics-based "scoring functions" - essentially ways to sum up different protein-ligand interactions to a number - for calculating the binding affinity of a ligand. Programs for docking have evolved to a stage where ligands can be docked in the correct orientation with a roughly 30% success rate, depending on how similar the docked ligands are to a reference co-crystallized ligand. But the truth of the matter is that we still fail miserably when trying to dock an arbitrary ligand to an arbitrary protein in an arbitrary conformation. And of course, we are light years away from predicting free energies of binding for the general case. There have been cases in which physics in the form of electrostatics and quantum mechanics (more on this later) has significantly accelerated the search for similar molecules, but the promised land still seems far.
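To make the idea of a scoring function as "a sum of interactions reduced to a number" concrete, here is a minimal sketch in Python. It is not the scoring function of any real docking program: the feature names, the weights and the two example poses are all invented for illustration, and real scoring functions use far richer, distance-dependent terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoseFeatures:
    """Counts of interaction terms for one docked pose (all hypothetical)."""
    hydrogen_bonds: int
    hydrophobic_contacts: int
    frozen_rotors: int      # rotatable bonds immobilized on binding
    steric_clashes: int


# Illustrative weights in arbitrary score units; negative means favorable.
# These values are assumptions made up for the sketch, not fitted to data.
WEIGHTS = {
    "hydrogen_bonds": -1.0,
    "hydrophobic_contacts": -0.2,
    "frozen_rotors": 0.3,
    "steric_clashes": 2.0,
}


def score_pose(features: PoseFeatures) -> float:
    """Sum the weighted interaction counts into a single number."""
    return sum(w * getattr(features, name) for name, w in WEIGHTS.items())


def rank_poses(poses: dict[str, PoseFeatures]) -> list[str]:
    """Return pose names ordered from best (lowest) to worst score."""
    return sorted(poses, key=lambda name: score_pose(poses[name]))


# Two made-up poses of the same ligand.
poses = {
    "pose_a": PoseFeatures(hydrogen_bonds=2, hydrophobic_contacts=10,
                           frozen_rotors=3, steric_clashes=0),
    "pose_b": PoseFeatures(hydrogen_bonds=3, hydrophobic_contacts=4,
                           frozen_rotors=1, steric_clashes=1),
}
ranking = rank_poses(poses)
```

The sketch also hints at why such functions struggle in the general case: everything the model knows about solvation, entropy and protein flexibility has to be squeezed into a handful of weights.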

Does this failure reflect an absence of general principles of physics for computing protein-ligand interactions? Paraphrasing Rutherford (not Niels Bohr), in the next few decades will we do more physics or simply collect more stamps? Is this concern even warranted? To some extent, yes. It would certainly be very satisfying to have a general explanatory framework, a pool of more or less universal laws that explained the wide variety of protein-ligand complexes as completely as Newton's laws explain the behavior of an astonishingly diverse set of particle interactions in the classical world. Curiously, such a general framework does exist in the form of statistical mechanics and quantum mechanics. In theory, both these disciplines encompass the binding of every single protein to every single drug. So does that mean we can look forward to a time when every modeler can "abstract these principles of wide applicability" and use them to solve the particular case of his or her protein and ligand?

Here is where I part ways with Anthony, at least partly. The reason in my mind is not too hard to discern. Think about how far we have come in explaining protein-ligand binding using the rather extensive developments in either quantum or statistical mechanics over the past five decades. The answer is, not as far as we would have liked. While we have indeed made great advances in understanding the basic thermodynamics of protein-ligand binding, we have not been very successful in incorporating these principles into predictive computational models. Why so? For the same reason that we have not been successful in using physics to explain "all of chemistry", in Paul Dirac's words. Quantum mechanics has been applied to chemistry for fifty years and exponentially increasing computational power has significantly furthered its application, but even now, for most practical systems chemists use a variety of empirical models to understand and predict. That's partly because most real systems are too complex for the direct use of quantum mechanics, and an imperfectly understood protein and ligand immersed in an imperfectly understood solvent certainly belong to this category. It's also because we are still far from reliably calculating things like entropy and from being able to model the differential behavior of water at interfaces and in the bulk.

But even more importantly, physics may not solve our problems because chemists need to abstract general principles at the level of chemistry to ply their trade. Thus, in expressing doubts about the utility of general physics-based principles, I am appealing to the strong sense of non-reductionism that permeates chemistry and separates it from physics. The same principle applies to biology and I have written about this often. Principles drawn from physics have always been very useful in gaining insights into molecular interactions and they will continue to be an essential part of the mix. But unlike Anthony, I see a far smaller role that pure physics can truly play in enabling a general, practical predictive approach to modeling that's "chemical" enough to be widely used by chemists.

So are there cases in which physics can make a contribution? Here I actually do agree with Anthony when he mentions two areas where physics really promises to have a substantial impact, both conceptually and practically. The first is crystal structure prediction for organic molecules which is a notoriously fickle problem (a measure of the difficulty can be gleaned by the fact that even the simple benzene can crystallize in more than 30 different geometries), essentially one of being able to predict fine energy differences between almost equienergetic arrangements. Yet I see this problem as one of the more reductionist problems in chemistry, and as Anthony notes, it is conceivable that it will yield to physics-based approaches in the near future.

The other problem is one of the holy grails of chemistry and biology - protein structure prediction. In various guises, the last few years have seen a startlingly impressive set of cases where structures of small and (some) medium-sized proteins were predicted with atomic-level accuracy. Protein structure prediction has to overcome the twin challenges of sampling and energy estimation that are a mainstay of almost every other modeling method. In this case Anthony thinks that we will have to get the physics right to address this issue.

But we have to be careful to distinguish between two cases here. The first case is where we get the right structure even if we have no idea how we got there. This is the field of empirical (non-physics based) protein fold prediction and the biggest success in this area has been the ROSETTA suite of programs. ROSETTA has definitely turned heads within the community by its ability to generate accurate structures for hundreds of proteins, but the big drawback of the approach is that it only generates the end result. Curiously Anthony does not mention ROSETTA, but I am also surprised that he does not mention in detail another significant development that does fit into the physics-based paradigm. This is the molecular dynamics approach developed by David Shaw, Vijay Pande and others. Unlike ROSETTA, MD can actually shed light on the process leading to a correct structure, although the details of the process are subject to errors, most notably in the force fields that underlie the simulation. It's quite clear that with all their limitations, ROSETTA and MD have been the biggest contributors to successful protein folding simulations over the last decade.

And yet as Anthony rightly says, their success seems almost like a miracle. This becomes clear when we realize that even now we have trouble predicting something as simple as the solvation energy of a simple organic molecule or the interaction energy of two simple molecules using even sophisticated quantum mechanics calculations. If our ability to predict even such simple scenarios is dismal, how on earth are we getting the structures of all those complex proteins right? The answer deserves as much scrutiny as the solution to these problems, scrutiny that is severely lacking. Anthony's answer (and mine) is "cancellation of errors along with a need to calculate only relative, not absolute, energies" (it's well known that force fields are virtually worthless for the calculation of absolute energies). It still strains my mind to think that these two factors could contribute to so many successful predictions published in the likes of Nature and Science. Cancellation of errors was partly made famous by Enrico Fermi. If that's really what's happening in all these cases, then the entire field needs to start celebrating Fermi as their guardian angel.
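The "cancellation of errors" argument can be illustrated with a toy calculation in Python. Every number here is invented: two hypothetical ligands get made-up "true" binding energies, and the hypothetical model is assumed to carry one large error shared by both ligands plus a small ligand-specific error. The point is only the arithmetic, not any claim about a real force field.

```python
# All numbers are invented for illustration (think of them as kcal/mol).
TRUE_ENERGY = {"ligand_a": -9.0, "ligand_b": -7.5}

# Assumption: the model makes one large systematic error common to both
# ligands, plus a small error specific to each ligand.
SHARED_ERROR = 25.0
SPECIFIC_ERROR = {"ligand_a": 0.3, "ligand_b": -0.2}


def model_energy(ligand: str) -> float:
    """Energy the hypothetical model would report for a ligand."""
    return TRUE_ENERGY[ligand] + SHARED_ERROR + SPECIFIC_ERROR[ligand]


def absolute_error(ligand: str) -> float:
    """Error in the absolute energy; dominated by the shared term."""
    return abs(model_energy(ligand) - TRUE_ENERGY[ligand])


def relative_error(first: str, second: str) -> float:
    """Error in the energy difference; the shared term cancels exactly."""
    true_diff = TRUE_ENERGY[first] - TRUE_ENERGY[second]
    model_diff = model_energy(first) - model_energy(second)
    return abs(model_diff - true_diff)


abs_err_a = absolute_error("ligand_a")
abs_err_b = absolute_error("ligand_b")
rel_err = relative_error("ligand_a", "ligand_b")
```

In this sketch the absolute energies are off by roughly 25 units each, while the difference between the two ligands is off by only half a unit, because the shared error drops out of the subtraction. Whether the errors in real simulations are actually shared to this degree is exactly the question that deserves more scrutiny.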

Ultimately, there is no doubt that advances will continue to be made with increasing computational firepower, but the foundations of the field will stay brittle unless these fundamental issues are addressed. Anthony ends with something he has been doing for a long time now - appealing to experimentalists, industry and government to contribute a small part of their funds to the kind of basic experiments that can further the field of modeling. This especially involves experiments that can refute an idea, a philosophy that has been dominant in the practice of science since its modern conception but one which seems to be unusually neglected in drug discovery because of the emphasis on positive data gathering. Science has always progressed by the testing of ideas that have no immediate practical bearing, except that they perform the invaluable function of making future scientific research worthwhile. It would be fundamentally unscientific if such ideas were not supported. Anthony puts it well:

"The simple commitment to spend a small percentage of the science budget at the NIH or at pharmaceutical companies on nontranslational work, providing support for the small cabals of scientists actually interested in making fundamental progress would be enormous. Reestablishing the contact between theorists and experimentalists, the publishing of high quality data, conferences devoted to the actual testing of ideas—in 25 years we might hope molecular modeling could become a real scientific discipline."

1 comment:

  1. It's worth remembering that we can't currently measure the free energy changes associated with removal of water from pockets on protein surfaces where (we believe) hydrophobic enclosure is an important factor. Is it reasonable to expect to be able to predict that which we cannot (and arguably do not even know how to) measure?

    The other point worth making is that molecular design is not all about prediction. Hypothesis-driven molecular design is about gathering relevant information as efficiently as possible and is still useful even if it doesn't fit tidily into a physics-centric world view. Also, things like pharmacophores, which are often dismissed as 'unphysical', actually encode physics, for example in the use of atom types to encode potential for forming intermolecular interactions.

