So I am back from the eCheminfo meeting at Bryn Mawr College. To anyone so inclined, computational chemists and experimentalists alike, I would strongly recommend the meeting for its small group and the close interaction that results. The campus, with its neo-gothic architecture and verdant lawns, provides a charming environment.
At most of these meetings I am left with a slightly unsatisfied feeling at the end of many talks. Most computational models that describe proteins and protein-ligand interactions are patchwork models based on several approximations. Often one finds several quite different methods (force fields, QSAR, quantum mechanics, docking, similarity-based searching) giving similar answers to a given problem. The choice of method is usually made on the basis of availability, computational power and past successes, rather than on some sound judgement that favors that particular method over all others. And as usual, it depends on what question you are trying to ask.
But in such cases I am always left with two questions. First, if several methods give similar answers (and sometimes no method gives the right answer), then which is the "correct" method? Second, because no one method reliably gives the right answer, how do we know that the results shown in a given presentation were not obtained by chance? Sadly, it is not always possible to actually calculate the probability that a result was obtained by chance. An example is our own recently published work on the design of a kinase inhibitor; docking was remarkably successful in this endeavor, and yet it's hard to pinpoint why it worked. Likewise, a professor might use some complex model combining neural networks and machine learning and get results agreeing with experiment, and yet by that time the model may have become so abstract and complex that one would have trouble understanding any of its connections to reality. (That is partly what happened to financial derivatives models, when their creators themselves stopped understanding why the models were working, but I am digressing...)
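For the cases where such a probability can be estimated, one common check is a label-permutation test (often called y-randomization in QSAR work): shuffle the experimental values, recompute the agreement with the predictions, and see how often chance alone does as well. The following is only a minimal sketch under stated assumptions; the function names and the affinity values are hypothetical and are not taken from any study mentioned here.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(predicted, observed, n_permutations=2000, seed=0):
    """Estimate how often shuffled experimental values correlate with the
    predictions at least as well as the real ones do (one-sided test)."""
    rng = random.Random(seed)
    r_real = pearson_r(predicted, observed)
    shuffled = list(observed)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if pearson_r(predicted, shuffled) >= r_real:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical predicted vs. measured binding affinities (arbitrary units).
predicted = [5.1, 6.3, 7.0, 7.8, 8.4, 6.9, 5.5, 9.0]
observed = [5.0, 6.0, 7.2, 7.5, 8.8, 6.5, 5.9, 9.1]

p_value = permutation_p_value(predicted, observed)
print(f"r = {pearson_r(predicted, observed):.3f}, permutation p = {p_value:.4f}")
```

A small p-value here only says that the agreement is unlikely to arise from shuffled data; it says nothing about why the model works, which is the harder question.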
However, in the end I remind myself of something that is always easy to forget: models are emphatically not supposed to be "correct" in the sense of modeling "reality", no matter what fond hopes their creators may have. The only way to gauge the "correctness" of a model is to compare it to experiment. If several models agree with experiment, then it may be meaningless to argue about which one is the right one. People have suggested metrics for discriminating between such similar models, for instance by employing that time-honored principle of Occam's Razor, under which a model with fewer parameters might be better. Yet in practice such philosophical distinctions are hard to apply, and the details can be tricky.
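One widely used metric of this Occam's Razor kind is the Akaike information criterion (AIC), which rewards goodness of fit but charges a penalty for every fitted parameter. It is my choice of illustration, not necessarily the metric anyone at the meeting had in mind. The sketch below uses the least-squares form of AIC (valid up to an additive constant, assuming Gaussian errors); the two models and their residual sums of squares are hypothetical numbers.

```python
import math


def aic_least_squares(rss, n_points, n_params):
    """AIC for a least-squares fit, up to an additive constant:
    n * ln(RSS / n) + 2k. Lower values are preferred."""
    if rss <= 0 or n_points <= 0:
        raise ValueError("rss and n_points must be positive")
    return n_points * math.log(rss / n_points) + 2 * n_params


# Hypothetical models fitted to the same 40 data points:
# name -> (residual sum of squares, number of fitted parameters)
n_points = 40
models = {
    "simple_3_param": (4.2, 3),
    "complex_12_param": (3.9, 12),
}

scores = {
    name: aic_least_squares(rss, n_points, k)
    for name, (rss, k) in models.items()
}
preferred = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: AIC = {score:.2f}")
print("preferred:", preferred)
```

With these numbers the complex model fits slightly better but not by enough to pay for its nine extra parameters, so the simpler one is preferred. The tricky details show up as soon as the error model is not Gaussian or the "parameters" are not cleanly countable, as with a force field or a neural network.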
Ultimately, while models can work well on certain systems, I can never escape the nagging feeling that we are somehow "missing reality". Divorcing models from reality, irrespective of whether they are supposed to represent reality or not, can have ugly consequences, and I think all these models are in danger of falling into a hole on specific problems; adding too many parameters to comply with experimental data can easily lead to overfitting, for instance. But to be honest, what we are trying to model at this point is so complex (the forces dictating protein folding or protein-ligand interactions only get more and more convoluted, like Alice's rabbit hole) that this is probably the best we can do. Even ab initio quantum mechanics involves acute parameter fitting and approximations when modeling the real behavior of biochemical systems. Romantic Platonists like me will probably have to wait, perhaps forever.