Computer models of chemical and biological systems are not reality; rather, they are what I call “invitations to reality”. They provide guidance to experimentalists to try certain experiments and test certain techniques. They are suggestive, not factual. However, as any good modeler and chagrined experimentalist knows, it’s not hard to mistake models for reality, especially when they look seductive and come replete with bells and whistles.
This was one of the many excellent points that Anthony Nicholls made in his lunch critique of molecular dynamics yesterday at the offices of OpenEye Scientific Software in Cambridge, MA. In his talks and papers Anthony has offered not just sound technical criticism but also a rare philosophical and historical perspective. Over the last few years he has emerged as one of the sharpest critics of molecular dynamics, so we were all eager to hear what exactly it is about the method that rubs him the wrong way. Many of his friends and colleagues call him ‘Ant’, so that’s what I will do here.
Here’s some background for a general audience: Molecular dynamics (MD) is a computational technique used to simulate the motion of atoms and molecules. It is used extensively in fields ranging from biochemistry to materials science. Most MD employed in research is classical MD, based on Newton’s laws of motion. We know that the atomic world is inherently quantum mechanical in nature, but it turns out we can get away with using classical mechanics as an approximation to a remarkable extent. Over the last few years, user-friendly software and advances in computing hardware have brought MD to the masses, so that even non-specialists can now run MD calculations on desktop computers through accessible, brightly colored graphical user interfaces. A leader in this development is David E. Shaw, founder of the famed D. E. Shaw hedge fund, who has made the admirable decision to spend all his time (and a good deal of his money) developing MD software and hardware for biochemistry and drug discovery.
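To make “based on Newton’s laws of motion” concrete for non-specialists, here is a minimal, hypothetical sketch of the loop at the heart of any classical MD code: compute forces, then step positions and velocities forward in time (velocity Verlet below). The one-particle harmonic force is an invented stand-in for a real force field.

```python
import numpy as np

def harmonic_force(x, k=1.0):
    # Invented toy "force field": F = -dU/dx for U(x) = k * x**2 / 2
    return -k * x

def velocity_verlet(x, v, m=1.0, dt=0.01, n_steps=1000):
    """Integrate Newton's equations with the velocity Verlet scheme."""
    f = harmonic_force(x)
    trajectory = [x.copy()]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * (f / m) * dt ** 2  # new positions
        f_new = harmonic_force(x)                 # forces at new positions
        v = v + 0.5 * ((f + f_new) / m) * dt      # new velocities
        f = f_new
        trajectory.append(x.copy())
    return np.array(trajectory)

# A single "atom" in one dimension, displaced from equilibrium and at rest
traj = velocity_verlet(np.array([1.0]), np.array([0.0]))
print(traj[:3])
```

Real MD engines do essentially this, just with millions of coupled force terms and far more sophisticated bookkeeping; nothing quantum mechanical happens inside the loop.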
Ant’s two-hour talk was comprehensive and enjoyable, covering several diverse topics, including a few crucial ones from the philosophy of science. It would be too much to describe everything he said, and I do hope OpenEye puts the video up on their website; instead, I will summarize his main points here.
MD is not a useless technique, but it is not held to the same standards as other techniques, and therefore its true utility is at best unknown: Over the last few years the modeling community has thought hard about appropriate statistical and benchmarking methods for evaluating computational techniques. Statistical tests have thus emerged for many methods, including docking, shape-based screening, protein-based virtual screening and quantum chemical calculations. Such tests are, however, manifestly lacking for molecular dynamics. As Ant pointed out, almost all statements in support of MD are anecdotal and uncontrolled, and there are almost no follow-up studies.
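As an illustration of what such a statistical test looks like in practice (a hedged sketch with fabricated data, not anything from Ant’s talk): for a virtual screening method one can report a ROC AUC together with a bootstrap confidence interval, instead of a single anecdotal success.

```python
import numpy as np

def roc_auc(scores, labels):
    """Rank-based AUC: chance a random active outscores a random decoy."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, bool)
    ranks = scores.argsort().argsort() + 1.0      # 1-based ranks
    n_pos, n_neg = labels.sum(), (~labels).sum()
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def bootstrap_ci(scores, labels, n_boot=2000, seed=0):
    """95% bootstrap confidence interval on the AUC."""
    rng = np.random.default_rng(seed)
    scores, labels = np.asarray(scores, float), np.asarray(labels, bool)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(scores), len(scores))
        if labels[idx].any() and (~labels[idx]).any():  # need both classes
            aucs.append(roc_auc(scores[idx], labels[idx]))
    return np.percentile(aucs, [2.5, 97.5])

# Fabricated example: 10 actives and 10 decoys, actives scored higher on average
rng = np.random.default_rng(1)
labels = np.r_[np.ones(10, dtype=bool), np.zeros(10, dtype=bool)]
scores = rng.normal(size=20) + labels
print(roc_auc(scores, labels), bootstrap_ci(scores, labels))
```

The point is not this particular statistic but the habit: a number, an error bar and a control, applied prospectively and followed up.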
MD can accomplish in days what other techniques can achieve in seconds or hours: No matter how many computational resources you throw at it, the fact remains (and will likely always remain) that MD is a relatively slow technique. Ant pointed out cases where simpler techniques gave the same results as MD in much less time. I think this points to a more general caveat: before looking for complicated explanations for any phenomenon in drug discovery or biology (potency, selectivity, differences in assay behavior and so on), one must look for simple ones. For instance, is there a simple physicochemical property like molecular weight, logP, number of rotatable bonds or charge that correlates with the observed effect? If there is, why run a simulation lasting hours or days to get the same result?
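Here is a hedged sketch of that sanity check, using RDKit with invented SMILES and made-up potency values: compute a couple of cheap descriptors and see how much of the signal they already explain before reaching for a simulation.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import Descriptors

# Invented molecules and fabricated pIC50 values, purely for illustration
smiles  = ["CCO", "CCCCO", "CCCCCCO", "c1ccccc1O", "c1ccccc1CCO"]
potency = [5.1, 5.8, 6.5, 6.0, 6.7]

mols = [Chem.MolFromSmiles(s) for s in smiles]
logp = [Descriptors.MolLogP(m) for m in mols]
mw   = [Descriptors.MolWt(m) for m in mols]

# If a crude descriptor already correlates strongly with the effect,
# a days-long simulation may just be recovering a spreadsheet answer.
print("r(logP, potency) =", np.corrcoef(logp, potency)[0, 1])
print("r(MW,   potency) =", np.corrcoef(mw, potency)[0, 1])
```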
A case in point is the recent Nature paper from D. E. Shaw’s group described by Derek on his blog. Ant drew our attention to the Supporting Information, which says that docking produced the same ligand pose that MD did, a difference translating to seconds of computation versus days. In addition, the simulation revealed a protein pocket expansion whose validity was tested by synthesizing one compound. That they prospectively tested the simulation is a good thing, but one compound? Does that prove that MD is predictive for their system?
MD can look and feel “real” and seductive: This objection really applies to all models, which by definition are not real. Sure, they incorporate some elements of reality, but they also leave many others out. They simplify, use fudge factors and parameters, and often neglect outliers. This is not a strike against models, since they are trying to capture some complex reality and cannot do so without simplification, but it is a reason to be careful when interpreting their results. However, I agree that MD is in a special category, since it can generate very impressive movies from simulations run on special-purpose machines, supercomputers or GPUs for days or months at a time. Here’s one that looks particularly impressive, showing a drug molecule successfully “finding” its binding site on a protein.
This awesome combination of computing power and graphical software brought to bear on an important problem often makes MD sound far more important than it is. The really damning thing, though, may be that shimmering protein on your screen. It is all too easy for non-computational chemists to believe that this is how the proteins in our bodies actually move, and that they are actually watching the physics of protein motion being simulated, down to the level of individual atoms.
But none of this is really true. Like many other molecular models, what you are seeing in front of you is a model, replete with approximations and error bars. As Ant pointed out, it’s almost impossible to get real variables like statistical mechanical partition functions, let alone numbers from experiment, out of such simulations. Another thing that’s perpetually forgotten is that in the real world, proteins are not isolated but are tightly clustered together with other proteins, ions, small molecules and a dense blanket of water. Except perhaps for the water (and poorly understood water at that), we ignore all of this when we run a simulation. There are other problems in real systems, like thermal averaging and non-ergodicity, which physicists would appreciate. And of course, let’s not even get started on the force fields, the engines at the heart of almost every simulation technique, which are consistently shown to be imperfect. No, the picture you see in a molecular dynamics movie is a shadow of its “real” counterpart, even if there is some agreement with experiment. At the very least, this means you should keep your jaw from dropping every time you see such a movie.
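To put the partition-function and ergodicity complaints in symbols (standard statistical mechanics, not anything specific to the talk): experiments probe ensemble averages, which require the partition function Z, while a simulation delivers only a finite-time average along one trajectory, and the two coincide only for an ergodic system in the limit of infinite sampling:

$$\langle A \rangle = \frac{1}{Z}\int A(\mathbf{r},\mathbf{p})\,e^{-\beta H(\mathbf{r},\mathbf{p})}\,d\mathbf{r}\,d\mathbf{p}, \qquad Z = \int e^{-\beta H(\mathbf{r},\mathbf{p})}\,d\mathbf{r}\,d\mathbf{p}, \qquad \bar{A}_T = \frac{1}{T}\int_0^T A\big(\mathbf{r}(t),\mathbf{p}(t)\big)\,dt \xrightarrow[T\to\infty]{\text{ergodic}} \langle A \rangle.$$

Z itself never appears along the trajectory, which is one reason absolute quantities that depend on it are so hard to extract from MD; and for a finite, possibly non-ergodic simulation, even the limit on the right is an assumption rather than a guarantee.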
Using jargon, movies and the illusion of reality, MD oversells itself to the public and to journals: Ultimately it’s not possible to discuss the science behind MD without alluding to the sociological factors responsible for its perception. The fact is that top journals like Nature and Science are very impressed when they see a simulation shepherded by a team led by a Big Name Scientist and run for days using enough computing power to fly a jet fighter. They are even more impressed when they see movies that apparently mirror the actual motion of proteins. Journals are only human, and they cannot be entirely faulted for buying into seductive images. But the unfortunate consequence is that MD gets oversold: because it seems so real, because simulations run for days must surely be serious stuff precisely because they have been run for days, and because the results are published in prestigious journals like Nature, it must all be important. This belief is misplaced.
What’s the take-home message here? What was strange in one sense was that although I agreed with almost everything Ant said, it would not really affect the way I personally use MD in my day-to-day work, and I suspect this will be the case for most sane modelers. For me MD is a tool, just like any other. When it works I use its results; when it doesn’t, I move on and use another tool. In addition, there are really no other ways to capture protein and ligand motion. I think Ant’s talk is best directed at the high priests of MD and their followers, people who either hype MD or think that it is somehow orders of magnitude better than other modeling techniques. I agree that we should all band together against the exhortations of MD zealots.
I am, however, in the camp of modelers who have always used MD as an idea generator, a qualitative tool that goads me into constructing hypotheses and making suggestions to experimentalists. After all, the goal of the trade I am involved in is not just ideas but products. I care about scientific rigor and completeness as much as the next person, but the truth is that you won’t get far in this business if you constantly worry about scientific rigor rather than the utility (even if it’s occasional) of the tools we use. And this applies to theoretical as well as experimental tools: when was the last time my synthetic chemistry friends used a time-tested reaction on a complex natural product and got the answer they expected? If we think MD is anecdotal, we should also admit that most other drug design strategies are anecdotal too. In fact, we shouldn’t expect it to be otherwise. In a field where the validity of ideas is always being tested against a notoriously complex biological system whose workings we don’t understand, and where the real goal is a useful product, even occasional successes are treasured and imperfect methods are constantly embraced.
Nonetheless, in good conscience, my heart is in Ant’s camp even if my head protests a bit. The sound practice of science demands that every method be duplicated, extensively validated, compared with other methods, benchmarked and quantified to the best of our abilities before we make it part of our standard toolkit. This has manifestly not happened with MD, and such validation is the only way we can make methods like it predictive. In fact, it’s part of a paradigm which, as Ant pointed out, goes back to the time of Galileo. If a method is not consistently predictive, that does not mean it is useless, but it does mean there is much in it that needs refining. Just because it can work even when it’s not quantitative does not mean trying to make it quantitative won’t help. As Ant concluded, this can happen when the community comes together to compare and duplicate results from its simulations, when it devotes resources to the kind of simple benchmarking experiments that would help make sense of complicated results, and when theorists and experimentalists work together toward the basic goals that have made science such a successful enterprise for five hundred years.