Computer models of chemical and biological systems are not reality;
rather they are what I call “invitations to reality”. They provide guidance to
experimentalists to try out certain experiments, test certain techniques. They
are suggestive, not factual. However, as any good modeler and chagrined
experimentalist knows, it’s not hard to mistake models for reality, especially
when they look seductive and are replete with bells and whistles.
This was one of the many excellent points that Anthony
Nicholls made in his lunch critique of molecular dynamics yesterday at the
offices of OpenEye scientific software in Cambridge, MA. In his talks and
papers Anthony has offered not just sound technical criticism but also a rare
philosophical and historical perspective. He has also emerged
as one of the sharpest critics of molecular dynamics in the last few years, so
we were all eager to hear what exactly it is about the method that rubs him the
wrong way. Many of his friends and colleagues call him ‘Ant’, so that’s what I
will do here.
Here’s some background for a general audience: Molecular
dynamics (MD) is a computational technique that is used to simulate the motion
of atoms and molecules. It is used extensively in all kinds of fields, from
biochemistry to materials science. Most MD employed in research is classical
MD, based on Newton’s laws of motion. We know that the atomic world is
inherently quantum mechanical in nature, but it turns out we can get away to a
remarkable extent using classical mechanics as an approximation. Over the last
few years user-friendly software and advances in computing hardware have
brought MD to the masses, so that even non-specialists can now run MD calculations
using brightly colored and accessible graphical user interfaces and desktop
computers. A leader in this development is David E. Shaw, founder of the famed D. E. Shaw hedge fund, who has made the admirable decision to spend all his time
(and a good deal of his money) developing MD software and hardware for
biochemistry and drug discovery.
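For readers who want to see what integrating Newton's laws actually looks like, here is a minimal sketch in Python of the velocity Verlet scheme that sits at the heart of most classical MD codes. The Lennard-Jones toy potential, the reduced units and the parameters are my own illustrative assumptions; real packages add full force fields, neighbor lists, periodic boundaries, thermostats and much more.

```python
import numpy as np

def lennard_jones_forces(pos, epsilon=1.0, sigma=1.0):
    """Pairwise Lennard-Jones forces for a toy cluster (no cutoff, no periodicity)."""
    forces = np.zeros_like(pos)
    n = len(pos)
    for i in range(n):
        for j in range(i + 1, n):
            rij = pos[i] - pos[j]          # vector from atom j to atom i
            r2 = np.dot(rij, rij)
            sr6 = (sigma * sigma / r2) ** 3
            # F = -dU/dr for U = 4*eps*(sr6^2 - sr6), directed along rij
            fmag = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2
            forces[i] += fmag * rij
            forces[j] -= fmag * rij
    return forces

def velocity_verlet(pos, vel, masses, dt, n_steps):
    """Integrate Newton's second law, F = m*a, with the velocity Verlet scheme."""
    forces = lennard_jones_forces(pos)
    for _ in range(n_steps):
        vel += 0.5 * dt * forces / masses[:, None]   # half-kick
        pos += dt * vel                              # drift
        forces = lennard_jones_forces(pos)           # recompute forces
        vel += 0.5 * dt * forces / masses[:, None]   # second half-kick
    return pos, vel

# Toy usage: four argon-like atoms in reduced units
pos = np.array([[0.0, 0, 0], [1.1, 0, 0], [0, 1.1, 0], [1.1, 1.1, 0]])
vel = np.zeros_like(pos)
masses = np.ones(4)
pos, vel = velocity_verlet(pos, vel, masses, dt=0.005, n_steps=1000)
```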
Ant’s 2-hour talk was very comprehensive and enjoyable,
covering several diverse topics including a few crucial ones from the
philosophy of science.
It would be too much to describe everything that Ant said, and I do hope
OpenEye puts the video up on their website. The most convenient thing I can do
is to summarize his main points here.
MD is not a useless technique, but it's not held to the same standards as other techniques, and therefore its true utility is at best unknown: Over the last few years the modeling community has done a lot of brainstorming about appropriate statistical and benchmarking methods for evaluating computational techniques. Statistical tests have thus emerged for many methods, including docking, shape-based screening, protein-based virtual screening and quantum chemical calculations. Such tests are, however, manifestly lacking for molecular dynamics. As Ant pointed out, almost all statements in support of MD are anecdotal and uncontrolled. There are almost no follow-up studies.
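To make "statistical tests" concrete, here is the sort of minimal benchmarking exercise that has become routine for docking and shape-based screening, sketched in Python; the scores and labels are invented stand-ins, and a real evaluation would of course use curated actives and decoys.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Invented virtual-screening output: 1 = known active, 0 = decoy,
# plus the score a method assigned to each compound.
labels = rng.integers(0, 2, size=500)
scores = labels + rng.normal(scale=1.5, size=500)  # noisy, weakly informative

auc = roc_auc_score(labels, scores)

# Bootstrap a 95% confidence interval on the AUC, so the claim
# "method A beats method B" comes with error bars.
boot = []
for _ in range(2000):
    idx = rng.integers(0, len(labels), size=len(labels))
    if labels[idx].min() == labels[idx].max():
        continue  # a resample needs both actives and decoys
    boot.append(roc_auc_score(labels[idx], scores[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"AUC = {auc:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```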
MD can accomplish in days what other techniques can achieve in seconds or hours: No matter how many computational resources you throw at it, the fact remains (and will likely always remain) that MD is a relatively slow technique. Ant pointed out cases where simpler techniques gave the same results as MD in much less time. I think this reveals a more general caveat: before looking for complicated explanations for any phenomenon in drug discovery or biology (potency, selectivity, differences in assay behavior etc.), one must look for simple ones. For instance, is there a simple physicochemical property like molecular weight, logP, number of rotatable bonds or charge that correlates with the observed effect? If there is, why run a simulation lasting hours or days to get the same result?
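As a hedged illustration of that "look for simple explanations first" rule, here is a sketch (assuming RDKit and SciPy; the SMILES strings and potencies are invented placeholders) that checks whether a trivial property already tracks the observed trend before anyone fires up a simulation.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors
from scipy.stats import spearmanr

# Invented compounds and measured potencies (pIC50); substitute real data.
smiles = ["CCO", "c1ccccc1O", "CC(=O)Nc1ccc(O)cc1", "CCN(CC)CC", "CCCCCCO"]
pic50 = [4.1, 4.9, 5.6, 4.4, 5.0]

mols = [Chem.MolFromSmiles(s) for s in smiles]
for name, fn in [("MolWt", Descriptors.MolWt),
                 ("cLogP", Descriptors.MolLogP),
                 ("RotBonds", Descriptors.NumRotatableBonds)]:
    values = [fn(m) for m in mols]
    rho, p = spearmanr(values, pic50)
    print(f"{name}: Spearman rho = {rho:.2f} (p = {p:.2f})")
```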
A case in point is the recent Nature paper by D. E. Shaw’s group
described by Derek on his blog. Ant brought our attention to the Supporting
Information, which says that they got the same ligand pose from docking as they did from MD; the difference translates to seconds of computation for docking versus days of simulation for MD. In addition they saw a protein pocket expansion in the
dynamics simulation whose validity was tested by synthesizing one compound. That they prospectively
tested the simulation is a good thing, but one compound? Does that prove that
MD is predictive for their system?
MD can look and feel “real” and seductive: This objection really applies to all models, which by definition are not real. Sure, they incorporate some elements of reality, but they also leave many others out. They simplify, use fudge factors and parameters, and often neglect outliers. This is not a strike against models, since they are trying to capture some complex reality and cannot do so without simplification, but it does indicate reasons for being careful when interpreting their results. However, I agree that MD is in a special category, since it can generate very impressive movies that emerge from simulations run on special-purpose machines, supercomputers or GPUs for days or months at a time. Here's one that looks particularly impressive and shows a drug molecule successfully “finding” its binding site on a protein.
This apparently awesome combination of computing power and graphical software brought to bear on an important problem often makes MD sound far more important than it is. The really damning thing
though may be that shimmering protein on your screen. It’s very easy for
non-computational chemists to believe that that is how the proteins in our body
actually move. It’s easy to believe
that you are actually seeing the physics of
protein motion being simulated, down to the level of individual atoms.
But none of this is really true. Like many other molecular models, what you are seeing in front of you is a model, replete
with approximations and error bars. As Ant pointed out, it’s almost impossible
to get real variables like statistical mechanical partition functions, let
alone numbers from experiment, out of such simulations. Another thing that’s
perpetually forgotten is that in the real world, proteins are not isolated but
are tightly clustered together with other proteins, ions, small molecules and a
dense blanket of water. Except perhaps for the water (and poorly understood
water at that), we are ignoring all of this when we are running the simulation.
There are other problems in real systems, like thermal averaging and
non-ergodicity which physicists would appreciate. And of course, let’s not even
get started on the force fields, the engines at the heart of almost every
simulation technique that are consistently shown to be imperfect. No, the
picture that you see in a molecular dynamics movie is a shadow of its “real”
counterpart, even if there is some agreement with experiment. At the very least
this means you should keep your jaw from dropping every time you see such a
movie.
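To put the partition-function point in symbols: every equilibrium quantity an MD simulation claims to deliver is formally an ensemble average over the Boltzmann distribution,

$$\langle A \rangle = \frac{1}{Z}\int A(\mathbf{r})\, e^{-U(\mathbf{r})/k_B T}\, d\mathbf{r}, \qquad Z = \int e^{-U(\mathbf{r})/k_B T}\, d\mathbf{r},$$

and a finite trajectory approximates these integrals only if it visits every thermally relevant region of configuration space. For a solvated protein that space has tens of thousands of dimensions, which is why convergence, and not just the force field, is a perpetual worry.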
Using jargon, movies and the illusion of reality, MD oversells itself to the public and to journals: Ultimately it's not possible to discuss the science behind MD without alluding to the sociological factors responsible for its perception. The fact is that top journals like Nature or Science are very impressed when they see a simulation shepherded by a team led by Big Name Scientist being run for days using enough computing power to fly a jet fighter. They are even more impressed when they see movies that apparently mirror the actual motion of proteins. Journals are only human, and they cannot be entirely faulted for buying into seductive images. But the unfortunate consequence is that MD gets oversold. Because it seems so real, because simulations run for days must surely be serious stuff precisely because they have been run for days, and because their results are published in prestigious journals like Nature, it all must be important stuff. This belief is, however, misplaced.
What’s the take home message here? What was strange in one sense
was that although I agreed with almost everything that Ant said, it would not
really affect the way I personally use MD in my day-to-day work, and I
suspect this is going to be the case for most sane modelers. For me MD is a
tool, just like any other. When it works I use its results, when it doesn’t I
move on and use another tool. In addition there are really no other ways to capture protein and ligand motion. I think Ant’s talk is best directed at the high
priests of MD and their followers, people who either hype MD or think that it
is somehow orders of magnitude better than other modeling techniques. I agree
that we should all band together against the exhortations of MD zealots.
I am however in the camp of modelers who have always used MD
as an idea generator, a qualitative tool that goads me into constructing hypotheses and making suggestions to experimentalists. After all, the goal of
the trade I am involved in is not just ideas but products. I do care about
scientific rigor and completeness as much as the next person, but the truth is that you won't get far in this business if you constantly worry about scientific rigor rather than the utility, even if occasional, of the tools we are using. And this applies to theoretical as well
as experimental tools; when was the last time my synthetic chemistry friends used
a time-tested reaction on a complex natural product and got the answer they
expected? If we think MD is anecdotal, we should also admit that most other
drug design strategies are anecdotal too. In fact we shouldn’t expect it to be
otherwise. In a field where the validity of ideas is always being tested
against a notoriously complex biological system whose workings we don’t
understand and where the real goal is to get a useful product, even occasional
successes are treasured and imperfect methods are constantly embraced.
Nonetheless, in good conscience my heart is in Ant’s camp
even if my head protests a bit. The sound practice of science demands that
every method be duplicated, extensively validated, compared with other methods,
benchmarked and quantified to the best of our abilities if we want to make it
part of our standard tool kit. This has manifestly not happened with MD. It’s the
only way that we can make such methods predictive. In fact it’s part of a paradigm
which as Ant pointed out goes back to the time of Galileo. If a method is not
consistently predictive it does not mean it is useless, but it does mean that
there is much in it that needs to be refined. Just because it can work even when
it’s not quantitative does not mean trying to make it quantitative won’t help.
As Ant concluded, this can happen when the community comes together to compare
and duplicate results from their simulations, when it devotes resources to
performing the kind of simple benchmarking experiments that would help make sense
of complicated results, when theorists and experimentalists both work together
to achieve the kinds of basic goals that have made science such a successful
enterprise for five hundred years.
Several of these arguments are actually directed at the field and its practitioners rather than at the technique. I agree that molecular dynamics studies should be held to higher statistical standards, but with growing computational resources and limited grad-student manpower, molecular dynamics studies will inevitably improve in their statistics.
With respect to the technique, it's misguided to base a critique of molecular dynamics on its application to "binding studies" alone. There are many examples of molecular dynamics applications for which no simpler computational technique can be formulated and for which wet-lab experiments can provide little information.
I feel that many of the criticisms of MD relate more to the limitations of modern force fields than to the actual concept of MD.
I want to second this. I am rather convinced that if we could run MD with forces and energies derived from coupled-cluster QM calculations in a reasonable time (which IMHO is likely to happen in a few decades, given the strides made in developing both better algorithms and faster computer hardware), we will have to worry much more about the validity of the experiments than about that of the computations.
Since the author has taken the ethically dubious route of omitting any statement about potential conflict of interest, I'll add it here: OpenEye is a for-profit company that markets a line of computer products as alternatives to extensive molecular simulation. The financial success of OpenEye relies on researchers buying into their vision and then buying their software.
Many of the criticisms in the article above are warranted, but the data are cherry-picked and there is no attempt at statistical rigour while criticizing a field for lack of statistical rigour. To use the original author's words, this article is anecdotal and uncontrolled.
Except that it does not; many of OpenEye's products do things quite different from what MD does. FYI I have no financial ties to OpenEye, nor do I even use their software at this time. If you want to engage in a productive discussion it would be a much better idea to provide scientific counterexamples than to try to kill the messenger. If you have examples of MD being used predictively I would be genuinely interested in knowing about them.
I'm not sure what you mean by "except that it does not" so I cannot reply to that. On your other comment, here is one example of MD being used predictively from my own work. Detergents are often used to solubilize membrane proteins for structural determination because detergents can reduce membrane-protein aggregation. My simulations of membrane proteins and detergent free in aqueous solution suggest that high concentrations of detergents can actually lead to aggregation by a completely different mechanism (suggesting that there is a "sweet spot" of detergent concentration). The MD simulations were done first. We were confounded, so we asked some experimentalists to check it out. The experiments are in line with the simulations (although of course the experiments cannot provide the same level of detail, so it is not a mechanistic proof). http://www.sciencedirect.com/science/article/pii/S0009308413000418
Yet another shallow marketing exercise from Nicholls. As someone said above, there is a clear conflict of interest and this "lecture" really sounds like a buy-my-products drill.
Whoever is talking about the limitations of MD without mentioning the key problem of sampling is not an expert, just a commentator. It is hard to understand how so many "scientists" are buying this. No doubt there are lots of problems with MD, but one needs a deep understanding to solve them. You won't find that here.
There was not a single mention of any OpenEye product in the talk so I am not sure where you're coming from. The talk was very general, of the kind you would find in academia but very much applicable to drug discovery. Oh, and we did talk about sampling, no way you can avoid that.
From your blog entry: "Ant pointed out cases where simpler techniques gave the same results as MD in much less time". Which simpler techniques do you think he is referring to, even if he did not name them? Are you really this naive?
MD vs PB. He was referring to energies here. Also, docking vs MD for cases like the Shaw study.
MD is made for exploring the conformational space of large biological systems. PB is a good speed-reliability compromise for computing energies, best used as a post-docking optimization tool. It makes no sense to compare those methods as if they had the same purpose.
If a paper used MD when PB was enough, it was a misuse of MD.
Shaw uses brute-force MD, so what? Anton is made for that, this machine cannot run OpenEye software as far as I know. At least we get an idea of how costly it can be to use MD instead of docking on regular computers...
If I find some publication that has tried hopelessly to use OpenEye's PB-based minimizer for refining crude homology models, would this suggest that MD vs PB => MD >>> PB? It is easy to do cherry-picking like this.
Ant having taken the time to read the Shaw studies' SI, did he notice the forcefield tinkering in the folding-unfolding paper?
My gripe with MD is when it is used in a manner that could be described as either 'qualitative' or 'graphical'. For example, somebody runs a simulation of something really huge and then shows a movie of the trajectory. It could be really great or it could be complete horse shit, but how can I tell? Similarly, when an MD simulation is used to 'confirm' a docked pose.
Sampling was mentioned by the previous commentator and this is an important consideration if attempting to do free energy calculations (which generate quantitative and testable predictions). A key question here is whether MD samples more or less effectively than Monte Carlo (this may be system and even force field dependent). One can debate whether free energy calculations with protein flexibility are worth their computational expense (I keep an open mind here), but you are going to need effective sampling in order to apply the statistical mechanics. Force fields will always be an issue, and another of my gripes is the use of atom-centered charges (but don't let that distract us from this discussion).
I also have an issue with people talking about dynamic effects on affinity. Free energies of binding are ensemble averages and you should get the same answer whether you sample with MD or MC.
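(For readers unfamiliar with the formalism being invoked: a free energy difference is itself an ensemble average, for instance via the Zwanzig free energy perturbation identity,

$$\Delta F = F_1 - F_0 = -k_B T \ln \left\langle e^{-(U_1 - U_0)/k_B T} \right\rangle_0,$$

where the average runs over the equilibrium ensemble of state 0. The identity is indifferent to whether that ensemble is generated by MD or Monte Carlo, which is the commenter's point; how quickly either method converges to it is another matter, as the next comment argues.)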
Yes, I agree, especially with your first point. That is precisely why we need some kind of metric(s) that would allow us to get a handle on the validity and usefulness of an MD simulation. As of now the only kind of detail required of MD in the methods section is what's stated in the manual. We need some kind of standardized validation requirements too.
To me, the main problem is that there are too many people doing free energy calculations without a minimum understanding of the underlying mathematical complexity of sampling.
DeleteTo say that "Free energies of binding are ensemble averages and you should get the same answer whether you sample with MD or MC" is simply wrong. The particular MCMC that one uses to sample a problem with a thousand degrees of freedom is easily the most important choice one has to make.
Regarding standards, prospective validations of free energy calculations are trivial. You don't see them published because they don't work in practice. Give it another 40 years (optimistic mode off).
Well, I am an academic molecular modeler and I use MD quite often, though not exclusively, and I try my best not to use MD when simpler and more effective techniques are available. When I am asked to perform some MD I always ask for solid experimental knowledge first. I prefer to be guided by experimental facts when I start the modeling part, and I also want data to validate my results; how annoying of me. I constantly remind colleagues full of good intentions of the limitations and approximations of computational techniques. I am not simply aware of those; I want the people I work with to understand them too, at least a little bit. As a result of living by the golden rule of being a good modeler according to Derek Lowe, I am just causing myself a lot of trouble.
ReplyDeleteEverybody around me just seem to assume that
(1) MD is the gold standard in molecular modeling
(2) "Blind" MD provides interesting structural data provided it is run long enough. Nothing yet? Just run the simulation longer, "something will happen" - the sad thing about this is that it does sometimes... persuading everybody that it should, all the time, without fail
(3) Validated experimental data, even if perfectly good as it is, provided it is vaguely structurally related, will just get "better" with the help of some MD. At best, I am asked to do complicated modeling jobs just to find out what has already been proven. At worst, it is suggested that I perform a biased, useless simulation by injecting the results into the input... (who says MD is not predictive enough?)
(4) The above should be enough for me to publish "pure" molecular modeling papers in decent journals...
The biggest progress in the molecular modeling field in recent years, as perceived by most people I work with, is, from what I have experienced, the implementation of ambient occlusion rendering in VMD... The pictures I can provide look so much nicer since then...
By being so stubbornly critical of my own area of expertise I am just eventually persuading others that I am being lazy. Should I promise wonders with molecular modeling, and especially molecular dynamics, my life would be easier. All the nonsense like "this movie made from the MD trajectory shows the actual motion of the system", "OK, I will compute the free energy of binding of your very original molecule to this target, this is really easy", "sure, you can make a reliable protein model yourself, just paste the sequence into a Web server, it works without fail so this is good" is just what most people want to hear. I really do not like that.
I am not sure Anthony Nicholls is as good at voicing healthy criticism of MD as Gerard Kleywegt has been for crystallography, even if Ant captures perfectly the biggest issue we are currently facing (forcefields: their lack of sound validation, their obscure parameterization protocols...). As with many in silico techniques, MD is consistently useful (and sometimes spectacularly so) provided simulations are prepared and analyzed with the help of validated experimental facts. "MD zealots" are overselling it? Sure. Please just notice that those zealots are often not the MD specialists themselves, most of whom I know would agree with most of what Ant says.
Scientists concerned with the quality of drug design research in academia had better take care of the "omics zealots" urgently. I can live with all the misconceptions related to molecular modeling and MD around me, but I do not want to be told one day "just collect all biochemical data from every possible source, throw all the numbers into a big database, then code a computer algorithm that will find the needle in the haystack".
Thanks for your comment (which I unfortunately saw much later). I agree with most of your points regarding MD. I think one of Ant's most important points was that we need a good survey of MD augmented with extensive case studies, precisely so that we don't mistake what works sometimes for what works all the time.
DeleteAnother late reply...
From my experience, the basic requirements for *anything* to work with MD are, from the most important to the least:
- good forcefield parameters (and I consider the only good "ready to use" parameters out there are those for peptides and nucleic acids)
- skill and patience (especially when forcefield parameterization/validation is required)
- enough CPU time
If the situation is clear, what works (almost) all the time with MD, in my opinion:
- better understanding of *dynamical* processes such as conformational changes occurring upon binding something
- protein conformational sampling (e.g. for preparing an ensemble of representative receptor conformations prior to docking) - this is the only case where I may introduce some kind of bias in the simulations for speeding things up
- validation of protein-ligand X-ray models: some of the interactions (especially those mediated by water) can be artefacts stabilized by the X-ray process; this happens more often than most people think, and a short (2 ns) explicit-water MD simulation is the best filter for this potential issue (a minimal setup sketch follows at the end of this comment). The improvement in later SAR analyses can be so impressive that I do short MD simulations every time I get my hands on a protein-ligand X-ray structure. At best I identify artefacts and/or interactions that may be strengthened in a lead optimization process, with better accuracy than any docking program, and end up with a *better model* than the X-ray one (note: almost no one apart from experienced modelers and crystallographers would accept that; the majority assume that an X-ray structure is not a model but "reality"). At worst the system appears unstable, but even this is a useful reminder that I, as the modeler, missed something (bad FF parameters and/or a lazy ligand fit by the crystallographer before me coming first).
- protein model refinement after homology modeling (if it does not work, the modeling was bad, not the MD)
- post-docking calculations: just like any X-ray-derived model a docking-derived model may be challenged/improved by a short MD simulation, but in a virtual screening context this is rarely done because it takes too much time, and the process has to be automated somewhat (requiring dangerous compromises with FF parameterization...)
And this is all.
My second (much bigger) category, "will never work but may sometimes look like it did (by chance, and if the modeler agrees not to disclose all the devilish details about what was done)": everything that happens when MD is done for the wrong reasons or started with experimental data too scarce to validate against (just like any other in silico technique used for drug design).
I personally consider everything that does not belong to the above cases as "may work sometimes, and for the right reasons" and/or "not much used for that, some guidance needed" (protein folding and FEP simulations belonging to both, since I am optimistic about my field).
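As promised above, here is a minimal sketch of the kind of short explicit-water run described in point three of the list, using the OpenMM Python API. The file name is a placeholder, a bound ligand would need its own parameters (omitted here), and every setting shown is an illustrative assumption rather than a recommendation.

```python
# A short (2 ns) explicit-water simulation of a protein X-ray model.
from openmm import LangevinMiddleIntegrator, unit
from openmm.app import (PDBFile, ForceField, Modeller, Simulation,
                        PME, HBonds, DCDReporter)

pdb = PDBFile("my_xray_model.pdb")                     # placeholder file name
ff = ForceField("amber14-all.xml", "amber14/tip3pfb.xml")

modeller = Modeller(pdb.topology, pdb.positions)
modeller.addSolvent(ff, padding=1.0 * unit.nanometer)  # water box around the protein

system = ff.createSystem(modeller.topology, nonbondedMethod=PME,
                         nonbondedCutoff=1.0 * unit.nanometer,
                         constraints=HBonds)
integrator = LangevinMiddleIntegrator(300 * unit.kelvin,
                                      1.0 / unit.picosecond,
                                      2.0 * unit.femtoseconds)
sim = Simulation(modeller.topology, system, integrator)
sim.context.setPositions(modeller.positions)
sim.minimizeEnergy()                                   # relax clashes first
sim.reporters.append(DCDReporter("traj.dcd", 5000))    # one frame every 10 ps
sim.step(1_000_000)                                    # 2 fs x 10^6 steps = 2 ns
```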
Your criticisms (1) - (4) made me cringe... they are too true!
Here are some neat sampling techniques called replica exchange and transition path sampling.
http://www.ncbi.nlm.nih.gov/pubmed/16957325
http://en.wikipedia.org/wiki/Transition_path_sampling
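The essence of replica exchange fits in a few lines: copies of the system run at different temperatures and periodically attempt to swap configurations, with a Metropolis test that preserves each replica's Boltzmann distribution. A minimal sketch in Python, with invented energies:

```python
import math, random

def swap_accepted(beta_i, beta_j, energy_i, energy_j):
    """Metropolis test for swapping configurations between replicas at
    inverse temperatures beta_i and beta_j (beta = 1/kT).
    Acceptance probability: min(1, exp[(beta_i - beta_j) * (E_i - E_j)])."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return delta >= 0.0 or random.random() < math.exp(delta)

# Invented usage: replicas at 300 K and 350 K, energies in kcal/mol
kB = 0.0019872  # kcal/(mol*K)
accept = swap_accepted(1 / (kB * 300), 1 / (kB * 350),
                       energy_i=-1203.4, energy_j=-1198.7)
```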
How ARE the parameters adjusted and fitted for forcefields... let's say the CHARMM forcefield, just to be concrete? I suspect that machine learning has a lot to offer here.
--Geoff
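In broad strokes, parameters in forcefields like CHARMM are fitted so that simple functional forms reproduce quantum chemical and experimental reference data (geometries, vibrational frequencies, interaction energies with water and so on). A toy sketch of that flavor of fitting, with every number invented: least squares matching a harmonic bond term to a pretend QM bond-stretch scan.

```python
import numpy as np
from scipy.optimize import least_squares

# Fit a harmonic bond term U(r) = 0.5*k*(r - r0)^2 to a fabricated "QM" scan.
r = np.linspace(0.95, 1.15, 9)                     # bond lengths (angstrom)
rng = np.random.default_rng(1)
e_qm = 0.5 * 350.0 * (r - 1.04) ** 2               # pretend QM energies (kcal/mol)
e_qm = e_qm + rng.normal(scale=0.05, size=r.size)  # plus a little noise

def residuals(params):
    k, r0 = params
    return 0.5 * k * (r - r0) ** 2 - e_qm

fit = least_squares(residuals, x0=[300.0, 1.0])
k_fit, r0_fit = fit.x
print(f"k = {k_fit:.1f} kcal/mol/A^2, r0 = {r0_fit:.3f} A")
```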
Great article, thanks for the debate.
I don't like this kind of criticism. It is as fundamentalist as its opposite. I am an experimentalist and a modeler of protein conformation, both with the same rigor (I hope).
I found many weak points in Ant's critique, but I'd like to focus on the following. Scientists in general make dubious or weak assumptions about their results, even experimentalists. After every CD or fluorescence spectrum we read that "the protein" behaves this way or that: "the protein": just one, no ensemble, no averaging, all the molecules the same... The native structure of a protein is usually assumed to be a rock with the shape you see in the crystal structure, disregarding packing artifacts or the fact that the structure is an average (some people have found alternative conformations in the data discarded by crystallographers), even disregarding dynamical evidence from NMR. How many dimers emerge irresponsibly from crystal structures, without further tests? How many more emerge from high-throughput "in vitro" tests in uncontrolled conditions, disregarding the effects of fusion proteins, tags, cysteine oxidation, aggregation and so on? Almost everybody fits their unfolding curves to two- or three-state models, assuming that this is the truth without further evidence, without even considering that other models could also fit (e.g. downhill). We read a lot of conclusive information based on Western blots, and anybody who has done a Western in his life knows how much the result depends on the staining (every year more and more sensitive), how uncertain the concentration of the detected protein remains, how unreliable commercial antibodies are... Most of the best in vitro experiments are also carried out in (almost pure) water; the crowding effect is modeled with polymers such as dextran or PEG (albumin if you want to be more realistic).
Today, science journals (that is, we) are eager for conclusive results and demand that every paper be a final proof. In this scenario, authors adopt the over-confident attitude of a car salesman, and papers are full of dubious and weak conclusions based either on experimental or simulation techniques. Maybe, to resolve the doubts about MD's universal application, we should allow the modelers to publish their results without any expectation of truth, and wait for experimentalists to prove or discard their assumptions.
In brief: we are making assumptions and models every day, even with experimental results. Nobody has ever seen a protein fold or find its ligand. Everything is inferred from indirect facts. In this context MD may be a more indirect tool, but I can tell you that it has helped me to understand and predict more than once.