Molecular dynamics: I have a bad feeling about this.

Computer models of chemical and biological systems are not reality; rather they are what I call “invitations to reality”. They provide guidance to experimentalists to try out certain experiments, test certain techniques. They are suggestive, not factual. However, as any good modeler and chagrined experimentalist knows, it’s not hard to mistake models for reality, especially when they look seductive and are replete with bells and whistles.

This was one of the many excellent points that Anthony Nicholls made in his lunch critique of molecular dynamics yesterday at the offices of OpenEye Scientific Software in Cambridge, MA. In his talks and papers Anthony has offered not just sound technical criticism but also a rare philosophical and historical perspective. He has also emerged as one of the sharpest critics of molecular dynamics in the last few years, so we were all eager to hear what exactly it is about the method that rubs him the wrong way. Many of his friends and colleagues call him ‘Ant’, so that’s what I will do here.

Here’s some background for a general audience: Molecular dynamics (MD) is a computational technique used to simulate the motion of atoms and molecules. It is used extensively in all kinds of fields, from biochemistry to materials science. Most MD employed in research is classical MD, based on Newton’s laws of motion. We know that the atomic world is inherently quantum mechanical in nature, but it turns out we can get away with classical mechanics as an approximation to a remarkable extent. Over the last few years user-friendly software and advances in computing hardware have brought MD to the masses, so that even non-specialists can now run MD calculations using brightly colored, accessible graphical user interfaces on desktop computers. A leader in this development is David E. Shaw, founder of the famed D. E. Shaw hedge fund, who has made the admirable decision to spend all his time (and a good deal of his money) developing MD software and hardware for biochemistry and drug discovery.
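
Under the hood, most classical MD codes advance the system with a symplectic integrator such as velocity Verlet. Here is a minimal sketch; the function names and the one-dimensional toy force are my own illustration, not taken from any production package:

```python
import numpy as np

def velocity_verlet_step(pos, vel, force_fn, mass, dt):
    """Advance positions and velocities by one time step dt
    using the velocity Verlet integrator of classical MD."""
    f_old = force_fn(pos)
    pos_new = pos + vel * dt + 0.5 * (f_old / mass) * dt**2
    f_new = force_fn(pos_new)
    vel_new = vel + 0.5 * (f_old + f_new) / mass * dt
    return pos_new, vel_new

# Toy example: a 1-D harmonic oscillator (spring constant k = 1)
k, mass, dt = 1.0, 1.0, 0.01
force = lambda x: -k * x
x, v = np.array([1.0]), np.array([0.0])
for _ in range(1000):
    x, v = velocity_verlet_step(x, v, force, mass, dt)

# Total energy should be (nearly) conserved over the run --
# the hallmark of a good MD integrator
energy = 0.5 * mass * v**2 + 0.5 * k * x**2
```

Production MD packages layer force fields, thermostats, constraints and periodic boundary conditions on top of this basic loop, but the integration step itself really is this simple.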

Ant’s 2-hour talk was very comprehensive and enjoyable, covering several diverse topics including a few crucial ones from the philosophy of science.

It would be too much to describe everything that Ant said and I do hope OpenEye puts the video up on their website. I think it would be most convenient to summarize his main points here.

MD is not a useless technique, but it is not held to the same standards as other techniques, and therefore its true utility is at best unknown: Over the last few years the modeling community has done a lot of brainstorming about the use of appropriate statistical and benchmarking methods to evaluate computational techniques. Statistical tests have thus emerged for many methods, including docking, shape-based screening, protein-based virtual screening and quantum chemical calculations. Such tests are however manifestly lacking for molecular dynamics. As Ant pointed out, almost all statements in support of MD are anecdotal and uncontrolled. There are almost no follow-up studies.

MD can accomplish in days what other techniques can achieve in seconds or hours: No matter how many computational resources you throw at it, the fact remains (and will likely always remain) that MD is a relatively slow technique. Ant pointed out cases where simpler techniques gave the same results as MD but in much less time. I think this reveals a more general caveat: before looking for complicated explanations for any phenomenon in drug discovery or biology (potency, selectivity, differences in assay behavior etc.), one must look for simple ones. For instance, is there a simple physicochemical property like molecular weight, logP, number of rotatable bonds or charge that correlates with the observed effect? If there is one, why run a simulation lasting hours or days to get the same result?
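
Checking for such a simple explanation takes seconds. A sketch with invented potency numbers, purely for illustration (a real analysis would of course use your own measured data):

```python
import numpy as np

# Hypothetical data: molecular weights and measured potencies (pIC50)
# for a small compound series -- invented numbers, illustrative only.
mol_weight = np.array([250, 310, 345, 400, 455, 520, 565])
potency    = np.array([5.1, 5.6, 5.9, 6.4, 6.8, 7.3, 7.6])

# Pearson correlation between the simple descriptor and the effect
r = np.corrcoef(mol_weight, potency)[0, 1]
# If |r| is high, the "effect" may simply track molecular weight --
# no days-long simulation needed to explain it.
```

If a one-line descriptor already explains the trend, a days-long simulation adds little.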

A case in point is the recent Nature paper by D. E. Shaw’s group described by Derek on his blog. Ant drew our attention to the Supporting Information, which says that they got the same result for the ligand pose using docking that they got using MD, a difference translating to a simulation time of days vs. seconds. In addition they saw a protein pocket expansion in the dynamics simulation whose validity was tested by synthesizing one compound. That they prospectively tested the simulation is a good thing, but one compound? Does that prove that MD is predictive for their system?

MD can look and feel “real” and seductive: This objection really applies to all models, which by definition are not real. Sure, they incorporate some elements of reality but they also leave many others out. They simplify, use fudge factors and parameters and often neglect outliers. This is not a strike against models, since they are trying to capture some complex reality and cannot do so without simplification, but it does indicate reasons for being careful when interpreting their results. However I agree that MD is in a special category since it can generate very impressive movies that emerge from simulations run on special-purpose machines, supercomputers or GPUs for days or months at a time. Here’s one that looks particularly impressive and shows a drug molecule successfully “finding” its binding site on a protein.

This apparently awesome combination of computing power and graphical software brought to bear on an important problem often makes MD sound way more important than it is. The really damning thing though may be that shimmering protein on your screen. It’s very easy for non-computational chemists to believe that that is how the proteins in our body actually move. It’s easy to believe that you are actually seeing the physics of protein motion being simulated, down to the level of individual atoms.

But none of this is really true. Like many other molecular models what you are seeing in front of you is a model, replete with approximations and error bars. As Ant pointed out, it’s almost impossible to get real variables like statistical mechanical partition functions, let alone numbers from experiment, out of such simulations. Another thing that’s perpetually forgotten is that in the real world, proteins are not isolated but are tightly clustered together with other proteins, ions, small molecules and a dense blanket of water. Except perhaps for the water (and poorly understood water at that), we are ignoring all of this when we are running the simulation. There are other problems in real systems, like thermal averaging and non-ergodicity which physicists would appreciate. And of course, let’s not even get started on the force fields, the engines at the heart of almost every simulation technique that are consistently shown to be imperfect. No, the picture that you see in a molecular dynamics movie is a shadow of its “real” counterpart, even if there is some agreement with experiment. At the very least this means you should keep your jaw from dropping every time you see such a movie.

Using jargon, movies and the illusion of reality, MD oversells itself to the public and to journals: Ultimately it’s not possible to discuss the science behind MD without alluding to the sociological factors responsible for its perception. The fact is that top journals like Nature or Science are very impressed when they see a simulation shepherded by a team led by Big Name Scientist being run for days using enough computing power to fly a jetfighter. They are even more impressed when they see movies that apparently mirror the actual motion of proteins. Journals are only human, and they cannot be entirely faulted for buying into seductive images. But the unfortunate consequence of this is that MD gets oversold. Because it seems so real, because simulations that are run for days must undoubtedly be serious stuff because they have been run for days, because their results are published in prestigious journals like Nature, therefore it all must be important stuff. This belief is however misplaced.

What’s the take home message here? What was strange in one sense was that although I agreed with almost everything that Ant said, it would not really affect the way I personally use MD in my day-to-day work, and I suspect this is going to be the case for most sane modelers. For me MD is a tool, just like any other. When it works I use its results, when it doesn’t I move on and use another tool. In addition there are really no other ways to capture protein and ligand motion. I think Ant’s talk is best directed at the high priests of MD and their followers, people who either hype MD or think that it is somehow orders of magnitude better than other modeling techniques. I agree that we should all band together against the exhortations of MD zealots.

I am however in the camp of modelers who have always used MD as an idea generator, a qualitative tool that goads me into constructing hypotheses and making suggestions to experimentalists. After all, the goal of the trade I am involved in is not just ideas but products. I do care about scientific rigor and completeness as much as the next person, but the truth is that you won’t get too far in this business if you constantly worry about scientific rigor rather than the utility – even if it’s occasional – of the tools we are using. And this applies to theoretical as well as experimental tools; when was the last time my synthetic chemistry friends used a time-tested reaction on a complex natural product and got the answer they expected? If we think MD is anecdotal, we should also admit that most other drug design strategies are anecdotal too. In fact we shouldn’t expect it to be otherwise. In a field where the validity of ideas is always being tested against a notoriously complex biological system whose workings we don’t understand, and where the real goal is to get a useful product, even occasional successes are treasured and imperfect methods are constantly embraced.

Nonetheless, in good conscience my heart is in Ant’s camp even if my head protests a bit. The sound practice of science demands that every method be duplicated, extensively validated, compared with other methods, benchmarked and quantified to the best of our abilities if we want to make it part of our standard tool kit; it’s the only way we can make such methods predictive. This has manifestly not happened with MD. In fact it’s part of a paradigm which, as Ant pointed out, goes back to the time of Galileo. If a method is not consistently predictive it does not mean it is useless, but it does mean that there is much in it that needs to be refined. Just because it can work even when it’s not quantitative does not mean trying to make it quantitative won’t help. As Ant concluded, this can happen when the community comes together to compare and duplicate results from their simulations, when it devotes resources to performing the kind of simple benchmarking experiments that would help make sense of complicated results, and when theorists and experimentalists work together to achieve the kinds of basic goals that have made science such a successful enterprise for five hundred years.

Molecular Dynamics: Manna from heaven or spawn of Satan?

This is the question that Anthony Nicholls of OpenEye Scientific Software will try to answer tomorrow at the OpenEye offices in Cambridge, MA. Well, ok, not exactly this question but a more nuanced version thereof. 

As those in the field probably know, Anthony, who is one of the leaders in industrial computational chemistry, has a history of offering pointed, articulate and informed criticism of what is rapidly becoming an important tool in the drug industry. In the last few years MD has captured the imagination of many, especially through the efforts of researchers like David Shaw and Vijay Pande who have enabled simulations to approach realistic time scales approximating large-scale conformational changes in proteins and protein-ligand binding. Nonetheless it remains a technique that often sparks a range of responses among its practitioners and critics, which to me makes it even more interesting because it's no fun when everyone agrees or disagrees, right?

I am not an expert when it comes to MD (that's precisely why I want to hear from the experts) but I am instead like the vast majority of scientists who use the technique, find it useful to varying degrees and are intrigued by what the fundamental issues in the field exactly are. What makes this issue even more interesting for me is that it seems to tread into some of the more relevant questions from the philosophy of science, including evergreen gems like "What is utility?", "What do you mean when you say a technique 'works', and is this definition the same for different techniques?", "What is more important, prediction or understanding?" and the ultimate zinger, "What is science, exactly?". I am particularly interested in the question of how exactly you validate a 'correct' prediction for a complex system like a protein-drug interaction where there can be considerable uncertainty. I am sure Anthony will have more to say about this since he has made extremely valuable contributions to pointing out the key role of statistics in molecular modeling.

In any case, I have no doubt that the talk will be characteristically stimulating and provocative. If you want to attend you should RSVP to Scott Parker at OpenEye. Derek also mentioned this on his blog. And of course, I will be there and will have a summary here soon, so watch this space.

Update: My report on the talk is here.

A discussion on Big Science, Small Science and the future of all science

Tomorrow I have the privilege of joining a panel discussion on Big Science with three very distinguished scientists: Nobel Laureate Steven Weinberg, MIT astrophysics professor Sara Seager and Perimeter Institute cosmologist Neil Turok. The conversation will mostly focus on the problems facing Big Science in a bad economy and how science can retool itself in the new millennium.

The program will be broadcast on Canada's TV Ontario, more specifically on their "The Agenda with Steve Paikin" show at 8 and 11 PM EST. It will be preceded by an interview with star astronaut Chris Hadfield who entertained and informed all of us through his YouTube videos from the International Space Station.

If you are in Canada and have access you might want to check it out since I am sure the conversation will be stimulating. I will have a summary of the discussion here soon, hopefully along with a video or a podcast.

Arsenic DNA, chemistry and the problem of differing standards of proof in cross-disciplinary science

Arsenic-based linkages in DNA would be unstable and would quickly break, a fact suspected by chemists for years (Image: Johannes Wilbertz)
When the purported discovery of the now infamous “arsenic DNA” bacteria was published, a friend of mine who was studying astrobiology could not stop praising it as an exciting scientific advance. When I expressed reservations about the discovery mainly based on my understanding of the instability of biomolecules containing arsenic, she gushed, “But of course you will be skeptical; you are an organic chemist!"

She was right. As chemists, many of my colleagues and I could not help but zero in on what we thought was the most questionable aspect of the whole discovery: the fact that somehow, contrary to everything we understood about basic chemistry, the “arsenic DNA” inside the bacteria was stably chugging along, replicating and performing its regular functions.

It turned out that the chemists were right. Measurements on arsenic DNA analogs made by researchers several months later found that the arsenic analogs differed in stability from their phosphate versions by a mind-boggling factor of 10^17. Curiously, physicists, astronomers, geologists and even biologists were far more accommodating about the validity of the discovery. For some reason the standards used by these scientists were different from those used by chemists, and in the end the chemists’ standard turned out to be the “correct” one. This is not a triumph of chemists and a blemish on other sciences since there could well be cases where other sciences might have used the correct standards in nailing down the truth or falsehood of an unprecedented scientific finding.

The arsenic DNA fiasco thus illustrates a very interesting aspect of modern cross-disciplinary science – the need to reconcile what can be differing standards of evidence or proof between different sciences. This aspect is the focus of a short but thought-provoking piece by Steven Benner, William Bains and Sara Seager in the journal Astrobiology.

The article explains why it was that standards of proof that were acceptable to different degrees to geologists, physicists and biologists were unacceptable to chemists. The answer pertains to what we call “background knowledge”. In this case, chemists were compelled to ask how DNA with arsenic replacing phosphorus in its backbone could possibly be stable given everything they knew about the instability of arsenate esters. The latter had been studied for several decades, and while arsenic DNA itself had not been synthesized before, simpler arsenate esters were known to be highly unstable in water. The chemists were quite confident in extrapolating from these simple cases to questioning the stable existence of arsenic DNA; if arsenic DNA indeed were so stable, then almost everything they had known about arsenate esters for fifty years would have been wrong, a possibility that was highly unlikely. Thus for chemists, arsenic DNA was an extraordinary claim. And as Carl Sagan said, they needed to see extraordinary evidence before they could believe it, evidence that was ultimately not forthcoming.

For geologists however, it was much easier to buy into the claims. That is because as the article points out, there are several cases where elements in minerals are readily interchanged for other elements in the same column of the periodic table. Arsenic in particular is known to replace phosphorus in rocks bearing arsenate and phosphate minerals. Unlike chemists, geologists found the claim of arsenic replacing phosphorus quite consistent with their experiences. Physicists too bought readily into the idea. As the authors say, physicists are generally tuned to distinguishing two hypotheses from one another; in this case the hypothesis that DNA contains arsenic versus the hypothesis that it does not. The physicists thus found the many tests apparently indicating the presence of arsenate in the DNA to provide support for one hypothesis over another. Physicists did not appreciate that the key question to ask would be regarding the stability of arsenic DNA.

Like chemists, biologists were also skeptical. Biologists usually check the validity of a claim for a new form of life by comparing it to existing forms. In this case, when the genetic sequence and lineage of the bacteria were inspected they were found to be very similar to those of garden-variety, phosphate-containing bacteria. The biologists’ background knowledge thus compelled them to ask how it could possibly be that a bacterium otherwise similar to existing bacteria could suddenly survive on arsenic instead of phosphorus.

In the end of course, none of the replication studies found the presence of arsenic in the GFAJ-1 bacteria. But this was probably the least surprising outcome to chemists. The GFAJ-1 case thus shows that different sciences can have different standards for what they regard as “evidence”. What may be suitable for one field may be controversial or unacceptable for others. This fact helps answer at least one question about the GFAJ-1 paper: Why was it accepted by a prestigious journal like Science? The answer almost certainly concerns the shuttling of the manuscript to planetary scientists rather than chemists or biologists as reviewers. These scientists had different standards of evidence, and they enthusiastically recommended publication. One of the key lessons here is that any paper on a cross-disciplinary topic should be sent to at least one specialist from each discipline comprising the field. Highly interdisciplinary fields like astrobiology, drug discovery, and social psychology are prime candidates for this kind of policy.

Discipline-dependent standards of proof not only explain how bad science occasionally gets published or how promising results get rejected; they also bear on the deeper issue of what in fact constitutes “proof” in science. This question reminds me of the periodic debates about whether psychology or economics is a science. The fact is that many times the standard of proof in psychology or economics might be unacceptable to a physicist or statistician. As a simple example, it is often impossible to get correlations better than 0.6 in a psychological experiment. And yet such standards can be accepted as proof in the psychological community, partly because an experiment on human beings is too complex to yield more accurate numbers; after all, most human beings are not inclined planes or balls dropped from a tower. In addition one may not always need accurate correlations to discern valuable trends and patterns. Statistical significance may not always be related to real-world significance (researchers running clinical trials would be especially aware of this fact).
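
The gap between statistical and real-world significance is easy to demonstrate: with enough subjects, even a feeble correlation becomes overwhelmingly "significant". A sketch with synthetic data (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000                          # a very large "study"
x = rng.normal(size=n)
y = 0.1 * x + rng.normal(size=n)    # true correlation ~0.1: very weak

r = np.corrcoef(x, y)[0, 1]
# t-statistic for testing the null hypothesis that the correlation is zero
t = r * np.sqrt((n - 2) / (1 - r**2))
# t is enormous, so p << 0.05: statistically "significant" --
# yet r is far below even the ~0.6 ceiling typical of psychology studies,
# and of almost no practical import.
```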

The article by Benner, Bains and Seager concludes by asking how conflicting standards of proof can be reconciled in highly cross-disciplinary sciences, and this is a question which is going to be increasingly important in an age of inherently cross-disciplinary research.

I think the GFAJ-1 fiasco itself provides one answer. In that case the most “obvious” objection was raised by chemists based on years of experience. In addition it was a “strong” objection in the sense that it really raised the stakes for their discipline; as noted before, if arsenic DNA exists then much of what chemists know about elementary chemical reactivity might have to be revised. In that sense it was really the kind of falsifiable, make-or-break test advocated by Karl Popper. So one cogent strategy might be to first consider these strong, obvious objections, no matter what discipline they may arise from. If a finding passes the test of these strong objections, then it could be subjected to the less obvious and more relaxed criteria provided by other disciplines. If it passes every single criterion across the board then we might actually be able to claim a novel discovery, of the kind that rarely comes along and advances the entire field.

First published on the Scientific American Blog Network.

Computational chemistry wins 2013 Nobel Prize in Chemistry

An equation for a force field, a computational chemistry model that uses simple terms and parameters from experiment to calculate the energy of a molecule (Image: Amit Kessel)
It's always very nice to wake up and see your own professional working field win a Nobel Prize. I am very happy to note that this year's prize in chemistry has been awarded to Martin Karplus, Michael Levitt and Arieh Warshel for their development of "multiscale methods for complex systems". More simply put, these three chemists have been recognized for their development and application of methods to simulate the behavior of molecules at various scales, from single molecules to proteins. The work sheds light on phenomena as diverse as protein folding, catalysis, electron transfer and drug design. It enables chemists like me to calculate a variety of things, from the rates of chemical reactions and the stability of molecules to the probability that a drug will block a crucial protein implicated in a disease.

I will have a more detailed post on the prize later, but for now it's worth noting that more than many other Nobel Prizes, this one recognizes a field rather than a particular individual. It really tells you how pervasive modeling and calculation have become in solving all kinds of chemical problems. In addition, for all three chemists it's really a lifetime achievement award rather than one for a specific discovery.

Computers have been applied in chemistry since the 1960s. Their use was a direct outgrowth of theoretical techniques for calculating all kinds of molecular properties, from a molecule's stability and movement to its reactions with other molecules. For the longest time these calculations could be done only for simple systems, and it was only in the 90s or so that computing power and algorithms began catching up with theory, enabling the application of calculations to large, practically relevant molecules like proteins, drugs and materials. It was Karplus, Warshel and Levitt, among others, who made this possible.

This year's laureates have developed and applied every kind of theoretical technique, from strictly quantum mechanical calculations to highly parametrized classical and semi-classical ones based on empirical data, to simulating a vast number of diverse molecules. They have also developed software that brought these computations to the masses. The quantum mechanical methods are often called 'ab initio' - from first principles - and were already recognized with a prize in 1998, but this prize honors something much broader. Techniques developed by the trio include molecular dynamics (MD) which tries to simulate the real life movement of complex entities like proteins, along with electrostatic calculations which try to calculate the attraction and repulsion between charged atoms and molecules. 

Martin Karplus is sort of like the godfather of the field (he was Linus Pauling's last graduate student), and there's not an area of molecular simulation which he has not touched. Undergraduate chemistry students would know his name from the so-called Karplus equation that allows you to relate magnetic resonance properties of molecules to their geometry; now they will find out that he is more than a textbook relic. Warshel and Levitt have made key contributions at the boundary between quantum mechanics and classical mechanics. All three recipients have mainly become well known for simulating the behavior of small organic molecules like drugs as well as proteins, but their techniques have also been equally applicable to materials like zeolites and solar cells.

As with many Nobel Prizes there are a few other individuals who have contributed enormously to the field who were inevitably left out. Although little known outside the field, my personal list would include Norman Allinger, Andrew McCammon, Ken Houk, Roberto Car, Bill Goddard and Michele Parrinello. Foremost in my mind is University of Georgia chemist Norman Allinger. Allinger was the first person to widely develop and apply the force field approach that underlies much of this year's work. A force field is basically a set of simple equations for describing the bonds, angles, torsional (rotatable) angles and long-range non-bonded interactions in a molecule (see illustration above). It is a simplified model based on classical mechanics which regards atoms and bonds as balls and springs. Because it is a simplified model it needs to be shored up with parameters from experiment or from rigorous quantum mechanical calculations. Allinger developed two very widely used versions of a force field initially called MM2 (the method itself is called 'molecular mechanics', MM), spent years carefully parametrizing and benchmarking them and then applied them to a variety of molecules.
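
The "balls and springs" picture translates almost literally into code. Here is a minimal sketch of two representative force-field terms; the parameter values are invented round numbers for illustration, whereas real force fields like Allinger's MM2 use many more terms and carefully fitted constants:

```python
def bond_energy(r, r0=1.53, k=300.0):
    """Harmonic bond-stretch term: E = k * (r - r0)^2.
    r0 ~ an equilibrium C-C bond length in angstroms, k a stiffness
    parameter (illustrative values, not from any real force field)."""
    return k * (r - r0) ** 2

def lennard_jones(r, epsilon=0.1, sigma=3.4):
    """Non-bonded van der Waals term (Lennard-Jones 12-6):
    E = 4 * eps * [(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4 * epsilon * (sr6**2 - sr6)

# At the equilibrium bond length the stretch energy is zero;
# the LJ term reaches its minimum of -epsilon at r = 2**(1/6) * sigma.
```

A full force-field energy simply sums terms like these over every bond, angle, torsion and non-bonded pair in the molecule.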

Karplus, Warshel and Levitt applied these ideas but also developed their own, and then went far beyond the initial work by creating hybrid methods that combine classical with quantum mechanics ("QM/MM"). This is a major part of the prize announcement; it took a lot of effort to refine and troubleshoot such methods and make them accessible to non-specialist chemists. You can use the quantum mechanical technique to describe the core part of a molecular system and then use the classical part to simulate the rest of it. Among other things this saves an enormous amount of time, since doing a quantum mechanical calculation on the entire system would be prohibitively expensive. The three chemists have also developed purely classical methods to simulate molecular motion. These days most of us who do the calculations take this work for granted, so thoroughly is it ingrained in the language and tools of computational chemistry.
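
One common way of stitching the two descriptions together is the subtractive (ONIOM-style) scheme, sketched schematically below. This is a generic illustration of the QM/MM idea, not necessarily the exact formulation the laureates used, and the energy values are arbitrary stand-ins rather than real quantum chemistry:

```python
def qmmm_energy(e_qm_region, e_mm_region, e_mm_whole):
    """Subtractive (ONIOM-style) QM/MM combination: treat the chemically
    interesting region quantum mechanically, the rest classically, and
    avoid double counting by subtracting the MM energy of the QM region:
        E_total = E_MM(whole) + E_QM(region) - E_MM(region)"""
    return e_mm_whole + e_qm_region - e_mm_region

# Toy numbers in arbitrary energy units, purely illustrative:
total = qmmm_energy(e_qm_region=-40.0, e_mm_region=-35.0, e_mm_whole=-120.0)
```

The payoff is that the expensive quantum calculation runs only on the small region (say, an enzyme active site) while the cheap classical model covers the thousands of remaining atoms.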

This year's Nobel Prize really recognizes many other things connected with molecular modeling. First and foremost it is a prize for the field rather than for individuals, a signal from the Nobel Committee that computational methods have come of age. You would be hard-pressed these days to find papers that don't include at least some computational component, from the simple visualization of a molecule to very rigorous high-level quantum mechanical calculations. The prize is also a testament to the amazing growth of computer hardware and software in the last two decades without which such calculations could never have become practical; today my desktop can do in a day a calculation that would have taken a few days on a supercomputer in the 90s. The award also recognizes how scientists can understand matter at many different levels, from single molecules to complex assemblies of interacting molecules which may demonstrate emergent behavior.

But perhaps most importantly, the prize recognizes the key role of models in enabling the growth of chemistry, and other disciplines for that matter; computational chemistry models in fact share general principles with models in climate science, ecology and economics. I am not so optimistic as to believe that computers can possibly supplant any human scientist even in the distant future, but the prize tells us that a carefully constructed model that recognizes its strengths and limitations can stand shoulder to shoulder with a competent chemist in attacking a thorny problem. Together experiment and modeling can make a difference.

Update: A (mostly) healthy debate over the value of molecular dynamics (MD) simulations has broken out in the comments section of In the Pipeline. I will have much more to say about this later but for now it's worth noting that there are two separate issues here; the validity of MD as a science and the validity of a Nobel Prize for computational chemistry. The latter is often a field-wide recognition that is given for utility rather than fundamental scientific insights.

Nobel Prize for membrane vesicle trafficking

Randy Schekman, James Rothman and Thomas Südhof have been awarded a well-deserved and long-predicted Nobel Prize for their work on membrane vesicle trafficking. I had listed this prize in 2011 but for some reason it escaped my attention this year. It's clearly important, and the kind of basic biological discovery that should have been rewarded a long time back.

The committee seems to have split the prize rather neatly between the three recipients. Essentially, Schekman found the genes responsible for vesicle formation and deployment; defective copies of these genes led to vesicles piling up in the wrong places and to misguided transport of the relevant proteins. Rothman focused on the proteins responsible for vesicle fusion, using a virus-infected cell system as his model. He revealed the role of a number of interesting proteins like SNAP and SNARE which are now widely studied. Südhof's work was especially important for understanding neurotransmission, where synaptic vesicle fusion and the release of neurotransmitters is a fundamental biological event. He discovered key proteins like complexin and synaptotagmin involved in this process.

The medicine prize for this year also recognized individuals who could have potentially been recognized by the chemistry prize. To me this makes it slightly (although not much more) likely that the chemistry prize would be awarded for "straight" chemistry, perhaps physical and analytical chemistry. I suspect that the decisions of the various committees are not as independent as we might think, and there's probably at least some cross-talk between them.

I am sure everyone is psyched now about tomorrow's physics prize.
