Field of Science "ranks billions of drug interactions"? Hold your horses.

Now here's a study that should make most seasoned molecular modelers cringe. Nature News reports on an effort by a website that docked 600,000 compounds to 7,000 protein targets and predicted which ones would show activity against these targets based on docking scores:

Predicting how untested compounds will interact with proteins in the body, as Drugable attempts to do, is more challenging. In setting up the website, Cardozo’s group selected about 600,000 molecules from PubChem and the European Bioinformatics Institute’s ChEMBL, which together catalogue millions of publicly available compounds. The group evaluated how strongly these molecules would bind to 7,000 structural ‘pockets’ on human proteins also described in the databases. Computing giant Google awarded the researchers the equivalent of more than 100 million hours of processor time on its supercomputers for the mammoth effort.

But mammoth computing resources do not translate to carefully constructed protocols or correct predictions. In its current incarnation, docking is best for finding the binding pose, that is, the orientation of a drug bound into a protein pocket. Ranking compounds is far more difficult, and predicting absolute binding affinities is a very distant, currently unachievable third goal.
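
To put "mammoth" in perspective, here is a back-of-the-envelope sketch that uses only the three figures quoted in the news excerpt. The variable names are illustrative, and the processor time is treated as a lower bound because the excerpt says "more than" 100 million hours.

```python
# Figures quoted in the Nature News excerpt above.
n_compounds = 600_000
n_pockets = 7_000
cpu_hours = 100_000_000  # "more than 100 million hours", so a lower bound

# Every compound docked against every pocket.
n_pairs = n_compounds * n_pockets

# Average processor time available per compound-pocket pair, in seconds.
seconds_per_pair = cpu_hours * 3600 / n_pairs

print(f"compound-pocket pairs: {n_pairs:,}")
print(f"processor seconds per pair (at least): {seconds_per_pair:.0f}")
```

That works out to about 4.2 billion pairs, which is where the "billions" in the headline comes from, and to something on the order of a minute and a half of processor time per pair. The arithmetic says nothing about whether the resulting scores can be trusted.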

Anyone who has tried to run a hit to lead or lead optimization project based on docking scores would know how riddled with problems and qualifications any prediction based on these highly subjective numbers is. For starters, every modeling program gives you its own docking scores. Absolute values of these numbers (which ideally should reflect the free energy of binding but which seldom do) are almost always useless. If you are dealing with a congeneric series of molecules and are fairly confident about the binding orientation (usually confirmed by x-ray crystallography or some other technique) then maybe you could get some help from the scores in ranking the compounds, but even then mostly in terms of trends rather than quantitative differences.
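
One common way to ask whether scores capture "trends rather than quantitative differences" is a rank correlation between the scores and measured affinities. The sketch below is a minimal pure-Python Spearman correlation. The five docking scores and measured binding free energies are made-up numbers for a hypothetical congeneric series, not data from any study.

```python
def average_ranks(values):
    """Return 1-based ranks (smallest value gets rank 1); ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values (kcal/mol, more negative = tighter binding).
docking_scores = [-9.1, -8.7, -8.4, -7.9, -7.2]
measured_dg = [-10.5, -8.0, -9.6, -7.1, -7.5]

rho = spearman(docking_scores, measured_dg)
print(f"Spearman rho = {rho:.2f}")
```

For these invented numbers the rank correlation is 0.8 even though the absolute values disagree by more than a kcal/mol in places, which is the sense in which a score can follow a trend without being quantitatively meaningful.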

Unfortunately the news piece says nothing about what method was used to generate the poses, whether there was any clustering or whether only the top pose was considered, what the false positive rate was, and most importantly, whether there was any experimental verification whatever of the ranking. The website is also not helpful in this regard. It also does not tell us if the protein structures used for docking were well-resolved or refined or whether they were homology models. In the absence of all this information the ranking of the compounds is tenuous at best and useless at worst and as it stands the study sounds little better than throwing darts in the dark and hoping some of them will stick. Ranking often fails even for similar compounds, so how well (or badly) it would work for 600,000 diverse compounds bound to 7,000 diverse protein targets is anyone's guess.

The report also compares the study to a similar activity prediction study by Brian Shoichet in which drug similarity was used to predict activity against unexpected targets. But that was a very different kettle of fish; it was a compound similarity - not docking - study so it did not have to deal with the complexities of error-ridden protein crystal structures or homology models, it verified a lot of the predictions using carefully constructed assays, and even then it gave a hit rate which did not exceed about 50%.

Either the study itself has failed to validate its predictions or the news report is woefully incomplete. Maybe I am wrong and in fact the study has laid the careful groundwork and validation that is necessary for trusting docking. As it stands however, the purpose of the report mainly seems to be to highlight the fact that Google generously donated 100 million hours of its computing power to the docking. This heightened, throw-technology-at-it sense of wonder and optimism is exactly what the field does not need. I would be the first one to welcome reliable predictions of drug-protein affinity based on wholesale docking of compounds to targets, but I don't think this work achieves that goal at all.

The future of nuclear energy: Let a thousand flowers bloom

The interior of a TRIGA nuclear reactor at Oregon State University (Image: Oregon State University)
In the summer of 1956, a handful of men gathered in a former little red schoolhouse in San Diego. These men were among the most imaginative scientists and engineers of their generation. There was their leader, Frederic de Hoffmann, who had worked on the Manhattan Project and was now the president of the company General Atomics. De Hoffmann was not only a creative physicist but also an unusually shrewd and capable manager and entrepreneur; in the later years of his life he would take the celebrated Salk Institute to great heights. There was also Freeman Dyson, a remarkably versatile mathematical physicist from the Institute for Advanced Study in Princeton who had previously reconciled disparate theories of quantum electrodynamics - the strange theory of light and matter. And there was Edward Teller, another Manhattan Project veteran; a dark, volatile and brilliant physicist who would become so convinced of the power of nuclear weapons to save the world that he would inspire the caricature of the mad scientist in Stanley Kubrick's classic film "Dr. Strangelove".

Together these men and their associates worked on a single goal: the creation of a nuclear reactor that was intrinsically safe, one that would cease and desist its nuclear transformations even in the face of human folly and stupidity. The reactor would have the rather uninspired name TRIGA (Training, Research, Isotopes, General Atomics) but its legacy would be anything but uninspiring. At the heart of the reactor's success was not a technical innovation but an open atmosphere of debate and discussion. Every day someone - mostly Teller - would come up with ten ideas, most of which sounded crazy. The others - mostly Dyson - would then patiently work through the ideas, discarding several of them, extracting the gems from the dross and giving them rigorous shape.

TRIGA benefited from a maximum of free inquiry and individual creativity and a minimum of bureaucratic interference. There was no overarching managerial body dictating the thoughts of the designers. Everyone was free to come up with any idea they thought of, and the job of the rest of the group was to either refine the idea and make it more rigorous and practical or discard it and move on to the next idea. The makers of TRIGA would have been right at home with the computer entrepreneurs of Silicon Valley a few decades later.

At the core of TRIGA's operation was a principle called the warm neutron principle. In a conventional reactor the neutrons in the fuel are moderated by hydrogen in the cooler water from the surroundings. There is a significant potential for a meltdown if someone pulls out the control rods, since the water which stays cool for a while will continue to moderate the neutrons and sustain their efficacy for causing fission. Dyson and Teller's idea was to place half of the hydrogen in the water and the rest in the fuel in the form of a uranium and zirconium hydride alloy. This would result in only half of the hydrogen staying cool enough to moderate the neutrons, while the other half in the hydride stays warm and diminishes the ability of the neutrons to fission uranium. This results in the fuel having what is called a negative temperature coefficient. The fuel rods were fashioned with care and precision by Massoud Simnad, an Iranian metallurgist working on the project.
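
What a negative temperature coefficient buys you can be illustrated with a deliberately crude toy model. The sketch assumes reactivity responds linearly to the rise in fuel temperature, which is a simplification, and both numbers below are hypothetical placeholders rather than TRIGA specifications.

```python
def net_reactivity(inserted, alpha, delta_t):
    """Linear feedback model: inserted reactivity plus alpha times the fuel temperature rise."""
    return inserted + alpha * delta_t


def self_limiting_rise(inserted, alpha):
    """Fuel temperature rise at which the net reactivity falls back to zero."""
    if alpha >= 0:
        raise ValueError("a non-negative coefficient gives no self-limiting point")
    return -inserted / alpha


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
inserted = 0.01   # reactivity added when a control rod is pulled (dimensionless)
alpha = -1.0e-4   # fuel temperature coefficient of reactivity, per degree C

rise = self_limiting_rise(inserted, alpha)
print(f"net reactivity returns to zero after a rise of about {rise:.0f} degrees C")
```

In this toy picture the fuel heats up until the feedback term cancels the inserted reactivity and the excursion stops on its own. With a positive coefficient no such balance point exists, which is why the function refuses to return one.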

The warm neutron principle is what made TRIGA intrinsically safe, very unlikely to sustain a meltdown or catastrophic failure. It took less than three years for the engineers and technicians to take the reactor from the design stage to manufacturing. The first TRIGA was inaugurated by none other than Niels Bohr in San Diego. Seventy of these safe reactors were built. They were safe and cheap enough to be operated in hospitals and universities by students and their main function was to produce isotopes for scientific and engineering experiments. They were also robust and safe enough to be proliferation resistant. As Dyson recounts in his elegant memoir "Disturbing the Universe", the TRIGA is perhaps the only nuclear reactor that made a profit for its creator.

TRIGA made the development of nuclear power seem relatively easy, cheap and fast. Why didn't other reactors enjoy the same success? Why, after fifty years, is nuclear power still struggling in the face of economics and political and social backlash? There are many reasons, but the principal reason is simple: the designers of TRIGA were encouraged to have fun and they had the kind of freedom of inquiry commonly found in a startup company. The problem is that the fun went out of the nuclear business in the 70s, and with the fun, creativity and cost considerations also went out of the window. In his book Dyson swiftly cuts through to the central issue:
"The fundamental problem of the nuclear industry is not reactor safety, not waste disposal, not the dangers of nuclear proliferation, real though all these problems are. The fundamental problem of the industry is that nobody any longer has any fun building reactors....Sometime between 1960 and 1970 the fun went out of the business. The adventurers, the experimenters, the inventors, were driven out, and the accountants and managers took control. The accountants and managers decided that it was not cost effective to let bright people play with weird reactors. So the weird reactors disappeared and with them the chance of any radical improvement beyond our existing systems. We are left with a very small number of reactor types, each of them frozen into a huge bureaucratic organization, each of them in various ways technically unsatisfactory, each of them less safe than many possible alternative designs which have been discarded. Nobody builds reactors for fun anymore. The spirit of the little red schoolhouse is dead. That, in my opinion, is what went wrong with nuclear power."
Nobody builds reactors for fun anymore. What Dyson is getting at is quite simple. For any technological development to be possible, the technology needs to drive itself with the fuel of Darwinian innovation. It needs to generate all possible ideas - including the weird ones - and then fish out the best while ruthlessly weeding out the worst. This leads not only to quality but also to cost reduction, since no entrepreneur is going to risk introducing an inherently expensive technology into the market. But all this is not possible until you allow people to play with ideas of their own volition and have fun doing it. People are not going to selflessly generate ideas by fiat; they are only going to do so when they are supported by funds and infrastructure but otherwise left to their own devices. The accountants and managers need to get the process started and then need to get out of the way.

Almost every successful technology has gone through this Darwinian phase. Dyson gives the example of motorcycles, which motorcyclists from his father's generation designed and serviced with care and affection. In our generation the most resounding example is that of computer technology. We have lost track of how many versions of software and hardware young computer enthusiasts experimented with in their California garages before their own technical and artistic sensibilities and the will of the market picked the best ones. Both Bill Gates and Steve Jobs made their fortunes in a milieu of young upstarts experimenting with the latest electronics and code and competing with fellow upstarts sprawled across the country. Just like the nuclear designers of the little red schoolhouse, the computer designers of the Silicon Valley garages were unencumbered by the demands of a central authority. So were the genetic engineers who founded companies like Genentech and Amgen. They could let their imaginations roam, bouncing ideas off one another and ruthlessly shooting down clumsy, expensive or ostentatious designs. It was the ability of bright young people to brainstorm to their hearts' content and to launch nimble startups rapidly exploring diverse and cheap technological solutions that allowed computer technology to become the all-pervasive life force that it is today. Biotechnology is now poised to do the same. A similar process of Darwinian survival of the fittest permeates other successful technologies, from flight to automobile engineering to house construction. And most importantly, the creators of all these technologies had fun creating them.

Nothing like this happened with nuclear power. It was a technology whose development was dictated by a few prominent government and military officials and large organizations and straitjacketed within narrow constraints. Most of the developers of nuclear technologies were staid, elderly bureaucrats rather than young iconoclasts like Frederic de Hoffmann. An early design invented by Admiral Hyman Rickover - suitable for submarines but hardly optimal for efficient land-based power stations - was frozen and applied to hundreds of reactors around the country. Since then there have been only a hundred or so reactor designs and only half a dozen or so prominent ones. Due to a complicated mix of factors including public paranoia, lack of economies of scale, political correctness and misunderstandings about radiation, nuclear technology was never given a chance to be played around with, to be entrusted to youthful entrepreneurs experimenting with ideas, to find its own way through the creative and destructive process of Darwinian evolution to a plateau of technological and economic efficiency. The result was that the field remained both scientifically narrow and expensive. Even today there are only a handful of companies building and operating most of the world's reactors.

To reinvigorate the promise of nuclear power to provide cheap energy to the world and combat climate change, the field needs to be infused with the same entrepreneurial spirit that pervaded the TRIGA design team and the Silicon Valley entrepreneurs. Young people who are brimming with ideas especially need to be given as many resources as possible to come up with solutions and explore them in startups, even if not garages. Just like any other technology, nuclear power can thrive only when the maximum number of people apply their creative minds to improving both the quality and cost of energy from fission. Fortunately a minority of companies and their creators are setting the trends.

I live in Cambridge, MA which has been a hotbed of innovation for several decades. In a few square miles along the picturesque Charles River lie literally hundreds of biotech, pharmaceutical and information technology startups, most enabled by the proximity of MIT and Harvard whose laboratories provide a steady supply of ideas that can be potentially turned into useful products. The scientists, engineers and managers in these startups constantly compete with one another for the best ideas. My own startup is based on a novel way to make complex drugs using the specific base-pairing properties of DNA. Every year dozens of startups fail, and a few go public or are bought by other companies. The whole startup enterprise in Cambridge is subject to the forces of Darwinian selection, which enable the filtering of the best ideas.

One component of this enterprise is named Transatomic Power. It was started in 2010 by a duo of graduate students from MIT named Leslie Dewan and Mark Massie. The goal of Transatomic Power is to design a reactor that can generate power from nuclear waste, thus addressing the twin issues of clean energy and nuclear waste removal at the same time. The reactor, which is a molten salt reactor, lives off the preponderance of energy trapped in unfissioned reactor fuel from light water reactors. It is also compact enough to be shipped individually to the reactor site. Dewan and Massie are two of the few young people who actually see opportunity in the nuclear field and are willing to take risks in order to develop a novel approach to the problem.

On the other coast of the United States in Seattle is another team of nuclear entrepreneurs led by Nathan Myhrvold, a former CTO of Microsoft with degrees in physics and economics. Myhrvold has founded a company named Terrapower which operates on a novel nuclear design called the traveling wave reactor (TWR) which was also in part explored by Edward Teller and Lowell Wood in the 90s. The TWR is another reactor which can operate on waste, using depleted uranium to sustain a fission wave that spreads outward into the reactor, transforming the uranium into plutonium and leaving a small amount of fissile waste behind. The TWR promises to run for decades without having to refuel it or recover spent fuel, thus promising both safety and proliferation resistance. Among the enthusiasts of the TWR is Bill Gates, who knows a thing or two about Darwinian innovation in technology.

The founders of Terrapower and Transatomic are following in the footsteps of the dreamers in the little red schoolhouse. They have transformed nuclear technology into an entrepreneurial game of ideas and funding sustained by a healthy interplay between academic, industrial and government laboratories. I do not know whether their reactors will be the ones supplying the world's energy in the near future, but what I do know is that they are doing exactly what needs to be done to sustain the innovative process of creation and destruction that is necessary for the evolution of any successful technology. They are bucking the trend set by the large, bureaucratic government organizations and their industrial counterparts. And most importantly, they are having fun doing it, trading ideas and exploring new technical ground. I see hope in the adventures of these nuclear explorers, just like the makers of TRIGA saw hope in the future of nuclear power and the whole world saw hope in the explorers of computer and biotechnology in the 80s. When it comes to nuclear technology we should let a thousand flowers bloom. And then we can pick the most beautiful.

This post was first published on the Nobel Week Dialogue website.

The golden age of computational materials science gives me a disturbing feeling of déjà vu

Graphene, a wonder material which was made by scientists using a version of Scotch tape (Image: Wikipedia)
I was a mere toddler in the early 1980s when they announced the “golden age of computational drug design”. Now I may have been a toddler, but I often hear stories about the impending golden age from misty-eyed veterans in the field. A cover story in Fortune magazine (which I can never seem to find online) announced that pharmaceutical scientists were now designing drugs on computers. The idea was that once you feed in the parameters for how a drug behaves in the human body, the computer would simply spit out the answer. The only roadblock was computing power limited by hardware and software advances. Give it enough time, the article seemed to indicate, and the white-coat clad laboratory scientist might be a historical curiosity. The future looked rosy and full of promise.

Fast forward to the twilight days of 2013. We are still awaiting the golden age of computational drug design. The preponderance of drug discovery and design is still enabled by white coat-clad laboratory scientists. Now let’s be clear about one thing: the computational side of the field has seen enormous advances since the 1980s and it continues to thrive. There will almost undoubtedly be a time when its contributions to drug design will be seen as substantial. Most drugs perform their magic in living systems by binding to specific proteins, and computational drug design is now competent enough that it can predict, with a fair degree of accuracy, the structure and orientation of a drug molecule bound in a protein’s deep binding pocket. Computational scientists can now suggest useful modifications to a drug’s structure which laboratory chemists can make to improve multiple properties including solubility, diffusivity across cell membranes, activity inside cells and ability to avoid getting chewed up by enzymes in the body. You would be hard pressed to find a drug design project where computational modeling does not play at least a modest role. The awarding of this year’s Nobel Prize in chemistry to computational chemists is only one indication of how far the field has advanced.

And yet it seems that computational drug designers are facing exactly the same basic challenges they faced in the 80s. They have certainly made progress in understanding these challenges, but robust prediction is still a thing of the future. The most significant questions they are dealing with are the same ones they dealt with in the 80s: How do you account for water in a protein-drug system? How do you calculate entropies? How do you predict the folded structure of a protein? How do you calculate the different structures a drug molecule adopts in the aqueous milieu of the body? How do you modify a drug compound so that cells – which have evolved to resist the intrusion of foreign molecules – don’t toss it right out? How do you predict the absolute value of the binding energy between drug and protein? And scientists are grappling with these questions in spite of tremendous, orders-of-magnitude improvements in software and hardware.

I say all this because a very similar cover story about computational materials design in this month's Scientific American evokes disturbing feelings of déjà vu in me. The article is written by a pair of scientists who enthusiastically talk about a project whose goal is to tabulate calculated properties of materials for every conceivable application: from lightweight alloys in cars to new materials for solar cells to thermoelectric materials that would convert dissipated heat into electricity. The authors are confident that we are now approaching a golden age of computational materials design where high-throughput prediction of materials properties will allow us to at least speed up the making of novel materials.
We can now use a century of progress in physics and computing to move beyond the Edisonian process (of trial and error). The exponential growth of computer-processing power, combined with work done in the 1960s and 1970s by Walter Kohn and the late John Pople, who developed simplified but accurate solutions to the equations of quantum mechanics, has made it possible to design new materials from scratch using supercomputers and first-principle physics. The technique is called high-throughput computational materials design, and the idea is simple: use supercomputers to virtually study hundreds or thousands of chemical compounds at a time, quickly and efficiently looking for the best building blocks for a new material, be it a battery electrode, a metal alloy or a new type of semiconductor.
It certainly sounds optimistic. However the article seems big on possibilities and short on substance and shortcomings. This is probably because it occupies only three pages in the magazine; I think it deserved far more space, especially for a cover article. As it stands the piece appears more pollyannaish than grounded in cautious optimism.

I applaud the efforts to build a database of computed materials properties but I am far more pessimistic about how well this knowledge can be used in designing new materials in the near future. I am not a materials scientist, but I think some of the problems the computational end of the discipline faces are similar to those faced by any computational chemist. As the article notes, the principal tools used for materials design are quantum mechanics-based chemistry methods developed mainly by John Pople and Walter Kohn in the 1970s, a discovery that got the duo the 1998 chemistry Nobel Prize. Since then these methods have been coded into dozens of efficient, user-friendly computer programs. Yet these methods – based as they are on first principles – are notoriously slow. Even with heavy computing power it can take several days to do a detailed quantum mechanical calculation on an atomic lattice. With materials involving hundreds of atoms and extended frameworks it would take much longer.

I am especially not convinced that the methods would allow the kind of fast, high-throughput calculations that would substitute for experimental trial and error. One reason I feel pessimistic is the great difficulty of predicting crystal structures. Here’s the problem: the properties of a material depend on the geometric arrangement of its atoms in a well-defined crystal lattice. Depending on the conditions (temperature, pressure, solvent) a material can crystallize in dozens of possible structures, which makes the exercise of assuming “a” crystal structure futile. What’s worse for computer calculations is that the energy differences between these structures may be tiny, within the error limits of many theoretical techniques.

On the other hand, the wrong crystal structure could give us the wrong properties. The challenge for any kind of computational prediction method is therefore two-fold: firstly, it has to predict the various possible crystal forms that a given material can adopt (and sometimes this number can run into the hundreds). Secondly, even if it can achieve this listing, it now has to rank these crystal forms in order of energy and predict which would be the most stable one. Since the energy differences between the various forms are tiny, this would be a steep challenge even for detailed calculation on a single material. Factoring conditions of temperature, pressure and solvent into the calculation would make it even more computationally expensive. To me, it seems like doing all this in a high-throughput manner for dozens or hundreds of materials would be an endeavor fraught with delays and errors. It would certainly make for an extremely valuable intellectual contribution that advances the field, but I cannot see how we can be on the verge of practically and cheaply using such calculations to design complex new materials at a pace which at least equals experiment.
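
The ranking problem can be made concrete with a small sketch. It takes a set of computed relative energies for crystal forms and flags every pair whose energy gap is no larger than the assumed error of the method, since the order of such a pair cannot be trusted. The four forms, their energies and the 1 kcal/mol error bar are all hypothetical values chosen for illustration.

```python
from itertools import combinations


def ambiguous_pairs(energies, method_error):
    """Return pairs of crystal forms whose computed energy gap is within the method error."""
    pairs = []
    for (name_a, e_a), (name_b, e_b) in combinations(energies.items(), 2):
        if abs(e_a - e_b) <= method_error:
            pairs.append((name_a, name_b))
    return pairs


# Hypothetical computed energies relative to the lowest form, in kcal/mol.
computed = {"form I": 0.0, "form II": 0.3, "form III": 0.8, "form IV": 2.5}
method_error = 1.0  # assumed uncertainty of the calculation, in kcal/mol

unresolved = ambiguous_pairs(computed, method_error)
for a, b in unresolved:
    print(f"cannot reliably order {a} and {b}")
```

With these made-up numbers only form IV is clearly separated from the rest. The three lowest forms are mutually indistinguishable, so the calculation cannot say which one is the most stable.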

The second problem I foresee is a common one, what almost any scientist or engineer calls the multi-parameter optimization problem. We in the drug design field face it all the time; not only do we need to optimize the activity of a drug inside cells, but we also need to simultaneously optimize other key properties like stability, toxicity and solubility and – at a higher level – even non-scientific properties like price and availability of starting materials for making the drug. Optimizing each one of these properties would be an uphill battle, but optimizing them all at once (or at least two or more at a given time) strains the intellect and resources of the best scientists and computers. I assume that new materials also have to satisfy similar multiple properties; for instance a new alloy for cars would have to be lightweight, environmentally benign, stable to heat and light and inexpensive. One of the principal reasons drug discovery is still so hard is this multi-parameter optimization problem, and I cannot see how the situation would be different for materials science on a computational level, especially if the majority of techniques involve expensive quantum mechanical calculations.
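
A standard way to frame multi-parameter optimization is Pareto dominance: one candidate dominates another if it is at least as good on every property and strictly better on at least one. The sketch below filters a set of candidates down to the non-dominated ones. The candidate names, the three properties and the scores (scaled so that higher is better) are invented for illustration.

```python
def dominates(a, b):
    """True if a is at least as good as b on every property and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    front = {}
    for name, scores in candidates.items():
        beaten = any(
            dominates(other_scores, scores)
            for other_name, other_scores in candidates.items()
            if other_name != name
        )
        if not beaten:
            front[name] = scores
    return front


# Hypothetical scores: (potency, solubility, stability), higher is better.
candidates = {
    "A": (0.9, 0.2, 0.5),
    "B": (0.6, 0.7, 0.6),
    "C": (0.5, 0.6, 0.5),
    "D": (0.3, 0.9, 0.8),
}

front = pareto_front(candidates)
print(sorted(front))
```

Only candidate C drops out here, because B beats it on all three properties. A, B and D each win on something, and choosing among them is a judgment call that the filter cannot make, which is the heart of the difficulty.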

One way in which calculations can be sped up – and this is something which I would have loved to read about in the article – is by using cheap, classical mechanics-based parameterized methods. In these methods you simplify the problem by using parameters from experiment in a classical model that implicitly includes quantum effects by way of the experimentally determined variables. While these calculations are cheap they can also result in larger error, although they work almost as well as detailed quantum calculations for simpler systems. It seems to me that this database of properties they are building could be shored up with experimental values and used to build parameterized, cheaper models that can actually be employed in a high-throughput capacity.
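
The 12-6 Lennard-Jones potential is a textbook example of such a parameterized classical model, shown here purely as an illustration and not as something the article uses. Two fitted parameters, a well depth and a size, stand in for all the underlying quantum mechanics of a non-bonded atom pair. The parameter values below are illustrative, roughly of the size often quoted for argon, and should be read as assumptions.

```python
def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones pair energy at separation r (same length units as sigma)."""
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 * ratio6 - ratio6)


# Illustrative parameters (assumed, roughly argon-like).
epsilon = 0.238  # well depth, kcal/mol
sigma = 3.4      # separation at which the energy crosses zero, angstroms

# The minimum of the 12-6 form sits at r = 2**(1/6) * sigma with energy -epsilon.
r_min = 2 ** (1 / 6) * sigma

for r in (3.4, r_min, 5.0):
    print(f"r = {r:.2f} A  ->  E = {lennard_jones(r, epsilon, sigma):+.3f} kcal/mol")
```

Evaluating this expression costs a handful of floating-point operations per atom pair, which is why models of this kind can be run in high throughput. The price is that the answer is only as good as the parameters and the functional form.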

Does all this make me pessimistic about the future of computational materials design? Not at all; we are just getting started and we need an influx of talented scientists in this area. Computational drug design followed the classic technology curve, with inflated expectations followed by a valley of disappointment culminating in a plateau of realistic assessment. Perhaps something similar will happen for computational materials design. But I think it’s a mistake to say that we are entering the golden age. We are probably testing the waters right now and getting ready for a satisfying dip. And that is what it should be called, a dip, not a successful swim across the materials channel. I wish those who take the plunge all good luck.

First published on the Scientific American Blog Network.

Note: Profs. Chris Cramer and Alan Aspuru-Guzik have pointed me to some successful examples of the paradigm. It's clear that certain kinds of problems (especially involving MOFs) are more accessible to the approach than others.

Enthalpy-entropy compensation and water networks

Enthalpy-entropy compensation (EEC) is an endlessly interesting phenomenon; it's the kind of topic that makes scientists either roll up their sleeves for a good fight or slowly walk away from the table. The basic idea is simple: when you are building new chemical functionality into a drug molecule to interact better with a protein (improving ∆H) you are also tying down the molecule (worsening ∆S) and constraining its movement. However, since the two variables oppose each other, this won't be reflected in the overall ∆G of binding, which will stay the same.
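
The bookkeeping behind compensation follows from the standard relation ∆G = ∆H - T∆S. The sketch below applies it to two hypothetical ligands. All the numbers are invented to show the effect and are not taken from any measurement.

```python
def delta_g(delta_h, t_delta_s):
    """Binding free energy from enthalpy and the entropic term: dG = dH - T*dS."""
    return delta_h - t_delta_s


# Hypothetical values in kcal/mol; t_delta_s is T*dS at a fixed temperature.
parent = {"delta_h": -8.0, "t_delta_s": 2.0}   # weaker contacts, more freedom
analog = {"delta_h": -10.0, "t_delta_s": 0.0}  # better contacts, more constrained

dg_parent = delta_g(**parent)
dg_analog = delta_g(**analog)

print(f"parent: dG = {dg_parent:.1f} kcal/mol")
print(f"analog: dG = {dg_analog:.1f} kcal/mol")
```

The analog gains 2 kcal/mol in enthalpy and loses exactly 2 kcal/mol in the entropic term, so both ligands come out at -10.0 kcal/mol. Real cases rarely cancel this cleanly, but this is the pattern that the term compensation describes.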

Scientists have been going back and forth over the causes of EEC and now there's a new paper from George Whitesides's group at Harvard, Schrodinger and Brookhaven which sheds some light on one possible, usually neglected factor: the subtle changes in the thermodynamics of the network of water molecules surrounding a ligand. These are not the water molecules displaced by the ligand from the protein pocket (which have received considerable attention over the last decade or so) but the ones on the surface that contact the ligand on the outside.

The paper is based on a workhorse protein system that Whitesides's group has been working on for a while now - carbonic anhydrase. The protein is stable, relatively rigid, biochemically well-studied, amply expressed and easily crystallized by itself and with several ligands; all features which make it a good model system to look at the thermodynamics of binding. Whitesides's group has found out that you can have ligands with different fluorination patterns that bind to the protein and show very similar ∆Gs of binding. This is unexpected, since you expect additional fluorines to give you better entropy from the hydrophobic effect.

To explore the phenomenon the authors use two techniques: x-ray crystallography and molecular dynamics simulations. The former provides information on intermolecular interactions while the latter provides information on the thermodynamics of surrounding water molecules, more specifically about their enthalpy and entropy. The MD and thermodynamic calculations are done using the WaterMap tool from Schrodinger.

From the crystal structures the authors find that the enthalpy of binding can actually get unfavorable from the added fluorines as a result of repulsive interactions with a few oxygens in the protein. Since ∆G stays the same this means that the unfavorable ∆H in the active site might be compensated for by ∆H changes in the water network surrounding the ligand along with corresponding ∆S adjustments. In the picture above, water molecules with more favorable ∆H values are colored green while the unfavorable ones are colored red. 

Notice the difference between the three difluorinated analogs: the 4,6 analog has the most green waters, the 5,6 analog has the most red waters (and an extra red water compared to the others) and the 6,7 analog is somewhere in between. The gradation of unfavorable water molecules around the three compounds tracks well with enthalpies extracted from ITC. The entropies duly compensate. The thermodynamics of surface water molecules therefore certainly seem to be one possible reason for the EEC. It's also worth noting that the behavior of the water molecules corresponds to what you would call an "enthalpy-driven hydrophobic effect".
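For readers who want the bookkeeping behind this compensation argument spelled out, here is the standard Gibbs relation applied to a pair of ligands. The labels A and B are my own shorthand for any two analogs being compared and do not come from the paper.

```latex
\begin{align}
\Delta G &= \Delta H - T\,\Delta S \\
\Delta\Delta G_{A \to B} &= \Delta\Delta H_{A \to B} - T\,\Delta\Delta S_{A \to B} \\
\Delta\Delta G_{A \to B} \approx 0 \;&\Longrightarrow\; \Delta\Delta H_{A \to B} \approx T\,\Delta\Delta S_{A \to B}
\end{align}
```

In words: if two analogs bind with nearly the same ∆G, any enthalpy penalty in going from one to the other has to be matched by an entropy term of about the same size, wherever in the system (pocket or surface water) those contributions arise.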

While we are neglecting second-order effects and while it's still hard to get quantitative agreement down to a kcal or so, I like the fact that we can eyeball such figures and at least qualitatively rank cases by favorable and unfavorable enthalpies. I also find it promising that we can actually do this kind of thing for surface water molecules which are part of a network; ten years ago most people might have thrown up their hands when asked to do this. Of course not every drug-protein binding case is going to be dictated by surface water behavior but the fact that we can at least get a semi-quantitative look at this important factor is, in my opinion, a valuable stepping stone toward the future.


Barry Werth on the cost of new drugs

Barry Werth, who wrote the swashbuckling book about the creation of Vertex (sequel out in February), has an excellent piece (also highlighted by @Chemjobber) in the MIT Technology Review about the cost of new drugs. He asks what is usually the first question that any pharmaceutical scientist hears after telling a layperson what he/she does for a living: Why do drugs cost so much? (The next question is usually "Why do drugs have so many side-effects?")

Werth compares two drugs to illustrate the strange world of drug pricing and the moral dilemma that riddles that world: Vertex's cystic fibrosis drug Kalydeco and Regeneron/Sanofi's cancer drug Zaltrap. Here's the problem: Kalydeco is a breakthrough medicine which has breathed completely new life into the treatment of a disease for which no effective therapies existed before. It costs about $300K a year. Zaltrap increases the median lifespan of patients with advanced colorectal cancer by 1.5 months. And it costs $11K a month. Is it any surprise that people are so critical of the pharmaceutical industry? I would be too, if I were constantly bombarded by news of "breakthroughs" like Zaltrap.

The reason why this whole thing seems so absurd is that the actual price of a drug often sounds almost completely arbitrary. As Werth notes, Zaltrap caused an outrage among patients and physicians, leading a group led by doctors from Memorial Sloan Kettering Hospital to protest the price of the drug in an unprecedented NYT Op-Ed. In response Sanofi cut the price of the drug by half through rebates and other schemes. If a drug company can reduce the price of a medication by 50% just like that without major catastrophe, it really makes you ask what the "true" price of the drug is.

In any case, the whole thing is definitely worth a read, especially in an age where drugs are paradoxically going to start becoming more effective - even as they are targeted toward select, small patient subpopulations - and simultaneously more expensive.

Molecular dynamics: I have a bad feeling about this.

Computer models of chemical and biological systems are not reality; rather they are what I call “invitations to reality”. They provide guidance to experimentalists to try out certain experiments, test certain techniques. They are suggestive, not factual. However as any good modeler and chagrined experimentalist knows, it’s not hard to mistake models for reality, especially when they look seductive and are replete with bells and whistles.

This was one of the many excellent points that Anthony Nicholls made in his lunch critique of molecular dynamics yesterday at the offices of OpenEye scientific software in Cambridge, MA. In his talks and papers Anthony has offered not just sound technical criticism but also a rare philosophical and historical perspective. He has also emerged as one of the sharpest critics of molecular dynamics in the last few years, so we were all eager to hear what it exactly is about the method that rubs him the wrong way. Many of his friends and colleagues call him ‘Ant’, so that’s what I will do here.

Here’s some background for a general audience: Molecular dynamics (MD) is a computational technique that is used to simulate the motion of atoms and molecules. It is used extensively in all kinds of fields, from biochemistry to materials science. Most MD employed in research is classical MD, based on Newton’s laws of motion. We know that the atomic world is inherently quantum mechanical in nature, but it turns out that we can get away with using classical mechanics as an approximation to a remarkable extent. Over the last few years user-friendly software and advances in computing hardware have brought MD to the masses, so that even non-specialists can now run MD calculations using brightly colored and accessible graphical user interfaces and desktop computers. A leader in this development is David E. Shaw, creator of the famed D E Shaw hedge fund, who has made the admirable decision to spend all his time (and a good deal of his money) developing MD software and hardware for biochemistry and drug discovery.
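To make "classical MD based on Newton’s laws" a little more concrete, here is a minimal sketch of the kind of time-stepping loop that sits inside such programs. It uses the velocity Verlet scheme on a single particle attached to a spring, a toy stand-in for a bond. The function names, the spring constant, the mass and the time step are all my own illustrative choices in arbitrary units; real MD codes handle thousands of atoms, force fields, thermostats and much else that is left out here.

```python
import math

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equation m*a = F(x) for one particle in 1D.

    Returns a list of (position, velocity) pairs, one per time step,
    including the starting point.
    """
    trajectory = [(x, v)]
    a = force(x) / mass
    for _ in range(n_steps):
        # Advance the position using the current velocity and acceleration.
        x = x + v * dt + 0.5 * a * dt * dt
        # Recompute the acceleration at the new position.
        a_new = force(x) / mass
        # Advance the velocity using the average of old and new accelerations.
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
        trajectory.append((x, v))
    return trajectory

# Hypothetical toy system: a harmonic "bond" with F = -k*x (arbitrary units).
K_SPRING = 1.0
MASS = 1.0

def spring_force(x):
    return -K_SPRING * x

def total_energy(x, v):
    """Kinetic plus potential energy of the toy oscillator."""
    return 0.5 * MASS * v * v + 0.5 * K_SPRING * x * x

traj = velocity_verlet(x=1.0, v=0.0, force=spring_force,
                       mass=MASS, dt=0.01, n_steps=1000)
print("energy at start:", total_energy(*traj[0]))
print("energy at end:  ", total_energy(*traj[-1]))
```

The one sanity check worth doing on any such integrator is that the total energy stays close to constant over the run; if it drifts badly, the time step is too large or the code is wrong.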

Ant’s 2-hour talk was very comprehensive and enjoyable, covering several diverse topics including a few crucial ones from the philosophy of science.

It would be too much to describe everything that Ant said and I do hope OpenEye puts the video up on their website. I think it would be most convenient to summarize his main points here.

MD is not a useless technique but it’s not held up to the same standards as other techniques, and therefore its true utility is at best unknown: Over the last few years the modeling community has done a lot of brainstorming about the use of appropriate statistical and benchmarking methods to evaluate computational techniques. Statistical tests have thus emerged for many methods, including docking, shape-based screening, protein-based virtual screening and quantum chemical calculations. Such tests are however manifestly lacking for molecular dynamics. As Ant pointed out, almost all statements in support of MD are anecdotal and uncontrolled. There are almost no follow-up studies.

MD can accomplish in days what other techniques can achieve in seconds or hours: No matter how many computational resources you throw at it, the fact remains (and will likely always remain) that MD is a relatively slow technique. Ant pointed out cases where simpler techniques gave the same results as MD but in much less time. I think this reveals a more general caveat: before looking for complicated explanations for any phenomenon in drug discovery or biology (potency, selectivity, differences in assay behavior etc.), one must look for simple ones. For instance, is there a simple physicochemical property like molecular weight, logP, number of rotatable bonds or charge that correlates with the observed effect? If there is one, why run a simulation lasting hours or days to get the same result?
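The "look for simple explanations first" check takes seconds to run. Below is a rough sketch of what I mean: compute the Pearson correlation between each simple descriptor and the measured activity, then rank the descriptors by the strength of that correlation. The function names are my own, and the numbers are made up purely to show the call; they do not come from any real assay or from Ant’s talk.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

def screen_descriptors(descriptors, activity):
    """Return (name, r) pairs sorted by |r|, strongest correlation first."""
    results = [(name, pearson_r(values, activity))
               for name, values in descriptors.items()]
    return sorted(results, key=lambda item: abs(item[1]), reverse=True)

# Made-up illustrative values for five hypothetical compounds.
toy_descriptors = {
    "logP": [1.0, 2.0, 3.0, 4.0, 5.0],
    "rotatable_bonds": [3, 1, 4, 1, 5],
}
toy_activity = [5.1, 6.0, 6.9, 8.2, 9.0]

ranking = screen_descriptors(toy_descriptors, toy_activity)
for name, r in ranking:
    print(f"{name}: r = {r:.2f}")
```

A strong correlation with something as mundane as logP does not prove that logP is the cause, but it does tell you that a days-long simulation has to explain something the one-line descriptor does not before it earns its keep.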

A case in point is the recent Nature paper by D. E. Shaw’s group described by Derek on his blog. Ant brought our attention to the Supporting Information which says that they got the same result for the ligand pose using docking which they got using MD, a difference translating to a simulation time of days vs seconds. In addition they saw a protein pocket expansion in the dynamics simulation whose validity was tested by synthesizing one compound. That they prospectively tested the simulation is a good thing, but one compound? Does that prove that MD is predictive for their system?

MD can look and feel “real” and seductive: This objection really applies to all models which by definition are not real. Sure, they incorporate some elements of reality but they also leave many others out. They simplify, use fudge factors and parameters and often neglect outliers. This is not a strike against models since they are trying to model some complex reality and they cannot do this without simplification, but it does indicate reasons for being careful when interpreting their results. However I agree that MD is in a special category since it can generate very impressive movies that emerge from simulations run on special purpose machines, supercomputers or GPUs for days or months at a time. Here’s one that looks particularly impressive and denotes a drug molecule successfully “finding” its binding site on a protein.

This apparently awesome combination of computing power and graphical software brought to bear on an important problem often makes MD sound way more important than it is. The really damning thing though may be that shimmering protein on your screen. It’s very easy for non-computational chemists to believe that that is how the proteins in our body actually move. It’s easy to believe that you are actually seeing the physics of protein motion being simulated, down to the level of individual atoms.

But none of this is really true. Like many other molecular models what you are seeing in front of you is a model, replete with approximations and error bars. As Ant pointed out, it’s almost impossible to get real variables like statistical mechanical partition functions, let alone numbers from experiment, out of such simulations. Another thing that’s perpetually forgotten is that in the real world, proteins are not isolated but are tightly clustered together with other proteins, ions, small molecules and a dense blanket of water. Except perhaps for the water (and poorly understood water at that), we are ignoring all of this when we are running the simulation. There are other problems in real systems, like thermal averaging and non-ergodicity which physicists would appreciate. And of course, let’s not even get started on the force fields, the engines at the heart of almost every simulation technique that are consistently shown to be imperfect. No, the picture that you see in a molecular dynamics movie is a shadow of its “real” counterpart, even if there is some agreement with experiment. At the very least this means you should keep your jaw from dropping every time you see such a movie.

Using jargon, movies and the illusion of reality, MD oversells itself to the public and to journals: Ultimately it’s not possible to discuss the science behind MD without alluding to the sociological factors responsible for its perception. The fact is that top journals like Nature or Science are very impressed when they see a simulation shepherded by a team led by Big Name Scientist being run for days using enough computing power to fly a jetfighter. They are even more impressed when they see movies that apparently mirror the actual motion of proteins. Journals are only human, and they cannot be entirely faulted for buying into seductive images. But the unfortunate consequence of this is that MD gets oversold. Because it seems so real, because simulations that are run for days must undoubtedly be serious stuff because they have been run for days, because their results are published in prestigious journals like Nature, therefore it all must be important stuff. This belief is however misplaced.

What’s the take home message here? What was strange in one sense was that although I agreed with almost everything that Ant said, it would not really affect the way I personally use MD in my day-to-day work, and I suspect this is going to be the case for most sane modelers. For me MD is a tool, just like any other. When it works I use its results, when it doesn’t I move on and use another tool. In addition there are really no other ways to capture protein and ligand motion. I think Ant’s talk is best directed at the high priests of MD and their followers, people who either hype MD or think that it is somehow orders of magnitude better than other modeling techniques. I agree that we should all band together against the exhortations of MD zealots.

I am however in the camp of modelers who have always used MD as an idea generator, a qualitative tool that goads me into constructing hypotheses and making suggestions to experimentalists. After all the goal of the trade I am involved in is not just ideas but products. I do care about scientific rigor and completeness as much as the next person, but the truth is that you won’t get too far in the business I am involved in if you constantly keep worrying about scientific rigor rather than the utility – even if it’s occasional – of the tools we are using. And this applies to theoretical as well as experimental tools; when was the last time my synthetic chemistry friends used a time-tested reaction on a complex natural product and got the answer they expected? If we think MD is anecdotal, we should also admit that most other drug design strategies are anecdotal too. In fact we shouldn’t expect it to be otherwise. In a field where the validity of ideas is always being tested against a notoriously complex biological system whose workings we don’t understand and where the real goal is to get a useful product, even occasional successes are treasured and imperfect methods are constantly embraced.

Nonetheless, in good conscience my heart is in Ant’s camp even if my head protests a bit. The sound practice of science demands that every method be duplicated, extensively validated, compared with other methods, benchmarked and quantified to the best of our abilities if we want to make it part of our standard tool kit. That is the only way that we can make such methods predictive, and it has manifestly not happened with MD. In fact it’s part of a paradigm which as Ant pointed out goes back to the time of Galileo. If a method is not consistently predictive it does not mean it is useless, but it does mean that there is much in it that needs to be refined. Just because it can work even when it’s not quantitative does not mean trying to make it quantitative won’t help. As Ant concluded, this can happen when the community comes together to compare and duplicate results from their simulations, when it devotes resources to performing the kind of simple benchmarking experiments that would help make sense of complicated results, when theorists and experimentalists both work together to achieve the kinds of basic goals that have made science such a successful enterprise for five hundred years.

Molecular Dynamics: Manna from heaven or spawn of Satan?

This is the question that Anthony Nicholls of OpenEye Scientific Software will try to answer tomorrow at the OpenEye offices in Cambridge, MA. Well, ok, not exactly this question but a more nuanced version thereof. 

As those in the field would probably know, Anthony, who is one of the leaders in the field of industrial computational chemistry, has had a history of offering pointed, articulate and informed criticism of what is rapidly becoming an important tool in the drug industry. In the last few years MD has captured the imagination of many, especially through the efforts of researchers like David Shaw and Vijay Pande who have enabled simulations to approach realistic time scales approximating large-scale conformational changes in proteins and protein-ligand binding. Nonetheless it remains a technique that often sparks a range of responses among its practitioners and critics, which to me makes it even more interesting because it's no fun when everyone agrees or disagrees, right?

I am not an expert when it comes to MD (that's precisely why I want to hear from the experts) but I am instead like the vast majority of scientists who use the technique, find it useful to varying degrees and are intrigued by what the fundamental issues in the field exactly are. What makes this issue even more interesting for me is that it seems to tread into some of the more relevant questions from the philosophy of science, including evergreen gems like "What is utility?", "What do you mean when you say a technique 'works', and is this definition the same for different techniques?", "What is more important, prediction or understanding?" and the ultimate zinger, "What is science, exactly?". I am particularly interested in the question of how exactly you validate a 'correct' prediction for a complex system like a protein-drug interaction where there can be considerable uncertainty. I am sure Anthony will have more to say about this since he has made extremely valuable contributions to pointing out the key role of statistics in molecular modeling.

In any case, I have no doubt that the talk will be characteristically stimulating and provocative. If you want to attend you should RSVP to Scott Parker at OpenEye. Derek also mentioned this on his blog. And of course, I will be there and will have a summary here soon, so watch this space.

Update: My report on the talk is here.

A discussion on Big Science, Small Science and the future of all science

Tomorrow I have the privilege of joining a panel discussion on Big Science with three very distinguished scientists: Nobel Laureate Steven Weinberg, MIT astrophysics professor Sara Seager and Perimeter Institute cosmologist Neil Turok. The conversation will mostly focus on the problems facing Big Science in a bad economy and how science can retool itself in the new millennium.

The program will be broadcast on Canada's TV Ontario, more specifically on their "The Agenda with Steve Paikin" show at 8 and 11 PM EST. It will be preceded by an interview with star astronaut Chris Hadfield who entertained and informed all of us through his YouTube videos from the International Space Station.

If you are in Canada and have access you might want to check it out since I am sure the conversation will be stimulating. I will have a summary of the discussion here soon, hopefully along with a video or a podcast.

Arsenic DNA, chemistry and the problem of differing standards of proof in cross-disciplinary science

Arsenic-based linkages in DNA would be unstable and would quickly break, a fact suspected by chemists for years (Image: Johannes Wilbertz)
When the purported discovery of the now infamous “arsenic DNA” bacteria was published, a friend of mine who was studying astrobiology could not stop praising it as an exciting scientific advance. When I expressed reservations about the discovery mainly based on my understanding of the instability of biomolecules containing arsenic, she gushed, “But of course you will be skeptical; you are an organic chemist!"

She was right. As chemists, many of my colleagues and I could not help but zero in on what we thought was the most questionable aspect of the whole discovery: the fact that somehow, contrary to everything we understood about basic chemistry, the “arsenic DNA” inside the bacteria was stably chugging along, replicating and performing its regular functions.

It turned out that the chemists were right. Measurements on arsenic DNA analogs made by researchers several months later found that the arsenic analogs differed in stability from their phosphate versions by a mind-boggling factor of 10^17. Curiously, physicists, astronomers, geologists and even biologists were far more accommodating about the validity of the discovery. For some reason the standards used by these scientists were different from those used by chemists, and in the end the chemists’ standard turned out to be the “correct” one. This is not a triumph of chemists and a blemish on other sciences since there could well be cases where other sciences might have used the correct standards in nailing down the truth or falsehood of an unprecedented scientific finding.

The arsenic DNA fiasco thus illustrates a very interesting aspect of modern cross-disciplinary science – the need to reconcile what can be differing standards of evidence or proof between different sciences. This aspect is the focus of a short but thought-provoking piece by Steven Benner, William Bains and Sara Seager in the journal Astrobiology.

The article explains why it was that standards of proof that were acceptable to different degrees to geologists, physicists and biologists were unacceptable to chemists. The answer pertains to what we call “background knowledge”. In this case, chemists were compelled to ask how DNA with arsenic replacing phosphorus in its backbone could possibly be stable given everything they knew about the instability of arsenate esters. The latter had been studied for several decades, and while arsenic DNA itself had not been synthesized before, simpler arsenate esters were known to be highly unstable in water. The chemists were quite confident in extrapolating from these simple cases to questioning the stable existence of arsenic DNA; if arsenic DNA indeed were so stable, then almost everything they had known about arsenate esters for fifty years would have been wrong, a possibility that was highly unlikely. Thus for chemists, arsenic DNA was an extraordinary claim. And as Carl Sagan said, they needed to see extraordinary evidence before they could believe it, evidence that was ultimately not forthcoming.

For geologists however, it was much easier to buy into the claims. That is because as the article points out, there are several cases where elements in minerals are readily interchanged for other elements in the same column of the periodic table. Arsenic in particular is known to replace phosphorus in rocks bearing arsenate and phosphate minerals. Unlike chemists, geologists found the claim of arsenic replacing phosphorus quite consistent with their experiences. Physicists too bought readily into the idea. As the authors say, physicists are generally tuned to distinguishing two hypotheses from one another; in this case the hypothesis that DNA contains arsenic versus the hypothesis that it does not. The physicists thus found the many tests apparently indicating the presence of arsenate in the DNA to provide support for one hypothesis over another. Physicists did not appreciate that the key question to ask would be regarding the stability of arsenic DNA.

Like chemists, biologists were also skeptical. Biologists usually check the validity of a claim for a new form of life by comparing it to existing forms. In this case, when the genetic sequence and lineage of the bacteria were inspected they were found to be very similar to garden variety, phosphate-containing bacteria. The biologists’ background knowledge thus compelled them to ask how it could possibly be that a bacterium that was otherwise similar to other existing bacteria could suddenly survive on arsenic instead of phosphorus.

In the end of course, none of the replication studies found the presence of arsenic in the GFAJ-1 bacteria. But this was probably the least surprising to chemists. The GFAJ-1 case thus shows that different sciences can have different standards for what they regard as “evidence”. What may be suitable for one field may be controversial or unacceptable for others. This fact helps answer at least one question for the GFAJ-1 paper: Why was it accepted in a prestigious journal like Science? The answer almost certainly concerns the shuttling of the manuscript to planetary scientists rather than chemists or biologists as reviewers. These scientists had different standards of evidence, and they enthusiastically recommended publication. One of the key lessons here is that any paper on cross-disciplinary topics must be sent to at least one specialist from each discipline comprising the field. Highly interdisciplinary fields like astrobiology, drug discovery, and social psychology are prime candidates for this kind of a policy.

Discipline-dependent standards of proof not only explain how bad science occasionally gets published or how promising results get rejected, but also bear on the deeper issue of what in fact constitutes “proof” in science. This question reminds me of the periodic debates about whether psychology or economics is a science. The fact is that many times the standard of proof in psychology or economics might be unacceptable to a physicist or statistician. As a simple example, it is often impossible to get correlations of better than 0.6 in a psychological experiment. And yet such standards can be accepted as proof in the psychological community, partly because an experiment on human beings is too complex to get more accurate numbers; after all, most human beings are not inclined planes or balls dropped from a tower. In addition one may not always need accurate correlations for discerning valuable trends and patterns. Statistical significance may not always be related to real world significance (researchers running clinical trials would be especially aware of this fact).

The article by Benner, Bains and Seager concludes by asking how conflicting standards of proof can be reconciled in highly cross-disciplinary sciences, and this is a question which is going to be increasingly important in an age of inherently cross-disciplinary research.

I think the GFAJ-1 fiasco itself provides one answer. In that case the most “obvious” objection was raised by chemists based on years of experience. In addition it was a “strong” objection in the sense that it really raised the stakes for their discipline; as noted before, if arsenic DNA exists then much of what chemists know about elementary chemical reactivity might have to be revised. In that sense it was really the kind of falsifiable, make-or-break test advocated by Karl Popper. So one cogent strategy might be to first consider these strong, obvious objections, no matter what discipline they may arise from. If a finding passes the test of these strong objections, then it could be subjected to the less obvious and more relaxed criteria provided by other disciplines. If it passes every single criterion across the board then we might actually be able to claim a novel discovery, of the kind that rarely comes along and advances the entire field.

First published on the Scientific American Blog Network.