The right questions about modeling in drug discovery

There's been a fair bit of discussion recently about the role of modeling in drug design, and as a modeler I have found it encouraging to see people recognize the role that modeling, to varying extents, plays in the whole "rational" drug design paradigm. But sometimes I also see a question being asked that's not quite pitched the way it should be.

This question is usually along the lines of "Would you (as a medicinal chemist) or would you not make a compound that a modeler recommends?" Firstly, the way the question is set up ignores the statistical process evident in all kinds of drug design, not just modeling. You never recommend a single compound or even five of them but usually a series, often built around a common scaffold. Now suppose the chemist makes fifty compounds and finds only one of them to be active. Does this mean the modeling was useful? It depends. If the one hit that you get is a novel scaffold, or a compound with better properties, or one without that pesky unstable ester, or even a compound similar to ones that you have made that nonetheless picks up some hitherto unexplored interactions with the protein, then modeling would not have been in vain in spite of what is technically a low hit rate. The purpose of modeling is to provide interesting ideas, not rock-solid predictions. Plus you always have to ask the other question: "How likely is it that you would have come up with this structure without the aid of modeling or structural information?" As with almost everything else in drug discovery, the value of a particular approach is not absolute and you have to judge it in the context of alternatives.

The other problem I have is that it's frequently just not possible to objectively quantify the value of modeling, or of any specific approach for that matter, in drug design. That's because anyone who has contributed to the design of a successful drug or even a potent ligand knows how haphazard and multifaceted the process is, with contributions flying in from every direction in many guises. Every project that I have contributed to which has resulted in more potent ligands has involved a patchwork process with an essential back-and-forth exchange with the chemists. Typically I will recommend a particular compound based on some modeling metric, and then the synthetic chemist will modify parts of it based on synthetic accessibility, cost of building blocks, ease of purification and so on. I might then remodel it and preserve or extend the synthetic modifications.

But it can also be the other way round. The synthetic chemist will recommend a scaffold and I will suggest tweaks based on what I see on the computer, which will lead to another round of suggestions and counter-suggestions. The creative process goes both ways and the point is that it's hard to put a number on the individual contributions that go into the making of the final product. Sometimes there may be a case where one individual recommends that magic fluorine which makes all the difference, but this is usually the exception rather than the rule and more importantly, it's often not possible to trace that suggestion to a strictly rational thought process. Finally, it's key to realize that my suggestions themselves will very often be based as much on my knowledge of physical and organic chemistry as on any specific modeling technique. Thus, even internally it's not always possible for me to separate modeling-based suggestions from other kinds. On a related note, the frequent necessity of this interactive process is why I tend to find the "modeling vs medicinal chemistry" turf wars a little irksome (although they sometimes do help in fleshing out the details); the fact of the matter is that both kinds of scientists are ultimately an important part of successful drug discovery projects involving any kind of model-building.

Now this rather unmethodical approach makes it difficult to pin down the role of any specific technology in the drug discovery process, which is why I am usually skeptical when someone writes a paper trying to assess "The contribution of HTS/virtual screening/fragment-based drug design in the discovery of major drugs during the last decade". But that's how drug discovery is: a hodgepodge of recommendations and tweaks made by a variety of rational and irrational human beings in an environment where any kind of idea from any corner is welcome.

The man who made it possible

There's a nice set of articles in this week's Nature celebrating the birth centenary of a man whose work underlies almost all of modern life - Alan Turing. Considering the complete transformation of human life that computing has enabled, Turing's work, along with that of his fellow computer pioneer John von Neumann, will likely be recognized as one of those watersheds in human history, comparable to the invention of the calculus and the discovery of electricity. It is remarkable to consider how many millions of software engineers make their living, and how many billions of dollars in revenue are generated every day, off these scientists' ideas.

The historian and writer George Dyson starts off by documenting Turing and von Neumann's basic achievements. Dyson has a book out next week on this very topic, and given his past writings on technology and computing I am looking forward to it. As Dyson tells us, the basic groundbreaking idea of Turing and von Neumann was not just a machine which performs calculations at superhuman speed. It was the concept of the stored program and the then startling idea that you could encode both the data and the instructions operating on it in the same language (binary) in the same machine. This idea really is one of those "big ideas", simple to state but absolutely revolutionary in its impact. Turing and von Neumann's greatness thus lies not in conceiving the physical manifestation (or 'instantiation' as programmers would say) of a computing recipe but in their abstract generalization of the very idea of a programmed computer.

What is not always recognized is that von Neumann went a step further and floated an even more remarkable notion, that of a machine which contains the instructions for assembling copies of itself. Von Neumann immediately tied this to biology, but this was a few years before Watson and Crick discovered the specific mechanism in the form of DNA base pairing. Unlike the "dynamic duo", nobody remembers von Neumann as having made a signal conceptual contribution to biology. I remember the writer John Casti lamenting the general public's lack of recognition of von Neumann as the man who first really thought through the mechanism of heredity in the general terms that mathematicians are used to. As Casti pithily put it in his wonderful book 'Paradigms Lost': "Such are the fruits of the theoretician, especially one who solves 'only' the general case". To be fair, biology is an experimental science and no amount of theorizing can nail an experimental fact, but I suspect many mathematicians would share Casti's lament.

The biologist Sydney Brenner then follows up by recognizing Turing's contributions to biology. In 1952 Turing wrote what is considered a pioneering paper on nonlinear dynamics, in which he described pattern formation in chemical reactions and possibly in developmental biology. We are still trying to understand the implications of that discovery, even as nonlinear dynamics itself has become an enormously fruitful field with deep applications in modeling almost any complex natural phenomenon.

Finally, the mathematician Barry Cooper from the University of Leeds points out a tantalizing question stemming from Turing's work: are complex, emergent phenomena strictly computable? This is partly a question about the limits of 'strong' reductionism, and one that I have explored on this blog often. We don't yet know whether we can compute higher-order phenomena starting from the few simple particles and fields bequeathed to us by the particle physicists. As we tackle problems as complex as the future of the cosmos and the source of consciousness in the twenty-first century, this question will continue to hound scientists and philosophers. It waits for an answer as profound as Turing's answer to Hilbert's famous Entscheidungsproblem.

The end of biofuels?

Hartmut Michel from the Max Planck Institute of Biophysics has an editorial (open access!) in Angewandte Chemie with a title that makes his views clear: "The Nonsense of Biofuels". He essentially comes down hard on biofuel proponents of all stripes and not just the much hyped ethanol-from-corn lobby. Michel won the Nobel Prize for cracking open the structure of one of earth's most important proteins - the photosynthetic reaction center - so he certainly knows his photosynthesis.

He starts by looking at the energy efficiency of the process. It's not always appreciated that for all its rightly deserved glory, photosynthesis is not as efficient as we might think, which is just what you would expect for something tuned by evolutionary fits and starts and historical contingency. For one thing, UV, IR and green light cannot be utilized by plants, which leaves out a substantial part of the spectrum, including some of its most energetic photons. Then there's all the wonderful machinery of electron transfer and light-harvesting proteins involved in the light and dark processes. The light step essentially captures photon energy and generates NADPH and ATP, and the dark step uses this energy source and reducing potential to synthesize carbohydrates from CO2. Considering the inefficiencies inherent in capturing the energy of massless, transient photons, only about 12% of the energy from sunlight ends up stored as NADPH.

Then there's the question of light intensity, which presents a classic catch-22. At low intensities, where the process is most efficient, you don't have a lot of photons by definition. But try to capture more energy by bumping up the intensity and you get photodamage which, in Michel's words, 3.5 billion years of evolution hasn't been able to circumvent. To cope with this photodamage, plants have to recycle one of the key proteins in photosystem II about thrice every hour, which inherently limits the efficiency. Finally, the key protein in the second step, RuBisCO, has a hard time distinguishing between CO2 and O2, and a significant amount of energy has to be spent getting rid of the product formed from O2 insertion.

All these hurdles drastically lower the photosynthetic efficiency, which gets watered down to a rather measly (but still impressive by human standards) 4% or so.
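Just to make the arithmetic explicit, here is a minimal back-of-the-envelope sketch using only the two figures quoted above; the factor attributed to the downstream steps is simply back-calculated from them, not a number taken from Michel's editorial.

```python
# Back-of-the-envelope sketch using only the figures quoted above; the final
# factor is back-calculated from them, not taken from Michel's editorial.
sunlight_to_nadph = 0.12   # fraction of solar energy stored as NADPH by the light reactions
overall_efficiency = 0.04  # rough overall sunlight-to-carbohydrate efficiency

# Combined toll of the dark reactions, photosystem II repair and photorespiration
downstream_fraction = overall_efficiency / sunlight_to_nadph
print(f"Only about {downstream_fraction:.0%} of the energy captured as NADPH "
      f"survives into carbohydrate")  # roughly a third
```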

It's pretty clear from this description that any kind of effort to get better efficiency from biofuels will have to overcome enormous protein engineering hurdles. This does not bode well for current studies aimed at such goals. While these are very exciting from an academic standpoint, they will have to lead to a very drastic retooling of the basic photosynthetic apparatus, involving the re-engineering of numerous genetic pathways and their products, to be of large-scale commercial value. It's all too easy to underestimate the sheer amount of energy that we want to generate from these technologies. I feel the same about the synthetic biology efforts that seek to produce all kinds of valuable industrial chemicals and drugs from engineered bacteria. These efforts are undoubtedly promising, but getting bacteria to do something which they have not evolved to do, and on a scale rivaling the fossil fuel industry at that, is a very long shot indeed. Michel doesn't even seem optimistic about the recent excitement regarding biofuel production from red algae, and reading his prognosis one wonders how much collaborations such as the one between Exxon and Craig Venter are actually going to yield. And finally, there are all the alternative uses to which the land for biofuels, and the biofuel feedstock itself, could be put - a question that is still being pondered.

Michel is much more optimistic about photovoltaics, which already promise energy conversion efficiencies of more than 15%. And when the electricity is used to run a car, about 80% of the energy stored in the battery actually goes into propelling the vehicle. In addition, Michel sees promise in recent advances in battery technology leading to much higher energy densities.
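Putting these numbers side by side shows how lopsided the comparison is. The sketch below is mine, not Michel's: the roughly 20% efficiency assumed for burning a biofuel in an engine is an illustrative assumption, and farming, refining and transport losses are ignored.

```python
# Rough sunlight-to-wheels comparison using the efficiencies quoted above.
# The engine efficiency is an illustrative assumption, not a figure from the editorial.
pv_efficiency = 0.15       # photovoltaic conversion of sunlight to electricity (>15% per the text)
battery_to_wheels = 0.80   # fraction of stored electricity that propels the car (~80% per the text)
photosynthesis = 0.04      # overall sunlight-to-biomass efficiency quoted above
engine_efficiency = 0.20   # assumed efficiency of an internal combustion engine running on biofuel

pv_route = pv_efficiency * battery_to_wheels
biofuel_route = photosynthesis * engine_efficiency
print(f"PV + battery route: ~{pv_route:.1%} of incident sunlight reaches the wheels")
print(f"Biofuel route:      ~{biofuel_route:.1%}, before farming and refining losses")
print(f"That's roughly a {pv_route / biofuel_route:.0f}-fold difference")
```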

Personally I am a proponent of context-specific energy use. Considering the vast variation in resource distribution, geography, energy requirements, paying capacity and economics around the world, it doesn't make much sense to search for a one-size-fits-all solution. Yet that is exactly what seems to happen every time someone touts a single, seemingly miraculous solution as universally applicable. I have similar thoughts about solar energy. The solutions currently available don't seem to solve the problem of transmission and availability in regions where the sun doesn't shine that much. Part of Michel's "vision" is the widespread deployment of superconducting cables, which even now (more than 25 years after the discovery of "high-temperature" superconductors) seems like something of a fantasy. Notwithstanding these issues though, solar power certainly seems to have a much larger role to play in our economy than it currently does, especially in regions which get plenty of sunlight.

But biofuels? The problems there seem grounded much more in fundamental biological constraints than in technological ones. And since it's hard to overturn 3.5 billion years of evolution, I am not sure I should hold my breath. Time will tell.

Identifying new drug targets in zebrafish

Classically, when very little about molecular biology and protein structure was known, one of the best methods to discover drugs was to try to guess physiological effects by looking at drug similarity. The rationale was simple: drug A has this chemical structure and lowers blood pressure. Drug B, which is used for a completely different indication, has a similar chemical structure. Maybe we can use drug B for lowering blood pressure too?

In the absence of rational approaches this technique (which these days is called "repurposing") could be surprisingly fruitful. However, as molecular biology, crystallography and structure-based drug design took off in the 80s, rational drug discovery became much more focused on protein structure and drug developers began trying to guess drug function by looking at target similarity in terms of sequences and binding pockets.

But the relative lack of dividends from structure-based drug design (which nonetheless continues to be important) has led to a re-examination of the old paradigm. One of the most interesting advances to come out of this thinking was pioneered by Brian Shoichet's and Bryan Roth's groups (at UCSF and UNC-Chapel Hill) a few years back. Their rationale was simple too: look at drug similarity, albeit using modern computational methods and metrics, and try to cross-correlate drug targets using these similarity metrics. Similar drugs should hit each other's targets. The method seems somewhat crude but has worked surprisingly well; it was even listed as one of Wired magazine's top ten scientific breakthroughs of 2009.

In a recent publication the authors take the method a step further and apply it to phenotypic screening, which has emerged as an increasingly popular approach in the pharmaceutical industry. Phenotypic screening is attractive because it bypasses the need to know the exact molecular target; basically you just inject different potential drugs into a test system and look at the results, which are usually quantified by a phenotypic response such as a change in membrane potential, cell differentiation or even something as general as locomotion. Once a particular compound has elicited a particular response, we can start the arduous task of finding out what it's doing at a molecular level. Not surprisingly, several proteins can be responsible for a general response like locomotion, and detecting all of them is non-trivial to say the least. It is for this rather involved exercise that the authors provide a remarkably simple potential (partial) solution.

The current study looks at phenotypic screening in zebrafish, a favorite model organism of biologists. 14,000 molecules were screened for their ability to elicit a photomotor response in zebrafish embryos. Out of these, about 700 were deemed active. To find out the targets for these molecules, the authors interrogated their chemical "similarity" against a large group of compounds for which targets are known. Importantly, the authors use a statistical technique to calculate an expectation value (E-value) which indicates how likely it is that the observed similarity arises by chance alone; a lower E-value means the association is less likely to be a statistical fluke. One of the most remarkable things in these studies is that the metric for similarity is a simple, computationally cheap 2D metric called a fingerprint, which looks at the presence or absence of certain functional groups and substructures in the molecules. That such a metric can work at all is remarkable, because we know that an accurate estimation of similarity should ideally include the 3D conformation that the drug presents to the protein target.
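For readers curious what such a 2D comparison looks like in practice, here is a minimal sketch using the open-source RDKit toolkit and the Tanimoto coefficient. This illustrates fingerprint similarity in general, not the authors' actual pipeline (their method additionally aggregates similarities across whole ligand sets and converts them into E-values); the molecules are arbitrary examples.

```python
# A minimal sketch of 2D fingerprint similarity of the kind such methods build on.
# Requires RDKit; the molecules here are arbitrary illustrations, not compounds from the study.
from rdkit import Chem, DataStructs
from rdkit.Chem import MACCSkeys

# Two example molecules given as SMILES strings
query = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")   # aspirin
reference = Chem.MolFromSmiles("OC(=O)c1ccccc1O")     # salicylic acid

# MACCS keys: a simple 2D fingerprint recording the presence or absence of substructures
fp_query = MACCSkeys.GenMACCSKeys(query)
fp_reference = MACCSkeys.GenMACCSKeys(reference)

# Tanimoto coefficient: shared bits divided by the total bits set in either fingerprint
similarity = DataStructs.TanimotoSimilarity(fp_query, fp_reference)
print(f"Tanimoto similarity: {similarity:.2f}")
```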

Nonetheless, 284 molecules predicted to be active on 404 targets were picked based on their low E-values. Out of these, 20 molecules were especially interesting because they represented drug-target associations that had not been seen before. When the authors tested these molecules against the relevant targets, 11 of them showed activities ranging from low nanomolar to about 10 µM. For a computational method this hit rate is quite impressive, although a more realistic measure of the hit rate would have come from testing all 284 compounds. The activity of the molecules was validated by running comparisons with molecules known to elicit the same response from the predicted targets, and confirmation also came from competition experiments with antagonists. Some of the unexpected targets predicted include the beta-adrenergic receptor, dopamine receptors and potassium ion channels.

I find these studies very encouraging. For one thing, the computational method can potentially save a huge amount of time needed to experimentally uncover the target for every active compound. As mentioned above, it's also remarkable that a very simple metric of 2D similarity yields useful results. The success rate is impressive; however, an even lower rate would still have been worth the modest effort if it resulted in new drug-target relationships (which are far more useful than new chemically similar ligands for the same target). However I do think it would have been very interesting to look at the failures. An obvious source of failure comes from using the wrong measure of similarity; at least some compounds are failing presumably because their 2D similarity does not translate to the 3D similarity in conformation required for binding to the protein target. In addition there could be protein flexibility which would result in very different binding for supposedly similar compounds. Medicinal chemists are well aware of "activity cliffs" where small differences in chemical structure lead to great differences in activity. These cliffs could also lead to lack of binding to a predicted target.

Nevertheless, in an age when drug discovery is only getting harder and phenotypic screening seems to be an increasingly important technique, these computational approaches promise to be a useful complement. Gratifyingly, the authors have developed a website where the algorithm is available for free. The technique has also spawned a startup.


Roald Hoffmann on the futility of classifying chemists

Roald Hoffmann has an editorial (open access!) in Angewandte Chemie in which he (mostly) gently scolds those who have criticized many of the last decade's Nobel Prizes as being "insufficiently chemical". I agree with him that any kind of preconceived expectation about who should get the Nobel Prize tries to fit chemistry into a straitjacket and denies scientists who may not have been trained in traditional chemistry departments the right to call themselves chemists.

As I have written elsewhere, it's partly the changing nature of what's considered important in chemical research that has shaped the face of the chemistry Nobel Prize since it was first awarded. With biology being the most exciting science of the twenty-first century and chemistry playing a foundational role in its progress, it is inevitable that more biologists are going to get chemistry prizes. And for those who may be uncomfortable with the prizes awarded to biology-oriented research over the last decade, Hoffmann's observation that biology has in fact been recognized much less over the last thirty years may provide some solace.

But any such qualms are beside the point. As Hoffmann says, the variety of chemistry Nobels given out over the years simply demonstrates the sheer reach of chemistry into multiple fields of biology, physics and even engineering. As we enter the second decade of the new century, there's little doubt that fields traditionally associated with physics or engineering will increasingly be recognized by the chemistry prize.

"Ubiquitin and the ribosome, fluorescent proteins and ion channels are as fundamentally chemical as metal surfaces, enantioselective catalysts, olefin metathesis, or, just to name some fields squarely in our profession that should be (or should have been) recognized, laser chemistry, metal–metal multiple bonding, bioinorganic chemistry, oral contraception, and green or sustainable chemistry."


And ultimately he emphasizes something that we should all constantly remind each other of. It's a prize, awarded by human beings. It's an honor all right, but it does very little to highlight the objective value of the research, which is usually evident long before the actual recognition. The fact that we were informally nominating Robert Grubbs or Roger Tsien years before they received their prizes makes it clear that no prize was really going to change our perception of how important their work was. Today we look at Tsien's research on green fluorescent protein with the same joyful interest that we did ten years ago.


Hoffmann sees the principal function of the Nobel Prize as providing an incentive for young students and researchers from scientifically underprivileged countries, and he cites the examples of Kenichi Fukui and Ahmed Zewail inspiring their fellow countrymen. The Nobel Prize certainly serves this function, but I have always been a little wary of pitching the benefits of scientific research by citing any kind of prize. The fact is that most people who do interesting research will never win the Nobel Prize, and this does nothing to diminish the importance of their work. So even from a strictly statistical standpoint, it would continue to be much more fruitful to point out the real benefits of science to young people - as a means of understanding the world and having fun while you are at it. Prizes may or may not follow.


Hat tip: Excimer



Book review: Philip Anderson's "More and Different"

The Nobel Prize-winning physicist Philip Warren Anderson is one of those rare species - a scientist who is not only world-class in his own field but who seems capable of saying something interesting about virtually every topic under the sun. His career at Bell Labs overlapped with the lab's most illustrious period, and apart from his prizewinning work in solid-state physics, Anderson has made groundbreaking contributions to at least two other diverse fields - particle physics (he was actually the first to propose the mass-generating mechanism now associated with the Higgs boson) and the epistemology of science. In this book he holds forth on a wide variety of subjects ranging from postmodernism to superconductivity. The chapters consist of book reviews, commemorative essays, transcripts of talks, opinion pieces and a variety of other writings from the past five decades. In every chapter there are at least a few rather deep statements which deserve close scrutiny.

The book is roughly divided into four parts. The first details Anderson's views on the history and philosophy of science, including his own field of solid-state physics. The second collects Anderson's reminiscences and thoughts on his scientific peers, mostly in the form of book reviews that he has written for various magazines and newspapers. The third deals with science policy and politics, and the fourth is dedicated to "attempts" at popularizing science.

Some of the chapters are full of scientific detail and can be best appreciated by physicists, but there's also a lot of fodder for the layman in here. A running thread through several essays is Anderson's criticism of ultra-reductionism in science, which is reflected in the title of the book, "More and Different". Anderson's basic viewpoint is that more is not just quantitatively but qualitatively different from less. In 1972 he made a splash with an article in Science magazine arguing that "higher-level" sciences are based on their own fundamental laws which cannot simply be derived from those of physics. In the book he details this philosophy through several examples from physics, chemistry, biology and psychology. He does not deny the great value of reductionism in the development of modern science, but he incisively explores its limits.

Other chapters contain critiques of the latest fads in physics including string theory. Anderson bemoans string theory's lack of connection to concrete experiment and its failure to predict unique, robust solutions. He makes it clear that string theory is really mathematics and that it fails to adhere to the tried and tested philosophy of science which has been successful for almost five hundred years. Other chapters have insightful commentary on the role of mathematics in physics, Bayesian probability and physics at Bell Labs. A particularly amusing essay critiquing the current funding situation in the United States proposes a hypothetical alternative history of quantum mechanics in the US, where scientific pioneers like Dirac and Heisenberg may not have been able to do groundbreaking research because of the publish-or-perish environment and the dominance of the old guard.

There's also some valuable material in here about the sociology of science. This is exemplified by an especially insightful and detailed chapter on scientific fraud where Anderson explores the reasons why some scientists commit fraud and others don't expose it as widely as they should. In Anderson's opinion the most valuable way to expose fraud is to ask whether the claim destroys what he calls the "seamless web of science" - the existing framework of fundamental laws and principles that allows relatively little room for revolutionary breakthroughs on a regular basis. In many cases the web's integrity is clearly not consistent with the new finding, and the rare case where the web can subsume the new discovery and still stay intact leads us into genuinely new scientific territory. He also takes scientists to task for failing to point out the destruction of this seamless web by apparently far-reaching but fundamentally flawed new discoveries. In other chapters Anderson comes down hard on the postmodernist distortion of science, critiquing philosophers such as Nancy Cartwright and upholding the views of debunkers like Alan Sokal. He also has some valuable commentary on science policy, especially on Star Wars and missile defense. Other writers have written much more detailed critiques of such programs, but Anderson succinctly demonstrates the limitations of the concept using commonsense thinking (the bottom line: decoys can easily foil the system, and a marginal improvement by the offense will result in a vastly increased cost for the defense).

Finally, the book contains mini-sketches of some of Anderson's peers, who happened to be among the great scientific minds of the twentieth century. Anderson reviews books by and about Richard Feynman, Murray Gell-Mann, Stuart Kauffman, Stephen Hawking, Roger Penrose, John Bardeen and William Shockley, among others. I happen to agree with him that books by scientists like Hawking, Penrose and Greene, while fascinating to read, paint a rather biased picture of physics and science. For one thing, they usually oversell the whole reductionist methodology in their constant drive to advertise the "Theory of Everything". But more importantly, they make it sound like particle physics and cosmology are the only games in town worth thinking about and that everything else in physics is done on the periphery. This is just not true. As Anderson makes clear, there are lots of fields of physics, including condensed matter physics, biophysics and non-linear dynamics, which contain questions as exciting, fundamental and research-worthy as anything else in science. As just one example, classical physics was considered a staid old backwater of the physics world until chaos burst upon the scene. It's also clear, as was the case with chaos, that some of the most exciting advances will come from non-physicists. There are foundational phenomena and rich dividends to be mined from the intersection of physics with other fields in the twenty-first century.

Anderson's book might be precisely the kind of writing ignored by a public too taken with the Hawkings, Greenes and Randalls. To those readers this volume would be an essential and healthy antidote. There's something in here for everyone, and it makes clear that science still presents infinite horizons on every level. After all, more is different.