Field of Science

Scientific challenges in drug discovery (Part 1): We are still infants

In the face of a flagging pharmaceutical industry and depleted pipeline of new drugs, you will inevitably find someone who tries to put what he or she thinks is a positive spin on the whole depressing situation. That person will say something to the effect that it's probably ok if we don't have more drugs since we already have very good drugs for many disorders and reasonably good ones for most others, so we shouldn't worry too much if we don't continue to come up with miracle breakthroughs.

The recent deaths of Steve Jobs (56) and Ralph Steinman (63, who tragically died only two days before he was awarded the Nobel Prize) should put any such pseudocheerful beliefs to rest. These were people who had access to the best treatment money can buy. Jobs could have afforded any treatment anywhere in the world and Steinman was a professor at one of the world's leading medical research institutes. Yet both were mowed down in the prime of their lives and careers by pancreatic cancer, a disease for which our current pharmaceutical tools are as blunt and primitive as stones and twigs were for fighting wars. Cancer is the great equalizer, unsparing and all-encompassing enough to compete with death itself as one of the great facts of life.

Pancreatic cancer, however, is just one ailment that we are light years from conquering. Include many other cancers, Alzheimer's disease and ALS, antibiotic-resistant infections and diseases like malaria that are still rampant in developing countries, and it's quite clear that nobody should have to make a case for a thriving pharmaceutical and biotech industry that needs every ounce of financial and societal support that we can muster. That we still have to do so is of course a tragic reflection on our myopic short-term vision and misguided financial goals.

But even if we disregard the vast mass of humanity that still very much needs the fruits of pharmaceutical research, there are enough unsolved problems and challenges on a purely scientific level to keep researchers, both in academia and industry, hungry for more ideas and solutions. These challenges are longstanding and we can be assured that, whatever happens to drug research in the new century, they will stick around waiting for us to tackle them. The scope and diversity of these problems vary widely, but you would be hard-pressed to find a researcher who is satisfied with their current status. In this post and the next, I hope to briefly dwell on some problems that we will surely need to solve if we want to radically improve the chances of discovering a potent organic molecule and taking it to the market as a drug.

1. We are still far from being able to predict toxic side effects for any drug

As those in the know are well aware, most drugs fail in the clinic because of unacceptable adverse effects and it's obvious that being able to predict toxic side effects will work wonders for medical science. The truth, however, is that it's rather shameful that, even after achieving so much sophistication in our knowledge of biology and chemistry, exuberant advertisements for drugs have to be tempered with a laundry list of side-effects (proclaimed sotto voce of course) that outnumber the benefits by at least five to one. It's equally shameful that all these years of unprecedented funding for cancer have resulted in hundreds of clinical compounds, even the best of which exhibit rather horrible side effects like nausea, immune system debilitation and loss of fertility. No wonder the bar for the approval of cancer therapeutics is so low; any higher and almost nothing from our current list would make the cut. If this is really the best we can do to fight cancer, then the war on cancer has not even started. A Martian who has conquered most diseases in his own land would call us primitive savages if he saw our present arsenal of anticancer drugs.

As far as prediction goes, we are barely able to predict the effects of unknown drugs with a modest degree of success, and even then only by using statistical models based on empirical data. Calculating side-effects from first principles is pretty much impossible right now. We will have to develop exceedingly clever and complicated model systems approximating whole-organism physiology to do any kind of respectable toxicity prediction. Of course we don't have to predict side effects in order to minimize them, but any such optimization is currently largely a matter of trial and error. Black box approaches can only get us so far and at some point we will need a healthy understanding of the interplay of the various proteins and systems in our body that contribute to toxicity. Systems biology could help us pin down these interactions, but ultimately there will be no substitute for a detailed understanding of biology at a molecular level. For now doing all this is a dream, and toxicity prediction remains one of the great challenges of drug discovery that should keep researchers busy for decades.

2. We still cannot accurately predict the free energy of binding of an arbitrary small molecule to an arbitrary protein from first principles

We now move from clinical-level challenges to the most basic of problems, this one essentially being a problem in physical chemistry. It is a commentary on both the limitations and the opportunities inherent in computational modeling of biochemical systems that we have barely started to scratch the surface of understanding, let alone predicting, the myriad factors that dictate the free energy of binding of a small molecule to a protein. Part of the problem is just the sheer number of variables involved, from the conformational complexity of ligand and protein to the behavior of solvent and solvation of the various parts (vide infra) to the plethora of energetic interactions that any two molecules have with each other.

The other problem involves a battle against nature herself. Because of the exponential dependence of the binding constant on the free energy of binding, errors as small as 1 kcal/mol can significantly under- or overpredict small molecule binding; at room temperature such an error shifts the predicted binding constant roughly five-fold. This problem itself subsumes a difficulty that is inherent to any kind of system susceptible to perturbation by minute changes: being able to predict small differences between large numbers, whether the numbers are economic statistics, variables influencing the weather or, in this case, free energies of binding.
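To make the exponential dependence concrete, here is a minimal sketch based on the standard relation ΔG = RT ln Kd (1 M standard state). The function names and the example free energies are illustrative assumptions, not values taken from any particular study.

```python
import math

# Gas constant in kcal/(mol*K)
R_KCAL = 1.987204e-3


def kd_from_dg(dg_kcal_per_mol, temperature_k=298.15):
    """Dissociation constant (mol/L) implied by a binding free energy.

    Uses dG = RT ln(Kd), so a more negative dG means tighter binding.
    """
    return math.exp(dg_kcal_per_mol / (R_KCAL * temperature_k))


def fold_error(dg_error_kcal_per_mol, temperature_k=298.15):
    """Multiplicative error in Kd caused by an error in the free energy."""
    return math.exp(abs(dg_error_kcal_per_mol) / (R_KCAL * temperature_k))


if __name__ == "__main__":
    # Hypothetical ligand with a true binding free energy of -10 kcal/mol
    for dg in (-11.0, -10.0, -9.0):
        print(f"dG = {dg:6.1f} kcal/mol -> Kd = {kd_from_dg(dg):.2e} M")
    print(f"1 kcal/mol error -> {fold_error(1.0):.1f}-fold error in Kd")
```

Because the error enters through an exponent, it is multiplicative: an error of about 1.4 kcal/mol is already a full order of magnitude in the binding constant, whichever direction it goes.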

How far have we come in being able to predict these energies? The good news is that we now understand a lot more of the physical chemistry of protein-ligand binding than before. We have a rather good understanding of the statistical mechanics framework involved in calculating free energies. Using model systems, we have been able to do a reasonably good job of classifying the various factors (hydrogen bonding, the hydrophobic effect, entropy, electrostatic interactions) that make up these energies. The other good news is that phenomenal improvements in hardware and software continue to allow us to push the boundaries of accuracy.

The bad news is that listing the contributing factors is like listing the genes in a human genome without understanding their function. For instance, listing "entropy" as a factor and being able to calculate the entropy of the protein are two quite different things, with the latter still being largely beyond our capabilities. As with other chemical systems, being able to predict protein-ligand systems means predicting the precise contribution (even as a rough percentage) of each one of these factors to a single resultant number. And even if we were able to calculate these variables in theory, implementing this theoretical framework inevitably involves patching all the factors together in a model. Using the model inevitably involves parameterizing it. And parameterization is a well-known fickle beast, subject to statistical and systematic errors. The worse news is that when it comes to complex systems subject to so many causes, model building is always going to be our best bet. So we can only hope for the time when we have essentially unlimited computing power and have been able to either parameterize our model to kingdom come (without overparameterizing it) or have been able to implement every minute part of the physics of protein-ligand binding in our model. While I would love to have the latter, I would be more than happy to settle for the former if it ever happens.

3. We understand very little about the behavior of solvents, especially water

This factor is significant enough in its own right to be separated from the second point. My optimism about the prospects of computational modeling of proteins and molecules in general took an exponential leap downwards when I schooled myself about solvent behavior and realized that we lack the resources to calculate accurate solvation energies even for simple organic molecules, let alone proteins. It's one of life's enduring mysteries that that which is most familiar succumbs the least to our efforts at understanding it; in this case that elusive entity is water.

It's become something of a cliche to say that water is an anomalous solvent essential to life and yet we understand so little of its depths, but the gloomy implication of this for drug discovery is that we have always been struggling to incorporate this essential factor in our understanding of biomolecular systems. The problem is again essentially one in physical chemistry and has withstood decades of computational and experimental assault. The earliest attempts to incorporate water into molecular simulation simply involved...ignoring it. Not exactly ignoring it, but replacing its discrete tapestry with a continuous medium that duplicated its dielectric constant. This effort gave short shrift to both the role of discrete water molecules in mediating molecular interactions and the shimmering dynamic hydrogen bond network that is the soul of water's behavior. I am hoping that three decades from now we will look back and laugh at the barbaric simplicity of this approximation, but we have to be excused for attempting this feat in the absence of massive computing power (which could handle thousands of discrete water molecules). And we must confess we did it for a very simple reason - it worked. Even today, these so-called "implicit solvation" models can give us surprisingly satisfactory results for many systems (partly due to cancellation of errors, but let's not go there).
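The simplest illustration of the continuum idea is probably the classic Born model, which treats an ion as a charged sphere embedded in a featureless dielectric. The sketch below is only that textbook model, not any specific implicit solvation method used in modern software; the 2.0 Å radius in the usage example is a hypothetical value chosen for illustration.

```python
# e^2 / (4*pi*eps0) expressed in kcal*Angstrom/mol (approximate value)
COULOMB_KCAL_A = 332.06


def born_solvation_energy(charge_e, radius_angstrom, dielectric=78.4):
    """Born estimate of the electrostatic solvation free energy in kcal/mol.

    charge_e        -- ion charge in units of the elementary charge
    radius_angstrom -- effective (Born) radius of the ion in Angstrom
    dielectric      -- relative permittivity of the continuum
                       (about 78.4 for water near room temperature)
    """
    return (-0.5 * COULOMB_KCAL_A * charge_e ** 2 / radius_angstrom
            * (1.0 - 1.0 / dielectric))


if __name__ == "__main__":
    # Hypothetical monovalent ion with a 2.0 Angstrom Born radius
    print(f"{born_solvation_energy(1.0, 2.0):.1f} kcal/mol")
```

Note that the result depends strongly on the chosen radius, which is an adjustable parameter rather than a measured quantity; this is one place where the parameterization and cancellation of errors mentioned above creep in.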

But the implicit solvent consigned water to become the proverbial elephant in the room and researchers, especially those developing molecular dynamics (MD) techniques, strove to replace this continuum with "real" water molecules. But in the world of modeling, even "real" water molecules correspond to models with calculated point charges, dipole moments mimicking the polarizability of the water molecule and so on, and these models largely ignore the special behavior that particular water molecules in a system may exhibit. Nonetheless, these modeled waters are widely used today in molecular dynamics simulations and massive computing power now allows us to routinely handle thousands of such entities.
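As a concrete example of what such a "real" water looks like inside a simulation, the sketch below computes the dipole moment of a rigid three-site model from its fixed point charges and geometry. The post does not name a specific model; the parameters used here are the commonly quoted ones for TIP3P and are included as an assumption for illustration.

```python
import math

# Conversion factor: 1 elementary charge * Angstrom, in Debye
E_ANGSTROM_TO_DEBYE = 4.80320


def three_site_dipole_debye(q_hydrogen, r_oh_angstrom, hoh_angle_deg):
    """Dipole moment of a rigid three-site water model, in Debye.

    The oxygen carries charge -2*q_hydrogen, so the molecule is neutral and
    the dipole lies along the bisector of the H-O-H angle.
    """
    half_angle = math.radians(hoh_angle_deg / 2.0)
    dipole_e_angstrom = 2.0 * q_hydrogen * r_oh_angstrom * math.cos(half_angle)
    return dipole_e_angstrom * E_ANGSTROM_TO_DEBYE


if __name__ == "__main__":
    # Commonly quoted TIP3P parameters (assumed here, not taken from the post):
    # q(H) = +0.417 e, r(OH) = 0.9572 Angstrom, H-O-H angle = 104.52 degrees
    mu = three_site_dipole_debye(0.417, 0.9572, 104.52)
    print(f"model dipole: {mu:.2f} D")
```

The fixed-charge dipole comes out noticeably larger than the roughly 1.85 D measured for an isolated water molecule. The charges are deliberately inflated to mimic, on average, the polarization a water molecule experiences in the liquid, which is exactly why such a model cannot capture the special behavior of an individual water in an unusual environment.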

But the elephant has not left the room. Revealing experimental approaches (mainly spectroscopic) in the last few years have painted a very different picture for water in the bulk compared to water surrounding a biomolecule like a protein. Not surprisingly, water surrounding a protein is less mobile and more viscous than that in the bulk. This allows the protein to project something like a "ghost field", a watery extension of its shape and form into the surrounding solvent. This proxy effect can cause other molecules in the vicinity to respond, although its precise effects are still not known. This also brings us to another related big elephant which we will discuss in the next post: the understanding of molecular behavior in the cell as opposed to in a test tube or on the computer. For now it suffices to say that water in cells behaves very differently from water in dilute solution. Its electrostatics, hydrophobicity, composition and hydrogen bonding network are very different. And we have barely started scratching the surface of this "crowding" that is the hallmark of cellular systems.

A related unsolved problem is the behavior of discrete water molecules in protein active sites. We already know from our knowledge of enzymatic catalysis that water can play an essential role in enzyme reactions. In addition, new evidence indicates that hydrophobic interactions can lead to "dewetting", the sudden expulsion of water from between surfaces. Crystal structures usually don't have enough resolution to clearly pinpoint the locations of bound water molecules. But the real problem is that we still don't know how to accurately predict the thermodynamic and other features of such trapped water. There have been some notable recent advances which attempt to calculate the thermodynamics of enclosed discrete water but these are just getting started. Ultimately the problem boils down to taking theories describing average bulk behavior and using them to calculate specific instances, a problem well-known to statisticians.

It should go without saying that we cannot aim for a realistic understanding of the physics and chemistry of small molecule-protein interactions before we can understand the behavior of water and predict solvation energies of simple organic molecules. But the truth of the matter is that modeling of biomolecular systems has proceeded largely in the absence of such detailed understanding but in the presence of ever so many clever tricks, ad hoc parameterization efforts, models shored up by experimental data and a "Shut up and just use it" attitude. And all this may be justified if our end goal is to find new drugs. But we haven't discovered very many, and while this cannot mainly be blamed on not being able to model water, there is little doubt that a better understanding of solvation will help not just computational modeling but also other drug discovery related activities like formulation, dosing and storage of drug molecules.

In the next part we will look at some other fundamental scientific challenges in drug discovery. Perhaps the people who are disillusioned by the current pharmaceutical bloodbath can take perverse pleasure in the fact that, even when the last scientist has been laid off by the last pharmaceutical organization, these problems will still be standing and needing our attention. When it comes to satisfactorily solving these problems, we are still infants.

Next post: Predicting crystal structures, understanding intracellular interactions, deconstructing the drug-protein-gene connection and more. Stay tuned.


  1. Prediction is one thing but there are also experimental challenges such as measurement of intracellular free concentrations of drugs in (live) humans. Measuring desolvation of proteins is hardly a solved problem but we need this information in order to check our models.

  2. Great post and a great read. We are still infants really.

    No doubt there are more/better drugs to discover, primarily against some very difficult targets - pancreatic cancer has long been a tough one. However, big pharma had been basing its profits on continually putting out many NCEs every year, but that simply is not possible to maintain, especially in the face of such difficult therapeutic areas.

  3. Awesome post -- we are still infants indeed.

    Of course, if you talk to Kurzweil et al., they would tell you we will solve all these problems in the next five years through the application of massive computing power, at which point we will then live forever. Which merely illustrates the peculiar way that computer science folk seem to think about biology...

  4. Yes, it's interesting how CS folks think about biology. The network biology people for instance will occasionally quip that understanding metabolic or neurological networks is like plotting a wiring diagram. Knowing the connections is pretty much tantamount to...well, knowing. But take into account things like the inhibitory/excitatory nature of neurons or allosteric inhibition and network redundancy and you are in a pickle that goes beyond merely mapping connectivity.

  5. I would like to dispute your statement “As those in the know are well aware, most drugs fail in the clinic because of unacceptable adverse effects”

    John Arrowsmith upon reviewing Phase III submission and failures between 2007 and 2010 found 66% were due to lack of efficacy and 21% were due to safety concerns. (

    Similarly, for Phase II trials, failures attributed to safety issues accounted for 19%, while lack of efficacy accounted for about half of the failures. (

    While I agree that drug development isn’t perfect in determining the safety profile in advance of clinical testing, I would argue that it is lack of efficacy that accounts for the majority of failures.

  6. Sorry: knowing the connections (in the brain) is NOT pretty much tantamount to . . . well knowing. There's no question it would help. For why it isn't enough see


  7. Yes, that's what I said in my comment. It's the network biologists and not me who think that knowing the connections is tantamount to knowing. Not all of them, there are some honest attempts to relate the connectivity to events at the molecular level, but the fact is we still don't know enough about feedback in cancer pathways for instance to develop targeted therapeutics with predictable effects, leading to the lack of efficacy that the above commenter cited.

