The Curious Wavefunction: October 2011

Overturning hydrophobic assumptions

By Wavefunction on Monday, October 31, 2011

One of the most fun things about chemistry is that for every laundry list of examples, there is always a counterexample. The counterexample does not really violate any general principle, but it enriches our understanding of that principle by revealing its complexity. And it keeps chemists busy.

One such key principle is the hydrophobic effect, an effect with an astounding range of applicability, from the origin of life to cake baking to drug design. Textbook definitions will tell you that the signature of the "classical" hydrophobic effect is a negative heat capacity change resulting from the union of two unfavorably solvated molecular entities. The nonpolar surface area of the solute is usually proportional to the change in heat capacity. The textbooks will also tell you that the hydrophobic effect is favorable principally because of entropy; the displacement of "unhappy" water molecules that are otherwise uncomfortably bound up in solvating a solute contributes to a net favorable change in free energy. Remember, free energy is composed of both enthalpy and entropy (∆G = ∆H - T∆S) and it's the latter term that's thought to lead to hydrophobic heaven.
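The heat capacity signature mentioned above comes from the temperature dependence of the binding enthalpy, ∆Cp = d(∆H)/dT. Here is a minimal sketch, assuming ∆H has been measured (for example by ITC) at a few temperatures and varies linearly over that range. The function name and the temperatures and enthalpies are made-up illustrative values, not data from any study discussed here.

```python
def heat_capacity_change(temperatures_k, enthalpies_kcal):
    """Estimate Delta Cp (kcal/mol/K) as the least-squares slope of Delta H vs. T.

    Assumes Delta H is linear in T over the measured range.
    """
    n = len(temperatures_k)
    if n != len(enthalpies_kcal) or n < 2:
        raise ValueError("need at least two paired (T, Delta H) measurements")
    mean_t = sum(temperatures_k) / n
    mean_h = sum(enthalpies_kcal) / n
    # Ordinary least-squares slope: S_xy / S_xx
    s_xy = sum((t - mean_t) * (h - mean_h) for t, h in zip(temperatures_k, enthalpies_kcal))
    s_xx = sum((t - mean_t) ** 2 for t in temperatures_k)
    return s_xy / s_xx


# Hypothetical Delta H values (kcal/mol) measured at 15, 25 and 35 degrees C
temps = [288.15, 298.15, 308.15]
d_h = [-2.0, -4.0, -6.0]
d_cp = heat_capacity_change(temps, d_h)
print(f"Delta Cp = {d_cp * 1000:.0f} cal/(mol K)")  # negative: the "classical" signature
```

A binding enthalpy that becomes more favorable as the temperature rises gives a negative ∆Cp, which is the textbook fingerprint discussed above.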

But not always. Here's a nice example of a protein-ligand interaction in which the improvements in free energy across a series of similar molecules come not from entropy but from improved enthalpy, with the entropy actually being unfavorable. A group from the University of Texas measured the binding of a series of tripeptides to the SH2 domain of the Grb2 protein. The exact details of the protein are not important; what's important is that the molecules differed only in the size of the cycloalkane ring in the central residue of the peptide, going from a cyclopropane to a cyclohexane. They found that the free energy of binding improves as you go from a 3-membered to a 5-membered ring, but not for the reason you might expect, namely a greater hydrophobic effect and entropic gain from the larger and more lipophilic rings.

Instead, when they experimentally break down the free energy into enthalpy and entropy using isothermal titration calorimetry (ITC), they find that all the gain in free energy comes from enthalpy. Every extra methylene group contributes about 0.7 kcal/mol to the interaction. In fact the entropy becomes unfavorable rather than favorable as you move up the series. Another surprise awaits in the crystal structures of the complexes: a couple of ordered water molecules are trapped in some of them. Ordered water molecules are fixed in one place and are "unhappy", so you would expect these complexes to display less favorable free energy. Again, you would be surprised. It's the complexes without ordered water molecules that have the worse free energy. The final nail in the coffin of conventional hydrophobic thinking is the observation that the free energy does not even correlate with a decrease in heat capacity, something that's supposed to be a hallmark of the "classical" hydrophobic effect.
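The ITC decomposition described above is simple arithmetic once the instrument has reported the binding constant and the binding enthalpy: ∆G = RT ln Kd, and then −T∆S = ∆G − ∆H. Here is a minimal sketch. The function name is hypothetical, and the ligand (Kd = 1 µM, ∆H = -10 kcal/mol) is made up purely to show what an enthalpy-driven binder paying an entropic penalty looks like in numbers; these are not values from the paper.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def itc_decomposition(kd_molar, delta_h_kcal, temperature_k=298.15):
    """Split a binding free energy into enthalpic and entropic terms.

    ITC reports the binding constant and Delta H directly; Delta G and
    -T*Delta S follow from Delta G = RT ln(Kd) and Delta G = Delta H - T Delta S.
    """
    delta_g = R_KCAL * temperature_k * math.log(kd_molar)  # equals -RT ln(Ka)
    minus_t_delta_s = delta_g - delta_h_kcal  # rearranged Delta G = Delta H - T Delta S
    return delta_g, delta_h_kcal, minus_t_delta_s


# Hypothetical ligand: Kd = 1 uM, Delta H = -10 kcal/mol
dg, dh, mtds = itc_decomposition(1e-6, -10.0)
print(f"dG = {dg:.2f}, dH = {dh:.2f}, -TdS = {mtds:.2f} kcal/mol")
```

A positive −T∆S term, as in this toy case, means the entropy works against binding and the enthalpy has to more than pay for it.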

Now it's probably not too surprising to find the enthalpy being favorable; after all, as they note, you are making more van der Waals contacts with the protein with larger rings and greater nonpolar surface area. But in most cases this contribution is small, and the dominant contribution to the free energy is supposed to come from the "classical" hydrophobic effect with its attendant displacement of waters. Not in this case, where enthalpy dominates and entropy worsens. The authors don't speculate much on why this may be happening. One factor that comes to my mind is the flexibility of the protein. The improved contacts between the larger rings and the protein may well be enforcing rigidity in the protein, leading to a sort of "ligand enthalpy/protein entropy" compensation. Unfortunately a comparison between bound and unbound protein is precluded by the fact that the free protein forms not a monomer but a domain-swapped dimer. In this case I think that molecular dynamics simulations might be able to shed some light on the flexibility of the free protein compared to the bound structures; given the absence of an apo structure, this exercise might be especially worthwhile.

Nonetheless, this study provides a nice counterexample to the conventional thermodynamic signature of the hydrophobic effect. The textbooks probably don't need to be rewritten anytime soon, but chemists will continue to be frustrated, busy and amused as they keep trying to tame these unruly creatures, the annoying wrinkles in the data, into an organized whole.

Myslinski, J., DeLorbe, J., Clements, J., & Martin, S. (2011). Protein–Ligand Interactions: Thermodynamic Effects Associated with Increasing Nonpolar Surface Area. Journal of the American Chemical Society. DOI: 10.1021/ja2068752

Scientific challenges in drug discovery (Part 1): We are still infants

By Wavefunction on Wednesday, October 19, 2011

In the face of a flagging pharmaceutical industry and depleted pipeline of new drugs, you will inevitably find someone who tries to put what he or she thinks is a positive spin on the whole depressing situation. That person will say something to the effect that it's probably ok if we don't have more drugs since we already have very good drugs for many disorders and reasonably good ones for most others, so we shouldn't worry too much if we don't continue to come up with miracle breakthroughs.

The recent deaths of Steve Jobs (56) and Ralph Steinman (68, who tragically died only three days before his Nobel Prize was announced) should put any such pseudocheerful beliefs to rest. These were people who had access to the best treatment money can buy. Jobs could have afforded any treatment anywhere in the world and Steinman was a professor at one of the world's leading medical research institutes. Yet both were mowed down in the prime of their lives and careers by pancreatic cancer, a disease for which our current pharmaceutical tools are as blunt and primitive as stones and twigs were for fighting wars. Cancer is the great equalizer, unsparing and all-encompassing enough to compete with death itself as one of the great facts of life.

Pancreatic cancer, however, is just one ailment that we are light years from conquering. Include many other cancers, Alzheimer's disease, ALS, antibiotic-resistant infections and diseases like malaria that are still rampant in developing countries, and it's quite clear that nobody should have to make a case for a thriving pharmaceutical and biotech industry that needs every inch of financial and societal support we can muster. That we still have to do so is of course a tragic reflection of our myopic vision and misguided financial goals.

But even if we disregard this vast mass of humanity that still very much needs the fruits of pharmaceutical research, there are enough unsolved problems and challenges on a purely scientific level to keep researchers, both in academia and industry, hungry for more ideas and solutions. These challenges are longstanding and we can be assured that, whatever happens to drug research in the new century, they will always stick around waiting for us to tackle them. The scope and diversity of these problems vary widely, but you would be hard-pressed to find a researcher who is satisfied with their current status. In this post and the next, I hope to briefly dwell on some problems that we will surely need to solve if we want to radically improve the chances of discovering a potent organic molecule and taking it to the market as a drug.

1. We are still far from being able to predict toxic side effects for any drug

As those in the know are well aware, most drugs fail in the clinic because of unacceptable adverse effects and it's obvious that being able to predict toxic side effects will work wonders for medical science. The truth however is that it's rather shameful that, even after achieving so much sophistication in our knowledge of biology and chemistry, exuberant advertisements for drugs have to be tempered with a laundry list of side-effects (proclaimed sotto voce of course) that outnumber the benefits by at least five to one. It's rather shameful that all these years of unprecedented funding for cancer have resulted in hundreds of clinical compounds, even the best of which evidence rather horrible side effects like nausea, immune system debilitation and loss of fertility. No wonder the bar for the approval of cancer therapeutics is so low; any higher and almost nothing from our current list would make the cut. If this is really the best we can do to fight cancer, then the war on cancer has not even started. A Martian who has conquered most diseases in his own land would call us primitive savages if he saw our present arsenal of anticancer drugs.

As far as prediction goes, we are barely able to predict the effects of unknown drugs with even a modest degree of success, and even then only by using statistical models based on empirical data. Calculating side effects from first principles is pretty much impossible right now. We will have to develop exceedingly clever and complicated model systems approximating whole-organism physiology to do any kind of respectable toxicity prediction. Of course we don't have to predict side effects in order to minimize them, but any such optimization is currently largely a matter of trial and error. Black box approaches can only get us so far, and at some point we will need a healthy understanding of the interplay of the various proteins and systems in our body that contribute to toxicity. Systems biology could help us pin down these interactions, but ultimately there would be no substitute for a detailed understanding of biology at a molecular level. For now doing all this is a dream, and toxicity prediction remains one of the great challenges of drug discovery that should keep researchers busy for decades.

2. We still cannot accurately predict the free energy of binding of an arbitrary small molecule to an arbitrary protein from first principles

From clinical-level challenges we move to one of the most basic problems, essentially one in physical chemistry. It is a commentary on both the limitations and the opportunities inherent in computational modeling of biochemical systems that we have barely started to scratch the surface of understanding, let alone predicting, the myriad factors that dictate the free energy of binding of a small molecule to a protein. Part of the problem is just the sheer number of variables involved, from the conformational complexity of ligand and protein to the behavior of solvent and the solvation of the various parts (vide infra) to the plethora of energetic interactions that any two molecules have with each other.

The other problem involves a battle against nature herself. Because the binding constant depends exponentially on the free energy of binding, errors as small as 1 kcal/mol (roughly a five-fold error in the binding constant at room temperature) can lead to significant under- or overprediction of small molecule binding. This problem in turn subsumes a difficulty inherent to any system susceptible to perturbation by minute changes: predicting small differences between large numbers, whether the numbers are economic statistics, variables influencing the weather or, in this case, free energies of binding.
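The exponential sensitivity described above is easy to quantify. Since ∆G = −RT ln K, an error of ∆∆G in the computed free energy multiplies or divides the predicted binding constant by exp(∆∆G/RT). A minimal sketch at 298.15 K; the function name is an illustrative choice.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def fold_change_in_k(ddg_kcal, temperature_k=298.15):
    """Factor by which a binding constant changes for a free-energy shift ddG.

    Since Delta G = -RT ln K, a shift of ddG multiplies (or divides) K by
    exp(ddG / RT).
    """
    return math.exp(ddg_kcal / (R_KCAL * temperature_k))


for ddg in (0.5, 1.0, 1.36, 2.0):
    print(f"{ddg:4.2f} kcal/mol error -> K off by a factor of {fold_change_in_k(ddg):.1f}")
```

At room temperature 1 kcal/mol already amounts to roughly a five-fold error in K, and about 1.36 kcal/mol (RT ln 10) is a full order of magnitude. This is why free-energy methods need errors well below 1 kcal/mol to be useful for ranking compounds.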

How far have we come in predicting these energies? The good news is that we now understand a lot more of the physical chemistry of protein-ligand binding than before. We have a rather good understanding of the statistical mechanics framework involved in calculating free energies. Using model systems, we have done a reasonably good job of classifying the various factors (hydrogen bonding, the hydrophobic effect, entropy, electrostatic interactions) that make up these energies. The other good news is that phenomenal improvements in hardware and software continue to allow us to push the boundaries of accuracy.

The bad news is that listing the contributing factors is like listing the genes in a human genome without understanding their function. For instance, listing "entropy" as a factor and being able to calculate the entropy of the protein are two quite different things, the latter still being largely beyond our capabilities. As with other chemical systems, being able to predict protein-ligand binding means predicting the contribution of each one of these factors (even as a rough percentage) to a single resultant number. And even if we were able to calculate these variables in theory, implementing this theoretical framework inevitably involves patching all the factors together in a model. Using the model inevitably involves parameterizing it, and parameterization is a notoriously fickle beast, subject to both statistical and systematic errors. The worse news is that when it comes to complex systems subject to so many causes, model building is always going to be our best bet. So we can only hope for the time when we have essentially unlimited computing power and have been able either to parameterize our model to kingdom come (without overparameterizing it) or to implement every minute part of the physics of protein-ligand binding in our model. While I would love to have the latter, I would be more than happy to settle for the former if it ever happens.

3. We understand very little about the behavior of solvents, especially water

This factor is significant enough in its own right to merit separation from the second point. My optimism about the prospects of computational modeling of proteins and molecules in general took an exponential leap downwards when I schooled myself about solvent behavior and realized that we lack the resources to calculate accurate solvation energies even for simple organic molecules, let alone proteins. It's one of life's enduring mysteries that what is most familiar succumbs least to our efforts to understand it; in this case that elusive entity is water.

It's become something of a cliche to say that water is an anomalous solvent essential to life and yet we understand so little of its depths, but the gloomy implication of this for drug discovery is that we have always struggled to incorporate this essential factor into our understanding of biomolecular systems. The problem is again essentially one in physical chemistry and has withstood decades of computational and experimental assault. The earliest attempts to incorporate water into molecular simulation simply involved...ignoring it. Not exactly ignoring it, but replacing its discrete tapestry with a structureless continuum that reproduced its dielectric constant. This approach gave short shrift both to the role of discrete water molecules in mediating molecular interactions and to the shimmering, dynamic hydrogen bond network that is the soul of water's behavior. I am hoping that three decades from now we will look back and laugh at the barbaric simplicity of this approximation, but we have to be excused for attempting this feat in the absence of the massive computing power needed to handle thousands of discrete water molecules. And we must confess we did it for a very simple reason: it worked. Even today, these so-called "implicit solvation" models can give us surprisingly satisfactory results for many systems (partly due to cancellation of errors, but let's not go there).
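The simplest illustration of the continuum idea is the Born model. It treats an ion as a charged sphere of radius a in a uniform dielectric and gives its solvation free energy as ∆G = −(q²/(8πε₀a))(1 − 1/ε). The sketch below is offered as a toy example of an implicit-solvent calculation, not as what any particular simulation package does. The 2 Å radius is an arbitrary illustrative choice, and 78.4 is the commonly used dielectric constant of water at room temperature.

```python
# Coulomb constant e^2 / (4 pi eps0) expressed in kcal * Angstrom / mol
COULOMB_KCAL_ANGSTROM = 332.0637


def born_solvation_energy(charge_e, radius_angstrom, dielectric=78.4):
    """Born free energy (kcal/mol) for moving a charged sphere from vacuum into a continuum.

    charge_e: charge in units of the elementary charge
    radius_angstrom: effective (Born) radius of the sphere
    dielectric: relative permittivity of the continuum solvent
    """
    prefactor = COULOMB_KCAL_ANGSTROM * charge_e ** 2 / (2.0 * radius_angstrom)
    return -prefactor * (1.0 - 1.0 / dielectric)


# Hypothetical monovalent ion with a 2 Angstrom Born radius
print(f"{born_solvation_energy(1.0, 2.0):.1f} kcal/mol in water")
print(f"{born_solvation_energy(1.0, 2.0, dielectric=2.0):.1f} kcal/mol in a low-dielectric medium")
```

Note what the model cannot see. There are no hydrogen bonds and no individual water molecules, only a dielectric constant, which is exactly the shortcoming described above.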

But the implicit solvent turned water into the proverbial elephant in the room, and researchers, especially those developing molecular dynamics (MD) techniques, strove to replace this continuum with "real" water molecules. But in the world of modeling, even "real" water molecules are models with calculated point charges, dipole moments enhanced to mimic the polarization of water in the condensed phase and so on, and these models largely ignore the special behavior that particular water molecules in a system may exhibit. Nonetheless, these modeled waters are widely used today in molecular dynamics simulations, and massive computing power now allows us to routinely handle thousands of such entities.
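To make the point-charge picture concrete, here is a sketch that computes the dipole moment of a rigid three-site water model from its charges and geometry. The parameters are those commonly quoted for TIP3P (oxygen -0.834 e, hydrogens +0.417 e, O-H bond 0.9572 Å, H-O-H angle 104.52°). The model's dipole of about 2.35 D, well above the gas-phase value of about 1.85 D, is how such fixed-charge models crudely mimic the polarization of water in the liquid. The function name is an illustrative choice.

```python
import math

E_ANGSTROM_TO_DEBYE = 4.80320  # 1 e*Angstrom expressed in debye


def three_site_dipole(q_hydrogen, r_oh_angstrom, hoh_angle_deg):
    """Dipole moment (debye) of a neutral rigid 3-site water model.

    The oxygen carries -2*q_hydrogen, so the dipole is independent of origin.
    Each O->H vector projects r*cos(theta/2) onto the molecular bisector and
    the perpendicular components cancel, giving mu = 2 * q_H * r * cos(theta/2).
    """
    half_angle = math.radians(hoh_angle_deg) / 2.0
    mu_e_angstrom = 2.0 * q_hydrogen * r_oh_angstrom * math.cos(half_angle)
    return mu_e_angstrom * E_ANGSTROM_TO_DEBYE


# Commonly quoted TIP3P parameters
tip3p_dipole = three_site_dipole(q_hydrogen=0.417, r_oh_angstrom=0.9572, hoh_angle_deg=104.52)
print(f"TIP3P dipole: {tip3p_dipole:.2f} D (gas-phase water: about 1.85 D)")
```

Because the charges are fixed, the enhanced dipole is the same whether the molecule sits in bulk water or in a protein pocket. This is one reason such models can miss the special behavior of particular waters.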

But the elephant has not left the room. Revealing experimental approaches (mainly spectroscopic) in the last few years have painted a very different picture of water in the bulk compared to water surrounding a biomolecule like a protein. Not surprisingly, water surrounding a protein is less mobile and more viscous than water in the bulk. This allows the protein to project something like a "ghost field", a watery extension of its shape and form into the surrounding solvent. This proxy effect can cause other molecules in the vicinity to respond, although its precise effects are still not known. This also brings us to another related big elephant, which we will discuss in the next post: the understanding of molecular behavior in the cell as opposed to in a test tube or on the computer. For now it suffices to say that water in cells behaves very differently from water in dilute solution. Its electrostatics, hydrophobicity, composition and hydrogen bonding network are very different. And we have barely started scratching the surface of this "crowding" that is the hallmark of cellular systems.

A related unsolved problem is the behavior of discrete water molecules in protein active sites. We already know from our knowledge of enzymatic catalysis that water can play an essential role in enzyme reactions. In addition, new evidence indicates that hydrophobic interactions can lead to "dewetting", the sudden expulsion of water from between surfaces. Crystal structures usually don't have enough resolution to clearly pinpoint the locations of bound water molecules. But the real problem is that we still don't know how to accurately predict the thermodynamic and other features of such trapped water. There have been some notable recent advances in calculating the thermodynamics of enclosed discrete water, but these efforts are just getting started. Ultimately the problem boils down to taking theories describing average bulk behavior and using them to calculate specific instances, a problem well known to statisticians.

It should go without saying that we cannot aim for a realistic understanding of the physics and chemistry of small molecule-protein interactions until we understand the behavior of water and can predict the solvation energies of simple organic molecules. But the truth of the matter is that modeling of biomolecular systems has proceeded largely in the absence of such detailed understanding, relying instead on many clever tricks, ad hoc parameterization efforts, models shored up by experimental data and a "shut up and just use it" attitude. And all this may be justified if our end goal is to find new drugs. But we haven't discovered very many, and while this cannot be blamed mainly on our inability to model water, there is little doubt that a better understanding of solvation will help not just computational modeling but also other drug discovery activities like formulation, dosing and storage of drug molecules.

In the next part we will look at some other fundamental scientific challenges in drug discovery. Perhaps the people who are disillusioned by the current pharmaceutical bloodbath can take perverse pleasure in the fact that, even when the last scientist has been laid off by the last pharmaceutical organization, these problems will still be standing and needing our attention. When it comes to satisfactorily solving these problems, we are still infants.

Next post: Predicting crystal structures, understanding intracellular interactions, deconstructing the drug-protein-gene connection and more. Stay tuned.

GPCR modeling: The devil hasn't left the details

By Wavefunction on Tuesday, October 11, 2011

The last decade has been a bonanza for the elucidation of structures of G Protein-Coupled Receptors (GPCRs), culminating in the landmark structure of the first GPCR-G protein complex, published a few weeks ago. With 30% of all drugs targeting these proteins and with their involvement in virtually every key aspect of health and disease, GPCRs remain enormously important targets for pure and applied science.

Yet there are miles to go before we sleep. Although we now have more than a dozen structures of half a dozen GPCRs in various states (inactive, active, G-protein coupled), there are still hundreds of GPCRs whose structures are not known. The existing structures all belong to 'Class A' GPCRs. We still have to explore the vast body of Class B and C GPCRs, which comprise a huge number of functionally relevant proteins. The crystal structures we do have are an invaluable resource, but from the point of view of drug discovery, we still don't have enough.

In the absence of crystal structures, homology modeling, in which the known structure of a protein with high sequence homology serves as a template for building a computational model of an unknown structure, has been the favorite tool of modelers and structural biologists. Homology modelers were recently given an opportunity to pit their skills against nature when a contest asked them to predict the structures of the D3 and CXCR4 receptors just before the real x-ray structures came out. Both proteins are important targets involved in multiple processes like neurotransmission, depression, psychoses, cancer and HIV infection. The D3 challenge involved predicting the structure of the protein bound to eticlopride, a D3 antagonist.

The results of the contest have been published before, but in a recent Nature Chemical Biology paper, a team led by Brian Shoichet (UCSF) and Bryan Roth (UNC-Chapel Hill) performs another test of homology modeling, this time of its ability to support virtual screening for D3 receptor ligands and the discovery of novel active molecules with interesting chemotypes.

Two experiments provided the comparison. In the first, the team used the D3 homology model to dock about 3 million compounds, out of which about 20 were picked on the basis of docking scores and visual inspection and then tested in assays. The homology model was built using the published structure of the heavily studied β2 adrenergic receptor as the template. Then, after the x-ray structure of D3 was released, they repeated the virtual screening protocol with the crystal structure: again, 3 million compounds, out of which roughly 20 were picked and tested.

First the somewhat surprising and heartening result: the homology model and the crystal structure demonstrated similar hit rates, about 20%. In both cases the affinities of the ligands ranged from about 200 nM to 3 µM. In addition, the screen revealed some novel chemotypes that did not resemble known D3 antagonists (although, not surprisingly, some hits were similar to eticlopride). As an added bonus, the top-ranked ligands from the homology model did not measurably inhibit the template β2 adrenergic receptor, which means that the homology model probably did not retain the "memory" of the original template.
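The selection step in both protocols amounts to ranking a very large library by docking score and keeping a short list for testing. In the study the final picks also involved visual inspection, which no code captures. Here is a minimal sketch, assuming lower (more negative) scores are better, with hypothetical compound IDs, scores and assay actives. `heapq.nsmallest` avoids sorting the whole library, which matters when it holds millions of entries.

```python
import heapq


def pick_top_candidates(scores, k):
    """Return the k compound IDs with the best (lowest) docking scores."""
    best = heapq.nsmallest(k, scores.items(), key=lambda item: item[1])
    return [compound_id for compound_id, _ in best]


def hit_rate(tested_ids, active_ids):
    """Fraction of tested compounds that turned out to be active in the assay."""
    tested = set(tested_ids)
    if not tested:
        return 0.0
    return len(tested & set(active_ids)) / len(tested)


# Hypothetical docking scores (lower is better)
library = {"cpd-001": -9.8, "cpd-002": -6.1, "cpd-003": -11.2, "cpd-004": -7.5, "cpd-005": -10.4}
picked = pick_top_candidates(library, k=3)
# Hypothetical assay outcome for the picked compounds
actives = {"cpd-003"}
print(picked, f"hit rate = {hit_rate(picked, actives):.0%}")
```

With only about 20 compounds tested per structure, each active moves the hit rate by about five percentage points. Modest differences between the two screens should therefore not be over-read.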

Now for the bee in the bonnet. The fact that the homology model and the crystal structure produced different hits (only one hit overlapped between the two) means that the two structures were not identical. Of course, it's too much to expect a model of a protein with thousands of moving parts to be identical to the experimental structure, but it goes to show how carefully homology modeling has to be performed and how it can still be imperfect. What is more disturbing is that the differences between the model and the crystal structure responsible for the different hits were small; in one case the position of a carbon atom differed by only 1 Å between the two structures. Other residues differed by even less.
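Comparisons like the 1 Å difference above are made atom by atom after the model and the crystal structure have been superimposed. Here is a minimal sketch, assuming the two coordinate sets are already aligned and list corresponding atoms in the same order. The superposition step itself (for example a Kabsch fit) is omitted, and the coordinates are hypothetical.

```python
import math


def per_atom_displacements(coords_a, coords_b):
    """Distance (Angstrom) between corresponding atoms of two pre-aligned structures."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must list the same atoms in the same order")
    return [math.dist(a, b) for a, b in zip(coords_a, coords_b)]


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation over corresponding atoms (no fitting performed)."""
    d = per_atom_displacements(coords_a, coords_b)
    return math.sqrt(sum(x * x for x in d) / len(d))


# Hypothetical side-chain carbons in a homology model and in a crystal structure
model = [(10.0, 4.0, 2.0), (11.2, 4.5, 2.8), (12.1, 5.3, 3.9)]
crystal = [(10.3, 4.1, 2.2), (11.4, 4.6, 2.7), (12.1, 6.3, 3.9)]
for i, d in enumerate(per_atom_displacements(model, crystal)):
    print(f"atom {i}: {d:.2f} A")
print(f"RMSD: {rmsd(model, crystal):.2f} A")
```

A summary RMSD averages over atoms. A single atom that has moved by 1 Å can therefore sit inside a respectable-looking overall number, which is why per-atom or per-residue differences are worth inspecting alongside it.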

And all this even after generating a stupendous number of models of unbound and ligand-bound protein. According to the paper, the team generated about 98 million initial ligand-bound homology models. Screening the top models involved generating multiple conformations and binding modes of the 3 million compounds; the total number of discrete protein-ligand complexes evaluated in this exercise was about 2 trillion. That such an evaluation is possible is a tribute to the enormous computing power we have at our fingertips. But it's also a commentary on how primitive our models still are, since we remain at a loss to predict minute structural differences that have significant consequences for finding new active molecules.

So where does this lead us? I think it's really useful to be able to compare homology models with crystal structures, and we can only hope that more such comparisons will become possible as the pipeline of GPCR structures grows. Yet these exercises demonstrate how challenging it is to generate a truly accurate homology model. A few years ago a similar study demonstrated that a difference in a single torsional angle of a phenylalanine residue (one that resulted in a counterintuitive gauche conformation) affected the binding of a ligand to a homology model of the β2 adrenergic receptor. Our ability to pinpoint such tiny differences in homology models is still in its infancy. And this is just for Class A GPCRs, for which relatively accurate templates are available. Get into Class B and Class C territory and you start looking for the proverbial black cat in a dark room.
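A torsional angle like the phenylalanine one mentioned above is computed from four consecutive atom positions (for a side-chain χ1 angle, N, Cα, Cβ and Cγ). Here is a minimal sketch using the standard atan2 formulation. The coordinates are hypothetical, and the labels simply name the nearest staggered position (about +60°, 180° or -60°), because gauche naming conventions for side chains vary between sources.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees (between -180 and 180)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond
    s0, s2 = _dot(b0, b1), _dot(b2, b1)
    v = _sub(b0, (s0 * b1[0], s0 * b1[1], s0 * b1[2]))
    w = _sub(b2, (s2 * b1[0], s2 * b1[1], s2 * b1[2]))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def nearest_staggered(angle_deg):
    """Name the nearest staggered torsion: gauche(+60), trans(180) or gauche(-60)."""
    targets = {"gauche(+60)": 60.0, "trans": 180.0, "gauche(-60)": -60.0}

    def circular_gap(a, b):
        return abs((a - b + 180.0) % 360.0 - 180.0)

    return min(targets, key=lambda name: circular_gap(angle_deg, targets[name]))


# Hypothetical atoms: central bond along z, first atom along +x
p0, p1, p2 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.5)
for label, p3 in [("anti", (-1.0, 0.0, 1.5)), ("rotated", (0.5, math.sqrt(3) / 2, 1.5))]:
    phi = dihedral_degrees(p0, p1, p2, p3)
    print(f"{label}: {phi:.1f} deg -> {nearest_staggered(phi)}")
```

Moving a single atom by a fraction of an ångström can flip a side chain from one staggered well to another. That is the kind of subtle difference that separated a productive homology model from an unproductive one.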

Now throw in the fascinating phenomenon of functional selectivity and you have a real wrench in the works. Functional selectivity, whereby different conformations of a GPCR bound to the same ligand modulate different signal transduction pathways and cause the ligand to change its mode of action (agonist, inverse agonist etc.), takes the modeling of GPCRs to new levels of difficulty. Most modeling currently being done does not even attempt to consider protein flexibility, which is at the heart of functional selectivity. Routinely including protein flexibility in GPCR modeling has some way to go.

That is why I think that, as much as we will continue to learn from GPCR homology modeling, it's not going to contribute massively to GPCR drug discovery anytime soon. Constructing accurate homology models of even a fraction of the GPCR universe will take a long time. Using such models would be like throwing darts at a board whose center is unknown. As long as we cannot locate the center and are plagued by the complexities of functional selectivity, we may be better off pursuing experimental approaches that can map the effect of ligands on a particular GPCR using multifunctional assays. Fortunately, such approaches are definitely seeing the light of day.

Carlsson, J., Coleman, R., Setola, V., Irwin, J., Fan, H., Schlessinger, A., Sali, A., Roth, B., & Shoichet, B. (2011). Ligand discovery from a dopamine D3 receptor homology model and crystal structure. Nature Chemical Biology. DOI: 10.1038/nchembio.662

On being a computational chemist in industry

By Wavefunction on Friday, October 07, 2011

In a recent post on Chemjobber, Lisa Balbes interviewed a computational chemist in the pharmaceutical industry about his job description and the skills needed to work as a modeler in industry. As a computational chemist who has been working on applied problems for almost a decade now (goodness gracious), I find this the perfect excuse to hold forth a little on the topic. I may do a series of posts later, but for now here's what I think is the lowdown.

Let's get the most important thing out of the way first. It is absolutely essential for a modeler to speak the language of the medicinal chemist and biologist. Personally, in spite of being a computational chemist, I always consider myself first and foremost an organic chemist (and I did go to graduate school in organic chemistry before specializing in modeling), using modeling only as a set of tools to shed light on interesting chemical problems. In fact I find myself spending as much time studying the literature on synthesis, physical chemistry, biological assays and protein structure as on modeling.

Computational chemistry is certainly a bona fide field of chemistry in itself now, but especially in industry it's primarily a means to an end. It doesn't matter how well versed you are in a particular technique like molecular dynamics or quantum chemistry; what matters most is how well you understand the strengths and limitations of these methodologies. Understanding the limitations is especially important, since only this can help you decide how much you can trust your results, a prerequisite for any scientist. What is key is your knowledge of the chemical system under consideration, which will allow you to choose the best combination of relevant techniques. And even this is not as important as the final goal: being able to interpret the results in the language of chemistry that everyone understands, telling your colleagues what they mean and how they should now proceed, with all the appropriate caveats and optimism that apply. Understanding and conveying the uncertainty in your methods is as important as anything else, since your colleagues need to hear an informed viewpoint that tells them what they are in for rather than a blind prediction.

Unfortunately I have met my share of modelers who think that their expertise in programming or in the intimate working details of one particular method automatically qualifies them to shed light on the details of an interesting medicinal system. Broadly speaking, modelers can be divided into method developers and application scientists. There is of course considerable overlap between the two and both are valuable, but let's make no mistake: in industry the ones who can contribute most directly to a project are the latter, using tools developed by the former. No amount of training in C++ or in the mathematical wizardry behind a quantum chemical method can prepare you for intuiting the subtle interplay between electrostatic, steric, polar and nonpolar interactions that cause a ligand to bind to a protein with high affinity and selectivity. Much of this comes from experience of course, but it also develops from constantly appreciating the basic chemical features of a system rather than getting hung up on the details of the method.

As we have seen in other posts, a lot of chemical problem solving depends on intuition, an almost tactile feel for how atoms and molecules interact with each other. This falls squarely within the purview of basic chemistry, much of it the kind we learned in college and graduate school. An ideal computational chemist in industry should first and foremost be a chemist; the "computational" part of the title describes the means to the end. There is no substitute for basic familiarity with the principles of conformational analysis, acid-base equilibria, physical organic chemistry, protein structure, thermodynamics and stereochemistry. Nobody can be a good computational chemist without being a good chemist to begin with.

Apart from these skills, modelers can also bring some under-appreciated skills to the table. Those who look at protein and ligand structures on the screen all day long usually have a much better sense of molecular sizes and volumes than bench chemists. A medicinal chemist might look at a protein cavity and conclude that it's big enough to fit a cyclohexyl group, but a modeler might display the cavity in a space-filling representation and doom any such idea to the realm of steric hell. Unfortunately the line drawings that chemists are accustomed to can give a false impression of size and shape, and sometimes simply looking at structures in space-filling mode on a screen can do wonders for deciding whether a particular group will fit into a particular part of a protein. This also makes modelers responsible for something that may need awesome powers of persuasion: convincing your experimental colleagues to regularly come to your desk and look at some pretty pictures (as an aside, modelers may have to play especially nice with their colleagues). Looking at protein structures and molecules all the time should ideally also make a modeler something of an informal expert in structural biology and physical chemistry. Thermodynamics especially is one area where modelers might know more than their organic colleagues because of their focus on the free energy of binding, and I have occasionally contributed productively to discussions about enthalpy, entropy and isothermal titration calorimetry (ITC). In addition, doing structure-based design is always a good opportunity to learn about x-ray crystallography and NMR spectroscopy. You may increasingly find that your colleagues come to you for advice on many structural aspects of their disciplines.
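The sense of size that space-filling views provide can also be put into numbers. Here is a minimal sketch that estimates the van der Waals volume of a set of overlapping atomic spheres by Monte Carlo sampling of a bounding box. The 1.7 Å carbon radius is a commonly used Bondi-type value. The two-atom "molecule" and the function name are hypothetical examples, not anything from a real project.

```python
import math
import random


def union_volume(spheres, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the volume (cubic Angstrom) of a union of spheres.

    spheres: list of ((x, y, z), radius) tuples, e.g. atoms with van der Waals radii.
    Points are drawn uniformly in the axis-aligned bounding box; the fraction that
    falls inside at least one sphere, times the box volume, estimates the volume.
    """
    rng = random.Random(seed)
    lo = [min(c[i] - r for c, r in spheres) for i in range(3)]
    hi = [max(c[i] + r for c, r in spheres) for i in range(3)]
    box_volume = (hi[0] - lo[0]) * (hi[1] - lo[1]) * (hi[2] - lo[2])
    inside = 0
    for _ in range(n_samples):
        p = [rng.uniform(lo[i], hi[i]) for i in range(3)]
        if any(math.dist(p, c) <= r for c, r in spheres):
            inside += 1
    return box_volume * inside / n_samples


CARBON_VDW = 1.7  # Angstrom, Bondi-type radius
one_carbon = [((0.0, 0.0, 0.0), CARBON_VDW)]
two_carbons = [((0.0, 0.0, 0.0), CARBON_VDW), ((1.54, 0.0, 0.0), CARBON_VDW)]
exact_single = 4 / 3 * math.pi * CARBON_VDW ** 3
print(f"single carbon: {union_volume(one_carbon):.1f} A^3 (exact {exact_single:.1f})")
print(f"C-C pair:      {union_volume(two_carbons):.1f} A^3, well short of twice the single-atom value")
```

Because bonded atoms overlap heavily, a group occupies noticeably less space than its atom count might suggest. Increasing `n_samples` tightens the estimate, with the statistical error shrinking roughly as one over the square root of the sample count.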

Ultimately, modelers' value to an organization is going to be judged by their ability to offer practical suggestions to their colleagues in the language of those colleagues' own disciplines (as well as the shared language of basic chemistry). The more organic chemistry and biology they know, the more they will be cherished. The more they empathize with the particular intricacies of their colleagues' disciplines, the more they will be regarded as an asset. As an illustration, let me recount a personal anecdote.

I was collaborating with some chemists on a kinase inhibitor project. At one point I thought of a modification to our compound that looked very promising. At the next meeting, here's what I said to my medicinal chemistry colleague: "Jim, there are two modifications that I thought might improve the potency of our hits. One looks very promising, but I have studied your synthetic scheme and I think this modification might be a little intractable, especially considering the cost of your building blocks. On the other hand, here's this other modification which would be my second-best choice, but which you can probably easily install using a Buchwald-Hartwig coupling reaction."

Both my colleague and I were whistling all day long.


The future of science: Will models usurp theories?

By Wavefunction on Wednesday, October 05, 2011

This year's Nobel Prize for physics was awarded to Saul Perlmutter, Brian Schmidt and Adam Riess for their discovery of an accelerating universe, a finding that led to the startling postulate that roughly 75% of the energy content of our universe is a hitherto unknown entity called dark energy. All three had long been considered favorite candidates, so this is not surprising at all. The prize also underscores the continuing importance of cosmology, since the physics prize had already been awarded in 2006 to George Smoot and John Mather, again for work confirming the Big Bang and the universe's expansion.

This is an important discovery which stands on the shoulders of august minds and an exciting history. It continues a grand narrative that runs from Henrietta Swan Leavitt (who established a standard reference for calculating astronomical distances) through Albert Einstein (whose despised cosmological constant was resurrected by these findings), Edwin Hubble, and Georges Lemaître and George Gamow (with their ideas about the Big Bang), finally culminating in our current sophisticated understanding of the expanding universe. Anyone who wants to know more about the personalities and developments leading to today's event should read Richard Panek's excellent book "The 4 Percent Universe".

But what is equally interesting is the ignorance that the prizewinning discovery reveals. The prize was really awarded for the observation of an accelerating universe, not for its explanation. Nobody really knows why the universe is accelerating. Current explanations for the acceleration consist of a set of different models, none of which has been definitively shown to explain the facts well enough. And this makes me wonder whether such a proliferation of models without accompanying concrete theories is going to characterize science in the future.

The twentieth century saw theoretical advances in physics that agreed with experiment to an astonishing degree of accuracy. The culmination of achievement in modern physics was surely quantum electrodynamics (QED), which is considered the most accurate theory in physics. Since then we have had some successes in quantitatively correlating theory with experiment, most notably in validating the Big Bang and developing the standard model of particle physics. But for dark energy there is no theory that remotely approaches the rigor of QED when it comes to comparison with experiment.

Of course it's unfair to criticize dark energy research since we are just getting started on tackling its mysteries. Maybe someday a comprehensive theory will be found, but given the complexity of what we are trying to achieve (essentially explaining the nature of all the matter and energy in the universe) it seems likely that we may always be stuck with models, not actual theories. And this may be the case not just for cosmology but for other sciences. The fact is that the phenomena science has been dealing with recently are multifactorial, complex and emergent. The mechanical, reductionist approaches that worked so well for atomic physics and molecular biology may turn out to be too impoverished for taking these phenomena apart. Take biology, for instance. Do you think we could have a complete "theory" of the human brain that can quantitatively calculate all the brain states leading to consciousness and our reactions to the external world? How about building a "theory" of signal transduction that would allow us not just to predict but to truly understand (in a holistic way) all the interactions with drugs and biomolecules that living organisms undergo? And then there are other complex phenomena like the economy, the weather and social networks. It seems safe to say that we should not expect real overarching theories for these phenomena anytime soon.

On the other hand, I think it's a sign of things to come that most of these fields are rife with explanatory models of varying accuracy and validity. Most importantly, modeling and simulation are starting to be considered a respectable "third leg" of science, alongside theory and experiment. One simple reason for this is the recognition that many of science's greatest current challenges may not be amenable to quantitative theorizing, and we may have to treat models of phenomena as independent, authoritative explanatory entities in their own right. We are already seeing this happen in chemistry, biology, climate science and social science, and I have been told that even cosmologists now rely extensively on computational models of the universe. Admittedly these models are still far behind theory and experiment, which have had head starts of about a thousand years. But there can be little doubt that such models will become more accurate with increasing computational firepower. How accurate remains to be seen, but it's worth noting that there are already books that make a case for an independent, study-worthy philosophy of modeling and simulation. These books exhort philosophers of science to treat models not just as convenient applications and representations of theories (which are then the only fundamental things worth studying) but as ultimate, independent explanatory devices that deserve separate philosophical consideration.

Could this then be at least part of the future of science? A future where robust experimental observations are encompassed not by beautifully rigorous and complete theories like general relativity or QED but only by different models patched together through a combination of rigor, empirical data, fudge factors and plain old intuition? This would be a new kind of science, as useful in its applications as its old counterpart but rooted only in models and not in complete theories. Given the history of theoretical science, such a future may seem dark and depressing. That is because, as the statistician George Box famously quipped, all models are wrong, but some are useful. What Box meant was that models often make unrealistic assumptions about all kinds of details that nonetheless allow us to reproduce the essential features of reality. Thus they can never provide the sure connection to "reality" that theories seem to. This is especially a problem when disparate models give the same answer to a question. In the absence of discriminating ideas, which model is then the "correct" one? The usual answer is "none of them", since they all do an equally good job of explaining the facts. But this view of science, where models judged only on the basis of their utility are the ultimate arbiters of reality and where there is thus no sense of a unified theoretical framework, feels deeply unsettling. In this universe the "real" theory will always remain hidden behind a facade of models, much as reality is always hidden behind the event horizon of a black hole. Such a universe can hardly warm the cockles of the heart of those who are used to crafting grand narratives for life and the universe. However, it may be the price we pay for more comprehensive understanding. In the future, Nobel Prizes may frequently be awarded for important observations for which there are no real theories, only models. The discovery of dark matter and dark energy and our current attempts to understand the brain and signal transduction could well be the harbingers of this new kind of science.

Should we worry about such a world rife with models and devoid of theories? Not necessarily. If there's one thing we know about science, it's that it evolves. Grand explanatory theories have traditionally been considered a key part (probably the key part) of the scientific enterprise. But this is mostly because of historical precedent as well as a psychological urge to seek elegance and unification. Such belief has been resoundingly validated in the past, but its utility may well have plateaued. I am not advocating some "end of science" scenario here (far from it), but as the recent history of string theory and of theoretical physics in general demonstrates, even the most mathematically elegant and psychologically pleasing theories may have scant connection to reality. Because of the sheer scale and complexity of what we are currently trying to explain, we may have hit a roadblock in applying the largely reductionist traditional scientific thinking that has served us so well for half a millennium.

Ultimately, though, what matters is whether our constructs (theories, models, rules of thumb or heuristic pattern recognition) are up to the task of constructing consistent explanations of complex phenomena. The business of science is explanation; whether that explanation comes through unified narratives or piecemeal is secondary. Although the former sounds more psychologically satisfying, science does not really care about stoking our egos. What is out there exists, and we do whatever is necessary and sufficient to unravel it.

Chemistry Nobel Prizes redux

By Wavefunction on Tuesday, October 04, 2011

In tribute to tomorrow's impending chemistry Nobel Prize, I thought I would repost a slightly updated list of predictions.

1. Computational chemistry and biochemistry (Difficult):
Pros: Computational chemistry as a field has not been recognized since 1998 (Kohn and Pople), so the time seems due. One obvious candidate would be Martin Karplus. Another would be Norman Allinger, the pioneer of molecular mechanics.
Cons: This would definitely be a lifetime achievement award. Karplus did perform the first MD simulation of a protein, but that by itself wouldn’t command a Nobel Prize. The other question is which field exactly the prize would honor. If it’s specifically applications to biochemistry, then Karplus alone would probably suffice. But if the prize is for computational methods and applications in general, then others would also have to be considered, most notably Allinger but perhaps also Ken Houk, who has been foremost in applying such methods to organic chemistry. Another interesting candidate is David Baker, whose program Rosetta has produced some fantastic results in predicting protein structure and folding. It even spawned a cool game. But the field is probably too new for a prize, and its results would have to be further validated by others before it's recognized.

2. Chemical biology and chemical genetics (Easy)
Another favorite for years, with Stuart Schreiber and Peter Schultz being touted as leading candidates.
Pros: The general field has had a significant impact on basic and applied science.
Cons: This again would be more of a lifetime achievement award, which is rare. Plus, several other researchers (Cravatt, Bertozzi, Shokat) have made important contributions to the field in recent years. It may make some sense to award Schreiber a ‘pioneer’ award for raising ‘awareness’, but that’s sure to make a lot of people unhappy. Also, a prize for chemical biology might be yet another one whose time has just passed.

3. Single-molecule spectroscopy (Easy)
Pros: The field has obviously matured and is now a powerful tool for exploring everything from nanoparticles to DNA. It’s been touted as a candidate for years. The frontrunners seem to be W E Moerner and M Orrit, although Richard Zare has also been floated often.
Cons: The only con I can think of is that the field might still be too new for a prize.

4. Electron transfer in biological systems (Easy)
Pros: Another field which has matured and has been well-validated. Gray and Bard seem to be leading candidates.
Cons: Although electron transfer in biological systems is important, Gray and Bard's discoveries don't seem to have the ring of fundamental importance that, say, Marcus's electron transfer theory has, nor do they seem to be widely utilized by other chemists in the way that, say, palladium catalyzed reactions are.

Among other fields, I don’t really see a prize for the long-lionized birth control pill and Carl Djerassi; although we might yet be surprised, the time just seems to have passed. Then there are fields that seem too immature for the prize; among these are molecular machines (Stoddart et al.) and solar cells (Grätzel).

5. Statins (Difficult)
Akira Endo’s name does not seem to have been discussed much. Endo discovered the first statin. Although this particular compound was not a blockbuster drug, since then statins have revolutionized the treatment of heart disease.
Pros: The “importance” as described in Nobel’s will is obvious since statins have become the best-selling drugs in history. It also might be a nice statement to award the prize to the discovery of a drug for a change. Who knows, it might even boost the image of a much maligned pharmaceutical industry...
Cons: The committee is not really known for awarding actual drug discovery. Precedents like Alexander Fleming (antibiotics), James Black (beta blockers, antiulcer drugs) and Gertrude Elion (immunosuppressants, anticancer agents) exist but are few and far between. On the other hand, this fact might make a prize for drug discovery overdue.

6. DNA fingerprinting and synthesis (Easy)
Now this seems to me to be very much a field from the "obvious" category. The impact of DNA fingerprinting and of Western and Southern blots on pure and applied science, on everything from discovering new drugs to hunting down serial killers, is at least as big as that of the prizeworthy PCR. I think the committee would be doing itself a favor by honoring Jeffreys, Stark, Burnette and Southern.

And while we are on DNA, I think it's also worth throwing in Marvin Caruthers whose technique for DNA synthesis really transformed the field. In fact it would be nice to award a dual kind of prize for DNA- for both synthesis and diagnosis.

Cons: Since no more than three people can share a prize, picking three from this group might be tricky.

7. GPCR structures (Difficult)
When the latest GPCR structure (the first one of a GPCR bound to a G protein) came out I remember remarking that Kobilka, Stevens and Palczewski are probably up for a prize sometime. Palczewski solved the first structure of rhodopsin, and Stevens and Kobilka have been churning out structure after important structure over the last decade, including the first structure of an active receptor along with several medicinally important ones such as the dopamine D3 and CXCR4 receptors. These feats are definitely technical tours de force.
Pros: GPCRs are clearly important for basic and applied science, especially drug discovery, where 30% of drugs already target these proteins. Plus, structural biology has often been awarded a Nobel, so there's plenty of precedent (hemoglobin, potassium channel, ATPase etc.)
Cons: Probably too early.

Other predictions: Canine Ed, Sam@EverydayScientist

A posthumous Nobel Prize

By Wavefunction on Monday, October 03, 2011

The Nobel Prize for Medicine was announced today and it went to Bruce Beutler, Jules Hoffmann and Ralph Steinman for their discoveries concerning the immune system. More specifically, Beutler and Hoffmann were honored for discoveries concerning the activation of innate immunity, centered on Toll and the toll-like receptors (TLRs), and Steinman for his discovery of dendritic cells and their role in adaptive immunity. These are undoubtedly key components of the immune system, so the prize is well deserved.

In a tragic twist of fate, Ralph Steinman of the Rockefeller University (who discovered dendritic cells) died on September 30, only days before the announcement, after fighting pancreatic cancer. Apparently the committee was not aware of this, which makes the prize a posthumous one. Has this happened before? The rules do seem to stipulate that someone who dies after the announcement is still a legitimate candidate, and it would of course be cruel to withdraw the prize now, so the committee probably won't court controversy by doing so (when it comes to science prizes it is considered pretty conservative).

Book review: Robert Laughlin's "Powering the Future"

By Wavefunction on Sunday, October 02, 2011

In the tradition of physicists writing for the layman, Robert Laughlin has emerged as a writer who pens unusually insightful and thought-provoking books. In his "A Different Universe" he explored the consequences and limitations of reductionism-based physics for our world. In this book he takes an equally fresh look at the future of energy. The book is not meant to be a comprehensive survey of existing and upcoming technologies; instead it's more like an assortment of appetizers designed to stimulate our thinking. For those who want to know more, it offers an impressive bibliography and list of calculations which is almost as long as the book itself.

Laughlin's thinking is predicated on two main premises. The first is that carbon sources will eventually run out or become inaccessible (either because of availability or because of legislation). However, we will still largely depend on carbon because of its extraordinarily fortuitous properties like high energy density, safety and ease of transportation. But even in this scenario, simple rules of economics will trump most other considerations in choosing among different energy sources. The second premise, which I found very intriguing, is that we need to uncouple our thinking on climate change from our thinking on energy instead of letting concerns about the former dictate policy about the latter. The reason is that planetary-level changes in the environment are so vast and so far beyond the ability of humans to control that driving a few more hybrids or curbing carbon emissions will have little effect on millennial events like the freezing or flooding of major continents. It's worth noting that Laughlin (who has lately been called a climate change skeptic) is not denying global warming or its consequences; he just thinks they are somewhat beside the point when it comes to thinking about future energy, which will be dictated by economics and prices more than anything else. I found this to be a commonsense approach based on an appreciation of human nature.

With this background Laughlin takes a sweeping and eclectic look at several interesting technologies and energy sources including nuclear energy, biofuels, energy from trash, wind and solar power and energy stored beneath the sea. In each case Laughlin explores a variety of problems and promises associated with these sources.

Because of dwindling uranium resources, for instance, the truly useful form of nuclear energy will come from fast breeder reactors, which produce their own plutonium fuel. However, these reactors raise greater concerns about proliferation and theft. Laughlin thinks that a worldwide, tightly controlled system for supplying fuel rods to nations would allow us to deploy nuclear power fruitfully. One of his startling predictions is that we may put up with occasional Chernobyl-like events if nuclear power truly becomes cheap and we don't have any other alternatives.

Laughlin also finds promises and pitfalls in solar energy. The basic problems with solar energy are its irregular availability and the difficulty of storing it. Backup power inevitably depends on fossil fuel sources, which sort of defeats the purpose. Laughlin sees a bright future for molten salt tanks, which can very efficiently store solar energy as heat to be used when the sun is not shining. These salts are simple eutectic mixtures of potassium and sodium nitrates whose melting points are conveniently lowered even further by the salts' decomposition products. Biofuels also get an interesting treatment in the book. One big advantage of biofuels is that they are both sources and sinks of carbon. Laughlin talks about some promising recent work with algae but cautions that meeting the sheer worldwide demand for energy with biofuels that don't divert resources away from food is very challenging. Further on there's a very intriguing chapter on energy stored under the sea. The seafloor provides a stupendous amount of area that could be used for energy storage through novel means like high-density brine pools and compressed natural gas tanks. Finally, burning trash, which contains a lot of carbon, might seem like a useful source of energy, but as Laughlin explains, the energy actually contained in trash can meet only a fraction of our needs.

Overall the book presents a very thought-provoking treatment of the nature and economics of possible future energy sources in a carbon-strapped world. In these discussions Laughlin wisely avoids taking sides, realizing how fraught with complexity and ambiguity future energy production is. Instead he simply offers his own eclectic thoughts on the pros and cons of energy-related topics which may (or may not) prove important in the future. Among my minor gripes with the volume is the lack of discussion of promising recent advances in solar cell design, thorium-based fuels and next-generation nuclear reactor technology. Laughlin's focus is also sometimes a little odd and meandering; for instance, at one point he spends an inordinate amount of time on aspects of robotic technology that may make deep-sea energy sequestration possible. But these gripes detract little from a volume that is not really meant to be an exhaustive survey of alternative energy technologies.

Instead, it offers us a very smart scientist's miscellaneous musings on energy, guided by commonsense assumptions based on the simple laws of supply and demand and of human nature. As responsible citizens we need to be informed about our energy choices, which are almost certainly going to become more difficult and constrained in the future. Laughlin's book, along with others, will stimulate our thinking and help us weigh our options and chart our direction.

Field of Science
