Field of Science

The Uncertainty Principle for climate (and chemical) models

A recent issue of Nature had an interesting article on what seems to be a wholly paradoxical feature of models used in climate science: as the models become increasingly realistic, they are also becoming less accurate and predictive because of growing uncertainties. I can only imagine this to be an excruciatingly painful fact for climate modelers, who seem to be facing the equivalent of the Heisenberg uncertainty principle for their field. It's an especially worrisome time to deal with such issues, since the modelers need to include their predictions in the next IPCC report on climate change, which is due to be published next year.

A closer look at the models reveals that this behavior is not as paradoxical as it sounds, although it's still not clear how you would get around it. The article especially struck a chord with me because I see similar problems bedeviling models used in chemical and biological research. In the case of climate change, the fact is that earlier models were crude and did not account for many fine-grained factors that are now being included (such as the rate at which ice falls through clouds). In principle, and even in practice, there's a bewildering number of such factors (partly exemplified by the picture on top). Fortuitously, the crudeness of the models also kept the uncertainties associated with these factors out of the modeling. The uncertainty remained hidden. Now that more real-world factors are being included, the uncertainties endemic in these factors reveal themselves and get tacked on to the models. You thus face an ironic tradeoff: as your models strive to mirror the real world better, they also become more uncertain. It's like struggling in quicksand: the harder you try to get out of it, the deeper you get sucked in.

This dilemma is not unheard of in the world of computational chemistry and biology. Many of the models we currently use for predicting protein-drug interactions, for instance, are remarkably simple and yet accurate enough to be useful. Several reasons account for this unexpected accuracy, among them cancellation of errors (the Fermi principle), similarities of training sets to test sets and sometimes just plain luck. Error analysis is unfortunately not a priority in most of these studies, since the whole point is to publish correct results. Unless this culture changes, our road to accurate prediction will be painfully slow.

But here's an example of how "more can be worse". For the last few weeks I have been using a very simple model to try to predict the diffusion of druglike molecules through cell membranes. This is an important problem in drug development, since even your most stellar test-tube candidate will be worthless until it makes its way into cells. The interior of a cell membrane is hydrophobic, while the water surrounding it is highly polar. The ease with which a potential drug transfers from the surrounding water into the membrane depends, among other factors, on its solvation energy, that is, on how readily the drug can shed its water molecules: the smaller the solvation energy, the easier it is for the drug to get across. This simple model, which calculates the solvation energy, seems to do unusually well in predicting the diffusion of drugs across real cell membranes, a process that's much more complex than just solvation and desolvation.
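The post doesn't spell out the model's equations. Here is a minimal sketch assuming the textbook thermodynamic picture:

- The transfer free energy is the difference between the molecule's solvation free energies in the membrane and in water.
- The membrane/water partition coefficient follows as K = exp(-ΔG/RT).

The function names and the solvation energies are hypothetical and chosen only for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def transfer_free_energy(g_solv_water, g_solv_membrane):
    """Free energy (kcal/mol) of moving one rigid conformation from water
    into the membrane interior. Inputs are solvation free energies, where
    more negative means more strongly solvated."""
    return g_solv_membrane - g_solv_water


def partition_coefficient(dg_transfer, temp_k=298.15):
    """Membrane/water partition coefficient K = exp(-dG_transfer / RT)."""
    return math.exp(-dg_transfer / (R_KCAL * temp_k))


# Hypothetical solvation free energies (kcal/mol) as (water, membrane) pairs.
compounds = {
    "strongly_hydrated": (-15.0, -9.0),  # pays a large desolvation penalty
    "weakly_hydrated": (-6.0, -8.0),     # sheds its water easily
}

for name, (g_water, g_membrane) in compounds.items():
    dg = transfer_free_energy(g_water, g_membrane)
    print(f"{name}: dG_transfer = {dg:+.1f} kcal/mol, K = {partition_coefficient(dg):.2g}")
```

Because K depends exponentially on ΔG, an error of about 1.4 kcal/mol at room temperature (RT ln 10) shifts the predicted K by a factor of ten. This is why small errors in each energy term matter.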

One of the fundamental assumptions in the model is that the molecule exists in just one conformation in both water and the membrane. This assumption is false, since in reality molecules are highly flexible creatures that interconvert between several conformations, both in water and inside the membrane. To overcome this limitation, a recent paper explicitly calculated the conformations of the molecule in water and included this factor in the diffusion predictions. This was certainly more realistic. To their surprise, however, the authors found that making the calculation more realistic made the predictions worse. The exact mix of factors responsible for this failure can be hard to tease apart. What's likely happening is that the more realistic factors also bring more noise and uncertainty with them. This uncertainty piles up, errors that were likely canceling before no longer cancel, and the whole prediction becomes fuzzier and less useful.
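Including conformational flexibility typically means replacing a single conformer's energy with a Boltzmann-weighted value over several conformers. This sketch assumes that standard treatment, G = -RT ln Σ exp(-G_i/RT). It is not necessarily the procedure the paper described above used, and the three conformer energies are made up.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def boltzmann_populations(energies, temp_k=298.15):
    """Fractional population of each conformer given its free energy (kcal/mol)."""
    rt = R_KCAL * temp_k
    g_min = min(energies)  # shift by the minimum to avoid overflow
    weights = [math.exp(-(g - g_min) / rt) for g in energies]
    total = sum(weights)
    return [w / total for w in weights]


def ensemble_free_energy(energies, temp_k=298.15):
    """Free energy of the whole ensemble: G = -RT ln sum_i exp(-G_i / RT)."""
    rt = R_KCAL * temp_k
    g_min = min(energies)
    total = sum(math.exp(-(g - g_min) / rt) for g in energies)
    return g_min - rt * math.log(total)


# Hypothetical free energies (kcal/mol) of three conformers in water.
water_conformers = [-12.0, -11.2, -10.5]

print(boltzmann_populations(water_conformers))
# The ensemble value lies slightly below the single lowest-energy conformer,
# which is all a one-conformation model would use.
print(ensemble_free_energy(water_conformers))
```

Every conformer energy in the list carries its own computational error. The ensemble result therefore inherits several error sources where the single-conformer model had only one.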

I believe that this is partly what is happening in climate models. Including more real-life factors in the models does not mean that all those factors are well understood; you are inevitably introducing some known unknowns. Ill-understood factors will introduce more uncertainty, and well-understood ones less. Ultimately the accuracy of the models will depend on the interplay between these two kinds of factors. Currently it seems that new factors are being added faster than they can be accurately calculated.
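The "errors pile up" and "errors stop canceling" arguments can be made concrete with elementary error propagation. This sketch assumes independent, normally distributed errors for each modeled factor, which is a simplification, and the per-factor uncertainties are hypothetical. Independent errors add in quadrature, so every poorly constrained factor widens the total spread. An error shared by two quantities, by contrast, drops out of their difference.

```python
import math
import random


def combined_uncertainty(sigmas):
    """Standard deviation of a sum of independent error terms (added in quadrature)."""
    return math.sqrt(sum(s * s for s in sigmas))


def spread_of_difference(shared_sigma, own_sigma, n=100_000, seed=0):
    """Monte Carlo estimate of the standard deviation of A - B, where A and B
    each carry a shared systematic error (shared_sigma) plus their own
    independent error (own_sigma)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n):
        shared = rng.gauss(0.0, shared_sigma)
        a = shared + rng.gauss(0.0, own_sigma)
        b = shared + rng.gauss(0.0, own_sigma)
        diffs.append(a - b)  # the shared error cancels here
    mean = sum(diffs) / n
    return math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))


# Hypothetical per-factor uncertainties (arbitrary units).
crude_model = [0.5, 0.5]
detailed_model = crude_model + [0.3, 0.3, 0.8]  # three new, poorly constrained factors

print(combined_uncertainty(crude_model))     # about 0.71
print(combined_uncertainty(detailed_model))  # about 1.15
print(spread_of_difference(shared_sigma=2.0, own_sigma=0.2))  # close to 0.28
```

In the simulation, the spread of A - B stays near √2 × own_sigma however large the shared error is. That cancellation is what a crude model can quietly rely on. It disappears once the quantities being compared no longer share the same systematic error.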

The article goes on to note that in spite of this growing uncertainty, the basic predictions of climate models are broadly consistent. However, it also acknowledges the difficulty of explaining the growing uncertainty to a public that has become more skeptical of climate change since 2007 (when the last IPCC report was published). As a chemical modeler I can sympathize with the climate modelers.

But the lesson to take away from this dilemma is that crude models sometimes work better than more realistic ones. Perhaps the climate modelers should remember George Box's quote that "all models are wrong, but some are useful". It is a worthy endeavor to try to make models more realistic, but it is even more important to make them useful.

Anticancer drugs form colloidal aggregates and lose activity

Over the last few years, one of the most interesting findings in preclinical drug screening and testing has been the observation that many drugs form colloidal aggregates under standard testing conditions. These aggregates nonspecifically inhibit target proteins that the drugs otherwise would not affect. They are large, a hundred nanometers or more in diameter, and they cause proteins to stick to them and partially unfold, creating the illusion of inhibition. This leads to false positives, especially in high-throughput screening protocols, and these false positives can be absolutely rampant.

What's striking is the sheer ubiquity of this phenomenon, which has been observed with all kinds of drugs under all kinds of conditions. The initial observations were limited to isolated protein-based assays. Since then, the phenomenon has also been seen in simulated gastric fluid and in the presence of many different proteins found in the body, such as serum albumin. The colloid spirit seems to emphatically favor a shotgun approach.

Now a team led by the brother-sister duo Brian and Molly Shoichet (UCSF and Toronto) has found something that should give drug testers further pause for thought. They see some bestselling anticancer drugs forming colloids (shown above) in cell-based assays to an extent that actually diminishes their activity, leading not to false positives but to false negatives. They test seven known anticancer drugs in cell assays, both under known colloid-forming conditions and under conditions that break the colloids up. This is not as easy as it sounds, since it involves adding a detergent, which would usually be too toxic to cells; fortunately, in this case they find the right one. Another interesting finding is the re-evaluation of a popular dye used to study "leaky" cancer blood vessels. Contrary to the previously proposed mechanism, the current study suggests that the dye too forms large aggregates and nonspecifically inhibits the protein serum albumin.
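Comparing an assay run with and without colloid-disrupting conditions can be written as a small decision rule. This is a hypothetical sketch, not the study's analysis; the function name, the percent-effect inputs and the 20-point threshold are all assumptions. The rule reads the result in two directions:

- A large loss of activity when colloids are broken up points to an aggregation artifact, the classic false positive in enzyme assays.
- A large gain points to the false negative described here.

```python
from enum import Enum


class ColloidEffect(Enum):
    POSSIBLE_FALSE_POSITIVE = "activity vanishes when colloids are disrupted"
    POSSIBLE_FALSE_NEGATIVE = "activity appears when colloids are disrupted"
    NO_CLEAR_EFFECT = "activity roughly unchanged"


def classify_colloid_effect(effect_plain, effect_disrupted, threshold=20.0):
    """Compare a percent effect (e.g. % inhibition or % cell kill) measured
    under colloid-forming conditions with the same readout under
    colloid-disrupting conditions (e.g. with added detergent).

    `threshold` (percentage points) is an illustrative cut-off, not a standard.
    """
    shift = effect_disrupted - effect_plain
    if shift <= -threshold:
        return ColloidEffect.POSSIBLE_FALSE_POSITIVE
    if shift >= threshold:
        return ColloidEffect.POSSIBLE_FALSE_NEGATIVE
    return ColloidEffect.NO_CLEAR_EFFECT


# Hypothetical readouts, not data from the study.
print(classify_colloid_effect(85.0, 10.0))  # enzyme assay: inhibition was a colloid artifact
print(classify_colloid_effect(5.0, 60.0))   # cell assay: colloids were hiding the drug
```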

The testing reveals that when the drugs form colloids, they show activity so low as to be negligible, equivalent to the controls. That's a self-(un)proclaimed false negative. Anybody who deals with error analysis knows that false negatives can be worse than false positives. A compound that looks inactive is usually discarded and never tested again, so the error is rarely detected. The present study raises a pertinent question: how many promising drugs might we be missing because they form aggregates that lower the observed response in cells? And since colloid formation has been shown to be so ubiquitous, could it be influencing the mechanism of action of all kinds of drugs inside the body, and in what ways? It's a fascinating question, and one of those that continue to make basic research in drug discovery so interesting.
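One simple way to picture how colloid formation can suppress cellular activity is a pseudo-phase model borrowed from micelle theory. Below a critical aggregation concentration (CAC) the compound stays monomeric. Above it, the free monomer concentration plateaus near the CAC and the excess goes into colloids. The sketch additionally assumes that only free monomer reaches the target. This is a deliberate simplification: real aggregation is more gradual, and the CAC value below is hypothetical. It still shows why dosing past the CAC adds colloid rather than free drug.

```python
def free_monomer_conc(total_um, cac_um):
    """Free (monomeric) concentration in uM under a pseudo-phase model.
    Everything is free below the critical aggregation concentration (CAC);
    above it the free concentration is capped at the CAC."""
    if total_um < 0 or cac_um <= 0:
        raise ValueError("total must be >= 0 and CAC must be > 0")
    return min(total_um, cac_um)


def fraction_in_colloid(total_um, cac_um):
    """Fraction of the dosed compound sequestered in colloids."""
    if total_um == 0:
        return 0.0
    return 1.0 - free_monomer_conc(total_um, cac_um) / total_um


CAC_UM = 5.0  # hypothetical critical aggregation concentration (uM)

for dose in (1.0, 5.0, 50.0, 500.0):
    free = free_monomer_conc(dose, CAC_UM)
    print(f"dose {dose:6.1f} uM -> free {free:4.1f} uM, "
          f"{fraction_in_colloid(dose, CAC_UM):.0%} in colloids")
```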
Image source and credit: ACS

P-glycoprotein: The vacuum cleaner that makes Sir James weep

There has been a lot of discussion during the last decade about the continuing attrition in the pharmaceutical industry and the absence of novel drugs. Several factors, including layoffs, narrow-minded management practices and outsourcing, have been held responsible for this trend, which only promises to worsen in the near future. All eminently sensible points. But one thing should be clear: drug discovery remains hard because we still just don't understand a lot of the basic science very well. This should always be on the mind of anyone who wants to hold non-scientific factors responsible for drug failures. The fact is that drug discoverers still have to surmount very basic challenges. And by scientific challenges I am not talking about cutting-edge, futuristic, overhyped strategies like gene therapy and nanotechnology that haven't yet borne fruit. I am talking about fundamental challenges, problems that have been recognized for years and yet remain unsolved.

A review in this week's issue of the Journal of Medicinal Chemistry has an account of something that greatly contributes to one of these challenges: getting compounds into cells. It's a basic problem in developing any drug. You may have a molecule that looks miraculous in the test tube but utterly fails once you put it into a living organism. Many factors can contribute to this lack of translation, but one of the most basic reasons is simply that the compound is not getting inside the cell. Recall that the cell membrane is expressly designed to keep things out, which is good for evolution but bad for drug designers. The membrane is composed of phospholipids with all kinds of proteins and other biomolecules embedded within it. Drugs can get across this membrane by simple passive diffusion, although in some cases they may be shuttled across by special transporter proteins. In general, any foreign substance has to be hydrophobic enough to get past this membrane. Beyond that, it has to satisfy some simple criteria: it can't be too big or too highly charged, for instance. And it can't be too hydrophobic, or else it won't dissolve in the aqueous medium surrounding the membrane in the first place.
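The "not too big, not too polar, not too hydrophobic" criteria are often summarized by Lipinski's rule of five. The post does not mention it, but it captures the same intuition:

- Molecular weight is at most 500.
- Calculated logP is at most 5.
- The molecule has no more than 5 hydrogen-bond donors.
- The molecule has no more than 10 hydrogen-bond acceptors.

More than one violation flags likely poor absorption or permeation. The sketch assumes the descriptors are computed elsewhere, and the class and function names are hypothetical. The rule addresses polarity only through hydrogen-bond counts, not formal charge, and it sets no lower limit on hydrophobicity.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors. Computing them from a structure
    needs a cheminformatics toolkit and is outside this sketch."""
    mol_weight: float  # Da
    clogp: float       # calculated octanol/water logP
    h_donors: int      # count of OH + NH groups
    h_acceptors: int   # count of N + O atoms


def rule_of_five_violations(d: Descriptors) -> list[str]:
    """List which of Lipinski's four criteria a molecule breaks."""
    violations = []
    if d.mol_weight > 500:
        violations.append("molecular weight > 500")
    if d.clogp > 5:
        violations.append("cLogP > 5 (too hydrophobic)")
    if d.h_donors > 5:
        violations.append("more than 5 H-bond donors")
    if d.h_acceptors > 10:
        violations.append("more than 10 H-bond acceptors")
    return violations


def likely_poor_permeability(d: Descriptors) -> bool:
    """Lipinski's heuristic: more than one violation flags a likely problem."""
    return len(rule_of_five_violations(d)) > 1


# Hypothetical molecules, for illustration only.
small_lead = Descriptors(mol_weight=350.0, clogp=2.5, h_donors=2, h_acceptors=5)
big_greasy = Descriptors(mol_weight=720.0, clogp=6.1, h_donors=4, h_acceptors=12)
print(rule_of_five_violations(small_lead), likely_poor_permeability(small_lead))
print(rule_of_five_violations(big_greasy), likely_poor_permeability(big_greasy))
```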

But hydrophobicity is only where your troubles begin. Cells have an assortment of watchdog proteins whose purpose is to keep out unwanted substances. In the modern world, "unwanted substances" includes pretty much all drugs. The J. Med. Chem. review focuses on one of these watchdogs, very likely their king, which plagues drug designers all the time: the P-glycoprotein efflux multidrug transporter (Pgp). The fancy name only hides the fact that it's essentially a simple pump embedded within the membrane, designed to throw drugs out. It's the ultimate bouncer; even drugs that have the right mix of hydrophobic and hydrophilic character quake and rapidly exit when they encounter Pgp. In fact, the protein was discovered when some cancers were found to be becoming resistant to certain drugs: these drugs were being pumped out, or "effluxed". Even worse, the presence of these drugs was increasing the expression of the protein. Later it was found that a wide variety of drugs bind to Pgp and increase its expression, reducing their own effective concentration inside the cell. It's still one of the principal mechanisms of resistance in some kinds of cancer.

Progress was hindered by not knowing the structure of the protein (part of which is illustrated above). The structure was only recently and partially solved by x-ray crystallography, and even now it isn't really helping. The protein's structure and interior are exquisitely hideous, to say the least. It has 12 transmembrane segments built from 1280 amino acids and a mammoth internal cavity of about 6000 Å³. Its mechanism of compound binding and extrusion is wondrously complex, and the protein undergoes a massive conformational change along the way. As it snakes its way through the lipid bilayer and wraps itself around drugs, the precision of this molecular machine would be wholly admirable if it were not for the immense heartburn it causes drug discoverers.

The constant extrusion of drugs by Pgp means that you may have to increase the dosage of your drugs (or saturate the protein with another drug) to maintain high blood levels, but that's just skimming the surface of the Pgp world of pain. Since its original discovery, the protein has turned into a minor nemesis for drug designers. It has become part of a notorious list of proteins called "antitargets" that can lead to side effects and lack of efficacy (we encountered one of these antitargets before, the hERG channel protein). And that's not only because Pgp is abundantly expressed in the intestine and liver, where most drugs are absorbed and metabolized. Nor is it only because of its special role in the blood-brain barrier, which creates additional problems for CNS drugs. It's because when it comes to Pgp, scientists often don't have a clue about how to solve the problem. Usually when you encounter an unwanted protein that binds to your drugs, you try to modify your drug to block this binding. In many cases, structure-activity relationships (SAR) can help you pin down some trends: you remove a basic nitrogen atom here, you get rid of a double bond there, you add a fluorine to that ring. If you know what kinds of molecular features a rogue antitarget protein likes, you can avoid those features in your drug.

But not so for Pgp. Pgp is, in the words of the review's author, a "hydrophobic vacuum cleaner", and one that would put Sir James Dyson to shame. What kinds of molecules does it like as substrates? Here's a description from Kerns and Di's book "Drug-like Properties":

"The substrate specificity for Pgp is very broad. Compounds ranging from a molecular weight of 250 to 1850 are known to be transported by Pgp. Substrates may be aromatic, non-aromatic, linear or circular. They can be basic, acidic, zwitterionic or uncharged. Some substrates are hydrophobic, others are hydrophilic and yet others are amphipathic."

The authors could have saved themselves all those words by simply saying something like "Pgp binds to and extrudes everything in the universe except possibly the human soul". As should be obvious, this kitchen-sink description of every molecule of every kind is not exactly a guide for drug designers to rationally add modifications that would prevent Pgp binding. I was myself part of a project where the whole "rational" drug design process was going extremely well, with well-defined changes in structure contributing to improved potency. Then we found that the compounds were being generously ejected by Pgp. At this point our gung-ho approach screeched to a halt, and we found ourselves transported from the sunlight of rational design into the night of Pgp-mediated chaos. Where before we had been confidently stepping across a brightly lit landscape, we now found ourselves groping around in the dark. It was like falling down an abyss. There was no rational modification to our existing molecules that would ensure a Pgp-free existence. From then on it was largely about gut feelings, intuition and Hail Mary passes.

In practice, Pgp binding is sometimes considered so painfully complex to circumvent that the best strategy may be to wave a wand and temporarily forget about it. Counterintuitive as this seems, it means that the best way to deal with Pgp is often not to prevent binding at all. Instead, you increase the passive diffusion of your compounds so much that it swamps any Pgp-mediated efflux. Basically you keep bumping up the magnitude of one process until it overwhelms the opposing one.
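The "swamp it with passive diffusion" strategy can be illustrated with a toy flux balance. This is an assumption-laden sketch, not a model from the review. Passive influx is taken as proportional to concentration, and Pgp efflux follows saturable Michaelis-Menten kinetics with hypothetical Vmax and Km values in arbitrary units.

```python
def pgp_efflux(conc, vmax, km):
    """Saturable (Michaelis-Menten) efflux rate, arbitrary units."""
    return vmax * conc / (km + conc)


def net_uptake(conc, passive_perm, vmax, km):
    """Net inward flux: passive diffusion minus Pgp efflux."""
    return passive_perm * conc - pgp_efflux(conc, vmax, km)


def efflux_share(conc, passive_perm, vmax, km):
    """Efflux as a fraction of the passive influx it opposes (>1 means net loss)."""
    return pgp_efflux(conc, vmax, km) / (passive_perm * conc)


VMAX, KM = 10.0, 5.0  # hypothetical transporter parameters

for perm in (1.0, 10.0, 100.0):
    print(f"passive permeability {perm:5.1f}: "
          f"efflux/influx = {efflux_share(1.0, perm, VMAX, KM):.3f}, "
          f"net = {net_uptake(1.0, perm, VMAX, KM):+.2f}")
```

With these made-up parameters, efflux outweighs influx at the lowest permeability. Its share then drops tenfold with each tenfold increase in passive permeability. Raising the concentration well above Km has a similar effect, which is the saturation argument for higher dosing mentioned earlier.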

The present review in J. Med. Chem. provides some respite from this depressing existence. The author describes several case studies where strategies helped to design out Pgp binding. These included tying up hydrogen bond donors or getting rid of them, reducing basicity and reducing polar surface area. These are valuable examples, but they are anecdotal nonetheless and may not work for other molecules with similar functionalities. Well-defined rational approaches to avoiding Pgp binding are still lacking. The complex mechanism of Pgp-drug binding also precludes designing specific Pgp inhibitors even when the structure is known. Reviews like the present one provide useful guidelines, but for the foreseeable future at least, Pgp will stand in splendor as one among a handful of scientific challenges that continue to make drug discovery just so damn difficult.
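Reducing basicity may also help for a reason unrelated to Pgp recognition. A weaker base is less protonated at physiological pH, which leaves more of the neutral form that crosses membranes by passive diffusion. This Henderson-Hasselbalch sketch for a monoprotic base shows the size of that effect. The pKa values are hypothetical, and the review's case studies may explain their results differently.

```python
def fraction_neutral_base(pka, ph=7.4):
    """Henderson-Hasselbalch: fraction of a monoprotic base that is uncharged."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


# Hypothetical pKa values spanning a strongly basic amine to a weak base.
for pka in (10.0, 8.5, 7.4, 6.0):
    print(f"pKa {pka:4.1f}: {fraction_neutral_base(pka):.3f} neutral at pH 7.4")
```

Each unit drop in pKa above the pH raises the neutral fraction roughly tenfold. This is why trimming a basic nitrogen's pKa can have an outsized effect on passive permeability.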
Image source: Wikipedia

"Arsenic bacteria": If you hadn't nailed 'im to the perch 'e'd be pushing up the daisies

Rosie Redfield (who blogs on this network) has just published an official, careful and decisive rebuttal of the claims at the heart of the "arsenic bacteria" fiasco, in collaboration with a group at Princeton. The paper, which will appear in Science, is under embargo for now, but a copy is available at that bastion of free publication, arXiv. Readers may remember Redfield as the scientist who offered the most meticulous preliminary criticism of the original paper by Felisa Wolfe-Simon and others. At the time, Wolfe-Simon and the rest of the arsenic group refused to engage in debate with Redfield and other critics. They cited the "non-official" nature of the criticism and asked for publication in a more formal venue. Looks like they finally got their wish.

The abstract could not be clearer:

"A strain of Halomonas bacteria, GFAJ-1, has been reported to be able to use arsenate as a nutrient when phosphate is limiting, and to specifically incorporate arsenic into its DNA in place of phosphorus. However, we have found that arsenate does not contribute to growth of GFAJ-1 when phosphate is limiting and that DNA purified from cells grown with limiting phosphate and abundant arsenate does not exhibit the spontaneous hydrolysis expected of arsenate ester bonds. Furthermore, mass spectrometry showed that this DNA contains only trace amounts of free arsenate and no detectable covalently bound arsenate."

It's a fairly short paper, but many of its observations directly contradict the earlier results. The strain of bacteria that was claimed to grow when arsenate was added to a phosphate-limited medium was found not to grow at all. In fact it did not budge even when some phosphate was added, growing only after the addition of other nutrients. Trace element analysis by mass spectrometry detected no covalently bound arsenate in the DNA. This is about as definitive an argument as can be published that the claims about the bacteria using arsenic instead of phosphorus in their essential biomolecules were simply incorrect. Much credit goes to Redfield, who patiently and probingly pursued the counterargument, undoubtedly at the expense of other research in her lab. In addition, she did open science a great service by describing all the ongoing research on her blog. She sets a standard for how science should be done, and we should hope to see more of this in the future.

Sociologically, the episode is a treasure trove of lessons on how science should not be done. It checks off some standard "don'ts" in the practice of science:

- Don't fall prey to wishful thinking and confirmation bias that tell you exactly what you have wanted to hear for years.
- Don't carry out science by press conference and then refuse to engage in debate in public venues.
- And of course, don't fail to provide extraordinary evidence when making extraordinary claims.

If the original paper had been published cautiously and without hullabaloo, it would have become part of the standard scientific tradition of argument and counterargument. As it turned out, the publicity accompanying the paper made it a prime candidate for demolition by blogs and websites. If nothing else, it provided a taste of how careful one needs to be in this age of instant online dissemination. There are also some "dos" that deserve to be mentioned. The researchers did later reply to criticism. In a gesture of cooperation, they also made their bacterial strains available to everyone who wanted to study them. But their earlier behavior left a bad taste in everyone's mouth and detracted from these later acts.

When the original paper came out, many of us were left gaping, wide-eyed, at visions of DNA, ATP, phosphorylated proteins and lipids swirling around in a soup of arsenic. They were carrying out the exact same crucial biological processes as before without skipping a beat. We just had a gut feeling that this couldn't be quite right, mainly because of the sheer magnitude of the biochemical gymnastics an organism would have to undergo in order to retool for this drastically different environment. Gut feelings are often wrong in science, but in this case it seems they made perfect sense.

What next? As often happens in science, I suspect that the defenders of the original paper will not capitulate outright but will fight a rearguard action until the whole episode drops off everyone's radar. But this paper clinches the case for normal biochemistry as well as anything could. Good old phosphorus is still one of life's essential elements, and arsenic is not.