Field of Science

Molecular modeling and physics: A tale of two disciplines

The LHC is a product of both time and multiple disciplines

In my professional field of molecular modeling and drug discovery I often feel like an explorer who has arrived on the shores of a new continent with a very sketchy map in his pocket. There are untold wonders to be seen on the continent and the map certainly points to a productive direction in which to proceed, but the explorer can't really stake a claim to the bounty which he knows exists at the bottom of the cave. He knows it is there and he can even see occasional glimpses of it but he cannot hold all of it in his hand, smell it, have his patron duke lock it up in his heavily guarded coffers. That is roughly what I feel when I am trying to simulate the behavior of drug molecules and proteins.

It is not uncommon to hear experimentalists from other disciplines and even modelers themselves grumbling about the unsatisfactory state of the discipline, and with good reason. Nor are the reasons entirely new: The techniques are based on an incomplete understanding of the behavior of complex biological systems at the molecular level. The techniques are parametrized based on a limited training set and are therefore not generally applicable. The techniques do a much better job of explaining than predicting (a valid point, although it's easy to forget that explanation is as important in science as prediction).

To most of these critiques my fellow modelers and I plead guilty; and nothing advances a field like informed criticism. But I also have a few responses to the critiques, foremost among which is one that is often under-appreciated: On the scale of scientific revolutions, computational chemistry and molecular modeling are nascent fields, only just emerging from the cocoon of understanding. Or, to be pithier, give it some more time. This may seem like a trivial point but it's an important one and worth contemplating. Turning a scientific discipline from an unpolished, rough gem-in-the-making into the Kohinoor diamond takes time. To drive this point home I want to compare the state of molecular modeling - a fledgling science - with physics - perhaps the most mature science. Today physics has staked its claim as the most accurate and advanced science that we know. It has mapped everything from the most majestic reaches of the universe at its largest scale to the production of virtual particles inside the atom at the smallest scale. The accuracy of both calculations and experiments in physics can beggar belief; on one hand we can calculate the magnetic moment of the electron to about a dozen decimal places using quantum electrodynamics (QED), and on the other hand we can measure the same parameter to the same degree of accuracy using ultrasensitive equipment.

But consider how long it took us to get there. Modern physics as a formal discipline could be assumed to have started with Isaac Newton in the mid 17th century. Newton was born in 1642. QED came of age in about 1952 or roughly 300 years later. So it took about 300 years for physics to go from the development of its basic mathematical machinery to divining the magnetic moment of the electron from first principles to a staggering level of accuracy. That's a long time to mature. Contrast this with computational chemistry, a discipline that spun off from the tree of quantum mechanics after World War 2. The application of the discipline to complex molecular entities like drugs and materials is even more recent, taking off in the 1980s. That's thirty years ago. 30 years vs 300 years, and no wonder physics is so highly developed while molecular modeling is still learning how to walk. It would be like criticizing physics in 1700 for not being able to launch a rocket to the moon. A more direct comparison of modeling is with the discipline of synthetic chemistry - a mainstay of drug discovery - that is now capable of making almost any molecule on demand. Synthetic chemistry roughly began in about 1828 when German chemist Friedrich Wöhler first synthesized urea from simple inorganic compounds. That's a period of almost two hundred years for synthetic chemistry to mature.

But it's not just the time required for a discipline to mature; it's also the development of all the auxiliary sciences that play a crucial role in the evolution of a discipline that makes its culmination possible. Consider again the mature state of physics in, say, the 1950s. Before it could get to that stage, physics needed critical input from other disciplines, including engineering, electronics and chemistry. Where would physics have been without cloud chambers and Geiger counters, without cyclotrons and lasers, without high-quality ceramics and polymers? The point is that no science is an island, and the maturation of one particular field requires the maturation of a host of others. The same goes for the significant developments in mathematics - multivariate calculus, the theory of Lie groups, topology - that made progress in modern physics possible. Similarly synthetic chemistry would not have been possible had NMR spectroscopy and x-ray diffraction not provided the means to determine the structure of molecules.

Molecular modeling is similarly constrained by input from other sciences. Simulation really took off in the 80s and 90s with the rapid advances in computer software and hardware; before this period chemists and physicists had to come up with clever theoretical algorithms to calculate the properties of molecules simply because they did not have access to the proper computational firepower. Now consider what other disciplines modeling depends on - most notably chemistry. Without chemists being able to rapidly make molecules and provide both robust databases and predictive experiments, it would be impossible for modelers to validate their models. Modeling has also received a tremendous boost from the explosion of crystal structures of proteins engendered by genomics, molecular biology, synchrotron sources and computer software for data processing. The evolution of databases, data mining methods and the whole infrastructure of informatics has also fed into the growth of modeling. One can even say without exaggeration that molecular modeling is ultimately a product of our ability to manipulate elemental silicon and produce it in an ultrapure form.

Thus, just like physics was dependent on mathematics, chemistry and engineering, modeling has been crucially dependent on biology, chemistry and computer science and technology. And in turn, compared to physics, these disciplines are relatively new too. Biology especially is still just taking off, and even now it cannot easily supply the kind of data which would be useful for building a robust model. Computer technology is very efficient, but still not efficient enough to really do quantum mechanical calculations on complex molecules in a high-throughput manner (I am still waiting for that quantum computer). And of course, we still don't quite understand all the forces and factors that govern the binding of molecules to each other, and we don't quite understand how to capture these factors in sanitized and user-friendly computer algorithms and graphical interfaces. It's a bit like physics having to progress without having access to high-voltage sources, lasers, group theory and a proper understanding of the structure of the atomic nucleus.

Thus, thirty years is simply not enough for a field to claim a very significant degree of success. In fact, considering how new the field is and how many unknowns it is still dealing with, I would say that the field of molecular modeling is actually doing quite well. The fact that computer-aided molecular design was hyped during its inception does not make it any less useful, and it's silly to think so. In the past twenty years we have at least had a good handle on the major challenges that we face and we have a reasonably good idea of how to proceed. In major and minor ways modeling continues to make useful contributions to the very complicated and unpredictable science and art of drug design and discovery. For a field that's thirty years old I would say we aren't doing so bad. And considering the history of science and technology as well as the success of human ingenuity in so many forms, I would say that the future is undoubtedly bright for molecular simulation and modeling. It's a conviction that is as realistic as any other in science, and it's one of the things that helps me get out of bed every morning. In science fortune always favors the patient, and modeling and simulation will be no different.

Crystallography and chemistry: The culture issue

Image: Charles Reynolds and ACS Med Chem Letters
As the old saying goes, beware of crystallographers bearing ligands. Charles Reynolds, a well-known structure-based drug design expert, has an editorial in ACS Medicinal Chemistry Letters touching on an issue that lies at the confluence of crystallography, medicinal chemistry and modeling: flaws in protein-ligand co-crystal structures. It's a problem with major ramifications for drug design, especially since it sits at the apex of the process and has the power to influence all subsequent steps. It's also an issue that has come up many times before, but like many deep-seated issues it has not quite disappeared from the palette of the structure-based design scientist.

In 2003 Davis, Teague and Gerard Kleywegt (who is incidentally also one of the wittiest conference speakers I have come across) wrote an article pointing out one simple observation: in several PDB structures of proteins co-crystallized with small-molecule druglike ligands, the protein seems to be well resolved and assigned, but the small molecule is often strained, with unrealistic bond lengths, non-planar aromatic rings, non-planar amide bonds, rings in boat or pseudo-chair conformations and clashes between protein and ligand atoms. Now the protein can also be misassigned, and so can water molecules, but it turns out that the problem looms much larger for ligands.

Reynolds's editorial takes a fresh, 2014 look at this 2003 problem. And it seems that while some people have actually become more cognizant of issues in crystal structures, things aren't exactly rosy at this point in time. He points to a 2009 study which found that the ligand geometries in 75% of the structures in its data set could be improved by using better restraints.

The first and foremost pitfall that non-specialists fall into when taking a crystal structure at face value is to assume that whatever they see on that fancy computer screen is...real. The fact though is that, barring a structure solved to better than 1 Å resolution (when was the last time you saw that?), every crystal structure is a model (and while we are on the topic, Morpheus's definition of "real" may also be somewhat relevant here). The raw data are the spots that you see in the x-ray diffraction pattern; everything after that, including the pretty picture that you visualize in PyMOL, comes from a series of steps undertaken by the crystallographer that involve intuition, parameter fitting, expert judgement and the divining of complete information from incomplete data. That's potentially a lot of guesswork and approximation, and so it shouldn't be surprising that it often leads to flaws in the results.

So is this problem primarily a technology issue? Not really. Reynolds points out several programs that can now fit ligands to the electron density better and get rid of strain and artifacts; Schrödinger's PrimeX and OpenEye's AFITT are just two prominent examples. Nor is it complicated to find out in the first place whether a ligand might be strained; any scientist who has access to a good molecular mechanics energy minimization program can take the ligand structure out of the protein, minimize it to the nearest local minimum, look at the energy difference (usually > 5 kcal/mol for a strained ligand), visualize steric clashes between atoms and reach a reasonable conclusion regarding the feasibility of that particular ligand conformation.
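For concreteness, here is a minimal sketch of that strain-energy sanity check using the freely available RDKit toolkit (any decent molecular mechanics package would do). It assumes the ligand has already been extracted from the crystal structure into an SD file with hydrogens present; the file name and the 5 kcal/mol threshold are illustrative placeholders, not a validated protocol.

from rdkit import Chem
from rdkit.Chem import AllChem

# Ligand as deposited in the crystal structure, with explicit hydrogens
mol = Chem.MolFromMolFile("ligand_from_xtal.sdf", removeHs=False)

props = AllChem.MMFFGetMoleculeProperties(mol)
ff = AllChem.MMFFGetMoleculeForceField(mol, props)

e_deposited = ff.CalcEnergy()   # MMFF energy of the deposited conformation (kcal/mol)
ff.Minimize()                   # relax to the nearest local minimum
e_relaxed = ff.CalcEnergy()

strain = e_deposited - e_relaxed
print("Approximate strain energy: %.1f kcal/mol" % strain)
if strain > 5.0:
    print("Conformation looks strained; worth revisiting the density fit.")

The point is not the exact number - force-field strain energies are themselves approximate - but that a check this cheap should be routine before a deposited ligand conformation is taken at face value.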

The abundance of methods for both figuring out strained ligand conformations and refining them seems to point to something other than technology as the operative factor in the misinterpretation of crystal structures. I believe the problem, in significant part, is culture. Reynolds alludes to this when he says that "Crystallographers are not chemists". When you are a crystallographer in hot pursuit of a protein structure, you are rightly going to experience a moment of ecstasy when that huge hulking hunk of sheets and strands finally appears on your screen. But most crystallographers don't care as much about that little blob in the binding site - a small molecule that's often crystallized as much to stabilize the protein as to aid drug discovery - as they do about their beloved protein. In addition, many crystallographers don't have the knee-jerk, intuitive reaction to, say, rings in boat conformations that a good medicinal chemist or a medicinal chemistry-aware modeler would have.

The unfortunate consequence of all this is that the ligand often just comes along for the ride, and the protein's gory structural details are exquisitely teased apart at the expense of the ligand's. Protein love all too often translates into ligand hate. For an organic chemist a cyclohexane boat may be a textbook violation of conformational preferences, but for a crystallographer it's a big, hydrophobic group filling up a big, fuzzy halo of electron density. Crystallographers are not chemists.

However, an honest assessment of the problem would not unfairly pin the blame for bad ligand structures on crystallographers alone. The fact is that structure-based drug design is an intimate covenant between crystallographers, medicinal chemists and modelers and true appreciation and progress can only come from each side speaking or at least understanding the other's language. To this end, chemists and modelers need to be aware of crystallographic parameters and need to ask the right questions to the crystallographer, beginning with a simple question about the resolution (even this question is rarer than you may think). A medicinal chemist or modeler who simply plucks the provided structure out of the PDB file and starts using it to design drugs is as guilty as a chemistry-challenged crystallographer.

A typical set of questions a modeler or medicinal chemist might ask the crystallographer is: 

- What's the resolution?
- What are the R-factors and the B-factors?
- Do you have equal confidence in all parts of the structure? Which parts are more uncertain?
- Are the amides non-planar? 
- Where are the water molecules located? How much confidence do you have in their placement?
- Are any atoms that are supposed to be planar actually non-planar?
- Are there any gauche or eclipsed interactions? 
- Are any rings in boat conformations?
- Have you looked at the strain energy of the ligand?
- How did you refine the ligand?

These questions are not meant to be posed to the crystallographer by men in dark suits in a dimly lit room with bars on the windows, but rather are supposed to provide a reality check on the fidelity of the structure and its potential utility in drug design for all three arms of the SBDD process. The questions are part of a process that allows all three departments to confer and reach an agreement; anyone can and should ask them. They are meant to bring hands together, not to point fingers.

One of the cultural problems in drug discovery is still the reluctance of one group of scientists to adopt at least parts of the cultural behavior of other groups. Organic chemists are quick to look at stereochemistry or unstable functional groups, modelers are not. Modelers are much more prone to look at conformation, organic chemists are not. Crystallographers are far more likely to bear multiple conformations of loops and flexible protein side chains in their minds, the other two parties are not.

The best way to fill these gaps is for each group to speak the language of the others, but until then the practical solution is to have all of them look at the evidence together and point out what each thinks is the most important part. For that to happen, though, each party has to make as many details of its own domain accessible to the others, and that is partly what Reynolds's editorial is asking us to do.

Update: As usual, the Yoda of chemistry blogging got there first.

Phil Baran is a man of style. We, of course, have known that for a while.

Thanks to @ChemicalBiology I just came to know about a piece of news that may simultaneously help resurrect chemistry's moribund public image and disintegrate multiple damsel hearts as efficiently as heterolytic bond fission:

Phil Baran may be a seriously hotshot scientist—the recent winner of a MacArthur Fellowship, in fact—but he’s also a bit of a wise guy. The proof? The pinky ring he wears even while working in his synthetic chemistry lab at La Jolla’s internationally acclaimed Scripps Research Institute. “I really love The Sopranos,” says Baran, whose wit is nearly as impressive as his CV. The affable scientist earned his Ph.D. at 24 and trained with a Nobel laureate at Harvard. At SRI, Baran’s team finds cutting-edge and cost-effective ways to produce important pharmaceutical components. And, yes, he’s a Breaking Bad fan. “But instead of making meth, we make something useful,” laughs the resoundingly modest brainiac. His first reaction when he heard about the so-called Genius grant? “It was a vote of confidence for all the people I’ve worked with,” says Baran, who credits regular workouts and twice-weekly boxing sessions for his fit body and mind. “The blood flows to your brain, and you do better science.” Also stimulating: trips to the zoo with his two kids. “They see wonder in everything!” At home in Carmel Valley, his Spanish-born wife—a chemist who’s expecting the couple’s third child this summer—often gives him fashion tips. “She’s converted me into a human,” Baran says. “I’d be happy in a pink potato sack. But I do have style and artistry in my chemistry.”

I wish this were a triumph for the public image of chemistry and chemists, but really, I think it's just Phil Baran. Once again Phil has set up impossible standards for the rest of us.

Free Energy Perturbation (FEP) methods in drug discovery: Or, Waiting for Godot

For interested folks in the Boston area it's worth taking a look at this workshop on Free Energy Perturbation (FEP) methods in drug design at Vertex from May 19-21. The list of speakers and topics is quite impressive, and this is about as much of a state-of-the-art discussion on the topic as you can expect to find in the area.

If computational drug discovery were a series of plays, then FEP might well be the "Waiting for Godot" candidate among them. In fact I would say that FEP is a textbook case of an idea that, if it truly works, can truly transform the early stages of drug discovery. What medicinal chemist would not want to know the absolute free energy of binding of his molecules to a protein so that he can actually rank known and unknown compounds in order of priority? And what medicinal chemist would not want to know exactly what she should make next?

But that's what medicinal chemists have expected from modelers ever since modeling started to be applied realistically to drug discovery, and I think it's fair to say that it's a good thing they haven't held their breath. FEP methods have always looked very promising because they aim to be very rigorous, bringing the whole machinery of statistical mechanics to bear on a protein-ligand system. The basic goal is "simple": you calculate the free energy of the protein and the drug separately - in explicit water - and then you calculate the free energy of the bound complex. The difference is the free energy of binding. Problem solved.
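For the record, the statistical-mechanical machinery usually enters through the Zwanzig exponential-averaging formula (or a close relative such as thermodynamic integration or BAR) applied over a series of small alchemical steps, with relative binding free energies obtained from a thermodynamic cycle. This is the standard textbook formalism, not anything specific to the Vertex workshop:

\Delta G_{A \to B} = -k_{\mathrm{B}} T \, \ln \left\langle \exp\!\left( -\frac{U_B - U_A}{k_{\mathrm{B}} T} \right) \right\rangle_{A}

\Delta\Delta G_{\mathrm{bind}} = \Delta G_{\mathrm{bind}}(B) - \Delta G_{\mathrm{bind}}(A) = \Delta G^{\mathrm{complex}}_{A \to B} - \Delta G^{\mathrm{solvent}}_{A \to B}

Here U_A and U_B are the potential energies of the two end states and the average is taken over configurations sampled from state A. Everything that follows - force-field quality, sampling, convergence - is about how well that average can actually be estimated.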

Except, not really. Predicting relative free energies is still a major challenge, and predicting absolute free energies is asking for a lot. For decades the major obstacle to the application of these methods was considered to be the lack of sufficient computing power. But if you really thought that was the only obstacle, you were still a considerable way off. Even now there seems to be a belief that given enough computing power and simulation time we can accurately calculate the free energy of binding between a drug and a target. But that assumes the fundamental underlying methodology is accurate, which is a big assumption.

The "fundamental underlying methodology" in this case mainly refers to two factors: the quality of the force field which you use to calculate the energy of the various components and the sampling algorithm which you use to simulate their motions and exhaustively explore their conformations. The force fields can overemphasize electrostatic interactions and can neglect polarization, and the sampling algorithms can fail to overcome large energy barriers. Thus both these components are imperfectly known and applied in most cases, which means that no amount of simulation time or computing power is then going to be sufficient. It's a bit like the Polish army fighting the Wehrmacht in September 1939; simply having a very large number of horses or engaging them in the fight for enough time is not going to help you win against tanks and Stukas.
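To make the sampling half of that argument concrete, here is a toy illustration (emphatically not a production FEP protocol, and the numbers are arbitrary): a bare-bones Metropolis Monte Carlo walk on a one-dimensional double-well potential. With a barrier of ten kT, a walker started in the left well essentially never visits the right one on this timescale, so any average computed from the run is badly biased no matter how accurate the energy function itself is.

import math
import random

def energy(x, barrier=10.0):
    # Double well with minima at x = -1 and x = +1 and a barrier of height 'barrier' at x = 0
    return barrier * (x * x - 1.0) ** 2

def fraction_in_right_well(n_steps=200000, step=0.1, kT=1.0, barrier=10.0, seed=0):
    rng = random.Random(seed)
    x = -1.0                       # start in the left well
    right = 0
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)
        dU = energy(x_new, barrier) - energy(x, barrier)
        if dU <= 0 or rng.random() < math.exp(-dU / kT):
            x = x_new              # Metropolis acceptance
        if x > 0:
            right += 1
    return right / n_steps

# By symmetry the exact answer is 0.5; the simulation returns ~0 because the
# walker almost never crosses the barrier within the allotted steps.
print("Fraction of time in right well:", fraction_in_right_well())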

These problems have all been well recognized however; in fact the two most general issues in any modeling technique are sampling and energy calculation. So parts of this month's workshop are aimed exactly at dissecting the factors that can help us understand and improve sampling and scoring.

The end goal of any applied modeling technique of course is how good it is at prediction. Not surprisingly, progress on this front using FEP has been rather thin. In fact FEP is the quintessential example of a technique whose successes have been anecdotal. Even retrospective examples, while impressive, are not copious. One of the problems is that FEP works only when you are trying to predict the impact of very tiny changes in structure on ligand affinity; for instance the impact of changing a methyl group on a benzene ring to a hydroxyl group. The trouble is that the method doesn't work even for these minor changes across the board; there are projects where a CH3--->OH change will give you quantitative agreement with experiment and there are cases where it will result in error bars large enough to drive a car through them. 

But anecdotes, while not being data, are still quite valuable in telling us what may or may not work. Computing power may not solve all our problems but it has certainly given us the opportunity to examine a large number of cases and try to abstract general rules or best practices for drug discovery. We may not be able to claim consistent successes for FEP right now, but it would help quite a lot even if we know what kinds of systems it works best for. And that, to me, is as good an outcome as we could expect at this time.

The structure of DNA, 61 years later: How they did it.

"A Structure for Deoxyribose Nucleic Acid", Nature, April 25, 1953 (Image: Oregon State University)
This month marks the sixty-first anniversary of the publication of the landmark paper on the structure of DNA by Watson and Crick, which appeared in the April 25, 1953 issue of the journal Nature. Even sixty years later the discovery is endlessly intriguing, not just because it's so important but because in 1953, both Watson and Crick were rather unlikely characters to have made it. In 2012 I wrote a post for the Nobel Week Dialogue event in Stockholm with a few thoughts on what exactly it was that allowed the duo to enshrine themselves in the history books; it was not sheer brilliance, it was not exhaustive knowledge of a discipline, but it was an open mind and a relentless drive to put disparate pieces of the puzzle together. Here's that post.

Somehow it all boils down to 1953, the year of the double helix. And it’s still worth contemplating how it all happened.

Science is often perceived as either a series of dazzling insights or as a marathon. Much of the public recognition of science acknowledges this division; Nobel Prizes, for instance, are often awarded for a long, plodding project that is sustained by sheer grit (solving a protein crystal structure), for a novel idea that seems to be an inspired work of sheer genius (formulating the Dirac equation), or for an accumulated body of work (organic synthesis).

But in one sense, both these viewpoints of science are flawed, since both tend to obscure the often haphazard, unpredictable, chancy and very human process of research. In reality, the marathon runner, the inspired genius and every scientist in between tread a tortuous path to the eureka moment, a path strewn with blind alleys, plain old luck, unexpected obstacles and, most importantly, the very human factors of petty rivalry, jealousy, confusion and misunderstanding. A scientific story that fully captures these variables is, in my opinion, emblematic of the true nature of research and discovery. That is why the discovery of the double helix by Watson and Crick is one of my favorite stories in all of science.

The reason why that discovery is so appealing is because it really does not fit into the traditional threads of scientific progress highlighted above. During those few heady days in Cambridge in the dawn of those gloomy post-war years, Watson and Crick worked hard. But their work was very different from, say, the sustained effort akin to climbing a mountain that exemplified Max Perutz’s lifelong odyssey to solve the structure of hemoglobin. It was also different from the great flashes of intuition that characterized an Einstein or a Bohr, although intuition was applied to the problem – and discarded – liberally. Neither of the two protagonists was an expert in the one discipline that they themselves acknowledged mattered most for the discovery – chemistry. And although they had a rough idea of how to do it, neither really knew what it would take to solve the problem. They were far from being experts in the field.

And therein lies the key to their success. Because they lacked expertise and didn't really know what would solve the problem, they tried all approaches at their disposal. Their path to DNA was haphazard, often lacking direction, always uncertain. Crick, a man who already considered himself an overgrown graduate student in his thirties, was a crystallographer. Watson, a precocious and irreverent youngster who had entered the University of Chicago when he was fifteen, was in equal parts geneticist and bird-watcher. Unlike many of their colleagues, both were firmly convinced that DNA and not protein was the genetic material. But neither of them had the background for understanding the chemistry that is essential to DNA structure: the hydrogen bonding that holds the bases together, the acid-base chemistry that ionizes the phosphates and dictates their geometric arrangement, the principles of tautomerism that allow the bases to exist in one of two possible forms, only one of which is crucial for holding the structure together. But they were willing students, and they groped, asked, stumbled and finally triumphantly navigated their way out of this conceptual jungle. They did learn all the chemistry that mattered, and because of Crick they already understood crystallography.

And most importantly, they built models. Molecular models are now a mainstay of biochemical research. Modelers like myself can manipulate seductively attractive three-dimensional pictures of proteins and small molecules on computer screens. But modeling was in its early days in the fifties. Ironically, the tradition had been pioneered by the duo's perceived rival, the chemist Linus Pauling. Pauling, widely considered the greatest chemist of the twentieth century, had successfully applied his model-building approach to the structure of proteins. Lying in bed with a bad cold during a visiting sojourn at Oxford University, he had folded paper and marked atoms with a pencil to conform to the geometric parameters of amino acids derived from simple crystal structures. The end product of this modeling, combined with detailed crystallographic measurements, was one of twentieth-century biochemistry's greatest triumphs: the discovery of the alpha-helical and beta-sheet structures, foundational structural elements in virtually every protein in nature. How exactly the same model-building later led Pauling to an embarrassing gaffe in his own structure of DNA that violated basic chemical principles is the stuff of folklore, narrated with nonchalant satisfaction by Watson in his classic book “The Double Helix”.

Model building is more art than science. By necessity it consists of patching together imperfect data from multiple avenues and techniques, using part rational thinking and part inspired guesswork, and then building a picture of reality – only a picture – that's hopefully consistent with most of the data and not in flagrant violation of the important pieces. Even today modeling is often regarded skeptically by the data-gatherers, presumably because it does not have the ring of truth that hard, numerical data has. But data by itself is never enough, especially because the methods used to acquire it are themselves incomplete and subject to error. It is precisely by combining information from various sources that one hopes to cancel these errors or render them unimportant, so that the signal from one source complements its absence in another and vice versa. Building a satisfactory model thus often necessarily entails understanding data from multiple fields, each part of which is imperfect.

Watson and Crick realized this, but many of their contemporaries tackling the same problem did not. As Watson recounts it in a TED talk, Rosalind Franklin and Maurice Wilkins were excellent crystallographers but were hesitant to build models using imperfect data. Franklin especially came tantalizingly close to cracking DNA. On the other hand Erwin Chargaff and Jerry Donohue, both outstanding chemists, were less appreciative of crystallography and again not prone to building models. Watson and Crick were both willing to remedy their ignorance of chemistry and to bridge the river of data between the two disciplines of chemistry and crystallography. Through Donohue they learnt about the keto-enol tautomerism of the bases that gave rise to the preferred chemical form. From Chargaff came crucial information regarding the constancy of the ratios of one kind of base (purines) to another (pyrimidines); this information would be decisive in nailing down the complementary nature of the two strands of the helix. And through Rosalind Franklin they got access – in ways that even today spark controversy and resentment – to the best crystallographic data on DNA that then existed anywhere.

What was left to do was to combine these pieces from chemistry and crystallography and put together the grand puzzle. For this model building was essential; since Watson and Crick were willing to do whatever it took to solve the structure, to their list of things-to-do they added model building. Unlike Franklin and Wilkins, they had no qualms about building models even if it meant they got the answer partially right. The duo proceeded from a handful of key facts, each of which other people possessed, but none of which had been seen by the others as part of an integrated picture. Franklin especially had gleaned very important general features of the helix from her meticulous diffraction experiments and yet failed to build models, remaining skeptical about the very existence of helices until the end. It was the classic case of the blind men and the elephant.

The facts which led Watson and Crick down the road to the promised land included a scattered bundle of truths about DNA from crystallography and chemistry: the distance between two stacked bases (3.4 Å), the distance per turn of the helix (34 Å), which in turn indicated ten bases per turn, the diameter of the helix (20 Å), Chargaff's rules indicating equal ratios of the two kinds of bases, Alexander Todd's work on the points of linkage between the base, sugar and phosphate, Donohue's important advice regarding the preferred keto form of the bases and Franklin's evidence that the strands in DNA must run in opposite directions. There was another important tool they had, thanks to Crick's earlier mathematical work on diffraction. Helical diffraction theory told them the kind of diffraction pattern they could expect if the structure were in fact helical. This reverse process – predicting the expected diffraction parameters from a model – is today a mainstay of the iterative process of structure refinement used by x-ray crystallographers to solve structures as complex as the ribosome.
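For the curious, the Cochran-Crick-Vand result Crick had helped derive says, in its simplest continuous-helix form, that the diffracted amplitude is confined to layer lines and governed by Bessel functions (a standard textbook sketch, with P the helix pitch, r its radius, and (R, Z) cylindrical coordinates in reciprocal space):

F_n(R) \propto J_n(2\pi R r), \qquad \text{on layer lines } Z = \frac{n}{P}

Because the first maximum of J_n moves farther out as n increases, the layer-line intensities trace the X-shaped cross that made Franklin's famous B-form photograph so immediately diagnostic of a helix.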

Using pieces from the metal shop in Cambridge, Watson gradually accumulated a list of parts for the components of DNA and put them together even as Crick offered helpful advice. Once the pieces were in place, the duo were in the position of an airline pilot who has every signpost, flag and light on the runway paving his way for a perfect landing. The end-product was unambiguous, incisive, elegant, and most importantly, it held the key to understanding the mechanism of heredity through complementary base-pairing. Franklin and Wilkins came down from London; the model was so convincing that even Franklin graciously agreed that it had to be correct. Everyone who saw the model would undoubtedly have echoed Watson and Crick’s sentiment that “a structure this beautiful just had to exist”.

In some sense the discovery of the DNA structure was easy; as Max Perutz once said, the technical challenges that it presented were greatly mitigated because of the symmetry of the structure compared to the controlled but tortuous asymmetry inherent in proteins. Yet it was Watson and Crick and not others who made this discovery and their achievement provides insight into the elements of a unique scientific style. Intelligence they did not lack, but intelligence alone would not have helped, and in any case there was no dearth of it; Perutz, Franklin, Chargaff and Pauling were all brilliant scientists who in principle could have cracked open the secret of life which its discoverers proudly touted that day in the Eagle Pub. 

But what these people lacked, what Watson and Crick possessed in spades, was a drive to explore, interrogate, admit ignorance, search all possible sources and finally tie the threads together. This set of traits also made them outsiders in the field, non-chemists who were trying to understand a chemical puzzle; in one sense they appeared out of nowhere. But because they were outsiders they were relatively unprejudiced. Their personalities cast them as misfits and upstarts trying to disrupt the established order. Then there was the famous irreverence between them; Crick once said that politeness kills science. All these personal qualities certainly helped, but none was as important as a sprightly open-mindedness that was still tempered by unsparing rigor, the ability to ask for and use evidence from all quarters while constraining it within reasonable bounds all the time; this approach led to model building almost as a natural consequence. And the open-mindedness also masked a fearlessness that was undaunted by the imperfect nature of the data and the sometimes insurmountable challenges that seemed to loom.

So that’s how they did it: by questioning, probing, conjecturing and model building even in the presence of incomplete data, and by fearlessly using every tool and idea at their disposal. As we approach problems of increasing biological complexity in the twenty-first century, this is a lesson we should keep in mind. Sometimes when you don’t know what approach will solve a problem, you try all approaches, all the time constraining them within known scientific principles. Richard Feynman once defined scientific progress as imagination in a straitjacket, and he could have been talking about the double helix.

First published on Nobel Week Dialogue and the Scientific American Blog Network.

Drug costs and prices: Here we go again

Gilead's hepatitis C drug sofosbuvir (Sovaldi)
Misleading statements and conclusions regarding drug costs and prices are again being thrown around. It started with a post right here on Scientific American Blogs with the title "The Quest: $84,000 Miracle Cure Costs Less Than $150 to Make". As the title indicates, the post is about a new hepatitis C drug called Sovaldi developed by Gilead Sciences. The $150 was the manufacturing cost, the $84,000 was the price. The medicine is considered a real breakthrough and both Gilead and its shareholders have been rewarded with handsome profits during the last quarter. Sovaldi is also regarded as a first-in-class treatment for what was always considered a highly refractory disease.

To its credit the piece was measured and did make the point that the $84,000 was less than the hospitalization and liver transplant costs incurred by hep C patients until now. But unfortunately I think the title of the post was misleading, because it made it sound like every extra penny beyond the $150 was pure profit margin. It also did not put the price in the context of the very significant cost of developing a new drug, with its formidable scientific challenges, patent cliffs and FDA hurdles. The fact is that drug pricing is a very complex beast, and while the system is definitely in need of reform and rational thinking, comparing bare production costs with price is basically a futile endeavor that is far more likely to mislead than to enlighten.

Was I mistaken in thinking the title was misleading? Not really. A day later, a Facebook post by Scientific American highlighting the same post ran with the headline "Profit Margin for Hep C Drug: Approximately one gazillion percent". This headline is egregiously, woefully wrong: the quarterly profit margin for Gilead was 44% during the last quarter, but the standard profit margins for pharmaceutical companies are about 20%. The title is grossly simplistic and says nothing about the cost of discovering (not manufacturing) the drug. Scientific American has 1.7 million followers on Facebook so this title is going to mislead quite a few, as is evident from the comments section of that post.

The bigger question to ask is, why are the profit margins so apparently high for Sovaldi? And that's a question the post did not address. The fact is that this class of hepatitis C drugs constitutes a real breakthrough in the treatment of the disease, and one that has been sought for decades. Until now the standard of treatment for the disease consisted of a combination of PEGylated interferon and ribavirin, a stop-gap measure that results in debilitating flu-like symptoms acutely affecting quality of life. The new class of medications directly targets the viral replication machinery itself; in Sovaldi's case the target is the NS5B polymerase that the virus uses to copy its genome, while earlier entrants in the class inhibit a viral protease. It hits the source and is, by any definition, a vastly superior treatment compared to what we had before. For years the pharmaceutical industry has been rightly lambasted for making "me-too" drugs, medicines that are marginally better than what existed on the market and whose high sales and profits are driven mainly by aggressive marketing rather than by real benefits. In this case we actually have a novel drug that provides real benefits, a significant achievement on the part of drug research that needs to be appreciated. Let's criticize high costs, but let's not ignore the improvements in quality of life for patients that simply did not exist before. And let's not fail to congratulate the teams of researchers who actually discover these novel medicines floating in a sea of me-too drugs; it's as big a testament to human ingenuity and perseverance as anything else in scientific history.

Does the price of Sovaldi sound high? Undoubtedly. But that's when I invoke my own "Law of Large Numbers" which roughly says, "The kind of reaction you have when you see a large number seldom has much to do with what that number really means". As with many other numbers we have to put the price in context. As the original post notes, it's still lower than what the price of hospitalization and liver transplants would have been. If you have insurance then your insurance company - encouraged largely by this comparative calculation - should take care of it. Is it still not accessible to a vast number of people in poor countries? Of course; I would like that to happen as much as anything else, especially since hep C is still a disease that puts most of its burden on the poor. But it's at least accessible to a few people whose quality of life will now be much better and it's still better than nothing. That's an important step in the right direction. As happened with AIDS drugs, at some point the medication will in fact become cheap enough, especially when generics take its place a few years down the line. But none of this would have happened if pharmaceutical companies stopped making profits and had to shut down because they couldn't recover R&D costs.

The most important context we need to understand, again, is one that is often neglected: a comparison of the price with the high cost of drug discovery and development rather than with manufacturing, which is always going to be low. The bare fact of the matter is that it takes about $5 billion to develop a new drug, and this has little to do with profit margins. As a hypothetical example, even if the profit margin on Sovaldi were 0% it would still cost about $50,000, and you would still hear a lot of criticism. The cost of a drug is not $5 billion because there's so much profit to be made - as I indicated above, pharmaceutical profit margins are about 20% - but because drug discovery is so hard. One can quibble about the exact number, but that would only serve to obscure the real challenges. I have written four posts on the complexities of drug discovery science and I intend to write more. The reality is that Sovaldi would not have been invented in the first place had companies like Gilead not put enough money from profits into the R&D that went into its development. And sure, some pharmaceutical CEOs make obscene bonuses and we need to have important conversations about that, but even that part does not significantly contribute to the high costs.
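One plausible reading of that back-of-the-envelope hypothetical, using the 44% quarterly margin quoted earlier (my reconstruction, not the author's or Gilead's actual accounting):

\$84{,}000 \times (1 - 0.44) \approx \$47{,}000 \approx \$50{,}000

In other words, even with the profit stripped out entirely, the sticker price would still land in the tens of thousands of dollars.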

One thing I find amusing is that the same critics who talk about high drug costs also seem to have some kind of an exaggerated picture of drug company scientists as infallible oracles, discovering drugs with their left hand and then sitting back in their sleek sofas, cackling and waiting for the shower of green. But the truth could not be more different. As a drug discovery scientist I have very few moments when I actually think I am any closer to discovering even a new lead, let alone a new drug (and sadly no, I am also not rolling around in money). As I have quoted another scientist in one of my posts, the reason why drugs cost so much is not because we are greedy, it's because we are stupid. A lot of the times we feel the way Yamamoto must have felt in 1943, knowing that he was about to fight a war of attrition that he could not possibly win.

Drug discovery is one of the most wasteful research activities on the planet, and it's all because most of the time we have no idea how to go about discovering a new drug or what's going to happen when we put it inside the human body. The complexities of human biology thwart us at every stage and luck plays an inordinately large role in our success. Even basic issues in drug discovery - understanding how drugs get past cell membranes, for instance - are generally unsolved problems, and the profligate inefficiency of the process would truly shame us if we knew how to do it better. The path from a new idea in pharmaceutical research to an actual drug is akin to a path trodden by a blind man along the edge of a cliff at night; any survival, let alone success, seems like a miracle in retrospect. No drug scientist will admit it, but every drug scientist crosses his or her fingers when a new drug makes it to the market, because we are just not smart enough to figure out exactly what it will do in every single patient. That is why most drugs fail in advanced clinical trials, when hundreds of millions of dollars have already been spent on them.

The scientific challenges in drug discovery are a major reason why drugs are so expensive. Of course manufacturing costs are going to be low; once you have wasted so much money on R&D and finally hit on the right solution, you only have to make the actual substance (and I don't say "only" lightly here). Would we also decry the low manufacturing costs of integrated circuits and ignore the massive, extensive, often unsuccessful R&D that made it possible to beat Moore's Law? It's the same for drug discovery, except that in this case nobody has any idea how we would even start to approach Moore's Law, let alone beat it. So criticize drug costs and CEO bonuses and profits all you like - and continue to have a much needed debate about healthcare costs in this country - but keep in mind that those numbers are more likely to wildly obfuscate than educate if you take them at face value.

Y Combinator and biotech: The wave of the future?

Marcus Wohlsen's book "Biopunk" details the efforts of dedicated garage biotech enthusiasts. Startup incubators like Y-Combinator could help bring their efforts to fruition (Image: Steve Zazeski)
Y Combinator is the well-known startup incubator that picks promising computer technology startup ideas from a competition every year and seeds them with a few tens of thousands of dollars and dedicated space in Silicon Valley in return for an equity stake. It has been cited as one of the most successful startup accelerators around and has funded several successful companies like Scribd and Dropbox. I know something about how competitive and interesting their process is since my wife made it to the final round a few years ago.

Now they're doing something new: planning to fund biotech startups. At first glance this might seem like a bad idea, since biotech ventures are both riskier and much more expensive than IT startups; as the well-known chemistry blogger Chemjobber rightly pointed out to me on Twitter, the net ROI for biotech has been negative. But I for one find this to be a very promising idea, and I think of Y Combinator's move as a futuristic, put-money-in-the-bank kind of bet. As the president of the accelerator puts it:

"Six years ago, the starting costs for biotech firms would not work with our model: it took millions of dollars to get anywhere. But the landscape has changed. We’ve noticed, over the past year, more and more promising biotech firms asking about Y Combinator. I think there is a real trend for biotech start-ups looking more like software start-ups. In ten years it won’t even be unusual for a biotech firm to begin in the same way a software firm does, and we want to be on the leading edge of that trend."

And I think he's right. There have been two major developments in the last ten years that are very promising for the future of "garage biotech": The first development is the staggering decline in the cost of DNA sequencing which has now even surpassed Moore's Law, and the second development is the rise of DIY biotech, nicely imagined by one of my mentors Freeman Dyson a few years ago and described by Marcus Wohlsen in an entire book which documents the rise of amateur biotech enthusiasts around the world. Thanks to forward-thinking endeavors like Y Combinator, these enthusiasts might finally find an outlet for their creativity.

The fact is that both equipment and techniques for biotechnology have become massively cheap in the last few years, and it is no longer a pipe dream to imagine a biotech Steve Jobs or Bill Gates coming up with a revolutionary new biotechnology-based diagnostic method or therapeutic in their garage. We are entering an age where young people can actually play with genetic engineering, just like they played with software and hardware in the 1970s. The startup funds that Y Combinator plans to inject in its biotech ventures ($120,000) should be more than adequate for buying used equipment and kits off of eBay. For now they are also planning to focus on biotech research benefiting from software, which seems to be the right place to look for cheap(er) ideas. If I had to bet on specific ideas I would probably bet on diagnostic techniques since therapeutics are usually far more complicated and uncertain. I think they should also look at general tools which promise to make complex endeavors like drug discovery faster and easier; as biology has amply demonstrated, revolutions in science and technology are as much about tools as about ideas.

The legal framework as well as the ultimate market incarnation of a biotech diagnostic or therapeutic is of course much more complicated than for a new computer product, but that part kicks in after the idea has gotten off the ground. In fact endeavors such as the one Y Combinator is planning on funding may well accelerate not just the domestication of biotechnology but people's thinking about important legal and social matters related to its widespread use. To enable true garage biotech we need not just a scientific revolution (which is very much visible) but a social, legal and intellectual property revolution (which is not). Each of them can feed off the other.

But for me the most exciting aspect of the announcement is simply the interest it will hopefully fuel in young people with novel biotech ideas. The computer revolution would never have occurred without thousands of young people starting hobbyist groups and tinkering around with electronics and software in their kitchens and garages, away from their parents' prying eyes. Science and technology are very much Darwinian endeavors, so the more ideas that are available for inspection, rejection and growth, the better it will be for the future of a particular technological paradigm. As Linus Pauling said, "To have a good idea you must first have lots of ideas, then throw the bad ones away". Encouraging people to have lots of ideas is precisely what Y Combinator is doing, so I wish them all good luck.

"A Fred Sanger would not survive today's world of science."

Somehow I missed last year's obituary for double Nobel laureate and bench scientist extraordinaire Fred Sanger by Sydney Brenner in Science. The characteristically provocative Brenner has this to say about a (thankfully) fictional twenty-first century Sanger:
A Fred Sanger would not survive today's world of science. With continuous reporting and appraisals, some committee would note that he published little of import between insulin in 1952 and his first paper on RNA sequencing in 1967 with another long gap until DNA sequencing in 1977. He would be labeled as unproductive, and his modest personal support would be denied. We no longer have a culture that allows individuals to embark on long-term—and what would be considered today extremely risky—projects.
Depressing.

"Designing drugs without chemicals"


As usual Derek beat me to highlighting this rather alarming picture from the October 5, 1981 issue of Fortune magazine that I posted on Twitter yesterday. The image is from an article about computer-aided design, and it looks like both a major communications failure (chemical-free drugs, anyone?) and a massive dollop of hype about computer-aided drug design. In fact the article has been cited by drug designers themselves as an example of the famous hype curve, with 1981 representing the peak of inflated expectations.

It's intriguing to consider both how we are still wrestling with pretty much the same questions about computational drug design that we were in 1981 and how much progress we have made on various fronts since then. I posted an extended comment about both these aspects of the issue on Derek's blog, so I am just going to copy it below. Bottom line: many of the fundamental problems are still the same and are unsolved at the general level. However, there has been enough understanding and progress to expect solutions to a wide variety of specific problems in the near future. My own attitude is one of cautious optimism, which in drug discovery is usually the best you can have...

For anyone who wants the exact reference, it's the Fortune magazine issue from October 5, 1981. The article is widely considered to be both the moment when CADD came to the attention of the masses and a classic lesson in hype. The article itself is really odd, since most of it is about computer-aided design in the industrial, construction and aeronautical fields, where the tools have actually worked exceedingly well. The part about drug design was a throwaway, with almost no explanation in the text.

Another way to look at the issue is to consider a presentation by Peter Goodford in 1989 (cited in a highly readable perspective by John van Drie, J Comput Aided Mol Des (2007) 21:591–601) in which he laid out the major problems in molecular modeling - things like treating water, building homology models, calculating conformational changes, predicting solubility, predicting x-ray conformations and so on. What's interesting is that - aside from homology modeling and x-ray conformations - we are struggling with the exact same problems today as we were in the 80s.

That doesn't mean we haven't made any progress though. Far from it in fact. Even though many of these problems are still unsolved at a general level, the number of successful specific examples is on the rise, so at some point we should be able to derive a few general principles. In addition we have made a huge amount of progress in understanding the issues, dissecting the various operational factors and building up a solid database of results. Fields like homology modeling have actually seen very significant advances, although that's as much because of the rise of the PDB, enabled by crystallography, as because of accurate sequence comparison and threading algorithms. We are also now aware of the level of validation that our results need to have for everyone to take them seriously. Journals are implementing new standards for reproducibility and knowledge of the right statistical validation techniques is becoming more widespread; as Feynman warned us, hopefully this will stop us from fooling ourselves.

As you mention however, the disproportionate growth of hardware and processing power relative to our understanding of the basic physics of drug-protein interaction has led to an illusion of understanding and control. For instance it's quite true that no amount of simulation time and smart algorithms will help us if the underlying force fields are inaccurate and ill-tested. Thus you can beat every motion out of a protein until the cows come home and you still might not get accurate binding energies. That being said, we also have to realize that every method's success needs to be judged in terms of a particular context and scale. For instance an MD simulation on a GPCR might get some of the conformational details of specific residues wrong but may still help us rationalize large-scale motions that can be compared with experimental parameters. Some of the more unproductive criticism in the field has come from people who have the wrong expectations from a particular method to begin with.

Personally I am quite optimistic with the progress we have made. Computational drug design has actually followed the classic Gartner Hype curve, and it's only in the 2000s that we have reached that cherished plateau of realistic expectations. The hope is that at the very least this plateau will have a small but consistent positive slope.

Reproducibility in molecular modeling research

When I was a graduate student, a pet peeve shared by my advisor and me was how rarely one could find the 3D structural coordinates of molecules in papers. A typical paper on conformational analysis would claim to derive a solution conformation for a druglike molecule and present a picture of the conformation, but the lack of 3D coordinates or an SDF or PDB file would make detailed inspection of the structure impossible. Sadly this was part of a more general lack of machine-readable structural information in molecular modeling and structural chemistry articles.

Last year Pat Walters (Vertex) wrote an editorial in the Journal of Chemical Information and Modeling which I have wanted to highlight for a while. Walters laments the general lack of reproducibility in the molecular modeling literature and notes that while reproducibility is taken for granted in experimental science, there has been an unusual dearth of it in computational science. He also makes it clear that authors are running out of excuses for not making things like original source code and 3D coordinates available; the old pretext of a lack of standard hardware and software platforms no longer holds.
It is possible that the wide array of computer hardware platforms available 20 years ago would have made supporting a particular code base more difficult. However, over the last 10 years, the world seems to have settled on a small number of hardware platforms, dramatically reducing any sort of support burden. In fact, modern virtual machine technologies have made it almost trivial to install and run software developed in a different computing environment...
When chemical structures are provided, they are typically presented as structure drawings that cannot be readily translated into a machine-readable format. When groups do go to the trouble of redrawing dozens of structures, it is almost inevitable that structure errors will be introduced. Years ago, one could have argued that too many file formats existed and that it was difficult to agree on a common format for distribution. However, over the last 10 years, the community seems to have agreed on SMILES and MDL Mol or SD files as a standard means of distribution. Both of these formats can be easily processed and translated using freely available software such as OpenBabel, CDK, and RDKit. At the current time, there do not seem to be any technical impediments to the distribution of structures and data in electronic form.
Indeed, even experimental scientists are now quite familiar with the SD and SMILES file formats, and standard chemical drawing software like ChemDraw can readily handle them, so nobody should really claim to be hobbled by the lack of a standard communication language.
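To underline how little effort this takes today, here is a minimal sketch using the freely available RDKit mentioned in the editorial; the SMILES string (aspirin) and the output file name are illustrative placeholders for whatever a paper would actually report.

from rdkit import Chem
from rdkit.Chem import AllChem

smiles = "CC(=O)Oc1ccccc1C(=O)O"            # aspirin, as an illustrative example
mol = Chem.MolFromSmiles(smiles)
mol = Chem.AddHs(mol)
AllChem.EmbedMolecule(mol, randomSeed=42)    # generate a 3D conformation
AllChem.MMFFOptimizeMolecule(mol)            # quick force-field cleanup

writer = Chem.SDWriter("structure.sdf")      # 3D coordinates any modeling tool can read
writer.write(mol)
writer.close()

A handful of lines like these, included with a paper's supporting information, would spare readers from redrawing structures by hand and introducing the very errors Walters describes.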

Walters proposes some commonsense guidelines for aiding reproducibility in computational research, foremost of which is to include source code. These guidelines mirror some of the suggestions I had made in a previous post. Unfortunately explicit source code cannot always be made available for proprietary reasons, but as Walters notes, executable versions of this code can still be offered. The other guidelines are also simple and should be required for any computational submissions: mandatory provision of SD files; inclusion of scripts and parameter files for verification; a clear description of the method, preferably with an example; and an emphasis by reviewers on reproducibility aspects of a study.

As has been pointed out by many molecular modelers in the recent past - Ant Nicholls from OpenEye for instance - it's hard to judge even supposedly successful molecular modeling results because the relevant statistical validation has not been done and because there is scant method comparison. At the heart of validation however is simply reproduction, because your opinion is only as good as my processing of your data. The simple guidelines from this editorial should go some way in establishing the right benchmarks.