The Curious Wavefunction: December 2011

Is it too early to ask for automation in lab safety?

Let's face it. Graduate students do occasionally work without lab coats. Sometimes you have to do a quick procedure and sometimes you just get tired of wearing the coat. I can't recall walking into a lab without seeing at least one person working without a coat. Most of these individuals were hopefully not working with dangerous reagents, but the point is that it's human nature to be occasionally negligent. While Sheri Sangji's not wearing a coat was a breach of basic safety protocol (although it's not clear how far the coat would have gone in lessening her injuries), she was no more guilty of violating this safety norm than many other graduate students around the world. It's certainly not right but I don't think the practice is going to disappear anytime soon.

Then there's the question of Prof. Harran's responsibility in enforcing safety standards, which I have talked about in a previous post. Even if a professor constantly monitors lab coat violations, he is naturally not going to patrol his lab 24 hours a day during each and every experiment. In addition, even the most diligent professor who is saddled with multiple responsibilities (research, grant writing, teaching, mentoring, administrative work) is going to have an occasional lapse in enforcing safety.

The reason I was mulling over these two points was to remind myself of something we all know: you can't fight human nature. And since human nature is not going to go away, it seems odd to depend purely on human beings to enforce safety standards in a lab. The obvious question that then came to my mind was: why aren't automated systems employed for enforcing at least some safety standards in chemistry laboratories? Why do we still mainly depend on human beings to make sure everyone obeys safety protocols?

This seems especially pertinent if we think of the many other industries and research environments where technology reduces our dependence on the whims and uncertainties of human nature. In industries ranging from the nuclear to the aerospace to the automobile industries, multiple primary and backup systems are in place to kick in during those sadly too frequent occasions when human negligence, error and indifference endanger property and lives. It seems odd to me that in an age when technology is extensively used to deploy automated safety systems in multiple spheres of life, we are still depending on humans to constantly enforce basic and essential safety rules like the wearing of lab coats, glasses and gloves.

Automated systems would of course not protect lab personnel against every accident and it goes without saying that human review would still be necessary, but I don't see why relatively simple systems could not lead to a safer chemical workplace.

Two such simple systems come to my mind. In most current cars, you can open the door only when your keys are within a certain distance of it. There is clearly a proximity sensor in the car which detects the keys. A similar system could be used in a lab that would allow a chemical hood to function only when it detects a lab coat. A simple RFID tag embedded in the coat would activate a complementary sensor in the hood. So unless the person who approaches the hood has his or her lab coat on all the time, the hood would essentially go into lockdown mode or at least activate an annoying alarm that can be turned off only when the coat is worn (similar to the beeping that results from not wearing a seat belt in a car). The proximity sensor system could hinge on RFID, infrared or optical sensors and the exact details would be dictated by cost, efficiency and ease of mass deployment. But the technology certainly seems to exist and it should not be too expensive or difficult to install such a system. The system could of course also detect other safety gear like lab goggles and gloves.
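To make the interlock idea a little more concrete, here is a minimal sketch in Python of the decision logic only. Everything in it is an assumption for illustration: the gear labels, the one-metre detection radius, the `TagReading` structure and the choice between alarm and lockdown are hypothetical, and no real RFID reader API is involved.

```python
from dataclasses import dataclass
from enum import Enum


class HoodState(Enum):
    ENABLED = "enabled"
    ALARM = "alarm"
    LOCKED = "locked"


@dataclass(frozen=True)
class TagReading:
    gear: str          # hypothetical label carried by the tag: "coat", "goggles", "gloves"
    distance_m: float  # estimated distance between the tag and the hood's reader


REQUIRED_GEAR = frozenset({"coat", "goggles", "gloves"})  # assumed policy
MAX_RANGE_M = 1.0                                         # assumed detection radius


def hood_state(readings, person_present, lock_on_violation=False, override=False):
    """Decide what the hood should do for one sensor sweep."""
    # An emergency human override always wins; with nobody at the hood
    # there is nothing to enforce.
    if override or not person_present:
        return HoodState.ENABLED
    # Keep only the tags that are close enough to plausibly be worn.
    detected = {r.gear for r in readings if r.distance_m <= MAX_RANGE_M}
    if REQUIRED_GEAR <= detected:
        return HoodState.ENABLED
    return HoodState.LOCKED if lock_on_violation else HoodState.ALARM


full_kit = [TagReading("coat", 0.4), TagReading("goggles", 0.5), TagReading("gloves", 0.3)]
no_coat = [TagReading("goggles", 0.5), TagReading("gloves", 0.3)]
coat_on_hook = [TagReading("coat", 3.0), TagReading("goggles", 0.5), TagReading("gloves", 0.3)]
```

Note that a tag within range only shows that the coat is near the hood, not that it is being worn, so a real system would need something more than this distance check.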

As useful as such techniques for detecting lab gear could be, they would not stop an accident after it happens. A comprehensive automated safety framework needs provisions for both prevention and cure. These systems should especially be viable in the presence of a human being who is unable to take care of himself or herself. Although interfering with a runaway accident after it happens is difficult, there could be a few options. In the case of Sheri Sangji, a violently flammable chemical spilled on her and caught fire, igniting her sweater. For the next few minutes there was an intense cluster of "hot spots" in the room in which she worked. One could have a fairly simple infrared scanning system which sweeps the room and activates an alarm when it detects such a swarm of high-temperature spots, especially when they are moving. Adding the condition of motion could help prevent false positives from stationary heat sources such as hot flasks and beakers.
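The "cluster of hot spots that moves" rule can be sketched in a few lines of Python. This is only a toy under stated assumptions: frames are plain grids of temperatures in degrees Celsius, and the threshold, the minimum cluster size and the minimum centroid shift are made-up numbers, not values from any real thermal camera.

```python
import math

HOT_C = 150.0        # assumed temperature threshold for a "hot spot"
MIN_SPOTS = 3        # assumed minimum number of hot pixels that count as a cluster
MIN_SHIFT_PX = 1.0   # assumed centroid shift (in pixels) that counts as motion


def make_frame(hot_cells, size=6, ambient=20.0, hot=300.0):
    """Build a toy square thermal frame with the given cells set to a hot value."""
    grid = [[ambient] * size for _ in range(size)]
    for row, col in hot_cells:
        grid[row][col] = hot
    return grid


def hot_pixels(frame):
    """Return (row, col) of every pixel at or above the threshold."""
    return [(r, c) for r, row in enumerate(frame) for c, t in enumerate(row) if t >= HOT_C]


def centroid(pixels):
    n = len(pixels)
    return (sum(r for r, _ in pixels) / n, sum(c for _, c in pixels) / n)


def should_alarm(prev_frame, curr_frame):
    """Alarm only if a cluster of hot pixels exists in both frames and has moved."""
    prev_hot, curr_hot = hot_pixels(prev_frame), hot_pixels(curr_frame)
    if len(prev_hot) < MIN_SPOTS or len(curr_hot) < MIN_SPOTS:
        return False
    (r0, c0), (r1, c1) = centroid(prev_hot), centroid(curr_hot)
    return math.hypot(r1 - r0, c1 - c0) >= MIN_SHIFT_PX


hot_plate = make_frame([(1, 1), (1, 2), (2, 1)])      # stationary cluster
moving_fire = make_frame([(1, 3), (1, 4), (2, 3)])    # same cluster, two columns over
```

Comparing only two consecutive frames is the crudest possible motion test; it is here to show why a hot flask sitting on a bench would not trip the alarm while a moving heat source would.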

These are just a few thoughts. Naturally any such system would have to be refined, tuned and tested and would be subject to emergency human overrides. But it just seems to me that we should be able to implement at least a few robust automated safety systems for preventing lab tragedies when we take their existence in virtually every other aspect of our modern industrial lifestyle for granted.

Professorial oversight, availability bias and the Sheri Sangji case

There's a new twist on the tragic case of Sheri Sangji, a UCLA student working in the lab of Prof. Patrick Harran who died from burns resulting from her handling of tert-butyllithium, a notoriously and violently flammable substance which has to be handled with the utmost care. This is a horrific example that reminds us of the perpetual and always potentially fatal dangers lurking in every corner of the lab. Our hearts go out to the Sangji family whose rage, grief and frustration are understandable.

But the issue gets murkier. It seems that criminal charges have now been brought against UCLA and Harran by the Los Angeles district attorney. Harran is going to surrender to the authorities when he comes back from what I am assuming is a holiday vacation.

I feel extremely doubtful that the charges will hold up, but I also think that these kinds of debates are generally conducive to maintaining a healthy safety culture. Something about Jefferson's quote about the price of democracy being eternal vigilance comes to mind. It's clear that the lab in which Sangji was working was found to violate safety standards, but I suspect that's the case for several other labs across the country. This does not excuse the lack of standards, but it makes one wonder if focusing on such stories leads to the typical situation where certain "rare events" seem to dictate our feelings and opinions on a more general issue because of their graphic nature and the emphasis that the media puts on them. More on this later.

The other reason the charges may not hold up is that the culpability of the institution and Prof. Harran, if it exists at all, is likely to be very fuzzy. Unfortunately Sangji was not wearing a lab coat, and I am guessing it would be very difficult, if not impossible, to find demonstrable evidence that she had not been told to constantly use this most basic of lab safety measures. She was also wearing a sweater and was syringing out a rather large amount of the flammable substance, and the prosecution will also have to find evidence that she was not warned against either of those practices. Finally, Sangji was considered fairly well-versed in the hazards of chemical experimentation so she was expected to have known about basic lab protocols. None of this is to lay blame at her feet, but only to note that it muddies the legal aspect of the case.

But I think the greater issue deals with the amount of involvement that a professor should have in the safety of his students. I don't know of any faculty member (although I am sure there are a few) who schedules individual sessions with each of his or her students and instructs them in the minutiae of lab safety. Nor does every professor step into the lab several times a day looking for every safety violation, and I don't think it's realistic to expect them to. I don't know if it's legally required for any professor to specifically warn their students about the danger of handling t-BuLi. At most professors should periodically (but regularly) remind their students about safety standards, loudly denounce blatant violations and then expect senior graduate students and postdocs to enforce standards. If Prof. Harran is indeed guilty of transgressing safety norms, then it seems that the senior students and postdocs in his lab should share this blame even more. I am not saying either of them should, but it's hard for me to see how the responsibility for safety violations should fall squarely on the shoulders of Prof. Harran and not on his lab personnel.

Coming back to the highlighting of the issue as an indictment of lab safety, I am reminded of the always controversial issue of safety in the nuclear industry. We have constantly lived in times when the graphic, the dramatic and the most sensationalized events have dictated our opinions, no matter how rare they are. In the case of the nuclear industry for instance, the occasional Chernobyl and Fukushima color our opinions of nuclear power for decades, even if hundreds of nuclear power reactors have been humming along for decades without major incidents. The safety record in the nuclear industry is far better than that in the chemical, coal or automobile industries, yet the nuclear industry gets an outrageous share of our derision and disapproval. The result? The distinct censure and under-utilization of nuclear power which has held its widespread deployment back for decades.

An undue focus on the perils of chemical research may similarly detract from the decades of productive chemical research and the education of promising chemists that has largely transpired without incident. I fear that bringing charges against UCLA and Prof. Harran will set a troubling precedent and may result in similar under-utilization of the benefits of chemical education. For instance I can see professors at other institutions holding back and being more reluctant to let undergraduates or technical assistants indulge in research involving common but potentially dangerous chemicals. We are already seeing the consequences of a disproportionate preoccupation with chemical safety in the lack of interesting experiments in chemistry sets for teenagers (presumably because most interesting experiments involve dangerous chemicals). Students themselves might be less eager to burnish their research credentials by working in a chemistry lab. Universities may enforce stricter rules restricting the availability of research opportunities for undergraduates on the grounds that they may lead to potential accidents.

Finally, the undue emphasis on safety and the resulting media circus may simply make worse what has been a perpetual headache for the proponents of chemistry - the public image of the discipline. The media has always been adept at exploiting a version of availability bias, a phenomenon delineated by psychologists Daniel Kahneman and Amos Tversky in which our perceptions of a phenomenon are shaped by what's easily remembered rather than what's the norm. One can be assured that the media will be far more eager to write about the occasional chemical tragedy than the countless number of times when the system actually worked and nobody was harmed. The Sangji case and the current charges against UCLA will do nothing to quell public fears about the dangers of chemical research. The public perception of working in a chemical laboratory will relate to what's "newsworthy" (deaths and fires) rather than what the facts are (thousands of safe experiments resulting in no harm). Ironically these dangers have always been there, but the countless number of times when they have caused no harm and in fact have led to great advances has gone unheeded.

Of course, none of this backlash may occur and certainly none of the ensuing discussion implies that we should be lackadaisical in the implementation and review of safety standards. Safety reviews should be second nature to lab personnel irrespective of tragedies like this one. Whenever they can, professors should remind every student under their wing of the ever-present dangers lurking in their laboratory. Senior graduate students and postdocs should consider the enforcing of lab safety their special responsibility since only a palpable safety-conscious culture could lead to an unconscious regard for safety. And universities should spare no effort in carrying out regular safety assessments.

But none of this should distract us from the very real benefits that chemical research and education have brought to countless young researchers whose time in the lab has inspired them to contribute to the advancement of chemical knowledge. It should not make us ignore the commendable tradition of chemical research in which professors and their students have carried out safe and illuminating chemical experiments in the presence of thousands of potentially fatal chemicals. Yes, students in labs are surrounded by chemical perils. But so are most of us in virtually every sphere of life. In the face of risks we do what we have always done: assess the dangers and constantly review, revise and research. And carry on.

What happened to Sheri Sangji was a tragedy, and sadly a preventable one at that. Yet if we overstep our boundaries of response and reaction, Sangji will not be the only victim. The real tragedy will be the discipline of chemistry itself.

A Christmas message from Steve Jobs for our friends in pharma

I am at the end of Walter Isaacson's excellent biography of Steve Jobs and it's worth a read even if you think you know a lot about the man. Love him or hate him, it's hard to deny that Jobs was one of those who disturbed our universe in the last few decades. You can accuse him of a lot of things, but not of being a lackluster innovator or product designer.

The last chapter titled "Legacy" has a distillation of Jobs's words about innovation, creativity and the key to productive, sustainable companies. In that chapter I found this:

"I have my own theory about why decline happens at companies like IBM or Microsoft. The company does a great job, innovates and becomes a monopoly or close to it in some field, and then the quality of product becomes less important. The company starts valuing the great salesmen, because they're the ones who can move the needle on revenues, not the product engineers and designers. So the salespeople end up running the company. John Akers at IBM was a smart, eloquent, fantastic salesperson but he didn't know anything about product. The same thing happened at Xerox. When the sales guys run the company, the product guys don't matter so much, and a lot of them just turn off."

Jobs could be speaking about the modern pharmaceutical industry. There the "product designers" are of course the scientists. Although many factors have been responsible for the decline of innovation in modern pharma, one that correlates strongly with it is the replacement of product designers at the helm by salespeople and lawyers, beginning roughly in the early 90s.

There's a profound lesson in there somewhere. Not that wishes come true, but it's Christmas, and while we don't have the freedom to innovate, hold a stable job and work on what really matters, we do have the freedom to wish. So with this generous dose of wishful thinking, I wish you all a Merry Christmas.

Unruly beasts in the jungle of molecular modeling

The Journal of Computer-Aided Molecular Design is featuring a smorgasbord of accomplished modelers reflecting upon the state and future of modeling in drug discovery research and I would definitely recommend that anyone interested in the role of modeling - and especially experimentalists - take a look at the articles. Many of the articles are extremely thoughtful and balanced and take a hard look at the lack of rigorous studies and results in the field; if there was ever a need to make journal articles freely available it was for these kinds, and it's a pity they aren't. But here's one that is open access, and it's by some researchers from Simulations Plus Inc. who talk about three beasts (or in the authors' words, "Lions and tigers and bears, oh my!") in the field that are either unsolved or ignored or both.

1. Entropy: As they say, entropy, taxes and death (entropy) are the three constant things in life. In modeling both small molecules and proteins, entropy has always been the elephant in the room, blithely ignored in most simulations. At the beginning there was no entropy. Early modeling programs then started exacting a rough entropic penalty for freezing certain bonds in the molecule. While this approximated the loss of ligand entropy in binding, it did nothing to take care of the conformational entropy loss that results from the compression of a panoply of diverse conformations in solution to a single bound conformation.
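A back-of-the-envelope version of that conformational penalty can be written down in Python. The sketch below is not the paper's method; it simply assumes the ligand's solution state is a discrete set of conformers with known populations, uses the Gibbs formula S = -R Σ p ln p for its entropy, and assumes the bound state is a single conformer with zero conformational entropy. Vibrational, rotational and solvent contributions are ignored entirely.

```python
import math

R = 8.314462618  # gas constant in J/(mol*K)


def conformational_entropy(populations):
    """Gibbs entropy, in J/(mol*K), of a discrete set of conformer populations."""
    total = sum(populations)
    probs = [p / total for p in populations if p > 0]
    return -R * sum(p * math.log(p) for p in probs)


def binding_entropy_penalty(populations, temperature_k=298.15):
    """-T*dS in kJ/mol for collapsing the ensemble to one bound conformer.

    Assumes the bound state has zero conformational entropy.
    """
    return temperature_k * conformational_entropy(populations) / 1000.0


ten_equal_conformers = [1.0] * 10
penalty_kj = binding_entropy_penalty(ten_equal_conformers)  # about 5.7 kJ/mol
```

Under these assumptions, freezing ten equally populated conformers into one costs roughly 5.7 kJ/mol at room temperature, which is on the order of a factor of ten in binding affinity and gives a sense of why ignoring the term matters.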

But we were just getting started. A very large part of the entropy of binding a ligand by a protein comes from the displacement of water molecules in the active site, essentially their liberation from being constrained prisoners of the protein to free-floating entities in the bulk. A significant advance in trying to take this factor into account was an approach that explicitly and dynamically calculated the enthalpy, entropy and therefore the free energy of bound waters in proteins. We have now reached the point where we can at least think of doing a reasonable calculation on such water molecules. But water molecules are often ill-localized in protein crystal structures because of low resolution, inadequate refinement and other reasons. It's not easy to perform such calculations for arbitrary proteins without crystal structures.

However, a large piece of the puzzle that's still missing is the entropy of the protein, which is extremely difficult to calculate on many fronts. Firstly, the dynamics of the protein is often not captured by a static x-ray structure so any attempts to calculate protein entropy in the presence and absence of ligands would have to shake the protein around. Currently the favored process for doing this is molecular dynamics (MD) which suffers from its own problems, most notably the accuracy of what's under the hood, namely force fields. Secondly, even if we can calculate the total entropy changes, what we really need to know is how the entropy is distributed between various modes since only some of these modes are affected upon ligand binding. An example of the kind of situation in which such details would be important is the case of slow, tight-binding inhibitors illustrated in the paper. The example is of two different prostaglandin synthase inhibitors which demonstrate almost identical binding orientations in the crystal structure. Yet one is a weak-binding inhibitor which dissociates rapidly and the other is slow and tight-binding. Only a dynamic treatment of entropy can explain such differences, and we are still quite far from being able to do this in the general case.

2. Uncertainty: Out of all the hurdles facing the successful application and development of modeling in any field, this might be the most fundamental. To reiterate, almost every kind of modeling starts by using a training set of molecules for which the data is known and then proceeds to apply the results from this training set to a test set for which the results are unknown. Successful modeling hinges on the expectation that the data in the test set is sufficiently similar to that in the training set. But problems abound. For one thing, similarity is in the eye of the beholder and what seems to be a reasonable criterion for assuming similarity may turn out to be irrelevant in the real world. Secondly, overfitting is a constant issue and results that look perfect for the training set can fail abysmally on the test set.

But as the article notes, the problems go further and the devil's in the details. Modeling studies very rarely try to quantify the exact differences between the two sets and the error resulting from that difference. What's needed is an estimate of predictive uncertainty for single data points, something which is virtually non-existent. The article notes the seemingly obvious but often ignored fact when it says that "there must be something that distinguishes a new candidate compound from the molecules in the training set". This 'something' will often be a function of the data that was ignored when fitting the model to the training set. Outliers which were thrown out because they were...outliers might return with a vengeance in the form of a new set of compounds that are enriched in their particular properties which were ignored.

But more fundamentally, the very nature of the model used to fit the training set may be severely compromised. In its simplest incarnation for instance, linear regression may be used to fit data points to a set of relationships that are inherently non-linear. In addition, descriptors (such as molecular properties supposedly related to biological activity) may not be independent. As the paper notes, "The tools are inadequate when the model is non-linear or the descriptors are correlated, and one of these conditions always holds when drug responses and biological activity are involved". This problem penetrates into every level of drug discovery modeling, from basic molecular level QSAR to higher-level clinical or toxicological modeling. Only a judicious and high-quality application of statistics, constant validation, and a willingness to wait (on publication, press releases etc.) until the entire analysis is available will preclude erroneous results from seeing the light of day.
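A toy illustration of the linear-model problem, in Python with nothing but the standard library: a straight line is fitted by least squares to a hypothetical quadratic response over a narrow training range and then asked to predict compounds outside that range. The response function, the descriptor values and the training/test split are all invented for the example and have nothing to do with the data in the paper.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(xs, ys, slope, intercept):
    """Root-mean-square error of the fitted line on the given points."""
    return math.sqrt(sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def response(x):
    """Hypothetical, inherently non-linear descriptor-activity relationship."""
    return x ** 2


train_x = [0.0, 1.0, 2.0, 3.0, 4.0]   # descriptor values covered by the training set
test_x = [8.0, 9.0, 10.0]             # new compounds outside that range
train_y = [response(x) for x in train_x]
test_y = [response(x) for x in test_x]

slope, intercept = fit_line(train_x, train_y)
train_error = rmse(train_x, train_y, slope, intercept)
test_error = rmse(test_x, test_y, slope, intercept)
```

The line looks tolerable on the training points (an error under 2 on values up to 16) and then misses the new compounds by close to 50, which is the "something that distinguishes a new candidate compound" problem in miniature.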

3. Data Curation: This is an issue that should be of enormous interest not just to modelers but to all kinds of chemical and biological scientists concerned about information accuracy. The well-known principle of Garbage In, Garbage Out (GIGO) is at work here. The bottom line is that there is an enormous amount of chemical data on the internet that is flawed. For instance there are cases where incorrect structures were inferred from correct names of compounds:

"The structure of gallamine triethiodide is a good illustrative example where many major databases ended up containing the same mistaken datum. Until mid-2011, anyone relying on an internet search would have erroneously concluded that gallamine triethiodide is a tribasic amine. The error resulted from mis-parsing the common name at some point as meaning that the compound is a salt of gallamine and ‘‘ethiodidic acid,’’ identifying gallamine as the active component and retrieving the relevant structure. In fact, gallamine triethiodide is what you get when you react gallamine with three equivalents of ethyl iodide"

So gallamine triethiodide is the tris-quaternary ammonium salt formed by triple ethylation, not the tribasic amine. Assuming otherwise can only lead to chemical mutilation and death. And this case is hardly unique. An equally common problem is simply assigning the wrong ionization state for chemical compounds as illustrated at the beginning of the post. I have already mentioned this as a rookie mistake, but nobody is immune to it. It hardly needs mentioning that any attempt to model an incorrect structure will give completely wrong results. The bigger problem of course is when the results seem right and prevent us from locating the error; for example, docking an incorrectly positively charged structure into a negative binding site will result in very promising but completely spurious results.

It's hard to see how exactly the entire modeling community can rally together, collectively rectify these errors and establish a common and inviolable standard for performing studies and communicating their results. Until then all we can do is point out the pitfalls, the possibilities, the promises and the perils.

Clark, R., & Waldman, M. (2011). Lions and tigers and bears, oh my! Three barriers to progress in computer-aided molecular design. Journal of Computer-Aided Molecular Design. DOI: 10.1007/s10822-011-9504-3

On reproducibility in modeling

A recent issue of Science has an article discussing an issue that has been a constant headache for anyone involved with any kind of modeling in drug discovery - the lack of reproducibility in computational science. The author Roger Peng, who is a biostatistician at Johns Hopkins, talks about modeling standards in general but I think many of his caveats could apply to drug discovery modeling. The problem has been recognized for a few years now but there have been very few concerted efforts to address it.

An old anecdote from my graduate advisor's research drives the point home. He wanted to replicate a protein-ligand docking study done with a compound so he contacted the scientist who had performed the study and processed the protein and ligand according to that scientist's protocol. He appropriately adjusted the parameters and ran the experiment. To his surprise he got a very different result. He repeated the protocol several times but consistently saw the same discrepant result. Finally he called up the original researcher. The two went over the protocol a few times and finally realized that the problem lay in a minor but overlooked detail - the two scientists were using slightly different versions of the modeling software. This wasn't even a new version, just an update, but for some reason it was enough to significantly change the results.

These and other problems dot the landscape of modeling in drug discovery. The biggest problem to begin with is of course the sheer lack of reporting of details in modeling studies. I have seen more than my share of papers where the authors find it enough to simply state the name of the software used for modeling. No mention of parameters, versions, inputs, "pre-processing" steps, hardware, operating system, computer time or "expert" tweaking. The last factor is crucial and I will come back to it. In any case, it's quite obvious that no modeling study can be reproducible without these details. Ironically, the same process that made modeling more accessible to the experimental masses has also encouraged the reporting of incomplete results; the incarnation of simulation as black-box technology has inspired experimentalists to widely use it, but on the flip side it has also discouraged many from being concerned about communicating under-the-hood details.

A related problem is the lack of objective statistical validation in reporting modeling results, a very important topic that has been highlighted recently. Even when protocols are supposedly accurately described, the absence of error bars or statistical variation means that one can get a different result even if the original recipe is meticulously followed. Even simple things like docking runs can give slightly different numbers on the same system, so it's important to be mindful of variation in the results along with their probable causes. Feynman talked about the irreproducibility of individual experiments in quantum mechanics, and while it's not quite that bad in modeling, it's still not irrelevant.
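Reporting the spread of repeated runs takes only a few lines. The Python sketch below summarizes replicate docking scores as a mean, a sample standard deviation and a standard error; the scores are invented numbers, and the "more than the sum of the two standard deviations apart" rule is a deliberately crude assumption standing in for a proper statistical test.

```python
import math
import statistics


def summarize_runs(scores):
    """Return (mean, sample standard deviation, standard error) of replicate scores."""
    if len(scores) < 2:
        raise ValueError("need at least two replicate runs to estimate variation")
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return mean, sd, sd / math.sqrt(len(scores))


def distinguishable(scores_a, scores_b):
    """Crude check: do the means differ by more than the combined spread?"""
    mean_a, sd_a, _ = summarize_runs(scores_a)
    mean_b, sd_b, _ = summarize_runs(scores_b)
    return abs(mean_a - mean_b) > sd_a + sd_b


# Hypothetical docking scores from repeated runs on the same system
ligand_a = [-9.1, -8.7, -9.4, -8.9, -9.0]
ligand_b = [-8.8, -9.3, -8.6, -9.2, -8.9]
ligand_c = [-11.0, -10.8, -11.2]
```

With these made-up numbers, ligands A and B differ by less than their run-to-run noise, so ranking one above the other on a single run would mean nothing, while ligand C is clearly separated from both.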

This brings us to one of those important but often unquantifiable factors in successful modeling campaigns - the role of expert knowledge and intuition. Since modeling is still an inexact science (and will probably remain so for the foreseeable future), intuition, gut feelings and a "feel" for the particular system under consideration based on experience can often be an important part of massaging the protocol to deliver the desired results. At least in some cases these intangibles are captured in any number of little tweaks, from constraining the geometry of certain parts of a molecule based on past knowledge to suddenly using a previously unexpected technique to improve the clarity of the data. A lot of this is never reported in papers and some of it probably can't be. But is there a way to capture and communicate at least the tangible part of this kind of thinking?

The paper alludes to a possible simple solution and this solution will have to be implemented by journals. Any modeling protocol generates a log file which can be easily interpreted by the relevant program. In the case of some modeling software like Schrodinger, there's also a script that records every step in a format comprehensible to the program. Almost any little tweak that you make is usually recorded in these files or scripts. A log file is more accurate than an English language description at documenting concrete steps. One can imagine a generic log-file-generating program which can record the steps across different modeling programs. This kind of venture will need collaboration between different software companies but it could be very useful in providing a single log file that captures as much of both the tangible and intangible thought processes of the modeler as possible. Journals could insist that authors upload these log files and make them available to the community.
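As a rough idea of what a program-independent record might contain, here is a minimal Python sketch of a run "manifest": software name and version, parameters, checksums of the inputs and the machine it ran on, serialized as JSON. The program name `dockprog`, its version numbers, the parameter names and the file contents are all hypothetical; this is not the log format of Schrodinger or of any other real package.

```python
import hashlib
import json
import platform
import sys


def build_manifest(software, version, parameters, input_files):
    """Collect the details a reader would need to rerun a modeling job.

    input_files maps a file name to its raw bytes; only a checksum is stored.
    """
    return {
        "software": software,
        "version": version,
        "parameters": dict(sorted(parameters.items())),
        "inputs": {name: hashlib.sha256(data).hexdigest()
                   for name, data in sorted(input_files.items())},
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }


def manifest_json(manifest):
    """Serialize the manifest in a stable, diff-friendly form."""
    return json.dumps(manifest, indent=2, sort_keys=True)


def differing_fields(manifest_a, manifest_b):
    """Name the top-level fields on which two manifests disagree."""
    keys = manifest_a.keys() | manifest_b.keys()
    return sorted(k for k in keys if manifest_a.get(k) != manifest_b.get(k))


params = {"exhaustiveness": 8, "seed": 42}
inputs = {"ligand.sdf": b"hypothetical ligand record",
          "protein.pdb": b"hypothetical protein record"}
original_run = build_manifest("dockprog", "2.1.0", params, inputs)
repeat_run = build_manifest("dockprog", "2.1.1", params, inputs)
```

Comparing the two manifests with `differing_fields(original_run, repeat_run)` points straight at the version field, the kind of mismatch that took two scientists several phone calls to find in the anecdote above.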

Ultimately it's journals which can play the biggest role in the implementation of rigorous and useful modeling standards. In the Science article the author describes a very useful system of communicating modeling results used by the journal Biostatistics. Under this system authors doing simulation can request a "reproducibility review" in which one of the associate editors runs the protocols using the code supplied by the authors. Papers which pass this test are clearly flagged as "R" - reviewed for reproducibility. At the very least, this system gives readers a way to distinguish rigorously validated papers from others so that they know which ones to trust more. You would think that there would be backlash against the system from those who don't want to explicitly display the lack of verification of their protocols, but the fact that it's working seems to indicate its value to the community at large.

Unfortunately in the case of drug discovery, any such system will have to deal with the problem of proprietary data. There are several papers without such data which could also benefit from this system, but there can be ways to handle proprietary data. Even proprietary data can be amenable to partial reproducibility. In a typical example for instance, molecular structures which are proprietary could be encoded into special organization-specific formats that are hard to decode (an example would be formats used by OpenEye or Rosetta). One could still run a set of modeling protocols on this cryptic data set and generate statistics without revealing the identity of the structures. Naturally there will have to be safeguards against the misuse of any such evaluation but it's hard to see why they would be difficult to institute.

Finally, it's only a community-wide effort composed equally of industry and academia which can lead to the validation and use of successful modeling protocols. The article suggests creating a kind of "CodeMed Central" repository akin to PubMed Central, and I think modeling could greatly benefit from such a central data source. Code for successful protocols in virtual screening or homology modeling or molecular dynamics or what have you can be uploaded to a site (along with the log files of course). Not only could these protocols be checked for reproducibility, but they could also be used to practically aid data extraction from similar systems. The community as a whole would benefit.

Before there's any data generation or sharing, before there's any drawing of conclusions, before there's any advancement of scientific knowledge, there's reproducibility, a scientific virtue that has guided every field of science since its modern origin. Sadly this virtue has been neglected in modeling, so it's about time that we pay more attention to it.

Peng, R. (2011). Reproducible Research in Computational Science. Science, 334 (6060), 1226-1227 DOI: 10.1126/science.1213847

Why drug design is like airplane design. And why it isn't.

Air travel constitutes the safest mode of travel in the world today. What is even more impressive is the way airplanes are designed by modeling and simulation, sometimes before the actual prototype is built. In fact simulation has been a mainstay in the aeronautical industry for a long time and what seems like a tremendously complex interaction of metal, plastic and the unpredictable movements of air flow can now be reasonably captured in a computer model.

In a recent paper, Walter Woltosz of Simulations Plus Inc. asks an interesting question: compared to the aeronautical industry where modeling has been applied to airplane design for decades, why has it taken so long for modeling to catch on in the pharmaceutical industry? In contrast to simulation in airplane design, which is now a well-accepted and widely used tool, why is simulation of drugs and proteins still (relatively) in the doldrums? Much progress has surely been made in the field during the last thirty years or so, but modeling is nowhere near as integrated in the drug discovery process as computational fluid dynamics is in the airplane design process.

Woltosz has an interesting perspective on the topic since he himself was involved in modeling the early Space Shuttles. As he recounts, what's interesting about modeling in the aeronautical field is that NASA was extensively using primitive 70s computers to do it even before they built the real thing. A lot of modeling in aeronautics involves figuring out the right sequence of movements an aircraft should take in order to keep itself from breaking apart. Some of it involves solving the Navier-Stokes equations that dictate the complicated air flow around the plane, some of it involves studying the structural and directional effects of different kinds of loads on materials used for construction. The system may seem complicated but as Woltosz tells it, simulation is now used ubiquitously in the industry to discard bad models and tweak good ones.

Compare that to the drug discovery field. The first simulations of pharmaceutically relevant systems started in the early 80s. Since then the field has progressed in fits and starts and while many advances have come in the last two decades, modeling approaches are not a seamless part of the process. Why the difference? Woltosz comes up with some intriguing reasons, some obvious and others more thought-provoking.

1. First and foremost of course, biological systems are vastly more complicated than aeronautical systems. Derek has already written at length about the fallacy of applying engineering analogies to drug discovery and I would definitely recommend his thoughts on the topic. In case of modeling, I have already mentioned that the modeling community is getting ahead of itself by trying to bite off more complexity than it can chew. Firstly you need to have a list of parts to simulate and we are still very much in the process of putting together this list. Secondly, having the list will tell us little about how the parts interact. Biological systems display complex feedback loops, non-linear signal-response features and functional "cliffs" where a small change in the input can lead to a big change in the output. As Woltosz notes, while aeronautical systems can also be complex, their inputs are much better defined.

But the real difference is that we can actually build an airplane to test our theories and simulations. The chemical analogy would be the synthesis of a complex molecule like a natural product to test the principles that went into planning its construction. In the golden age of organic synthesis, synthetic feats were undertaken for structure confirmation but also to validate our understanding of the principles of physical organic chemistry, conformational analysis and molecular reactivity. Even if we get to a point where we think we have a sound grounding of the principles governing the construction and workings of a cell, it's going to be a while before we can truly confirm those principles by building a working cell from scratch.

2. Another interesting point concerns the training of drug discovery researchers. Woltosz is probably right that engineers are much more of generalists than pharmaceutical scientists who are usually rigidly divided into synthetic chemists, biologists, pharmacologists, modelers, process engineers etc. The drawback of this compartmentalization is something I have experienced myself as a modeler; scientists from different disciplines can mistrust each other and downplay the value of other disciplines in the discovery of a new drug. This is in spite of the fact that drug discovery is an inherently complex and multidisciplinary process which can only benefit from an eclectic mix of backgrounds and approaches. A related problem is that some bench chemists, even those who respect modeling, want modeling to provide answers, but they don't want to run experiments (such as negative controls) which can advance the state of the field. They are reluctant to carry out the kind of basic measurements (such as measuring solvation energies of simple organic molecules) which would be enormously valuable in benchmarking modeling techniques. A lot of this is unfortunate since it's experimentalists themselves who are going to ultimately benefit from highly validated computational approaches.

There's another point which Woltosz does not mention but which I think is quite important. Unlike chemists, engineers are usually more naturally inclined to learn programming and mathematical modeling. Most engineers I know know at least some programming. Even if they don't extensively write code they can still use Matlab or Mathematica, and this is independent of their specialty (mechanical, civil, electrical etc.). But you would be hard-pressed to find a synthetic organic chemist with programming skills. Also, since engineering is inherently a more mathematically oriented discipline, you would expect an engineer to be more open to exploring simulation even if he doesn't do it himself. It's more about the culture than anything else. That might explain the enthusiasm of early NASA engineers to plunge readily into simulation. The closest chemical analog to a NASA engineer would be a physical chemist, especially a mathematically inclined quantum chemist who may have used computational techniques even in the 70s, but how many quantum chemists (as compared to synthetic chemists for instance) work in the pharmaceutical industry? The lesson to be drawn here is that programming, simulation and better mathematical grounding need to be more widely integrated in the traditional education of chemists of all stripes, especially those inclined toward the life sciences.

3. The third point that Woltosz makes concerns the existence of a comprehensive knowledge base for validating modeling techniques and he thinks that a pretty good knowledge base exists today upon which we can build useful modeling tools. I am not so sure. Woltosz is mainly talking about physiological data and while that's certainly valuable, the problem exists even at much simpler levels. I would like to stress again that even simple physicochemical measurements of parameters such as solvation energies which can contribute to benchmarking modeling algorithms are largely missing, mainly because they are unglamorous and underfunded. On the bright side, there have been at least some areas like virtual screening where researchers have judiciously put together robust datasets for testing their methods. But there's a long way to go and much robust basic scientific experimental data needs to be gathered. Again, this can come about only if scientists from other fields recognize the potential long-term value that modeling can bring to drug discovery and contribute to its advancement.

Woltosz's analogy of drug design and airplane design also reminds me of something that Freeman Dyson once wrote about the history of flight. In "Imagined Worlds", Dyson described the whole history of flight as a process of Darwinian evolution in which many designs (and lives) were destroyed in the service of better ones. Perhaps we also need a merciless process of Darwinian evaluation in modeling. Some of this is already taking place in the field of protein modeling with CASP and in protein-ligand modeling with SAMPL, but the fact remains that the drug discovery community as a whole (and not just modelers) will have to descend on the existing armamentarium of modeling tools and efficiently and ruthlessly evaluate them to pick out the ones that work. This has not happened yet.

Ultimately I like the fact that Woltosz is upbeat, and while the real benefits coming out of the process are uncertain, I definitely do agree with him that we will know the answer only if the pharmaceutical industry makes a concerted effort to test, refine, retain and discard modeling approaches to drug design at all levels. That's the only way we will know what works. Sadly, one of the problems is that it will necessarily be a slow, long-term validation and development effort that will need the constant engagement of the global drug discovery community as a whole. It may be too much to ask in this era of quick profits and five-year exit strategies. On the other hand, we are all in this together, and we do want to have our chance at the drug discovery equivalent of the moon shot.

References:

Woltosz, W. (2011). If we designed airplanes like we design drugs… Journal of Computer-Aided Molecular Design DOI: 10.1007/s10822-011-9490-5

Truth and beauty in chemistry

The mathematician Hermann Weyl who made many diverse contributions to his discipline once made the startling assertion that whenever he had to choose between truth and beauty in his works, he usually chose beauty. Mathematicians and theoretical physicists are finely attuned to the notion of beauty. They certainly have history on their side; some of the greatest equations of physics and theories of mathematics sparkle with economy, elegance and surprising universality, qualities which make them beautiful. Like Weyl, Paul Dirac was famously known to extol beauty in his creations and once said that there is no place in the world for ugly mathematics; the equation named after him is a testament to his faith in the harmony of things.

How do you define and reconcile truth and beauty in chemistry? And is chemical truth chemical beauty? In chemistry the situation is trickier since chemistry much more than physics is an experimental science based on models rather than universal overarching theories. Chemists more than physicists revel in the details of their subject. Perhaps the succinct equations of thermodynamics come closest in chemistry to defining beauty, but physics can equally lay claim to these equations. Is there a quintessentially chemical notion of beauty and how does it relate to any definition of truth? Keats famously said, “Beauty is truth, truth beauty”. Is this true in chemistry?

At this point it’s fruitful to compare any description of beauty in chemistry with that in science in general. Although scientific beauty can be notoriously subjective, many explanatory scientific frameworks deemed beautiful seem to share certain qualities. Foremost among these qualities are universality and economy; more specifically, the ability to explain the creation of complexity from simplicity. In physics for instance, the Dirac equation is considered a supreme example of beauty since in half a dozen symbols it essentially explains all the properties of the electron and also unifies it with the special theory of relativity. In mathematics, a proof – Euclid’s proof of the infinitude of prime numbers for instance – is thought to be beautiful if it combines the qualities of economy, generality and surprise. Beauty is inherent in biology too. Darwin’s theory of natural selection is considered to be especially elegant because just like equations in physics or theorems in mathematics, it explains an extraordinary diversity of phenomena using a principle which can be stated in a few simple words.

It is not easy to find similar notions of beauty in chemistry, but if we look around carefully we do find examples, even if they may not sound as profound or universal as those in chemistry’s sister disciplines. Perhaps not surprisingly, many of these examples are most manifest in theories of chemical bonding, since these theories underlie all of chemistry in principle. I certainly saw elegance when I studied crystal field theory. Crystal field theory uses a few simple notions of the splitting of the energies of a metal’s d orbitals to explain the color, magnetic and electric properties of thousands of compounds. It’s not a quantitative framework and it’s not perfect, but it can be taught to a high school student and has ample qualitative explanatory power. Another minor chemical concept which impressed me with its sheer simplicity was VSEPR (Valence Shell Electron Pair Repulsion). VSEPR predicts the shape of simple molecules based on the number of valence electron pairs around the central atom. Working out the consequences for a molecule’s geometry using VSEPR is literally a back of the envelope exercise. It’s the kind of idea one may call “cute”, but in its own limited way it’s certainly elegant. Yet another paradigm from the field of bonding is Hückel theory. Hückel theory seeks to predict the orbital energies and properties of unsaturated molecules like ethylene and benzene. It will tell you for instance why tomatoes are red and what happens when a photon of light strikes your retina. Again, the theory is not as rigorous as some of the advanced methods that followed it, but for its simplicity it is both elegant and remarkably useful.
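
To give a flavor of just how little machinery Hückel theory needs, here is a minimal sketch for benzene. It assumes the textbook closed-form solution for a planar ring of N equivalent carbons, in which the orbital energies are alpha + 2*beta*cos(2*pi*k/N), and it uses the standard Hückel value of 2*alpha + 2*beta for the pi energy of one ethylene as the reference. The function names are my own and purely illustrative.

```python
import math


def huckel_ring_energies(n_atoms):
    """Hückel orbital energies for a planar ring of n_atoms carbons (n_atoms >= 3).

    Each returned value x means E = alpha + x * beta. Since beta is negative,
    a larger x is a lower (more stable) orbital, so the list runs from the
    lowest orbital to the highest.
    """
    levels = [2.0 * math.cos(2.0 * math.pi * k / n_atoms) for k in range(n_atoms)]
    return sorted(levels, reverse=True)


def pi_energy(n_atoms, n_electrons):
    """Total pi energy as (a, b), meaning a * alpha + b * beta.

    Electrons fill the lowest orbitals first, two per orbital.
    """
    beta_coeff = 0.0
    remaining = n_electrons
    for x in huckel_ring_energies(n_atoms):
        if remaining == 0:
            break
        occupancy = min(2, remaining)
        beta_coeff += occupancy * x
        remaining -= occupancy
    return n_electrons, beta_coeff


levels = huckel_ring_energies(6)
print([round(x, 6) for x in levels])  # [2.0, 1.0, 1.0, -1.0, -1.0, -2.0]

n_alpha, n_beta = pi_energy(6, 6)
# Reference: three isolated ethylenes, each 2*alpha + 2*beta in Hückel theory.
delocalization = n_beta - 3 * 2.0
print(f"benzene pi energy: {n_alpha} alpha + {n_beta:.1f} beta")
print(f"delocalization energy: {delocalization:.1f} beta")
```

A dozen lines of arithmetic recover the familiar result that benzene is stabilized by 2 beta relative to three isolated double bonds, which is about as close to a back of the envelope explanation of aromaticity as one can get.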

As an aside, anyone who wants to get an idea of beauty in chemistry should read Linus Pauling’s landmark book “The Nature of the Chemical Bond”. The volume still stands as the ultimate example of how an untold variety of phenomena and molecular structures can be understood through the application of a few simple, elegant rules. The rules are derived through a combination of empirical data and rigorous quantum mechanics calculations. This fact may immediately lead purist physicists to denounce any inkling of beauty in chemistry, but they would be wrong. Chemistry is not applied physics, and its unique mix of empiricism and theory constitutes its own set of explanatory fundamental principles, in every way as foundational as the Dirac equation or Einstein’s field equations are to physics.

This mention of the difference between empiricism and theory reminds me of a conversation I once had with a colleague that bears on our discussion of elegance and beauty in chemistry. We were arguing the merits of using molecular mechanics and quantum mechanics for calculating the properties of molecules. Molecular mechanics is a simple method that can give accurate results when parameterized using empirical experimental information. Quantum mechanics is a complicated method that gives rigorous, first-principle results without needing any parameterization. The question was, is quantum mechanics or molecular mechanics more “elegant”? Quantum mechanics does calculate everything from scratch and in principle is a perfect theory of chemistry, but for a truly rigorous and accurate calculation of a realistic molecular system, its equations can become complicated, unwieldy and can take up several pages. Molecular mechanics on the other hand can be represented using a few simple mathematical terms which can be scribbled on the back of a cocktail napkin. Unlike quantum mechanics, molecular mechanics calculations on well-parameterized molecules take a few minutes and can give results comparable in accuracy to those of its more rigorous counterpart. The method needs to be extensively parameterized of course, but one could argue that its simple representation makes it more “elegant” than quantum mechanics. In addition, on a practical basis one may not even need the accuracy of quantum mechanics for their research. Depending on the context and need, different degrees of accuracy may be sufficient for the chemical practitioner; for instance, calculation of relative energies may not be affected by a constant error in each of the calculations, but that of absolute energy will not tolerate such an error. 
The discussion makes it clear that, while definitions of elegance are beyond a point subjective and philosophical, in chemistry elegance can be defined as much by practical accessibility and convenience as by perfect theoretical frameworks and extreme rigor. In chemistry “truth” can be tantamount to “utility”. In this sense the chemist is akin to the carpenter who judges the “truth” of his chair based on whether someone can comfortably sit on it.
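
The earlier point about constant errors is easy to check with a few lines of arithmetic. In the sketch below the energies are made-up, illustrative numbers (not computed or measured values), and the 12 kcal/mol offset is an assumed systematic error of the cheaper method. Relative energies come out identical while every absolute energy is wrong.

```python
# Hypothetical "exact" energies in kcal/mol for three conformers of one molecule.
reference = {"chair": -100.0, "twist_boat": -94.5, "boat": -93.5}

# Assumed systematic error: the cheaper method is off by the same amount everywhere.
CONSTANT_ERROR = 12.0
approximate = {name: energy + CONSTANT_ERROR for name, energy in reference.items()}


def relative_energies(energies):
    """Energies measured from the lowest-energy conformer."""
    lowest = min(energies.values())
    return {name: energy - lowest for name, energy in energies.items()}


rel_reference = relative_energies(reference)
rel_approximate = relative_energies(approximate)

for name in reference:
    absolute_error = approximate[name] - reference[name]
    relative_error = rel_approximate[name] - rel_reference[name]
    print(f"{name}: absolute error {absolute_error:.1f}, relative error {relative_error:.1f}")
```

The cancellation is exact only because the error is assumed to be the same for every conformer; errors that vary from structure to structure would not cancel this cleanly.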

While these expositions of beauty in theories of chemical bonding are abstract, there is a much starker and more obvious manifestation of chemical pulchritude in the marvelous little molecular machines that nature has exquisitely crafted through evolution. This is true of crystal structures in general but especially of protein structures. X-ray crystallographers who have cracked open the secrets of key proteins are all too aware of this beauty. Consider almost any structure of a protein deposited in the Protein Data Bank (PDB) – the world’s largest protein structure repository – and one immediately becomes aware of the sheer variety and awe-inspiring spatial disposition of nature’s building blocks. As someone who looks at protein and ligand structures for a living, I could spend days staring at the regularity and precise architecture of these entities. The structure of a profoundly important biochemical object like the ribosome is certainly pleasing to the eye, but more importantly, it contains very few superfluous elements and is composed of exactly the right number of constituent parts necessary for it to carry out its function. It is like a Mozart opera, containing only that which is necessary. In addition these structures often display elements of symmetry, always an important criterion for considerations of beauty in any field. Thus an elegant molecular structure in general and protein structure in particular straddles both the mathematician’s and biologist’s conception of beauty; it is a resounding example of economy and it resembles the biologist’s idea of geometric harmony as found in creatures like crustaceans and diatoms.

The preceding discussion may make it sound like chemistry lacks the pervasive beauty of grand explanatory theories and must relegate itself to limited displays of elegance and beauty through specific models. But chemistry also has trappings of beauty which have no counterpart in physics, mathematics or biology. This is most manifest through the drawing of molecular structures which are an inseparable part of the chemist’s everyday trade. These displays of penmanship put chemistry in the same league as the visual arts and architecture and impart to it a unique element of art which almost no other science can claim. They constitute acts of creation and not just appreciation of existing phenomena. What other kind of scientist spends most of his working day observing and manipulating lines, circles, polygons and their intersections? A Robert Burns Woodward who could fill up a whole blackboard with stunningly beautiful colored hand-drawn structures and make this chemical canvas the primary focus of his three-hour talk can exist only in chemistry.

While contemplating these elegant structures, our original question arises again: is the beauty in these drawings the truth? What is astonishing in this case is that almost all the structures that chemists draw are purely convenient fictions! Consider the quintessential prototype aromatic hydrocarbon, benzene, drawn with its alternating double bonds. In reality there are no double bonds, not even dotted lines representing partial double bonds. All that exists is a fuzzy dance of electrons and nuclei which cannot be imagined, let alone drawn on paper. The same goes for every other molecule that we draw on paper in which two-dimensional geometric representations completely fail to live up to the task of corresponding to real entities. Like almost everything else in chemistry, these are models. And yet, think about how stupendously useful these models are. They have made their way into the textbooks of every budding student of chemistry and constitute the principal tools whereby chemists around the world turn the chaff of raw materials like hydrocarbons from crude oil into the gold of immensely useful products like pharmaceuticals, plastics and catalysts. The great physicist Eugene Wigner once wrote an influential article titled “The Unreasonable Effectiveness of Mathematics in the Natural Sciences”. Wigner was expressing awe at the uncanny correspondence between artificial squiggles of mathematical symbols on paper and the real fundamental building blocks of the natural world like elementary particles. Chemists need to express similar awe at the correspondence between their arrow pushing, molecular chairs and boats and the manifestation of these manipulations as the real solids, liquids and gases in their beakers. One kind of arrow pushing leads to the creation of a cancer drug, another kind leads to a better catalyst for petroleum refining. In this instance, the beauty of molecular structures quite spectacularly corresponds to the truth.

Finally, are there cases where chemists have to sacrifice truth for beauty just like Weyl did? Unlike mathematics and physics where equations can be unusually powerful in explaining the world, such a sacrifice would probably be far more wasteful and risky in the messy world of chemistry. In his Cope Lecture, Woodward said it best when he acknowledged the special challenge of chemistry compared to mathematics:

“While in mathematics, presumably one's imagination may run riot without limit, in chemistry, one's ideas, however beautiful, logical, elegant, imaginative they may be in their own right, are simply without value unless they are actually applicable to the one physical environment we have- in short, they are only good if they work! I personally very much enjoy the very special challenge which this physical restraint on fantasy presents.”

The “physical restraint on fantasy” that Woodward talked about keeps every chemist from being seduced by beauty at the expense of truth. Beauty still reigns and is a guiding force for the chemist whenever he or she plans a synthesis, solves an x-ray structure, computes a molecular property or mixes together two simple chemicals with the expectation that they will form a wondrous, intricate lattice. But unlike Keats, the chemist knows that truth can be beauty but beauty may not be truth. As Woodward quipped, “In chemistry, ideas have to answer to reality”. And reality tends to define beauty in its own terms.