Field of Science

Drug Discovery, Models and Computers: A (necessarily incomplete) Personal Take

Drugs and rational drug discovery

Natural substances have been used to treat mankind’s diseases and ills since the dawn of humanity. The Middle Ages saw the use of exotic substances like sulfur and mercury to attempt to cure afflictions; most of these efforts resulted in detrimental side effects or death because of lack of knowledge of drug action. Quinine was isolated from the bark of the Cinchona tree and used for centuries to treat malaria. Salicylic acid was isolated from the Willow tree and was used for hundreds of years to treat fevers, knowledge that led to the discovery of Aspirin. The history of medicine has seen the use of substances ranging from arsenic to morphine, some of which are now known to be highly toxic or addictive.

The use of these substances reflected the state of medical knowledge of the times, when accidentally generated empirical data was the most valuable asset in the treatment of disease. Ancient physicians from Galen to Sushruta made major advances in our understanding of the human body and of medical therapies, but almost all of their knowledge was derived through patient and meticulously documented trial and error. A lack of knowledge of the scientific basis of disease meant that there were few systematic rational means of discovering new medicines, and serendipity and the traditional folk wisdom passed on through the centuries played the most important role in warding off disease.

This state of affairs continued till the 19th and 20th centuries, when twin revolutions in biology and chemistry made it possible to discover drugs in a more logical manner. Organic chemistry formally began in 1828 when Friedrich Wöhler found that he could synthesize urea from a simple inorganic substance, ammonium cyanate, thus dispelling the belief that organic substances could only be made by living organisms (1). The further development of organic chemistry was orchestrated by the formulation of structural theory in the late 19th century by Kekulé, Couper, Kolbe, Perkin and others (1). This framework made it possible to begin to elucidate the precise arrangement of atoms in biologically active compounds. Knowledge of this arrangement in turn led to routes for the synthesis of these molecules. These investigations also provided impetus for the synthesis of non-natural molecules of practical interest, sparking off the field of synthetic organic chemistry. However, while the power of synthetic organic chemistry later provided several novel drugs, the legacy of natural products is still prominent, and about half of the drugs currently on the market are either natural products or derived from natural products (2).

Success in the application of chemistry to medicine was exemplified in the early 20th century by tentative investigations of what we now call structure-activity relationships (SAR). Salvarsan, an arsenic compound used for treating syphilis, was perhaps the first example of a biologically active substance improved by systematic investigation and modification. At the same time, chemists like Emil Fischer were instrumental in synthesizing naturally occurring substances such as carbohydrates and proteins, thus extending the scope of organic synthesis into biochemistry.

The revolution in structure determination initiated by physicists led to vastly improved synthesis and study of bioactive substances. At this point rational drug discovery began to take shape. Chemists working in tandem with biologists made hundreds of substances which were tested for their efficacy against various diseases, and knowledge from biological testing was in turn translated into modifications of the starting compounds. The first successful example of such rational efforts was the development of the sulfa drugs, used to treat infections from the 1930s onward (3). These compounds were the first broadly effective antibacterial agents; penicillin, by contrast, had been discovered serendipitously by Alexander Fleming in 1928 (4) but came into widespread use only later.

Rational drug discovery received a substantial impetus from the post-World War 2 breakthroughs in structure determination by x-ray crystallography, which revealed the structures of small molecules, proteins and DNA. The discovery of the structure of DNA in 1953 by Watson and Crick heralded the advent of molecular biology (5). This landmark event led in succession to the elucidation of the genetic code and of the transfer of genetic information from DNA to RNA that results in protein synthesis. One of the first protein structure determinations, that of hemoglobin by Perutz (6), was followed by the structure determination of several other proteins, some of which were pharmacologically important. Such advances, along with preceding ones by Pauling and others (7), led to the elucidation of common motifs in proteins such as alpha helices and beta sheets. The simultaneous growth of techniques in biological assaying and enzyme kinetics made it possible to monitor the binding of drugs to biomolecules. At the same time, better application of statistics and the standardization of double-blind, controlled clinical trials caused a fundamental change in the testing and approval of new medicines. A particularly noteworthy example of one of the first drugs discovered through such rational investigation is cimetidine (8), a drug for ulcers and acid reflux that was for several years the best-selling drug in the world.

Structure-based drug design and CADD

As x-ray structures of protein-ligand complexes began to emerge in the 70s and 80s, rational drug discovery received enormous benefits. This development was accompanied by High-Throughput Screening (HTS), the ability to rapidly test thousands of ligands against a protein target to identify likely binders. These studies led to what is today known as “structure-based drug design” (SBDD) (9). In SBDD, the structure of a protein bound to a ligand is used as a starting point for further modification and improvement of the properties of the drug. While care has to be taken to fit the structure well to the electron density in the data (10), well-resolved data can greatly help in identifying points of contact between the drug and the protein active site, as well as the presence of special chemical moieties such as metals and cofactors. Water molecules identified in the active site can play crucial roles in bridging interactions between the protein and the ligand (11). Early examples of drugs discovered using structure-based design include Captopril (12) (an angiotensin-converting enzyme inhibitor for hypertension) and Trusopt (13) (a carbonic anhydrase inhibitor for glaucoma), and recent examples include Aliskiren (14) (a renin inhibitor for hypertension) and the HIV protease inhibitors (13).

As SBDD progressed, another approach, ligand-based design (LBD), also emerged. Obtaining x-ray structures of drugs bound to proteins is still a tricky endeavor, and one is often forced to proceed on the basis of the structures of active compounds alone. Techniques developed to tackle this problem include QSAR (Quantitative Structure-Activity Relationships) (15) and pharmacophore construction, in which the features essential for a particular ligand to bind to a certain protein are conjectured from affinity data for several similar and dissimilar molecules. Molecules based on the minimal set of interacting features are then synthesized and tested. However, since molecules can frequently adopt diverse conformations when binding to a protein, care has to be exercised in developing such hypotheses. In addition, it is relatively easy to be led astray by a high correlation within the affinity data of the training set. It is paramount in such cases to remember the general distinction between correlation and causation; overfitted models can produce spurious correlations that reflect no causal relationship (16). While LBD is more recent than SBDD, it has turned out to be valuable in certain cases. Noteworthy is a recent example in which an inhibitor of NAADP signaling was discovered by shape-based virtual screening (17) (vide infra).
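
The similarity comparison at the heart of such ligand-based methods can be sketched in a few lines of Python. This is a toy illustration only: it assumes molecular "fingerprints" are already available as sets of feature bits (real toolkits compute them from the structures themselves), and the compound names and bit sets below are entirely invented.

```python
# Toy sketch of ligand-based similarity screening. Fingerprints are
# represented as Python sets of integer "feature bits"; all molecules
# and bit assignments here are hypothetical.

def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared bits / total distinct bits."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

# Hypothetical fingerprint of a known active ligand
query = {1, 4, 7, 9, 12}

# Hypothetical screening library: name -> fingerprint
library = {
    "cmpd_A": {1, 4, 7, 9, 13},    # very similar to the query
    "cmpd_B": {2, 5, 8},           # dissimilar
    "cmpd_C": {1, 4, 9, 20, 21},   # moderately similar
}

# Rank the library by similarity to the known active
ranked = sorted(library.items(),
                key=lambda kv: tanimoto(query, kv[1]), reverse=True)
for name, fp in ranked:
    print(name, round(tanimoto(query, fp), 3))
```

Compounds most similar to the known active rise to the top of the list; in a real campaign these would be the ones sent for assay.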

As rational drug discovery progressed, the software and hardware capacities of computers also grew exponentially, and CADD (Computer-Aided Drug Design) began to be increasingly applied to drug discovery. An effort was made to integrate CADD into the traditional chemistry and biology workflow; its principal development took place in the pharmaceutical industry, although academic groups were also instrumental in developing some capabilities (18). The declining costs of memory and storage, increasing processing power and facile computer graphics software put CADD within the grasp of even relatively untrained computational or experimental scientists. While the general verdict on the contribution of CADD to drug discovery is still forthcoming, CADD was an important component in the discovery and development of many drugs currently on the market (19). Many calculations that were once impractical because of constraints of time and computing power can now be routinely performed, some on a common desktop. Currently the use of CADD in drug design aims to address three principal problems, all of which are valuable to drug discovery.

Virtual Screening

Virtual screening (VS) refers to the ability to computationally test thousands or millions of potential ligands against a protein, distinguish actives from inactives and rank the ‘true’ binders in a certain top fraction. If validated, VS would serve as a valuable complement, if not a substitute, for HTS and would save significant amounts of resources and time. Just like HTS, VS has to circumvent the problems of false positives and false negatives, the latter of which are in some ways more insidious since by definition they would never be identified. VS can be either structure-based or ligand-based. Both approaches have enjoyed partial success, although recent studies have found that 3D ligand-based techniques, in which candidate structures are compared to known active ligands by means of certain similarity metrics, can achieve greater hit rates than structure-based techniques (20). Virtual libraries of molecules such as DUD (21) (Directory of Useful Decoys) and ZINC (22) have been built to test the performance of VS programs and compare them with each other. These libraries typically consist of a few actives and several thousand decoys, the goal being to rank the true actives above the decoys using some metric.

Paramount in such retrospective assessment is an accurate method for evaluating the success and failure of these methods (23,24). Until now ‘enrichment factors’ have mostly been used for this purpose (24). The EF measures how many ‘true’ actives rank in a certain top fraction (typically 1% or 10%) of the screened database, relative to what random selection would yield. However, the EF suffers from certain drawbacks, such as being dependent on the number of decoys in the dataset. To circumvent this problem, recent studies have suggested the use of the ROC (Receiver Operating Characteristic) curve, a graph that plots the true positive rate against the false positive rate (24,25) (Figure 1). The curve indicates the false positive rate incurred for a given true positive rate, and the measured quantity is the Area Under the Curve (AUC). Completely random performance gives a straight diagonal line (AUC 0.5), while better performance results in a curve bowed toward the upper left (AUC > 0.5).
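
Both metrics are simple enough to compute directly. The sketch below, with invented (score, is_active) pairs standing in for the output of a real screening program, computes an enrichment factor and the ROC AUC via the rank-based (Mann-Whitney) formulation, i.e. the probability that a randomly chosen active outscores a randomly chosen decoy.

```python
# Minimal sketch of the two retrospective VS metrics discussed above.
# The scores and activity labels are made up for illustration.

def enrichment_factor(scored, fraction=0.1):
    """EF: actives found in the top fraction / actives expected at random."""
    ranked = sorted(scored, key=lambda x: x[0], reverse=True)
    n_top = max(1, int(len(ranked) * fraction))
    actives_total = sum(label for _, label in ranked)
    actives_top = sum(label for _, label in ranked[:n_top])
    return (actives_top / n_top) / (actives_total / len(ranked))

def roc_auc(scored):
    """AUC by the Mann-Whitney formulation: fraction of (active, decoy)
    pairs in which the active outscores the decoy (ties count half)."""
    actives = [s for s, label in scored if label == 1]
    decoys = [s for s, label in scored if label == 0]
    wins = sum((a > d) + 0.5 * (a == d) for a in actives for d in decoys)
    return wins / (len(actives) * len(decoys))

# Hypothetical (score, is_active) pairs: actives mostly score high
scored = [(0.95, 1), (0.9, 1), (0.85, 0), (0.8, 1), (0.7, 0),
          (0.6, 0), (0.5, 1), (0.4, 0), (0.3, 0), (0.2, 0)]

print(enrichment_factor(scored, fraction=0.2))  # > 1 means enrichment
print(roc_auc(scored))                          # > 0.5 means better than random
```

Note how the EF depends on the chosen fraction and on the active/decoy ratio, while the AUC does not; this is exactly the drawback of the EF mentioned above.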


Figure 1: ROC curve for three different VS scenarios. Completely random performance will give the straight white line (AUC 0.5), an ideal performance (no false positives and all true positives) will give the red line (AUC 1.0) and a good VS algorithm will produce the yellow curve (0.5 < AUC < 1.0)

Until now VS has provided limited evidence of success. Yet its capabilities are improving, and it has become part of the computational chemist’s standard repertoire. In some cases VS can provide more hits than HTS (26), and in others VS at the very least provides a method to narrow down the number of compounds actually assayed (27). As advances in SBDD and LBD continue, the power of VS to identify true actives will undoubtedly increase.

Pose prediction

The second goal sought by computational chemists is to predict the binding orientation of a ligand in the binding pocket of a protein, a task that falls within the domain of SBDD. This endeavor, if successful, will provide an enormous benefit in cases where crystal structures of protein-ligand complexes are not easily obtained. Since such cases are still very common, pose prediction continues to be both a challenge and a valuable objective. There are two principal problems in pose prediction. The first relates to scoring the poses obtained, so that the top-scoring pose can be identified as the ‘real’ pose; current docking programs are notorious for their scoring unreliability, certainly in an absolute sense and sometimes even in a relative sense. The problem of pose prediction is ultimately defined by the ability of an algorithm to find the global minimum orientation and conformation of a ligand on the potential energy surface (PES) generated by the protein active site (28). As such, it is susceptible to the inadequacies inherent in comprehensively sampling a complex PES. Frequently, however, as in the case of CDK7, past empirical data, including knowledge of the poses of known actives (roscovitine in this case), provides confidence about the pose of an unknown ligand.
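
The sampling difficulty can be illustrated with a deliberately simple one-dimensional analogue: finding the global minimum of a rugged "energy" function of a single torsion angle. The energy function below is invented purely for illustration; real docking must sample many coupled degrees of freedom (torsions, translations, rotations) at once.

```python
import math

def energy(theta):
    """Hypothetical torsional 'PES' (theta in degrees) with several minima."""
    t = math.radians(theta)
    return 2.0 * math.cos(3 * t) + 0.5 * math.cos(t + 0.8)

# Coarse grid search over the full angle range: cheap, but the grid spacing
# decides whether we land in the global basin or in a local one.
best_theta = min(range(0, 360, 5), key=energy)

# Local refinement on a fine grid around the best coarse point
fine = [best_theta + d / 10.0 for d in range(-50, 51)]
best_theta = min(fine, key=energy)

print(round(best_theta, 1), round(energy(best_theta), 3))
```

Even in one dimension a coarser grid could miss the global basin entirely; the combinatorial version of that failure, over dozens of coupled degrees of freedom, is what makes comprehensive sampling of a real PES so hard.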

The second serious problem in pose prediction is the inability of many current algorithms to adequately sample protein motion. X-ray structures provide only a static snapshot of ligand binding that may obscure considerable conformational changes in protein motifs. Molecular dynamics simulations followed by docking (‘ensemble docking’) have remedied this limitation to some extent (29), induced-fit docking algorithms have now been included in programs such as GLIDE (30), and complementary information from dynamical NMR studies may guide the selection among several protein conformations. Yet simulating large-scale protein motions is still outside the reach of most MD simulations, although significant progress has been made in recent years (31,32).

An example of how pose prediction can shed light on anomalous binding modes, and possibly save time and financial resources, was encountered by the present author during his study of a paper detailing the development of inhibitors of the p38 MAP kinase (33). In one instance the authors followed the SAR data in the absence of a crystal structure and observed contradictory changes in activity upon structural modification. Crystallography on the protein-ligand complex finally revealed an anomalous conformation of the ligand in which the oxygen of an amide at the 2 position of a thiophene was cis to the thiophene sulfur, when chemical intuition would have expected it to be trans. The crystal structure showed that an unfavorable interaction of a negatively charged glutamate with the sulfur in the more common trans conformation forced the sulfur to adopt the slightly unfavorable cis position with respect to the amide oxygen. Notably, this preference was seen in all of the top five GLIDE poses of the docked compound. This example indicates that, at least in some cases, pose prediction could serve as a valuable time-saving complement and possible alternative to crystallography.

Binding affinity prediction

The third goal is possibly the most challenging endeavor for computational chemistry. Rank-ordering ligands by binding affinity involves accurate scoring, which as noted above is a recalcitrant problem. The problem is a fundamental one, since it really involves calculating absolute free energies of protein-ligand binding. The most accurate and sophisticated approaches for calculating these energies are the Free-Energy Perturbation (FEP) (34) and Thermodynamic Integration (TI) methods, based on MD simulations and statistical thermodynamics. These methods involve ‘mutating’ one ligand into another through a series of small alchemical steps and evaluating the free energy change at every step. As of now, these are some of the most computationally expensive techniques in the field, which typically limits their use to evaluating free energy changes between ligands that differ little in structure. The successful examples where they have found their greatest use therefore involve cases where small substituents on aromatic rings are modified to evaluate changes in binding affinity (35). However, as computing power grows, these techniques will continue to find more applications in drug discovery.
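
The statistical-mechanical core of FEP is the Zwanzig relation, ΔF = -kT ln⟨exp(-(U1 - U0)/kT)⟩₀, averaged over configurations sampled from the reference state. The toy sketch below applies it to two one-dimensional harmonic "states" sampled by a crude Metropolis walk; the potentials, temperature and sampler are invented stand-ins for a real MD ensemble, and the exact answer for two equal-width wells is zero.

```python
# Toy numerical sketch of the Zwanzig free-energy perturbation estimator.
import math, random

kT = 0.6  # roughly room temperature in kcal/mol

def U0(x):
    return 0.5 * (x - 0.0) ** 2   # reference state: harmonic well at 0

def U1(x):
    return 0.5 * (x - 0.1) ** 2   # perturbed state: same well, shifted

# Crude Metropolis sampling of the reference state
random.seed(1)
x, samples = 0.0, []
for step in range(100_000):
    x_new = x + random.uniform(-0.5, 0.5)
    if random.random() < math.exp(-(U0(x_new) - U0(x)) / kT):
        x = x_new
    samples.append(x)

# Zwanzig estimator for the free energy difference F1 - F0
avg = sum(math.exp(-(U1(xi) - U0(xi)) / kT) for xi in samples) / len(samples)
dF = -kT * math.log(avg)
print(round(dF, 4))  # exact answer is 0 for two equal-width harmonic wells
```

The exponential average converges well here only because the two "states" overlap strongly; for dissimilar ligands the overlap vanishes and the estimate becomes useless, which is precisely why FEP is restricted to small structural changes.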

Apart from these three goals, a major goal of computational science in drug discovery is to aid the later stages of drug development, when pharmacokinetics (PK) and ADMET (Absorption, Distribution, Metabolism, Excretion, Toxicity) issues are key. Optimizing the binding affinity of a particular compound to a protein only results in an efficient ligand, not necessarily an efficient drug. Computational chemistry can make valuable contributions to these later developmental stages by predicting the relevant properties of ligands early on, thus limiting the typically high attrition of drug candidates in the advanced phases. While much remains to be accomplished in this context, some progress has been made (36). For example, the well-known Lipinski Rule of Five (37) provides a set of physicochemical property ranges associated with good oral bioavailability, and computational approaches are starting to help evaluate these properties during the early stages. The QikProp program developed by Jorgensen et al. calculates properties like Caco-2 cell permeability, possible metabolites, % absorption in the GI tract and logP values (38). Such programs are still largely empirical, depending on a large dataset of properties of known drugs for comparison and fitting.
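
As a sketch of how such a physicochemical filter works, the following implements the four Rule of Five cut-offs. The property values for the example compound are invented; in practice they would be computed from the structure by a cheminformatics toolkit.

```python
# Sketch of a Lipinski Rule of Five filter. Lipinski's observation was that
# poor absorption is more likely when two or more of the rules are broken.

def lipinski_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count violations of the Rule of Five cut-offs."""
    violations = 0
    if mol_weight > 500:
        violations += 1
    if logp > 5:
        violations += 1
    if h_donors > 5:
        violations += 1
    if h_acceptors > 10:
        violations += 1
    return violations

# Hypothetical compound: MW 520, logP 5.6, 2 H-bond donors, 7 acceptors
n = lipinski_violations(520.0, 5.6, 2, 7)
print(n, "violations ->", "flag" if n >= 2 else "pass")
```

A filter this crude is only a first triage; programs like QikProp go well beyond it, but the principle of flagging liabilities before synthesis is the same.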

Models, computers and drug discovery

In applying models to designing drugs and simulating their interactions with proteins, the most valuable lesson to remember is that these are models. Models seldom mirror reality; in fact they may often succeed in spite of reality. They are usually designed not to simulate reality but to produce results that agree with experiment, and there are many approaches that produce such results without encompassing all the factors operating in real environments. In QSAR, for instance, it has been shown that adding enough parameters to a model can produce a good fit to the data with a high correlation coefficient. However, the model may be overfitted; that is, it may seem to fit the known data very well but fail to predict unknown data, which is what it was designed to do (16,39). In such cases, more rigorous statistics and validation procedures such as cross-validation and bootstrapping (leaving out part of the data and checking whether the model predicts it) can lead to improved results (39).
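
The overfitting trap is easy to demonstrate numerically. In the synthetic example below, the "activity" of six hypothetical compounds is linear in a single made-up descriptor plus fixed noise; a two-parameter least-squares line predicts a held-out compound far better than a five-parameter polynomial that fits the training data perfectly.

```python
# Toy demonstration of overfitting. All numbers are synthetic: the true
# trend is y = x + 0.5, with fixed "experimental" noise added.

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.2, -0.3, 0.25, -0.1, 0.3, -0.2]
ys = [x + 0.5 + e for x, e in zip(xs, noise)]

# Hold out the last point; train on the rest (a leave-one-out style check)
x_test, y_test = xs[-1], ys[-1]
x_tr, y_tr = xs[:-1], ys[:-1]

# Simple model: two-parameter least-squares line
n = len(x_tr)
mx, my = sum(x_tr) / n, sum(y_tr) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(x_tr, y_tr))
         / sum((x - mx) ** 2 for x in x_tr))
intercept = my - slope * mx

# "Overfitted" model: the degree-4 polynomial through all five training
# points (Lagrange interpolation), i.e. zero training error by construction
def poly(x):
    total = 0.0
    for i, (xi, yi) in enumerate(zip(x_tr, y_tr)):
        term = yi
        for j, xj in enumerate(x_tr):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

err_line = abs(slope * x_test + intercept - y_test)
err_poly = abs(poly(x_test) - y_test)
print(round(err_line, 2), round(err_poly, 2))  # the simple line predicts far better
```

The polynomial's perfect training fit is exactly the high correlation coefficient that should arouse suspicion; the held-out point exposes it immediately, which is what cross-validation does systematically.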

Models can also be accepted in spite of outliers. A model accepted on the strength of a high correlation coefficient, say 0.85, may nonetheless leave one or two outliers unexplained. It then becomes important to be aware of the physical anomaly that the outliers represent. The reason is clear: if the variable producing the outlier was not part of the model building, then applying the well-trained model to a system where that particular variable suddenly becomes dominant will result in failure. Such outliers, termed ‘black swans’, can prove extremely deleterious if their value is unusually high, a phenomenon well known in financial engineering (40). In modeling, for instance, if the training set for a docking model consists largely of lipophilic protein active sites, the model may fail to deliver cogent results when applied to ligands binding to a protein with an anomalously polar or charged active site. If that protein is unusually important for a particular pharmaceutical project, an inability to predict its behavior under unforeseen circumstances may lead to costly losses. Clearly, in this case the relevant physical variable, the polarity of the active site, was not taken into account, even though the model delivered a high initial correlation merely through the addition of a large number of parameters or descriptors, none of which was related in a significant way to the polarity of the binding pocket. The difference between correlation and causation is especially relevant in this respect. This hypothetical example illustrates one of the limitations of models noted above: they may bear little relationship to actual physical phenomena and yet fit the data well enough, for various reasons, to elicit confidence in their predictive ability.

In summary, models of the kind that are used in computational chemistry have to be carefully evaluated, especially in the context of practical applications like drug discovery where time and financial resources are valuable. Training the model on high-quality datasets, reiterating the difference between correlation and causation and better application of statistics and bootstrapping can help to avert model failure.

In the end however, it is experiment that is of paramount importance for building the model. Inaccurate experimental data with uncertain error margins will undoubtedly hinder the success of every subsequent step in model building. To this end, generating, presenting and evaluating accurate experimental data is a responsibility that needs to be fulfilled by both computational chemists and experimentalists, and it is only a fruitful and synergistic alliance between the two groups that can help overcome the complex challenges in drug discovery.


(1) Berson, J. A. Chemical creativity : ideas from the work of Woodward, Hückel, Meerwein and others; 1st ed.; Wiley-VCH: Weinheim ; Chichester, 1999.
(2) Paterson, I.; Anderson, E. A. Science 2005, 310, 451-3.
(3) Hager, T. The demon under the microscope : from battlefield hospitals to Nazi labs, one doctor's heroic search for the world's first miracle drug; 1st ed.; Harmony Books: New York, 2006.
(4) Macfarlane, G. Alexander Fleming, the man and the myth; Oxford University Press: Oxford [Oxfordshire] ; New York, 1985.
(5) Judson, H. F. The eighth day of creation : makers of the revolution in biology; Expanded ed.; CSHL Press: Plainview, N.Y., 1996.
(6) Ferry, G. Max Perutz and the secret of life; Cold Spring Harbor Laboratory Press: New York, 2008.
(7) Hager, T. Linus Pauling and the chemistry of life; Oxford University Press: New York, 1998.
(8) Black, J. Annu Rev Pharmacol Toxicol 1996, 36, 1-33.
(9) Jhoti, H.; Leach, A. R. Structure-based drug discovery; Springer: Dordrecht, 2007.
(10) Davis, A. M.; Teague, S. J.; Kleywegt, G. J. Angew. Chem. Int. Ed. Engl. 2003, 42, 2718-36.
(11) Ball, P. Chem. Rev. 2008, 108, 74-108.
(12) Smith, C. G.; Vane, J. R. FASEB J. 2003, 17, 788-9.
(13) Kubinyi, H. J. Recept. Signal Transduct. Res. 1999, 19, 15-39.
(14) Wood, J. M.; Maibaum, J.; Rahuel, J.; Grutter, M. G.; Cohen, N. C.; Rasetti, V.; Ruger, H.; Goschke, R.; Stutz, S.; Fuhrer, W.; Schilling, W.; Rigollier, P.; Yamaguchi, Y.; Cumin, F.; Baum, H. P.; Schnell, C. R.; Herold, P.; Mah, R.; Jensen, C.; O'Brien, E.; Stanton, A.; Bedigian, M. P. Biochem Biophys Res Commun 2003, 308, 698-705.
(15) Hansch, C.; Leo, A.; Hoekman, D. H. Exploring QSAR; American Chemical Society: Washington, DC, 1995.
(16) Doweyko, A. M. J. Comput. Aided Mol. Des. 2008, 22, 81-9.
(17) Naylor, E.; Arredouani, A.; Vasudevan, S. R.; Lewis, A. M.; Parkesh, R.; Mizote, A.; Rosen, D.; Thomas, J. M.; Izumi, M.; Ganesan, A.; Galione, A.; Churchill, G. C. Nat. Chem. Biol. 2009, 5, 220-6.
(18) Snyder, J. P. Med. Res. Rev. 1991, 11, 641-62.
(19) Jorgensen, W. L. Science 2004, 303, 1813-8.
(20) McGaughey, G. B.; Sheridan, R. P.; Bayly, C. I.; Culberson, J. C.; Kreatsoulas, C.; Lindsley, S.; Maiorov, V.; Truchon, J. F.; Cornell, W. D. J. Chem. Inf. Model. 2007, 47, 1504-19.
(21) Huang, N.; Shoichet, B. K.; Irwin, J. J. J. Med. Chem. 2006, 49, 6789-801.
(22) Irwin, J. J.; Shoichet, B. K. J. Chem. Inf. Model. 2005, 45, 177-82.
(23) Jain, A. N.; Nicholls, A. J. Comput. Aided Mol. Des. 2008, 22, 133-9.
(24) Hawkins, P. C.; Warren, G. L.; Skillman, A. G.; Nicholls, A. J. Comput. Aided Mol. Des. 2008, 22, 179-90.
(25) Triballeau, N.; Acher, F.; Brabet, I.; Pin, J. P.; Bertrand, H. O. J. Med. Chem. 2005, 48, 2534-47.
(26) Babaoglu, K.; Simeonov, A.; Irwin, J. J.; Nelson, M. E.; Feng, B.; Thomas, C. J.; Cancian, L.; Costi, M. P.; Maltby, D. A.; Jadhav, A.; Inglese, J.; Austin, C. P.; Shoichet, B. K. J. Med. Chem. 2008, 51, 2502-11.
(27) Peach, M. L.; Tan, N.; Choyke, S. J.; Giubellino, A.; Athauda, G.; Burke, T. R.; Nicklaus, M. C.; Bottaro, D. P. J. Med. Chem. 2009.
(28) Jain, A. N. J. Comput. Aided Mol. Des. 2008, 22, 201-12.
(29) Rao, S.; Sanschagrin, P. C.; Greenwood, J. R.; Repasky, M. P.; Sherman, W.; Farid, R. J. Comput. Aided Mol. Des. 2008, 22, 621-7.
(30) Sherman, W.; Day, T.; Jacobson, M. P.; Friesner, R. A.; Farid, R. J. Med. Chem. 2006, 49, 534-53.
(31) Shan, Y.; Seeliger, M. A.; Eastwood, M. P.; Frank, F.; Xu, H.; Jensen, M. O.; Dror, R. O.; Kuriyan, J.; Shaw, D. E. PNAS 2009, 106, 139-44.
(32) Jensen, M. O.; Dror, R. O.; Xu, H.; Borhani, D. W.; Arkin, I. T.; Eastwood, M. P.; Shaw, D. E. PNAS 2008, 105, 14430-5.
(33) Goldberg, D. R.; Hao, M. H.; Qian, K. C.; Swinamer, A. D.; Gao, D. A.; Xiong, Z.; Sarko, C.; Berry, A.; Lord, J.; Magolda, R. L.; Fadra, T.; Kroe, R. R.; Kukulka, A.; Madwed, J. B.; Martin, L.; Pargellis, C.; Skow, D.; Song, J. J.; Tan, Z.; Torcellini, C. A.; Zimmitti, C. S.; Yee, N. K.; Moss, N. J. Med. Chem. 2007, 50, 4016-26.
(34) Jorgensen, W. L.; Thomas, L. L. J. Chem. Theor. Comp. 2008, 4, 869-876.
(35) Zeevaart, J. G.; Wang, L. G.; Thakur, V. V.; Leung, C. S.; Tirado-Rives, J.; Bailey, C. M.; Domaoal, R. A.; Anderson, K. S.; Jorgensen, W. L. J. Am. Chem. Soc. 2008, 130, 9492-9499.
(36) Martin, Y. C. J. Med. Chem. 2005, 48, 3164-70.
(37) Lipinski, C. A.; Lombardo, F.; Dominy, B. W.; Feeney, P. J. Adv. Drug. Del. Rev. 1997, 23, 3-25.
(38) Ioakimidis, L.; Thoukydidis, L.; Mirza, A.; Naeem, S.; Reynisson, J. Qsar & Comb. Sci. 2008, 27, 445-456.
(39) Hawkins, D. M. J. Chem. Inf. Comput. Sci. 2004, 44, 1-12.
(40) Taleb, N. The black swan : the impact of the highly improbable; 1st ed.; Random House: New York, 2007.

Demolishing the 'vacuous' argument for the RRW

The Reliable Replacement Warhead program (RRW) is a long-standing and controversial proposal that aims to replace aging plutonium pits and other parts in nukes to modernize the current US nuclear arsenal. Proponents of the RRW say that having nuclear weapons of dubious function and quality will defeat the basic purpose of a deterrent, and that modernizing the arsenal and replacing worn-out parts is therefore essential for its very existence. In addition, Russia has indicated that it is modernizing its nuclear arsenal, and this fact has put further pressure on implementation of the RRW. Opponents of the RRW say that by modernizing the arsenal the US will send the wrong signal to the rest of the world, indicating that nuclear deterrence and weapons development are still an important part of US defense strategy. In any case, very recently the top government advisory group JASON (whose members have included Freeman Dyson) did a study on the central plutonium 'pits' in nuclear weapons and concluded that these would last for at least half a century, if not more. Since then several arguments have continued to float around for the RRW.

In the latest issue of The Bulletin, Jeffrey Lewis and Kingston Reif do a neat and clean job of demolishing the latest argument, made by General Chilton, who of all possible reasons bases his case on vacuum tubes, the point being that outdated vacuum tubes in nukes necessitate replacement. The last line is priceless and not exactly BAS-like:
Firstly, vacuum tubes are not used in the physics package of a single nuclear weapon design. Vacuum tubes are used only in the radar-fuse, which tells the firing system when the bomb is at the correct altitude for detonation, in some modifications (mods) of one warhead design, the B61 gravity bomb. In total, the B61 bombs that have vacuum tubes in their radar-fuses account for only about one in ten operationally deployed warheads. (Vacuum tubes are used in the radars of three B61 mods: 3, 4, and 7. Mods 10 and 11 have newer radars that use solid-state electronics.) The fuses in these weapons are old, but perfectly functional. To reiterate, vacuum tubes are not in use in any other warhead design, including the W76 warhead, a portion of which would be replaced by the first RRW warhead, the WR1, if it ever were funded and developed.

Secondly, the Energy Department has routinely replaced radars without nuclear testing or redesigning the physics package. In fact, during the 1990s, Sandia National Laboratories scientists developed the MC4033 common radar, which uses solid-state electronics, for planned refurbishments of the B61 and B83 gravity bombs. All B83 bombs now use the common radar, though similar plans to fit a new radar on all B61s have been repeatedly deferred.

Most recently, in 2006, Sandia planned to replace the remaining B61 vacuum tube radars as part of ALT 364/365/366. The National Nuclear Security Administration, which oversees the nuclear weapons complex, canceled these latest ALTs, which would have resulted in the removal of the last vacuum tubes from the U.S. nuclear stockpile, because the U.S. Air Force preferred replacement to life extension. Due to this absurd twist, one could say that vacuum tubes remain in the U.S. nuclear arsenal in part because of the RRW, contrary to Chilton's insistence that the RRW is needed to get rid of them.

The bottom line is that vacuum tubes are used only sparingly in the U.S. nuclear arsenal and can be replaced on short notice if the need arises, independent of whether Congress funds the RRW Program. Of the many reasons that Defense and Energy officials have put forth to justify the RRW Program, the need to replace vacuum tubes is the worst and has no place in the debate about the RRW or modernizing the nuclear stockpile. General Chilton can stick that prop in his, um, pocket.

FDA does, and should, stick to only science


In a welcome reversal of a key politics-driven Bush-era mandate, the FDA has approved the Plan B morning-after emergency contraception pill for 17-year-olds. Previously, the FDA's reluctance to approve the product had rightly led a senior official to resign. Not surprisingly, this decision drew the wrath of conservative groups who say that the pill would "encourage promiscuity". This statement is rather typical of conservative opposition to abortion and to the promotion of contraceptive measures in schools, in spite of the fact that abstinence-only programs have been shown to cause essentially no change, or even an increase, in "promiscuity".

But here's the thing, and it should have been clear all along: the FDA should stick to science and nothing else. Just as the conservative FDA officials of the Bush era were utterly out of line in opposing Plan B because of political and religious interests, so liberals should not applaud the FDA's decision as a moral value judgement. The business of the FDA is to determine the efficacy and safety of medical products, period. The moment it starts to pontificate on the moral or political value of its decisions, it immediately sets itself on a slippery slope.

So just as the NAS and the NCSE should stick to demonstrating the evidence for evolution and the lack of evidence for ID/creationism, and not pass judgement on whether science and religion are compatible, so should the FDA stick to the science behind the approval of medical products. Not making political or religious statements, either conservative or liberal, would be in the best interests of both the FDA and society.

Timeless Classics

As someone who loves to read more than anything else, I have long also been addicted to classic textbooks. Some of the most memorable moments of my student life involved walking into ghostly libraries looking like medieval castles and dusting off inches of dust collected on tomes which I regarded as treasures, volumes of great works that had not been checked out in 25 years and were languishing in anonymity, begging to be touched and read.

Sadly, very few seem to bother with these books anymore, regarding them as outdated. I would bet that hardly any modern undergraduate I meet has browsed Pauling's classic "The Nature of the Chemical Bond", regarded by many as one of the most influential works in all of chemistry. In my opinion such students confuse outdated with poorly written. Many of the basics of chemistry don't change, and many of these old works provide crystal-clear treatments of fundamentals that are lacking in more modern books. As far as fundamentals go these books have stood the test of time, and several are still in print, although some regrettably are not. A comment by Srini about Morrison & Boyd brought back fond memories of favourite classics...

Morrison & Boyd: I have mentioned it before. Crystal-clear treatments of mechanism, with an especially outstanding chapter on electrophilic aromatic substitution. For organic chemistry, I have to admit that the newer book by Clayden et al. is probably the single best book on the subject I have seen, but the elegance of explanation in M & B is still hard to beat. I also remember the book by Roberts and Caserio being pretty good. For what it's worth, the book that compresses the most insight into the fewest words is a slim volume by Peter Sykes, whose clarity in explaining mechanistic concepts in short, crisp paragraphs is unmatched.

The Great Linus Pauling: I first saw "The Nature of the Chemical Bond" as a freshman. I then perceived it as boring and too detailed, and only later recognized its monumental significance. Many famous scientists, including Max Perutz and Francis Crick, learnt their chemistry from it and from Pauling's other books. The number of ground-breaking concepts that Pauling invented and put into this book is staggering; especially check out the chapters on hybridization and the partial ionic character of bonds. I have browsed all three editions, and the second one is probably the best-written, although the third edition is the most up-to-date and still in print. A measure of the book's significance in the history of science can be gained from the simple fact that after publication of the first edition the volume was cited no fewer than 16,000 times in the next ten years. One constantly finds new papers in journals like Science and Nature that still cite it. Pauling's "The Nature..." did for modern chemistry what the Principia did for natural philosophy; it infused its subject with logic and tied together disparate threads to formulate a comprehensive and lasting science.

As if one work were not enough, Pauling also authored "General Chemistry". Again, it's a model of simplicity and clarity (note for instance how he explains the source of the difference in the three pKa values of phosphoric acid), although its emphasis on more descriptive chemistry makes it look a little quaint. The text is still widely read and in print as a Dover reprint edition; I have had a copy on my shelf for a while now and recently saw one in the Barnes & Noble@GeorgiaTech.

Finally, "Introduction to Quantum Mechanics with Applications to Chemistry" co-authored with E Bright Wilson at Harvard was the first book to explain quantum mechanics to chemists. I will admit the book is not easy to read, but with effort one can find many gems in the first few chapters, especially the treatment of the hydrogen atom. Again, a Dover reprint is available and cheap.

With these three books, the prodigious Pauling secured his place in history not only as the greatest chemist of all time, but also as one of the greatest and most successful scientific writers of the century.

Glasstone: Samuel Glasstone was a remarkably prolific and versatile technical writer. It was in high school that I came across a compendium of nuclear science that he had written, "Sourcebook on Atomic Energy". It is hands down the single best example of technical science writing that I have come across, and wherever I have been since then, I have always had a copy on my bookshelf. The book is a model for comprehensive, all-inclusive writing that is clear as water from a virgin glacier. It also satisfies the difficult condition of being extremely valuable for both laymen and scientists. No praise for the book matches the praise that then-AEC chairman and Nobel laureate Glenn Seaborg penned in the foreword. Seaborg wrote that this was "technical writing at its very best" and that Glasstone was a man who had fortuitously come along at the right time to fulfill the needs of science and technology.

The range of Glasstone's writing is amazing: multi-volume works on physical chemistry and treatments of thermodynamics, electrochemistry, nuclear reactor engineering and even a sourcebook on space sciences. An exhaustively detailed and yet comprehensible book on the effects of nuclear weapons served as the standard declassified guide for years and is still in print. Although his PChem books are now really out of date, they educated me in the basics when I first found them, and I learnt a lot from his book on thermodynamics. But again, nothing beats "Sourcebook on Atomic Energy", a book against which I think every other technical work should be measured.

"Valence" by Charles Coulson: Very few people in history have had the capacity to be both fine scientists and excellent writers. Charles Coulson's book was one of the first, along with Pauling and Wilson's, to make quantum chemistry comprehensible to students. When it comes to pedagogical explanation it's hard to beat the British, and this is one of the finest examples. I was fortunate to secure a copy at the famous Powell's bookstore in Portland, OR.

Classic books are like old wine. They should be cherished, preserved, and sampled one concept at a time.

The Pope of Cosmology 'very ill'

For a 67-year-old man with ALS who has already defied medical science, this is not good news at all. Remember what happened to Christopher Reeve: when you are in a condition like this, even otherwise routine ailments can become life-threatening.

I have recently been reading a lot about Hawking in Leonard Susskind's splendid book "The Black Hole War: My Battle with Stephen Hawking to Make the World Safe for Quantum Mechanics". The rather grandiose title obscures a perfectly entertaining and informative romp through the world of black holes; this is about as close as we laymen can get to black-hole thermodynamics, string theory and quantum mechanics without being drowned in a whirlpool of math. Susskind, a professor at Stanford, tells the story of the paradox of information falling into a black hole and supposedly disappearing, with lots of verve, hilarious personal anecdotes and tributes to famous physicists. As a prime participant in the debate, with Hawking on the other side, he is in a unique position to tell the story. His recounting of how the physicist Jacob Bekenstein used high-school math to derive the formula for the entropy of black holes is astounding; very rarely has someone used such simple physics and mathematics to discover such profound relationships. The feat reminded me of Bell's theorem, another spectacular twentieth-century physics result that can essentially be derived using high-school mathematics.

But more than anyone else, it is Hawking's figure that looms large in the book. Susskind describes how his physical disability, his strange disembodied computer voice and his astonishingly brilliant and creative mind guarantee, wherever he appears, the kind of reverence and silence that otherwise seems to be reserved for the Pope. Susskind vividly describes a typical Q & A session after a Hawking lecture: Hawking's physical condition means that he can compose even a yes/no answer only after several minutes, and what's striking is that during such times Susskind has witnessed audiences of thousands maintain stock-still silence, with not a whisper spoken for sometimes fifteen minutes, while the great man painfully composes his reply. Hawking may be the only living scientist whose presence provokes the utter, rapt silence and attention one would otherwise observe only during religious prayer. No wonder Hawking is compared to God by many, a comparison which only makes him uncomfortable. Susskind describes one occasion in a restaurant when a passerby dropped to his knees and virtually kissed Hawking's feet. Needless to say, Hawking was embarrassed and galled.

In any case, we can only hope that Hawking feels better. In one way, however, we can rest assured: Stephen Hawking's name has been etched in the annals of science forever. That's the power of ideas. Their timelessness assures us that they remain youthful and vibrant, irrespective of the age and condition of their source. But let's all hope Hawking springs back from this illness to his mischievous, witty self.

Assessing the known and unknown unknowns: WYSI(N)WYG

Ken Dill and David Mobley from UCSF have a really nice review in Structure on computational modeling of protein-drug interactions and the problems inherent in the process. I would strongly recommend anyone interested in the challenges of calculating protein-drug binding to read the review, if for nothing else than the copious references provided. The holy grail of most such modeling is to accurately calculate the free energy of binding, and for this we frequently start from a known structure of a protein-ligand complex. The main point the authors emphasize is that when we look at a single protein-ligand complex, determined either by crystallography or NMR, we are missing a lot of important things.

Perhaps the most important factor is entropy, which is not at all obvious from a single structure. Typically both the protein and the ligand populate several different conformations in solution, and both have to pay complex entropic penalties to bind one another. The ligand strain energy (usually estimated at 2-3 kcal/mol for most ligands) also plays an important role, and the desolvation cost for the ligand can figure prominently as well. In addition, both protein and ligand retain some residual entropy even in the bound state. As if this were not enough of a problem, much of the binding energy can come from the entropic gain engendered by the release of water molecules from active sites. Calculating all these entropies, for protein, ligand and solvent, is essential for accurately calculating the free energy of protein-ligand binding. But few methods can accomplish this complex task.

Among the methods reviewed in the article are most of the important ones in current use. The usual tradeoff is between cost and accuracy. Methods like docking are fast but inaccurate, although they can work well on relatively rigid and well-parameterized systems; docking also typically does not take protein motion and induced-fit effects into account. Slightly better are MM-PBSA and MM-GBSA, which, as the names indicate, combine docking poses with an implicit solvent model (PBSA or GBSA). Entropy, and especially protein entropy, is largely ignored, but since we are usually comparing similar ligands, such errors are expected to cancel. Among more advanced techniques, relative free-energy calculations use molecular dynamics (MD) to try to map the detailed potential energy surfaces of both protein and ligand. Absolute free-energy perturbation calculations are perhaps the gold standard for calculating free energies but are hideously expensive; they work best for simple ligands.

There is clearly a long way to go before calculation of ∆Gs becomes a practical endeavor in the pharmaceutical industry. Two factors contribute to the recalcitrance of the problem. The first, as indicated, is its sheer complexity: assessing the thermodynamic features of protein, ligand and solvent in multiple configurational and conformational states. The second is inherent in nature: the sensitivity of the binding constant to the free energy. The all-holy relation ∆G = -RT ln K ensures that an error of even 1 kcal/mol in the calculation translates into a large error in the binding constant. The myriad complex factors noted above ensure that errors of 2-3 kcal/mol already constitute the limit of what the best methods can give us. Recall that an error of 3 kcal/mol means that you are dead and buried.
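The sensitivity is easy to see numerically. Here is a minimal sketch (standard thermodynamics, not from the review) of how an error in a computed ∆G propagates to a fold-error in the binding constant at room temperature:

```python
import math

R = 1.987e-3   # gas constant, kcal/(mol*K)
T = 298.0      # room temperature, K
RT = R * T     # ~0.59 kcal/mol

# From dG = -RT ln K, an error dG_err in the computed free energy
# multiplies the predicted binding constant by exp(dG_err / RT).
for dg_err in (0.5, 1.0, 2.0, 3.0):
    fold = math.exp(dg_err / RT)
    print(f"{dg_err} kcal/mol error -> {fold:.0f}-fold error in K")
```

An error of 1 kcal/mol already corresponds to a roughly five-fold error in K; at 3 kcal/mol the predicted affinity is off by more than two orders of magnitude, which is exactly the "dead and buried" regime.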

But we push on. "One equal temper of heroic hearts, / Made weak by time and fate, but strong in will / To strive, to seek, to find, and not to yield." At some point we will reach 1 kcal/mol. And then we will sail.

Mobley, D., & Dill, K. (2009). Binding of Small-Molecule Ligands to Proteins: “What You See” Is Not Always “What You Get” Structure, 17 (4), 489-498 DOI: 10.1016/j.str.2009.02.010

Reading C P Snow and The Two Cultures

Over at The Intersection blog which I often read and comment on, Chris Mooney (author of "The Republican War on Science") has initiated an informal reading of C. P. Snow's "The Two Cultures". Anyone who is interested is more than welcome to read the influential and very short lecture and blog or comment on it. The schedule is listed in the post. The recommended edition is the Canto edition, with a very readable introduction by Stefan Collini. Incidentally it was this version that I read many years ago. Time now for a re-reading.

I have encountered Snow in two other interesting books. The first, "The Physicists: A Generation that Changed the World", was authored by him and contains clear and abundant photographs as well as recollections of and insights into some of the most famous physicists of the century, many of whom he knew closely. In it, for instance, I read his generous assessment of Enrico Fermi, which captures the supreme greatness of the man's talents and achievements:
"If Fermi had been born twenty years earlier, it is possible to envisage him first discovering Rutherford's nucleus and then discovering Bohr's atom. If this sounds like hyperbole, anything about Fermi is likely to sound like hyperbole"
Snow also thought that Robert Oppenheimer's real tragedy was not his sidelining and victimization during the 1950s witch hunts, but that he would gladly have traded all his fame, brilliance and glory for the privilege of making one timeless discovery like Pauli's exclusion principle.

Another book with Snow in it is a fascinating piece of "scientific fiction" written by John Casti. "The Cambridge Quintet: A Work Of Scientific Speculation" features four famous scientists and intellectuals- Ludwig Wittgenstein, Erwin Schrödinger, J B S Haldane and Alan Turing- being invited over to Snow's house for a multi-course dinner. As the dinner unfolds, so do the conversations between these stalwarts. The topic is artificial intelligence, and the participants hold forth in myriad and fascinating ways on the subject with excursions that not surprisingly take them into avenues like the philosophy of mind and language, epistemology and metaphysical questions. Very much worth reading.

In any case, I am looking forward to reading The Two Cultures again and writing about it. Anyone who is interested is more than welcome. The entire lecture is 50 pages and can be read in a few hours of thoughtful contemplation. The topic is as relevant today as it was then, which explains the lecture's enduring appeal. The consequences, though, could be vastly more pronounced today.

The rest is all noise: errors in R values, and the greatness of Carl Friedrich Gauss reiterated

One of the questions seldom asked when building a model or assessing experimental data is: "What's the error in that?" Unless we know the errors in the measurement of a variable, fitting predicted to experimental values may be a flawed endeavor. For instance, when one expects a linear relationship between calculated and experimental values and does not see it, it could mean either that there is a flaw in the underlying expectation or calculation (commonly deduced) or that there is a problem with the errors in the measurements (not always discussed).

Unfortunately it's not easy to find out the distribution of errors in experimental values; most of the time there are simply too few measurements to get a handle on the type of error, so the comfort of the law of large numbers and the central limit theorem remains out of reach. But in the absence of such concrete error estimation, nature has provided us with a wonderful default: assume that the errors are normally distributed. The Gaussian or normal distribution of quantities in nature is an observation and assumption that is remarkably consistent and spectacularly universal in its application. You can apply it to people's heights, car accidents, lengths of nails, frequency of sex, the number of photons emitted by a source and virtually any other variable. While the Gaussian distribution is not always followed (strictly speaking it applies only when the central limit theorem holds), I personally regard it as much of a view into the "mind of God" as anything else.

In any case, it's thus important to calculate the distribution of errors in a dataset, Gaussian or otherwise. In biological assays where compounds are tested, this becomes especially key. An illustration of the importance of error estimation in these common assays is provided by this recent analysis of model performance by Philip Hajduk and Steve Muchmore's group at Abbott. Essentially, they estimate the standard deviations or errors in a set of in-house measurements of compound activities and look at the effect of those errors on the R values obtained when comparing calculated activities with the experimental ones. The R value or correlation coefficient is a time-tested and standard measure of fit between two datasets. The authors apply the error they obtained, in the form of "Gaussian noise", to hypothetical sets of calculated vs measured activity plots with 4, 10 and 20 points. They find that after applying the error, the R value itself adopts a Gaussian distribution, varying from 0.7 to 0.9 in the case of the 20-point set. This immediately tells us that any real-world measurement that gives us, say, an R value of 0.95 is suspicious, since the probability of such a value arising is very low (0.1%) given the errors in the underlying data.
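The flavor of the authors' experiment can be reproduced in a few lines. The sketch below is my own toy version, with an assumed experimental error of 0.5 log units over a 4-log activity range (not the Abbott numbers): it adds Gaussian noise to a perfectly correlated 20-point dataset many times and looks at the resulting distribution of R.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_trials = 20, 10_000
true = np.linspace(0, 4, n_points)   # "true" activities spanning 4 log units
sigma = 0.5                          # assumed experimental error (log units)

# Add Gaussian noise to the perfect data and record the Pearson R each time
rs = np.array([np.corrcoef(true, true + rng.normal(0, sigma, n_points))[0, 1]
               for _ in range(n_trials)])
print(f"R: mean {rs.mean():.2f}, spread {rs.std():.3f}")
```

Even with a perfect underlying model, the achievable R is capped well below 1 by the noise alone, and R is itself a distribution rather than a single number, which is precisely why an isolated R of 0.95 should raise eyebrows.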

You know what should come next. The authors apply this methodology and look at several cases of calculated R values for various calculated vs measured biological activities in leading journals during 2006-2007. As they themselves say,
It is our opinion that the majority of R-values obtained from this (small) literature sample are unsubstantiated given the properties of the underlying data.
Following this analysis, they apply similar noise to measurements in High-Throughput Screening (HTS) and Lead Optimization (LO). Unlike HTS, LO usually deals with molecules synthesized sequentially by medicinal chemists and separated by small changes in activity. To investigate the effect of such errors, enrichment factors (EFs) are calculated for both scenarios. The EF denotes the fraction of active molecules found or "enriched" in the top fraction of screened molecules relative to random screening, with a higher EF corresponding to better performance. The observation for HTS is that small errors give large EFs, but what is interesting is that even large errors in measurement can give modest enrichment, thus obscuring the presence of such error. For LO the dependence of enrichment on error is smaller, reflecting the relatively small changes in activity engendered by structure optimization.
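To make the EF concrete, here is a toy simulation (my own illustration, not the paper's protocol): rank 1000 hypothetical compounds by a noisy measurement and ask how enriched the top 10% of the ranked list is in true actives relative to random screening.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
true_act = rng.normal(size=n)                  # hypothetical true activities
measured = true_act + rng.normal(0, 0.5, n)    # assumed measurement noise

actives = true_act > np.percentile(true_act, 90)   # call the top 10% "active"
top = np.argsort(measured)[::-1][: n // 10]        # top 10% by noisy ranking

ef = actives[top].mean() / actives.mean()          # hit rate vs random screening
print(f"EF at the top 10%: {ef:.1f} (10.0 would be perfect)")
```

Even with substantial noise the EF stays comfortably above 1, which is the authors' point: modest enrichment can mask quite large measurement errors.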

The take-home message from all this is of course that one needs to be aware of errors and to apply them in quantifying measures of model assessment. God is in the details, and in this case his name is Carl Friedrich Gauss, who must be constantly beaming from his Göttingen grave.

Brown, S., Muchmore, S., & Hajduk, P. (2009). Healthy skepticism: assessing realistic model performance Drug Discovery Today, 14 (7-8), 420-427 DOI: 10.1016/j.drudis.2009.01.012

Much ado about protein dynamics

Let me alert you, in case you haven't noticed, to the latest issue of Science, a special issue on protein dynamics.

There is much of merit here, but this article is especially relevant to drug discovery. It talks about how the binding of small molecules reshapes the energy landscape of protein conformational motion. One of the most useful ways of thinking about small molecule-protein interactions is to visualize a protein that fluctuates between several conformational states in equilibrium; a small molecule can inhibit the protein by preferentially stabilizing one of these states.
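A toy calculation shows how this population-shift picture works. In the sketch below (illustrative numbers I have chosen, not from the article), a protein equilibrates between inactive (I) and active (A) states, and a ligand that binds only the inactive state drags the population toward it:

```python
# Two-state conformational equilibrium: K_conf = [A]/[I] in the absence of ligand.
K_conf = 0.25      # active state is the minor species (20% of protein)
L = 1e-6           # free ligand concentration, 1 uM (assumed)
Kd_I = 1e-7        # ligand binds the inactive state with Kd = 100 nM (assumed)

# Without ligand the species are I and A. With ligand we add the complex IL,
# whose population relative to I is [IL]/[I] = L/Kd_I. The fraction of
# protein in the inactive form (free + bound) is then:
f_inactive_apo   = 1 / (1 + K_conf)
f_inactive_bound = (1 + L / Kd_I) / (1 + L / Kd_I + K_conf)
print(f"inactive fraction: {f_inactive_apo:.2f} -> {f_inactive_bound:.2f}")
```

A ten-fold excess of ligand over its Kd pulls the inactive population from 80% to nearly 98%: the protein is trapped in one state, which is the essence of how inhibitors like imatinib work in the article's description.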

The article illustrates this concept with several examples, most notably the inhibition of kinases. Many kinases exist in inactive and active states, and kinase inhibitors stabilize and block one of these states. Such conformational trapping can also reduce the mobility of the protein. The article describes how certain kinase inhibitors such as imatinib and nilotinib trap the kinase in the inactive state, while others such as dasatinib trap it in the active state. Although all three are classified as ATP-competitive inhibitors, dasatinib blocks the Bcr-Abl kinase by effecting an allosteric movement of a particular loop. Allosteric inhibitors of kinases are valuable since they don't target the highly conserved ATP-binding site, thus reducing problems with selectivity. But allosteric targeting is difficult, since it often involves shallow, poorly defined sites, including those involved in protein-protein interactions, and HTS campaigns aimed at disrupting protein-protein interactions usually give very poor results. However, recent tools, especially NMR with labeled residues, may improve the detection of weakly binding molecules that would be missed in assays (where the detection limit is usually around 30 µM). HSQC spectra are typically taken of the protein with and without the inhibitor, and changes in residue resonances can give an indication of conformational changes.

In any case, this article and the others are worth reading. The remodeling of protein energy landscapes by small molecules or other signals is a concept gaining central traction; it could essentially tie together the dual problems of protein folding and inhibiting proteins with small molecules.

Lee, G., & Craik, C. (2009). Trapping Moving Targets with Small Molecules Science, 324 (5924), 213-215 DOI: 10.1126/science.1169378

New ligands for everyone's favorite protein

A landmark event in structural biology and pharmacology occurred in 2007 when the structure of the ß2-adrenergic receptor was solved by X-ray crystallography by Brian Kobilka's and Raymond Stevens's groups at Stanford and Scripps respectively. The receptor was co-crystallized with the inverse agonist carazolol. Until then the only GPCR structure available was that of rhodopsin, and all homology models of GPCRs were based on it. The availability of this new high-resolution structure opened new avenues for structure-based GPCR ligand discovery.

The ß2 binding pocket is especially suited for drug design since it is tight, narrow and lined with mostly hydrophobic residues, with the polar residues well-separated. Two crucial residues, an Asp and a Ser, bind the ubiquitous charged amino nitrogen present in most catecholamines, while the aromatic portion of the molecule docks deep into the hydrophobic pocket. These features also make computational docking more facile; by contrast, a mix of polar and non-polar features with bridging waters can make docking and scoring more challenging.

Since the ß2 structure was published, attempts have been made to use it as a template to build homology models of other GPCRs. A couple of months ago I described an interesting proof-of-principle paper by Stefano Costanzi that investigated how well a homology model based on the ß2 would perform. In that study carazolol itself was docked into the homology model. Comparison with the original crystal structure revealed that while the ligand docked more or less satisfactorily, an important deviation in its orientation could be traced to a counterintuitive orientation of a Phe residue in the binding site. The study indicated that the devil is in the details when one is considering homology models.

However, finding ligands for the ß2 itself is also an important and interesting endeavor, and virtual screening can help. To this end, Brian Shoichet, Brian Kobilka and their groups have used the DOCK program to virtually screen one million lead-like ligands from their ZINC database against the ß2. From the one million ranked poses, they chose the top 500 compounds (0.05% of the database) and clustered them into 25 unique chemotypes, a choice also guided by visual inspection of the protein-ligand interactions and by commercial availability. They then tested these 25 compounds against the ß2 and found six with IC50s better than 4 µM. One of them, with an IC50 of 9 nM, is perhaps the most potent inverse agonist of the ß2 known. The binding poses revealed substantial overlap of similar functional groups with the carazolol structure. Two compounds turned out to have novel chemotypes and bore very little similarity to known ß2 ligands. A negative test was also run, in which a predicted binder was chemically modified so that it would not bind.
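The clustering step, collapsing hundreds of top-ranked compounds into a handful of chemotypes, can be sketched with a simple leader-style algorithm. Everything below is a hypothetical illustration with made-up fingerprint bit sets and an arbitrary cutoff; the authors used their own clustering plus visual inspection.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprint bit sets."""
    return len(a & b) / len(a | b)

def leader_cluster(fps, cutoff=0.4):
    """Greedy clustering: each compound joins the first cluster whose
    leader it resembles above the cutoff, else founds a new cluster."""
    leaders, clusters = [], []
    for i, fp in enumerate(fps):
        for j, leader in enumerate(leaders):
            if tanimoto(fp, leader) >= cutoff:
                clusters[j].append(i)
                break
        else:
            leaders.append(fp)
            clusters.append([i])
    return clusters

# Hypothetical fingerprints: compounds 0 and 1 share a scaffold, 2 does not
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11, 12}]
print(leader_cluster(fps))  # → [[0, 1], [2]]
```

Grouping by chemotype before assaying is what lets 500 ranked poses shrink to 25 purchasable representatives without testing near-duplicates.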

Interestingly, all the compounds found were inverse agonists. The ZINC library is somewhat biased toward aminergic ligands, as is much of explored chemical space; the catecholamine scaffold is one of the favourite scaffolds in medicinal chemistry. However, subtle differences in protein structure can sometimes turn an inverse agonist into an agonist; in this case, small changes in the orientation of the crucial Ser residue near the mouth of the binding pocket can do the trick. In a past study, for instance, slightly changing the rotamer of that Ser residue, and hence the orientation of its hydroxyl, was sufficient to retrieve agonists.

The study thus shows the value of virtual screening in the discovery of new ß2 ligands and indicates the effect of library bias and protein structure on such ligand discovery. Many factors can contribute to the success or failure of such a search; nature is a multi-armed demon.

Kolb, P., Rosenbaum, D., Irwin, J., Fung, J., Kobilka, B., & Shoichet, B. (2009). Structure-based discovery of ß2-adrenergic receptor ligands Proceedings of the National Academy of Sciences DOI: 10.1073/pnas.0812657106

Post-docking as a post-doc, and some fragment docking

I am now ready to post-doc. I am also now ready to post-dock, that is, engage in activities beyond docking. Sorry, I could not resist cracking that terrible joke. It's been a long journey and I have enjoyed almost every moment of it. Thanks to everyone in the chemistry blogworld who regaled, informed, provoked and entertained on this blog. I am now moving on to the freakingly chilly Northeast; location not disclosed for now, but maybe later.

Speaking of docking, here is a nice paper from the Shoichet group in which they use fragment docking to divine hits from a large library against a ß-lactamase. Fragment docking can be trickier than "normal" docking since fragments, being small, usually bind promiscuously, weakly and non-selectively. Fragment docking is thus not yet a completely validated technique.

In their study, the authors screen their ZINC fragment library against the ß-lactamase CTX-M using the program DOCK; they also screen a lead-like library of larger molecules. The top hits from the fragment docking were assayed and showed micromolar inhibition of the lactamase, and included tetrazole scaffolds not seen before. Importantly, five of these hits could be crystallized, and the high-resolution crystal structures validated the docked poses.

What was interesting was that the same tetrazole scaffolds were ranked very low (>900) in the larger lead-like library and would never have been selected had their tetrazole fragments not shown up at the top of the fragment docking results. These compounds, when assayed, showed sub-millimolar to micromolar activity against the lactamase. The protocol thus demonstrated that fragment docking can reveal hits that are missed when docking larger lead-like molecules. One reason DOCK succeeds in this capacity is its physics-based scoring function, which has no bias against fragments. It also helps that the active site of CTX-M is relatively rigid, with little protein motion.

The fragments were also assayed against another lactamase, AmpC. Usually, hits for CTX-M and AmpC are mutually exclusive. What was seen was that the higher the potency of a fragment for CTX-M, the higher its specificity for CTX-M; not surprising, considering that increased potency translates into a much better complementary fit of the fragment for CTX-M.

Fragment docking can be messy since fragments can bind non-selectively and haphazardly to many different parts of many different proteins. But this study indicates that fragment docking is a promising strategy for finding hits that may otherwise remain concealed in lead-like libraries.

The potencies of the compounds found may look pretty weak, but because extremely few molecules are known to inhibit these medicinally important lactamases, such advances are welcome. Lactamases are of course an important target for overcoming resistance in antibiotic treatment.

Chen, Y., & Shoichet, B. (2009). Molecular docking and ligand specificity in fragment-based inhibitor discovery Nature Chemical Biology DOI: 10.1038/nchembio.155