Field of Science

Drug Discovery, Models and Computers: A (necessarily incomplete) Personal Take

Drugs and rational drug discovery

Natural substances have been used to treat mankind’s diseases and ills since the dawn of humanity. The Middle Ages saw the use of exotic substances like sulfur and mercury to attempt to cure afflictions; most of these efforts resulted in detrimental side effects or death because of lack of knowledge of drug action. Quinine was isolated from the bark of the Cinchona tree and used for centuries to treat malaria. Salicylic acid was isolated from the Willow tree and was used for hundreds of years to treat fevers, knowledge that led to the discovery of Aspirin. The history of medicine has seen the use of substances ranging from arsenic to morphine, some of which are now known to be highly toxic or addictive.

The use of these substances reflected the state of medical knowledge of the times, when accidentally generated empirical data was the most valuable asset in the treatment of disease. Ancient physicians from Galen to Sushruta made major advances in our understanding of the human body and of medical therapies, but almost all of their knowledge was derived through patient and meticulously documented trial and error. A lack of knowledge of the scientific basis of disease meant that there were few systematic rational means of discovering new medicines, and serendipity and the traditional folk wisdom passed on through the centuries played the most important role in warding off disease.

This state of affairs continued till the 19th and 20th centuries, when twin revolutions in biology and chemistry made it possible to discover drugs in a more logical manner. Organic chemistry formally began in 1828 when Friedrich Wöhler found that he could synthesize urea from a simple inorganic substance, ammonium cyanate, thus dispelling the belief that organic substances could only be synthesized by living organisms (1). The further development of organic chemistry was orchestrated by the formulation of the structural theory in the late 19th century by Kekulé, Couper, Kolbe, Perkin and others (1). This framework made it possible to start to elucidate the precise arrangement of atoms in biologically active compounds. Knowledge of this arrangement in turn led to routes for synthesis of these molecules. These investigations also provided impetus to the synthesis of non-natural molecules of practical interest, sparking off the field of synthetic organic chemistry. However, while the power of synthetic organic chemistry later provided several novel drugs, the legacy of natural products is still prominent, and about half of the drugs currently on the market are either natural products or derived from natural products (2).

Success in the application of chemistry to medicine was exemplified in the early 20th century by tentative investigations of what we currently call structure-activity relationships (SAR). Salvarsan, an arsenic compound used for treating syphilis, was perhaps the first example of a biologically active substance that had been improved by systematic investigation and modification. At the same time, chemists like Emil Fischer were instrumental in synthesizing further naturally occurring substances like carbohydrates and proteins, thus extending the scope of organic synthesis into biochemistry.

The revolution in structure determination initiated by physicists led to vastly improved synthesis and studies of bioactive substances. At this point, rational drug discovery began to take shape. Chemists working in tandem with biologists made hundreds of substances which were tested for their efficacy against various diseases. Knowledge from biological testing was in turn translated into modifications of the starting compounds. The first successful example of such rational efforts was the synthesis of sulfa drugs used to treat infections in the 1930s (3). These compounds were the first effective antibiotics. Penicillin, famously and this time serendipitously discovered by Alexander Fleming in 1928 (4), followed them into clinical use in the 1940s.

Rational drug discovery received a substantial impetus from the post-World War II breakthroughs in structure determination by x-ray crystallography that revealed the structures of small molecules, proteins and DNA. The discovery of the structure of DNA in 1953 by Watson and Crick heralded the advent of molecular biology (5). This landmark event led in succession to the elucidation of the genetic code and the transfer of genetic information from DNA to RNA that results in protein synthesis. Among the first structure determinations of a protein was that of hemoglobin by Perutz (6), which was followed by the structure determination of several other proteins, some of which were pharmacologically important. Such advances and preceding ones by Pauling and others (7) led to the elucidation of common motifs in proteins such as alpha helices and beta sheets. The simultaneous growth of techniques in biological assaying and enzyme kinetics made it possible to monitor the binding of drugs to biomolecules. At the same time, better application of statistics and the standardization of double-blind, controlled clinical trials caused a fundamental change in the testing and approval of new medicines. A particularly noteworthy example of one of the first drugs discovered through rational investigations is cimetidine (8), a drug for acid reflux that was for several years the best-selling drug in the world.

Structure-based drug design and CADD

As x-ray structures of protein-ligand complexes began to emerge in the 70s and 80s, rational drug discovery received enormous benefits. The development was also accompanied by High-Throughput Screening (HTS), the ability to screen thousands of ligands against a protein target to identify likely binders. These studies led to what today is known as “structure-based drug design” (SBDD) (9). In SBDD, the structure of a protein bound to a ligand is used as a starting point for further modification and improvement of properties of the drug. While care has to be taken to fit the structure well to the electron density in the data (10), well-resolved data can greatly help in identifying points of contact between the drug and the protein active site as well as the presence of special chemical moieties such as metals and cofactors. Water molecules identified in the active site can play crucial roles in bridging interactions between the protein and ligand (11). Early examples of classes of drugs discovered using structure-based design include Captopril (12) (angiotensin-converting enzyme inhibitor; hypertension) and Trusopt (13) (carbonic anhydrase inhibitor; glaucoma), and recent examples include Aliskiren (14) (renin inhibitor; hypertension) and HIV protease inhibitors (13).

As SBDD progressed, another approach called ligand-based design (LBD) also emerged. Obtaining x-ray structures of drugs bound to proteins is still a tricky endeavor, and one is often forced to proceed on the basis of the structure of an active compound alone. Techniques developed to tackle this problem involve QSAR (Quantitative Structure-Activity Relationships) (15) and pharmacophore construction, in which the features essential for a particular ligand to bind to a certain protein are conjectured from affinity data for several similar and dissimilar molecules. Molecules based on the minimal set of interacting features are then synthesized and tested. However, since molecules can frequently adopt diverse conformations when binding to a protein, care has to be exercised in developing such hypotheses. In addition, it is relatively easy to be led astray by a high correlation between affinity data in the training set. It is paramount in such cases to remember the difference between correlation and causation: an overfitted model can show a strong correlation that is spurious and reflects no causal relationship (16). While LBD is more recent than SBDD, it has turned out to be valuable in certain cases. Noteworthy is a recent example where an inhibitor of NAADP was discovered by shape-based virtual screening (17) (vide infra).

As rational drug discovery progressed, software and hardware capacities of computers also grew exponentially, and CADD (Computer-Aided Drug Design) began to be increasingly applied to drug discovery. An effort was made to integrate CADD in the traditional chemistry and biology workflow and its principal development took place in the pharmaceutical industries, although academic groups were also instrumental in developing some capabilities (18). The declining costs of memory and storage, increasing processing power and facile computer graphics software put CADD within the grasp of relatively untrained computational chemists or experimental scientists. While the general verdict on the contribution of CADD to drug discovery is still forthcoming, many drugs currently on the market now include CADD as an important component of their discovery and development (19). Many calculations that once were impractical because of constraints of time and computing power can now be routinely performed, some on a common desktop. Currently the use of CADD in drug design aims to address three principal problems, all of which are valuable to drug discovery.

Virtual Screening

Virtual screening (VS) is defined by the ability to test thousands or millions of potential ligands against a protein, distinguish the actives from inactives and rank the ‘true’ binders in a certain top fraction. If validated, VS would serve as a valuable complement to, if not a substitute for, HTS and would save significant amounts of resources and time. Just like HTS, VS has to contend with the problem of false positives and false negatives; the latter are in some ways more serious, since by definition they are never identified. VS can be either structure-based or ligand-based. Both approaches have enjoyed partial success, although recent studies suggest that 3D ligand-based techniques, in which ligand structures are compared to known active ligands by means of certain metrics, have a greater hit rate than structure-based techniques (20). Virtual libraries of molecules such as DUD (21) (Directory of Useful Decoys) and ZINC (22) have been built to test the performance of several VS programs and compare them with each other. These libraries typically consist of a few actives and several thousand decoys, with the goal being to rank the true actives above the decoys using some metric.

Paramount in such retrospective assessment is an accurate method for evaluating the success and failure of these methods (23,24). Until now ‘enrichment factors’ (EF) have mostly been used for this purpose (24). The EF is the proportion of ‘true’ actives found in a certain top fraction of the ranked list (typically 1% or 10%), relative to the proportion of actives in the whole screened database. However, the EF suffers from certain drawbacks, such as being dependent on the number of decoys in the dataset. To circumvent this problem, recent studies have suggested the use of the ROC (Receiver Operating Characteristic) curve, a graph that plots the true positive rate against the false positive rate (24,25) (Figure 1). The curve indicates what the false positive rate is for a given true positive rate, and the measured variable is the Area Under the Curve (AUC). A completely random performance gives a straight diagonal line (AUC 0.5), while better performance results in a hyperbolic curve that bows above the diagonal (AUC > 0.5).
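As a rough illustration of the two metrics just described, here is a minimal Python sketch. The function names and the ten-compound toy ranking are invented for illustration and are not taken from any of the cited studies; the AUC is computed through its standard interpretation as the probability that a randomly chosen active scores above a randomly chosen decoy.

```python
def enrichment_factor(scores, labels, top_fraction=0.1):
    """EF = (fraction of actives in the top slice) / (fraction of actives overall).

    scores: higher means "more likely to bind"; labels: 1 = active, 0 = decoy.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, int(round(top_fraction * len(ranked))))
    actives_in_top = sum(label for _, label in ranked[:n_top])
    actives_total = sum(labels)
    return (actives_in_top / n_top) / (actives_total / len(ranked))


def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    actives = [s for s, l in zip(scores, labels) if l == 1]
    decoys = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


# Hypothetical toy ranking: 3 actives hidden among 7 decoys.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

print(enrichment_factor(toy_scores, toy_labels, top_fraction=0.2))  # 3.33...
print(roc_auc(toy_scores, toy_labels))                              # 0.952...
```

In this toy case the top 20% of the list is all actives while the database as a whole is 30% actives, which gives an EF of about 3.3. Note how the EF depends on the active-to-decoy ratio, whereas the AUC depends only on how actives rank relative to decoys.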

Figure 1: ROC curve for three different VS scenarios. Completely random performance will give the straight white line (AUC 0.5), an ideal performance (no false positives and all true positives) will give the red line (AUC 1.0) and a good VS algorithm will produce the yellow curve (0.5 < AUC < 1.0)

Until now VS has provided limited evidence of success. Yet its capabilities are being improved and it has become a part of the computational chemist’s standard repertoire. In some cases VS can provide more hits compared to HTS (26) and in others, VS at the very least provides a method to narrow down the number of compounds actually assayed (27). As advances in general SBDD and LBD continue, the power of VS to identify true actives will undoubtedly increase.


Pose prediction

The second goal sought by computational chemists is to predict the binding orientation of a ligand in the binding pocket of a protein, a task that falls within the domain of SBDD. This endeavor, if successful, will provide an enormous benefit in cases where crystal structures of protein-ligand complexes are not easily obtained. Since such cases are still very common, pose prediction continues to be both a challenge and a valuable objective. There are two principal problems in pose prediction. The first relates to the scoring of the poses obtained in order to identify the top-scoring pose as the ‘real’ pose; current docking programs are notorious for their scoring unreliability, certainly in an absolute sense and sometimes even in a relative sense. The second relates to sampling: the problem of pose prediction is ultimately defined by the ability of an algorithm to find the global minimum orientation and conformation of a ligand on the potential energy surface (PES) generated by the protein active site (28). As such it is susceptible to the common inadequacies inherent in comprehensively sampling a complex PES. Frequently, however, as in the case of CDK7, past empirical data including knowledge of poses of known actives (roscovitine in this case) provides confidence about the pose of the unknown ligand.
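The sampling difficulty can be seen even on a toy surface. The sketch below is purely illustrative: the one-dimensional double-well function is an invented stand-in for a real PES, and the grid-based descent is not the search algorithm of any docking program. It shows how a single downhill search gets trapped in a local minimum while several starting points recover the global one.

```python
def energy(x):
    """Toy double-well 'energy surface' (assumed for illustration only)."""
    return x ** 4 - 3.0 * x ** 2 + x


def grid_descent(start, step=0.01, lo=-2.0, hi=2.0):
    """Walk downhill on a grid from `start` until no neighbour is lower."""
    n = int(round((hi - lo) / step))
    i = min(max(int(round((start - lo) / step)), 0), n)
    while True:
        best_i, best_e = i, energy(lo + i * step)
        for j in (i - 1, i + 1):
            if 0 <= j <= n and energy(lo + j * step) < best_e:
                best_i, best_e = j, energy(lo + j * step)
        if best_i == i:
            return lo + i * step  # local minimum on the grid
        i = best_i


def multi_start_search(starts):
    """Run a descent from every start and keep the lowest-energy result."""
    minima = [grid_descent(s) for s in starts]
    return min(minima, key=energy)


trapped = grid_descent(2.0)                               # ends near x = 1.13
best = multi_start_search([-2.0, -1.0, 0.0, 1.0, 2.0])    # ends near x = -1.30
print(trapped, energy(trapped))
print(best, energy(best))
```

A real ligand has many rotatable bonds plus six rigid-body degrees of freedom, so the number of starting points needed to cover the surface grows very quickly, which is why sampling remains a practical bottleneck.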

Another serious problem in pose prediction is the inability of many current algorithms to adequately sample protein motion. X-ray structures provide only a static snapshot of ligand binding that may obscure considerable conformational changes in protein motifs. Molecular dynamics (MD) simulations followed by docking (‘ensemble docking’) have remedied this limitation to some extent (29), induced-fit docking algorithms have now been included in programs such as GLIDE (30), and complementary information from dynamical NMR studies may help judicious selection between several protein poses. Yet simulating large-scale protein motions is still outside the domain of most MD simulations, although significant progress has been made in recent years (31,32).

An example of how pose prediction can shed light on anomalous binding modes, and possibly save time and financial resources, was encountered by the present author during his study of a paper detailing the development of inhibitors of the p38 MAP kinase (33). In one instance the authors followed the SAR data in the absence of a crystal structure and observed contradictory changes in activity influenced by structural modifications. Crystallography on the protein-ligand complex finally revealed an anomalous conformation of the ligand in which the oxygen of an amide at the 2 position of a thiophene was cis to the thiophene sulfur, when chemical intuition would have expected it to be trans. The crystal structure showed that an unfavorable interaction of a negatively charged glutamate with the sulfur in the more common trans conformation forced the sulfur to adopt the slightly unfavorable cis position with respect to the amide oxygen. Surprisingly, this preference was seen in all top 5 GLIDE poses of the docked compound. This example indicates that at least in some cases pose prediction could serve as a valuable timesaving complement and possible alternative to crystallography.

Binding affinity prediction

The third goal is possibly the most challenging endeavor for computational chemistry. Rank-ordering ligands in terms of their binding affinity involves accurate scoring, which as noted above is a recalcitrant problem. The problem is a fundamental one since it really involves calculating absolute free energies of protein-ligand binding. The most accurate and sophisticated approaches for calculating these energies are the Free-Energy Perturbation (FEP) (34) or Thermodynamic Integration (TI) methods based on MD simulations and statistical thermodynamics. The methods involve ‘mutating’ one ligand to another in hundreds of thousands of infinitesimal steps and evaluating the binding enthalpy and entropy at every step. As of now, these techniques are some of the most computationally expensive techniques in the field. This problem typically limits their use to evaluating free energy changes between ligands that differ little in structure. Therefore the successful examples where they have found their greatest use involve cases where small substituents on aromatic rings are modified to evaluate changes in binding affinity (35). However, as computing power grows, these techniques will continue to find more applications in drug discovery.
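To give a flavour of the statistical thermodynamics involved, the following is a minimal sketch of the standard exponential-averaging (Zwanzig) estimator that underlies FEP. What it yields is a free energy difference between two states, estimated from energy differences sampled in the first state. The system here is an assumption made for illustration: a one-dimensional harmonic well whose spring constant is ‘mutated’ from k0 to k1, chosen because the exact answer is known analytically. It is not a protein-ligand calculation, and the function names are invented.

```python
import math
import random


def fep_free_energy(delta_u_samples, kT=1.0):
    """Zwanzig estimator: dG = -kT * ln < exp(-dU / kT) >_0.

    delta_u_samples are values of U1 - U0 evaluated on configurations
    sampled from state 0.
    """
    mean_boltzmann = sum(math.exp(-du / kT) for du in delta_u_samples) / len(delta_u_samples)
    return -kT * math.log(mean_boltzmann)


def toy_harmonic_example(k0=1.0, k1=2.0, kT=1.0, n_samples=100_000, seed=0):
    """Perturb U0 = k0*x^2/2 into U1 = k1*x^2/2 and compare with the exact result."""
    rng = random.Random(seed)
    sigma = math.sqrt(kT / k0)  # Boltzmann distribution of x in state 0 is Gaussian
    delta_u = []
    for _ in range(n_samples):
        x = rng.gauss(0.0, sigma)
        delta_u.append(0.5 * (k1 - k0) * x * x)
    estimate = fep_free_energy(delta_u, kT)
    exact = 0.5 * kT * math.log(k1 / k0)  # analytic free energy difference
    return estimate, exact


estimate, exact = toy_harmonic_example()
print(estimate, exact)
```

The estimator converges only when the two states overlap well, which is one reason a real ligand ‘mutation’ is broken into many small intermediate steps and why the approach is largely restricted to closely related ligands.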

Apart from these three goals, a major goal of computational science in drug discovery is to aid the later stages of drug development, when pharmacokinetics (PK) and ADMET (Absorption, Distribution, Metabolism, Excretion, Toxicity) issues are key. Optimizing the binding affinity of a particular compound to a protein only results in an efficient ligand and not necessarily an efficient drug. Computational chemistry can make valuable contributions to these later developmental stages by trying to predict the relevant properties of ligands in the early stages, thus limiting the typically high attrition of drugs in the advanced phases. While much remains to be accomplished in this context, some progress has been made (36). For example, the well-known Lipinski Rule of Five (37) provides a set of physicochemical guidelines associated with good oral bioavailability, and computational approaches are starting to help evaluate these properties during early stages. The QikProp program developed by Jorgensen et al. calculates properties like Caco-2 cell permeability, possible metabolites, % absorption in the GI tract and logP values (38). Such programs are still largely empirical, depending on a large dataset of properties of known drugs for comparison and fitting.
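As a small illustration of how simple such an early-stage filter can be, here is a sketch of a Rule of Five check using the thresholds as they are commonly stated (molecular weight at most 500, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors, with one violation usually tolerated). The class and function names are invented, the descriptor values are assumed to be supplied by the user (in practice they would come from a cheminformatics toolkit), and the two example entries are approximate or hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed physicochemical descriptors for one compound."""
    mol_weight: float       # in daltons
    logp: float             # octanol-water partition coefficient (log scale)
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(d):
    """Return the list of Rule of Five criteria that the compound breaks."""
    violations = []
    if d.mol_weight > 500:
        violations.append("molecular weight > 500")
    if d.logp > 5:
        violations.append("logP > 5")
    if d.h_bond_donors > 5:
        violations.append("H-bond donors > 5")
    if d.h_bond_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations


def passes_rule_of_five(d, max_violations=1):
    """Common convention: flag a compound only when it breaks more than one rule."""
    return len(rule_of_five_violations(d)) <= max_violations


# Approximate values for a small aspirin-like molecule (illustrative only).
aspirin_like = Descriptors(mol_weight=180.2, logp=1.2, h_bond_donors=1, h_bond_acceptors=4)
# Hypothetical large, greasy compound invented for this example.
hypothetical_large = Descriptors(mol_weight=820.0, logp=6.3, h_bond_donors=7, h_bond_acceptors=12)

print(passes_rule_of_five(aspirin_like))            # True
print(rule_of_five_violations(hypothetical_large))  # all four criteria violated
```

A filter like this is a guideline, not a guarantee: passing it says nothing about potency, and many natural-product drugs fall outside it.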

Models, computers and drug discovery

In applying models to designing drugs and simulating their interactions with proteins, the most valuable lesson to remember is that these are models that are generated by computers. Models seldom mirror reality; in fact they often may succeed in spite of reality. Models are not usually designed to simulate reality; they are designed to produce results that agree with experiment. There are many approaches that produce such results, and these approaches may not always encompass factors operating in real environments. In QSAR for instance, it has been shown that adding enough parameters to a model can lead to a good fit to the data with a high correlation coefficient. However, the model may be overfitted; that is, it may seem to fit the known data very well but may fail to predict unknown data, which is what it was designed to do (16,39). In such cases, using more advanced statistical methods and resampling techniques such as cross-validation and ‘bootstrapping’ (for example, leaving out a part of the data and checking whether the resulting fit predicts that part) can lead to improvement in results (39).
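The overfitting problem is easy to reproduce. The sketch below, written under stated assumptions, fits an ordinary least-squares model to ‘activities’ that are pure random noise using descriptors that are also pure random noise, then compares the fitted r² with a leave-one-out cross-validated q². All names and the dataset sizes (12 compounds, 8 descriptors) are invented for illustration; the linear algebra is done by hand so that the example needs only the standard library.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(row) for row in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, row):
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))


def r2_and_q2(rows, y):
    """Return (fitted r^2, leave-one-out cross-validated q^2)."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    coeffs = fit_ols(rows, y)
    rss = sum((yi - predict(coeffs, row)) ** 2 for row, yi in zip(rows, y))
    press = 0.0
    for i in range(len(y)):
        # Refit without compound i, then predict compound i.
        loo = fit_ols(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(loo, rows[i])) ** 2
    return 1.0 - rss / ss_tot, 1.0 - press / ss_tot


def random_descriptor_demo(n_compounds=12, n_descriptors=8, seed=1):
    """Both descriptors and 'activities' are noise; any fit is a chance correlation."""
    rng = random.Random(seed)
    rows = [[rng.gauss(0, 1) for _ in range(n_descriptors)] for _ in range(n_compounds)]
    y = [rng.gauss(0, 1) for _ in range(n_compounds)]
    return r2_and_q2(rows, y)


r2, q2 = random_descriptor_demo()
print("fitted r2:", r2, "cross-validated q2:", q2)
```

With nearly as many descriptors as compounds, the fitted r² is typically high even though there is nothing to learn, while q² is much lower and often negative. Watching the gap between the two is a cheap way to catch a model that has merely memorized its training set.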

Models can also be accepted in spite of outliers. A model accepted on the strength of a high correlation coefficient of 0.85 may nonetheless leave one or two outliers unexplained. It then becomes important to be aware of the physical anomaly which the outliers represent. The reason for this is clear. If the variable producing the outlier does not constitute a part of the model building, then applying the well-trained model to a system where that particular variable suddenly becomes dominant will result in a failure of the model. Such outliers, termed ‘black swans’, can prove extremely deleterious if their value is unusually high (40). This phenomenon is known to operate in the field of financial engineering (40). In modeling for instance, if the training set for a docking model consists largely of lipophilic protein active sites, then the model may fail to deliver cogent results if applied to a set of ligands binding to a protein that has an anomalously polar or charged active site. If the value of this protein is unusually high for a particular pharmaceutical project, an inability to predict its behavior under unforeseen circumstances may lead to costly losses. Clearly in this case the physical variable, namely the polarity of the active site, was not taken into account, even though the model delivered a high initial correlation, and it did so merely because of the addition of a large number of parameters or descriptors, none of which was related in a significant way to the polarity of the binding pocket. The difference between correlation and causation is especially relevant in this respect. This hypothetical example illustrates one of the limitations of models noted above: they may bear no relationship to actual physical phenomena and may yet, for various reasons, fit the data well enough to elicit confidence in their predictive ability.

In summary, models of the kind that are used in computational chemistry have to be carefully evaluated, especially in the context of practical applications like drug discovery where time and financial resources are valuable. Training the model on high-quality datasets, keeping in mind the difference between correlation and causation, and making better use of statistics and bootstrapping can help to avert model failure.

In the end however, it is experiment that is of paramount importance for building the model. Inaccurate experimental data with uncertain error margins will undoubtedly hinder the success of every subsequent step in model building. To this end, generating, presenting and evaluating accurate experimental data is a responsibility that needs to be fulfilled by both computational chemists and experimentalists, and it is only a fruitful and synergistic alliance between the two groups that can help overcome the complex challenges in drug discovery.


(1) Berson, J. A. Chemical creativity : ideas from the work of Woodward, Hückel, Meerwein and others; 1st ed.; Wiley-VCH: Weinheim ; Chichester, 1999.
(2) Paterson, I.; Anderson, E. A. Science 2005, 310, 451-3.
(3) Hager, T. The demon under the microscope : from battlefield hospitals to Nazi labs, one doctor's heroic search for the world's first miracle drug; 1st ed.; Harmony Books: New York, 2006.
(4) Macfarlane, G. Alexander Fleming, the man and the myth; Oxford University Press: Oxford [Oxfordshire] ; New York, 1985.
(5) Judson, H. F. The eighth day of creation : makers of the revolution in biology; Expanded ed.; CSHL Press: Plainview, N.Y., 1996.
(6) Ferry, G. Max Perutz and the secret of life; Cold Spring Harbor Laboratory Press: New York, 2008.
(7) Hager, T. Linus Pauling and the chemistry of life; Oxford University Press: New York, 1998.
(8) Black, J. Annu Rev Pharmacol Toxicol 1996, 36, 1-33.
(9) Jhoti, H.; Leach, A. R. Structure-based drug discovery; Springer: Dordrecht, 2007.
(10) Davis, A. M.; Teague, S. J.; Kleywegt, G. J. Angew. Chem. Int. Ed. Engl. 2003, 42, 2718-36.
(11) Ball, P. Chem. Rev. 2008, 108, 74-108.
(12) Smith, C. G.; Vane, J. R. FASEB J. 2003, 17, 788-9.
(13) Kubinyi, H. J. Recept. Signal Transduct. Res. 1999, 19, 15-39.
(14) Wood, J. M.; Maibaum, J.; Rahuel, J.; Grutter, M. G.; Cohen, N. C.; Rasetti, V.; Ruger, H.; Goschke, R.; Stutz, S.; Fuhrer, W.; Schilling, W.; Rigollier, P.; Yamaguchi, Y.; Cumin, F.; Baum, H. P.; Schnell, C. R.; Herold, P.; Mah, R.; Jensen, C.; O'Brien, E.; Stanton, A.; Bedigian, M. P. Biochem Biophys Res Commun 2003, 308, 698-705.
(15) Hansch, C.; Leo, A.; Hoekman, D. H. Exploring QSAR; American Chemical Society: Washington, DC, 1995.
(16) Doweyko, A. M. J. Comput. Aided Mol. Des. 2008, 22, 81-9.
(17) Naylor, E.; Arredouani, A.; Vasudevan, S. R.; Lewis, A. M.; Parkesh, R.; Mizote, A.; Rosen, D.; Thomas, J. M.; Izumi, M.; Ganesan, A.; Galione, A.; Churchill, G. C. Nat. Chem. Biol. 2009, 5, 220-6.
(18) Snyder, J. P. Med. Res. Rev. 1991, 11, 641-62.
(19) Jorgensen, W. L. Science 2004, 303, 1813-8.
(20) McGaughey, G. B.; Sheridan, R. P.; Bayly, C. I.; Culberson, J. C.; Kreatsoulas, C.; Lindsley, S.; Maiorov, V.; Truchon, J. F.; Cornell, W. D. J. Chem. Inf. Model. 2007, 47, 1504-19.
(21) Huang, N.; Shoichet, B. K.; Irwin, J. J. J. Med. Chem. 2006, 49, 6789-801.
(22) Irwin, J. J.; Shoichet, B. K. J. Chem. Inf. Model. 2005, 45, 177-82.
(23) Jain, A. N.; Nicholls, A. J. Comput. Aided Mol. Des. 2008, 22, 133-9.
(24) Hawkins, P. C.; Warren, G. L.; Skillman, A. G.; Nicholls, A. J. Comput. Aided Mol. Des. 2008, 22, 179-90.
(25) Triballeau, N.; Acher, F.; Brabet, I.; Pin, J. P.; Bertrand, H. O. J. Med. Chem. 2005, 48, 2534-47.
(26) Babaoglu, K.; Simeonov, A.; Irwin, J. J.; Nelson, M. E.; Feng, B.; Thomas, C. J.; Cancian, L.; Costi, M. P.; Maltby, D. A.; Jadhav, A.; Inglese, J.; Austin, C. P.; Shoichet, B. K. J. Med. Chem. 2008, 51, 2502-11.
(27) Peach, M. L.; Tan, N.; Choyke, S. J.; Giubellino, A.; Athauda, G.; Burke, T. R.; Nicklaus, M. C.; Bottaro, D. P. J. Med. Chem. 2009.
(28) Jain, A. N. J. Comput. Aided Mol. Des. 2008, 22, 201-12.
(29) Rao, S.; Sanschagrin, P. C.; Greenwood, J. R.; Repasky, M. P.; Sherman, W.; Farid, R. J. Comput. Aided Mol. Des. 2008, 22, 621-7.
(30) Sherman, W.; Day, T.; Jacobson, M. P.; Friesner, R. A.; Farid, R. J. Med. Chem. 2006, 49, 534-53.
(31) Shan, Y.; Seeliger, M. A.; Eastwood, M. P.; Frank, F.; Xu, H.; Jensen, M. O.; Dror, R. O.; Kuriyan, J.; Shaw, D. E. PNAS 2009, 106, 139-44.
(32) Jensen, M. O.; Dror, R. O.; Xu, H.; Borhani, D. W.; Arkin, I. T.; Eastwood, M. P.; Shaw, D. E. PNAS 2008, 105, 14430-5.
(33) Goldberg, D. R.; Hao, M. H.; Qian, K. C.; Swinamer, A. D.; Gao, D. A.; Xiong, Z.; Sarko, C.; Berry, A.; Lord, J.; Magolda, R. L.; Fadra, T.; Kroe, R. R.; Kukulka, A.; Madwed, J. B.; Martin, L.; Pargellis, C.; Skow, D.; Song, J. J.; Tan, Z.; Torcellini, C. A.; Zimmitti, C. S.; Yee, N. K.; Moss, N. J. Med. Chem. 2007, 50, 4016-26.
(34) Jorgensen, W. L.; Thomas, L. L. J. Chem. Theor. Comp. 2008, 4, 869-876.
(35) Zeevaart, J. G.; Wang, L. G.; Thakur, V. V.; Leung, C. S.; Tirado-Rives, J.; Bailey, C. M.; Domaoal, R. A.; Anderson, K. S.; Jorgensen, W. L. J. Am. Chem. Soc. 2008, 130, 9492-9499.
(36) Martin, Y. C. J. Med. Chem. 2005, 48, 3164-70.
(37) Lipinski, C. A.; Lombardo, F.; Dominy, B. W.; Feeney, P. J. Adv. Drug. Del. Rev. 1997, 23, 3-25.
(38) Ioakimidis, L.; Thoukydidis, L.; Mirza, A.; Naeem, S.; Reynisson, J. Qsar & Comb. Sci. 2008, 27, 445-456.
(39) Hawkins, D. M. J. Chem. Inf. Comput. Sci. 2004, 44, 1-12.
(40) Taleb, N. The black swan : the impact of the highly improbable; 1st ed.; Random House: New York, 2007.


  1. Nice article. One point -- the sulfur-carbonyl cis preference is actually fairly well known from small-molecule xray structures. See e.g. for a review

  2. A great review of the state of computer simulations…No offense to anyone, but an important distinction must be made between in-silico science users and coding authors. When I say 'coding authors' I don't mean computer scientists who know enough physical science to be dangerous, but rather, physical scientists who author algorithms based upon first principle laws of physics. They may not write the fastest code (leave that up to the computer scientists to optimize) but they can track their code to its theoretical underpinnings making adjustments as both theory and experiment evolve. As a molecular dynamics engine coding author (and there are only a handful around) I can tell you that there are many users and programmers that consider themselves coding authors that do not have the theoretical horsepower to be taking lead roles in computer simulation coding forums. You are correct about the models - they are the problem. Computer speed is not going to solve the fundamental problems we have with inaccurate in-silico simulations - realism is the key. When models are not realistic, fast computers simply get to the wrong answer sooner. In molecular dynamics, bridging the real world to the model world is by nature an attempt to solve the many-body problem. This is where most coders give up because force potentials can only be deductively evaluated. It becomes the same dilemma as quantum mechanical simulations - the larger the system, the more intractable the problem. And so, is there an inductive approach to this? Yes there is - but one needs to know the theoretical underpinnings of force fields in order to make the algorithmic adjustments. But this is almost a social problem. Who would appreciate its solution? Most of the popular pseudo-coding authors are running the show (forum moderators) and wouldn't know the real deal if it fell on them. So who suffers? 
We all do because until there is realism in computer simulations, we are looking at glorified video games - nothing more. Assume for a moment that a supreme being handed you the "real" force functions for your computer simulation, wouldn't you expect that the simulation might become realistic? Force functions are the bricks in the wall of molecular structure and behavior in simulations. If the forces are correct, the simulation is real because the mode mechanics are real and consequently, self-assembly will be real. Ask yourself, what are the implications if computer simulations of the weather were 100% accurate? It’s scary in some respects. How much does realism speed up the R&D process? Think about it.

  3. Anon 1: Thanks for the reference; one wonders why the original authors did not realize this.

    Anon 2: Thanks for your articulate thoughts. You are quite right that ultimately the theoretical underpinnings decide the model. Think of the several schemes for denoting charge and electrostatics that go into modeling. Which one of these is the best? I have not yet discovered any one force field which can model energies accurately. Ultimately I think of the computer as an "eliminator" which, even if it knocks down straw men, will be valuable. I have also realized that many coders don't have a handle on empirical and physical data which they can use to improve their models and explore the presence of outliers. There still has to be very good synergy and cross talk between coders, applications scientists and actual experimentalists. In the end it's the accurate experimental number that is paramount.

  4. This is anon 2. There is no single force field that will model everything. Please take this constructively...there is, however, a way to find the force field 'on-the-fly' as you need it. I'm walking around this for my own reasons, but take the lame analogy that force fields are like happiness...not a single event, but the results of a process. The process of finding theoretically transparent or theoretically opaque force functions does exist. I know this is quite a claim, but think about the premise that one must know some superfunction first and then deductively guess (through molecular dynamics experiment) the real outcome. We do this because it is the way things have been done since day one. It's been the only game in town and people don’t like radical change – it makes them feel bad – and yes, that is their problem, but there are so many of them. Anyway, I like to refer to the existing methodology as 'playing God'. You (the general 'you' not “you” you) create the fundamental building block of the simulation (the force function(s)), and expect the simulation to deductively reflect the behavior of the real world. That is an ego trip, really. All of us in-silico scientists need humility about this. Only the real world knows itself and is capable of correcting a model to be like itself. I'm talking about a feedback mechanism from the real world to the model world. It does exist. Yes, there is a way to bridge real experiments to model experiments inductively then correct the models on-the-fly (until there is optimal realism convergence). And yes, it smells like I just said that I knew how to solve the n-body problem analytically – I never said analytically – however, you do get a theoretically transparent or theoretically opaque force field out of this process without having to play God. Let nature play God through a feedback mechanism. And finally, “yes” details are available.

  5. I think you'll find ligand based design has been around rather longer than you give it credit for. This field didn't start with shape matching.

    Also I don't think it is correct to state that the binding enthalpy and entropy are evaluated at each step in TI or FEP. A free energy (Helmholtz or Gibbs depending on ensemble) associated with mutations is what is normally calculated. You need to break the mutation into smaller steps otherwise the simulations don't converge.

  6. Great post! The overview of the field I was looking for without the optimism of the protagonists of each method, and with all the warts exposed.

    Can one actually calculate the potential energy surface mentioned in the paragraph after pose prediction? The caveats about models are particularly good.

    Nonetheless, this stuff is really important, even though it doesn't work as well as we'd like.

    Consider the situation with the Mexican flu virus. The molecular biologists have already cloned and sequenced its genome. They likely know exactly where the mutations are that make this virus different (or the new assortment of genes from different species making up the new virus).

    Also consider how poorly we understand protein chemistry (and small molecule chemistry). If we really understood protein chemistry, theoreticians would have predicted all the new altered conformations of the new proteins by now, and computational chemists would have designed a small bioavailable molecule to inhibit those conformations, and synthetic chemists would have made tons of them. None of the last 3 desiderata is anywhere close to reality, which is why chemistry is more than just intellectually fascinating and a thing of beauty.


  7. Sorry for misusing the comment function - I'd like to get in contact with you. How can I reach you?

  8. @Anon2: Nice analogy with happiness. A little sadness makes the picture complete. IMHO coders need to know more physical science than physical scientists need to know coding. Simply making algorithms more efficient and adding more parameters is surely not the ultimate way to nirvana.

    @Retread: Thanks, and you make a cogent point about H1N1 and our primitive knowledge of protein function. The hemagglutinin molecule does some amazing acrobatics at reduced pH whereby it enables fusion. I would have thought someone would have designed a potent and effective hemagglutinin inhibitor. To my knowledge the current drugs only target the neuraminidase and M2 proteins. Perhaps the lack of a hemagglutinin inhibitor is not surprising; protein-protein interactions, anyone? A colleague works on inhibiting the measles fusion protein and I can see his frustration in spite of many attempts at inhibitor discovery.

    As for the PES, it's not done rigorously. It's essentially a quick and dirty Monte-Carlo conformational search with extensive knowledge-based evaluation involved. QM/MM calculations could possibly do it though. QM calculations will take the course of our natural lifetimes.

    @GMC: LBD certainly went beyond shape-based. I just think shape-based is one of the best current methods for LBD. There's of course nothing as delicious as having a well-resolved structure. As for FEP, I think I should have accurately stated that PMFs are often calculated.

    @Beatrice: No problem. Get in touch at

