Field of Science

Molecular modeling and physics: A tale of two disciplines

The LHC is a product of both time and multiple disciplines

In my professional field of molecular modeling and drug discovery I often feel like an explorer who has arrived on the shores of a new continent with a very sketchy map in his pocket. There are untold wonders to be seen on the continent and the map certainly points to a productive direction in which to proceed, but the explorer can't really stake a claim to the bounty which he knows exists at the bottom of the cave. He knows it is there and he can even see occasional glimpses of it but he cannot hold all of it in his hand, smell it, have his patron duke lock it up in his heavily guarded coffers. That is roughly what I feel when I am trying to simulate the behavior of drug molecules and proteins.

It is not uncommon to hear experimentalists from other disciplines, and even modelers themselves, grumbling about the unsatisfactory state of the discipline, and with good reason. Nor are the reasons entirely new: the techniques are based on an incomplete understanding of the behavior of complex biological systems at the molecular level; they are parametrized against limited training sets and are therefore not generally applicable; and they do a much better job of explaining than predicting (a valid point, although it's easy to forget that explanation is as important in science as prediction).

To most of these critiques my brethren and I plead guilty; and nothing advances a field like informed criticism. But I also have a few responses, foremost among which is one that is often under-appreciated: on the scale of scientific revolutions, computational chemistry and molecular modeling are nascent fields, only just emerging from the cocoon. Or, to be pithier: give it some more time. This may seem like a trivial point but it's an important one and worth contemplating. Turning a scientific discipline from an unpolished, rough gem-in-the-making into the Kohinoor diamond takes time. To drive this point home I want to compare the state of molecular modeling - a fledgling science - with that of physics - perhaps the most mature science. Today physics has staked its claim as the most accurate and advanced science we know. It has mapped everything from the most majestic reaches of the universe at the largest scale to the production of virtual particles inside the atom at the smallest. The accuracy of both calculations and experiments in physics can beggar belief: on one hand we can calculate the magnetic moment of the electron to better than ten significant figures using quantum electrodynamics (QED), and on the other we can measure the same parameter to the same degree of accuracy using ultrasensitive equipment.

But consider how long it took us to get there. Modern physics as a formal discipline can be said to have started with Isaac Newton in the mid-17th century. Newton was born in 1642; QED came of age around 1952, roughly 300 years later. So it took about three centuries for physics to go from the development of its basic mathematical machinery to divining the magnetic moment of the electron from first principles to a staggering level of accuracy. That's a long time to mature. Contrast this with computational chemistry, a discipline that spun off from the tree of quantum mechanics after World War 2. The application of the discipline to complex molecular entities like drugs and materials is even more recent, taking off in the 1980s. That's thirty years ago. Thirty years versus three hundred, and no wonder physics is so highly developed while molecular modeling is still learning how to walk. It would be like criticizing physics in 1700 for not being able to launch a rocket to the moon. A more direct comparison for modeling is with the discipline of synthetic chemistry - a mainstay of drug discovery - which is now capable of making almost any molecule on demand. Synthetic chemistry began in about 1828, when the German chemist Friedrich Wöhler first synthesized urea from simple inorganic compounds. That's a period of almost two hundred years for synthetic chemistry to mature.

But it's not just the time required for a discipline to mature; it's also the development of all the auxiliary sciences that play a crucial role in the evolution of a discipline that makes its culmination possible. Consider again the mature state of physics in, say, the 1950s. Before it could get to that stage, physics needed critical input from other disciplines, including engineering, electronics and chemistry. Where would physics have been without cloud chambers and Geiger counters, without cyclotrons and lasers, without high-quality ceramics and polymers? The point is that no science is an island, and the maturation of one particular field requires the maturation of a host of others. The same goes for the significant developments in mathematics - multivariate calculus, the theory of Lie groups, topology - that made progress in modern physics possible. Similarly synthetic chemistry would not have been possible had NMR spectroscopy and x-ray diffraction not provided the means to determine the structure of molecules.

Molecular modeling is similarly constrained by input from other sciences. Simulation really took off in the 80s and 90s with the rapid advances in computer software and hardware; before this period chemists and physicists had to come up with clever theoretical algorithms to calculate the properties of molecules simply because they did not have access to the proper firepower. Now consider what other disciplines modeling depends on - most notably chemistry. Without chemists being able to rapidly make molecules and provide both robust databases and predictive experiments, it would be impossible for modelers to validate their models. Modeling has also received a tremendous boost from the explosion of protein crystal structures engendered by genomics, molecular biology, synchrotron sources and computer software for data processing. The evolution of databases, data mining methods and the whole infrastructure of informatics has also fed into the growth of modeling. One can even say without exaggeration that molecular modeling is ultimately a product of our ability to manipulate elemental silicon and produce it in an ultrapure form.

Thus, just as physics depended on mathematics, chemistry and engineering, modeling has been crucially dependent on biology, chemistry, and computer science and technology. And in turn, compared to physics, these disciplines are relatively new too. Biology especially is still just taking off, and even now it cannot easily supply the kind of data that would be useful for building a robust model. Computer technology is very efficient, but still not efficient enough to do quantum mechanical calculations on complex molecules in a high-throughput manner (I am still waiting for that quantum computer). And of course, we still don't quite understand all the forces and factors that govern the binding of molecules to each other, nor how to capture these factors in sanitized, user-friendly computer algorithms and graphical interfaces. It's a bit like physics having to progress without access to high-voltage sources, lasers, group theory and a proper understanding of the structure of the atomic nucleus.

Thus, thirty years is simply not enough for a field to claim a very significant degree of success. In fact, considering how new the field is and how many unknowns it is still dealing with, I would say that molecular modeling is actually doing quite well. The fact that computer-aided molecular design was hyped during its inception does not make it any less useful, and it's silly to think so. In the past twenty years we have at least gotten a good handle on the major challenges we face, and we have a reasonably good idea of how to proceed. In major and minor ways modeling continues to make useful contributions to the very complicated and unpredictable science and art of drug design and discovery. For a field that's thirty years old, I would say we aren't doing so badly. And considering the history of science and technology as well as the success of human ingenuity in so many forms, I would say that the future is undoubtedly bright for molecular simulation and modeling. It's a conviction that is as realistic as any other in science, and it's one of the things that helps me get out of bed every morning. In science fortune always favors the patient, and modeling and simulation will be no different.

Crystallography and chemistry: The culture issue

Image: Charles Reynolds and ACS Med Chem Letters
As the old saying goes, beware of crystallographers bearing ligands. Charles Reynolds, a well-known structure-based drug design expert, has an editorial in ACS Medicinal Chemistry Letters touching on an issue that lies at the confluence of crystallography, medicinal chemistry and modeling: flaws in protein-ligand co-crystal structures. It's a problem with major ramifications for drug design, especially since it sits at the apex of the process and has the power to influence all subsequent steps. It's also an issue that has come up many times before, but like many deep-seated issues it has not quite disappeared from the palette of the structure-based design scientist.

In 2003, Davis, Teague and Gerard Kleywegt (who is incidentally one of the wittiest conference speakers I have come across) wrote an article pointing out one simple observation: in several PDB structures of proteins co-crystallized with small, druglike ligands, the protein seems to be well resolved and assigned, but the small molecule is often strained, with unrealistic bond lengths, non-planar aromatic rings, non-planar amide bonds, rings in boat or pseudo-chair conformations, and clashes between protein and ligand atoms. Now the protein can also be misassigned, and so can water molecules, but it turns out that the problem looms much larger for ligands.
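Some of these pathologies can be screened for with nothing more than coordinate geometry. The sketch below (plain Python, with idealized made-up coordinates rather than atoms from any real PDB entry) flags a non-planar amide from the omega dihedral defined by the CA-C-N-CA atoms: trans amides should sit near 180° and cis amides near 0°, and the 20° tolerance here is an illustrative choice, not a crystallographic standard.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four 3-D points."""
    def sub(a, b): return [a[i] - b[i] for i in range(3)]
    def dot(a, b): return sum(x * y for x, y in zip(a, b))
    def cross(a, b):
        return [a[1]*b[2] - a[2]*b[1],
                a[2]*b[0] - a[0]*b[2],
                a[0]*b[1] - a[1]*b[0]]
    def unit(a):
        n = math.sqrt(dot(a, a))
        return [x / n for x in a]

    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = unit(cross(b1, b2)), unit(cross(b2, b3))
    m1 = cross(n1, unit(b2))
    return math.degrees(math.atan2(dot(m1, n2), dot(n1, n2)))

def amide_is_planar(ca, c, n, ca_next, tol_deg=20.0):
    """Planar if omega is within tol_deg of 180 (trans) or 0 (cis)."""
    omega = abs(dihedral(ca, c, n, ca_next))
    return min(omega, 180.0 - omega) <= tol_deg

# Idealized trans amide with all four atoms in one plane: passes.
print(amide_is_planar([-1, 1, 0], [0, 0, 0], [1, 0, 0], [2, -1, 0]))   # True
# Same amide with the second CA pushed out of plane: flagged.
print(amide_is_planar([-1, 1, 0], [0, 0, 0], [1, 0, 0], [2, -1, 1]))   # False
```

The same dihedral routine, pointed at ring atoms instead, would catch puckered aromatic rings; real validation pipelines do essentially this with proper restraint dictionaries.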

Reynolds's editorial takes a fresh, 2014 look at this 2003 problem. And it seems that while some people have become more cognizant of issues in crystal structures, things aren't exactly rosy: he points to a 2009 study which found that the geometries of 75% of the structures in its data set could be improved by using better restraints.

The first and foremost pitfall that non-specialists fall into when taking a crystal structure at face value is to assume that whatever they see on that fancy computer screen is...real. The fact, though, is that barring a structure solved to better than 1 Å (when was the last time you saw that?), every crystal structure is a model (and while we are on the topic, Morpheus's definition of "real" may also be somewhat relevant here). The raw data are the spots you see in the x-ray diffraction pattern; everything after that, including the pretty picture you visualize in PyMOL, comes from a series of steps undertaken by the crystallographer that involve intuition, parameter fitting, expert judgment and the divining of complete information from incomplete data. That's potentially a lot of guesswork and approximation, so it shouldn't be surprising that it often leads to flaws in the results.

So is this problem primarily a technology issue? Not really. Reynolds points out several programs that can now fit ligands to the electron density better and get rid of strain and artifacts; Schrödinger's PrimeX and OpenEye's AFITT are two prominent examples. Nor is it complicated to find out in the first place whether a ligand might be strained: any scientist with access to a good molecular mechanics energy minimization program can take the ligand structure out of the protein, minimize it to the nearest local minimum, look at the energy difference (usually > 5 kcal/mol for a strained ligand), visualize steric clashes between atoms, and reach a reasonable conclusion regarding the feasibility of that particular ligand conformation.
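To make that strain-energy check concrete, here is a deliberately toy sketch of the workflow in plain Python. A real analysis would use a proper force field (for instance MMFF94 via RDKit); here the "ligand" is just three rotatable torsions with ethane-like 3 kcal/mol barriers, and the starting pose, the barriers and the 5 kcal/mol threshold are all illustrative placeholders.

```python
import math

# Toy "force field": a sum of 3-fold cosine torsion terms with minima at the
# staggered angles (60, 180, 300 degrees). Barriers are ethane-like
# placeholders, not parameters from any real force field.
def torsion_energy(angles_deg, barriers_kcal):
    return sum(0.5 * v * (1.0 + math.cos(math.radians(3.0 * phi)))
               for phi, v in zip(angles_deg, barriers_kcal))

# Crude coordinate descent to the nearest local minimum, standing in for a
# molecular mechanics minimizer.
def minimize_torsions(angles_deg, barriers_kcal, step=1.0, iters=2000):
    angles = list(angles_deg)
    for _ in range(iters):
        for i in range(len(angles)):
            e0 = torsion_energy(angles, barriers_kcal)
            for delta in (step, -step):
                trial = list(angles)
                trial[i] = (trial[i] + delta) % 360.0
                if torsion_energy(trial, barriers_kcal) < e0:
                    angles = trial
                    break
    return angles

# A hypothetical "crystal pose" with two nearly eclipsed torsions.
pose = [10.0, 115.0, 55.0]
barriers = [3.0, 3.0, 3.0]  # kcal/mol

e_pose = torsion_energy(pose, barriers)
relaxed = minimize_torsions(pose, barriers)
strain = e_pose - torsion_energy(relaxed, barriers)

print(f"strain energy: {strain:.1f} kcal/mol")
print("suspect pose" if strain > 5.0 else "plausible pose")
```

The point of the exercise is the shape of the procedure - score the deposited pose, relax to the nearest local minimum, compare - not the numbers, which come from an invented potential.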

The abundance of methods for both detecting strained ligand conformations and refining them suggests that something other than technology is the operative factor in the misinterpretation of crystal structures. I believe the problem, in significant part, is culture. Reynolds alludes to this when he says that "crystallographers are not chemists". When you are a crystallographer in hot pursuit of a protein structure, you are rightly going to experience a moment of ecstasy when that huge hulking hunk of sheets and strands finally appears on your screen. But most crystallographers don't care about that little blip in the binding site - a small molecule that's often included as much to stabilize the protein as to aid drug discovery - the way they care about their beloved protein. In addition, many crystallographers don't have the knee-jerk, intuitive reaction to, say, rings in boat conformations that a good medicinal chemist or a medicinal chemistry-aware modeler would have.

The unfortunate consequence of all this is that the ligand often just comes along for the ride, and the protein's gory structural details are exquisitely teased apart at the expense of the ligand's. Protein love all too often translates into ligand hate. For an organic chemist a cyclohexane boat may be a textbook violation of conformational preferences, but for a crystallographer it's just a big, hydrophobic group filling up a big, fuzzy halo of electron density. Crystallographers are not chemists.

However, an honest assessment of the problem would not unfairly pin the blame for bad ligand structures on crystallographers alone. The fact is that structure-based drug design is an intimate covenant between crystallographers, medicinal chemists and modelers, and true appreciation and progress can only come from each side speaking, or at least understanding, the others' language. To this end, chemists and modelers need to be aware of crystallographic parameters and need to ask the right questions of the crystallographer, beginning with a simple question about the resolution (even this question is rarer than you might think). A medicinal chemist or modeler who simply plucks the provided structure out of the PDB file and starts using it to design drugs is as guilty as a chemistry-challenged crystallographer.

A typical set of questions a modeler or medicinal chemist might ask the crystallographer is: 

- What's the resolution?
- What are the R-factors and the B-factors?
- Do you have equal confidence in all parts of the structure? Which parts are more uncertain?
- Are the amides non-planar? 
- Where are the water molecules located? How much confidence do you have in their placement?
- Are atoms that are supposed to be planar actually non-planar?
- Are there any gauche or eclipsed interactions? 
- Are any rings in boat conformations?
- Have you looked at the strain energy of the ligand?
- How did you refine the ligand?

These questions are not meant to be posed to the crystallographer by men in dark suits in a dimly lit room with bars on the windows; rather, they are supposed to provide a reality check on the fidelity of the structure and its potential utility in drug design for all three arms of the structure-based drug design (SBDD) process. They are part of a process that allows all three departments to confer and reach an agreement; anyone can and should ask them. They are meant to bring hands together, not to point fingers.

One of the cultural problems in drug discovery is still the reluctance of one group of scientists to adopt at least parts of the cultural behavior of other groups. Organic chemists are quick to look at stereochemistry or unstable functional groups; modelers are not. Modelers are much more prone to look at conformation; organic chemists are not. Crystallographers are far more likely to keep multiple conformations of loops and flexible protein side chains in mind; the other two parties are not.

The best way to fill these gaps is for each group to speak the language of the others, but until then the optimal solution is to have all of them look at the evidence together and emphasize what each thinks is the most important part. For that to happen, though, each party has to make as many details of its own domain accessible to the others, and that is partly what this post is arguing for.

Update: As usual, the Yoda of chemistry blogging got there first.