Field of Science

Rookie mistakes in molecular modeling: Part 1

Molecular modeling is no longer the preserve of experts; it has reached the masses. Improved hardware and software, combined with easy-to-use graphical user interfaces, have enabled experimental chemists of all kinds to build models of molecules and perform relatively sophisticated calculations on them. Calculations that once required supercomputers can now be done routinely on desktops by organic, inorganic and biological chemists, who can use the results to explain, support and predict chemical phenomena. In the coming years we can be confident that modeling techniques will see increasing use by experimentalists.

An unfortunate (but probably not unexpected) consequence of this ease of use is that it has also become easier to make mistakes while building molecular structures. The main source of error is the translation of 2D chemical structures into their 3D counterparts using some energy minimization protocol. The apparently simple process of drawing a 3D-worthy 2D structure is trickier than it sounds and is therefore quite prone to error. Conformation, which matters little when drawing in 2D, is suddenly of overriding importance, and it's relatively easy to get it wrong.

As a modeler who has worked closely with experimentalists, I have seen myself and others make a number of rookie mistakes over the years. Sometimes these mistakes don't matter much for the final result, but sometimes they can completely change it. So I thought I would make a short list of easy-to-avoid errors that can serve as checks when modeling structures. In part 1 I will describe mistakes commonly made during the simple building of structures. Part 2 will deal with interactions with experimentalists.

1. Getting the ionization state wrong: I put this rookie mistake at the top because it's remarkable how many times I have seen even experienced modelers make it. Always remember: aliphatic amines are generally protonated at physiological pH, while carboxylic acids are generally deprotonated. Getting this right matters because it can completely change the results from protocols like docking. Just think of the difference a neutral carboxylic acid vs a charged carboxylate makes for binding to a protein. Also, many modeling algorithms use force fields which are dominated by electrostatic interactions; the wrong protonation state can therefore make a world of difference. A corollary of the ionization state problem arises when replacing atoms. For instance, you may have a protonated amine which you then want to turn into an alcohol by replacing the N with an O. Unfortunately the atom changes but the ionization state does not, and you end up with a weird positively charged oxygen. On a related note, it goes without saying that you shouldn't charge up inappropriate atoms such as those conjugated to aromatic systems. The best way to catch these issues is to simply display charges for all heteroatoms in your final structure.
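
As a rough guide to which form dominates, the Henderson-Hasselbalch relation gives the ionized fraction of a single titratable group from its pKa and the pH. The sketch below is only an illustration: the function names are invented for this example, and the pKa values (about 4.76 for acetic acid, about 10.6 for the conjugate acid of methylamine) are approximate textbook numbers, not values for any particular ligand or binding site.

```python
PHYSIOLOGICAL_PH = 7.4


def fraction_deprotonated(pka: float, ph: float) -> float:
    """Fraction of an acid HA present as A- at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def fraction_protonated(pka_conjugate_acid: float, ph: float) -> float:
    """Fraction of a base B present as BH+ at the given pH."""
    return 1.0 - fraction_deprotonated(pka_conjugate_acid, ph)


# Approximate, illustrative pKa values (assumptions, not measured data).
acetic_acid_pka = 4.76
methylammonium_pka = 10.6

acid_ionized = fraction_deprotonated(acetic_acid_pka, PHYSIOLOGICAL_PH)
amine_ionized = fraction_protonated(methylammonium_pka, PHYSIOLOGICAL_PH)

print(f"carboxylate fraction at pH 7.4: {acid_ionized:.3f}")
print(f"ammonium fraction at pH 7.4:    {amine_ionized:.3f}")
```

Both fractions come out above 0.99 for these inputs, but the same functions show how quickly the picture changes when the pKa sits near the pH: a group with a pKa equal to the pH is only half ionized.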

2. Getting the stereochemistry wrong: The Cahn-Ingold-Prelog (CIP) rules were taught to us because they really matter. Here's a typical stereochemical mistake: you construct a structure in 2D and come across a stereocenter. You may even build that stereocenter with the right absolute (R or S) stereochemistry. Then you attach something else to that center and forget to recheck the stereochemistry, whose label may have changed because the CIP priorities changed. The simplest safeguard is to always have the program display the absolute stereochemistry of every stereocenter in the final structure.
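
To see why a change in priority alone flips the label, here is a minimal sketch that assigns R or S from 3D coordinates using the sign of a scalar triple product. It does not implement the CIP ranking itself: the caller supplies the three highest-priority neighbors in order, and the function name and the coordinates are hypothetical, chosen only for illustration.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def _triple_product(a: Point, b: Point, c: Point) -> float:
    """Scalar triple product a . (b x c)."""
    return (
        a[0] * (b[1] * c[2] - b[2] * c[1])
        + a[1] * (b[2] * c[0] - b[0] * c[2])
        + a[2] * (b[0] * c[1] - b[1] * c[0])
    )


def chirality_label(center: Point, ranked: Sequence[Point], tol: float = 1e-6) -> str:
    """Return 'R' or 'S' for a tetrahedral center.

    `ranked` holds the three highest-priority neighbors, highest first.
    The lowest-priority neighbor is assumed to point roughly opposite them.
    """
    a, b, c = (tuple(p[i] - center[i] for i in range(3)) for p in ranked)
    volume = _triple_product(a, b, c)
    if abs(volume) < tol:
        return "undefined"  # neighbors are (nearly) coplanar with the center
    # Viewed with the lowest-priority group pointing away, a clockwise
    # 1 -> 2 -> 3 sequence (R) corresponds to a negative triple product.
    return "R" if volume < 0 else "S"


center = (0.0, 0.0, 0.0)
sub_x = (1.0, 0.0, 0.33)
sub_y = (-0.5, -0.87, 0.33)
sub_z = (-0.5, 0.87, 0.33)

# Same geometry, but an added group swaps the CIP rank of sub_y and sub_z.
before = chirality_label(center, [sub_x, sub_y, sub_z])
after = chirality_label(center, [sub_x, sub_z, sub_y])
print(before, after)  # prints: R S
```

Nothing moved in space between the two calls; only the ranking changed, which is exactly the situation in which a label goes stale without anyone noticing.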

3. Forgetting basic conformational rules: This mistake is most commonly made when converting a 2D structure into a 3D structure. The problem is that when you build a 2D structure, your placement of bonds and angles is somewhat ad hoc, based on the rather random way in which you happen to be rotating and viewing the structure. When you then suddenly convert 2D to 3D, you may end up with axial substituents on six-membered rings, syn-pentane or eclipsing interactions between substituents, funky substructures like non-planar aromatic rings resulting from strain or, in the worst cases, even boats for cyclohexanes. Here's another common pitfall: you may try to close a ring by building an unrealistically long bond between two initially separated atoms, thinking that when you then minimize this structure the program will take care of the bond by shortening it to its standard length. This usually happens, but in the process some other parts of your molecule get messed up. Again, judicious inspection can avoid most of these issues.

4. Cis and trans: Building unrealistic bonds between distant atoms and then simply minimizing the structure, as just described, can sometimes turn amide bonds cis, and this is important enough to be listed as a separate point. It is also a common consequence of importing 2D files in SDF format (which lack hydrogens) and asking a program to add hydrogens. The same thing can happen with double bonds.

5. Forgetting basic chemistry: This mistake has more to do with forgetting basic rules of bonding and chemistry than with modeling. Occasionally you may do things like exceeding the allowed valency of an atom, putting a double bond at a bridgehead carbon (violating Bredt's rule), generating antiaromatic rings, forgetting Baldwin's rules for ring closure, building a vinyl amine or a geminal amino alcohol...and generally creating all sorts of unstable and "impossible" molecules. The problem is that your program won't always raise red flags about these errors, so you need to remember your chemistry and make sure you don't recommend some wacky molecules for the synthetic chemist to make (one of the constant sources of friction between experimentalists and modelers arises from the latter forgetting what's synthetically feasible and stable).
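
Two of the checks above are easy to script when the program does not flag them: the amide dihedral (cis vs trans) and the summed bond orders on an atom (valency). The following is a minimal sketch under stated assumptions. The function names, the 30 and 150 degree cutoffs, the tiny valence table and the rule that a +1 formal charge on N or O allows one extra bond are all simplifications made for this example, and explicit hydrogens must be included in the bond orders.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees."""
    b0, b1, b2 = _sub(p0, p1), _sub(p2, p1), _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Remove the components of b0 and b2 that lie along the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def classify_amide(omega: float) -> str:
    """Label an amide C(alpha)-C-N-C(alpha) dihedral (cutoffs are assumptions)."""
    if abs(omega) < 30.0:
        return "cis"
    if abs(omega) > 150.0:
        return "trans"
    return "twisted"


# Usual bond-order totals for neutral atoms; deliberately incomplete.
NEUTRAL_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2}


def exceeds_valence(element: str, bond_orders, formal_charge: int = 0) -> bool:
    """True if the summed bond orders exceed the usual valence."""
    allowed = NEUTRAL_VALENCE[element]
    if element in ("N", "O"):
        allowed += formal_charge  # e.g. ammonium N may carry four bonds
    return sum(bond_orders) > allowed


# Idealized planar coordinates in angstroms, for illustration only.
c_alpha_1, carbonyl_c, amide_n = (-0.5, 1.2, 0.0), (0.0, 0.0, 0.0), (1.33, 0.0, 0.0)
cis_omega = dihedral_degrees(c_alpha_1, carbonyl_c, amide_n, (1.83, 1.2, 0.0))
trans_omega = dihedral_degrees(c_alpha_1, carbonyl_c, amide_n, (1.83, -1.2, 0.0))
print(classify_amide(cis_omega), classify_amide(trans_omega))
print(exceeds_valence("C", [1, 1, 1, 1, 1]), exceeds_valence("N", [1, 1, 1, 1], 1))
```

Running checks like these over every amide and every heavy atom of a minimized structure is cheap, and it catches the problems before they reach a docking run or a synthetic chemist.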

Ultimately, the path to a well-constructed molecule comes down to being vigilant and judiciously checking your final structure. Remember the well-worn adage: computers don't know any chemistry whatsoever, and they are only as good as the code that goes into them. Nothing can trump a sound knowledge of basic chemical principles.

1 comment:

  1. When attempting to hit proteases, physiological pH can be sufficiently low for carboxylates to protonate. Also the pKa values of some carboxylic acids (e.g. quinolone anti-bacterials) can approach normal physiological pH. I would not expect the nitrogens of a piperazine to be protonated simultaneously at a pH of 7.4. In certain cases (e.g. where one nitrogen is alkylated and the other is secondary) one should expect that protonation of one of the piperazine nitrogens will be preferred. Ionisation states can be set using SMARTS/SMIRKS and people might want to take a look at my most recent blog entry if they’re unfamiliar with this. There’s also an acid-base equilibria group on LinkedIn where pKa-related issues get discussed.

    Sometimes when modelling you don’t actually want to ionise molecules even when they are likely to be ionised under assay conditions. If, for example, you’re doing molecular mechanics energy minimisation with a force field that lacks a screening model it may be better to do this with a carboxylic acid in the neutral form and ionise it later before attempting to overlay the molecule with another one. It comes down to what you’re trying to achieve and what resources you have at your disposal.

    I don’t think that you can invoke Baldwin’s rules in this context unless you’re actually trying to model a reaction. If this is what you’re doing then Baldwin’s rules should follow from the analysis.

