The other day I wrote about the late physicist Leo Kadanoff, who captured one of the key caveats of modeling in a seriously useful piece of advice - "Do not model bulldozers with quarks". Kadanoff was talking about the problems that arise when we fail to use the right resolution and tools to model a specific system. While reading Kadanoff's warnings I also remembered one of John von Neumann's equally witty cautions against flawed modeling - "With four parameters I can fit an elephant, and with five I can make him wiggle his trunk".
It strikes me that between them Kadanoff and von Neumann capture almost all the cardinal sins of modeling. The other day I was having a conversation about modeling with a leading industrial molecular modeler, and he made the very cogent point that it is imperative to keep the resolution of a particular system and the data it presents in mind when modeling it. My colleague could well have been channeling Kadanoff. This point is actually simple enough to understand (although hard enough to always keep in mind when obeying institutional mandates in a shortsighted environment which thrives on unrealistic short-term goals).
If you are doing structure-based drug design, for instance, it's dangerous to try to read too much atomic detail into a 3 angstrom protein-ligand structure. Divining fine details of halogen substitutions, amide flips and water molecules from such a structure can easily get you into trouble. If a 3 angstrom structure is the best you have, your optimum strategy would be to try rough designs of molecules - a hydrophobic extension here, a basic amine there - without getting too fine-grained about it. What you should aim for is maximum diversity accessible with minimal synthetic effort - libraries of small peptides might be suitable candidates in such cases. After that, let the chemical matter guide you. Once you have a hit, that's when you want to get more detailed, although even then the low resolution of the structure may be at odds with the high resolution of your thinking.
An equally good or even better strategy to adopt in such cases might be a purely ligand-based assault on the problem. You might be aware of similar ligands hitting similar proteins, or, even in the case of de novo ligand design, you might want to push for purely ligand-based diversity. But this is where you now have to start listening to von Neumann. You may try to fit potential activities of ligands to a few parameters, or build a QSAR model. What you might really be doing, however, is building not a QSAR model but a house of cards supporting a castle in the air - in other words, an overfit model with scant connection to chemically intuitive reality. In that case rest assured - von Neumann's elephant would be quite willing to crash his way in and tear apart your castle.
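To make von Neumann's point concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the single "descriptor", the "activities" (pure random noise, so there is no real structure-activity relationship to find) and the function names are made up for illustration, and the exact polynomial fit is just a stand-in for any model with as many adjustable parameters as data points. It is not a real QSAR workflow.

```python
import math
import random


def lagrange_predict(xs, ys, x):
    """Evaluate at x the unique polynomial of degree len(xs) - 1
    that passes exactly through every point (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


def overfit_demo(n_params=5, seed=0):
    """Fit n_params noisy points with an n_params-parameter polynomial.

    Returns (training RMSE, held-out RMSE). The data are synthetic:
    the 'activities' are Gaussian noise unrelated to the descriptor.
    """
    rng = random.Random(seed)
    train_x = [float(i) for i in range(n_params)]       # hypothetical descriptor values
    train_y = [rng.gauss(0.0, 1.0) for _ in train_x]    # noise posing as activity
    test_x = [x + 0.5 for x in train_x]                 # new "compounds"
    test_y = [rng.gauss(0.0, 1.0) for _ in test_x]

    train_pred = [lagrange_predict(train_x, train_y, x) for x in train_x]
    test_pred = [lagrange_predict(train_x, train_y, x) for x in test_x]
    return rmse(train_pred, train_y), rmse(test_pred, test_y)


if __name__ == "__main__":
    train_err, test_err = overfit_demo()
    print(f"training RMSE: {train_err:.3g}")
    print(f"held-out RMSE: {test_err:.3g}")
```

With five parameters and five data points the curve goes through every training point, so the training error vanishes even though the data contain no signal at all. The held-out error is the number that tells you whether you have a model or an elephant.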
Kadanoff's admonition to not model bulldozers with quarks is a good rule for structure-based design. Von Neumann's elephant is a good warning to keep in mind for ligand-based drug design. Together the two can hopefully keep you from falling into the abyss and getting crushed under the elephant and the bulldozer.