The Curious Wavefunction: More model perils; parametrize this

Now here's a very interesting review article that puts some of the pitfalls of models that I have mentioned on these pages in perspective. The article is by Jack Dunitz and his long-time colleague Angelo Gavezzotti. Dunitz is in my opinion one of the finest chemists and technical writers of the last half century and I have learnt a lot from his articles. Two that are on my "top 10" list are his article showing the entropic gain accrued by displacing water molecules in crystals and proteins (a maximum of 2 kcal/mol for strongly bound water) and his paper demonstrating that organic fluorine rarely, if ever, forms hydrogen bonds.

In any case, in this article he talks about an area in which he is the world's acknowledged expert: organic crystal structures. Understanding and predicting (the horror!) crystal structures essentially boils down to understanding the forces that make molecules stick to each other. Dunitz and Gavezzotti describe theoretical and historical attempts to model forces between molecules, and many of their statements about the inherent limitations of modeling these forces rang as loudly in my mind as the bell in Sainte-Mère-Église during the Battle of Normandy.

Dunitz has a lot to say about atom-atom potentials, which are the most popular framework for modeling inter- and intramolecular interactions. Basically such potentials assume simple functional forms that model the attractive and repulsive interactions between nuclei, which are treated as rigid balls. This is also of course the fundamental approximation in molecular mechanics and force fields. The interactions are basically Coulombic interactions (relatively simple to model) and more complicated dispersion interactions which are essentially quantum mechanical in nature. The real and continuing challenge is to model these weak dispersive interactions.
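To make the idea concrete, here is a minimal sketch of what an atom-atom potential looks like in code. The 12-6 Lennard-Jones form, the Lorentz-Berthelot combining rules and the tuple layout for an atom are my own assumptions for illustration, not something taken from the review; real force fields differ in functional form and in how the parameters are obtained.

```python
import math
from itertools import product

# Converts q_i * q_j / r (charges in e, r in Angstrom) to kcal/mol.
COULOMB_K = 332.0637


def pair_energy(r, q_i, q_j, epsilon, sigma):
    """Energy (kcal/mol) of one atom-atom contact at distance r (Angstrom).

    Assumed form: point-charge Coulomb term plus a 12-6 Lennard-Jones term.
    """
    coulomb = COULOMB_K * q_i * q_j / r
    sr6 = (sigma / r) ** 6
    lennard_jones = 4.0 * epsilon * (sr6 * sr6 - sr6)
    return coulomb + lennard_jones


def interaction_energy(mol_a, mol_b):
    """Sum pair energies over every atom of A with every atom of B.

    Each atom is a hypothetical tuple (x, y, z, charge, epsilon, sigma).
    """
    total = 0.0
    for a, b in product(mol_a, mol_b):
        r = math.dist(a[:3], b[:3])
        eps = math.sqrt(a[4] * b[4])  # geometric mean of well depths
        sig = 0.5 * (a[5] + b[5])     # arithmetic mean of atomic sizes
        total += pair_energy(r, a[3], b[3], eps, sig)
    return total
```

Everything that matters physically is hidden in the charges, epsilon and sigma, which is exactly where the parameterization comes in.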

But the problem is fuzzy. As Dunitz says, atom-atom potentials are popular mainly because they are simple in form and easy to calculate. However, they have scant, if any, connection to "reality". This point cannot be stressed enough. As this blog has noted several times before, we use models because they work, not because they are real. The coefficients in the functional forms of the atom-atom potentials are essentially varied to minimize the potential energy of the system and there are several ways to skin this cat. For instance, atomic point charges are rather arbitrary (and definitely not "real") and can be calculated and assigned by a variety of theoretical approaches. In the end, nobody knows if the final values or even the functional forms have much to do with the real forces inside crystals. It's all a question of parameterization which gives you the answer, and while parameterization may seem like a magic wand which may give you anything that you want, that's precisely the problem with it: it may give you anything that you want without reproducing the underlying reality. Overfitting is also a constant headache and one of the biggest problems with any modeling in my opinion, whether in chemistry, quantitative finance or atmospheric science. More on that later.

An accurate treatment of intermolecular forces will have to take electron delocalization into consideration. The part which is the hardest to deal with is the part close to the bottom of the famous Van der Waals energy curve, where there is an extremely delicate balance between repulsion and attraction. Naturally one thinks of quantum mechanics to handle such fine details. A host of sophisticated methods have been developed to calculate molecular energies and forces. But those who think QM will take them to heaven may be mistaken; it may in fact take them to hell.
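A quick worked example shows how delicate that balance is. Assuming, purely for illustration, that the curve is a 12-6 Lennard-Jones potential with well depth \varepsilon and size parameter \sigma (my choice of stand-in, not the review's), the energy at the minimum is the small difference of two larger terms:

```latex
\begin{align}
E(r) &= 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right],
& \frac{dE}{dr} = 0 \;&\Rightarrow\; r_{\min} = 2^{1/6}\sigma \\
\left(\frac{\sigma}{r_{\min}}\right)^{6} &= \frac{1}{2}
& \Rightarrow\; E_{\text{rep}} &= 4\varepsilon\cdot\frac{1}{4} = +\varepsilon, \quad
E_{\text{att}} = -4\varepsilon\cdot\frac{1}{2} = -2\varepsilon \\
E(r_{\min}) &= E_{\text{rep}} + E_{\text{att}} = -\varepsilon
\end{align}
```

In this toy model, a 10% error in the attractive term alone changes the computed well depth at that distance by 20%.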

Let's start with the basics. In any QM calculation one uses a certain theoretical framework and a certain basis set to represent atomic and molecular orbitals. One then adds terms to the basis set to improve accuracy. Consider Hartree-Fock theory. As Dunitz says, it is essentially useless for dealing with electron delocalization because it does not take electron correlation into account, no matter how large a basis set you use. More sophisticated methods have names like "Møller-Plesset perturbation theory with second order corrections" (MP2) but these may greatly overestimate the interaction energy, and more importantly the calculations become hideously computationally intensive for anything more than the simplest molecules.

True, there are "model systems" like the benzene dimer (which has been productively beaten to death) for which extremely high levels of theory have been developed that approach experimental accuracy within a hairsbreadth. But firstly, model systems are just that, model systems; the benzene dimer is not exactly a molecular arrangement which real-life chemists deal with all the time. Secondly, a practical chemist would rather have an accuracy of 1 kcal/mol for a large system than an accuracy of 0.1 kcal/mol for a simple system like the benzene dimer. Thus, while MP2 and other methods may give you unprecedented accuracy for some model systems, they are usually very expensive for most systems of biological interest and not very useful.

DFT still seems to be one of the best techniques around to deal with intermolecular forces. But "classical" DFT suffers from a well-known inability to treat dispersion. "Parameterized DFT" in which an inverse sixth power term is added to the basic equations can work well and promises to be a very useful addition to the theoretical chemist's arsenal. More parameterization though.
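For the curious, the inverse sixth power term can be sketched in a few lines. The form below, a pairwise -C6/r^6 sum switched off at short range by a Fermi-type damping function, follows the general shape of Grimme-style empirical corrections as I understand them; the function names, the input layout and the default parameters are assumptions for illustration, and units are left to the caller to keep consistent.

```python
import math


def damping(r, r0, d=20.0):
    """Fermi-type switch: close to 0 well inside r0, close to 1 well outside.

    r0 plays the role of a sum of van der Waals radii; d sets the steepness.
    """
    return 1.0 / (1.0 + math.exp(-d * (r / r0 - 1.0)))


def dispersion_correction(pairs, s6=1.0):
    """Empirical dispersion energy added on top of a DFT energy.

    pairs is an iterable of hypothetical (r, c6, r0) tuples, one per atom
    pair; s6 is a global scaling factor fitted for each functional.
    """
    return -s6 * sum(c6 / r ** 6 * damping(r, r0) for r, c6, r0 in pairs)
```

Note how many knobs there are even in this toy version: a C6 coefficient and a radius per atom pair, a steepness d and a scaling factor s6.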

And yet, as Dunitz points out, problems remain. Even if one can accurately calculate the interaction energy of the benzene dimer, it is not really possible to know how much of it comes from dispersion and how much of it comes from higher order terms. Atom-atom potentials are happiest calculating interaction energies at large distances, where the Coulomb term is pretty much the only one which survives, but at small interatomic distances which are the distances most of interest for the chemist and the crystallographer, a complex dance between attraction and repulsion, monopoles, dipoles and multipoles and overlapping electron clouds manifests itself. The devil himself would have a hard time calculating interactions in these regions.

The theoretical physicist turned Wall Street quant Emanuel Derman (author of the excellent book "My Life as a Quant: Reflections on Physics and Finance") says that one of the problems with the financial modelers on Wall Street is that they suffer from "physics envy". Just like in physics, they want to discover three laws that govern 99% of the financial world. More predictably, as Derman says, they end up discovering 99 laws that seem to govern 3% of the financial world with varying error margins. I would go a step further and say that even physics is accurate only in the limit of ideal cases and this deviation from absolute accuracy distinctly shows in theoretical chemistry. Just consider that the Schrödinger equation can be solved exactly only for the hydrogen atom, which is where chemistry only begins. Anything more complicated than that, and even the most austere physicist cannot help but approximate, parametrize, and constantly struggle with errors and noise. As much as the theoretical physicist would like to tout the platonic purity of his theories, their practical applications would without exception involve much approximation. There is a reason why that pinnacle of twentieth century physics is called the Standard Model.

I would say that computational modelers in virtually every field from finance to climate change to biology and chemistry suffer from what Freeman Dyson has called "technical arrogance". We have made enormous progress in understanding complex systems in the last fifty years and yet when it comes to modeling the stock market, the climate or protein folding, we seem to think that we know it all. But we don't. Far from it. Until we do, all we can do is parametrize, and try to avoid the fallacy of equating our models with reality.

That's right Dorothy. Everything is a model. Let's start with the benzene dimer.

Dunitz, J., & Gavezzotti, A. (2009). How molecules stick together in organic crystals: weak intermolecular interactions. Chemical Society Reviews, 38(9). DOI: 10.1039/b822963p

4 comments:

Anonymous, 6:16 AM, December 01, 2009
neat and thoughtful post - thanks

Unknown, 3:55 PM, December 03, 2009
Very nice post, I'll read the Dunitz review tomorrow.

Meanwhile, I do want to point out that making approximations and parametrizing do not necessarily go hand-in-hand. In traditional ab initio (MO-based) quantum chemistry there is no parameterization, only approximations. Furthermore, there are systematic ways to improve these approximations and approach the exact (non-relativistic) QM picture. Of course, as you note, such methods are really only applicable to small model systems.

Also (to plug my own work), even though the benzene dimer itself has been "beaten to death" (as you say), our understanding of substituted benzene dimers continues to change.

Wavefunction, 5:32 AM, December 07, 2009
Swheele2, excellent point. Approximation and parameterization are indeed not necessarily the same. And as you noted, surprises still await us in the study of the benzene dimer.

Anonymous, 11:32 AM, December 15, 2009
Makes me want to do some more research in comparing crystal structures!
