A recent issue of Nature had an interesting article on what seems to be a wholly paradoxical feature of models used in climate science; as the models are becoming increasingly realistic, they are also becoming less accurate and predictive because of growing uncertainties. I can only imagine this to be an excruciatingly painful fact for climate modelers who seem to be facing the equivalent of the Heisenberg uncertainty principle for their field. It's an especially worrisome time to deal with such issues since the modelers need to include their predictions in the next IPCC report on climate change which is due to be published next year.
A closer look at the models reveals that this behavior is not as paradoxical as it sounds, although it's still not clear how you would get around it. The article especially struck a chord with me because I see similar problems bedeviling models used in chemical and biological research. In the case of climate change, the fact is that earlier models were crude and did not account for many fine-grained factors that are now being included (such as the rate at which ice falls through clouds). In principle and even in practice there's a bewildering number of such factors (partly exemplified by the picture on top). Fortuitously, the crudeness of the models also kept the uncertainties associated with these factors out of the modeling; the uncertainty remained hidden. Now that more real-world factors are being included, the uncertainties endemic to these factors reveal themselves and get tacked on to the models. You thus face an ironic tradeoff: as your models strive to mirror the real world better, they also become more uncertain. It's like swimming in quicksand; the harder you try to get out, the deeper you get sucked in.
This dilemma is not unheard of in the world of computational chemistry and biology. Many of the models we currently use for predicting protein-drug interactions, for instance, are remarkably simple and yet accurate enough to be useful. Several reasons account for this unexpected accuracy, among them the cancellation of errors (the Fermi principle), the similarity of training sets to test sets, and sometimes just plain luck. Error analysis is unfortunately not a priority in most of these studies, since the incentive is to publish correct predictions rather than to dissect failed ones. Unless this culture changes, our road to accurate prediction will be painfully slow.
But here's an example of how "more can be worse". For the last few weeks I have been using a very simple model to try to predict the diffusion of druglike molecules through cell membranes. This is an important problem in drug development, since even your most stellar test-tube candidate will be worthless unless it makes its way into cells. Cell membranes are hydrophobic while the water surrounding them is hydrophilic. The ease with which a potential drug transfers from the surrounding water into the membrane depends, among other factors, on its solvation energy, that is, on how readily the drug can shed its water molecules; the smaller the solvation energy, the easier it is for the drug to get across. This simple model, which calculates only the solvation energy, does unusually well in predicting the diffusion of drugs across real cell membranes, a process that's much more complex than simple solvation and desolvation.
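The back-of-the-envelope logic of such a model can be sketched in a few lines. To be clear, this is only a toy illustration and not the actual model from my work: the compound names, the energies, and the exponential (Boltzmann-like) relation between desolvation cost and a permeability score are all assumptions made up for the sketch.

```python
import math

RT = 0.593  # kcal/mol at ~298 K

def permeability_score(solvation_energy):
    """Single-conformation toy model: the cost of shedding water sets the
    barrier to entering the membrane, so the score falls off exponentially
    with the desolvation cost (energies in kcal/mol; a more negative
    solvation energy means the molecule is held more strongly by water)."""
    desolvation_cost = -solvation_energy
    return math.exp(-desolvation_cost / RT)

# Hypothetical compounds: (name, aqueous solvation free energy in kcal/mol)
compounds = [("A", -4.0), ("B", -7.5), ("C", -11.0)]

# Rank by predicted ease of membrane entry: least strongly solvated first
ranked = sorted(compounds, key=lambda c: permeability_score(c[1]), reverse=True)
```

The appeal of a model like this is that it reduces a messy multistep process to a single, cheaply computed number per molecule.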
One of the fundamental assumptions in the model is that the molecule exists in just one conformation in both water and the membrane. This assumption is false: in reality, molecules are highly flexible creatures that interconvert between several conformations, both in water and inside the membrane. To relax this assumption, a recent paper explicitly calculated the conformations of the molecule in water and folded this factor into the diffusion predictions. This was certainly more realistic. Yet to their surprise, the authors found that the more realistic calculation gave worse predictions. While the exact mix of factors responsible for this failure can be complicated to tease apart, what's likely happening is that the more realistic treatment also brings more noise and uncertainty with it. The uncertainty piles up, errors that were likely canceling before no longer cancel, and the whole prediction becomes fuzzier and less useful.
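To make the conformational step concrete, here is a minimal sketch of what the "more realistic" treatment involves: Boltzmann-weighting a per-conformer property by each conformer's population. The conformer energies and solvation energies below are hypothetical numbers chosen for illustration; the point is that every conformer quantity carries its own computational error, which the weighted average inherits.

```python
import math

RT = 0.593  # kcal/mol at ~298 K

def boltzmann_weights(rel_energies):
    """Equilibrium population of each conformer from its relative
    energy in kcal/mol (lowest-energy conformer at 0.0)."""
    factors = [math.exp(-e / RT) for e in rel_energies]
    total = sum(factors)
    return [f / total for f in factors]

# Hypothetical conformers: (relative energy in water, solvation energy), kcal/mol
conformers = [(0.0, -9.0), (0.6, -7.0), (1.5, -5.5)]

weights = boltzmann_weights([e for e, _ in conformers])
ensemble_solvation = sum(w * s for w, (_, s) in zip(weights, conformers))

# Every conformer energy above carries its own error, so the ensemble
# average inherits all of them instead of relying on one number.
```

Each extra conformer is an extra opportunity for error to enter, which is one way the "more realistic" calculation can end up noisier than the crude one.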
I believe something similar is partly happening in climate models. Including more real-life factors in the models does not mean that all those factors are well understood; you are inevitably introducing some known unknowns. Well-understood factors introduce little uncertainty, while ill-understood factors introduce a lot. Ultimately the accuracy of the models depends on the interplay between these two kinds of factors, and currently it seems that the rate at which new factors are being included outpaces the rate at which those factors can be accurately calculated.
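This interplay is easy to demonstrate numerically. The sketch below is a deliberately abstract toy with made-up noise levels: each "factor" contributes to a prediction with its own Gaussian uncertainty, and adding poorly constrained factors widens the spread of the final answer even though the model has become more complete.

```python
import random

random.seed(0)

def predict(sigmas):
    """One prediction: a sum of factor contributions, each drawn
    with its own noise level (standard deviation sigma)."""
    return sum(random.gauss(1.0, s) for s in sigmas)

def spread(sigmas, trials=20000):
    """Empirical standard deviation of the prediction."""
    vals = [predict(sigmas) for _ in range(trials)]
    mean = sum(vals) / trials
    return (sum((v - mean) ** 2 for v in vals) / trials) ** 0.5

well_understood = [0.1, 0.1, 0.1]  # tightly constrained factors
ill_understood = [0.8, 1.0]        # realistic but poorly constrained factors

crude = spread(well_understood)                       # crude model
realistic = spread(well_understood + ill_understood)  # "more realistic" model

# Independent uncertainties add in quadrature, so the realistic model's
# spread is dominated by the ill-understood factors it now includes.
```

The crude model's narrow spread comes not from any special insight but from everything it leaves out.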
The article goes on to note that in spite of this growing uncertainty, the basic predictions of climate models remain broadly consistent. However, it also acknowledges the difficulty of explaining the growing uncertainty to a public that has become more skeptical of climate change since 2007 (when the last IPCC report was published). As a chemical modeler, I can sympathize with the climate modelers.
But the lesson to take away from this dilemma is that crude models sometimes work better than more realistic ones. Perhaps the climate modelers should remember George Box's quote that "all models are wrong, but some are useful". It is a worthy endeavor to try to make models more realistic, but it is even more important to make them useful.