Physics, biology and models: A view from 1993 and now

Twenty-three years ago the computer scientist Danny Hillis wrote an essay titled "Why physicists like models, and why biologists should." The essay gently took biologists to task for not borrowing model-building tricks from the physicists' trade. Hillis's contention was that simplified models have been hugely successful in physics, from Newton to Einstein, and that biologists seem to have largely dismissed model building by invoking the knee-jerk reason that biological systems are far too complex.

There is a grain of truth in what Hillis says about biologists not adopting modeling and simulation to understand reality. While some of it probably is still a matter of training - most experimental biologists do not receive training in statistics and computer science as a formal part of their education - the real reasons probably have more to do with culture and philosophy. Historically, too, biology has always been a much more experimental science than physics; Carl Linnaeus was still classifying animals and plants while Isaac Newton was mathematizing all of classical mechanics.

Hillis documents three reasons why biologists aren't quick to use models. In his words:
For various reasons, biologists who are willing to accept a living model as a source of insight are unwilling to apply the same criterion to a computational model. Some of the reasons for this are legitimate, others are not. For example, some fields of biology are so swamped with information that new sources of ideas are unwelcome. A Nobel Prize winning molecular biologist said to me recently, "There may be some good ideas there, but we don't really need any more good ideas right now." He might be right. 
A second reason for biologists' general lack of interest in computational models is that they are often expressed in mathematical terms. Because most mathematics is not very useful in biology, biologists have little reason to learn much beyond statistics and calculus. The result is that the time investment required for many biologists to understand what is going on in computational models is not worth the payoff. 
A third reason why biologists prefer living models is that all known life is related by common ancestry. Two living organisms may have many things in common that are beyond our ability to observe. Computational models are only similar by construction; life is similar by descent.
Many of these reasons still apply, but the situation has changed for the better since 1993. The information glut has, if anything, increased in important fields of biology like genomics and neuroscience. Hillis was writing before the age of 'Big Data', but his observation anticipates it. However, data by itself should not preclude the infusion of modeling; if anything it should encourage it even more. Also, the idea that "most mathematics is not very useful in biology" pertains to the difficulty (or even impossibility) of writing down, say, a mathematical description of a cat. But you don't always have to go that far in order to use mathematics effectively. For instance, ecologists have used simple differential equations to model the rise and fall of predator and prey populations, and systems biologists are now using similar equations to model the flux of nutrients, metabolites and chemical reactions in a cell. Mathematics is certainly more useful in biology now than it was in 1993, and much of this resurgence has been enabled by the rise of high speed computing and better algorithms.
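
To make the predator-prey example concrete, here is a minimal sketch of the classic Lotka-Volterra equations, dx/dt = alpha*x - beta*x*y for the prey and dy/dt = delta*x*y - gamma*y for the predators, integrated with a standard fourth-order Runge-Kutta step. The parameter values, starting populations and function names are my own illustrative assumptions; they are not taken from Hillis's essay or fitted to any real ecosystem.

```python
import math

# Illustrative (assumed) rate constants, not fitted to any data set.
ALPHA = 1.0    # prey birth rate
BETA = 0.1     # rate at which predators remove prey
DELTA = 0.075  # predator growth per prey eaten
GAMMA = 1.5    # predator death rate


def lotka_volterra(state):
    """Right-hand side of the Lotka-Volterra equations."""
    prey, predators = state
    d_prey = ALPHA * prey - BETA * prey * predators
    d_predators = DELTA * prey * predators - GAMMA * predators
    return (d_prey, d_predators)


def rk4_step(f, state, dt):
    """Advance `state` by one fixed step of size dt using classical RK4."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(initial=(10.0, 5.0), dt=0.01, steps=3000):
    """Return the list of (prey, predators) states, including the initial one."""
    trajectory = [initial]
    state = initial
    for _ in range(steps):
        state = rk4_step(lotka_volterra, state, dt)
        trajectory.append(state)
    return trajectory


def invariant(state):
    """Quantity the exact solution conserves; a handy check on the integrator."""
    prey, predators = state
    return (DELTA * prey - GAMMA * math.log(prey)
            + BETA * predators - ALPHA * math.log(predators))


if __name__ == "__main__":
    traj = simulate()
    # Print every 300th state to show the populations rising and falling.
    for i in range(0, len(traj), 300):
        prey, predators = traj[i]
        print(f"t={i * 0.01:5.1f}  prey={prey:8.2f}  predators={predators:8.2f}")
```

Two coupled equations and four numbers are enough to produce the endless boom-and-bust cycles seen in textbook predator-prey data, which is exactly the kind of simple system with analogous behavior that Hillis has in mind. A real study would more likely reach for a library integrator and fit the constants to observations; the hand-rolled stepper here is only meant to show how little machinery is involved.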

The emergence of better computing also speaks to the difficulty in understanding computational models that Hillis talks about; to a large extent it has now mitigated this difficulty. When Hillis was writing, it took a supercomputer to perform the kind of calculation that you can now do on your laptop in a few hours. The hardware and software improvements driven by Moore's Law have enormously facilitated number-crunching in biology. The third objection - that living things are similar by descent rather than construction - is a trivial one in my opinion. If anything it makes the use of computational models even more important, since by comparing similarities across various species one can actually get insights into potential causal relationships between them. Another reason Hillis gives for biologists not embracing computation is that they are emotionally invested in living things rather than non-living material things. This is probably much less of a problem now, especially since computers are used commonly even by biologists to perform routine tasks like graphing.

Granting that the problems responsible for biologists' lukewarm response to computational models are legitimate, Hillis then talks about why biologists should still borrow from the physicists' modeling toolkit. The basic reason is that a simple system with analogous behavior can capture the essential features of a more complex system; this is in fact the sine qua non of model building. The tricky part of course is in figuring out whether the simple features truly reproduce the behavior of the real world system. The funny thing about models however is that they simply need to be useful, so they need not correspond to any of our beliefs about real world systems. In fact, trying to incorporate too much reality into models can make them worse and less accurate.

A good example I know is this paper from Merck that correlated simple minimized energies of a set of HIV protease inhibitor drugs to their inhibition values (IC50s) against the enzyme. Now, nobody believes that the very simple force field underlying this calculation actually reproduces the complex interplay of protein, small molecules and solvent that takes place in the enzyme. But the point is that they don't care: as long as the model is predictive it's all good. Hillis though is making the point that models in physics have been more than just predictive, they have been explanatory. He exhorts biologists to move away from simple prediction, as useful as it might be, and towards explanation.

I agree with this sentiment, especially since prediction alone can lead you down a seductive path in which you may get more and more divorced from reality. Something similar can happen especially in fields like machine learning, where combinations of abstract descriptors that defy real world interpretation are used merely because they are predictive. These kinds of models are very risky in the sense that you can't really figure out what went wrong if they break down; at that point you have a giant morass of descriptors and relationships to sort through, very few of which make any physical sense. This dilemma is true of biological models as a whole, so one needs to tread a fine line between keeping a model simple and descriptive and incorporating enough real world variables to make it scientifically sensible.

Later the article talks about models separating out the important variables from the trivial ones, and that's certainly true. He also talks about models being able to synthesize experimental variables into a seamless whole, and I think machine learning and multiparameter optimization in drug discovery for instance can achieve this kind of data fusion. There is a twist here though, since the kinds of emergent variables that are rampant in biology are not usually seen in physics. Thus, modeling in biology needs to account for the mechanisms underlying the generation of emergent phenomena. We are still not at the point where we can do this successfully, but we seem to be getting there. Finally, I also like to emphasize one very important utility of models: as negative filters. They can tell you what experiment not to do or what variable to ignore or what molecule not to make. That at the very least leads to a saving of time and resources.

The bottom line is that there is certainly more cause for biologists to embrace computational models than there was in 1993. And it should have nothing to do with physics envy.
