The Curious Wavefunction: The Black Swans of Chemistry Models

The Black Swans of Chemistry Models

By Wavefunction on Sunday, January 11, 2009

And Nassim Nicholas Taleb

Nassim Nicholas Taleb's two books, Fooled by Randomness and The Black Swan, are undoubtedly two of the most provocative and interesting books I have come across in my life. I am still ploughing through them, but the second book in particular, The Black Swan, has made waves and is apparently now cited as required reading on Wall Street. The books are highly eclectic and traverse a remarkably diverse landscape that includes psychology, finance, evolution, mathematics, philosophy, history, sociology, economics and many other disciplines. It would be impossible to review them in a limited space. But if I wanted to capture their essence, it would be by saying that Taleb alerts readers to the patterns that human beings see in the randomness inherent in the world, and to the models, both mental and practical, that they build to account for this randomness.

Since his book came out, Taleb has become a mini celebrity and has been interviewed by Charlie Rose and Stephen Colbert. His books sell in large numbers in the far corners of the world. The reason why Taleb has suddenly become such a big deal is in part that he seems, at least philosophically, to have predicted the financial crisis of 2008, which occurred two years after The Black Swan came out. One of the firms he advised turned a profit of more than 100 million dollars in 2008 when others were close to losing the clothes on their backs. Taleb has now emerged as one of the most profound soothsayers and philosophers of our times, a "scholar of randomness", although his message seems to be more modest: models are not designed to account for rare or Black Swan events, which may have monumental impact. The analogy refers to the assured belief that people had in the past that all swans were white. When the continent of Australia was discovered and black swans were observed in flocks (a fact which my dad, who is currently in Australia, corroborated), there was a paradigm shift. Similarly a model, any model, that is built on the basis of White Swan events will fail to foresee Black Swans.

Unfortunately as Taleb explains, it's the Black Swans that dictate the direction that the world proceeds in. It's the rare event that is the watershed, the event that changes everything. And it's exactly the rare event that models don't encapsulate. And this fact spells their doom.

To support his theory, Taleb cites many Black Swan events from history and politics. For example, if you lived in 1913, you would hardly foresee the gargantuan event of 1914 which would forever change the world. If you lived in 1988, you would scarcely comprehend the epoch-making events of 1989. One of my favourite parts of the book concerns The Turkey Analogy. Imagine you are a turkey who is being constantly fed by the butcher 364 days a year. You are happy, you know the butcher loves you, your economics and accounts departments are happy, and they start to think this is the way of the world for you. Now comes the 365th day. You die. Just when your expectation levels reach their most optimistic, your destiny reaches its lowest point. But right before day 365, on day 364, you were 100% certain that you had a lifetime of bountiful happiness ahead of you. Day 365 was exactly contrary to the expectations of your finance department. Day 365 was the Black Swan which you did not anticipate. And yet it was that single Black Swan day that proved fateful for you, and not the earlier 364 days of well-fed bliss. According to Taleb, this is what most of us, and especially the derivatives wizards on Wall Street, are: happy, deluded turkeys.

In any case, one of the most important discussions in Taleb's books concerns the fallacy of model building. He claims that the models that Wall Street used, the models that raked in billions and Nobel Prizes, were fundamentally flawed, in part because they were not built to accommodate Black Swans. That made me think about models in chemistry and how they relate to other models. In Taleb's mind, the derivatives and other frameworks that the genius quants used were like a plane whose workings they did not understand. When you build a plane, you should always keep the possibility of a storm, a rare event, in mind. Aerospace engineers do this. The quants apparently did not.

But let's face one big truth about models; most of them in fact are not designed to represent "reality". In fact models don't care as much about causation as they do about accurate correlation and prediction. While this may sound like shooting ourselves in the foot, it often saves us a lot of time, not to mention philosophizing. We use models not because they are "real" but because they work. I would not care if a model I had for predicting enzyme rates involved little gnomes rotating amino acid torsions in the active sites and passing on water molecules to each other. If the model could predict the catalysis rate for orotidine decarboxylase, that's all I care about. A model for representing water molecules may put unrealistic charges on the O and H atoms, but all I care about is whether it reproduces the dipole moment, density and heat capacity of bulk water. Thus, models are not necessarily a window into reality. They are very likely a window in spite of reality.
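To make the water example concrete, here is a minimal sketch of how a handful of fixed point charges translates into a dipole moment. The post does not name a particular water model; the charges and geometry below are those of the widely used TIP3P three-site model, which is an assumption made purely for illustration, and the helper name point_charge_dipole is hypothetical.

```python
import math

# Assumed parameters: the TIP3P three-site water model
# (not named in the post; used here only as a familiar example).
Q_H = 0.417         # partial charge on each H, in units of e
Q_O = -2 * Q_H      # partial charge on O, keeps the molecule neutral
R_OH = 0.9572       # O-H bond length, angstrom
ANGLE_HOH = 104.52  # H-O-H angle, degrees

E_ANGSTROM_TO_DEBYE = 4.8032  # 1 e*angstrom expressed in debye


def point_charge_dipole(charges, positions):
    """Dipole moment vector (in e*angstrom) of a set of point charges."""
    return tuple(
        sum(q * pos[axis] for q, pos in zip(charges, positions))
        for axis in range(3)
    )


half_angle = math.radians(ANGLE_HOH / 2)
positions = [
    (0.0, 0.0, 0.0),                                                  # O at the origin
    (R_OH * math.sin(half_angle), R_OH * math.cos(half_angle), 0.0),   # H1
    (-R_OH * math.sin(half_angle), R_OH * math.cos(half_angle), 0.0),  # H2
]
charges = [Q_O, Q_H, Q_H]

mu = point_charge_dipole(charges, positions)
mu_debye = math.sqrt(sum(c * c for c in mu)) * E_ANGSTROM_TO_DEBYE
print(f"model dipole: {mu_debye:.2f} D")
```

With these assumed parameters the model dipole comes out near 2.35 D, noticeably larger than the roughly 1.85 D measured for an isolated water molecule in the gas phase. Fixed-charge models of this kind are generally parameterized against liquid-phase properties, so the charges are chosen because they make the model behave sensibly in bulk, not because they describe a single real molecule.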

Also, models can always fit data if arbitrary changes are made to their parameters and enough parameters are used. As Retread quoted Von Neumann in a past comments thread, "Give me five parameters and I can fit an elephant to a curve. Give me six and I can make him dance". This is overfitting. In overfitting, models can do a stellar job of accounting for known data, but fail miserably to predict new data, which is after all what they should be able to do. Models that predict structure-activity relationships (SAR) for biologically active molecules are a notorious example. In QSAR, one can almost always get a good correlation if enough parameters are used. Overfitting can be addictive and rampant. The fallacy of associating correlation with causation has been frequently pointed out (how about the one in which the number of breeding storks correlates with the number of new births?), and there are few places where it manifests itself in all its glory more than in QSAR. While we know these pitfalls, in practice we are much less careful, since we want practical, applicable results and could not care less if the model represented conditions on Mars.
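The overfitting point can be illustrated with a toy calculation. This is only a sketch on made-up data, not a QSAR model: the "true" relationship is assumed to be the straight line y = 2x + 1, the noise values are invented, and a Lagrange interpolating polynomial stands in for "a model with as many parameters as data points". The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept (2 parameters)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_interpolant(xs, ys):
    """Lagrange polynomial through every point: one parameter per data point."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            weight = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    weight *= (x - xj) / (xi - xj)
            total += yi * weight
        return total
    return predict


def rmse(model, xs, ys):
    """Root-mean-square error of a model on a set of points."""
    return (sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


# Hypothetical training data: true line y = 2x + 1 plus fixed, invented noise.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.3, -0.4, 0.5, -0.3, 0.4, -0.5]
train_y = [2 * x + 1 + e for x, e in zip(train_x, noise)]

new_x, new_y = 6.0, 13.0  # an unseen point lying on the true line

line = fit_line(train_x, train_y)
poly = fit_interpolant(train_x, train_y)

line_train_err = rmse(line, train_x, train_y)
poly_train_err = rmse(poly, train_x, train_y)
line_new_err = abs(line(new_x) - new_y)
poly_new_err = abs(poly(new_x) - new_y)

print(f"2 parameters: training RMSE {line_train_err:.3f}, error on new point {line_new_err:.3f}")
print(f"6 parameters: training RMSE {poly_train_err:.3f}, error on new point {poly_new_err:.3f}")
```

The six-parameter polynomial reproduces the training data essentially exactly, yet its prediction for the single new point is far worse than that of the two-parameter line. The flexible model has fitted the noise, which is the behaviour the elephant quote warns about.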

Lastly and most obviously, the model is only as good as the data that goes in. Standard deviations that one might get in results obtained by the model cannot be smaller than the errors in the experimental data. Now this is not something that has been lost on model builders. Models for molecular mechanics for example frequently include charges acquired from high-level first principles quantum chemistry calculations. And yet even quantum chemistry calculations are hamstrung by computational efficiency and are based on assumptions. One of the important lessons my advisor taught me was to always ask what assumptions went into a particular study. If the assumptions are suspect, then no matter how elegant and meticulous the study, its results are bound to be hazy.

The result of all this ad hoc, utility-oriented model building is obvious; we often simply fail to include relevant physical phenomena in building a model. Now consider what will happen if we encounter an outlier, a case where that physical phenomenon dominates all others in influencing, say, the activity of a molecule or the stereoselectivity of a reaction. That's when we would get a Black Swan, an outlier, a rare event which the model predictably cannot predict because the factors responsible for that event have not been included in building it. Retrospectively a Black Swan should not be surprising. Prospectively we don't care about it much because we are certainly not going to discard the model for the sake of one outlier.

But choosing to retain the model in spite of it not being able to predict the rare event is not always an easy decision. What if the outlier is going to cost your employer millions? This is usually not very important in academic chemistry, but it almost always is in financial markets. In chemistry we have the luxury of simply retreating to our labs and computers and initiating more research that would investigate the basic factors that went into the models. One can argue about "short, strong hydrogen bonds" until the cows come home, and he or she (probably) won't get booted out. But in finance the rare outlier can, and does, mean retreating into a trailer without any savings.

The bottom line is, all of us are playing a game when we use models, in chemistry, finance or any other discipline. As in other games, we are fine as long as we win. One of Taleb's messages is that we should at least be able to assess the impact of losing, something which he asserts the quants have significantly underestimated. If the impact is a complete game changer, then we should know when to get out of the game. We tend to forget that the models that we have don't represent reality. We use them because they work, and it's the reality of utility that produces the illusion of reality. Slightly modifying a quote by the great Pablo, models then are the lies that help us to conceal the truth.

Note: The short Charlie Rose interview with Taleb is worth watching.

7 comments:

Robin St. John, 11:34 AM, January 12, 2009
"We use models not because they are 'real' but because they work. I would not care if a model I had for predicting enzyme rates involved little gnomes rotating amino acid torsions in the active sites and passing on water molecules to each other."

When this dawned on me was the point at which I broke with physics and decided to be a chemist. I realized how much time was going to be wasted fighting over things that were not likely to be accessible by science for a long, long time. I vowed to never waste another breath or another snap of a synapse worrying about whose work was most 'fundamental'.

Taleb is intriguing. I have not sat down with his books, but have seen a lot of his interviews and read some of his writings, especially his scathing critiques of VAR. They are at the top of my 'to read' list.

Incidentally, if you haven't seen it, Gilovich's book "How We Know What Isn't So" is quite good. It talks a lot about how we perceive trends that are not there, and how often we miss regression to the mean. A 25 cent synopsis is here:
http://www.analytictech.com/mb021/notes_on_gilovich.htm

Wavefunction, 3:14 PM, January 12, 2009
Quite correct. I feel like constantly asking string theorists what they are up to. Sometimes elegant mathematics is reason alone for trusting a model. A friend of mine who is an experimental STM physicist used to get mad at theorists who casually used to throw around statements about events happening at the Planck length scale.

There are some physicists who say that chemistry is "physics, but without rigor". To them I say physics is "mathematics, but without certainty".

Thanks for the recommendation about Gilovich's book. Will check it out.

MJ, 8:22 PM, January 12, 2009
"I would not care if a model I had for predicting enzyme rates involved little gnomes rotating amino acid torsions in the active sites and passing on water molecules to each other. If the model could predict the catalysis rate for orotidine decarboxylase, that's all I care about."

This is a bad example, as anyone with half a brain knows that it's the dancing of tiny pixies that is responsible. The whirling of their arms spins the amino acid sidechains just right, and their acrobatic dance steps push the water molecules around in the proper fashion.

I shall now concede that the previous paragraph was used in place of a proper discussion of methodological naturalism.

"Slightly modifying a quote by the great Pablo, models then are the lies that help us to conceal the truth."

I have an NMR analogy, which is completely the fault of my education, and which runs a bit counter to this sentiment as I see it. Models are akin to echo phenomena - they're not the real or original entity, they're this slightly diffuse and weakened version that we can actually measure and work with. It has many of the same characteristics, but, to steal a more familiar proverb - while it looks like a duck, moves like a duck, and sounds like a duck, it's a goose.

His books do sound quite interesting, and are going on the to-read list. Of course, given its length, maybe I'll get around to them next year.....

The Chemist, 12:02 PM, January 13, 2009
"Sometimes elegant mathematics is reason alone for trusting a model."

I know, and it's not really good enough is it? Isn't this what Feynman railed against? It's tempting though:

A friend and I were having coffee at Waffle House and discussing the use and utility of information (which is a theme of Taleb's Black Swan; he directly refutes the idea that the more information is available to decision-makers, the better). He has an MBA, and I was lamenting that Google has made it easier to find solutions to problems, but only the most popular solutions, not necessarily the most effective or constructive. He disagreed, and I postulated that it was because scientists and businessmen come at problems from different angles. I said, "Businessmen want a quick solution, but scientists are looking for elegance." There I stopped myself.

It was out-and-out wrong, but somehow, it slipped out.

Chetan, 2:46 PM, January 13, 2009
Is it just me, or did everyone else feel that this was not one of the better interviews conducted by Charlie Rose? Maybe it had something to do with an overenthusiastic interviewee. But then Rose's interview with Kasparov went well. I just felt like Rose had a different idea of a narrative that might emerge from Taleb's answers while Taleb had an entirely different agenda he wanted to get across.

Robin St. John, 2:58 PM, January 13, 2009
"Businessmen want a quick solution, but scientists are looking for elegance."
I work as a scientist at a company led by a handful of engineers, people who grew the company from the ground up to a billion dollar operation. I feel pretty lucky that the 'suits' where I work are technically competent, but I struggle with what you describe.

Within limits, elegance is valued by scientists because it can be an indicator that you are on the right track, and I agree that it isn't enough. But it is beguiling and can be counterproductive if you have the obligation to produce a working solution to a problem subject to time and dollar constraints. The tension between getting things finished and understanding them completely abounds in my world.

Wavefunction, 3:45 PM, January 13, 2009
Chetan, yes, I too felt that Rose was less than himself during this interview. But he is a fine interviewer and even some of his bad ones are not that bad.

Someone (was it Dirac? Wigner?) once said that elegant mathematics did not drive his results. But after obtaining them, if they did not look mathematically elegant then he knew they were wrong. On the other hand what does it mean? String theorists claim that their results are very elegant, except that they indicate the existence of 10^500 universes as solutions...

What Dave and The Chemist are talking about is a complaint many in pharma have about pencil-pushing managers. That's where you hear most of the profanity from. There are relatively few managers who understand the nitty-gritty of the science and I think you are lucky if your manager is one of them. Unfortunately in many companies having an MBA is considered a sufficient qualification to be named VP of research.