And Nassim Nicholas Taleb
Nassim Nicholas Taleb's two books, Fooled by Randomness and The Black Swan, are undoubtedly two of the most provocative and interesting books I have come across in my life. I am still ploughing through them, but the second book in particular, The Black Swan, has made waves and is apparently now cited as required reading on Wall Street. The books are highly eclectic and traverse a remarkably diverse landscape that includes psychology, finance, evolution, mathematics, philosophy, history, sociology, economics and many other disciplines. It would be impossible to review them in a limited space. But if I wanted to capture their essence, it would be by saying that Taleb alerts readers to the patterns that human beings see in the randomness inherent in the world, and the models, both mental and practical, that they build to account for this randomness.
Since his book came out, Taleb has become a minor celebrity and has been interviewed by Charlie Rose and Stephen Colbert. His books sell in large numbers in the far corners of the world. Taleb has suddenly become such a big deal in part because he seems, at least philosophically, to have predicted the financial crisis of 2008, which occurred a year after The Black Swan came out in 2007. One of the firms he advised turned a profit of more than 100 million dollars in 2008 when others were close to losing the clothes on their backs. Taleb has now emerged as one of the most profound soothsayers and philosophers of our times, a "scholar of randomness", although his message seems to be more modest: models are not designed to account for rare or Black Swan events, which may have monumental impact. The analogy refers to the firm belief people once held that all swans are white. When the continent of Australia was discovered and black swans were observed in flocks (a fact which my dad, who is currently in Australia, corroborated), there was a paradigm shift. Similarly a model, any model, that is built on the basis of White Swan events will fail to foresee Black Swans.
Unfortunately, as Taleb explains, it's the Black Swans that dictate the direction in which the world proceeds. It's the rare event that is the watershed, the event that changes everything. And it's exactly the rare event that models don't encapsulate. This fact spells their doom.
To support his theory, Taleb cites many Black Swan events from history and politics. For example, if you had lived in 1913, you would hardly have foreseen the gargantuan event of 1914 which would forever change the world. If you had lived in 1988, you would scarcely have comprehended the epoch-making events of 1989. One of my favourite parts of the book concerns the Turkey Analogy. Imagine you are a turkey who is fed by the butcher every day for 364 days. You are happy, you know the butcher loves you, your economics and accounts departments are happy, and they start to think this is the way of the world for you. Now comes the 365th day. You die. Just when your expectation levels reach their most optimistic, your destiny reaches its lowest point. But on day 364, right before day 365, you were 100% certain that you had a lifetime of bountiful happiness ahead of you. Day 365 was exactly contrary to the expectations of your finance department. Day 365 was the Black Swan which you did not anticipate. And yet it was that single Black Swan day that proved fateful for you, and not the earlier 364 days of well-fed bliss. According to Taleb, this is what most of us, and especially the derivatives wizards on Wall Street, are: happy, deluded turkeys.
In any case, one of the most important discussions in Taleb's books concerns the fallacy of model building. He claims that the models that Wall Street used, the models that raked in billions and Nobel Prizes, were fundamentally flawed, in part because they were not built to accommodate Black Swans. That made me think about models in chemistry and how they relate to other models. In Taleb's mind, the derivatives and other frameworks that the genius quants used were like a plane whose workings they did not understand. When you build a plane, you should always keep the possibility of a storm, a rare event, in mind. Aerospace engineers do this. The quants apparently did not.
But let's face one big truth about models: most of them are in fact not designed to represent "reality". Models don't care as much about causation as they do about accurate correlation and prediction. While this may sound like shooting ourselves in the foot, it often saves us a lot of time, not to mention philosophizing. We use models not because they are "real" but because they work. I would not care if a model I had for predicting enzyme rates involved little gnomes rotating amino acid torsions in the active sites and passing on water molecules to each other. If the model could predict the catalysis rate for orotidine decarboxylase, that's all I care about. A model for representing water molecules may put unrealistic charges on the O and H atoms, but all I care about is whether it reproduces the dipole moment, density and heat capacity of bulk water. Thus, models are not necessarily a window into reality. They are very likely a window in spite of reality.
Also, models can always fit data if arbitrary changes are made to their parameters and enough parameters are used. As Retread quoted Von Neumann in a past comments thread, "Give me five parameters and I can fit an elephant to a curve. Give me six and I can make him dance". This is overfitting. An overfitted model can do a stellar job of accounting for known data but fail miserably to predict new data, which is after all what it should be able to do. Models that predict structure-activity relationships (SAR) for biologically active molecules are a notorious example. In QSAR, one can almost always get a good correlation if enough parameters are used. Overfitting can be addictive and rampant. The fallacy of confusing correlation with causation has been pointed out frequently (how about the one in which the number of breeding storks correlates with the number of new births?), and there are few places where it manifests itself in all its glory more than in QSAR. We know these pitfalls, yet in practice we are much less careful, since we want to get practical, applicable results and could not care less if the model represented conditions on Mars.
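To make the overfitting point concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than real QSAR data: the "true" relationship y = 2x + 1, the alternating noise of ±0.5, and the helper names `fit_line`, `lagrange_predict` and `mean_abs_error` are all made up for the example. It compares a two-parameter straight line with a polynomial that has as many parameters as there are data points.

```python
# Toy data: the "true" relationship is y = 2x + 1 plus small alternating noise.
# All names and numbers are illustrative assumptions, not real measurements.

def true_y(x):
    return 2.0 * x + 1.0

train_x = [float(i) for i in range(8)]
noise = [0.5 if i % 2 == 0 else -0.5 for i in range(8)]
train_y = [true_y(x) + e for x, e in zip(train_x, noise)]

def fit_line(xs, ys):
    """Ordinary least-squares line: two parameters (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def lagrange_predict(xs, ys, x):
    """Polynomial through all n points: as many parameters as data points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total

def mean_abs_error(predict, xs, ys):
    return sum(abs(predict(x) - y) for x, y in zip(xs, ys)) / len(xs)

slope, intercept = fit_line(train_x, train_y)
line = lambda x: slope * x + intercept
poly = lambda x: lagrange_predict(train_x, train_y, x)

# "New" data the models never saw, taken from the noise-free relationship.
new_x = [0.5, 8.0]
new_y = [true_y(x) for x in new_x]

print("known data: line", mean_abs_error(line, train_x, train_y),
      "poly", mean_abs_error(poly, train_x, train_y))
print("new data:   line", mean_abs_error(line, new_x, new_y),
      "poly", mean_abs_error(poly, new_x, new_y))
```

The eight-parameter polynomial reproduces the known data essentially perfectly, noise included, while the line leaves visible residuals. On the two new points the ranking flips: the line stays close to the true values and the polynomial goes badly wrong. The new points were deliberately chosen near and just beyond the edge of the fitted range, where a high-degree polynomial behaves worst.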
Lastly and most obviously, the model is only as good as the data that goes in. Standard deviations that one might get in results obtained by the model cannot be smaller than the errors in the experimental data. Now this is not something that has been lost on model builders. Models for molecular mechanics for example frequently include charges acquired from high-level first principles quantum chemistry calculations. And yet even quantum chemistry calculations are hamstrung by computational efficiency and are based on assumptions. One of the important lessons my advisor taught me was to always ask what assumptions went into a particular study. If the assumptions are suspect, then no matter how elegant and meticulous the study, its results are bound to be hazy.
The result of all this ad hoc, utility-oriented model building is obvious: we often simply fail to include relevant physical phenomena in building a model. Now consider what will happen if we encounter an outlier, a case where that physical phenomenon dominates all others in influencing, say, the activity of a molecule or the stereoselectivity of a reaction. That's when we would get a Black Swan, an outlier, a rare event which the model predictably cannot predict because the factors responsible for that event have not been included in building it. Retrospectively, a Black Swan should not be surprising. Prospectively, we don't care about it much because we are certainly not going to discard the model for the sake of one outlier.
But choosing to retain the model even though it cannot predict the rare event is not always an easy decision. What if the outlier is going to cost your employer millions? This is usually not very important in academic chemistry, but it almost always is in financial markets. In chemistry we have the luxury of simply retreating to our labs and computers and initiating more research to investigate the basic factors that went into the models. One can argue about "short, strong hydrogen bonds" until the cows come home and (probably) not get booted out. But in finance the rare outlier can, and does, mean retreating into a trailer without any savings.
The bottom line is, all of us are playing a game when we use models, in chemistry, finance or any other discipline. As in other games, we are fine as long as we win. One of Taleb's messages is that we should at least be able to assess the impact of losing, something which he asserts the quants have significantly underestimated. If the impact is a complete game changer, then we should know when to get out of the game. We tend to forget that the models that we have don't represent reality. We use them because they work, and it's the reality of utility that produces the illusion of reality. Slightly modifying a quote by the great Pablo, models then are the lies that help us to conceal the truth.
Note: The short Charlie Rose interview with Taleb is worth watching: