Field of Science

Book review: “Lithium: A Doctor, a Drug, and a Breakthrough” by Walter Brown

A fascinating book about a revolutionary medical discovery that has treated millions of patients and saved many lives, that was adopted only against a lot of resistance, and that was made by a most unlikely, simple man who was a master observer. Lithium is still the gold standard for treating bipolar disorder, which affects millions of people, and it’s the unlikeliest of drugs - a simple ion that is abundant in the earth’s crust and is used in applications as diverse as iPhone batteries and hydrogen bombs. Even before the breakthrough antipsychotic drug chlorpromazine, lithium signaled the dawn of modern psychopharmacology in which chemical substances replaced Freudian psychoanalysis and primitive methods like electro-convulsive therapy as the first line of treatment for mental disorders.
The book describes how John Cade, an unassuming Australian psychiatrist and former prisoner of war of the Japanese, discovered lithium’s profound effects on manic-depressive patients using a hunch and serendipity (which is better called “non-linear thinking”), some scattered historical evidence, primitive equipment (he kept urine samples in his family fridge) and a few guinea pigs. It then describes how Danish psychiatrists like Mogens Schou had to fight uphill battles to convince the medical community that lithium was not only a completely revolutionary drug but also a prophylactic one.
The debates on lithium’s efficacy got personal at times but also shed light on how some of our most successful drugs did not always emerge from the most rigorous clinical trials, and how ethics can sometimes trump the design of these trials (for instance, many doctors find it unethical to continue to give patients a placebo if a therapy is found to be as immediately and powerfully impactful as lithium was). It is also a sobering lesson to realize in this era of multimillion dollar biotech companies and academic labs, how some of the most transformative therapies we know were discovered by lone individuals working with simple equipment and an unfettered mind.
Thanks to the work of these pioneers, lithium is still the gold standard, and it has saved countless lives from unbearable agony and debilitation, in large part because of its preventive effects. Patients who had been debilitated by manic-depression for decades showed an almost magical and permanent remission. Perhaps the most humane effect of lithium therapy was in drastically reducing the rate of suicide in bipolar patients, in whom the rate is 10 to 20 times higher than in the general population.
The book ends with some illuminating commentary about why lithium is still not used often in the US, largely because as a common natural substance it is unpatentable and therefore does not lend itself to Big Pharma’s aggressive marketing campaigns. The common medication for treating bipolar disorder in the US is valproate combined with other drugs, but these don't come without side effects.
Stunningly, even after decades of use we still don’t know exactly how lithium works, partly because we also don’t know the exact causes of bipolar disorder. Unlike most psychiatric drugs, lithium clearly has general, systemic effects, and this makes its mechanism of action difficult to figure out. Somewhat contrary to this fact, it strangely also seems to be uniquely efficacious in treating manic-depression and not other psychiatric problems. What could account for this paradoxical mix of general systemic effects and efficacy in a very specific disorder? There are no doubt many surprises hidden in future lithium research, but it all started with an Australian doctor acting on a simple hunch, derived from treating patients in a POW camp in World War 2, that a deficiency of something must be causing manic-depressive illness.
I highly recommend this book, both as scientific history and as a unique example of a groundbreaking medical discovery.

Spooky factions at a distance

For me, a highlight of an otherwise ill-spent youth was reading mathematician John Casti’s fantastic book “Paradigms Lost”. The book came out in the late 1980s and was gifted to my father, who was a professor of economics, by an adoring student. Its sheer range and humor had me gripped from the first page. Its format is unique – Casti presents six “big questions” of science in the form of a courtroom trial, advocating arguments for the prosecution and the defense. He then steps in as jury to come down on one side or another. The big questions Casti examines are multidisciplinary and range from the origin of life to the nature/nurture controversy to extraterrestrial intelligence to, finally, the meaning of reality as seen through the lens of the foundations of quantum theory. Surprisingly, Casti himself comes down on the side of the so-called many worlds interpretation (MWI) of quantum theory, and ever since I read “Paradigms Lost” I have been fascinated by this analysis.
So it was with pleasure and interest that I came across Sean Carroll’s book that also comes down on the side of the many worlds interpretation. The debate that the MWI addresses goes back to the very invention of quantum theory by pioneering physicists like Niels Bohr, Werner Heisenberg and Erwin Schrödinger. As exemplified by Heisenberg’s famous uncertainty principle, quantum theory signaled a striking break with reality by demonstrating that one can talk about the world only probabilistically. Contrary to common belief, this does not mean that there is no precision in the predictions of quantum mechanics – it’s in fact the most accurate scientific framework known to science, with theory and experiment agreeing to several decimal places – but rather that there is a natural limit and fuzziness in how accurately we can describe reality. As Bohr put it, “physics does not describe reality; it describes reality as subjected to our measuring instruments and observations.” This is actually a reasonable view – what we see through a microscope and telescope obviously depends on the features of that particular microscope or telescope – but quantum theory went further, showing that the uncertainty in the behavior of the subatomic world is an inherent feature of the natural world, one that doesn’t simply come about because of uncertainty in experimental observations or instrument error.
At the heart of the probabilistic framework of quantum theory is the wave function. The wave function is a mathematical function that describes the state of the system, and its square gives a measure of the probability of what state the system is in. The controversy starts right away with this most fundamental entity. Some people think that the wave function is “epistemic”, in the sense that it’s not a real object and is simply related to our knowledge – or our ignorance – of the system. Others including Carroll think it’s “ontological”, in the sense of being a real entity that describes features of the system. The fly in the ointment concerns the act of actually measuring this wave function and therefore the state of a quantum system, and this so-called “measurement problem” is as old as the theory itself and kept even the pioneers of quantum theory awake.
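To make "its square gives a measure of the probability" concrete, here is a minimal sketch of that rule for a system with a handful of discrete states. The function name and the example state are my own illustrative choices, not anything from Carroll's book, and for complex amplitudes the "square" is really the squared magnitude.

```python
import math

def born_probabilities(amplitudes):
    """Turn a list of (possibly complex) amplitudes into outcome probabilities.

    Each probability is the squared magnitude of the amplitude, divided by
    the total so that the probabilities add up to 1.
    """
    weights = [abs(a) ** 2 for a in amplitudes]
    total = sum(weights)
    if total == 0:
        raise ValueError("the state must have at least one non-zero amplitude")
    return [w / total for w in weights]

# Hypothetical two-state system in an equal superposition; the factor of 1j
# is a relative phase, which does not change the probabilities.
state = [1 / math.sqrt(2), 1j / math.sqrt(2)]
probs = born_probabilities(state)

# An unnormalized state works too: amplitudes 3 and 4j give weights 9 and 16.
probs_unnormalized = born_probabilities([3, 4j])
```

Note that the phase of an amplitude drops out of the probabilities for a single measurement; it only matters when amplitudes are added together, which is where interference comes from.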
The problem is that once a quantum system interacts with an “observer”, say a scintillation screen or a particle accelerator, its wave function “collapses” because the system is no longer described probabilistically and we know for certain what it’s like. But this raises two problems: Firstly, how do you exactly describe the interaction of a microscopic system with a macroscopic object like a particle accelerator? When exactly does the wave function “collapse”, by what mechanism and in what time interval? And who can collapse the wave function? Does it need to be human observers for instance, or can an ant or a computer do it? What can we in fact say about the consciousness of the entity that brings about its collapse?
The second problem is that contrary to popular belief, quantum theory is not just a theory of the microscopic world – it’s a theory of everything except gravity (for now). This led Erwin Schrödinger to propose his famous cat paradox which demonstrated the problems inherent in the interpretation of the theory. Before measurement, Schrödinger said, a system is deemed to exist in a superposition of states while after measurement it exists only in one; does this mean that macroscopic objects like cats also exist in a superposition of entangled states, in the case of his thought experiment in a mixture of half dead-half alive states? The possibility bothered Schrödinger and his friend Einstein to no end. Einstein in particular refused to believe that quantum theory was the final word, and insisted that there must be “hidden variables” that would allow us to get rid of the probabilities if only we knew what they were; he called the seemingly instantaneous entanglement of quantum states “spooky action at a distance”. Physicist John Bell put that particular objection to rest in the 1960s, proving that no theory based on local hidden variables can reproduce all the predictions of quantum theory.
Niels Bohr and his group of followers from Copenhagen were more successful in their publicity campaign. They simply declared the question of what is “real” before measurement irrelevant and essentially pushed the details of the measurement problem under the rug by saying that the act of observation makes something real. The cracks were evident even then – the physicist Robert Serber once pointedly noted the problems with putting the observer on a pedestal by asking if we might regard the Big Bang as unreal because there were no observers back then. But Bohr and his colleagues were influential and rather zealous, and most attempts by physicists like Einstein and David Bohm met with either derision or indifference.
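Bell's argument can be illustrated with a small calculation in its CHSH form. The sketch below is my own illustration rather than anything from the book: it assumes two spin-1/2 particles in the singlet state, for which quantum theory predicts a correlation of -cos(a - b) between measurements along angles a and b, and it compares the resulting CHSH combination with the best that any local, deterministic hidden-variable assignment can achieve. The angle choices and function names are assumptions made for the example.

```python
import itertools
import math

def singlet_correlation(angle_a, angle_b):
    """Quantum prediction for the spin correlation of a singlet pair."""
    return -math.cos(angle_a - angle_b)

def chsh(correlation, a, a_prime, b, b_prime):
    """CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b')."""
    return (correlation(a, b) - correlation(a, b_prime)
            + correlation(a_prime, b) + correlation(a_prime, b_prime))

# Measurement angles that maximize the quantum value of |S|.
chsh_quantum = chsh(singlet_correlation,
                    0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4)

# Local hidden variables: each particle carries a predetermined answer (+1 or
# -1) for each of its two possible settings. Enumerate all 16 assignments.
classical_bound = max(
    abs(a0 * b0 - a0 * b1 + a1 * b0 + a1 * b1)
    for a0, a1, b0, b1 in itertools.product([-1, 1], repeat=4)
)
```

Under these assumptions the quantum value of |S| comes out at 2√2 (about 2.83), while no predetermined assignment exceeds 2. Randomized hidden-variable strategies are just mixtures of the deterministic ones, so they cannot beat 2 either, and experiments side with the quantum value.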
Enter Hugh Everett who was a student of John Wheeler at Princeton. Everett essentially applied Occam’s Razor to the problem of collapse and asked a provocative question: What are the implications if we simply assume that the wave function does not collapse? While this avoids asking about the aforementioned complications with measurement, it creates problems of its own since we know for a fact that we can observe only one reality (dead vs alive cat, an electron track here rather than there) while the wave function previously described a mixture of realities. This is where Everett made a bold and revolutionary proposal, one that was as courageous as Einstein’s proposal of the constancy of the speed of light: he surmised that when there is a measurement, the other realities encoded in the wavefunction split off from our own. They simply don’t collapse and are every bit as real as our own. Just like Einstein showed in his theory of relativity that there are no privileged observers, Everett conjectured that there are no privileged observer-created realities. This is the so-called many-worlds interpretation of quantum mechanics.
Everett proposed this audacious claim in his PhD thesis in 1957 and showed it to Wheeler. Wheeler was an enormously influential physicist, and while he was famous for outlandish ideas that influenced generations of physicists like Richard Feynman and Kip Thorne, he was also a devotee of Bohr’s Copenhagen school – he and Bohr had published a seminal paper explaining nuclear fission way back in 1939, and Wheeler regarded Bohr’s Delphic pronouncements akin to those of Confucius – that posited observer-generated reality. He was sympathetic to Everett but could not support him in the face of Bohr’s objections. Everett soon left theoretical physics and spent the rest of his career doing nuclear weapons research, a chain-smoking, secretive, absentee father who dropped dead of an unhealthy lifestyle in 1982. After a brief resurrection by Everett himself at a conference organized by Wheeler, many-worlds didn’t see much popular dissemination until writers like Casti and the physicist David Deutsch wrote about it.
As Carroll indicates, the MWI has a lot of things going for it. It avoids the prickly, convoluted details of what exactly constitutes a measurement and the exact mechanism behind it; it does away with especially thorny details of what kind of consciousness can collapse a wavefunction. It’s elegant and satisfies Occam’s Razor because it simply postulates two entities – a wave function and a Schrödinger equation through which the wave function evolves through time, and nothing else. One can calculate the likelihood of each of the “many worlds” by postulating a simple rule proposed by Max Born that assigns a probability weight to each of them. And it also avoids an inconvenient split between the quantum and the classical world, treating both systems quantum mechanically. According to the MWI, when an observer interacts with an electron, for instance, the observer’s wave function becomes entangled with the electron’s and continues to evolve. The reason why we still see only one Schrödinger’s cat (dead or alive) is that each one is triggered by distinct random events like the passage of photons, leading to separate outcomes. Carroll thus sees many-worlds as basically a logical extension of the standard machinery of quantum theory. In fact he doesn’t even see the many worlds as “emerging” (although he does see them as emergent); he sees them as always present and intrinsically encoded in the wave function’s evolution through the Schrödinger equation.
A scientific theory is of course only as good as its experimental predictions and verification – as a quote ascribed to Ludwig Boltzmann puts it, matters of elegance should be left to the tailor and the cobbler. Does MWI postulate elements of reality that are different from those postulated by other interpretations? The framework is on shakier ground here since there are no clear observable predictions except those predicted by standard quantum theory that would truly privilege it over others. Currently it seems that the best we can say is that many worlds is consistent with many standard features of quantum mechanics. But so are many other interpretations. To be accepted as a preferred interpretation, a theory should not just be consistent with experiment, but uniquely so. For instance, consider one of the very foundations of quantum theory – wave-particle duality. Wave-particle duality is as counterintuitive and otherworldly as any other concept, but it’s only by postulating this idea that we can ever make sense of disparate experiments verifying quantum mechanics, experiments like the double-slit experiment and the photoelectric effect. If we get rid of wave-particle duality from our lexicon of quantum concepts, there is no way we can ever interpret the results of thousands of experiments from the subatomic world such as particle collisions in accelerators. There is thus a necessary, one-to-one correspondence between wave-particle duality and reality. If we get rid of many-worlds, however, it does not make any difference to any of the results of quantum theory, only to what we believe about them. Thus, at least as of now, many-worlds remains more a philosophically pleasing framework than a preferred scientific one.
Many-worlds also raises some thorny questions about the multiple worlds that it postulates. Is it really reasonable to believe that there are literally infinitely many copies of everything – not just an electron but the measuring instrument that observes it and the human being who records the result – splitting off every moment? Are there copies of me both writing this post and not writing it splitting off as I type these words? Is the universe really full of these multiple worlds, or does it make more sense to think of infinite universes? One reasonable answer to this question is to say that quantum theory is a textbook example of how language clashes with mathematics. This was well-recognized by the early pioneers like Bohr: Bohr was fond of an example where a child goes into a store and asks for some mixed sweets. The shopkeeper gives him two sweets and asks him to mix them himself. We might say that an electron is in “two places at the same time”, but any attempt to actually visualize this dooms us, because the only notion of objects existing in two places is one that is familiar to us from the classical world, and the analogy breaks down when we try to replace chairs or people with electrons. Visualizing an electron spinning on its axis the way the earth spins on its own is also flawed.
Similarly, visualizing multiple copies of yourself actually splitting off every nanosecond sounds outlandish, but it’s only because that’s the only way for us to make sense of wave functions entangling and then splitting. Ultimately there’s only the math, and any attempt to cast it in the form of everyday language is a fundamentally misguided venture. Perhaps when it comes to talking about these things, we will have to resort to Wittgenstein’s famous quote – whereof we cannot speak, thereof we must be silent (or thereof we must simply speak in the form of pictures, as Wittgenstein did in his famous ‘Tractatus’). The other thing one can say about many-worlds is that while it does apply Occam’s Razor by elegantly postulating only the wave function and the Schrödinger equation, it raises questions about the splitting off process and the details of the multiple worlds that are similar to those about the details of measurement raised by the measurement problem. In that sense it only kicks the can of complex worms down the road, and in that case deciding which particular can to open is a matter of taste. As an old saying goes, nature does not always shave with Occam’s Razor.
In the last part of the book, Carroll talks about some fascinating developments in quantum gravity, mainly the notion that gravity can emerge through microscopic degrees of freedom that are locally entangled with each other. One reason why this discussion is fascinating is because it connects many disparate ideas from physics into a potentially unifying picture – quantum entanglement, gravity, black holes and their thermodynamics. These developments don’t have much to do with many-worlds per se, but Carroll thinks they may limit the number of “worlds” that many worlds can postulate. But it’s frankly difficult to see how one can find definitive experimental evidence for any interpretation of quantum theory anytime soon, and in that sense Richard Feynman’s famous words, “I think it is safe to say that nobody understands quantum mechanics” may perpetually ring true.
Very reasonably, many-worlds is Carroll’s preferred take on quantum theory, but he’s not a zealot about it. He fully recognizes its limitations and discusses competing interpretations. But while Carroll deftly dissects many-worlds, I think that the real value of this book is to exhort physicists to take what are called the foundations of quantum mechanics more seriously. It is an attempt to make peace between different quantum factions and bring philosophers into the fold. There’s a huge number of “interpretations” of quantum theory, some more valid than others, separated from each other as much by philosophical differences as by physical ones. There was a time when the spectacular results of quantum theory combined with the thorny philosophical problems it raised led to a tendency among physicists to “shut up and calculate” and not worry about philosophical matters. But philosophy and physics have been entwined since the ancient Greeks, and in one sense, one ends where the other begins. Carroll’s book is a hearty reminder for physicists and philosophers to eat at the same table, otherwise they may well remain spooky factions at a distance when it comes to interpreting quantum theory.

A new paper on kinase inhibitor discovery: not one on "drugs", and not one on an "AI breakthrough"

There is a new multicenter study on the discovery of some new kinase inhibitor compounds for the kinase DDR1 that has been making the rounds. Using a particular flavor of generative models, the authors derive a few potent and selective inhibitors for DDR1, a kinase target that has been implicated in fibrosis.

The paper is an interesting application of generative deep learning models to kinase inhibitor discovery. The authors start with six training datasets including ZINC and several patents along with a negative dataset of non-kinase inhibitors. After using their generative reinforcement learning model and filtering out reactives and clustering, they select 40 random molecules that have a less than 0.5 Tanimoto similarity to vendor stocks and the patent literature, and pick 6 out of these for testing. Four of the six compounds are indicated as showing an improvement in the potency against DDR1, although it seems that for two of these, the potency is little improved relative to the parent compound (10 and 21 nM vs 15 nM, which is well within the two or threefold margin of error in most biological assays). The selectivity of two of the compounds for the undesirable isoform DDR2 is also essentially the same (649 nM vs 1000 nM and 278 nM vs 162 nM; again within the twofold error margin of the assay). So from a potency standpoint, the algorithm seems to find equipotent inhibitors at best; given that these four molecules were culled from a starting set of 30,000, that indicates a hit rate of roughly 0.01%. Good selectivity against a small kinase panel is demonstrated, but selectivity against a larger panel of off-targets is not indicated. There also don't seem to be tests for aggregation or non-specific behavior; computational techniques in drug discovery are well known to produce a surfeit of false positives. It would also be really helpful to get some SAR for these compounds to know if they are one-off non-specific binders or actual lead compounds.
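The arithmetic behind the "equipotent at best" reading is simple enough to spell out. The sketch below uses only the numbers quoted above; the helper names are mine, and the twofold cutoff is the rule-of-thumb assay error margin mentioned in the text rather than a value taken from the paper.

```python
def hit_rate_percent(hits, screened):
    """Percentage of screened molecules that turned out to be hits."""
    return 100.0 * hits / screened

def fold_difference(value_a_nm, value_b_nm):
    """Ratio of the larger to the smaller value, so the result is always >= 1."""
    return max(value_a_nm, value_b_nm) / min(value_a_nm, value_b_nm)

ASSAY_ERROR_FOLD = 2.0  # assumed rule-of-thumb margin, as discussed above

# Four active molecules out of a starting set of 30,000.
rate = hit_rate_percent(4, 30_000)

# Pairs of values (in nM) quoted above: two DDR1 potencies against the 15 nM
# parent, then the two DDR2 comparisons.
comparisons = {
    "DDR1, 10 vs 15": fold_difference(10, 15),
    "DDR1, 21 vs 15": fold_difference(21, 15),
    "DDR2, 649 vs 1000": fold_difference(649, 1000),
    "DDR2, 278 vs 162": fold_difference(278, 162),
}
within_error = {name: fold < ASSAY_ERROR_FOLD for name, fold in comparisons.items()}

for name, fold in comparisons.items():
    print(f"{name}: {fold:.2f}-fold, within assay error: {within_error[name]}")
print(f"hit rate: {rate:.4f}%")
```

Every one of these comparisons comes in under twofold, which is why I read the differences as noise rather than as real gains in potency or selectivity.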

Now, even equipotent inhibitors can be useful if they show good ADME properties or evidence scaffold hops. The group tested the inhibitors in liver microsomal assays, and they seem to have stability similar to that of a group of non-kinase inhibitor controls, although it would be good to see some accompanying data for DDR inhibitors next to this data. They also tested one of the compounds in a rodent model, and it seems to show a satisfactory half-life; it's again not clear how this compares to other DDR inhibitors. Finally, they build a pharmacophore-based binding model of the inhibitor and compare it to a similar quantum mechanical model, but there is no experimental data (from NMR or mutagenesis for instance) which would allow a good experimental validation of this binding pose. Pharmacophore models are again notorious for producing false positives, and it's important to demonstrate that the pharmacophore in fact does not also fit the negative data.

The paper claims to have discovered the inhibitors "in 21 days" and tested them in 46. The main issue here - and this is by no means a critique of just this paper - is not that the discovered inhibitors show very modest improvement at best over the reference; it's that there is no baseline comparison, no null models, that can tell us what the true value of the technique is. This has been a longstanding complaint in the computational community. For instance, could regular docking followed by manual picking have found the same compounds in the same time? What about simple comparisons with property-based metrics or 2D metrics? And could a team of expert medicinal chemists brainstorming over beer have looked at the same data and come up with the same conclusions much sooner? I am glad that the predictions were actually tested - even this simple follow-up is often missing from computational papers - but 21 days is not as short as it sounds if you start with a vast amount of already-existing and curated data from databases and patents, and if simpler techniques can find the same results sooner. And the reliance on vast amounts of data is of course a well-known Achilles heel for deep learning techniques, so these techniques will almost certainly not work well on new targets with a paucity of data.
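To give a flavor of what a "2D metric" null model might look like, here is a minimal sketch of a nearest-neighbor Tanimoto baseline: rank a library by its highest similarity to any known active and see whether the fancy method beats that ranking. Everything here is hypothetical. The fingerprints are toy sets of "on" bit indices and the compound names are invented; in practice one would compute real fingerprints with a cheminformatics toolkit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)

def rank_by_similarity(library, known_actives):
    """Rank library compounds by their best similarity to any known active."""
    scored = []
    for name, fingerprint in library.items():
        best = max(tanimoto(fingerprint, ref) for ref in known_actives)
        scored.append((name, best))
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy data: hypothetical fingerprints, not real molecules.
known_actives = [{1, 2, 3, 4}, {2, 3, 5, 8}]
library = {
    "cmpd_A": {1, 2, 3, 9},
    "cmpd_B": {10, 11, 12},
    "cmpd_C": {2, 3, 5, 8},
}

ranking = rank_by_similarity(library, known_actives)
for name, score in ranking:
    print(f"{name}: best Tanimoto to a known active = {score:.2f}")
```

A baseline like this takes minutes to run. If the compounds a generative model proposes would have surfaced near the top of such a ranking, it is hard to argue that the model added much.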

Inhibitor discovery is hardly a new problem for computational techniques, and any new method is up against a whole phalanx of structure and ligand-based methods that have been developed over the last 30+ years. There's a pretty steep curve to surmount if you actually want to proclaim your latest and greatest AI technique as a novel application. As it stands, the issue is not that the generative methods didn't discover anything, it's that it's impossible to actually judge their value because of an absence of baseline comparisons.

The AI hype machine is out in absolute full force on this one (see here, here and especially here for instance). I simply don't understand this great desire to proclaim every advance in a field as a breakthrough without simply calling it a useful incremental step or constructively criticizing it. And when respected sources like WIRED and Forbes proclaim that there's been a breakthrough in new drug discovery, the non-scientific public which is unfamiliar with IC50 curves or selectivity profiles or the fact that there's a huge difference between a drug and a lead will likely think that a new age of drug discovery is upon us. There's enough misleading hype about AI to go around, and adding more to the noise does both the scientific and the non-scientific community a disservice.

Longtime cheminformatics expert Andreas Bender has some similar thoughts here, and of course, Derek at In the Pipeline has an excellent, detailed take here.

Mathematics, And The Excellence Of The Life It Brings

Shing-Tung Yau and Eugenio Calabi
Mathematics and music have a pristine, otherworldly beauty that is very unlike that found in other human endeavors. Both of them seem to exhibit an internal structure, a unique concatenation of qualities that lives in a world of their own, independent of their creators. But mathematics might be so completely unique in this regard that its practitioners have seriously questioned whether mathematical facts, axioms and theorems may not simply exist on their own, simply waiting to be discovered rather than invented. Arthur Rubinstein and Andre Previn’s performance of Chopin’s second piano concerto sends unadulterated jolts of pleasure through my mind every time I listen to it, but I don’t for a moment doubt that those notes would not exist were it not for the existence of Chopin, Rubinstein and Previn. I am not sure I could say the same about Euler’s beautiful identity connecting three of the most fundamental constants in math and nature – e, pi and i. That succinct arrangement of symbols seems to simply be, waiting for Euler to chance upon it, the way a constellation of stars has waited for billions of years for an astronomer to find it.
The beauty of music and mathematics is that anyone can catch a glimpse of this timelessness of ideas, and even someone untrained in these fields can appreciate the basics. The most shattering intellectual moment of my life was when, in my last year of high school, I read in George Gamow’s “One, Two, Three, Infinity” about the fact that different infinities can actually be compared. Until then the whole concept of infinity had been a single concept to me, like the color red. The question of whether one infinity could be “larger” than another sounded as preposterous to me as whether one kind of red was better than another. But here was the story of an entire superstructure of infinities which could be compared, studied and taken apart, and whose very existence raised one of the most famous, and still unsolved, problems in math – the Continuum Hypothesis. The day I read about this fact in Gamow’s book, something changed in my mind; I got the feeling that some small combination of neuronal gears permanently shifted, altering forever a part of my perspective on the world.
Anyone who has seriously studied mathematics for any extended period of time also knows the complete immersion that can come with this study. In my second year of college I saw a copy of George F. Simmons’s book “Introduction to Topology and Modern Analysis” at the house of a mathematically gifted friend and asked to borrow it out of sheer curiosity. Until then mathematics had mainly been a matter of utilitarian value to me and most of my formal studies had been grounded in the kind of practical, calculus-based math that is required for solving problems in chemistry and physics. But Gamow’s exposition of countable and uncountable infinities had whetted my mind for more abstract stuff. The greatest strength of Simmons’s book is that it is entirely self-contained, starting with the bare basics of set theory and building up gradually. It’s also marvelously succinct, almost austere in the brevity of its proofs.
The book swept me off my feet, and the first time I started on it I worked through the theorems and problems right through the night; I can still see myself sitting at the table, the halo of a glaringly bright table lamp enclosing me in this special world of mathematical ideas, my grandmother sleeping outside this world in the small room that the two of us shared. The next night was not much different. After that I was seized by an intense desire to understand the fundamentals of topology – compactness, connectedness, metric and topological spaces, the Heine-Borel theorem, the whole works. Topology seemed to me like a cathedral – in fact the very word “spaces” as in “vector spaces” or “topological spaces” conjured up (and still does) an intricate, self-reinforcing cathedral of axioms, corollaries, lemmas and theorems resting on certain rules, each elegantly supporting the rest of it, being gradually built – or perhaps discovered – through the ages by its great practitioners, practitioners like Cantor, Riemann, Hilbert and Banach. It appeared like a great machine with perfectly enmeshed gears flawlessly fitting into each other and enabling great feats of mechanical efficiency and beauty. I was fortunate to find an enthusiastic professor who trained students for the mathematical olympiad, and he started spending several hours with me every week explaining proofs and helping me get over roadblocks. This was followed by many evenings of study and discussion, partly with a like-minded friend who had been inspired to get his own copy of Simmons’s book. I kept up the routine for several months and got as far as the Stone-Weierstrass theorem before other engagements intruded on my time – I wasn’t majoring in mathematics after all. But the intellectual experience had been unforgettable.
If even a lowly non-mathematician like myself could be so taken by the intricacies of higher mathematics, I can only dimly imagine the reveries experienced by some of math’s greatest practitioners, one of whom is Shing-Tung Yau. Yau is a professor at Harvard and one of the world’s greatest mathematicians. His speciality is geometry and topology. Yau’s claim to fame is in bridging geometry and topology with differential equations, essentially founding the discipline of geometric analysis, although perhaps his greatest legacy would be forming novel, startling connections between physics and mathematics and opening up a dialogue that has had a long and often contentious history. For these efforts he won the Fields Medal in 1982, becoming the first mathematician of Chinese descent to do so.
The connection between algebra and geometry is an ancient one. In creating analytical or Cartesian geometry for instance, Rene Descartes had found a way to represent the elements of Euclidean geometry, entities like points and lines, as algebraic coordinates. This was a revolutionary discovery, allowing basic geometric entities like circles and ellipses to be described by algebraic equations. Analytical geometry lies at the very foundation of mathematics, enabling many other fields like multivariate calculus and linear algebra. The culmination of analytical geometry was in the field of differential geometry which uses techniques from algebra and calculus to describe geometric objects, especially curved ones.
The difference between geometry and topology is essentially that the former is about local entities while the latter is about global entities, about the big picture. Thus, in the context of the analogy often given to illustrate what topology is about, while a coffee cup and a donut are different geometric objects, they are identical topological objects because one can be converted into the other simply by stretching, expanding and contracting, without having to tear or cut any part. Perhaps Yau’s most interesting contributions to differential topology would be something called a Calabi-Yau manifold. Loosely speaking, a manifold is a topological space whose every point is essentially “flat” or “locally Euclidean”. A good analogy is with ancient views of the Earth as flat contrasted with modern views of a round earth; the discrepancy arises from the fact that even the round earth is locally flat or Euclidean. Manifolds are not just interesting mathematically but are of great importance in physics and especially in general relativity. For instance, Einstein used the theory of Riemannian manifolds to deal with the curvature of spacetime. Calabi-Yau manifolds are special manifolds that gained importance when they were found to represent “hidden” dimensions in string theory. But this is merely one of Yau’s many seminal contributions to math over a long and fascinating life as described in his memoirs, “The Shape of a Life”.
That life started in China in 1949, during the Communist revolution. Yau was one of nine children. His parents were both very intelligent and highly committed to the education of their children. His father in particular was a scholarly role model for Yau. He was a professor who taught many disciplines, including languages and history. He was well versed in poetry and philosophy and always had a ready store of Taoisms and Confucian parables for his children. Shing-Tung’s parents lost most of their property during the revolution, and like many others migrated to Hong Kong where a better life was found. This better life was still very hard. Yau’s parents moved several times, and most houses he lived in were either overcrowded or in the wilderness, without electricity or running water, sometimes infested by snakes and other animals. School was several miles away and had to be reached through a combination of walks and public transportation. And yet it seems to have been a generally happy childhood, sustained by stories and playmates in the form of several brothers and sisters, of whom Yau was especially close to a particular older sister. The poverty and hardscrabble life also engendered a tremendous capacity for persistence and hard work in Yau. This capacity was particularly enhanced after Yau’s father tragically passed away from cancer when Yau was fourteen. Yau was devastated by his amazing father’s passing, and he resolved to apply the lessons this role model had imparted as diligently as possible. His mother was a tremendous influence, and she worked at odd jobs to support her large family. Later she moved to the United States with her son and had the pleasure of watching him become successful beyond her dreams.
While not particularly prodigious in mathematics in his early years, Yau started shining in high school. By a quirk of fate in which he did less than ideally in a national examination by spending time with a street gang, Yau gained admission to a school named Pui Ching that remarkably enough produced no fewer than one future Nobel laureate, three future U.S. National Medal of Science winners and eight future members of the U.S. National Academy of Sciences. This is an astonishing record for a fairly provincial school in Hong Kong, similar to records of future eminent scientists from the Bronx High School of Science of New York City. One factor that played into the school’s success as well as that of the Chinese University of Hong Kong which Yau attended for college was the presence of visiting American professors or native-born professors who had studied at American universities. One such professor named Stephen Salaff recommended Yau for graduate school at the University of California in Berkeley, and Yau’s career was launched. A decisive factor in Yau’s admission was a strong recommendation by S. S. Chern, Berkeley’s eminent geometer and perhaps the leading Chinese mathematician outside the United States then. Chern’s relationship with Yau looms large in the book, perhaps too large, and throughout his life Chern was both father figure and mentor to Yau as well as nemesis and adversary. Here Yau also met his wife Yu-Yun, an accomplished physicist; curiously enough, in deference to Chinese traditional culture, while he saw her during his first week in the library, he waited several years to ask her out before someone made a formal introduction. The two also lived apart for several years while Yau, uncommonly for a mathematician, bounced between many universities like Stanford, the Institute for Advanced Study in Princeton and UCSD before finally settling down at Harvard. Their two sons are successful in their own right, one being a biochemist and the other a doctor.
After graduating Yau made a variety of significant contributions to differential geometry and geometric analysis. This included proving the Calabi conjecture which entails proving the existence of Riemannian metrics with certain properties on complex manifolds. This was a years-long struggle emblematic of great mathematical achievements, and like great mathematical achievements it involved some blind detours, including Yau’s mistaken early results that seemed to indicate counterexamples to the conjecture. A particularly key contribution by Yau of great relevance to physics was to come up with a purely mathematical proof of the so-called positive mass conjecture. This conjecture, taken at face value as obvious by physicists for a long time, said that the mass of any physical isolated system from both matter and gravitation is positive. This includes our universe. To prove this, Yau and his collaborator Richard Schoen constructed an ingenious argument: they first proved that if the average curvature of the spacetime corresponding to such a system is positive, then the mass is also positive. They then constructed a spacetime with positive curvature that had the same mass as our universe. Put together, the two results which showcased a classic argument by analogy showed that the mass of our universe must also be positive.
Yau and Schoen’s results inaugurated a new era of interaction between physics and mathematics. This relationship although long and profound had often been fraught; David Hilbert once famously said when asked if the relationship was bad that it wasn’t bad because for it to be so the two groups have to talk to each other. Most of the breakthroughs in twentieth century physics were made using what we might call 19th century mathematics – calculus, differential equations and matrix theory. Yau and others’ work showed that there were still novel approaches from pure math based on topology and geometry that could contribute to advances in physics. Roger Penrose who was trained in the classical tradition of British mathematics imbibed these fields, and he was able to use insights from them to make groundbreaking contributions to general relativity.
This line of discovery especially took off when physicists working on string theory in the 1980s discovered that the hidden dimensions postulated by string theory could essentially be modeled as Calabi-Yau manifolds. This was indeed one of those happy circumstances where a purely mathematical discovery made out of intellectual curiosity could have deep ramifications for physics. There are also examples from string theory that have spurred developments in pure mathematics. There was again precedent for such unexpected relationships – for instance the theory of Lie groups turned out to have completely unexpected connections with particle physics – but Yau and others’ work showed the great value of pure curiosity-driven research in mathematics that could spark a robust back and forth with physics. One aspect of string theory that is missing from Yau’s account is the increasing criticism of the field as being unmoored from experiment or even from experimental prediction. But notwithstanding this valid criticism, it is clear that string theory provides a great example of how, just like mathematics has traditionally contributed to physics, discoveries in physics can play back into pure mathematics.
Along with straddling the worlds of math and physics, Yau has also straddled two other worlds – those of China and the United States. Although he grew up in Hong Kong, his parents’ strong Chinese roots made him feel very strong connections to his ancestral homeland. He visited China several times a year, handpicked Chinese students to study in the US and collaborated extensively with Chinese researchers; in fact almost two-thirds of his students and collaborators are Chinese, and in what was a harbinger of current times, the CIA asked him about his students multiple times before realizing that their work was too obscure and pure to impact national security – one of the unexpected ancillary perks of working in pure mathematics. He also criticizes the Chinese system as being too enamored of prizes, money and fame rather than the pure intellectual satisfaction that comes from pursuing science for its own goals. After he won the Fields Medal, Yau’s Chinese sojourns became high profile and they got him into many controversies involving funding, favoritism and committee work, many involving his former mentor Chern. At least a dozen personal controversies dot the narrative in the book, and while they make for fascinating reading because they demonstrate that even the most abstract of mathematics is not free from the very human qualities of personal jealousies, feuds, nepotism and claims of credit, methinks that Yau sometimes doth protest too much, especially since we only hear one side of the story and seldom the other side.
Perhaps the most significant controversy came about when Sylvia Nasar who wrote the book “A Beautiful Mind” wrote in a widely read article in the New Yorker that Yau had tried, through his students, to steal credit away from the famously reclusive Russian mathematician Grigori Perelman and his stunning, completely unexpected proof of the century-old Poincare Conjecture. It turned out that Perelman had not worked out all the details of his proof and had built on the very important work done by the American mathematician Richard Hamilton. Yau recognized that Perelman’s results would have been impossible without Hamilton’s work, and went out of his way to praise Hamilton. He also recruited two Chinese students to work out a mammoth, 300-page exposition of the proof that filled in some gaps. There is no doubt that the proof was Perelman’s, but Yau’s extensive maneuverings made it sound like he was undermining Perelman’s efforts. In this case, because of Perelman’s self-imposed isolation from the community, it is easy to think that Yau deserves the criticisms, but he makes his side of the story clear and one gets the feeling that Nasar exaggerated the feud. And in spite of all these controversies, Yau has sustained warm friendships with many leading mathematicians.
Shing-Tung Yau’s life has been wholly dedicated to mathematics and its advancement. He sees mathematics much like Newton saw all of natural science:
“After much tumult in my early years, I was able to find my way to the field of mathematics, which still has the power to sweep me off my feet like a surging river. I’ve had the opportunity to travel upon this river – at times even clearing an obstruction or two from a small tributary so that water can flow to new places that have never been accessed before. I plan to continue my explorations a bit more and then, perhaps, do some observing – or cheerleading – from the riverbanks, a few steps removed.”
A little boy on the shore, playing with shiny pebbles, while the great ocean of truth lies undiscovered before him, ready to be explored.
First posted on 3 Quarks Daily.

The three horsemen of the machine learning apocalypse

My colleague Patrick Riley from Google has a good piece in Nature in which he describes three very common errors in applying machine learning to real world problems. The errors are general enough to apply to all uses of machine learning irrespective of field, so they certainly apply to a lot of machine learning work that has been going on in drug discovery and chemistry.

The first kind of error is an incomplete split between training and test sets. People who do ML in drug discovery have encountered this problem often; the test set can be very similar to the training set, or - as Patrick mentions here - the training and test sets aren't really picked at random. There should be a clear separation between the two sets, and the impressive algorithms are the ones which extrapolate non-trivially from the former to the latter. Only careful examination of the training and test sets can ensure that the differences are real.
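
One way to make the separation concrete is to split by group rather than by individual example, so that near-duplicates never straddle the two sets. The following is a minimal Python sketch of that idea, not anything taken from Patrick's article; the compound names, the scaffold labels and the `group_split` helper are all hypothetical, invented purely for illustration.

```python
import random
from collections import defaultdict

def group_split(items, group_of, test_fraction=0.2, seed=0):
    """Split items so that no group appears in both train and test."""
    groups = defaultdict(list)
    for item in items:
        groups[group_of(item)].append(item)
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)  # random order of whole groups
    target = test_fraction * len(items)
    train, test = [], []
    for key in keys:
        # Fill the test set group by group until it reaches the target size.
        bucket = test if len(test) < target else train
        bucket.extend(groups[key])
    return train, test

# Hypothetical data: 50 compounds spread over 5 made-up scaffold families.
compounds = [(f"cpd{i}", f"scaffold{i % 5}") for i in range(50)]
train, test = group_split(compounds, group_of=lambda c: c[1])
train_scaffolds = {c[1] for c in train}
test_scaffolds = {c[1] for c in test}
```

A plain random split of the same 50 compounds would almost certainly put every scaffold in both sets, which is exactly the kind of leakage that makes a model look better than it is.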

Another more serious problem with training data is of course the many human biases that have been exposed over the last few years, biases arising in fields ranging from hiring to facial recognition. The problem is that it's almost impossible to find training data that doesn't have some sort of human bias (in that context, general image data usually works pretty well because of the sheer number of random images human beings capture), and it's very likely that this hidden bias is what your model will then capture. Even chemistry is not immune from such biases; for instance, if your training data contains compounds synthesized using metal-catalyzed coupling reactions and is therefore enriched in biaryls, you will be training an algorithm that is excellent at identifying biaryls, drug scaffolds that are known to have issues with stability and clearance in the body.

The second problem is that of hidden variables, and this is especially the case with unsupervised learning where you let loose your algorithm on a bunch of data and expect it to learn relevant features. The problem is that there are a very large number of features in the data that your algorithm could potentially learn and find correlations with, and a good number of these might be noise or random features that would give you a good correlation while being physically irrelevant. A couple of years ago there was a good example of an algorithm used to classify tumors learning nothing about the tumors per se but instead learning features of rulers; it turns out that oncologists often keep rulers next to malignant tumors to measure their dimensions, and these were visible in the pictures. 

Closer to the world of chemistry, there was a critique last year of an algorithm that was supposed to pick an optimal combination of reaction conditions for a synthetic Buchwald-Hartwig reaction. This is a rather direct application of machine learning in chemistry, and one of the most promising ones in my view, partly because reaction optimization is still very much a trial-and-error art and it is far more deterministic than, say, finding a new drug target based on sparse genomic correlations. After the paper was published there was a critique pointing out that you could get the same results if you randomized the data or fit the model on noise. That doesn't mean the original model was wrong, it means that it wasn't unique and wasn't likely causative. Basically asking what exactly your model is fitting to is always a good idea.
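
The kind of control described in that critique can be sketched in a few lines of Python: refit on shuffled labels and see whether the score survives. This is only an illustration under stated assumptions; the data are synthetic, the "model" is a one-variable least-squares line, and `shuffled_label_scores` is a hypothetical helper, not the method used in either the paper or the critique.

```python
import random

def r_squared(xs, ys):
    """In-sample R^2 of a one-variable least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

def shuffled_label_scores(xs, ys, score, n_rounds=20, seed=0):
    """Score the same inputs against randomly permuted labels."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        shuffled = ys[:]
        rng.shuffle(shuffled)
        scores.append(score(xs, shuffled))
    return scores

# Synthetic data with a genuine signal: y = 2x plus a little noise.
rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + rng.gauss(0, 0.1) for x in xs]

real = r_squared(xs, ys)
controls = shuffled_label_scores(xs, ys, r_squared)
```

If `real` sits comfortably above every value in `controls`, the model is at least fitting something tied to the labels; if the shuffled scores look just as good, it is fitting to something else.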

As Patrick's article points out, there are other examples like an algorithm latching on to edge effects of plates in a biological assay or in image analysis in phenotypic screening; two other applications very relevant to drug discovery. The remedy here again is to run many different models while asking many different questions, a process that needs patience and foresight. Another strategy which I increasingly like would be to not do unsupervised learning but instead do constrained learning, with the constraints coming from the laws of science.

The last problem is a bit more subtle and involves using the wrong objective or "loss" function. A lot of this boils down to asking the right question. Patrick cites the example of using ML to diagnose diabetic retinopathy using images of the back of the eye. It turns out that if the question they asked was focused more on diagnosing a single disease rather than whether the patient needs to see a doctor, the models were thrown into disarray.

So what's the solution? What it always has been. As the article says,


"First, machine-learning experts need to hold themselves and their colleagues to higher standards. When a new piece of lab equipment arrives, we expect our lab mates to understand its functioning, how to calibrate it, how to detect errors and to know the limits of its capabilities. So, too, with machine learning. There is no magic involved, and the tools must be understood by those using them."
Would you expect the developer of a new NMR technique or a new quantum chemistry calculation algorithm to not know what lies under the hood? Would you expect these developers to not run tests using many different parameters and under different controls and conditions? For that matter, would you expect a soldier to go into battle without understanding the traps the enemy has laid? Then why expect developers of machine learning to operate otherwise?
Some of it is indeed education, but much of it involves the same standards and values that have been part of the scientific and engineering disciplines since antiquity. Unfortunately, too often machine learning, especially because of its black-box nature, is regarded as magic. But there is no magic (Arthur Clarke quotes notwithstanding). It's all careful, meticulous investigation; it's about going into the field knowing that there almost certainly will be a few mines scattered here and there. Be careful if you don't want your model to get blown up.

Infinite horizons; or why I am optimistic about the future

The Doomsday Scenario, also known as the Copernican Principle, refers to a framework for thinking about the death of humanity. One can read all about it in a recent book by science writer William Poundstone. The principle was popularized mainly by the philosopher John Leslie and the physicist J. Richard Gott in the 1990s; since then variants of it have been cropping up with increasing frequency, a frequency which seems to be roughly proportional to how much people worry about the world and its future.
The Copernican Principle simply states that the probability of us existing at a unique time in history is small because we are nothing special. We therefore must exist roughly close to half the period of our existence. Using Bayesian statistics and the known growth of population, Gott and others then calculated lower bounds for humanity’s future existence. Referring to the lower bound, their conclusion is that there is a 95% chance that humanity will go extinct in 9120 years.
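To see the bare arithmetic behind this kind of reasoning, here is a minimal Python sketch of the simplest, time-based form of Gott's argument: if we are observing at a random moment in humanity's lifespan, then with a given confidence the fraction of that lifespan already elapsed lies between the two tails. The 200,000-year age of our species is an assumed round input, and this sketch does not attempt to reproduce the population-based figure quoted above.

```python
def gott_interval(past, confidence=0.95):
    """Bounds on future duration, given past duration, under Gott's delta-t argument."""
    tail = (1 - confidence) / 2
    # If the elapsed fraction is as large as (1 - tail), little time remains.
    low = past * tail / (1 - tail)
    # If the elapsed fraction is as small as tail, most of the lifespan lies ahead.
    high = past * (1 - tail) / tail
    return low, high

# Assumed input: Homo sapiens has existed for roughly 200,000 years.
low, high = gott_interval(200_000)
```

At 95% confidence the bounds are the past duration divided by 39 and multiplied by 39, which for the assumed input gives roughly 5,100 years to 7.8 million years. The width of that range is a fair indication of how little the argument pins down.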
The Doomsday Argument has sparked a lively debate on the fate of humanity and on different mechanisms by which the end will finally come. As far as I can tell, the argument is little more than inspired numerology and has little to do with any rigorous mathematics. But the psychological aspects of the argument are far more interesting than the mathematical ones; the arguments are interesting because they tell us that many people are thinking about the end of mankind, and that they are doing this because they are fundamentally pessimistic. This should be clear by how many people are now talking about how some combination of nuclear war, climate change and AI will doom us in the near future. I reject such grim prognostications because they are mostly compelled by psychological impressions rather than by any semblance of certainty.
A major reason why there is so much pessimism these days is because of what the great historian Barbara Tuchman once called ‘Tuchman’s Law’; Tuchman’s Law states that the impression that an event leaves in the minds of observers is proportional to its coverage in the newspapers. Tuchman said this in 1979, and it has become a truism today because of the Internet. The media is much more interested in reporting bad things that happened rather than good things that did not happen, so it’s easy to think that the world is getting worse every day. The explosion of social media and multiple news sources have amplified this sensationalism and selection bias by gargantuan proportions. As Tuchman said, even if you may be relentlessly reading about a troubling phenomenon like child kidnapping or mass shootings, it is exceedingly rare that you will come home on any given day having faced such calamities.
In this trivial sense I agree with Bill Gates, Hans Rosling, Steven Pinker and others who have written books describing how by almost every important parameter – for instance child mortality, women and minority rights, health status, poverty, political awareness, environmental improvement – the world of today is not just vastly better than that of yesterday but has been on a steep and steady curve of improvement since medieval times. One simply needs to pick up any well-regarded book on medieval history (Tuchman’s marvelous book “The Distant Mirror” describing the calamitous 14th century will do the job) to realize how present human populations almost seem to live on a different planet as far as quality of life is concerned. This does not refute the often uneven distribution of progress, nor does it tell us that every improvement that we have seen is guaranteed, nor does it say we should rest on our laurels, but it does give us more than enough rational cause for optimism.
Sometimes the difference between optimism and pessimism is simply related to looking at the same data point in two different ways. For instance, take as a reference date the year that the US Supreme Court legalized same-sex marriage – 2015. Now go back a hundred years, to 1915. Even in the United States the world of individual rights was stunningly different from now. Women could not vote, immigration from non-European countries was strongly discouraged and restricted, racism against non-white people (and even some white people such as Catholics) was part of the fabric of American society, black people were actively getting lynched in the south and their civil rights were almost non-existent, abortion was illegal, gay people would not dream of coming out of the closet and anti-Semitism was not only rampant but institutionalized in places like Ivy League universities.
It is downright incredible that, only a hundred years later, every single one of these barriers had fallen. Not one or two or three, but every single one. I cannot see how this extraordinary reversal of discrimination and inequality cannot lead to soaring optimism about the future. Now, two people might look at this fact in two different ways. One might say, “It took 228 years since the writing of the US Constitution for these developments to transpire”, while another person might say, “It took only a hundred years from 1915 for these developments to transpire”. Which perspective do you choose since both are equally valid? I choose the latter, not only because it points to optimism for the future but to informed optimism. There has been a tremendous raising of moral consciousness about equal treatment of all kinds of groups in the last one hundred years, and if anything, the strong, unstoppable waves of progressivism on the Internet promise that this moral elevation will continue unabated. There are effectively zero chances that women or minorities will lose the vote for instance. The price of liberty is eternal vigilance, not eternal pessimism.
What about those four horsemen of the apocalypse, now compressed into the three horsemen comprising nuclear war, AI and climate change, that seem to loom large when it comes to a dim view of the future of humanity? I believe that as real as some of the fears from climate change, nuclear war and AI are, they are exaggerated and not likely to impact us the way we think.
First, climate change. There are many deleterious impacts of human beings on the environment, of which global warming is an important one and likely the most complicated to predict in its details. It is harder to predict phenomena like the absorption of carbon dioxide by the biosphere and the melting of glaciers based on computer models than it is to understand and act on phenomena like ocean acidification, deforestation, air pollution and strip mining. Sadly, discussions of these topics are often lost in the political din surrounding global warming. There is also insufficient enthusiasm for solutions such as nuclear energy and solar power that can make a real impact on energy usage and fossil fuel emissions. On the bright side, support for fighting climate change and environmental degradation is more vociferous than ever, and social media thankfully has played an important role in generating it. This support is similar to the support that early 20th century environmentalists lent to preventing creatures like the American buffalo and whales from going extinct. There are good reasons to think that whatever the real or perceived effects of climate change, it will not cease to be a publicly important issue in the future. But my optimism regarding climate change does not just come from the level of public engagement I see but from the ability of humans to cope; I am not saying that climate change will pose no problem, but that one way or another humans will find solutions to contain or even eliminate those problems. Humans survived the last ice age at dangerously low levels of population and technological capability compared to today, so there is little reason to think that we won’t be able to cope. Some people worry whether it is worth bequeathing the uncertain world of tomorrow to our children and grandchildren. 
My belief is that, considering the travails that humanity successfully faced in the last thousand years or so, our children and grandchildren will be more than competent to handle whatever problem they are handed by their predecessors and the planet.
Second, nuclear war. The world’s nuclear arsenals have posed a clear and present danger for years. However, deterrence – as fragile and fraught with near misses as it is – has ensured that no nuclear weapon has been exploded in anger for almost 75 years. This is an almost miraculous track record. Moreover, while the acquisition of dirty bombs or nuclear material by non-state actors is a real concern, the global nuclear stockpile has been generally quite secure, and there are enough concerned experts who continue to monitor this situation. Since the end of the Cold War, both the United States and Russia have significantly reduced their stockpiles, although both countries should go to still lower numbers. The detonation of even a low yield nuclear weapon in a major city will be a great tragedy, but it will not have the same effects as the global thermonuclear war whose threat the world labored under for more than fifty years. In 1960, Herman Kahn wrote “On Thermonuclear War”, a controversial book that argued that even a major thermonuclear war would not mean the end of humanity as most people feared. Part of Kahn’s analysis included calculations on the number of deaths and part included historical evidence of human renewal and hope after major wars. While the book was morbid in many details, it did make the point that humanity is far more resilient than we think. Fortunately the scenarios that Kahn described never came to pass, and the risk of them happening even on a small scale is now far lower than it ever was.
Finally, AI seems to be perhaps the prime reason for the extinction of humanity that many world and business leaders and laymen fear. Early fears centered on the kind of killer robots that dotted the landscape of science fiction movies, but recent concerns have centered on machines gradually developing intelligence and humans gradually ceding authority to them. But most AI doomsday scenarios are speculative at best and contain a core of deep uncertainty. For instance, a famous argument made by Nick Bostrom described a scenario called the AI paperclip maximizer. The idea is that humanity creates an AI whose purpose is to create paperclips. The AI will gradually single-mindedly start making paperclips out of everything, consuming all natural resources and rendering the human race extinct. This kind of doomsday scenario has some important assumptions built into it, among which is the assumption that such an AI can actually exist and wouldn’t have a failsafe built into it. But the bigger question is regarding the AI’s intelligence: any kind of truly intelligent AI won’t spend its entire time making paperclips, while any kind of insufficiently intelligent AI will be easily controlled by human beings or at least live with them in some kind of harmony. I worry much less about a paperclip AI than I do about humans gradually ceding thinking to fleeting sources of entertainment like social media.
But the real problem with any kind of doomsday scenario involving AGI (artificial general intelligence) is that it simply underestimates what it would take for a machine to acquire true human-like cognitive capabilities. One of the best guides to thinking about exactly what it would take for AGI to somehow take over the world is the technologist Kevin Kelly. He gives three principal reasons for the unlikelihood of this happening: one, that intelligence lies along many axes, and even very intelligent human beings are usually intelligent along only a few; two, that intelligence is not gained through thinking alone but through experimentation, and that experimentation slows down any impact that a super-intelligence might have; and three, that any kind of AGI scenario assumes that the relationship between humans and their creations would be intrinsically hostile and fixed. Almost all such assumptions about AGI are subject to doubt, and at least a few of the conditions that seem to be necessary for AGI to truly dominate humanity seem to be both rate-limiting and unlikely.
Ultimately, most doomsday scenarios are based on predicting the future, and prediction, as Niels Bohr famously said, is very difficult, especially concerning the future. The most important prediction about the future of humanity will probably be the one that we are not capable of making. But in the absence of accurate prediction about the future, we have the past. And while the past is never a certain guide to the future, the human past in particular shows a young species that is almost infinitely capable of adaptation, empathy, creativity and optimism. I see no reason to believe this will not continue to be the case.
First published on 3 Quarks Daily.

Book review: The British Are Coming: The War for America, Lexington to Princeton, 1775-1777

The British Are Coming: The War for America, Lexington to Princeton, 1775-1777 by Rick Atkinson

When the British army of regulars captured American troops during the Battle of New York, they contemptuously noted how they were surprised to see so many ordinary people among them – tanners, brewers, farmers, metal workers, carpenters and the like. That observation in one sense summed up the difference between the British and American causes: a ragtag group of ordinary citizens with little battle experience pitted against a professional, experienced and disciplined army belonging to a nation that then possessed the biggest empire since the Roman Empire. The latter were fighting for imperial power, the former for conducting an experiment in individual rights and freedom. The former improbably won.

Rick Atkinson shows us how in this densely packed, rousing military history of the first two years of the Revolutionary War. The Americans kept on foiling the British through a combination of brilliant tactical retreats, dogged determination, improvisation and faith in providence. His is primarily a military history that runs from the opening salvo in Lexington and Concord to the engagements in Princeton and Trenton and Washington's legendary crossing of the frozen Delaware. However, there is enough observational detail on the social and political aspects of the conflict and the sometimes larger-than-life personalities involved to make it a broader history. The account could be supplemented with other political histories such as ones by Gordon Wood, Bernard Bailyn and Joseph Ellis to provide a fuller view of the politics and the personalities.

Atkinson’s greatest strength is to bring an incredible wealth of detail to the narrative and pepper it with primary quotes from not just generals and soldiers but from ordinary men and women. His other big strength is logistical information. No detail seems to escape his eye: the number and tonnage of food and clothing provisions and shipping, sundry details of types of weapons, ships, beasts of burden and ammunition, the kinds of diseases riddling the camps and the medieval medicine used to treat them (some of them positively so - "oil of whelps" was a grotesque substance concocted from white wine, earthworms and the flesh of dogs boiled alive), ditties and plays that were being performed by the soldiers ("Clinton, Burgoyne, Howe, Bow, wow, wow"), the constantly changing weather, the political machinations in Whitehall and the Continental Congress…the list goes on and on. Sometimes the overwhelming detail can be distracting – for instance, do we need to know the exact number of blankets and weight of salt pork supplied on the eve of a particular battle? – but overall the dense statistics and detail have the effect of immersing the reader in the narrative.

The major battles – Lexington and Bunker Hill, Long Island and Manhattan, Quebec and Ticonderoga, Charleston and Norfolk, Princeton and Trenton – are dissected in fine detail, with rousing descriptions of men, material, the thrust and parry at the front and the desperation, disappointments, retreats and triumphs that often marked the field of battle. The writing can occasionally be almost hallucinatory: "Revere swung into the saddle and took off at a canter across Charlestown Neck, hooves striking sparks, rider and steed merged into a single elegant creature, bound for glory". The accounts of the almost unbelievably desperate and excruciating winter fighting and retreat in Canada are probably the highlights of the military narratives. Lesser-known conflicts in Virginia and South Carolina in which the British were soundly routed also get ample space. Particularly interesting is the improbable and self-serving slave uprising drummed up by Lord Dunmore, Virginia's governor, and the far-reaching fears that it inspired in the Southern Colonies. Epic quotes that have become part of American history are seen in a more circumspect light; for instance, it’s not clear who said “Don’t fire until you see the whites of their eyes” during Bunker Hill, and instead of the famous “The British are coming” cry that is attributed to Paul Revere, it’s more likely that he said “The regulars are coming.” Also, the British army might have been experienced, but it too was constantly hampered by shortages of food and material, and these shortages were a major factor in many of its decisions, including the retreat from Boston. Britannia might have ruled the waves, but she wasn’t always properly nourished.

The one lesson that is constantly driven home is how events that seem providential and epic now were so uncertain and riddled with improvisation and desperation when they happened; in that sense hindsight is always convenient. Atkinson makes us aware of the utterly miserable conditions the soldiers and generals lived in: the threadbare clothing which provided scant protection against the cold, the horrific smallpox, dysentery and other diseases which swept entire battle companies off the face of the planet without warning, and the problems constantly posed by loyalists and deserters to American patriots. There were many opportunities for men to turn on one another, and yet we also see both friends and enemies being surprisingly humane toward each other. In many ways, it is Atkinson’s ability to provide insights across a wide cross-section of society, to make the reader feel the pain and uncertainty faced by ordinary men and women, that contributes to the uniqueness of his writing.

Atkinson paints a sympathetic and sometimes heroic portrait of both British politicians and military leaders, but he also makes it clear how clueless, bumbling and misguided they were when it came to understanding the fundamental DNA of the colonies, their frontier spirit, their Enlightenment thinking and their very different perception of their relationship with Britain. An excellent complement to Atkinson’s book for understanding British political miscalculations leading up to the war would be Nick Bunker’s “An Empire on the Edge”. While the book is not primarily a study of personality, Atkinson’s portraits of American commanders George Washington, Benedict Arnold, Henry Knox, Charles Lee, Israel Putnam and British commanders William and Richard Howe, Henry Clinton, Guy Carleton and others are crisp and vivid. Many of these commanders led their men and accomplished remarkable feats through cold and disease, in the wilderness and on the high seas; others like American John Sullivan in Canada and Briton Henry Clinton in Charleston could be remarkably naive and clueless in judging enemy strength and resolve. Atkinson also dispels some common beliefs; for instance, while the rank and file were indeed generally inexperienced, there were plenty of more senior officers including Washington who had gained good fighting experience in the ten-year-old French and Indian War. As a general, Washington’s genius was to know when to retreat, to make the enemy fight a battle of attrition, to inspire and scold when necessary, and somehow to keep this ragtag group of fighting men and their logistical support together, emerging as a great leader in the process. He was also adept at carefully maneuvering the levers of Congress and at driving home the great need for ammunition, weapons and ordinary provisions through a mixture of cajoling and appeals to men’s better angels.

For anyone wanting a detailed and definitive military history of the Revolutionary War, Atkinson’s book is highly recommended. It gives an excellent account of the military details of the “glorious cause” and it paints a convincing picture of the sheer improbability and capriciousness of its success.
