The Curious Wavefunction: July 2017

A Manhattan Project for AI?

Neuroscientist and AI researcher Gary Marcus has an op-ed in the NYT in which he bemoans the lack of international collaboration in AI, a limitation that Marcus thinks is significant hampering progress in the field. He says that AI researchers should consider a global effort akin to CERN; a massively funded, wide-ranging project to solve specific problems in AI that would benefit from the expertise of hundreds of independent researchers. This hivemind effort could potentially clear the AI pipeline of several clogs which have held back progress.

On the face of it this is not a bad idea. Marcus's opinion is that both private and public research has some significant limitations which a meld of the two could potentially overcome.

"Academic labs are too small. Take the development of automated machine reading, which is a key to building any truly intelligent system. Too many separate components are needed for any one lab to tackle the problem. A full solution will incorporate advances in natural language processing (e.g., parsing sentences into words and phrases), knowledge representation (e.g., integrating the content of sentences with other sources of knowledge) and inference (reconstructing what is implied but not written). Each of those problems represents a lifetime of work for any single university lab.

Corporate labs like those of Google and Facebook have the resources to tackle big questions, but in a world of quarterly reports and bottom lines, they tend to concentrate on narrow problems like optimizing advertisement placement or automatically screening videos for offensive content. There is nothing wrong with such research, but it is unlikely to lead to major breakthroughs. Even Google Translate, which pulls off the neat trick of approximating translations by statistically associating sentences across languages, doesn’t understand a word of what it is translating.

I look with envy at my peers in high-energy physics, and in particular at CERN, the European Organization for Nuclear Research, a huge, international collaboration, with thousands of scientists and billions of dollars of funding. They pursue ambitious, tightly defined projects (like using the Large Hadron Collider to discover the Higgs boson) and share their results with the world, rather than restricting them to a single country or corporation. Even the largest “open” efforts at A.I., like OpenAI, which has about 50 staff members and is sponsored in part by Elon Musk, is tiny by comparison.

An international A.I. mission focused on teaching machines to read could genuinely change the world for the better — the more so if it made A.I. a public good, rather than the property of a privileged few."

This is a good point. For all its commitment to blue sky research, Google is not exactly the Bell Labs of 2017, and except for highly targeted research like that done at Verily and Calico, it's still committed to work that has more or less immediate applications to its flagship products. And as Marcus says, academic labs suffer from limits to capacity that keep them from working on the big picture.

A CERN for AI wouldn't be a bad idea, but it would be different from the real CERN in some key aspects. Most notably, unlike discovering the Higgs Boston, AI has immense potential social, economic and political ramifications. Thus, keeping the research at a CERN-like facility open and free for all would be a steep challenge, with governments and individuals constantly vying for a piece of the pie. In addition, there would be important IP issues if corporations were funding this endeavor. And even CERN had to contend with paranoid fears of mini black holes, so one can only imagine how much the more realistic (albeit more modest) fears of AI would be blown out of proportion.

As interesting as a CERN-like AI facility is, I think another metaphor for a global AI project would be the Manhattan Project. Now let me be the first to say that I consider most comparisons of Big Science projects to the Manhattan Project to be glib and ill-considered; comparing almost any peacetime project with necessarily limited resources to a wartime project that benefited from a virtually unlimited supply of resources brought to bear on it with great urgency will be a fraught exercise. And yet I think the Manhattan Project supplies at least one particular ingredient for successful AI research that Marcus does not really talk about. It's the essential interdisciplinary nature of tackling big problems like nuclear weapons or artificial intelligence.

What seems to be missing from a lot of the AI research taking place today is that it does not involve scientists from all disciplines working closely together in an open, free-for-all environment. That is not to say that individual scientists have not collaborated in the field, and it's also not to say that fields like neuroscience and biology have not given computer scientists a lot to think about. But a practical arrangement in which generally smart people from a variety of fields work intensely on a few well-defined AI problems seems to still be missing.

The main reason why this kind of interdisciplinary work may be key to cracking AI is very simple: in a very general sense, there are no experts in the field. It's too new for anyone to really claim expertise. The situation was very similar to the Manhattan Project. While physicists are most associated with the atomic bomb, without specialists in chemistry, metallurgy, ordnance, engineering and electronics the bomb would have been impossible to create. More importantly, none of these people were experts in the field and they had to make key innovations on the fly. Let's take the key idea of implosion, perhaps the most important and most novel scientific contribution to emerge from the project: Seth Neddermeyer who worked on cosmic rays before the war came up with the initial idea of implosion that made the Nagasaki bomb possible. But Neddermeyer's idea would not have taken practical shape had it not been for the under-appreciated British physicist James Tuck who came up with the ingenious design of having explosives of different densities around the plutonium core that would focus the shockwave inward toward the core, similar to how a lens focuses light. And Tuck's design would not have seen the light of day had they not brought in an expert in the chemistry of explosives - George Kistiakowsky.

These people were experts in their well-defined fields of science, but none of them were expert in nuclear weapons design, and they were making it up as they went along. But they were generally smart and capable people, capable of thinking widely outside their immediate sphere of expertise, capable of producing at least parts of ideas which they could then hand over in a sort of relay to others with different parts.

Similarly, nobody in the field of AI is an expert, and just like nuclear weapons the field is still new enough and wide enough for all kinds of generally smart people to make contributions to it. So along with a global effort, we should perhaps have a kind of Manhattan Project of AI that brings together computer scientists, neuroscientists, physicists, chemists, mathematicians and biologists at the minimum to dwell on the field's outstanding problems. These people don't need to be experts or know much about AI at all, they don't even need to know how to implement every idea they have, but they do need to be idea generators, they need to be able to bounce ideas off of each other, and they need to be able to pursue odd leads and ends and try to see the big picture. The Manhattan Project worked not because of experts pursuing deep ideas but because of a tight deadline and a concentrated effort by smart scientists who were encouraged to think outside the box as much as possible. Except for the constraints of wartime urgency, it should not be hard to replicate that effort, at least in its essentials.

Want to know if you are depressed? Don't ask Siri just yet.

"Tell me more about your baseline calibration, Siri"

There's no dearth of articles claiming that the "wearables revolution" is around the corner and that we aren't far from the day when every aspect of our health is recorded every second, analyzed and sent to the doctor for rapid diagnosis and treatment. That's why it was especially interesting for me to read this new analysis from computer scientists at Berkeley and Penn that should temper the soaring enthusiasm that riddles pretty much all things "AI" these days.

The authors are asking a very simple question in the context of machine learning (ML) algorithms that claim to predict your mood - and by proxy mental health issues like depression - based on GPS and other data. What's this simple question? It's one about baselines. When any computer algorithm makes a prediction, one of the key questions is how much better this prediction is compared to some baseline. Another name for baselines is "null models". Yet another is "controls", although controls themselves can be artificially inflated.

In this case the baseline can be of two kinds: personal baselines (self-reported individual moods) or population baselines (the mood of a population). What the study finds is not too pretty. They analyze a variety of literature on mood-reporting ML algorithms and find that in about 77% of cases the studies use meaningless baselines that overestimate the performance of the ML models with respect to predicting mood swings. The reason is because the baselines that are used in most studies are population baselines rather than the more relevant personal baselines. The population baseline assumes a constant average state for all individuals, while the individual baseline assumes an average state for every individuals but different states between individuals.

Clearly doing better than the population baseline is not very useful for tracking individual mood changes, and this is especially true since the authors find greater errors for population baselines compared to individual ones; these larger errors can simply obscure model performance. The paper also consider two datasets and try to figure out how to improve the performance of models on these datasets using a metric which they call "user lift" that determines how much better the model is compared to the baseline.

I will let the abstract speak for itself:

"A new trend in medicine is the use of algorithms to analyze big datasets, e.g. using everything your phone measures about you for diagnostics or monitoring. However, these algorithms are commonly compared against weak baselines, which may contribute to excessive optimism. To assess how well an algorithm works, scientists typically ask how well its output correlates with medically assigned scores. Here we perform a meta-analysis to quantify how the literature evaluates their algorithms for monitoring mental wellbeing. We find that the bulk of the literature (∼77%) uses meaningless comparisons that ignore patient baseline state. For example, having an algorithm that uses phone data to diagnose mood disorders would be useful. However, it is possible to over 80% of the variance of some mood measures in the population by simply guessing that each patient has their own average mood - the patient-specific baseline. Thus, an algorithm that just predicts that our mood is like it usually is can explain the majority of variance, but is, obviously, entirely useless. Comparing to the wrong (population) baseline has a massive effect on the perceived quality of algorithms and produces baseless optimism in the field. To solve this problem we propose “user lift” that reduces these systematic errors in the evaluation of personalized medical monitoring."

That statement about being able to explain 80% of the variance in the model simply by guessing an average mood for every individual should stand out. It means that simple informed guesswork based on an average "feeling" is both as good as the model and is also eminently useless since it predicts no variability and is therefore of little practical utility.

I find this paper important because it should put a dent in what is often inflated enthusiasm about wearables these days. It also illustrates the dangers of what is called "technological solutionism": simply because you can strap on a watch or device on your body to measure various parameters and simply because you have enough computing power to analyze the resulting stream of data does not mean the results will be significant. You record because you can, you analyze because you can, you conclude because you can. What the authors find about tracking moods can apply to tracking other kinds of important variables like blood pressure and sleep duration. Every time the question must be; am I using the right baseline for comparison? And am I doing better than the baseline? Hopefully the authors can use larger and more diverse datasets and find out similar facts about other such metrics.

I also found this study interesting because it reminds me of a whole lot of valid criticism in the field of molecular modeling that we have seen over the last few years. One of the most important questions there is about null models. Whenever your latest and greatest FEP/MD/informatics/docking study is claimed to have done exceptionally well on a dataset, the first question should be; is it better than the null model? And have you defined the null model correctly to begin with? Is your model doing better than a simpler method? And if it's not, why use it, and why assign a causal connection between your technique and the relevant result?

In science there are seldom absolutes. Studies like this show us that every new method needs to be compared with what came before it. When old facts have already paved the way, new facts are compelled to do better. Otherwise they can create the illusion of doing well.

Cognitive biases in drug discovery, part 2: Anchoring, availability and representativeness

In the last post, I talked about how cognitive biases would be especially prevalent in drug discovery and development because of the complex, information-poor, tightly time-bound and financially-incentivized nature of the field. I talked about confirmation bias which riddles almost all human activity and which can manifest itself in drug discovery in the form of highlighting positive data for one’s favorite belief, metric or technique and rejecting negative data that does not agree with this belief.

In this post, I will mention a few more important cognitive biases. All of them are classic examples of getting carried away by limited patches of data and ignoring important information; often information on much larger samples. It’s worth noting that not all of them are equally important; a bias that’s more applicable in other parts of life may be less applicable in drug discovery, and vice versa. It’s also interesting to see that a given case may present more than one bias; because the human mind operates in multiple modes, biases often overlap. In the next post we will look at a few more biases related to statistics and comparisons.

Anchoring: Anchoring is the tendency to rely too much on one piece of information or trait, especially if it appears first. In some sense it’s a ubiquitous phenomenon, and it can also be subtle; it can be influenced by random things we observe and hear. A classic anchoring experiment was done by Kahneman and Tversky who showed participants a spinning wheel that would randomly settle on a number. After the spinning wheel stopped, the participants were asked what percentage of African countries are part of the U.N. It turned out that the percentage quoted by the participants was correlated to the random, unrelated number they saw on the wheel; if they saw a larger number they quoted a larger percentage, and vice versa. One important feature of the anchoring effect that this experiment demonstrated was that it involves random numbers or phenomena that can be completely irrelevant to the issue at hand.

It’s hard to point to specific anchoring biases in drug discovery, but one thing we know is that scientists can be skewed by numbers all the time, especially if the numbers are promising and seem very accurate. For instance, being biased by sparse in vitro affinity data for some early hits, leads or series can blind you to optimization of downstream properties. People sometimes come around, but I have seen even experienced medicinal chemists get obsessed with early leads with very good affinities but poor properties. In general, random promising numbers relating to affinity, properties, clinical data etc. for particular sets of compounds can lead one to believing that other similar compounds will have similar properties, or that those numbers are very relevant to begin with.

As has been well-documented, “similarity” itself can be a bias since every chemist for instance will look at different features of compounds to decide whether they are similar or not. Objective computational similarity comparisons can diminish this bias a bit, but since there’s no right way of deciding what the “perfect” computational similarity measure is either (and there’s plenty of misleading similarity metrics), this solution carries its own baggage.

You can also be carried away by measurements (often done using fancy instrumentation) that can sound very accurate; in reality, they are more likely to simply be precise. This problem is a bigger subset of problems related to what is called “technological solutionism”. It is the habit of believing in data when it’s generated by the latest and greatest new experimental or computational technique. This data can anchor our beliefs about drug behavior and lead us to extrapolate when we shouldn’t. The key questions to ask in this regard are: Are the numbers being measured accurate? Do the numbers actually measure the effect we think they do and is the effect real and statistically significant? Is the effect actually relevant to my hypothesis or conclusion? That last question is probably the most important and not asking it can lead you to squander a lot of time and resources.

Availability heuristic: A bias related to anchoring is availability. This is the tendency to evaluate new information based on information - especially recent information - that can be easily recalled. In case of drug discovery, easily recalled information can include early stage data, data that’s simply easier to gather, data that’s “popular” or data that’s simply repeated enough number of times, in the literature or by word of mouth. There are countless reasons and why certain information is easily recalled while other information is not. They can also be related to non-scientific variables like emotional impact. Were you feeling particularly happy or sad when you measured a particular effect? Was the effect validated by groupthink and did it therefore make you feel vindicated? Was the piece of data described by an “important” person who you admire? All these factors can contribute to fixing a certain fact or belief in our minds. Availability of specific information can cement that information as the best possible or most representative information.

Everyone is biased by successful projects they have worked on. They may recall a particular functional group or synthetic reaction or computational technique that worked for them and believe that it will work for other cases. This is also an example of confirmation bias, but the reason it’s an availability heuristic hinges on the fact that other information - and most notably information that can counter one’s beliefs - is not easily available. Most of the times we report positive results and not negative ones; this is a general problem of the scientific literature and research policy. Sometimes gathering enough data that would tweak the availability of the result is simply too expensive to do. That’s understandable, but it also means that we should be more wary about what we choose to believe.

Finally, the availability heuristic is particularly strong when a recent decision leads to an important consequence; perhaps installing a fluorine in your molecule suddenly led to improved pharmacokinetics, or using a certain formulation led to better half lives in patients. It is then tempting to believe that the data that was available is the data that’s generalizable, especially when it has had a positive emotional impact on your state of mind.

Representativeness: The availability bias is also closely related to the representativeness fallacy. In one sense the representativeness fallacy reflects a very common failing of statistical thinking: the tendency to generalize to a large sample based on a representative sample. For instance, a set of “rules” for druglike behavior may have been drawn from a limited set of studies. It would then be tempting to think that those rules applied to everything that was not tested in those studies, simply on the basis of similarity to the cases that were tested. Representativeness can manifest itself in the myriad definitions of “druglike” used by medicinal chemists as we all as metrics like ligand efficiency.

A great example of representativeness comes from Tversky and Kahneman’s test involving personality traits. Consider the following description of an individual:

“Linda is a 55-year-old woman with a family. She likes reading and quiet reflection. Ever since she was a child, Linda has been non-confrontational, and in a tense situation prefers tactical retreats to open arguments.”

Given this information, what’s Linda’s likely profession?
a. Librarian
b. Doctor

Most people would pick a. since Linda’s introverted qualities seem to align with one’s mental image of a librarian. But the answer is really likely to be b. since there are far more doctors than librarians, so even a tiny percentage of doctors with the aforementioned traits would constitute a bigger number than librarians.

Now let us apply the same kind of reasoning to a description of a not-so-fictional molecule:

“Molecule X is a small organic molecule with a logP value of 3.2, 8 hydrogen bond acceptors, 4 hydrogen bond donors and a molecular weight of 247. It has shown activity against cancer cells and was discovered at Novartis using a robotics-enabled phenotypic screening technique with high throughput.”

Given this information, what is more likely?

a. Molecule X is “druglike”.

b. Molecule X is non-druglike.

What I have just described is the famous Lipinski’s Rule of 5 that lays down certain rules related to basic physicochemical properties for successful drugs. If you were dealing with a compound having these properties, you would be more likely to think it’s a drug. But among the unimaginably vast chemical space of compounds, the number of druglike compounds is vanishingly small. So there are far more non-druglike compounds than druglike compounds. Given this fact, Molecule X is very likely to not be a drug, yet one is likely to use its description to believe it’s a drug and pursue it.

I can also bet that the anchoring effect is at work here: the numbers “3.2” for logP and “247” for molecular weight which sound very accurate as well as the fact that a fancy technique at a Big Pharma company found this molecule are more likely to contribute to your belief that you have a great potential drug molecule at hand. But most of this information is marginally relevant at best to the real properties of Molecule X. We have again been misled by focusing on a tiny sample with several irrelevant properties and thinking it to be representative of a much larger group of data points.

Base rate fallacy: Representativeness leads us to another statistical fallacy: the base rate fallacy. As we saw above, the mistake in both the librarian and the druglike examples is that we fail to take into account the base rate of non-librarians and non-druglike compounds.

The base rate fallacy is generally defined as the tendency to ignore base rate or general information and focus only on specific cases. There are at least two examples in which I can see the base rate fallacy manifesting itself:

1. In overestimating HTS/VS hit rates against certain targets or for certain chemotypes without taking base hit rates into account. In turn, the bias can lead chemists to make fewer compounds than what might be necessary to get a hit.

2. The base rate fallacy is more generally related to ignoring how often you might obtain a certain result by chance; for instance, a correlation between expression levels of two proteins or a drug and a protein, or one involving non-specific effects of a druglike compound. The chance result can then feed into the other biases described above like representativeness or availability.

Anchoring, availability, representativeness and the base rate fallacy are classic examples of both extrapolating from a limited amount of information and ignoring lots of unknown information. They speak to the shortcuts that our thinking takes when trying to quickly conclude trends, rules and future directions of inquiry based on incomplete data. A lot of the solutions to these particular biases involve generating more data or finding it in the literature. Unfortunately this is not always an achievable goal in the fast-faced and cash-strapped environment of drug discovery. In that case, one should at least identify the most important pieces of data one would need to gather in order to update or reject a hypothesis. For example, one way to overcome the base rate fallacy is to calculate what kind of sampling might be necessary to improve the confidence in the data by a certain percentage. If all else fails, one must then regard the data or belief that he or she has as highly tentative and constantly keep on looking for evidence that might shore up other beliefs.

Cognitive biases are a very human construct, and they are so relevant to drug discovery and science in general because these are very human enterprises. In the ideal world of our imagination, science is an objective process of finding the truth (and of discovering drugs). In the real world, science is a struggle between human fallibility and objective reality. Whether in drug discovery or otherwise, at every step a scientist is struggling to square the data with the biases in his or her mind. Acknowledging these biases and constantly interrogating them is a small first step in at least minimizing their impact.

Image source

Black holes and the curse of beauty: When revolutionary physicists turn conservative

This is my latest monthly column for 3 Quarks Daily.

On September 1, 1939, the leading journal of physics in the United States, Physical Review, carried two remarkable papers. One was by a young professor of physics at Princeton University named John Wheeler and his mentor Niels Bohr. The other was by a young postdoctoral fellow at the University of California, Berkeley, Hartland Snyder, and his mentor, a slightly older professor of physics named J. Robert Oppenheimer.

The first paper described the mechanism of nuclear fission. Fission had been discovered nine months earlier by a team of physicists and chemists working in Berlin and Stockholm who found that bombarding uranium with neutrons could lead to a chain reaction with a startling release of energy. The basic reasons for the large release of energy in the process came from Einstein's famous equation, E = mc², and were understood well. But a lot of questions remained: What was the general theory behind the process? Why did uranium split into two and not more fragments? Under what conditions would a uranium atom split? Would other elements also undergo fission?

Bohr and Wheeler answered many of these questions in their paper. Bohr had already come up with an enduring analogy for understanding the nucleus: that of a liquid drop that wobbles in all directions and is held together by surface tension until an external force that is violent enough tears it apart. But this is a classical view of the uranium nucleus. Niels Bohr had been a pioneer of quantum mechanics. From a quantum mechanical standpoint the uranium nucleus is both a particle and a wave represented as a wavefunction, a mathematical object whose manipulation allows us to calculate properties of the element. In their paper Wheeler and Bohr found that the uranium nucleus is almost perfectly poised on the cusp of classical and quantum mechanics, being described partly as a liquid drop and partly by a wavefunction. At twenty five pages the paper is a tour de force, and it paved the way for understanding many other features of fission that were critical to both peaceful and military uses of atomic energy.

The second paper, by Oppenheimer and Snyder, was not as long; only four pages. But these four pages were monumental in their importance because they described, for the first time in history, what we call black holes. The road to black holes had begun about ten years earlier when a young Indian physicist pondered the fate of white dwarfs on a long voyage by sea to England. At the ripe old age of nineteen, Subrahmanyan Chandrasekhar worked out that white dwarfs wouldn't be able to support themselves against gravity if their mass increased beyond a certain limit. A few years later in 1935, Chandrasekhar had a showdown with Arthur Eddington, one of the most famous astronomers in the world, who could not believe that nature could be so pathological as to permit gravitational collapse. Eddington was a previous revolutionary who had famously tested Einstein's theory of relativity and its prediction of starlight bending in 1919. By 1935 he had turned conservative.

Four years after the Chandrasekhar-Eddington confrontation, Oppenheimer became an instant revolutionary when he worked out the details of gravitational collapse all the way to their logical conclusion. In their short paper he and Snyder demonstrated that a star that has exhausted all its thermonuclear fuel cannot hold itself against its own gravity. When it undergoes gravitational collapse, it would present to the outside world a surface beyond which any falling object will appear to be in perpetual free fall. This surface is what we now call the event horizon; beyond the event horizon even light cannot escape, and time essentially stops flowing for an outside observer.

Curiously enough, the black hole paper by Oppenheimer and Snyder sank like a stone while the Wheeler-Bohr paper on fission gained wide publicity. In retrospect the reason seems clear. On the same day that both papers came out, Germany attacked Poland and started World War 2. The potential importance of fission as a source of violent and destructive energy had not gone unnoticed, and so the Wheeler-Bohr paper was of critical and ominous portent. In addition, the paper was in the field of nuclear physics which had been for a long time the most exciting field of physics. Oppenheimer's paper on the other hand was in general relativity. Einstein had invented general relativity more than twenty years earlier, but it was considered more mathematics than physics in the 1930s. Quantum mechanics and nuclear physics were considered the most promising fields for young physicists to make their mark in; relativity was a backwater.

What is more interesting than the fate of the papers themselves though is the fate of the three principal characters associated with them. In their fate as well as that of others, we can see the differences between revolutionaries and conservatives in physics.

Niels Bohr had pioneered quantum mechanics with his paper on atomic structure in 1913 and since then had been a founding father of the field. He had run an intellectual salon at his institute at Copenhagen which had attracted some of the most original physicists of the century; men like Werner Heisenberg, Wolfgang Pauli and George Gamow. By any definition Bohr had been a true revolutionary. But in his later life he turned conservative, at least in two respects. Firstly, he stubbornly clung to a philosophical interpretation of quantum mechanics called the Copenhagen Interpretation which placed the observer front and center. Bohr and his disciples rejected other approaches to quantum interpretation, including one named the Many Worlds Interpretation pioneered by John Wheeler's student Hugh Everett. Secondly, Bohr could not grasp the revolutionary take on quantum mechanics invented by Richard Feynman called the sum-over-histories approach. In this approach, instead of considering a single trajectory for a quantum particle, you consider all possible trajectories. In 1948, during a talk in front of other famous physicists in which Feynman tried to explain his theory, Bohr essentially hijacked the stage and scolded Feynman for ignoring basic physics principles while Feynman had to humiliatingly stand next to him. In both these cases Bohr was wrong, although the verdict is still out on the philosophical interpretation of quantum mechanics. It seems however that Bohr forgot one of his own maxims: "The opposite of a big truth is also a big truth". For some reason Bohr was unable to accept the opposites of his own big truths. The quantum revolutionary had become an old-fashioned conservative.

John Wheeler, meanwhile, went on to make not just one but two revolutionary contributions to physics. After pioneering nuclear fission theory with Bohr, Wheeler immersed himself in the backwater of general relativity and brought it into the limelight, becoming one of the world's foremost relativists. In the public consciousness, he will probably be most famous for coining the term "black hole". But Wheeler's contributions as an educator were even more important. Just like his own mentor Bohr, he established a school of physics at Princeton that produced some of the foremost physicists in the world; among them Richard Feynman, Kip Thorne and Jakob Bekenstein. Today Wheeler's scientific children and grandchildren occupy many of the major centers of relativity research around the world, and until the end of his long life that remained his proudest accomplishment. Wheeler was a perfect example of a scientist who stayed a revolutionary all his life, coming up with wild ideas and challenging the conventional wisdom.

What about the man who may not have coined the term "black holes" but who actually invented them in that troubled year of 1939? In many ways Oppenheimer's case is the most interesting one, because after publishing that paper he became completely disinterested in relativity and black holes, a conservative who did not think the field had anything new to offer. What is ironic about Oppenheimer is that his paper on black holes is his only contribution to relativity – he was always known for his work in nuclear physics and quantum mechanics after all – and yet today this very minor part of his career is considered to be his most important contribution to science. There are good reasons to believe that had he lived long enough to see the existence of black holes experimentally validated, he would have won a Nobel Prize.

And yet he was utterly oblivious to his creations. Several reasons may have accounted for Oppenheimer's lack of interest. Perhaps the most obvious reason is his leadership of the Manhattan Project and his fame as the father of the atomic bomb and a critical government advisor after the war. He also became the director of the rarefied Institute for Advanced Study and got saddled with administrative duties. It's worth noting that after the war, Oppenheimer co-authored only a single paper on physics, so his lack of research in relativity really reflects his lack of research in general. It's also true that particle physics became the most fashionable field of physics research after the war, and stayed that way for at least two decades. Oppenheimer himself served as a kind of spiritual guide to that field, leading three key postwar conferences that brought together the foremost physicists in the field and inaugurated a new era of research. But it's not that Oppenheimer simply didn't have the time to explore relativity; it's that he was utterly indifferent to developments in the field, including ones that Wheeler was pioneering at the time. The physicist Freeman Dyson recalls how he tried to draw out Oppenheimer and discuss black holes many times after the war, but Oppenheimer always changed the subject. He just did not think black holes or anything to do with them mattered.

In fact the real reason for Oppenheimer's abandonment of black holes is more profound. In his later years, he was afflicted by a disease which I call "fundamentalitis". As described by Dyson, fundamentalitis leads to a belief that only the most basic, fundamental research in physics matters. Only fundamental research should occupy the attention of the best scientists; other work is reserved for second-rate physicists and their graduate students. For Oppenheimer, quantum electrodynamics was fundamental, beta decay was fundamental, mesons were fundamental; black holes were applied physics, worthy of second-rate minds.

Oppenheimer was not the only physicist to be stricken by fundamentalitis. The malady was contagious and in fact had already infected the occupant of the office of the floor below Oppenheimer's – Albert Einstein. Einstein had become disillusioned with quantum mechanics ever since his famous debates with Bohr in the 1920s and his belief that God did not play dice. He continued to be a holdout against quantum mechanics; a sad, isolated, often mocked figure ignoring the field and working on his own misguided unification of relativity and electromagnetism. Oppenheimer himself said with no little degree of scorn that Einstein had turned into a lighthouse, not a beacon. But what is less appreciated is Einstein's complete lack of interest in black holes, which in some sense is even more puzzling considering that black holes are the culmination of his own theory. Einstein thought that black holes were a pathological example of his relativity, rather than a general phenomenon which might showcase deep mysteries of the universe. He also wrongly thought that the angular momentum of the particles in a purported black hole would stabilize its structure at some point; this thinking was very similar to Eddington's rejection of gravitational collapse, essentially based on faith that some law of physics would prevent it from happening.

Unfortunately Einstein was obsessed with the same fundamentalitis that Oppenheimer was, thinking that black holes were too applied while unified field theory was the only thing worth pursuing. Between them, Einstein and Oppenheimer managed to ignore the two most exciting developments in physics – black holes and quantum mechanics – of their lives until the end. Perhaps the biggest irony is that the same black holes that both of them scorned are now yielding some of the most exciting, and yes – fundamental – findings in cosmology, thermodynamics, information theory and computer science. The children are coming back to haunt the ghosts of their parents.

Einstein and Oppenheimer's fundamentalitis points to an even deeper quality of physics that has guided the work of physicists since time immemorial. That quality is beauty, especially mathematical beauty. Perhaps the foremost proponent of mathematical beauty in twentieth century physics was the austere Englishman Paul Dirac. Dirac said that an equation could not be true until it was beautiful, and he had a point. Some of the most important and universal equations in physics are beautiful by way of their concision and universal applicability. Think about E= mc², or Ludwig Boltzmann's equation relating entropy to disorder, S=klnW. Einstein's field equations of general relativity and Dirac's equation of the electron that marries special relativity with quantum mechanics are both prime examples of elegance and deep beauty. Keats famously said that "Beauty is truth and truth is beauty", and Dirac and Einstein seem to have taken his adage to heart.

And yet stories of Dirac and Einstein's quest for beauty are misleading. To begin with, both of them and particularly their disciples seem to have exaggerated the physicists' reliance on beauty as a measure of reality. Einstein may have become enamored of beauty in his later life, but when he developed relativity, he was heavily guided by experiment and stayed very close to the data. He was after all the pioneer of the thought experiment. As a patent clerk in the Swiss patent office at Bern, Einstein gained a deep appreciation for mechanical instrumentation and its power to reveal the secrets of nature. He worked with his friend Leo Szilard on that most practical of gadgets – a refrigerator. His later debates with Bohr on quantum mechanics often featured ingenious thought experiments with devices that he had mentally constructed. In fact Einstein's most profoundly emotional experience came not with a mathematical breakthrough but when he realized that his theory could explain deviations in the perihelion of Mercury, an unsolved problem for a century; this realization left him feeling that "something had snapped" inside him. Einstein's success thus did not arise as much from beauty as from good old-fashioned compliance with experiment. Beauty was a sort of secondary effect, serving as a post-facto rationalization for the correctness of the theory.

Unfortunately Einstein adopted a very different attitude in later years, trying to find a unified field theory that was beautiful rather than true. He started ignoring the experimental data that was being collected by particle physicists around him. We now know that Einstein's goal was fundamentally flawed since it did not include the theory of the strong nuclear force, a theory which took another thirty years to evolve and which could not have progressed without copious experimental data. You cannot come up with a complete theory, beautiful or otherwise, if you simply lack one of the key pieces. Einstein seems to have forgotten a central maxim of doing science, laid down by the sixteenth century natural philosopher Francis Bacon, one of the fathers of the scientific method: "All depends on keeping the eye steadily fixed upon the facts of nature and so receiving their images simply as they are. For God forbid that we should give out a dream of our own imagination for a pattern of the world". In his zeal to make physics beautiful, Einstein ignored the facts of nature and pursued the dreams of his once-awesome imagination.

Perhaps the biggest irony in the story of Einstein and black holes comes from the words of the man who started it all. In 1983, Subrahmanyan Chandrasekhar published a dense and authoritative tome called "The Mathematical Theory of Black Holes" which laid out the complete theory of this fascinating object in all its mathematical glory. In it Chandra (as he was called by his friends) had the following to say:

"In my entire scientific life, extending over forty-five years, the most shattering experience has been the realization that an exact solution of Einstein's equations of general relativity, discovered by the New Zealand mathematician, Roy Kerr, provides the absolutely exact representation of untold numbers of massive black holes that populate the universe. This shuddering before the beautiful, this incredible fact that a discovery motivated by a search after the beautiful in mathematics should find its exact replica in Nature, persuades me to say that beauty is that to which the human mind responds at its deepest and most profound."

Black holes and beauty had come full circle. Far from being a pathological outlier as believed by Einstein and Oppenheimer, they emerged as the epitome of austere mathematical and physical beauty in the cosmos.

Dirac seems to have been guided by beauty to an even greater extent than Einstein, but even there the historical record is ambiguous. When he developed the Dirac equation, he was very closely aware of the experimental results. His biographer Graham Farmelo notes, "Dirac tried one equation after another, discarding each one as soon as it failed to conform to his theoretical principles or to the experimental facts". Beauty may have been a criterion in Dirac's choices, but it was more a way of serving as an additional check rather than a driving force. Unfortunately Dirac did not see it that way. When Richard Feynman and others developed the theory of quantum electrodynamics – a framework that accounts for almost all of physics and chemistry except general relativity - Dirac was completely unenthusiastic about it. This was in spite of quantum electrodynamics agreeing with experiment to a degree unprecedented in the history of physics. When asked why he still had a problem with it, Dirac said it was because the equations were too ugly; he was presumably referring to a procedure called renormalization that got rid of infinities that had plagued the theory for years.

He continued to believe until the end that those ugly equations would somehow metamorphose into beautiful ones; the fact that they worked spectacularly was of secondary importance to him. In that sense beauty and utility were opposed in Dirac's mind. Dirac continued to look for beauty in his equations throughout his life, and this likely kept him from making any contribution that was remotely as important as the Dirac equation. That's a high bar, of course, but it does speak to the failure of beauty as a primary criterion for scientific discovery. Later in his life, Dirac developed a theory of magnetic monopoles and dabbled in finding formulas relating the fundamental constants of nature to each other; to some this was little more than aesthetic numerology. Neither of these ideas has become part of the mainstream of physics.

It was the quest for beauty and the conviction that fundamental ideas were the only ones worth pursuing that turned Einstein and Dirac from young revolutionaries to old conservatives. It also led them to ignore most of the solid progress in physics that was being made around them. The same two people who had let experimental facts serve as the core of their decision making during their youth now behaved as if both experiment and the accompanying theory did not matter.

Yet there is something to be said for making beauty your muse, and ironically this realization comes from the history of the Dirac equation itself. Perhaps the crowning achievement of that equation was to predict the existence of positively charged electrons or positrons. This discovery seemed so alien and unsettled Dirac so much at the beginning that he thought positrons had to be protons; it wasn't until Oppenheimer showed this could not be the case that Dirac started taking the novel prediction seriously. Positrons were finally found by Carl Anderson in 1932, a full three years after Dirac's prediction. This is one of the very few times in history that theory has genuinely predicted a completely novel fact of nature with no experimental basis in the past. Dirac would claim that it was the tightly knit elegance of his equation that logically ordained the existence of positrons, and one would be hard pressed to argue with him. Even today, when experimental evidence is lacking or absent, one has to admit that mathematical beauty is as good a guide to the truth as any other.

Modern theoretical physics has come a long way from the Dirac equation, and experimental evidence and beauty still guide practitioners of the field. Unfortunately physics at the frontiers seems to be unmoored from both these criteria today. The prime example of this is string theory. According to physicist Peter Woit and others, string theory has made no unique, experimentally testable prediction since its inception thirty years ago, and it also seems that its mathematics is unwieldy; while the equations seem to avoid the infinities that Dirac disliked, they also presents no unique, elegant, tightly knit mathematical structure along the lines of the Dirac equation. One wonders what Dirac would have thought of it.

What can today's revolutionaries do to make sure they don't turn conservative in their later years? The answer might come not from a physicist but from a biologist. Charles Darwin, when explaining evolution by natural selection, pointed out a profoundly important fact: "It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is most adaptable to change". The principle applies to frogs and butterflies and pandas, and there is no reason why it should not apply to theoretical physicists.

What would it take for the next Dirac or Einstein to make a contribution to physics that equals those of Einstein and Dirac themselves? We do not know the answer, but one lesson that the lives of both these physicists has taught us – through their successes as well as their failures – is to have a flexible mind, to always stay close to the experimental results and most importantly, to be mindful of mathematical beauty while not making it the sole or even dominant criterion to guide your thought processes, especially when an "uglier" theory seems to agree well with experiment. Keep your eye fixed on the facts of nature, not just on the dream of your imagination.

Friday Book Review: "The One Device: The Secret History of the iPhone"

Isaac Newton's quote about standing on the shoulders of giants applies to science as well as technology. No technology arises in a vacuum, and every technology is in some sense a cannibalized hybrid of versions of it that came before. Unlike science, however, technology suffers from a special problem: that of mass appeal and massive publicity, usually made possible by one charismatic individual. Because of the myth-weaving associated with it, technology even more than science can thus make make us forget its illustrious forebears.

Brian Merchant's book on the origin story of the iPhone is a good example of both these aspects of technological innovation. It was the culmination of dozens of technical innovations going back decades, most of which are now forgotten. And it was also sold to the public as essentially the brainchild of one person - Steve Jobs. This book should handily demolish that latter myth.

Merchant's book takes us into both the inside of the iPhone as well as the inside of the technically accomplished team at Apple that developed the device. He shows us how the idea of the iPhone came about through fits and starts, even as concepts from many different projects were finally merged into one. The initial goal was not a phone; Jobs finally made it one. But for most of the process Job was not involved, and one of the biggest contributions that the book makes is to highlight the names of many unsung engineers who both conceived the project and stuck with it through thick and thin.

Merchant illuminates the pressure-cooker atmosphere at Apple that Jobs cultivated as well as his quest for perfection. Jobs comes across as an autocratic and curmudgeonly task master in the account; most of the innovations were not his, and people were constantly scrambling to avoid incurring his wrath, although that did not prevent him from being first on all the key patents. In some sense he seems to have hampered the development of the iPhone because of his mercurial and unpredictable personality. Nonetheless, he had a vision for the big picture and commanded an authority that none of the others did, and that vision was finally what made the device a reality. Merchant's doggedness in hunting down the true innovators behind the phone and getting them to talk to him - a constantly uphill battle in the face of Apple's ultra-secret culture - is to be commended. This is probably as much of an outsider's inside account as we are likely to get.

The second part of the book is more interesting in many ways, because in this part Merchant dons the hat of investigative field reporter and crisscrosses the world in search of the raw materials that the phone is made up of. As a chemist I particularly appreciated his efforts. He surreptitiously sends a phone to a metallurgist who pulverizes it completely and analyzes its elemental composition; Merchant lovingly spends three pages listing the percentages of every element in there. His travels take him deep into a Bolivian mine called Cerro Rico which mines almost all the lithium that goes into the lithium-cobalt battery that powers the device. This mine, along with mines in other parts of South America and Africa which produce most of the metals found in the phone, often have atrocious safety records; many of the miners at Cerro Rico have average life expectancies of 40 years, and it's only the terrible standard of living that compels desperate job-seekers to try to make a quick buck here. Merchant also hunts down the father of the lithium-ion battery, John Goodenough (perpetual contender for a Nobel Prize), who gives him a tutorial not just on that revolutionary invention but on another, even more powerful sodium-powered batter that the 94-year-old chemist is working on.

Merchant also explores the origin of the Gorilla Glass that forms the cover of the phone; that glass was the result of a late-stage, frenzied negotiation between Jobs and Corning. He leads us through the history of the gyroscopes, image stabilizing camera and accelerometers in the device, none of which were invented at Apple and all of which are seamlessly integrated into the system. And there is a great account of the transgender, maverick woman who massively contributed to the all-important ARM chip that is at the heart of the phone's operating system. Equally important is the encryption system which illustrates one of the great paradoxes of consumer technology: we want our data to be as secure as possible, and at the same time we also want to use technology in myriad ways in which we willingly give up our privacy. Finally, there is an important discussion of how the real innovation in the iPhone was not the iPhone at all - it was the App Store: only when third party developers got permission to write their own apps did sales soar (think Uber). That's a marketing lesson for the business school textbooks I believe.

One of the most important - if not the most important - innovations in the iPhone is the multitouch display, and no other part of the phone illustrates how technology and ideas piggyback on each other. Contrary to popular wisdom, neither Steve Jobs nor Apple invented multitouch. It was in fact invented multiple times before over three decades; at particle physics lab CERN, at the University of Toronto, by a pioneering educator who wanted to make primitive iPad-like computers available to students, and finally, by a small company trying to make it easier for people with hand disabilities to operate computers. One of Apple's employees whose hand was sprained was seen wearing that device; it caught the eye of one of the engineers on the team, and the rest is history. Multitouch is the perfect example of how curiosity-based research gradually flows into useful technology, which then accidentally gets picked up by a giant corporation which markets it so well that we all misattribute the idea to the giant corporation.

Another example of this technological usurpation is the basic idea of a smartphone, which again did not come from Apple at all. In fact this discussion takes Merchant into a charming sojourn into the nineteenth century when fanciful ideas about wireless telegraphy dotted the landscape of popular culture and science fiction; in one illustration from 1907, Punch Magazine anticipated the social isolation engendered by technology by showing a lady and her lover sitting next to each other but choosing to communicate through a fictional wireless telegraph. Like many other inventions, ideas about wireless communication had been "in the air" since Bell developed the telephone, and so the iPhone in a sense is only the logical culmination of this marketplace of ideas. The smart phone itself came from an engineer at IBM named Frank Canova. For a variety of reasons - most notably cost - Canova's device never took off, although if you look at it it appears to be an almost identical albeit primitive version of the iPhone.

In the last part of the book, Merchant takes us on a trip to Foxconn, the world's largest electronics factory. Foxconn which is based in China is a city unto itself, and it's fascinating to have Merchant lead us through its labyrinthine and dimly-lit corridors, housing literally hundreds of thousands of workers whose toil reminds us of scenes from the underground city of Zion in the "Matrix" franchise. At one point Merchant makes an unauthorized excursion into forbidden parts of the factory and is amazed to see a landscape of manufacturing whose sheer scale seems to stretch on forever. The scenes are fascinating even if morbidly so; the working environment is brutal, the workers are constantly overworked and live in cramped quarters, and the suicides are so frequent that the authorities had to install nets in front of buildings to catch those who jumped from the top.

In one sense everything - the Bolivian lithium salt mines with workers breathing noxious fumes and being paid in pennies, the iPhone scrap heaps in Africa over which eager counterfeiters drool, the dozen other odd sourcing companies for metals and plastics, the dizzying cornucopia of iPhone parts with their diverse history and the sweat and toil of countless unknown laborers in far-flung parts of the world struggling to produce this device, often under conditions that would be downright illegal in the United States - come together on that dimly lit factory floor in Foxconn to bring you the piece of technology on which you may be reading these words.

You should never look at your phone the same way again.

Lab automation using machine learning? Hold on to your pipettes for now.

There is an interesting article on using machine learning and AI for lab automation in Science that generally puts a positive spin on the use of smart computer algorithms for automating routine experiments in biology. The idea is that at some point in the near future, a scientist could design, execute and analyze the results of experiments on her MacBook Air from a Starbucks.

There's definitely a lot of potential for automating routine lab protocols like pipetting and plate transfers, but this has already been done by robots for decades. What the current crop of computational improvements plans to do is potentially much more consequential though; it is to conduct entire suites of biological experiments with a few mouse clicks. The CEO of Zymergen, a company profiled in the piece, says that the ultimate objective is to "get rid of human intuition"; his words, not mine.

I must say I am deeply skeptical of that statement. There is no doubt that parts of experiment planning and execution will indeed become more efficient because of machine learning, but I don't see human biologists being replaced or even significantly augmented anytime soon. The reason is simple: most of research, and biological research in particular, is not about generating and rapidly testing answers (something which a computer excels at), but about asking questions (something which humans typically excel at). A combination of machine learning and robotics may well be very efficient at laying out a whole list of possible solutions and testing them, but it will all come to naught if the question that's being asked is the wrong one.

Machine learning will certainly have an impact, but only in a narrowly circumscribed set of experimental space. Thus, I don't think it's just a coincidence that the article focuses on Zymergen, a company which is trying to produce industrial chemicals by tweaking bacterial genomes. This process involves mutating thousands of genes in bacteria and then picking combinations that will increase the fitness of the resulting organism. It is exactly the kind of procedure that is well-adapted to machine learning (to try to optimize and rank mutations for instance) and robotics (to then perform the highly repetitive experiments). But that's a niche application, working well in areas like directed evolution; as the article itself says, "Maybe Zymergen has stumbled on the rare part of biology that is well-suited to computer-controlled experimentation."

In most of biological research, we start with figuring out what question to ask and what hypotheses to generate. This process is usually the result of combining intuition with experience and background knowledge. As far as we know, only human beings excel in this kind of coarse-grained, messy data gathering and thinking. Take drug discovery for instance; most drug discovery projects start with identifying a promising target or phenotype. This identification is usually quite complicated and comes from a combination of deep expertise, knowledge of the literature and careful decisions on what are the right experiments to do. Picking the right variables to test and knowing what the causal relationships between them are is paramount. In fact, most drug discovery fails because the biological hypothesis that you begin with is the wrong one, not because it was too expensive or slow to test the hypothesis. Good luck teaching a computer to tell you whether the hypothesis is the right one.

It's very hard for me to see how to teach a machine this kind of multi-layered, interdisciplinary analysis. One we have the right question or hypothesis of course we can potentially design an automated protocol to carry out the relevant experiments, but reaching that point is going to take a lot more than just rapid trial and error and culling of less promising possibilities.

This latest wave of machine learning optimism therefore looks very similar to the old waves. It will have some impact, but the impact will be modest and likely limited to particular kinds of projects and goals. The whole business reminds me of the story - sometimes attributed to Lord Kelvin - about the engineer who was recruited by a company to help them with building a bridge. After thinking for about an hour, he made a mark with a piece of chalk on the ground, told the company's engineers to start building the bridge at that location, and then billed them for ten thousand dollars. When they asked what on earth he expected so much money for, he replied, "A dollar for making that mark. Nine thousand nine hundred and ninety nine for knowing where to make it."

I am still waiting for that algorithm which tells me where to make the mark.