Field of Science

James P. Snyder (1939-2016)

My PhD advisor Jim Snyder passed away recently from a stroke. I was a graduate student with him at Emory University from 2003 to 2009. Along with my other advisor, Dennis Liotta, Jim had a very significant impact on both my scientific and my personal development. When I entered Emory wishing to do research at the intersection of chemistry and medicine, I quickly realized that I was much better suited to computational work than to experimental work, both by skill and by temperament. Jim quickly recognized my interest in building molecular models and without hesitation invited me to do research in his lab.

Jim was a world-class molecular modeler and scientist. Although my own experience in the field is short-lived, I can say that he had the best intuitive knowledge of chemical systems and the strengths and limitations of modeling that I have ever seen. He had an almost tactile feel for forces and interactions between molecules, and while he was technically accomplished his real function was not technical; rather it was in knowing what method would work and what wouldn't, what experiments one should do next, and how one would interpret the results unearthed by modeling and simulation without reading too much into them. For him modeling was not about coding or about favorite techniques, it was one more valuable lens through which you could address chemical and biological problems. Chemical intuition was his gift, and simply by looking at a picture of two molecules interacting with each other he knew what the sizes of the atoms were, what part you could safely modify without introducing a clash, what part was essential and what part was dispensable. He exemplified semi-quantitative science at its best and knew a good model when he saw one; he especially knew that the best models are not always the prettiest ones. I have never met a molecular modeler quite like him. He was also one of the very few scientists I know who achieved high scientific success in both industry and academia, and he could transition seamlessly between the two domains: he held successive positions at the University of Copenhagen, Merck, Yeshiva University, G D Searle and Emory. He also had a very international outlook, spending time as a professor in Copenhagen, New York and Italy and collaborating with friends around the world.

The results of his expertise were communicated in more than 300 papers in top journals. They spanned a remarkable diversity of fields, from the purest chemistry to the most applied drug design, from the structure of organometallic copper compounds to the properties of anticancer drugs in the body. In each one of these papers Jim used computational techniques sparingly and combined them as much as possible with experimental data (as in the papers we wrote together combining NMR data with conformational searching). By any standards Jim’s output was prolific, and I have seen very few scientists from any field who worked on such a wide variety of topics. What’s even more remarkable is that several of these papers in top journals like the Journal of the American Chemical Society are single-author publications; they attest to Jim’s dedication to scholarship and careful attention to detail. These single-author papers are a memorable testament in an age when collaborative work is almost de rigueur. I remember my own papers with him and how he stressed paying attention to every single word and sentence, even in the cover letter.

In spite of some significant personal and professional difficulties, Jim lived a life of great scientific and personal productivity. He was not only a great scientist but also an accomplished hiker and mountain climber, having made it as far as Mt. Everest base camp in the Himalayas. He enjoyed good books (Michael Shermer, whose books I often gifted to him, was a particular favorite), fine wine and spending time with his family, and had friends and collaborators in all corners of the world. He had some amusing favorite figures of speech which I suspect fell out of favor in the 60s or so; someone who liked controversy was no shrinking violet, you never wanted to open a can of worms, a kilocalorie or two of energy was small potatoes, and that particular approach...well, that was just a different kettle of fish. He was one of the most laid back and yet focused people I have met. He wore bright clothes to work, drove a yellow scooter or red jeep, and exuded informality and ease. And while I saw him in a tie exactly once in ten years, I have never seen him make do with a sloppy piece of scientific work.

All through graduate school and after graduation, he and I enjoyed a great working relationship as well as a very warm friendship. Whenever I visited Atlanta I used to have breakfast or lunch with him ("Always happy to 'jawbone' with you, Ash"), trading stories and looking to him for wise advice on science and life. The last time we met for breakfast was in July 2015, and the last time I saw him was after his stroke, in December 2015. Even when the stroke had affected his speech, Jim was upbeat and clearly enjoyed talking to me; I was very glad I could see him and spend an hour chatting.

Jim left me with two singular lessons which I will never forget.

One was a scientific lesson: always question the assumptions behind a scientific study or paper, no matter how elegant the methodology might be, how pretty the molecular models might look, or how famous the authors in question might be. Even the most impressive science can crumble if built on a foundation of flawed assumptions.

The second lesson was human: always take everyone's questions or statements in good faith, and always treat everyone with respect and kindness, whether Nobel Laureate or undergraduate student. Jim taught me this lesson by example every single day. I have seen him accord the same honest respect and careful consideration to beginning undergraduates as well as famous professors from Ivy League universities. For him respect did not scale with rank but remained a constant property.

I am sure Jim would have liked us to celebrate his life, so I will end with a funny and happy memory of him. We were together on a scientific trip to Imperial College in London in 2007, and were staying at a hotel in Kensington. When we went to our rooms which were on separate floors, Jim asked me to meet him in his room (say room no. 36) at a specific time so that we could head out for dinner. At the specified time I went to room no. 36 and could not find it. I looked around everywhere. Finally, in desperation, I yelled out Jim's name and heard a squeaky voice coming from - at least it seemed like - the floor. I located a very tiny hole in the floor and walked down the very narrow staircase. That's where I saw room no. 36 all by itself; to call it a room would be an exaggeration since it was no bigger than a jail cell.

Jim was sitting on the bed looking more miserable than I had ever seen him. "Ash," he quipped, "this is the worst hotel room I have ever stayed in." I could hardly stifle my laughter and tried to look as sympathetic as I could.

Jim Snyder touched the lives of many people in multiple ways, and his scientific and human qualities live on in many of us.

Kuhn, Galison and Dyson: Tool-driven and idea-driven revolutions in chemistry and biology

Are scientific revolutions made from new ideas or new instruments? For most of the history of science we have exalted ideas over 'mere' tools: relativity and quantum mechanics command a kind of rarefied respect which magnetic resonance and gene sequencing don't. The image of an Einstein single-handedly inventing relativity through armchair speculation is far more romantic than that of an Ernest Rutherford scattering alpha particles off a gold foil using string and sealing wax.

And yet this blinkered view of scientific history would miss at least half of all scientific breakthroughs. A few years ago Freeman Dyson wrote a perspective in Science magazine in which he provided a summary of a theme he has explored in his book "The Sun, the Genome and the Internet". Dyson's central thesis was that scientific revolutions are driven as much or even more by tools than by ideas. This view runs somewhat contrary to the generally accepted belief regarding the dominance of Kuhnian revolutions - described famously by Thomas Kuhn in his seminal book "The Structure of Scientific Revolutions" - which are engineered by shifting paradigms. In contrast, in reference to Harvard University historian of science Peter Galison and his book "Image and Logic", Dyson emphasizes the importance of Galisonian revolutions which are driven mainly by experimental tools.

As a chemist I find myself in almost complete agreement with the idea of tool-driven Galisonian revolutions. Chemistry as a discipline rose from the ashes of alchemy, a thoroughly experimental activity. Since then there have been four revolutions in chemistry that can be called Kuhnian. One was the attempt by Lavoisier, Priestley and others in the late 18th century to systematize elements, compounds and mixtures and so free chemistry from the shackles of alchemical mystique. The second was the synthesis of urea by Friedrich Wöhler in 1828; this was a paradigm shift in the true sense of the term since it placed substances from living organisms into the same realm as those from non-living organisms. The third revolution was the conception of the periodic table by Mendeleev, although this was more of a classification akin to the classification of elementary particles by Murray Gell-Mann and others during the 1960s.

A minor revolution accompanying Mendeleev's invention that was paramount for organic chemistry was the development of the structural theory by von Liebig, Kekulé and others which led the way to structure determination of molecules. The fourth revolution was the application of quantum mechanics to chemistry and the elucidation of the chemical bond by Pauling, Slater, Mulliken and others. All these advances blazed new trails, but none were as instrumental or overarching as the corresponding revolutions in physics by Newton (mechanics), Carnot, Clausius and others (thermodynamics), Maxwell and Faraday (electromagnetism), Einstein (relativity) and Einstein, Planck and others (quantum mechanics).

Why does chemistry seem more Galisonian and physics seem more Kuhnian? One point that Dyson does not allude to but which I think is cogent concerns the complexity of the science. Physics can be very hard, but chemistry is more complex in that it deals with multilayered, emergent systems that do not yield to reductionist, first-principles approaches. This kind of complexity is also apparent in the branches of physics typically subsumed under the title of "many-body interactions". Many-body interactions range from the behavior of particles in a superconductor to the behavior of stars condensing into galaxies under the influence of their mutual gravitational interaction. There are of course highly developed theoretical frameworks to describe both kinds of interactions, but they involve several approximations and simplifications, resulting in models rather than theories. My contention is that the explanation of more complex systems, being less amenable to theorizing, is driven by Galisonian revolutions rather than Kuhnian ones.

Chemistry is a good case in point. Linus Pauling's chemical theory arose from the quantum mechanical treatment of molecules, and more specifically the theory of the simplest molecule, the hydrogen molecular ion which consists of one electron interacting with two nuclei. The parent atom, hydrogen, is the starting point for the discipline of quantum chemistry. Open any quantum chemistry textbook and what follows from this simple system is a series of approximations that allow one to apply quantum mechanics to complex molecules. Today quantum chemistry and more generally theoretical chemistry are highly refined techniques that allow one to explain and often predict the behavior of molecules with hundreds of atoms.

And yet if you look at the insights gained into molecular structure and bonding over the past century, they have come from a handful of key experimental approaches. Foremost among these are x-ray diffraction, which Dyson also mentions, and Nuclear Magnetic Resonance (NMR) spectroscopy, also the basis of MRI. It is hard to overstate the impact that these techniques have had on the determination of the structure of literally millions of molecules ranging across an astonishing range of diversity, from table salt to the ribosome. X-ray diffraction and NMR have provided us not only with the locations of the atoms in a molecule, but also with invaluable insights into the bonding and energetic features of the arrangements. Along with other key spectroscopic methods like infrared spectroscopy, neutron diffraction and fluorescence spectroscopy, x-rays and magnetic resonance have not just revolutionized the practice of chemical science but have also led to the most complete understanding we have yet of chemical bonding. 

Contrast this wealth of data with similar attempts using purely theoretical techniques which can also be used in principle to predict the structures, properties and functions of molecules. Progress in this area has been remarkable and promising, but it's still orders of magnitude harder to predict, say, the most stable configuration of a simple molecule in a crystal than to actually crystallize the chemical even by trial and error. From materials for solar cells to those for organ transplants, experimental structure determination in chemistry has fast outpaced theoretical prediction.

What about biology? The Galisonian approach in the form of x-ray diffraction and NMR has been spectacularly successful in the application of chemistry to biological systems that culminated in the advent of molecular biology in the twentieth century. Starting with Watson and Crick's solution of the structure of DNA, x-ray diffraction basically helped formulate the theory of nucleic acid and protein structure. Particularly noteworthy is the Sanger method of gene sequencing - an essentially chemical technique - which has had a profound and truly revolutionary impact on genetics and medicine that we are only beginning to appreciate. Yet we are still far from a theory of protein structure in the form of protein folding; that Kuhnian revolution is yet to come. 

The dominance of Galisonian approaches to biochemistry raises the question of the validity of Kuhnian thinking in the biological sciences. This is an especially relevant question because the last Kuhnian revolution in biology - a synthesis of known facts leading to a general explanatory theory that could encapsulate all of biology - was engineered by Charles Darwin more than 150 years ago. Since then nothing comparable has happened in biological science; as indicated earlier, the theoretical understanding of the genetic code and the central dogma came from experiment rather than the very general synthesis in terms of replicators, variation and fitness that Darwin put together for living organisms. Interestingly, in his later years (and only a year before the discovery of the structure of DNA) the great mathematician John von Neumann put forward a Darwin-like, general theoretical framework that explained how replication and metabolism could be coupled to each other, but this was largely neglected and certainly did not come to the attention of practicing chemists and biologists.

Neuroscience is another field in which tool-driven revolutions are likely to uncover deep truths about the human brain; witness the coming of techniques like single-neuron recording, CLARITY and optogenetics which are almost certainly poised to revolutionize our understanding of that fundamental biological entity. For too long neuroscience was caught up in shaky concepts and half-baked facts. There is little doubt in my mind that new tools are going to shake up things in fundamental ways in the 21st century. 

Dyson's essay and the history of science do not necessarily assert that the view of science in terms of Kuhnian revolutions is misguided and that the view in terms of Galisonian revolutions is justified. It's rather that complex systems are often more prone to Galisonian advances because the theoretical explanations are simply too complicated. Another viewpoint driven home by Dyson is that Kuhnian and Galisonian approaches alternate and build on each other. It is very likely that after a few Galisonian spells a field becomes ripe for a Kuhnian consolidation.

Biology is going to be especially interesting in this regard since there are parts of it that need both tools and new theories. The most exciting areas in current biology are considered to be neuroscience, systems biology and genomics. These fields have been built up from an enormous number of experimentally determined facts, but they are in search of general theories. However, it is very likely that a general theoretical understanding of the cell or the brain will come from very different approaches from the reductionist approaches that were so astonishingly successful in the last two hundred years. 

A Kuhnian revolution to understand biology could likely borrow from its most illustrious practitioner - Charles Darwin. One of the signature features of Darwin's theory is that it seeks to provide a unified understanding that transcends multiple levels of biological organization, from individual to society. Our twenty-first-century view of biology adds two pieces, genes and culture, to opposite ends of the ladder. It is time to integrate these pieces - obtained by hard, creative Galisonian science - into the Kuhnian edifice of biology.

This is an updated version of a past post.

On AlphaGo, chemical synthesis and the rise of the intuition machines

There is a very interesting article by quantum computing and collaborative science pioneer Michael Nielsen in Quanta Magazine on the recent victory of Google’s program AlphaGo over the world’s reigning Go champion, Lee Sedol. In the article Nielsen tries to explain what makes this victory so special. Some people seem to think that Go is just a more complex version of chess, with more branching possibilities and solutions. And since Deep Blue beat Kasparov in 1997 and we have acquired far more computing power since then, this would simply be one more summit on that journey.

In the article Nielsen explains why this belief is incorrect. Go is very different from chess for many reasons. Not only are the number of solutions and branch points exponentially greater, but the winning features of Go are more nebulous and far more prone to intuition. The difference between a win for black over white in chess is clear – it’s when white’s king is checkmated. But the difference between a win for black over white in Go can be very subtle; it’s when black’s pieces surround white’s pieces better, and the definition of “surround” can be marginal. Unlike chess where you display all your pieces, in Go many of your pieces are held in reserve, so your opponent has to consider those pieces too when he or she makes a move. Unlike chess where even an amateur can recognize a winning board, a winning board in Go may be only slightly different from a losing board. In his book “On China”, Henry Kissinger says that China’s strategy is like Go’s whereas the West’s has been like chess’s. That’s something to think about.

However, the important thing here is that it’s precisely these complex features of Go that make it much more prone to subtle intuition: for a human Go player, recognizing a winning board over a losing board is as much a matter of intuition and feeling as it is of mechanistic rational analysis. It’s these features that make it far harder for a machine to defeat a Go champion than a chess champion. And yet two weeks back it did.

What made this victory possible? According to Nielsen, it was the fact that AlphaGo’s algorithm was actually trained to recognize human intuition. Its creators did this by training the program’s neural nets on thousands of past winning boards. These winning boards were human products; they were products of the intuition employed by human players. But the program did not need to understand intuition; it simply needed to learn it by looking at thousands of cases where that intuition worked successfully. AlphaGo’s creators further made the neural net play against itself and tweak its parameters until it achieved a successful result. Ultimately when the program was deployed, not only could it calculate a mind boggling number of winning boards, but it could also use elements of human intuition to say which ones looked good.

Nielsen thinks that it’s this ability to capture important elements of human intuition that marks AlphaGo’s victory as a new era in artificial intelligence. Human beings think that intuition is perhaps the most important thing that distinguishes them from machines. And yet if we think about intuition as pattern recognition and extrapolation gathered from evaluating thousands of past examples, there is no reason why a machine cannot learn how to intuit its way through a tough problem. In our case those countless examples have been ingrained by millions of years of neural and biological evolution; in the case of machines we will provide them with those examples ourselves.

I thought of AlphaGo’s victory as I read Derek’s post on automating synthesis by having a smart algorithm work through all the intermediates, starting materials and potential branch points of a complex molecule’s synthesis. Derek thinks that it’s not too far in the future when such automated synthesis starts contributing significantly to the production of complex molecules like drugs and other materials, and I tend to agree with him. People have been working on applying AI to synthesis since E J Corey’s LHASA program and Carl Djerassi’s work on AI in chemistry from the late 70s, and it does seem that we are getting close to what would be at least a moderately interesting tipping point in the field.

One of the commenters on Derek’s blog brought up the subject of AlphaGo’s latest win, and when I pointed out Nielsen’s article, had the following to say:
"These are my thoughts exactly. When we don’t understand chemical intuition people tend to think “Well, if we can’t understand it ourselves, how can we code it into a computer?” But what if the question is turned back onto you, “If YOU don’t understand chemical intuition, how did YOU learn it?” You learned it by looking at lots of reactions and drawing imperceptible parallels between disparate data points. This is neural net computing and, as you say, this is what allows AlphaGo to have “intuition” without that feature being encoded into its logic. The way these computers are now learning is no different than how humans learn, I think we just need to provide them with informational infrastructure that allows them to efficiently and precisely navigate through the data we’ve generated so far. With Go that’s simple, since the moves are easily translated to 1’s and 0’s. With chemistry it’s tougher, but certainly nowhere near impossible."
“Tougher, but not impossible” is exactly what I think about applying AI to automated chemical synthesis and planning. The fact is that we have accumulated enough synthetic data and wisdom over fifty years of brilliant synthetic feats to serve as a very comprehensive database for any algorithm wanting to use the kind of strategy that AlphaGo did. What I think should be further done is for developers of these algorithms to also train their program’s neural nets on past successful syntheses by chemists who were not just renowned for their knowledge but were known for the aesthetic sense which they brought to their syntheses. Foremost among these of course was the ‘Pope’, R B Woodward, who when he won the Nobel Prize was anointed by the Nobel committee as being a “good second” to Nature when it came to implementing notions of art, beauty and elegance in organic synthesis. Woodward’s syntheses were widely considered to be beautiful and spare, and his use of stereochemistry especially was unprecedented.

Fortunately, we also have a good guidebook to the use of elegance and aesthetics in organic synthesis: K C Nicolaou’s “Classics in Total Synthesis” series. My proposal would be for the developers to train their algorithms on such classics. For every branch point in a synthesis campaign there are several – sometimes thousands – of directions that are possible. Clearly people like Woodward picked certain directions over others, sometimes using perfectly rational principles but at other times using their sense of aesthetics. Together these rational approaches and aesthetic sense comprise what we can call intuition in synthesis. It would not be that hard to train a synthesis AI’s neural nets on capturing this intuition by constantly asking it to learn which option among the many possible was actually chosen in any good synthesis. That in turn would allow us to tweak the weights of the ‘neurons’ in the program’s neural nets, just like the creators of AlphaGo did. If repeated enough times, we would get to a stage when the program’s decisions to follow one route over another are dictated not just by brute force computation of number of steps, availability of reagents, stereochemical complexity etc. but also simply by what expert human beings did in the past.
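To make the idea of "learning which option the expert chose" a little more concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions: it uses a simple linear softmax model rather than a deep neural net, it is not how AlphaGo or any existing synthesis program works, and the three features (step economy, reagent availability and an "elegance" score) along with all the numbers are hypothetical values invented for the example.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def score(weights, features):
    """Linear score of one candidate route: a weighted sum of its features."""
    return sum(w * f for w, f in zip(weights, features))

def train(branch_points, n_features, epochs=200, lr=0.5):
    """Fit weights so that the option the expert chose gets the highest score.

    branch_points is a list of (options, chosen_index) pairs, where options
    is a list of feature vectors, one per candidate direction.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for options, chosen in branch_points:
            probs = softmax([score(weights, o) for o in options])
            for j in range(n_features):
                # Gradient of the log-likelihood of the expert's choice:
                # feature of the chosen option minus its expected value.
                expected = sum(p * o[j] for p, o in zip(probs, options))
                weights[j] += lr * (options[chosen][j] - expected)
    return weights

def pick(weights, options):
    """Return the index of the highest-scoring option."""
    scores = [score(weights, o) for o in options]
    return max(range(len(options)), key=lambda i: scores[i])

# Hypothetical toy data: each option is
# [step economy, reagent availability, "elegance"], all made up.
# The second element of each pair is the option the "expert" picked.
toy_data = [
    ([[0.2, 0.9, 0.1], [0.8, 0.1, 0.9], [0.5, 0.5, 0.3]], 1),
    ([[0.9, 0.2, 0.2], [0.1, 0.3, 0.8]], 1),
    ([[0.4, 0.4, 0.7], [0.6, 0.9, 0.2], [0.3, 0.1, 0.1]], 0),
]

weights = train(toy_data, n_features=3)
for options, chosen in toy_data:
    print("expert chose", chosen, "| model picks", pick(weights, options))
```

In this toy set the expert always happens to favor the most "elegant" option, so the learned weight on that feature ends up positive without anyone having told the program what elegance means. A real system would of course need far richer descriptions of each route and vastly more examples.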

At this stage the algorithm would start capturing what we call intuitive elements in the synthesis of complex molecules. Any such program that cuts down synthetic planning and execution time by even 20% would have a very distinct advantage over the (non-existent?) competition. There is little doubt that not only would it be quickly adopted by industry and academia, but that its key functions would also be rapidly outsourced, just like software, car manufacturing and chemical synthesis are currently. This in turn would lead to a huge impact on jobs, STEM careers as well as the economy. The political fallout could be transformational.

All this could potentially come about by the application of training algorithms similar to AlphaGo’s to the construction of synthesis AI software. It would be wise for chemists to anticipate these developments instead of denying their inevitable onward march. We are still a long way from when a computer comes up with a synthetic route rivaling those of a Bob Woodward, an E J Corey or a Phil Baran. But as far as actual impact is concerned, the computer does not need to win that contest; it simply needs to be good enough. And that future seems closer than we think.

What makes me human? On personhood and levels of emergence

Ce n'est pas une machine
There is a wonderful episode of Star Trek (The Next Generation) titled “The Measure of a Man” which tackles an issue that all of us take for granted. The episode asks if Data – the highly intelligent and indispensable android on the USS Enterprise – has self-determination. In fact, Data faces an even greater challenge: he has to prove that he is equivalent to a person. If he cannot do this he faces a grim fate.

We might think it’s easy enough to decide if an android is a person or not, partly because we think we know how we define ourselves as persons. But is it really that simple? Can we actually ascribe unique, well-defined qualities to ourselves that lend themselves to a singular definition of “personhood”? Can we be sure that these qualities distinguish us from cats and jellyfish? Or mountains and computers for that matter?

Let’s throw down the gauntlet right away in the form of a question: A tree and I are both composed of the same kinds of atoms – carbon, hydrogen, oxygen and a handful of others. So what makes me a person and the tree a mere tree, a non-person?

To a simple first approximation, the difference lies in the arrangement of those atoms. Clearly the arrangement is different in a tree, in a lion and in me. But it’s not just the arrangement of the parts, it’s the connections between them. If you delete enough connections between the neurons in a human brain, at some point the person possessing that brain would clearly cease to be a sentient human being. The same goes for the connections between the cells in a tree.

Thus, simply from a physical standpoint, a person is an object that presents an arrangement of atoms in a particular configuration. But that definition comes no closer to telling us exactly what makes that particular arrangement special. To get some insights into these reasons, it’s worth thinking about the concept of emergence.

Emergence is a very common phenomenon, and in its basic incarnation it simply means that the whole is different from the sum of the parts; or, as the physicist Philip Anderson put it in a seminal article in 1972, “More is Different”. Going back to our example, the human brain may be composed of the same atoms as a tree, but because of the unique arrangement of these atoms and the connections between them, the sum total of these atoms possesses properties that are very different from those of the individual atoms. Just as an example, a carbon atom in our brain can be uniquely defined by what are called quantum numbers – atomic parameters related to properties like the spin of the atom’s electrons, the energy levels on which the electrons lie and their angular momentum. And yet it’s downright absurd to talk about these properties in the context of the brain which these atoms make up. Thus it’s the emergent properties of carbon and other atoms that contribute to the structure and function of the human brain.

Emergent properties of individual atoms don’t uniquely make a human person, however, since even the brains of cats and dogs exhibit these properties. We don’t yet have a perfect understanding of all the qualities that distinguish a cat’s brain from a human’s, but we do know for certain that at least some of those qualities pertain to the size and shape of parts of the brains in the two species and the exact nature of the connections between their neurons. For instance, the cerebral cortex in a human brain is bigger and far more convoluted than in a cat. In addition the human brain has a much greater density of neurons. The layering of neurons is also different.

Taking our idea of emergence further, each one of these qualities is an emergent property that distinguishes cat brains from human brains. There are thus different levels of emergence. Let’s call the emergence of properties arising when individual atoms coalesce into neurons Level I emergence. This level is very similar for humans and cats. However, the Level II emergence which arises when these neurons connect differently in humans and cats is very different in the two species. The Level III emergence that arises when these connections give rise to modules of a particular size and shape is even more different. The interactions of these modules with themselves and with the environment presumably give rise to the unique phenomenon of human consciousness: Level IV emergence. And finally, the connections that different human brains form with each other, giving rise to networks of families, friends, communities and societies, constitute an overarching Level V emergence that truly distinguishes persons not just from cats but also from every other creature that we can imagine.

This idea of thinking in terms of different levels of emergence is useful because it captures both similarities and differences between persons and non-persons, emphasizing the common as well as the distinct evolutionary roots of the two kinds of entities. Cats and human beings are similar when defined in terms of certain levels of emergence, but very different when defined in terms of ‘higher order’ levels.

The foregoing discussion makes it sound as if a simple way to distinguish persons from non-persons would be to map different levels of emergence onto each other and declare something to be a non-person if we are able to map the lower levels of emergence but not the higher ones. I think this is generally true if we are comparing animals with human beings. But the analogy is actually turned on its head when we start to compare humans with a very important and presumably non-person object: a sophisticated computer. In that case it’s actually the higher order emergent functions that can be mapped onto each other, but not the lower order ones.

A computer is built up of silicon rather than carbon, and silicon and carbon are different emergent entities. But as we proceed further up the hierarchy, we start to find that we can actually simulate primitive forms of human thinking by connecting silicon atoms to each other in specific ways. For instance, we can teach the silicon-based circuitry in a computer to play chess. Chess is presumably a very high level (Level VIIXXI?) emergent human property, and yet we can simulate this common higher order property from very different lower order emergent properties. Today computers can translate languages, solve complex mathematical puzzles and defeat Go champions. All of these accomplishments constitute higher levels of emergent behavior similar to human behavior, arising from lower levels that are very different from those in humans.

In fact it is precisely this kind of comparison that allowed Captain Jean-Luc Picard to secure personhood for Data. Data is a human-machine hybrid that is built up from a combination of carbon and non-carbon atoms. His underlying molecular structure is thus very different from that of persons. But the higher order emergent functions he exhibits – especially free will – allow Picard to make a convincing case for Data to be treated as a person. This crucial recognition of emergent functions in fact saves Data’s life. It's what compels the man who is trying to dismantle him to address him as "he" instead of "it".

Whether it’s Data or a dolphin, a computer or a catfish, while it’s probably not possible to give a wholly objective and airtight definition of personhood, framing the discussion in terms of comparing different levels of emergent functions and behaviors provides a useful guide.

More is indeed different.
This piece was published yesterday in an issue of '3-Hours', the online magazine of Neuwrite Boston.

Nimbus Therapeutics and computational drug discovery: A promising start with much data to plumb

Nimbus Therapeutics, which as far as I know is the only drug company based on a purely computational model of drug discovery (with all experiments outsourced), just handed over one of their key programs to Gilead for a good sum of money. This program was aimed at discovering inhibitors of the protein acetyl-CoA carboxylase (ACC) which is implicated in Non-Alcoholic Steatohepatitis (NASH) and went from start to finish in about 18 months.

To discover this inhibitor Nimbus brought all of Schrodinger's resources and computing capability (they essentially have unlimited access to Schrodinger's licenses and hardware) to bear on the problem. They (presumably) used techniques like virtual screening of millions of compounds, molecular dynamics, the identification of unstable water molecules and the newest Schrodinger tool, free-energy perturbation (FEP), which in its ideal incarnation can allow you to rank order compounds by their free energy of binding to a protein target.

I think this is a very promising development for the applications of computation to drug discovery, but as a scientist in the field myself I am even more excited about the volume of useful data this effort must have generated. This is simply because, based on their model, it seems that every molecule that Nimbus prioritizes or discards necessarily goes through their computational pipeline: this would be rather unique. The corpus of data this process generates is presumably locked inside Nimbus's virtual vaults, but I think they and Schrodinger should release at least a high level abstraction of it to let everyone figure out what worked and what did not. At the very least it would transform Nimbus's success from an impressive black box to a comprehensible landscape of computational workflows.

There are several questions whose answers are always very valuable when it comes to assessing the success of any computational drug discovery protocol. One of the most important concerns the domain of applicability of particular techniques: we want to tease apart the general features of the problems on which they seem to work. Since any computational protocol is a model, one also wants to know how well it compares to other models, preferably simpler ones. If a particular technique turns out to be general, accurate, robust and consistently better than others, then we're in business.

Here are a few questions which apply not only to Nimbus but which I think can be asked of any project in which computational methods are thought to play a predominant role.

1. Did they have a crystal structure or did they build a homology model? If they built a model, what was the sequence identity with the template protein?
2. Was the protein flexible or was it fairly rigid? Were there missing loops or other parts and did they computationally build these loops? If loops were actually reconstructed, what was their length?
3. What was the success rate of their virtual screening campaign? How many top ranked molecules were actually made by the chemists?
4. How much did Schrodinger's WaterMap protocol help with improving binding affinity? Were the key displaceable or replaceable water molecules buried deep in hydrophobic cavities or were there also a few non-intuitive water molecules at the protein-solvent interface?
5. Did they test any of the false negatives to know if the accuracy was what they thought it was?
6. How well did the new FEP algorithm work in terms of rank ordering binding affinity? Did they compare this method with other simpler methods like MMGBSA?
7. In what way, if at all, did molecular dynamics help in the discovery of these inhibitors? How did MD compare to simpler techniques?
8. How well did methods to predict ADME work relative to methods to predict binding affinity? (The latter are usually far more accurate).
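Questions like 6 and 7 boil down to a comparison of rank orderings, and one common way to score that is the Spearman rank correlation between predicted and measured binding free energies. The snippet below is only a minimal sketch: the six compounds, their energies and the two method labels are invented for illustration and have nothing to do with Nimbus's actual data, and the simple formula used assumes there are no tied values.

```python
def ranks(values):
    """Return the 1-based rank of each value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation via 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical data: measured binding free energies (kcal/mol) for six
# compounds, plus predictions from two made-up methods.
experimental = [-9.8, -9.1, -8.5, -8.0, -7.2, -6.5]
method_a = [-10.1, -8.7, -8.9, -7.6, -7.0, -6.9]      # stand-in for a rigorous method
method_b = [-40.2, -35.0, -44.1, -30.3, -38.7, -33.9]  # stand-in for a cheaper score

rho_a = spearman(experimental, method_a)
rho_b = spearman(experimental, method_b)
print(f"method A: {rho_a:.2f}, method B: {rho_b:.2f}")  # method A: 0.94, method B: 0.49
```

A number like this only means something when the simpler method is scored on the same compounds, which is exactly why the comparison to baselines matters.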

I think it's great that a purely computation-driven company like Nimbus can discover a lead for a tough target and an important disease in a reasonable amount of time. But it would be even better to distill general lessons from their success that we could apply to the discovery of drugs for other important diseases.

NSA, data uncertainty, and the problem of separating bad ideas from good ones

“Citizenfour” is a movie about NSA whistleblower Edward Snowden made by journalist Laura Poitras, one of the two journalists Snowden contacted. It’s a gripping, professional, thought-provoking movie that everyone should consider watching, especially because at a deeper level I think it goes to the heart not just of government surveillance but also of the whole problem of picking useful nuggets of data in the face of an onslaught of potential dross. In fact even the very process of classifying data as “nuggets” or “dross” is fraught with problems.

I was reminded of this problem as my mind went back to a piece by noted historian of technology George Dyson in which he takes government surveillance to task, not just on legal or moral grounds but on basic technical ones. Dyson’s concern is simple: when you are trying to identify that nugget of a dangerous idea from the morass of ideas out there, you are as likely to snare creative, good ideas in your net as bad ones. This may lead to a situation rife with false positives where you routinely flag – and, if everything goes right with your program, try to suppress – the good ideas. The problem arises partly because you don’t need to, and in fact cannot, flag every idea as “dangerous” or “safe” with one hundred percent accuracy; all you need to do is to get a rough idea.

“The ultimate goal of signals intelligence and analysis is to learn not only what is being said, and what is being done, but what is being thought. With the proliferation of search engines that directly track the links between individual human minds and the words, images, and ideas that both characterize and increasingly constitute their thoughts, this goal appears within reach at last. “But, how can the machine know what I think?” you ask. It does not need to know what you think—no more than one person ever really knows what another person thinks. A reasonable guess at what you are thinking is good enough.”

And when you are trying to get a rough idea, especially pertaining to someone’s complex thought processes, there’s obviously a much higher chance of making a mistake and failing the discrimination test.

The problem of separating the wheat from the chaff is encountered by every data analyst: for example, drug hunters who are trying to identify ‘promiscuous’ molecules – molecules which will indiscriminately bind to multiple proteins in the body and potentially cause toxic side effects – have to sift through lists of millions of molecules to find the right ones. They do this using heuristic rules of thumb which tell them what kinds of molecular structures have been promiscuous in the past. But the past cannot foretell the future, partly because the very process of defining these molecules is sloppy and inevitably captures a lot of ‘good’, non-promiscuous, perfectly druglike compounds. This problem with false positives applies to any kind of high-throughput process founded on empirical rules of thumb; there are always bound to be several exceptions. The same problem applies when you are trying to sift through millions of snippets of DNA and assigning causation or even correlations between specific genes and diseases.
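A little arithmetic shows why false positives pile up so quickly when the thing you are hunting for is rare. The sketch below uses made-up numbers (a library of a million molecules, 1% of them truly promiscuous, a rule of thumb that catches 90% of those but also wrongly flags 5% of the good compounds); none of these figures come from a real screening campaign.

```python
def flagged_breakdown(n_items, prevalence, sensitivity, false_positive_rate):
    """Split the flagged items into true hits and false alarms.

    prevalence: fraction of items that are truly 'bad'
    sensitivity: fraction of bad items the filter catches
    false_positive_rate: fraction of good items the filter wrongly flags
    """
    n_bad = n_items * prevalence
    n_good = n_items - n_bad
    true_pos = n_bad * sensitivity
    false_pos = n_good * false_positive_rate
    precision = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, precision


# Hypothetical numbers, chosen only to illustrate the base-rate effect.
tp, fp, precision = flagged_breakdown(1_000_000, 0.01, 0.90, 0.05)
print(f"{tp:.0f} true hits, {fp:.0f} false alarms, precision {precision:.1%}")
# 9000 true hits, 49500 false alarms, precision 15.4%
```

Under these assumptions more than five out of every six flagged molecules are perfectly good ones, even though the filter looks accurate on paper. The same arithmetic applies to flagging "dangerous" ideas in a sea of harmless ones.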

What’s really intriguing about Dyson’s objection though is that it appeals to a very fundamental limitation in accomplishing this discrimination, one that cannot be overcome even by engaging the services of every supercomputer in the world.

Alan Turing jump-started the field of modern computer science when he proved that even an infinitely powerful algorithm cannot determine whether an arbitrary string of code represents a provable statement (the so-called ‘Decision Problem’ articulated by David Hilbert). Turing provided to the world the data counterpart of Kurt Gödel’s Incompleteness Theorem and Heisenberg’s Uncertainty Principle; there is code whose truth or lack thereof can only be judged by actually running it and not by any preexisting test. Similarly Dyson contends that the only way to truly distinguish good ideas from bad is to let them play out in reality. Now nobody is actually advocating that every potentially bad idea should be allowed to play out, but the argument does underscore the fundamental problem with trying to pre-filter good ideas from bad ones. As he puts it:
“The Decision Problem, articulated by Göttingen’s David Hilbert, concerned the abstract mathematical question of whether there could ever be any systematic mechanical procedure to determine, in a finite number of steps, whether any given string of symbols represented a provable statement or not.

The answer was no. In modern computational terms (which just happened to be how, in an unexpected stroke of genius, Turing framed his argument) no matter how much digital horsepower you have at your disposal, there is no systematic way to determine, in advance, what every given string of code is going to do except to let the codes run, and find out. For any system complicated enough to include even simple arithmetic, no firewall that admits anything new can ever keep everything dangerous out…

There is one problem—and it is the Decision Problem once again. It will never be entirely possible to systematically distinguish truly dangerous ideas from good ones that appear suspicious, without trying them out. Any formal system that is granted (or assumes) the absolute power to protect itself against dangerous ideas will of necessity also be defensive against original and creative thoughts. And, for both human beings individually and for human society collectively, that will be our loss. This is the fatal flaw in the ideal of a security state.”
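The self-referential trick behind this result can be sketched in a few lines of Python. This is an illustration of the halting-problem form of the argument, not a proof, and the names `make_contrarian` and `always_says_loops` are invented for the example. Given any candidate "decider" that claims to predict whether a program halts, we build a program that asks the decider about itself and then does the opposite.

```python
def make_contrarian(claims_to_halt):
    """Build a program that defeats the given candidate decider.

    claims_to_halt(program) is assumed to return True if it predicts
    that calling program() eventually finishes, and False otherwise.
    """
    def contrarian():
        if claims_to_halt(contrarian):
            while True:       # decider predicted "halts", so never halt
                pass
        return "halted"       # decider predicted "loops forever", so halt at once
    return contrarian


# A deliberately naive decider (hypothetical): it predicts that every
# program loops forever.
def always_says_loops(program):
    return False


program = make_contrarian(always_says_loops)
verdict = always_says_loops(program)   # the decider's prediction: False
outcome = program()                    # what actually happens: it halts
print(verdict, outcome)                # False halted
```

Whatever decider you plug in, the contrarian program is built to contradict its verdict, so no decider can be right about every program. A decider that answers True cannot be tested by running the program, since the program would then never return, which is rather the point.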

In one sense this problem is not new since governments and private corporations alike have been trying to separate and suppress what they deem to be dangerous ideas for centuries; it’s a tradition that goes back to book burning in medieval times. But unlike a book which you can at least read and evaluate, the evaluation of ideas based on snippets, indirect connections, Google links and metadata is tenuous at best and highly unlikely to succeed. That is the fundamental barrier facing agencies that are trying to determine thoughts and actions based on Google searches and Facebook profiles, and it is likely that no amount of sophisticated computing power and data will enable them to solve the general problem.

Ultimately whether it’s government agencies, drug hunters, genomics experts or corporations, the temptation to fall prey to what writer Evgeny Morozov calls “technological solutionism” – the belief that key human problems will succumb to the latest technological advances – can be overpowering. But when you are dealing with people’s lives you need to be a little more wary of technological solutionism than when you are dealing with the latest household garbage disposal appliance or a new app to help you find nearby ice cream places. There is not just a legal and ethical imperative but a purely scientific one to treat data with respect and to disabuse yourself of the notion that you can completely understand it if only you throw more manpower, computing power and resources at it. A similar problem awaits us in the application of computation to problems in biotechnology and medicine.

At the end of his piece Dyson recounts a conversation he had with Herbert York, a powerful defense establishment figure who designed nuclear weapons, advised presidents and oversaw billions of dollars in defense and scientific funding. York cautions us to be wary not just of Eisenhower’s famed military-industrial complex but also of the scientific-technological complex that has aligned itself with the defense establishment for the last fifty years. With the advent of massive amounts of data this alignment is honing itself into an entity that can have more power over our lives than ever before. At the same time we have never been in greater need of the scientific and technological tools that will allow us to make sense of the sea of data that engulfs us. And that, as York says, is precisely the reason why we need to beware of it.

“York understood the workings of what Eisenhower termed the military-industrial complex better than anyone I ever met. “The Eisenhower farewell address is quite famous,” he explained to me over lunch. “Everyone remembers half of it, the half that says beware of the military-industrial complex. But they only remember a quarter of it. What he actually said was that we need a military-industrial complex, but precisely because we need it, beware of it. Now I’ve given you half of it. The other half: we need a scientific-technological elite. But precisely because we need a scientific-technological elite, beware of it. That’s the whole thing, all four parts: military-industrial complex; scientific-technological elite; we need it, but beware; we need it but beware. It’s a matrix of four.”

It’s a lesson that should particularly be taken to heart in industries like biotechnology and pharmaceuticals, where standard, black-box computational protocols are becoming everyday utilities of the trade. Whether it’s the tendency to push a button to launch a nuclear war or to sort drugs from non-drugs in a list of millions of candidates, temptation and disaster both await us at the other end.