Field of Science


Modular complexity, and reverse engineering the brain

The Forbes columnist Matthew Herper has a profile of Microsoft co-founder Paul Allen, who has placed his bets on a brain institute whose goal is to map the brain...or at least the visual cortex. His institute is engaged in charting the sum total of neurons and other working parts of the visual cortex and then mapping their connections. Allen is not alone in doing this; there are projects like the Connectome at MIT which are trying to do the same thing (and the project's leader Sebastian Seung has written an excellent book about it).

Well, we have heard echoes of reverse-engineered brains from more eccentric sources before, but fortunately Allen is one of those who does not believe that the singularity is near. He also seems to have entrusted his vision to sane minds. His institute's chief science officer is Christof Koch, a former professor at Caltech and longtime collaborator of the late Francis Crick, who started at the institute this year. Just last month Koch penned a perspective in Science which points out the staggering challenge of understanding the connections between all the components of the brain; the "neural interactome" if you will. The article is worth reading if you want to get an idea of how simple numerical arguments illuminate the sheer magnitude of mapping out the neurons, cells and proteins that make up the wonder that's the human brain.

Koch starts by pointing out that calculating the interactions between all the components in the brain is not the same as computing the interactions between all the atoms of an ideal gas, since the interactions are between different kinds of entities and are therefore not identical. Instead, he proposes, we have to use something called Bell's number B(n), which reminds me of the partitions that I learnt about when I was sleepwalking through set theory in college. Briefly, for n objects B(n) is the number of ways in which those objects can be partitioned into non-empty groups (singles, pairs, triples and so on). Thus, when n = 3, B(n) is 5. Not surprisingly, B(n) grows faster than exponentially with n, and Koch points out that B(10) is already 115,975. If we think of a typical presynaptic terminal with its 1000 proteins or so, B(n) already starts giving us heartburn. For something like the visual cortex, where n = 2 million, B(n) would be prohibitive. And as the graph demonstrates, for more than 10^5 components or so the amount of time spirals out of hand at warp speed. Koch then uses a simple calculation based on Moore's law to estimate the time needed for "sequencing" these interactions. For n = 2 million the time would be of the order of 10 million years.
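For readers who want to play with these numbers themselves, here is a small Python sketch (my own illustration, not anything from Koch's paper) that builds Bell numbers with the so-called Bell triangle; it reproduces the B(3) = 5 and B(10) = 115,975 figures mentioned above.

```python
# Sanity-check the Bell numbers quoted above using the Bell triangle.
# This is only an illustration of the scaling argument, not Koch's calculation.

def bell_numbers(n_max):
    """Return [B(0), B(1), ..., B(n_max)] via the Bell triangle."""
    row = [1]            # triangle row for n = 0
    bells = [1]
    for _ in range(n_max):
        new_row = [row[-1]]              # each row starts with the last entry of the previous row
        for entry in row:
            new_row.append(new_row[-1] + entry)
        row = new_row
        bells.append(row[0])             # B(n) is the first entry of row n
    return bells

B = bell_numbers(12)
print(B[3])    # 5       -- the five ways to partition three objects
print(B[10])   # 115975  -- already unwieldy for just ten components
```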

And this considers only the 2 million neurons in the visual cortex; it doesn't even consider the proteins and cells which might interact with the neurons on an individual basis. Looks like we can rapidly see the outlines of what Allen himself has called the "complexity brake". And this one seems poised to make an asteroid-sized impact.

So are we doomed in trying to understand the brain, consciousness and the whole works? Not necessarily, argues Koch. He gives the example of electronic circuits where individual components are grouped separately into modules. If you bunch a number of interacting entities together and form a separate module, then the complexity of the problem reduces since you now only have to calculate interactions between modules. The key question then is, is the brain modular? Common sense would have us think it is, but it is far from clear how we can exactly define the modules. We would also need a sense of the minimal number of modules to calculate interactions between them. This work is going to need a long time (hopefully not as long as that for B(2 million)) and I don't think we are going to have an exhaustive list of the minimal number of modules in the brain any time soon, especially since these are going to be composed of different kinds of components and not just one kind.

Any attempt to define these modules is going to run into problems of emergent complexity that I have occasionally written about. Two neurons plus one protein might be different from two neurons plus two proteins in unanticipated ways. Nevertheless this goal seems far more attainable in principle than calculating every individual interaction, and that's probably the reason Koch left Caltech to join the Allen Institute in spite of the pessimistic calculation above. If we can ever get a sense of the modular structure of the brain, we may have at least a fighting chance to map out the whole neural interactome. I am not holding my breath too hard, but my ears will be wide open.

Image source: Science magazine

If you want to improve AI, let it evolve toward emergence

One of my favorite quotes about artificial intelligence is often attributed to pioneering computer scientists Hans Moravec and Marvin Minsky. To paraphrase: “The most important thing we have learned from three decades of AI research is that the hard things are easy and the easy things are hard”. In other words, we have been hoodwinked for a long time. We thought that vision and locomotion and housework would be easy and language recognition and chess and driving would be hard. And yet it has turned out that we have made significant strides in tackling the latter while hardly making a dent in the former.
Why is this? Clearly one trivial reason is that we failed to define “easy” and “hard” properly, so in one sense it’s a question of semantics. But the question still persists: what makes the easy problems hard? We got fooled by the easy problems because we took them for granted. Things like facial recognition and locomotion come so easily to human beings, even human beings that are a few months old, that we thought they would be easy for computers too. But the biggest obstacle for an AI today is not the chess playing ability of a Garry Kasparov but the simple image recognition abilities of an average one-year-old.
What we forgot was that these things seem easy only because they are the sleek final façade of a four billion year process that progressed with countless fits and starts, wrong alleys and dead ends and random experimentation. We see the bare shining mountaintop but we don’t see the tortuous road leading to it. If you looked under the hood, both spatial and temporal, of a seemingly simple act like bipedal navigation over a slightly rocky surface, you would find a veritable mess of failed and successful experiments in the history of life. If the brain were an electrical box which presented an exterior of wondrous simplicity and efficiency, inside the box would be fused wires, wires leading nowhere, wires with the middles cut off, wires sprouting other wires, stillbirthed wires; a mélange of wired chaos with a thread of accident and opportunity poking through it. We see only that ascendant thread but not the field littered with dead cousins and ancestors it resides in.
Over the ages, much of AI tried to grasp the essence of this evolutionary circus by trying to reproduce the essential structure of the human brain. The culmination of these efforts was the neural network, a layered abstract model of virtual electronic neurons trying to capture different aspects of reality with adjustable weights on every layer and a feedback loop that optimized the difference between the model and reality. So far so good, but neural networks are only modeling the end product and not the process. For the longest time they were not allowed to deliberately make mistakes and mirror the contingent, error-ridden processes of evolution that are grounded in mutation and genetic recombination. They made the evolution of thinking seem far more deterministic than what it was, and if there’s anything we know about evolution by now, it’s that one cannot understand or reproduce it unless one understands the general process of clumsy, aimless progress intrinsic to its workings.
But apart from the propensity of evolution to make mistakes, there is another, much broader aspect of evolution that I believe neural nets or other models of AI must capture in order to be useful or credible or both. That aspect is emergence, a feature of the human brain that is directly the product of its messy evolution. Not only could emergence help AI approach the actual process of thinking better and realize its scientific and financial potential, but it could also lead to reconciliation between two fields that are often and unnecessarily at war with each other – science and philosophy.
The basic idea of emergence has been recognized for a long time, first by philosophers and then by scientists. Whether it’s a block of gold having color properties that cannot be ascribed to individual gold atoms, individual termites forming a giant mound or thousands of starlings forming stunning, sweeping, transient geometric patterns that carpet the sky for miles, we have known that the whole is often very different from both the individual parts and the sum of the parts. Or as one of the philosophical fathers of emergence, the physicist Philip Anderson, wrote in a now-famous article, “More is different”. Anderson noted that the properties of a physical system cannot be directly derived from its individual constituents, and more components are not just quantitatively but qualitatively different from fewer ones. Part of the reason for this is that both physics and biology are, in the words of Anderson’s fellow physicist Murray Gell-Mann, the result of “a small number of laws and a great number of accidents”. In the case of biological evolution the laws are the principles of natural selection and neutral drift; in the case of physical evolution the laws are the principles of general relativity, quantum mechanics and thermodynamics.
Emergence is partly a function of the great number of accidents that this small number of laws has been subjected to. In the case of biology the accidents come from random mutations leading to variation and selection; in the case of physics they come from forces and fields causing matter to stick together in certain ways and not others to form stars, galaxies and planets. Evolution critically occurred while immersed in this sea of stochastic emergence, and that led to complex feedback loops between fundamental and emergent laws. The human brain in particular is the end product of the basic laws of chemistry and physics being subjected to a variety of other emergent laws imposed by things like group and sexual selection, tribalism, altruism, predation avoidance and prey seeking. Agriculture, cities, animal domestication, gossip, religion, empires, democracy, despotism; all of humanity’s special creations are emergent phenomena. Mind is the ultimate emergent product of the stochastic evolution of the brain. So is consciousness. It’s because of this universal feature of accidental emergence that even a supercomputer (or an omniscient God, if you will) that had all the laws of physics built into it and that could map every one of the countless trajectories that life would take into the future would be unable to predict the shape and function of the human brain in the year 2018.
The mind which itself is an emergent product of brain evolution is very good at modeling emergence. As just one example, our minds are quite competent at understanding both individual needs as well as societal ones. We are good at comprehending the behavior of matter on both a microscopic scale – although it did take some very determined and brilliant efforts to achieve this feat – and the macro scale. In fact, we have so completely assimilated the laws of emergent physics in our brains that implementing them – throwing a javelin or anticipating the speed of a charging elephant for instance – is instinctive and a matter of practice rather than active calculation. Our minds, which build constantly updated models of the world, can now take emergent behavior into account and can apply the right level of emergent detail in these models to address the right problem. Evolution has had a leisurely four billion years to experiment with its creations while buffeted by the winds of stochastic emergence, so it’s perhaps not surprising that it has now endowed one of its most successful species with the ability to intuitively grasp emergent reality.
And yet we are largely failing to take this emergent reality into account when imagining and building new AIs. Even now, most of our efforts at AI are highly reductionist. We are good at writing algorithms to model individual neurons as well as individual layers of them, but we ignore the higher-level emergent behavior that is expected to result from a real neural network in a real human brain. Through a process called backpropagation, neural networks are getting better at narrowing the gap between reality and the models they represent by setting up feedback loops and optimizing the weights of individual neurons, but whether their models are trying to capture the right level of emergent detail is a question they don’t address. If your model is capturing the wrong emergent details, then you are optimizing the wrong model.
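To make that feedback loop concrete, here is a bare-bones sketch of the weight-update cycle the paragraph describes: a tiny two-layer network whose weights are repeatedly nudged by backpropagation to shrink the gap between its output and the data. The XOR toy problem, the layer sizes and the learning rate are arbitrary choices for illustration, not a model of anything in the brain.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: XOR, a classic problem a single neuron cannot solve.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

# One hidden layer with a handful of units.
W1 = rng.normal(size=(2, 8)); b1 = np.zeros((1, 8))
W2 = rng.normal(size=(8, 1)); b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 2.0
for step in range(10000):
    # Forward pass: the model's current guess about reality.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # The "gap between reality and the model": mean squared error.
    loss = np.mean((out - y) ** 2)

    # Backward pass: push the error back through the layers and nudge
    # every weight in the direction that shrinks the gap.
    d_out = 2 * (out - y) / len(X) * out * (1 - out)
    d_W2 = h.T @ d_out
    d_b2 = d_out.sum(axis=0, keepdims=True)
    d_h = d_out @ W2.T * h * (1 - h)
    d_W1 = X.T @ d_h
    d_b1 = d_h.sum(axis=0, keepdims=True)

    W2 -= lr * d_W2; b2 -= lr * d_b2
    W1 -= lr * d_W1; b1 -= lr * d_b1

print(np.round(out.ravel(), 2))   # outputs should approach [0, 1, 1, 0]
```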
Even if your model does solve the right problem, it will be such a specialized solution that it won’t apply to other related problems, which means you will be unable to build an artificial general intelligence (AGI). Consider the example of image recognition, a problem that neural nets and their machine learning algorithms are supposed to especially excel at. It’s often observed that if you introduce a bit of noise into an image or make it slightly different from an existing similar image, the neural net starts making mistakes. And yet children do this kind of recognition of “different but similar” images effortlessly and all the time. When shown an elephant for instance, a child will be able to identify elephants in a variety of contexts; whether it’s a real elephant, a stuffed elephant toy, a silhouette of an elephant or a rock formation that traces out the outline of an elephant. Each one of these entities is radically different in its details, yet they all say “elephant” to the mind of the child, though not to the neural network.
Why is this? I believe that emergence is one of the key secret sauces accounting for the difference. The child recognizes both a real elephant and a rock formation as an elephant because her brain, instead of relying on low-level “elephant features” like the detailed texture of the skin and the black or gray colors, is instead relying on high-level “emergent elephant features” like the general shape and more abstract topological qualities. The right level of emergent abstraction makes the child succeed where the computer is failing. And yet the child can – with some practice – also switch between different levels of emergence and realize for instance that the rock formation is not going to charge her. Through practice and exploration, the child perfects this application of emergent recognition. Perhaps that’s why it’s important to heed Alan Turing’s prescription for building intelligent machines in which he told us to endow a machine with the curiosity of a child and let intelligence evolve.
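A crude way to see why coarse, high-level features are more forgiving than raw pixels is the toy comparison below (my own toy, with an invented "silhouette" and made-up noise levels): independent pixel noise that swamps a pixel-by-pixel comparison largely averages out once the image is pooled into a coarse grid of shape-like features.

```python
import numpy as np

rng = np.random.default_rng(1)

# A crude filled blob standing in for an "elephant" silhouette.
size = 64
yy, xx = np.mgrid[0:size, 0:size]
clean = (((xx - 32) ** 2) / 500.0 + ((yy - 36) ** 2) / 200.0 < 1).astype(float)

# The same image corrupted by independent pixel-level noise.
noisy = clean + rng.normal(scale=0.5, size=clean.shape)

def coarse(img, block=8):
    """Average-pool into an 8x8 grid: a crude stand-in for abstract shape features."""
    s = img.shape[0] // block
    return img.reshape(s, block, s, block).mean(axis=(1, 3))

pixel_diff = np.abs(clean - noisy).mean()                    # roughly 0.4
shape_diff = np.abs(coarse(clean) - coarse(noisy)).mean()    # roughly 0.05
print(f"raw-pixel difference:      {pixel_diff:.2f}")
print(f"coarse-feature difference: {shape_diff:.2f}")
```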
Another emergent feature of living organisms is what we call “emotion” or “instinct”. For the longest time we used to believe that human beings make rational decisions when evaluating their complex social and physical environments. But pioneering work by psychologists and neuroscientists ranging from Daniel Kahneman to Antonio Damasio has now shown that emotion and logical thinking both play a role when deciding how to react to an environmental stimulus. Take again the example of the child recognizing an elephant; one reason why she is so good at recognizing elephant-like features is because the features trigger a certain kind of emotional reaction in her. Not only are the logical feature-selecting parts of her brain activated, but so are her hormonal systems, perhaps imperceptibly; not only does she start thinking, but even before this, her palms may turn sweaty and her heartbeat may increase. Research has now consistently shown that our instinctive systems make decisions before our logical systems even kick in. This behavior was honed in humans by millions of years of living and passing on their genes in the African savannah, where split-second decisions had to be made to ensure that you weren’t weeded out of the gene pool. This kind of emotional reaction is thus also a kind of emergent behavior. It comes about because of the interaction of lower-level entities (DNA sequences and hormone receptors) with environmental and cultural cues and learning. If an AI does not take emotional responses into account, it will likely never be able to recognize the kinds of abstract features that scream out “elephant” in a child’s mind.
As the biologist Theodosius Dobzhansky famously quipped, “Nothing in biology makes sense except in the light of evolution”, and I would extend that principle to the construction of artificial intelligences. Human intelligence is indeed a result of a few universal laws combined with an enormous number of accidents. These accidents have led evolution to select for those brains which can take stochastic emergent reality into account and build generalized models that can switch between different levels of emergent abstraction. It seems to me that mimicking this central feature of evolution would not just lead to better AIs but would be an essential feature of any truly general AI. Perhaps then the easy problems would truly become easy to solve.

This is my latest column for 3 Quarks Daily.

Lab automation using machine learning? Hold on to your pipettes for now.

There is an interesting article on using machine learning and AI for lab automation in Science that generally puts a positive spin on the use of smart computer algorithms for automating routine experiments in biology. The idea is that at some point in the near future, a scientist could design, execute and analyze the results of experiments on her MacBook Air from a Starbucks.

There's definitely a lot of potential for automating routine lab protocols like pipetting and plate transfers, but this has already been done by robots for decades. What the current crop of computational improvements plans to do is potentially much more consequential though; it is to conduct entire suites of biological experiments with a few mouse clicks. The CEO of Zymergen, a company profiled in the piece, says that the ultimate objective is to "get rid of human intuition"; his words, not mine.

I must say I am deeply skeptical of that statement. There is no doubt that parts of experiment planning and execution will indeed become more efficient because of machine learning, but I don't see human biologists being replaced or even significantly augmented anytime soon. The reason is simple: most of research, and biological research in particular, is not about generating and rapidly testing answers (something which a computer excels at), but about asking questions (something which humans typically excel at). A combination of machine learning and robotics may well be very efficient at laying out a whole list of possible solutions and testing them, but it will all come to naught if the question that's being asked is the wrong one.

Machine learning will certainly have an impact, but only in a narrowly circumscribed part of experimental space. Thus, I don't think it's just a coincidence that the article focuses on Zymergen, a company which is trying to produce industrial chemicals by tweaking bacterial genomes. This process involves mutating thousands of genes in bacteria and then picking combinations that will increase the fitness of the resulting organism. It is exactly the kind of procedure that is well-adapted to machine learning (to try to optimize and rank mutations, for instance) and robotics (to then perform the highly repetitive experiments). But that's a niche application, working well in areas like directed evolution; as the article itself says, "Maybe Zymergen has stumbled on the rare part of biology that is well-suited to computer-controlled experimentation."
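For what it's worth, here is a toy sketch of the kind of model-guided loop such projects rely on, with everything invented: a hidden fitness landscape, a simple regression surrogate that ranks proposed mutants, and a simulated "robot" that measures only the top-ranked ones. It has nothing to do with Zymergen's actual pipeline; it just shows why this narrow generate-rank-test cycle is a natural fit for machine learning.

```python
import numpy as np

rng = np.random.default_rng(7)
n_genes = 40

# Hidden "ground truth": each mutation nudges fitness up or down a little.
true_effect = rng.normal(size=n_genes)

def measure(strains):
    """Pretend this is the robotic assay: true fitness plus experimental noise."""
    return strains @ true_effect + rng.normal(scale=0.5, size=len(strains))

# Start from a small random library of mutant strains.
X = rng.integers(0, 2, size=(32, n_genes)).astype(float)
y = measure(X)

for rnd in range(5):
    # Surrogate model: ridge regression on everything measured so far.
    lam = 1.0
    w = np.linalg.solve(X.T @ X + lam * np.eye(n_genes), X.T @ y)

    # Propose random new mutants of the best strain seen so far...
    best = X[np.argmax(y)]
    proposals = np.tile(best, (500, 1))
    flips = rng.random(proposals.shape) < 0.05
    proposals = np.abs(proposals - flips)

    # ...rank them with the model, and send only the top few to the "robot".
    picks = proposals[np.argsort(proposals @ w)[-8:]]
    X = np.vstack([X, picks])
    y = np.concatenate([y, measure(picks)])
    print(f"round {rnd}: best measured fitness {y.max():.2f}")
```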

In most of biological research, we start with figuring out what question to ask and what hypotheses to generate. This process is usually the result of combining intuition with experience and background knowledge. As far as we know, only human beings excel in this kind of coarse-grained, messy data gathering and thinking. Take drug discovery for instance; most drug discovery projects start with identifying a promising target or phenotype. This identification is usually quite complicated and comes from a combination of deep expertise, knowledge of the literature and careful decisions on what are the right experiments to do. Picking the right variables to test and knowing what the causal relationships between them are is paramount. In fact, most drug discovery fails because the biological hypothesis that you begin with is the wrong one, not because it was too expensive or slow to test the hypothesis. Good luck teaching a computer to tell you whether the hypothesis is the right one.

It's very hard for me to see how to teach a machine this kind of multi-layered, interdisciplinary analysis. Once we have the right question or hypothesis, of course, we can potentially design an automated protocol to carry out the relevant experiments, but reaching that point is going to take a lot more than just rapid trial and error and the culling of less promising possibilities.

This latest wave of machine learning optimism therefore looks very similar to the old waves. It will have some impact, but the impact will be modest and likely limited to particular kinds of projects and goals. The whole business reminds me of the story - sometimes attributed to Lord Kelvin - about the engineer who was recruited by a company to help them build a bridge. After thinking for about an hour, he made a mark with a piece of chalk on the ground, told the company's engineers to start building the bridge at that location, and then billed them for ten thousand dollars. When they asked what on earth he expected so much money for, he replied, "A dollar for making that mark. Nine thousand nine hundred and ninety-nine dollars for knowing where to make it."

I am still waiting for that algorithm which tells me where to make the mark.

Physicists in biology, inverse problems and other quirks of the genomic age

Nobel Laureate Sydney Brenner has criticized systems biology as a grandiose attempt to solve inverse problems in biology.
Leo Szilard – brilliant, peripatetic Hungarian physicist, habitué of hotel lobbies, soothsayer without peer – first grasped the implications of a nuclear chain reaction in 1933 while stepping off the curb at a traffic light in London. Szilard has many distinctions to his name; not only did he file a patent for the first nuclear reactor with Enrico Fermi, but he was the one who urged his old friend Albert Einstein to write a famous letter to Franklin Roosevelt, and also the one who tried to get another kind of letter signed as the war was ending in 1945; a letter urging the United States to demonstrate a nuclear weapon in front of the Japanese before irrevocably stepping across the line. Szilard was successful in getting the first letter signed but failed in his second goal.
After the war ended, partly disgusted by the cruel use to which his beloved physics had been put, Szilard left professional physics to explore new pastures – in his case, biology. But apart from the moral abhorrence which led him to switch fields, there was a more pragmatic reason. As Szilard put it, this was an age when you took a year to discover something new in physics but only took a day to discover something new in biology.
This sentiment drove many physicists into biology, and the exodus benefited biological science spectacularly. Compared to physics, whose basic theoretical foundations had matured by the end of the war, biology was uncharted territory. The situation in biology was similar to the situation during the heyday of physics right after the invention of quantum theory when, as Paul Dirac quipped, “even second-rate physicists could make first-rate discoveries”. And physicists took full advantage of this situation. Since Szilard, biology in general and molecular biology in particular have been greatly enriched by the presence of physicists. Today, any physics student mulling a move into biology stands on the shoulders of illustrious forebears including Szilard, Erwin Schrodinger, Francis Crick, Walter Gilbert and most recently, Venki Ramakrishnan.
What is it that draws physicists to biology and why have they been unusually successful in making contributions to it? The allure of understanding life which attracts other kinds of scientists is certainly one motivating factor. Erwin Schrodinger whose little book “What is Life?” propelled many including Jim Watson and Francis Crick into genetics is one example. Then there is the opportunity to simplify an enormously complex system into its constituent parts, an art which physicists have excelled at since the time of the Greeks. Biology and especially the brain is the ultimate complex system, and physicists are tempted to apply their reductionist approaches to deconvolute this complexity. Thirdly there is the practical advantage that physicists have; a capacity to apply experimental tools like x-ray diffraction and quantitative reasoning including mathematical and statistical tools to make sense of biological data.
The rise of the data scientists
It is this third reason that has led to a significant influx of not just physicists but other quantitative scientists, including statisticians and computer scientists, into biology. The rapid development of the fields of bioinformatics and computational biology has led to a great demand for scientists with the quantitative skills to analyze large amounts of data. A mathematical background brings valuable skills to this endeavor, and quantitative, data-driven scientists thrive in genomics. Eric Lander for instance got his PhD in mathematics at Oxford before – driven by the tantalizing goal of understanding the brain – he switched to biology. Cancer geneticist Bert Vogelstein also has a background in mathematics. All of us are familiar with names like Craig Venter, Francis Collins and James Watson when it comes to appreciating the cracking of the human genome, but we need to pay equal attention to the computer scientists without whom crunching and combining the immense amounts of data arising from sequencing would have been impossible. There is no doubt that, after the essentially chemically driven revolution in genetics of the 70s, the second revolution in the field has been engineered by data crunching.
So what does the future hold? The rise of the “data scientists” has led to the burgeoning field of systems biology, a buzzword that seems to proliferate faster than our actual understanding of it. Systems biology seeks to integrate different kinds of biological data into a broad picture using tools like graph theory and network analysis. It promises to potentially provide us with a big-picture view of biology like no other. Perhaps, physicists think, we will have a theoretical framework for biology that does what quantum theory did for, say, chemistry.
Emergence and systems biology: A delicate pairing
And yet even as we savor the fruits of these higher-level approaches to biology, we must be keenly aware of their pitfalls. One of the fundamental truths about the physicists’ view of biology is that it is steeped in reductionism. Reductionism is the great legacy of modern science which saw its culmination in the two twentieth-century scientific revolutions of quantum mechanics and molecular biology. It is hard to overstate the practical ramifications of reductionism. And yet as we tackle the salient problems in twenty-first century biology, we are becoming aware of the limits of reductionism. The great antidote to reductionism is emergence, a property that renders complex systems irreducible to the sum of their parts. In 1972 the Nobel Prize winning physicist Philip Anderson penned a remarkably far-reaching article titled “More is Different” which explored the inability of “lower-level” phenomena to predict their “higher-level” manifestations.
The brain is an outstanding example of emergent phenomena. Many scientists think that neuroscience is going to be to the twenty-first century what molecular biology was to the twentieth. For the first time in history, partly through recombinant DNA technology and partly due to state-of-the-art imaging techniques like functional MRI, we are poised on the brink of making major discoveries about the brain; no wonder that Francis Crick moved into neuroscience during his later years. But the brain presents a very different kind of challenge than that posed by, say, a superconductor or a crystal of DNA. The brain is a highly hierarchical and modular structure, with multiple dependent and yet distinct layers of organization. From the basic level of the neuron we move on to collections of neurons and glial cells which behave very differently, onward to specialized centers for speech, memory and other tasks, and finally to the whole brain. As we move up this ladder of complexity, emergent features arise at every level whose behavior cannot be gleaned merely from the behavior of individual neurons.

The tyranny of inverse problems
This problem thwarts systems biology in general. In recent years, some of the most insightful criticism of systems biology has come from Sydney Brenner, a founding father of molecular biology whose 2010 piece in Philosophical Transactions of the Royal Society titled “Sequences and Consequences” should be required reading for those who think that systems biology’s triumph is just around the corner. In his essay, Brenner strikes at what he sees as the heart of the goal of systems biology. After reminding us that the systems approach seeks to generate viable models of living systems, Brenner goes on to say that:
“Even though the proponents seem to be unconscious of it, the claim of systems biology is that it can solve the inverse problem of physiology by deriving models of how systems work from observations of their behavior. It is known that inverse problems can only be solved under very specific conditions. A good example of an inverse problem is the derivation of the structure of a molecule from the X-ray diffraction pattern of a crystal…The universe of potential models for any complex system like the function of a cell has very large dimensions and, in the absence of any theory of the system, there is no guide to constrain the choice of model.”
What Brenner is saying is that every systems biology project essentially results in a model, a model that tries to solve the problem of divining reality from experimental data. However, a model is not reality; it is an imperfect picture of reality constructed from bits and pieces of data. It is therefore – and this has to be emphasized – only one representation of reality. Other models might satisfy the same experimental constraints, and for systems with thousands of moving parts like cells and brains, the number of such models is astronomically large. In addition, data from biological measurements are often noisy with large error bars, further complicating their use. This puts systems biology into the classic conundrum of the inverse problem that Brenner points out, and like other inverse problems, the solution you find is likely to be one among an expanding universe of solutions, many of which might be better than the one you have. This means that while models derived from systems biology might be useful – and often this is a sufficient requirement for using them – they might well leave out some important feature of the system.
There has been some very interesting recent work in addressing such conundrums. One of the major challenges in the inverse problem universe is to find a minimal set of parameters that can describe a system. Ideally the parameters should be sensitive to variation so that one constrains the parameter space describing the given system and avoids the "anything goes" trap. A particularly promising example is the use of 'sloppy models' developed by Cornell physicist James Sethna and others in which parameter combinations rather than individual parameters are varied and those combinations which are most tightly constrained are then picked as the 'right' ones.
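As a minimal illustration of what "sloppiness" means (my own sketch, not Sethna's actual analysis), consider the textbook toy example of fitting a sum of two exponentials: the eigenvalues of the Fisher information matrix split into a few stiff directions and several sloppy ones spread over orders of magnitude, which is exactly why individual parameters are poorly constrained while certain combinations are pinned down.

```python
import numpy as np

# A classic "sloppy" toy model: y(t) = A1*exp(-k1*t) + A2*exp(-k2*t).
# The reference parameter values below are arbitrary.
t = np.linspace(0.0, 5.0, 50)
p0 = np.array([1.0, 1.0, 1.0, 2.0])     # A1, k1, A2, k2

def model(p):
    a1, r1, a2, r2 = p
    return a1 * np.exp(-r1 * t) + a2 * np.exp(-r2 * t)

# Numerical Jacobian of the model output with respect to the parameters.
eps = 1e-6
J = np.column_stack([
    (model(p0 + eps * np.eye(4)[i]) - model(p0 - eps * np.eye(4)[i])) / (2 * eps)
    for i in range(4)
])

# Eigenvalues of J^T J (the Fisher information up to a noise factor):
# a few "stiff" directions and some very "sloppy" ones, spread over
# orders of magnitude.
evals = np.linalg.eigvalsh(J.T @ J)[::-1]
print(evals)
print("stiff/sloppy ratio: %.1e" % (evals[0] / evals[-1]))
```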

But quite apart from these theoretical fixes, Brenner’s remedy for avoiding the fallout from imperfect systems modeling is to simply use the techniques garnered from classical biochemistry and genetics over the last century or so. In one sense systems biology is nothing new; as Brenner tartly puts it, “there is a watered-down version of systems biology which does nothing more than give a new name to physiology, the study of function and the practice of which, in a modern experimental form, has been going on at least since the beginning of the Royal Society in the seventeenth century”. Careful examination of mutant strains of organisms, measurement of the interactions of proteins with small molecules like hormones, neurotransmitters and drugs, and observation of phenotypic changes caused by known genotypic perturbations remain tried-and-tested ways of drawing conclusions about the behavior of living systems on a molecular scale.
Genomics and drug discovery: Tread softly
This viewpoint is also echoed by those who take a critical view of what they say is an overly genomics-based approach to the treatment of diseases. A particularly clear-headed view comes from Gerry Higgs who in 2004 presciently wrote a piece titled “Molecular Genetics: The Emperor’s Clothes of Drug Discovery”. Higgs criticizes the whole gamut of genomic tools used to discover new therapies, from the “high-volume, low-quality sequence data” to the genetically engineered cell lines which can give a misleading impression of molecular interactions under normal physiological conditions. Higgs points to many successful drugs discovered in the last fifty years which have been found using the tools of classical pharmacology and biochemistry; these would include the best-selling, Nobel Prize winning drugs developed by Gertrude Elion and James Black based on simple physiological assays. Higgs’s point is that the genomics approach to drugs runs the risk of becoming too reductionist and narrow-minded, often relying on isolated systems and artificial constructs that are uncoupled from whole systems. His prescription is not to discard these tools which can undoubtedly provide important insights, but supplement them with older and proven physiological experiments.
Does all this mean that systems biology and genomics would be useless in leading us to new drugs? Not at all. There is no doubt that genomic approaches can be remarkably useful in enabling controlled experiments. The systems biologist Leroy Hood for instance has pointed out how selective gene silencing can allow us to tease apart side-effects of drugs from beneficial ones. But what Higgs, Brenner and others are impressing upon us is that we shouldn’t allow genomics to become the end-all and be-all of drug discovery. Genomics should only be employed as part of a judiciously chosen cocktail of techniques including classical ones for interrogating the function of living systems. And this applies more generally to physics-based and systems biology approaches. 

Perhaps the real problem from which we need to wean ourselves is “physics envy”; as the physicist-turned-financial modeler Emanuel Derman reminds us, “Just like physicists, we would like to discover three laws that govern ninety-nine percent of our system’s intricacies. But we are more likely to discover ninety-nine laws that explain three percent of our system”. And that’s as good a starting point as any.

Adapted from a previous post on Scientific American Blogs.

Modular complexity and the problem of reverse engineering the brain

Bell's number calculates the number of connections between various components of a system and scales exponentially with those components (Image: Science Magazine).
I have been reading an excellent collection of essays on the brain titled "The Future of the Brain" which contains ruminations on current and future brain research from leading neuroscientists and other researchers like Gary Marcus, George Church and the Moser husband and wife pair who won last year's Nobel prize. Quite a few of the authors are from the Allen Institute for Brain Science in Seattle. In starting this institute, Microsoft co-founder Paul Allen has placed his bets on mapping the brain…or at least the mouse visual cortex for starters. His institute is engaged in charting the sum total of neurons and other working parts of the visual cortex and then mapping their connections. Allen is not alone in doing this; there are projects like the Connectome at MIT which are trying to do the same thing (and the project’s leader Sebastian Seung has written a readable book about it).

Now we have heard prognostications about mapping and reverse engineering brains from more eccentric sources before, but fortunately Allen is one of those who does not believe that the singularity is around the corner. He also seems to have entrusted his vision to sane minds. His institute’s chief science officer is Christof Koch, former professor at Caltech, longtime collaborator of the late Francis Crick and self-proclaimed “romantic reductionist” who started at the institute earlier this year. Koch has written one of the articles in the essay collection. His article and the book in general reminded me of a very interesting perspective that he penned in Science last year which points out the staggering challenge of understanding the connections between all the components of the brain; the “neural interactome” if you will. The article is worth reading if you want to get an idea of how even simple numerical arguments illuminate the sheer magnitude of mapping out the neurons, cells, proteins and connections that make up the wonder that’s the human brain.

Koch starts by pointing out that calculating the interactions between all the components in the brain is not the same as computing the interactions between, say, all the atoms of an ideal gas since, unlike a gas, the interactions are between different kinds of entities and are therefore not identical. Instead, he proposes, we have to use something called Bell’s number B(n), which reminds me of the partitions that I learnt about when I was sleepwalking through set theory in college. Briefly, for n objects B(n) refers to the number of ways in which those objects can be partitioned into non-empty groups (singles, pairs, triples and so on). Thus, when n = 3, B(n) is 5. Not surprisingly, B(n) grows faster than exponentially with n, and Koch points out that B(10) is already 115,975. If we think of a typical presynaptic terminal with its 1000 proteins or so, B(n) starts giving us serious heartburn. For something like the visual cortex, where n = 2 million, B(n) would be inconceivable, and it's futile to even start thinking about what the number would be for the entire brain. Koch then uses a simple calculation based on Moore’s Law to estimate the time needed for “sequencing” these interactions. For n = 2 million the time needed would be of the order of 10 million years. And as the graph on top demonstrates, for more than 10^5 components or so the amount of time spirals out of hand at warp speed.

This considers only the 2 million neurons in the visual cortex; it doesn’t even consider the proteins and cells which might interact with the neurons on an individual basis. In addition, at this point we are not even really aware of how many neuronal types there are in the brain: neurons are not all identical like indistinguishable electrons. What makes the picture even more complicated is that these types may be malleable, so that sometimes a single neuron can be of one type while at other times it can team up with other neurons to form a unit that is of a different type. This multilayered, fluid hierarchy rapidly reveals the outlines of what Paul Allen has called the “complexity brake”: he described this in the same article that was cogently critical of Ray Kurzweil's singularity. And the neural complexity brake that Koch is talking about seems poised to make an asteroid-sized impact on our dreams.

So are we doomed in trying to understand the brain, consciousness and the whole works? Not necessarily, argues Koch. He gives the example of electronic circuits where individual components are grouped separately into modules. If you bunch a number of interacting entities together and form a separate module, then the complexity of the problem reduces since you now only have to calculate interactions between modules. The key question then is, is the brain modular, and how many modules does it present? Common sense would have us think it is modular, but it is far from clear how we can exactly define the modules. We would also need a sense of the minimal number of modules to calculate interactions between them. This work is going to need a long time (hopefully not as long as that for B(2 million)) and I don’t think we are going to have an exhaustive list any time soon, especially since these modules are going to be composed of different kinds of components and not just one kind. But it's quite clear that whatever the nature of these modules, delineating their particulars would go a long way in making the problem more manageable.
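A back-of-the-envelope illustration of why modularity helps, using made-up numbers (the component and module counts below are invented, and I am using sympy's built-in Bell number function): bundle a dozen components into four modules and the combinatorics collapse.

```python
# How grouping components into modules tames Bell-number growth.
# The counts here are arbitrary choices for illustration.
from sympy import bell

n_components = 12    # a deliberately tiny "circuit"
n_modules = 4        # the same circuit bundled into four modules

print(bell(n_components))   # 4213597 possible groupings among individual components
print(bell(n_modules))      # 15 possible groupings among the modules
```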

Any attempt to define these modules is going to run into problems of emergent complexity that I have occasionally written about. Two neurons plus one protein might be different from two neurons plus two proteins in unanticipated ways. Also, if we are thinking about forward and reverse neural pathways, I would hazard a guess that one neuron plus one neuron in one direction may even be different from the same interaction in the reverse direction. Then there’s the more obvious problem of dynamics. The brain is not a static entity and its interactions would reasonably be expected to change over time. This might interpose a formidable new barrier in brain mapping, since it may mean that whatever modules are defined may not even be the same during every time slice. A fluid landscape of complex modules whose very identity changes every single moment could well be a neuroscientist’s nightmare. In addition, the amount of data that captures such neural dynamics would be staggering, since even a millimeter-sized volume of rat visual tissue requires a few terabytes of data to store all its intricacies. However, the data storage problem pales in comparison to the data interpretation problem.

Nevertheless this goal of mapping modules seems far more attainable in principle than calculating every individual interaction, and that’s probably the reason Koch left Caltech to join the Allen Institute in spite of the pessimistic calculation above. The value of modular approaches goes beyond neuroscience though; similar thinking may provide insights into other areas of biology, such as the interaction of genes with proteins and of proteins with drugs. As an amusing analogy, this kind of analysis reminds me of trying to understand the interactions between different components in a stew; we have to appreciate how the salt interacts with the pepper and how the pepper interacts with the broth and how the three of them combined interact with the chicken. Could the salt and broth be considered a single module?

If we can ever get a sense of the modular structure of the brain, we may have at least a fighting chance to map out the whole neural interactome. I am not holding my breath too hard, but my ears will be wide open since this is definitely going to be one of the most exciting areas of science around.

Adapted from a previous post on Scientific American Blogs.

Occam, me and a conformational medley

Originally posted on the Scientific American Blog Network.


William of Occam, whose principle of parsimony has been used and misused (Image: WikiCommons)
The philosopher and writer Jim Holt, who has written the sparkling new book “Why Does The World Exist?”, recently wrote an op-ed column in the New York Times, gently reprimanding physicists and asking them to stop being ‘churlish’ and appreciate the centuries-old interplay between physics and philosophy. Holt’s point was that science and philosophy have always co-existed, even if their relationship has been more of an uneasy truce than an enthusiastic embrace. Some of the greatest physicists, including Bohr and Einstein, were also great philosophers.

Fortunately – or unfortunately – chemistry has had little to say about philosophy compared to physics. Chemistry is essentially an experimental science, and for the longest time theoretical chemistry had much less to contribute to chemistry than theoretical physics had to physics. This is now changing; people like Michael Weisberg, Eric Scerri and Roald Hoffmann proclaim themselves to be bona fide philosophers of chemistry and bring valuable ideas to the discussion.

But the interplay between chemistry and philosophy is a topic for another post. In this post I want to explore one of the very few philosophical principles that chemists have embraced so wholeheartedly that they speak of it with the same familiar nonchalance with which they would toss around facts about acids and bases. This principle is Occam’s Razor, a sort of rule of thumb that allows chemists to pick between competing explanations for a phenomenon or observation. Occam’s Razor owes its provenance to William of Occam, a 14th-century Franciscan friar who dabbled in many branches of science and philosophy. Fully stated, the proposition tells us that “entities should not be multiplied unnecessarily”, or that the fewer the assumptions and hypotheses underlying a particular explanation, the more it should be preferred over alternatives of equal explanatory power. More simply put, simple explanations are better than complex ones.

Sadly, the multiple derivative restatements of Occam’s Razor, combined with our tendency to look for simple explanations, can sometimes lead to erroneous results. Part of the blame lies not with Occam’s razor but with its interpreters; the main problem is that it’s not clear what “simple” and “complex” mean when applied to a natural law or phenomenon. In addition, nature does not really care about what we perceive as simple or complex, and what may seem complex to us may appear perfectly simple to nature because it’s…real. This was driven home to me early on in my career.

Most of my research in graduate school was concerned with finding out the many conformations that complex organic molecules adopt in solution. Throw an organic molecule like ibuprofen in water and you don’t get a static picture of the molecule standing still; instead, there is free rotation about single bonds joining various atoms leading to multiple, rapidly interconverting shapes, or conformations, that are buffeted around by water like ships on the high seas. The exact percentage of each conformation in this dance is dictated by its energy; low-energy conformations are more prevalent than high-energy ones.
Different conformations of cyclohexane - a ring of six carbon atoms - ranked by energy (Image: MCAT Review)
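The statement that low-energy conformations are more prevalent than high-energy ones is just the Boltzmann distribution at work. Here is a small sketch with invented relative energies for four hypothetical conformers, showing how quickly populations fall off at room temperature.

```python
import numpy as np

# Boltzmann weighting of conformer populations: a sketch with made-up
# relative energies (kcal/mol) for four hypothetical conformations.
energies = np.array([0.0, 0.5, 1.2, 2.5])   # relative to the lowest-energy conformer
RT = 0.593                                  # kcal/mol at 298 K

weights = np.exp(-energies / RT)
populations = weights / weights.sum()
for E, p in zip(energies, populations):
    print(f"conformer at +{E:.1f} kcal/mol: {100 * p:.1f}% of the ensemble")
```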

Since the existence of multiple conformations enabled by rotation around single bonds is a logical consequence of the basic principles of molecular structure, it would seem that this picture would be uncontroversial. Surprisingly though, it’s not always appreciated. The reason has to do with the fact that measurements of conformations by experimental techniques like nuclear magnetic resonance (NMR) spectroscopy always result in averages. This is because the time-scales for most of these techniques are longer than the time-scales needed for interconversion between conformations and therefore they cannot make out individual differences. The best analogy is that of a ceiling fan; when the fan is rotating fast, all we see is a contiguous disk because of the low time resolution of our eye. But we know that in reality, there are separate individual blades (see figure at end of post). NMR is like the eye that sees the disk and mistakes it for the fan.

Such is the problem with using experimental techniques to determine individual conformations of molecules. Their long time scales lead to average data to which a single, average structure is assigned. Clearly this is a flawed interpretation, but partly because of entrenched beliefs and partly because of a lack of methods to tease apart individual conformations, scientists through the years have routinely published single structures as representing a more complex distribution of conformers. Such structures are sometimes called “virtual structures”, a moniker that reflects their illusory – essentially non-existent – nature. A lot of my work in graduate school was to use a method called NAMFIS (NMR Analysis of Molecular Flexibility In Solution) that combined average NMR data with theoretically calculated conformations to tease apart the averages into individual conformations and their populations. There are other such methods. Here's an article on NAMFIS that I wrote for college students.
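To give a flavor of the deconvolution idea behind NAMFIS (this is a cartoon with invented numbers, not the actual NAMFIS protocol), one can model the observed NMR averages as population-weighted combinations of values computed for each conformer and then fit the populations with a non-negative least-squares solver.

```python
import numpy as np
from scipy.optimize import nnls

# Cartoon of the NAMFIS idea: observed NMR observables are modeled as a
# population-weighted average over computed conformers, and the populations
# are fit subject to being non-negative and summing to one.

# Rows: hypothetical observables (e.g., J-couplings); columns: values
# predicted for each of four computed conformers. All numbers invented.
predicted = np.array([
    [9.8, 2.1, 4.5, 7.0],
    [1.5, 8.9, 3.2, 2.0],
    [6.0, 6.5, 9.1, 1.1],
    [3.3, 1.0, 2.2, 8.4],
    [7.7, 4.4, 5.0, 6.1],
])

true_pops = np.array([0.55, 0.25, 0.15, 0.05])   # pretend ground truth
observed = predicted @ true_pops + np.random.default_rng(3).normal(scale=0.05, size=5)

# Enforce sum(populations) = 1 by appending it as an extra, heavily
# weighted "observation", then solve with non-negative least squares.
A = np.vstack([predicted, 100.0 * np.ones((1, 4))])
b = np.concatenate([observed, [100.0]])
pops, _ = nnls(A, b)
print(np.round(pops, 2))   # should land close to the invented populations
```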

When time came to give a talk on this research, a very distinguished scientist in the audience told me that he found it hard to digest this complicated picture of multiple conformations vying for a spot on the energy ladder. Wouldn’t the assumption of a single, clean, average structure be more pleasing? Wouldn’t Occam’s Razor favor this interpretation of the data? That was when I realized the limitations of Occam’s principle. The “complicated” picture of the multiple conformations was the real one in this case, and the simple picture of a single average conformation was unreal. In this case, it was the complicated and not the simple explanation that turned out to be the right one. This interpretation was validated when I also managed to find, among the panoply of conformations, one which bound to a crucial protein in the body and turned the molecule into a promising anticancer drug. The experience again drove home the point that nature doesn’t often care about what we scientists find simple or complex.

Recently Occam made another appearance, again in the context of molecular conformations. This time I was studying the diffusion of organic molecules through cell membranes, a process that’s of great significance in drug discovery since even your best test-tube drug is useless if it cannot get into a cell. A chemist from San Francisco has come up with a method to calculate different conformations of molecules. By looking at the lowest-energy conformation, he then predicts whether that conformation will be stable inside the lipid-rich cell membrane. Based on this he predicts whether the molecule will make it across. Now for me this posed a conundrum and I found myself in the shoes of my old inquisitor; we know that molecules have several conformations, so how can only the single, lowest-energy conformation matter in predicting membrane permeability?

I still don’t know the answer, but a couple of months ago another researcher did a more realistic calculation in which she did take all these other conformations into consideration. Her conclusion? More often than not the accuracy of the prediction becomes worse because by including more conformations, we are also including more noise. Someday perhaps we can take all those conformations into account without the accompanying noise. Would we then be both more predictive and more realistic? I don’t know.

These episodes from my own research underscore the rather complex and subtle nature of Occam’s Razor and its incarnation in scientific models. In the first case, the assumption of multiple conformations is both realistic and predictive. In the second, the assumption of multiple conformations is realistic but not predictive, because the multiple-conformation model is not good enough for calculation. In the first case, a simple application of Occam’s razor is flawed, while in the second, the flawed simple assumption actually leads to better predictions. Thus, sometimes simple assumptions can work not because the more complex ones are wrong, but because we simply lack the capacity to implement the more complex ones.

I am glad that my work with molecular conformations invariably led me to explore the quirky manifestations of Occam’s razor. And I am thankful to a well-known biochemist who put it best: “Nature doesn’t always shave with Occam’s Razor”. In science as in life, simple can be quite complicated, and complicated can turn out to be refreshingly simple.

A rotating ceiling fan - Occam's razor might lead us to think that the fan is a contiguous disk, but we know better.

Again, drug design and airplane design are not the same

A while back I had a post about an article that compared airplane design to drug design. I discussed the challenges in drug design compared to airplane design and why the former is much less predictable than the latter, the short answer being "biological complexity".

Now the analogy surfaces again in a different context. C&EN has an interview with Kiran Mazumdar-Shaw, CEO of India's largest biopharmaceutical company, Biocon. Mazumdar-Shaw is an accomplished woman who does not hold back when she laments the current depressing state of drug development. I think many of us would commiserate with her disappointment at the increasing regulatory hurdles that new drugs have to face. But at one point she says something that I don't quite agree with:

Mazumdar-Shaw dismisses the argument that drugs create a public safety imperative mandating stricter oversight than many other regulated products. “So you think passenger safety is any less important than patient safety?” she asked. Yet aircraft makers don’t face a 12-year, all-or-nothing proposition when designing, developing, and commercializing an airplane. Nor, she added, does Boeing have to prove that it is making something fundamentally different than what Airbus already has on the market.

To which my counter-question would be, "What do you think is the probability of unforeseen problems showing up in aircraft design compared to drug design?" I see a rather clear flaw in the analogy; aircraft design is not as tightly regulated because most aircraft work as designed and the attrition rate in their development is quite low. The number of failures in aircraft development pales in comparison with the number of Phase II failures in drug development. In fact, as the article quoted in my previous post described, these days you can almost completely simulate an aircraft on a computer. Regulatory agencies can thus be much more confident and insouciant about approving a new airplane.

This is far from the case for drugs. First of all, there is no clear path to a drug's development, and in the initial stages most people don't have a clue as to what the final product is going to look like. But more importantly, designing drugs is just so much riskier than designing aircraft that regulatory agencies have to be more circumspect. How many times do drugs show side effects that would never have been predicted, or even imagined, at the beginning? It's this almost complete lack of predictability, driven by the sheer complexities of biology, that distinguishes drugs from airplanes.

In another part of the interview Mazumdar-Shaw voices her impatience with regulators' recalcitrance to adopt new technologies. The example she gives is of Hawk-Eye, a computer system that tracks a tennis ball's movement and makes it easier to call the result of a disputed bounce. Just as sports authorities are reluctant to use these technologies to override the flaws in human judgement, Mazumdar-Shaw thinks, regulators are reluctant to use new technologies to overcome the limitations of human judgement. The point is not irrelevant, but the truth is that decision making in drug development is far more complex than decision making in tennis tournaments. For Hawk-Eye, tracking tennis balls is a simple matter of physics, and it can do this with high accuracy. Contrast this with drug development, where the "event" to be analyzed is not the bounce of a ball but the efficacy of a drug in a large clinical trial as assessed by a variety of complex statistical measures. In addition, approving a drug is inherently more subjective, being based on the efficacies of existing therapies, the exact numerical superiority of the extra benefits, cost and patient populations. Good luck writing a computer program that could possibly assess this morass of sometimes conflicting information and reach an informed judgement.

I think many of us are frustrated with the increasing regulatory hurdles that new drugs face and we all wish that the process was smoother. Personally I don't think that the FDA's systems for assessing risks is as finely attuned to potential benefits as it should be. But I don't find myself following Mazumdar-Shaw in advocating for drug approvals that are as easy as aircraft approvals. The former is science and engineering. The latter is science with a healthy dose of intuition and art. And some black magic.