Field of Science

Neuroscience and other theory-poor fields: Tools first, simulation later


I have written about the ‘Big Brain Project’ a few times before, and I wrote a post about it for the Canadian TV channel TVO last year. The project basically seeks to make sense of that magnificent 3-pound bag of jelly inside our skull at multiple levels, from molecules to neurons to interactions at the whole-brain level. The aims of the project are typical of ‘moon shot’ endeavors: ambitious, multidisciplinary, multi-institutional and, of course, expensive. Yet right after the project was announced in both the US (partly by President Obama) and in Europe, there were whispers of criticism that turned first into a trickle and then into a cascade. The criticism was at multiple levels – administrative, financial and scientific. But even discounting the administrative and financial problems, many scientists saw issues with the project even at the basic scientific level.

The gist of those issues can be boiled down to one phrase: “biting off more than we can chew”. Basically we are trying to engineer a complex, emergent system whose workings we still don’t understand, even at basic levels of organization. Our data is impoverished and our approaches are too reductionist. One major part of the project especially suffers from this drawback – the in silico simulation of the brain at multiple levels, from neurons to entire mouse and human brains. Now here’s a report from a committee that has examined the pros and cons of the project and reached the conclusion that much of the criticism was indeed valid, and that we are trying to achieve something for which we still don’t have the tools. The report is here. The conclusion of the committee is simple: first work on the tools; then incorporate the findings from those tools into a bigger picture. The report makes this clear in a paragraph that also showcases problems with the public’s skewed perception of the project.

The goal of reconstructing the mouse and human brain in silico and the associated comprehensive bottom-up approach is viewed by one part of the scientific community as being impossible in principle or at least infeasible within the next ten years, while another part sees value not only in making such simulation tools available but also in their development, in organizing data, tools and experts (see, e.g., http://www.bbc.com/future/story/20130207-will-we-ever-simulate-the-brain). A similar level of disagreement exists with respect to the assertion that simulating the brain will allow new cures to be found for brain diseases with much less effort than in experimental investigations alone.

The public relations and communication strategy of the HBP and the continuing and intense public debate also led to the misperception by many neuroscientists that the HBP aims to cover the field of neuroscience comprehensively and that it constitutes the major neuroscience research effort in the European Research Area (ERA).

This whole discussion reminds me of the idea of tool-driven scientific revolutions publicized by Peter Galison, Freeman Dyson and others, of which chemistry is an exemplary instance. The Galisonian picture of scientific revolutions does not discount the role of ideas in causing seismic shifts in science, but it places tools on an equal footing. Discussions of grand ideas and goals (like simulating a brain) often give short shrift to the mundane but critical everyday tools that need to be developed in order to enable those ideas in the first place. They are great as sound bites for the public but brittle in their foundations. Although scientific ideas are often considered by the public to be the progenitors of most everyday scientific activity, in reality the progression can equally often be the opposite: first come the tools, then the ideas. Sometimes tools can follow ideas, as was the case with many predictions of the general theory of relativity. At other times ideas follow the tools and the experiments, as was the case with the Lamb shift and quantum electrodynamics.

Generally speaking, it’s more common for ideas to follow tools when a field is theory-poor, like quantum field theory was in the 1930s, while it’s more common for tools to follow ideas when a field is theory-rich. From this viewpoint neuroscience is currently theory-poor, so it seems much more likely to me that ideas will follow the tools in the field. To be sure, the importance of tools has long been recognized in neuroscience; where would we be without MRI and patch-clamp techniques, for instance? And yet these tools have only started to scratch the surface of what we are trying to understand. We need much better tools before we get our hands on a theory of the brain, let alone one of the mind.

I believe the same progression also applies, in some sense, to my own field of molecular modeling. Part of the problem with modeling proteins and molecules is that we still don’t have a good idea of the myriad factors that drive molecular recognition. We have of course had an inkling of these factors (such as water and protein dynamics) for a while now, but we haven’t really had a good theoretical framework for understanding the interactions. We can wave this objection away by saying that sure, we have a theoretical framework – that of quantum mechanics and statistical mechanics – but that would be little more than a homage to strong reductionism. The problem is that we still don’t have a handle on the quantitative contribution of various factors to protein-small molecule binding. Until we have this conceptual understanding, the simulation of such interactions is bound to suffer. And most importantly, until we have such understanding, what we really need is not simulation but improved instrumental and analytical techniques that enable us to measure even simple things like molecular concentrations and the kinetics of binding. Once we get an idea of these parameters using good tools, we can start incorporating them into modeling frameworks.
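As a concrete illustration of why measured kinetics matter so much to modelers, the standard textbook relations (nothing here is specific to any one system) connect the rate constants one can actually measure to the free energy one would like to simulate:

```latex
K_d = \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}}, \qquad
\Delta G^{\circ}_{\mathrm{bind}} = RT \ln \frac{K_d}{c^{\circ}}
```

Here $k_{\mathrm{on}}$ and $k_{\mathrm{off}}$ are the association and dissociation rate constants and $c^{\circ}$ is the standard concentration. Good measurements of exactly these quantities are the tool-generated parameters that any credible modeling framework would have to reproduce.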

Now the brain project is indeed working on tools too, but reports like the current one ask whether we need to predominantly focus on those tools and perhaps divert some of the money and attention from the simulation aspects of the project to the tool-driven aspects. The message from the current status report is ultimately simple: we need to first stand before we can run.


Pauling vs Woodward: It's on

Some day they will hopefully make a video about this along the lines of Einstein vs Hawking or Hayek vs Keynes. For now an imperfect beginning will have to suffice (with apologies to Gilbert and Sullivan, of course. Oh, and Tom Lehrer too).

Pauling:

I am the very model of the greatest chemist ever...
I've information vegetable, animal, and mineral,
I know the bonds of aluminum, and I quote the distances dihedral,
From quantum to quark, in order categorical;
I'm very well acquainted too with matters mathematical,
I understand equations, both the simple and quadratical,
About the binomial theorem I'm teeming with a lot o' news,
And with many cheerful facts about the square of the hypotenuse.
 
Woodward:
I am the very model of the greatest chemist ever...
You may have won two Nobel Prizes
But look at me, I am full of architectural surprises
I make chlorophyll out of tar and sand,
And assemble ferrocene with my left hand.
On nitric acid I can lecture for three hours hard,
And inspire a new unit of time, the venerable milli-Woodward.
You may think you know calculus and all things quantum,
But you don't know the Woodward-Hoffmann rules (trust me, you'll want 'em). 

Molecules, software and function: Why synthesis is no longer chemistry's outstanding problem.

R B Woodward and his followers have solved the general
problem of organic synthesis. We need to solve the general
problem of design and function.
Yesterday's post by Derek titled 'The End of Synthesis' (follow-up here) ruffled quite a few feathers. But I suspect one of the reasons it did so is that it actually hits too close to home for those who are steeped in the art and science of organic synthesis. The paper the post talks about, by chemist Martin Burke in Science, may or may not be the end of synthesis, but it may well be the beginning of the end, at least symbolically. As much as it might be unpalatable to some, the truth is that synthesis is no longer chemistry’s outstanding general problem.

Synthesis was the great philosophical question of the twentieth century, not the twenty-first. As Derek pointed out, in the 50s it was actually not clear at all that a molecule like strychnine, with its complex shape and stereochemistry, could even be made. But now there is little doubt that given enough manpower (read graduate students and postdocs) and money, basically anything that you can sketch on paper can be made. Now I am certainly not claiming that synthesizing a complex natural product with fifty rotatable bonds and twenty chiral centers is even today a trivial task. There is still a lot of challenge and beauty in the details. I am also not saying that synthesis will cease to be a fruitful source of solutions for humanity’s most pressing problems, such as disease or energy; as a tool the importance of synthesis will remain undiminished. What I am saying is that the general problem of synthesis has now been solved in an intellectual sense (as an aside, this would be consistent with the generally pessimistic outlook regarding total synthesis seen on many blogs, including Derek's). And since the general problem has been solved, it's not too audacious to think that it might lend itself to automation. Once you have a set of formulas for making a molecule, it is not too hard to imagine that machines might be able to bring about specific instantiations of those formulas, even if you will undoubtedly have to work out the glitches along the way.

I always think of software when I think about the current state of synthesis. When software was being developed in the 40s and 50s, it took the ingenuity of a John von Neumann or a Grace Hopper to figure out how to encode a set of instructions in the form of a protocol that a machine could understand and implement. But what it took von Neumann to do in the 50s, it takes a moderately intelligent coder from India or China to do in 2014. Just as the massive outsourcing of software was a good example of software's ability to be commoditized and automated, so is the outsourcing of large swathes of organic synthesis a telling sign of its future in automation. Unlike software, synthesis is not quite there yet, but given the kind of complexity that it can create on demand, it will soon be.

To see how we have gotten here it's worth taking a look at the history, and this history contains more than a nugget of comparison between synthesis and computer science. In the 1930s the general problem was unsolved. It was also unsolved in the 50s. Then Robert Burns Woodward came along. Woodward was a wizard who made molecules whose construction had previously defied belief. He had predecessors, of course, but it was Woodward who solved the general problem by proving that one could apply well-known principles of physical organic chemistry, conformational analysis and stereochemical control to essentially synthesize any molecule. He provided the definitive proof of principle. All that was needed after that was enough time, effort and manpower. Woodward was followed by others like Corey, Evans, Stork and now Phil Baran, and all of them demonstrated the facile nature of the general problem.

If chemistry were computer science, then Woodward could be said to have created a version of the Turing Machine: a general formula that could allow you to synthesize any complex molecule, as long as you had enough NIH funding and cheap postdocs to fill in the specific gaps. Every synthetic chemist who came after Woodward has really developed his or her own special version of Woodward’s recipe. They might have built new models of cars, but their Ferraris, Porsches and Bentleys – as elegant and impressive as they are – are a logical extension of Woodward and his predecessors’ invention of the internal combustion engine and the assembly line. And it is this Turing Machine-like nature of synthetic schemes that lends them first to commoditization, and then to automation. The human component is still important and always will be, but the proportion of that creative human contribution is definitely changing.

A measure of how the general problem of synthesis has been solved is readily apparent to me in my own small biotech company, which specializes in cyclic peptides, macrocycles and other complex bioactive molecules. The company has a vibrant internship program for undergraduates in the area. To me the most remarkable thing is to see how quickly the interns can bring themselves up to speed on the synthetic protocols. Within a month or so of starting at the bench they start churning out these compounds with the same expertise and efficiency as chemists with PhDs. The point is, synthesizing a 16-membered ring with five stereocenters has not only become a routine, high-throughput task but is something that can be picked up by a beginner in a month. This kind of synthesis might have easily fazed a graduate student in Woodward's group forty years ago and taken up a good part of his or her PhD project. The bottom line is that we chemists now have to face an uncomfortable fact: there are still a lot of unexpected gems to be found in synthesis, but the general problem is now solved and the incarnation of chemical synthesis as a tool for other disciplines is now essentially complete.

Functional design and energetics are now chemistry’s outstanding general problems

So if synthesis is no longer the general problem, what is? George Whitesides has held forth on this question in quite a few insightful articles, and my own field of medicinal chemistry and molecular modeling provides a good example. It may be easy to synthesize a highly complex drug molecule using routine techniques, but it is impossible, even now, to calculate the free energy of binding of an arbitrary simple small molecule to an arbitrary protein. There is simply no general formula, no Turing Machine, that can do this. There are of course specific cases where the problem can be solved, but the general solution seems light-years away. And not only is the problem unsolved in practice but it is also unsolved in principle. It's not a question of manpower and resources; it's a question of basic understanding. Sure, we modelers have been saying for over twenty years that we have not been able to calculate entropy or to account for tightly bound water molecules. But these are mostly convenient explanations which, when enunciated, make us feel more emotionally satisfied. There have certainly been some impressive strides in addressing each of these and other problems, but the fact is that when it comes to calculating the free energy of binding, we are still today where we were in 1983. So yes, the calculation of free energies – for any system – is certainly a general problem that chemists should focus on.
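The quantity at issue can be written down in one line of standard thermodynamics; the difficulty lies entirely in evaluating its pieces:

```latex
\Delta G^{\circ}_{\mathrm{bind}}
  = \Delta H^{\circ}_{\mathrm{bind}} - T\,\Delta S^{\circ}_{\mathrm{bind}}
  = -RT \ln K_a
```

Each term on the right hides exactly the hard sub-problems mentioned above – desolvation enthalpy, conformational and solvent entropy, the fate of individual bound waters – and there is no general recipe for computing any of them for an arbitrary protein-ligand pair.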

But here’s the even bigger challenge that I really want to talk about: We chemists have been phenomenal in being able to design structure, but we have done a pretty poor job in designing function. We have of course determined the function of thousands of industrial and biological compounds, but we are still groping in the dark when it comes to designing function. An example from software would be designing an emergency system for a hospital: there the real problem is not writing the code but interfacing the code with the several human, economic and social factors that make the system successful. 

Here are a few examples from chemistry: Through combinatorial techniques we can now synthesize antibodies that we want to bind to a specific virus or molecule, but the very fact that we have to adopt a combinatorial, brute force approach means that we still can’t start from scratch and design a single antibody with the required function (incidentally this problem subsumes the problem of calculating the free energy of antigen-antibody binding). Or consider solar cells. Solid-state and inorganic chemists have developed an impressive array of methods to synthesize and characterize various materials that could serve as more efficient solar materials. But it’s still very hard to lay out the design principles – in general terms – for a solar material with specified properties. In fact I would say that the ability to rapidly make molecules has even hampered the ability to think through general design principles. Who wants to go to the trouble of designing a specific case when you can simply try out all combinations by brute force?

I am not taking anything away from the ingenuity of chemists – nor am I refuting the belief that you do whatever it takes to solve the problem – but I do think that in their zeal to perfect the art of synthesis, chemists have neglected the art of de novo design. Yet another example is self-assembly, a phenomenon which operates in everything from detergent action to the origin of life. Today we can study the self-assembly of diverse organic and inorganic materials under a variety of conditions, but we still haven’t figured out the rules – either computational or experimental – that would allow us to specify the forces between multiple interacting partners so that these partners assemble in the desired geometry when brought together in a test tube. Ideally what we want is the ability to come up with a list of parts and the precise relationships between them that would allow us to predict the end product in terms of function. This would be akin to what an architect does when he puts together a list of parts that allows him to predict not only the structure of a building but also the interplay of air and sunlight in it.

I don’t know what we can do to solve this general problem of design but there are certainly a few promising avenues. A better understanding of theory is certainly one of them. The fact is that when it comes to estimating intermolecular interactions, the theories of statistical thermodynamics and quantum mechanics do provide – in principle – a complete framework. Unfortunately these theories are usually too computationally expensive to apply to the vast majority of situations, but we can still make progress if we understand what approximations work for what kind of systems. Psychologically I do think that there has to be a general push away from synthesis and toward understanding function in a broad sense. Synthesis still rules chemical science and for good reason; it's what makes chemistry unique among the sciences. But that also often makes synthetic chemists immune to the (well deserved) charms of conformation, supramolecular interactions and biology. It’s only when synthetic chemists seamlessly integrate themselves into the end stages of their day job that they will learn better to appreciate synthesis as an opportunity to distill general design principles. Part of this solution will also be cultural: organic synthesis has long enjoyed a cultish status which still endures. 

However, the anguish should not obscure the opportunity here. The solution of the general problem of synthesis and its possible automation should not leave chemists chagrined, since it's really a tribute to the amazing success that organic synthesis has engendered in the last thirty years. Instead of reinventing ourselves as synthetic chemists, let's retool. Let the synthetic chemist interact with the physical biochemist, the structural engineer, the photonics expert; let him or her see synthesis through the requirement of function rather than structure. The synthesis is there; the other features are not. Whitesides was right when he said that chemists need to broaden out, but another way to interpret his statement would be to ask other scientists to channel their thoughts into synthesis in a feedback process. As chemists we have nailed structure, but nailing design will bring us untold dividends and will enormously enhance the contribution of chemistry to our world.
 

Adapted from a previous post on Scientific American Blogs.

Memo to chemists: Move away from the molecule

Derek's latest post on the 'end of synthesis' reminded me of this perspicacious essay by George Whitesides which I had blogged about on Scientific American last year. I am reposting the piece here.
Harvard chemist George Whitesides probably does not consider himself a philosopher of chemistry, but he is rapidly turning into one with his thought-provoking pronouncements on the future of the field and its practitioners. In 2013 he wrote a rumination in the Annual Review of Analytical Chemistry provocatively titled “Is the Focus on Molecules Obsolete?” in which he uses analytical chemistry as an excuse to really pontificate on the state and progress of chemical science. Along the way he also has some valuable words of advice for aspiring chemists.
Whitesides’s main message to young chemists is to stop focusing on molecules. Given the nature of chemistry this advice may seem strange, even blasphemous. After all, it’s the molecule that has always been the heart and soul of chemical science. And for chemists, the focus on molecules has manifested itself through two important activities – structure determination and synthesis. The history of chemistry is essentially the history of finding out the structure of molecules and of developing new and efficient methods of making them. Putting these molecules to new uses is what underpins our modern world, but it was really a secondary goal for most of chemistry’s history. Whitesides tells us that the focus of the world’s foremost scientific problems is moving away from composition to use, from molecules to properties. Thus the new breed of chemists should really focus on creating properties rather than on creating molecules. The vehicle for Whitesides’s message is the science and art of analytical chemistry, which has traditionally dealt with developing new instrumentation and methods for analyzing the structure and properties of molecules.
Of course, since properties depend on structures, Whitesides is not telling us to abandon our search for better, cleaner and more efficient techniques of synthesis. Rather, I see what he is saying as a kind of “platform independence”. Let’s take a minute to talk about platform independence. As the physicist Leo Kadanoff has demonstrated, you can build a computer by moving around 1s and 0s or by moving around buckets of water, with full buckets essentially representing 1s and empty ones representing 0s. Both models can give rise to computing. Just like 1s and 0s simply turn out to be convenient abstract moving parts for building computers, similarly a certain kind of molecule should be seen as no more than a convenient vehicle for creating a particular property. That property can be anything from “better stability in whole blood” to “efficient capture of solar energy” to “tensile strength”. The synthesis of whatever molecular material gives rise to particular properties is important, but it should be secondary; a convenient means to an end that can be easily replaced with another means. As an example from his own childhood, Whitesides describes a project carried out in his father’s company in which his job was to determine the viscosities of different coal-tar blacks. The exact kind of coal-tar black was important, but what really counted was the property – viscosity – and not the molecular composition.
A focus on properties is accompanied by one on molecular systems, since often it’s a collection of different, diverse molecules rather than of a single type that gives rise to a desired property. What kind of problems will benefit from a molecular systems approach? Whitesides identifies four critical ones: health care, environmental management, national security and megacity management. We have already been living with the first three challenges, and the fourth one looms large on the horizon.
Firstly, health care. Right now most of the expenditure on health care, especially in the United States, is on end-of-life care. Preventative medicine and diagnostics are still relegated to the sidelines. One of the most important measures to drive down the cost of healthcare will be to focus on prevention, thus avoiding the expensive, all-out war that is often waged – and lost – on diseases like cancer during their end stages. Prevention and diagnostics are areas where chemistry can play key roles. We still lack methods that can quickly and comprehensively analyze disease markers in whole blood, and this is an area where analytical and other kinds of chemists can have a huge impact. And no method of diagnostics is going to be useful if it’s not cheap, so it’s obvious that chemistry will also have to struggle to minimize material cost, another goal which it has traditionally been good at addressing, especially in industry.
Secondly, the environment. We live in an age when the potentially devastating effects of climate change and biodiversity loss demand quick and comprehensive action. Included in this response will be the ability to monitor the environment, and to relate local monitoring parameters to global ones. Just like we still lack methods to analyze the composition of complex whole blood, we also lack methods to quickly analyze and compare the composition of the atmosphere, soil and seawater in different areas of the world. Analyzing heterogeneous systems with different phases like the atmosphere is a tricky and quintessentially chemical problem, and chemists have their work cut out in front of them to make such routine analysis a reality.
Thirdly, national security. Here chemists will face even greater challenges, since the solutions are as much political and social as they are scientific. Nonetheless, science will play an important role in the resolution of scores of challenges that have to be met to make the world more secure; these include quickly analyzing the composition of a suspicious liquid, solid or gas, unintrusively finding out whether a particular individual has spent time in certain volatile parts of the world or has been handling certain materials, and using techniques to track the movements of suspicious individuals in diverse locations. Chemistry will undoubtedly have to interface with other disciplines in addressing these problems and questions of privacy will be paramount, but there is little doubt that chemists have traditionally not participated much in such endeavors and need to step up to the plate in order to address what are obviously important security issues.
Fourthly, megacities. As we pick up speed and move into the second decade of the twenty-first century, one of the greatest social challenges confronting us is how to have very large, heterogeneous populations, ranging across diverse levels of income and standards of living, co-existing in peace over vast stretches of land. This is the vision of the megacity, whose first stirrings we are already witnessing around the world. Among the problems that megacities will encounter will be monitoring air, water and food quality (vide supra). A task like analyzing the multiple complex components of waste effluent, preferably with a readout that quantifies each component and assesses basic qualities like carcinogenicity, would be invaluable. There is no doubt that chemists could play an indispensable role in meeting such challenges.
The above discussion of major challenges makes Whitesides’s words about moving away from the molecule clear. The problems encompassing health care, national security and environmental and megacity management involve molecules, but they are really collages resulting from the interaction of molecules with other scientific entities, and from the interaction of chemists with many other kinds of professional scientists and policy makers. In one sense Whitesides is simply asking chemists to leave the familiar environment of their provincial roots and diversify. What chemists really need to think of is molecules embedded in a broad context involving other disciplines and human problems.
Part of the challenge of addressing the above issues will be the proper training of chemists. The intersection of chemistry with social issues and public policy demands interdisciplinary and general skills, and Whitesides urges chemists to be trained in general areas rather than specialized subfields. Courses in applied mathematics and statistics, public policy, urban planning, healthcare management and environmental engineering are traditionally missing from chemistry curricula, and chemists should branch out and take as many of these as is possible within a demanding academic environment. It is no longer sufficient for chemists to limit themselves to analysis and synthesis if they want to address society’s most pressing problems. And at the end of it they need not feel that a movement away from the molecule is tantamount to abandoning the molecule; rather it is an opportunity to press the molecule into interacting with the human world on a canvas bigger than ever before.

Three reasons why nobody should be surprised by junk DNA

Science writer Carl Zimmer just wrote an article in the New York Times about the junk DNA wars and it's worth a read. Zimmer presents conflicting expert opinions about the definition of 'junk' and the exact role that junk DNA plays in various genetic processes. As he makes clear, the conversation is far from over. But personally I have always thought that the whole debate has been a bit of an unnecessary dustup (although the breathless media coverage of ENCODE didn't help). What surprises me especially is that people are surprised by junk DNA.
Even to someone like me who is not an expert, the existence of junk DNA appears perfectly normal. Let me explain. I think that junk DNA shouldn’t shock us at all if we accept the standard evolutionary picture. The standard evolutionary picture tells us that evolution is messy, incomplete and inefficient. DNA consists of many kinds of sequences. Some sequences have a bona fide biological function in that they are transcribed and then translated into proteins that have a clear physiological role. Then there are sequences which are only transcribed into RNA that doesn’t do anything. There are also sequences which are only bound by DNA-binding proteins (which was one of the definitions of “functional” the ENCODE scientists subscribed to). Finally, there are sequences which don’t do anything at all. Many of these sequences consist of pseudogenes, transposons, and defective and dysfunctional genes from viruses and other genetic flotsam, inserted into our genome over our long, imperfect and promiscuous genetic history. If we can appreciate that evolution is a flawed, piecemeal, inefficient and patchwork process, we should not be surprised to find this diversity of sequences, with varying degrees of function or with no function at all, in our genome.
The reason why most of these useless pieces have not been weeded out is simply that there was no need to. We should remember that evolution does not work toward a best possible outcome; it can only do the best with what it already has. It’s too much of a risk and too much work to get rid of all these defective and non-functional sequences if they aren’t a burden; the work of simply duplicating these sequences is much less than that of getting rid of them. Thus the sequences hung around in our long evolutionary history and got passed on. The fact that they may not serve any function at all would be perfectly consistent with a haphazard natural mechanism depending on chance and the tacking on of non-functionality to useful functions simply as extra baggage.
There are two other facts in my view which should make it very easy for us to accept the existence of junk DNA. Consider first that the salamander genome is ten times the size of the human genome. This implies one of two possibilities: either salamanders have ten times as much functional DNA as we do, or the main difference between us and salamanders is that they carry much more junk DNA. Wouldn’t the complexity of salamander anatomy and physiology be vastly different if they really had so much more functional DNA? On the contrary, wouldn’t the relative simplicity of salamanders compared to humans be much more consistent with just varying amounts of junk DNA? Which explanation sounds more plausible?
The third reason for accepting the reality of junk DNA comes from simply thinking about mutational load. Our genomes, like those of other organisms, have undergone vast numbers of mutations during evolution. What would the consequences be if 90% of our genome were really functional and had undergone those mutations? How would we have survived and flourished under such a high burden of deleterious mutations? It’s much simpler to understand our survival if we assume that most mutations that happen in our genome land in junk DNA.
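The mutational-load argument can be made concrete with a back-of-the-envelope calculation. A minimal sketch, where the mutation rate (roughly 70 new mutations per human genome per generation is a commonly cited ballpark) and the two functional fractions are illustrative assumptions, not measured values:

```python
# Back-of-the-envelope mutational load; all numbers are illustrative.
# Assumption: ~70 new point mutations per genome per generation.
new_mutations_per_generation = 70

for functional_fraction in (0.9, 0.1):
    # Expected mutations landing in functional DNA each generation,
    # assuming mutations fall uniformly across the genome.
    hits = new_mutations_per_generation * functional_fraction
    print(f"functional fraction {functional_fraction:.0%}: "
          f"~{hits:.0f} new mutations per generation hit functional DNA")
```

If 90% of the genome were functional, most new mutations would strike functional sequence every generation; if only 10% is functional, the vast majority fall harmlessly into junk, which is the point of the argument.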
To summarize, then, we should really be surprised to find someone who says they are surprised by junk DNA. Even a non-expert like me can think of at least three simple reasons to like junk DNA:
1. The understanding that evolution is an inherently messy and inefficient process that often produces junk. This junk may be retained if it’s not causing trouble.
2. The realization that the vast differences in genome sizes are much better explained by junk DNA than by assuming that most DNA is truly functional.
3. The understanding that mutational loads would be prohibitive had most of our DNA not been junk.
Finally, as a chemist, let me say that I don’t find the binding of DNA-binding proteins to random, non-functional stretches of DNA surprising at all. That hardly makes these stretches physiologically important. If evolution is messy, chemistry is equally messy. Molecules stick to many other molecules, and not every one of these interactions has to lead to a physiological event. DNA-binding proteins that have evolved to bind specific DNA sequences would be expected to have some affinity for non-specific sequences just by chance: a negatively charged group could interact with a positively charged one, an aromatic ring could insert between DNA base pairs, and a greasy side chain might nestle into a pocket by displacing water molecules. It was a pity the authors of ENCODE decided to define biological functionality partly in terms of chemical interactions which may or may not be biologically relevant.
The dustup from the ENCODE findings suggests that scientists continue to find order and purpose in an orderless and purposeless universe which can nonetheless produce structures of great beauty. They would like to find a purpose for everything in nature and are constantly looking for the signal hidden in the noise. Such a quest is consistent with our ingrained sense of pattern recognition and has often led to great discoveries. But the stochastic, contingent, haphazard meanderings of nature mean that sometimes noise is just that, noise. It’s a truth we must accept if we want to understand nature as she really is.

Adapted from a previous post on Scientific American Blogs.

The data junkie and the lamp post: Cancer, genomics and technological solutionism

The Cancer Genome Atlas
The other day I was having a nice discussion with a very knowledgeable colleague about advances in genomic sequencing and how they are poised to transform the way we acquire and process genetic information in biology, biochemistry and medicine. The specific topic under consideration was the bevy of sequencing companies showcasing their wares at the Advances in Genome Biology and Technology 2015 conference. Several companies, Oxford Nanopore among them, are in an intense race to prove whose sequencing technology can become the next Illumina, and it's clear that much fame and fortune lie ahead for whoever gets there first. It's undoubtedly true that technological developments in this field are going to have enormous and uncertain ramifications for all kinds of disciplines as well as potentially for our way of life.
And yet as I mull over these issues I am reminded of an article by MIT biologist Michael Yaffe in the journal Science Signaling which warns against what the technology critic Evgeny Morozov has called “technological solutionism”: the tendency to define problems primarily or purely by whether a certain technology can address them. This is a worrying trend, since it foreshadows a future in which problems are prioritized not by their social or political importance but by how easily they succumb under the blade of well-defined, readily available technological solutions. Morozov’s solutionism is a more sophisticated version of the adage about everything looking like a nail when you have a hammer. But it’s all too real in this age of accelerated technological development, when technology advances much faster than we can catch up with its implications. It’s a problem that only threatens to grow.
In his piece Yaffe alerts us to the pitfalls of somewhat mindlessly applying genomic sequencing to discovering the basis and cure for cancer and succumbing to such solutionism in the process. One of the great medical breakthroughs of the twentieth century was the finding that cancer is in its heart and soul a genetic disease. This finding was greatly bolstered by the discovery of specific genes (oncogenes and tumor suppressor genes) which when mutated greatly increase the probability and progress of the disease. The availability of cheap sequencing techniques in the latter half of the century gave scientists and doctors what seemed to be a revolutionary tool for getting to the root of the genetic basis of cancer. Starting with the great success of the human genome project, it became increasingly easier to sequence entire genomes of cancer patients to discover the mutations that cause the disease. Scientists have been hopeful since then that sequencing cancer cells from hundreds of patients would enable them to discover new mutations which in turn would point to new potential therapies.
But as Yaffe points out, this approach has often ended up confining the search for true insights into cancer to the application of one specific technology – genomics – as the probe of the disease's complexities. And as he says, this is exactly like the drunk looking under the lamppost, not because that’s where his keys really are but because that’s where the light is. In this case the real basis for cancer therapy is the keys; sequencing is the light. During the last few years there have been several significant studies on major cancers like breast, colorectal and ovarian cancer which have sought to sequence cancer cells from hundreds of patients. This information has been incorporated into The Cancer Genome Atlas, an ambitious effort to chart and catalog all the significant mutations that every important cancer can possibly accrue.
But these efforts have largely ended up finding more of the same. The Cancer Genome Atlas is a very significant repository, but it may end up accumulating data that’s irrelevant for actually understanding or curing cancer. Yaffe acknowledges this fact and expresses thoughtful concerns about the further expenditure of funds and effort on massive cancer genome sequencing at the expense of other potentially valuable projects.
So far, the results have been pretty disappointing. Various studies on common human tumors, many under the auspices of The Cancer Genome Atlas (TCGA), have demonstrated that essentially all, or nearly all, of the mutated genes and key pathways that are altered in cancer were already known…Despite the U.S. National Institutes of Health (NIH) spending over a quarter of a billion dollars (and all of the R01 grants that are consequently not funded to pay for this) and the massive data collection efforts, so far we have learned little regarding cancer treatment that we did not already know. Now, NIH plans to spend millions of dollars to massively sequence huge numbers of mouse tumors!
It’s pretty clear that while there has been valuable data gathered from sequencing these patients, almost none of it has led to novel insights. Why, then, do the NIH and researchers continue to focus on raw, naked sequencing? Enter the data junkie and the lamppost:
I believe the answer is quite simple: We biomedical scientists are addicted to data, like alcoholics are addicted to cheap booze. As in the old joke about the drunk looking under the lamppost for his lost wallet, biomedical scientists tend to look under the sequencing lamppost where the “light is brightest”—that is, where the most data can be obtained as quickly as possible. Like data junkies, we continue to look to genome sequencing when the really clinically useful information may lie someplace else.
The term “data junkie” conjures up images of the quintessential chronically starved, slightly bug-eyed nerd hungry for data who does not quite realize the implications or the wisdom of simply churning information out from his fancy sequencing machines and computer algorithms. The analogy has more than a shred of truth to it, since it speaks to something all of us are in danger of becoming: data enthusiasts who generate information simply because they can. This would be technological solutionism writ large; turn every cancer research and therapeutics problem into a sequencing problem because that’s what we can do cheaply and easily.
Clearly this is not a feasible approach if we want to generate real insights into cancer behavior. Sequencing will undoubtedly continue to be an indispensable tool but as Yaffe points out, the real action takes place at the level of proteins, in the intricacies of the signaling pathways involving hundreds of protein hubs whose perturbation is key to a cancer cell’s survival. When drugs kill cancer cells they don’t target genes, they directly target proteins. Yaffe mentions several recent therapeutic discoveries which were found not by sequencing but by looking at the chemical reactions taking place in cancer cells and targeting their sources and products; essentially by adopting a protein-centric approach instead of a gene-centric one. Perhaps we should re-route some of those resources which we are using for sequencing into studying these signaling proteins and their interdependencies:
These therapeutic successes may have come even faster, and the drugs may be more effectively used in the future, if cancer research focuses on network-wide signaling analysis in human tumors (20), particularly when coupled with insights that the TCGA sequencing data now provide. Currently, signaling measurements are hard, not particularly suited for high-throughput methods, and not yet optimized for use in clinical samples. Why not invest in developing and using technologies for these signaling directed studies?
In other words, why not ask the drunk to buy a lamp and install it in another part of town where his keys are more likely to be located? It’s a cogent recommendation. But it’s important not to lose sight of the larger implications of Yaffe’s appeal to explore alternative paradigms for finding effective cures for cancers. In one sense he is directly speaking to the love affair with data and new technology that seems to be increasingly infecting the minds and hearts of the new generation. Whether it’s cancer researchers hoping that sequencing will lead to breakthroughs or political commentators hoping that Twitter and Facebook will help bring democracy in the Arab world, we are all in danger of being sucked into the torrent of technological solutionism. Against this we must be eternally vigilant.

Adapted from a previous post on Scientific American Blogs.

Physicists in biology, inverse problems and other quirks of the genomic age

Nobel Laureate Sydney Brenner has criticized systems biology as a grandiose attempt to solve inverse problems in biology
Leo Szilard – brilliant, peripatetic Hungarian physicist, habitué of hotel lobbies, soothsayer without peer – first grasped the implications of a nuclear chain reaction in 1933 while stepping off the curb at a traffic light in London. Szilard has many distinctions to his name; not only did he file a patent for the first nuclear reactor with Enrico Fermi, but he was the one who urged his old friend Albert Einstein to write a famous letter to Franklin Roosevelt, and also the one who tried to get another kind of letter signed as the war was ending in 1945; a letter urging the United States to demonstrate a nuclear weapon in front of the Japanese before irrevocably stepping across the line. Szilard was successful in getting the first letter signed but failed in his second goal.
After the war ended, partly disgusted by the cruel use to which his beloved physics had been put, Szilard left professional physics to explore new pastures – in his case, biology. But apart from the moral abhorrence which led him to switch fields, there was a more pragmatic reason. As Szilard put it, this was an age when you took a year to discover something new in physics but only took a day to discover something new in biology.
This sentiment drove many physicists into biology, and the exodus benefited biological science spectacularly. Compared to physics, whose basic theoretical foundations had matured by the end of the war, biology was uncharted territory. The situation in biology was similar to that during the heyday of physics right after the invention of quantum theory when, as Paul Dirac quipped, “even second-rate physicists could make first-rate discoveries”. And physicists took full advantage of this situation. Since Szilard, biology in general and molecular biology in particular have been greatly enriched by the presence of physicists. Today, any physics student mulling a move into biology stands on the shoulders of illustrious forebears including Szilard, Erwin Schrödinger, Francis Crick, Walter Gilbert and most recently, Venki Ramakrishnan.
What is it that draws physicists to biology and why have they been unusually successful in making contributions to it? The allure of understanding life which attracts other kinds of scientists is certainly one motivating factor. Erwin Schrödinger, whose little book “What is Life?” propelled many, including Jim Watson and Francis Crick, into genetics, is one example. Then there is the opportunity to simplify an enormously complex system into its constituent parts, an art which physicists have excelled at since the time of the Greeks. Biology – and especially the brain – is the ultimate complex system, and physicists are tempted to apply their reductionist approaches to deconvolute this complexity. Thirdly, there is the practical advantage that physicists enjoy: a capacity to apply experimental tools like x-ray diffraction, along with quantitative reasoning and mathematical and statistical tools, to make sense of biological data.
The rise of the data scientists
It is this third reason that has led to a significant influx of not just physicists but other quantitative scientists, including statisticians and computer scientists, into biology. The rapid development of the fields of bioinformatics and computational biology has led to a great demand for scientists with the quantitative skills to analyze large amounts of data. A mathematical background brings valuable skills to this endeavor and quantitative, data-driven scientists thrive in genomics. Eric Lander for instance got his PhD in mathematics at Oxford before – driven by the tantalizing goal of understanding the brain – he switched to biology. Cancer geneticist Bert Vogelstein also has a background in mathematics. All of us are familiar with names like Craig Venter, Francis Collins and James Watson when it comes to appreciating the cracking of the human genome, but we need to pay equal attention to the computer scientists without whom crunching and combining the immense amounts of data arising from sequencing would have been impossible. There is no doubt that, after the essentially chemically driven revolution in genetics of the 70s, the second revolution in the field has been engineered by data crunching.
So what does the future hold? The rise of the “data scientists” has led to the burgeoning field of systems biology, a buzzword that seems to proliferate faster than actual understanding of it. Systems biology seeks to integrate different kinds of biological data into a broad picture using tools like graph theory and network analysis. It promises to potentially provide us with a big-picture view of biology like no other. Perhaps, physicists think, we will have a theoretical framework for biology that does what quantum theory did for, say, chemistry.
Emergence and systems biology: A delicate pairing
And yet even as we savor the fruits of these higher-level approaches to biology, we must be keenly aware of their pitfalls. One of the fundamental truths about the physicists’ view of biology is that it is steeped in reductionism. Reductionism is the great legacy of modern science which saw its culmination in the two twentieth-century scientific revolutions of quantum mechanics and molecular biology. It is hard to overstate the practical ramifications of reductionism. And yet as we tackle the salient problems in twenty-first century biology, we are becoming aware of the limits of reductionism. The great antidote to reductionism is emergence, a property that renders complex systems irreducible to the sum of their parts. In 1972 the Nobel Prize winning physicist Philip Anderson penned a remarkably far-reaching article titled “More Is Different” which explored the inability of “lower-level” phenomena to predict their “higher-level” manifestations.
The brain is an outstanding example of emergent phenomena. Many scientists think that neuroscience is going to be to the twenty-first century what molecular biology was to the twentieth. For the first time in history, partly through recombinant DNA technology and partly due to state-of-the-art imaging techniques like functional MRI, we are poised on the brink of making major discoveries about the brain; no wonder that Francis Crick moved into neuroscience during his later years. But the brain presents a very different kind of challenge than that posed by, say, a superconductor or a crystal of DNA. The brain is a highly hierarchical and modular structure, with multiple dependent and yet distinct layers of organization. From the basic level of the neuron we move on to collections of neurons and glial cells which behave very differently, onward to specialized centers for speech, memory and other tasks, and finally to the whole brain. As we move up this ladder of complexity, emergent features arise at every level whose behavior cannot be gleaned merely from the behavior of individual neurons.

The tyranny of inverse problems
This problem of emergence thwarts systems biology in general. In recent years, some of the most insightful criticism of systems biology has come from Sydney Brenner, a founding father of molecular biology whose 2010 piece in Philosophical Transactions of the Royal Society titled “Sequences and Consequences” should be required reading for those who think that systems biology’s triumph is just around the corner. In his essay, Brenner strikes at what he sees as the heart of the goal of systems biology. After reminding us that the systems approach seeks to generate viable models of living systems, Brenner goes on to say:
“Even though the proponents seem to be unconscious of it, the claim of systems biology is that it can solve the inverse problem of physiology by deriving models of how systems work from observations of their behavior. It is known that inverse problems can only be solved under very specific conditions. A good example of an inverse problem is the derivation of the structure of a molecule from the X-ray diffraction pattern of a crystal…The universe of potential models for any complex system like the function of a cell has very large dimensions and, in the absence of any theory of the system, there is no guide to constrain the choice of model.”
What Brenner is saying is that every systems biology project essentially results in a model, a model that tries to solve the problem of divining reality from experimental data. However, a model is not reality; it is an imperfect picture of reality constructed from bits and pieces of data. It is therefore – and this has to be emphasized – only one representation of reality. Other models might satisfy the same experimental constraints, and for systems with thousands of moving parts like cells and brains, the number of such models is astronomically large. In addition, biological measurements are often noisy, with large error bars, further complicating their use. This puts systems biology into the classic conundrum of the inverse problem that Brenner points out; as with other inverse problems, the solution you find is likely to be one among an expanding universe of solutions, many of which might be better than the one you have. This means that while models derived from systems biology might be useful – and often this is a sufficient requirement for using them – they may well leave out some important feature of the system.
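The non-uniqueness at the heart of the inverse problem is easy to demonstrate with a toy example, vastly simpler than any real cell model: below, two quite different parameter sets for a sum of two decaying exponentials (a form common in kinetics; all numbers are illustrative) produce curves that differ by only about a percent of the initial signal, well within typical biological error bars, so noisy data cannot distinguish the two models:

```python
import numpy as np

# Toy inverse problem: y(t) = a1*exp(-k1*t) + a2*exp(-k2*t)
def model(t, a1, k1, a2, k2):
    return a1 * np.exp(-k1 * t) + a2 * np.exp(-k2 * t)

t = np.linspace(0, 5, 50)

# Two deliberately different parameter sets...
y_A = model(t, 0.6, 1.0, 0.4, 2.0)
y_B = model(t, 0.5, 0.9, 0.5, 1.8)

# ...whose predicted curves are nearly indistinguishable
max_gap = np.max(np.abs(y_A - y_B))
print(f"largest difference between the two curves: {max_gap:.4f}")
```

Any fitting procedure handed noisy measurements of this curve could return either parameter set (or infinitely many others), which is exactly Brenner's point about the universe of potential models.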
There has been some very interesting recent work in addressing such conundrums. One of the major challenges in the inverse problem universe is to find a minimal set of parameters that can describe a system. Ideally the parameters should be sensitive to variation so that one constrains the parameter space describing the given system and avoids the "anything goes" trap. A particularly promising example is the use of 'sloppy models' developed by Cornell physicist James Sethna and others in which parameter combinations rather than individual parameters are varied and those combinations which are most tightly constrained are then picked as the 'right' ones.
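One way to see the 'sloppy' structure numerically is to examine the eigenvalues of J^T J, where J is the Jacobian of the model's predictions with respect to its parameters: a few stiff eigendirections (parameter combinations) are tightly constrained by the data while the rest are orders of magnitude softer. A minimal sketch in the same toy two-exponential spirit, not Sethna's actual code:

```python
import numpy as np

# Toy model: sum of two decaying exponentials with parameters p = (a1, k1, a2, k2)
def model(t, p):
    a1, k1, a2, k2 = p
    return a1 * np.exp(-k1 * t) + a2 * np.exp(-k2 * t)

t = np.linspace(0, 5, 50)
p0 = np.array([0.6, 1.0, 0.4, 2.0])  # illustrative parameter values

# Numerical Jacobian: sensitivity of the predicted curve to each parameter
eps = 1e-6
J = np.empty((t.size, p0.size))
for i in range(p0.size):
    dp = np.zeros_like(p0)
    dp[i] = eps
    J[:, i] = (model(t, p0 + dp) - model(t, p0 - dp)) / (2 * eps)

# Eigenvalues of J^T J: large = stiff (well-constrained) directions,
# small = sloppy (poorly constrained) directions
eigvals = np.sort(np.linalg.eigvalsh(J.T @ J))[::-1]
print("eigenvalue spread (stiffest / sloppiest):", eigvals[0] / eigvals[-1])
```

The wide spread between the largest and smallest eigenvalues is the signature of sloppiness: the data pin down only a few parameter combinations, which is why picking the tightly constrained combinations, rather than individual parameters, is the productive move.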

But quite apart from these theoretical fixes, Brenner’s remedy for avoiding the fallout from imperfect systems modeling is to simply use the techniques garnered from classical biochemistry and genetics over the last century or so. In one sense systems biology is nothing new; as Brenner tartly puts it, “there is a watered-down version of systems biology which does nothing more than give a new name to physiology, the study of function and the practice of which, in a modern experimental form, has been going on at least since the beginning of the Royal Society in the seventeenth century”. Careful examination of mutant strains of organisms, measurement of the interactions of proteins with small molecules like hormones, neurotransmitters and drugs, and observation of phenotypic changes caused by known genotypic perturbations remain tried-and-tested ways of drawing conclusions about the behavior of living systems on a molecular scale.
Genomics and drug discovery: Tread softly
This viewpoint is also echoed by those who take a critical view of what they say is an overly genomics-based approach to the treatment of diseases. A particularly clear-headed view comes from Gerry Higgs who in 2004 presciently wrote a piece titled “Molecular Genetics: The Emperor’s Clothes of Drug Discovery”. Higgs criticizes the whole gamut of genomic tools used to discover new therapies, from the “high-volume, low-quality sequence data” to the genetically engineered cell lines which can give a misleading impression of molecular interactions under normal physiological conditions. Higgs points to many successful drugs discovered in the last fifty years using the tools of classical pharmacology and biochemistry; these include the best-selling, Nobel Prize-winning drugs developed by Gertrude Elion and James Black on the basis of simple physiological assays. Higgs’s point is that the genomics approach to drugs runs the risk of becoming too reductionist and narrow-minded, often relying on isolated systems and artificial constructs that are uncoupled from whole systems. His prescription is not to discard these tools, which can undoubtedly provide important insights, but to supplement them with older and proven physiological experiments.
Does all this mean that systems biology and genomics would be useless in leading us to new drugs? Not at all. There is no doubt that genomic approaches can be remarkably useful in enabling controlled experiments. The systems biologist Leroy Hood for instance has pointed out how selective gene silencing can allow us to tease apart side-effects of drugs from beneficial ones. But what Higgs, Brenner and others are impressing upon us is that we shouldn’t allow genomics to become the be-all and end-all of drug discovery. Genomics should only be employed as part of a judiciously chosen cocktail of techniques including classical ones for interrogating the function of living systems. And this applies more generally to physics-based and systems biology approaches.

Perhaps the real problem from which we need to wean ourselves is “physics envy”; as the physicist-turned-financial modeler Emanuel Derman reminds us, “Just like physicists, we would like to discover three laws that govern ninety-nine percent of our system’s intricacies. But we are more likely to discover ninety-nine laws that explain three percent of our system”. And that’s as good a starting point as any.

Adapted from a previous post on Scientific American Blogs.