Field of Science

The virtualization of drug discovery is not the actualization of drug discovery

Automation in biology and drug development can design, execute, explore and analyze, but can it ask the right questions?
From Forbes comes this piece about VC firm Andreessen Horowitz and former Stanford biology/computer science professor Vijay Pande (who recently joined the firm as a partner), who are trying to get the most out of the "hacking drug discovery" paradigm.

The piece talks about the "virtualization" of several key aspects of biology and drug discovery, which would involve using software and smart automation to perform experiments. The areas that Horowitz and Pande are focusing on in particular involve cloud-based experiments, machine learning and lab automation. This is not the first time that Bay Area software entrepreneurs and scientists have taken notice of drug discovery. Peter Thiel, who has had some interesting thoughts on drug development, has already funded Emerald Cloud Labs (ECL), a venture that uses 15,000 sq ft of lab space packed with robots and automation to perform experiments that you can design and initiate with the click of a button on your laptop.

I am all for new approaches to drug discovery and especially ones that promise to make it more efficient, but as the piece notes, the optimism about applying software to drug development is rightly tempered by a recognition of the inherent messiness of biology and the vast gaps of ignorance that riddle our knowledge of the interaction between small molecules and living organisms. The article quotes several industry experts on the challenges that any kind of software-based drug development platform would face. Here's Mark Murcko for instance:

Can you solve the biology problem with the latest technology out on the cloud? I have not seen that. Every day, every company I work with is struggling with target validation, biomarkers and patient selection. Questions come up such as “‘I have a hit from a screen and I do not know what it does’ and ‘Which of these two targets (out of ten in total) do I pick for my next drug discovery project?’” Murcko said. All of this, Murcko said, gets into biology that is “half-right and half-wrong. For example, ‘I have to extrapolate from mouse data.’ Or ‘It is human genetic data but it’s from the germ line [e.g. from sperm or egg cells or their immediate progeny].’” So the data do not necessarily teach you what will happen if you shut down 80% of the activity of the same target in a 50-year-old patient."

And he's right. The kinds of questions that most people in drug discovery tackle are very messy and often quantitatively ill-defined. They deal with emergent biological organization and non-linear dose-response. It's one thing to be able to speed up the acquisition of data in such experiments; it's another to interpret that data or even to ask the right questions in the first place. Although I am all for automation and cloud-based analysis, these experiments by themselves are not going to resolve the fundamental challenges involved in getting to a new drug, nor are they going to account for unexpected events. One of the other experts quoted in the article, Nagesh Mahanthappa, puts this issue into perspective:

“You can automate an assay and be in love with the output. But if you bother to look you can find out that you have been grossly misled…These days, so much equipment is automated or semi-automated. They give you results in thirty minutes. But the results often sound like this: ‘The molecule inhibited signaling.’ You have to remember to ask in that case, are you sure that the molecule did not just kill all the cells? Or that the cells were not washed off the plates during a washing step?”

Consider Emerald Cloud Labs, for instance; their website lists dozens of experiments, ranging from flow cytometry to fluorescence microscopy, which you can remotely ask a robot to perform. Key biostructural techniques like crystallography and NMR spectroscopy are coming online by the end of the year. It's great that you can cheaply outsource such techniques from the comfort of your living room. But the problem, as anyone who has worked in the field knows, is that both the course and the output of these experiments are far from standard. No assay development project is the same as another, even for well-known targets, and the assays and biophysical characterization of every target and class of small molecule demand their own tweaking and come with their own idiosyncrasies and unexpected glitches. There is no doubt that a facility like Emerald Cloud Labs will speed up plain-vanilla experiments, but there is also little doubt in my mind that the automation that such a facility promises will be severely hampered by the project-specific human intervention that will be constantly demanded by the vagaries of drug discovery.

Some of the thinking in that article exemplifies what Derek Lowe has called the "Andy Grove fallacy", the belief that bringing computational thinking to biology will help us rapidly sort the wheat from the chaff and get to the right answer fast. It's the kind of thinking that a lot of Silicon Valley entrepreneurs who are steeped in the high success rate of software ventures are bringing to bear on the intricacies of biology. Unfortunately, as I mentioned before, the problem here is not speed or efficiency per se; it's asking the right questions in the first place. Very little of our ability to develop new drugs is constrained by speed; much of it is constrained by plain ignorance. You can hack together a car app given enough manpower, money and time because the goal is usually quite clear and the process highly deterministic. That's not the case with the emergent world of biology. There's not much point in doing something fast if you don't know whether that's the right thing to do. Blazing automation will help only if you are asking the right question and know how to interpret the answer.

All this being said, I am glad that people like Thiel, Andreessen, Horowitz and Pande are putting their own money into such investments and walking the talk. The real benefit of such ventures would be to push the boundaries of our thinking regarding the application of data science to biology, and even the ignorance that they would discover would be enlightening. At the very least, increased speed and automation would allow us to make mistakes faster and learn what doesn't work. And as anyone who has worked in the time- and money-constrained world of pharma knows, that's as good an asset to have as any other.

Update: Derek's take.


  1. Computer experiments include all the things you understood well enough to code into your program. Real experiments also include all the things you don't understand yet.

  2. "the "Andy Grove fallacy", the belief that bringing computational thinking to biology will help us rapidly sort the wheat from the chaff and get to the right answer fast"

    We won't know until we try. Though I'm sure you and Derek and others will continue to post these hand wringing comments every time someone does.

    1. Not hand wringing, healthy skepticism and a fear of hype.

    2. Here's what I had to say about hype in a contribution to the JCAMD 25th anniversary issue, and I also managed to drop a 'voodoo thermodynamics' later in the article.

      "The last 25 years have seen adoption of a number of technologies by pharmaceutical and biotechnology industries and much CAMD activity is a reaction to these developments. Typically, each new technology is introduced with the promise that it will revolutionise Drug Discovery and much of the hyperbole spills over into the CAMD arena. Some of the difficulty that CAMD scientists face in gaining acceptance for their approaches can be traced to extravagant claims made earlier by other CAMD scientists. Assessing the value added by a new technology is not always easy. When a pharmaceutical company spends a large amount of money to acquire new capability, it is in the interests of both vendor and customer that purchases and collaborations are seen in the most favourable light. Over-selling of technologies leads to panacea-centric thinking, which is especially dangerous in CAMD because success frequently depends on bringing together diverse computational tools to both define and solve problems. One important lesson from the last 25 years of Drug Discovery is that technology is a good servant but a poor master"

  3. Sure, we won't know until we try. But the previous rounds of over-hype (all the genomics, proteomics, systems biology and all the other stuff that was going to rapidly lead to personalized medicine cures for everybody and their dog in a few years) have made people skeptical.

    That being said, I'm all for these computational biology ventures, as long as private money goes into them rather than my tax dollars.

    1. And we do want you to try, since trying means spending lots of dosh on Bayesian Bullshitter, which is our tenth-generation flagship product. Have faith and stop being so negative. We can't flog software to negative people.

