Lab automation using machine learning? Hold on to your pipettes for now.

There is an interesting article on using machine learning and AI for lab automation in Science that generally puts a positive spin on the use of smart computer algorithms for automating routine experiments in biology. The idea is that at some point in the near future, a scientist could design, execute and analyze the results of experiments on her MacBook Air from a Starbucks.

There's definitely a lot of potential for automating routine lab protocols like pipetting and plate transfers, but robots have already been doing this for decades. What the current crop of computational improvements aims to do is potentially much more consequential: to conduct entire suites of biological experiments with a few mouse clicks. The CEO of Zymergen, a company profiled in the piece, says that the ultimate objective is to "get rid of human intuition"; his words, not mine.

I must say I am deeply skeptical of that statement. There is no doubt that parts of experiment planning and execution will indeed become more efficient because of machine learning, but I don't see human biologists being replaced or even significantly augmented anytime soon. The reason is simple: most of research, and biological research in particular, is not about generating and rapidly testing answers (something which a computer excels at), but about asking questions (something which humans typically excel at). A combination of machine learning and robotics may well be very efficient at laying out a whole list of possible solutions and testing them, but it will all come to naught if the question that's being asked is the wrong one.

Machine learning will certainly have an impact, but only within a narrowly circumscribed region of experimental space. Thus, I don't think it's just a coincidence that the article focuses on Zymergen, a company which is trying to produce industrial chemicals by tweaking bacterial genomes. This process involves mutating thousands of genes in bacteria and then picking combinations that will increase the fitness of the resulting organism. It is exactly the kind of procedure that is well adapted to machine learning (to try to optimize and rank mutations, for instance) and robotics (to then perform the highly repetitive experiments). But that's a niche application, working well in areas like directed evolution; as the article itself says, "Maybe Zymergen has stumbled on the rare part of biology that is well-suited to computer-controlled experimentation."

In most of biological research, we start with figuring out what question to ask and what hypotheses to generate. This process is usually the result of combining intuition with experience and background knowledge. As far as we know, only human beings excel in this kind of coarse-grained, messy data gathering and thinking. Take drug discovery, for instance; most drug discovery projects start with identifying a promising target or phenotype. This identification is usually quite complicated and comes from a combination of deep expertise, knowledge of the literature and careful decisions about which experiments are the right ones to do. Picking the right variables to test and knowing what the causal relationships between them are is paramount. In fact, most drug discovery fails because the biological hypothesis that you begin with is the wrong one, not because it was too expensive or slow to test the hypothesis. Good luck teaching a computer to tell you whether the hypothesis is the right one.

It's very hard for me to see how to teach a machine this kind of multi-layered, interdisciplinary analysis. Once we have the right question or hypothesis, of course, we can potentially design an automated protocol to carry out the relevant experiments, but reaching that point is going to take a lot more than just rapid trial and error and culling of less promising possibilities.

This latest wave of machine learning optimism therefore looks very similar to the old waves. It will have some impact, but the impact will be modest and likely limited to particular kinds of projects and goals. The whole business reminds me of the story - sometimes attributed to Lord Kelvin - about the engineer who was recruited by a company to help them with building a bridge. After thinking for about an hour, he made a mark with a piece of chalk on the ground, told the company's engineers to start building the bridge at that location, and then billed them for ten thousand dollars. When they asked what on earth he expected so much money for, he replied, "A dollar for making that mark. Nine thousand nine hundred and ninety-nine for knowing where to make it."

I am still waiting for that algorithm which tells me where to make the mark.


  1. Sometimes answers arise out of the errors we commit while conducting an experiment or in its design. Something goes wrong, and when trying to figure that out, we get an answer. AI may not be able to do that kind of thinking, since every mistake is unique and one cannot program for it.

  2. As researchers, we also acquire a tremendous amount of "prior knowledge" (all those years of learning!) that helps us to propose the best hypotheses and design the best experiments. ML/AI does not have that prior knowledge, unless we encode and input it.

    Perhaps, if we do supply sufficient prior knowledge to AI/ML, it can be used to distill what we know into all best-possible hypotheses and corresponding experiments. The number of experiments will likely be staggering, even for "high-throughput" robotic assays (how many atoms in the Universe, you ask?), and it's up to us to further filter and select the most promising ones.

    With that philosophy, AI/ML is another valuable tool to formalize the differences between what we know and what is unknown/possible.

