You don’t use a hammer for impressionist painting. And although you technically could, you wouldn’t use a spoon to drink beer. The domains of applicability of these tools are different, in terms of both quality and quantity.
The idea of domains of applicability (DOA) is somehow both blatantly simple and easily forgotten. As the examples above indicate, the definition is apparent: every tool, every idea, every protocol has a certain reach. There are certain kinds of data for which it works well and certain others for which it fails miserably. Then there are the most interesting cases: pieces of data on the boundary between applicable and non-applicable. These often serve as real testing grounds for your tool or idea.
Often the DOA of a tool becomes clear only when it has been used for a long time on a large enough number of test cases. Sometimes the DOA reveals itself accidentally, when you are trying to use the tool on data for which it’s not really designed. That way can lie much heartbreak. It’s better instead to be constantly aware of the DOA of your techniques and to deliberately stress-test their range. The DOA can also inform you about the sensitivity of your model: for one model, a small change from a methyl to a hydroxyl might fall within its DOA, while for another it might exceed it.
The development and use of molecular docking, an important
part of bottom-up drug discovery, makes the idea of DOA clear. By now there’s an
extensive body of knowledge about docking, developed over at least twenty
years, which makes it clear when docking works well and when you can trust it
less. For example, docking works quite well in reproducing known crystal poses
and generating new poses when the protein is well resolved and relatively
rigid; when there are no large-scale conformational changes; when there are no
unusual interactions in the binding site; when water molecules aren’t playing
any weird or special role in the binding. On the other hand, if you are doing docking against a homology model built from sparse sequence homology, one that features a highly flexible loop and several bridging water molecules as key binding elements, all bets are off. You have probably stepped way outside the DOA of docking. Then there are the intermediate, and in many ways the most interesting, cases: somewhat rigid proteins, just one or two water molecules, a good knowledge base around that protein that tells you what works. In these cases one can be cautiously optimistic and make some testable hypotheses.
Fortunately there are ways to pressure-test the DOA of a
favorite technique. If you suspect that the system under consideration does not
fall within the DOA, there are simple tests you can run and questions you can
ask. The first set of questions concerns the quality and quantity of data that
is available. This data falls into two categories: the data that was used to train or parameterize the method, and the data you actually have in your test case. If the test data closely matches the training data, then there’s a fair chance that your DOA is covered. If not, you ask the second important question: What’s the
quickest way I can actually test the DOA? Usually the quickest way to test any
hypothesis in early stage drug discovery is to propose a set of molecules that
your model suggests as top candidates. As always, the easier these are to make,
the faster you can test them and the better you can convince chemists to make
them in the first place. It might also be a good idea to sneak in a molecule
that your model says has no chance in hell of working. If neither of these
predictions comes true within a reasonable margin, you clearly have a problem,
either with the data itself or with your DOA.
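For that first question (how closely does your test case resemble the training data?), a crude but common proxy in cheminformatics is nearest-neighbor fingerprint similarity to the training set. Below is a minimal sketch, assuming Python with RDKit; the function names, the tiny example sets and the 0.35 cutoff are illustrative assumptions, not a validated protocol.

```python
# Minimal sketch of a fingerprint-based applicability check (illustrative only).
from rdkit import Chem
from rdkit.Chem import AllChem, DataStructs

def morgan_fp(smiles, radius=2, n_bits=2048):
    """Morgan (circular) fingerprint for a SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("Could not parse SMILES: " + smiles)
    return AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)

def nearest_training_similarity(test_smiles, training_smiles):
    """Highest Tanimoto similarity between the test compound and any training compound."""
    test_fp = morgan_fp(test_smiles)
    return max(DataStructs.TanimotoSimilarity(test_fp, morgan_fp(s))
               for s in training_smiles)

# Hypothetical "training set" (ethanol, phenol, acetaminophen) and a test compound (aspirin).
training = ["CCO", "Oc1ccccc1", "CC(=O)Nc1ccc(O)cc1"]
sim = nearest_training_similarity("CC(=O)Oc1ccccc1C(=O)O", training)
print("Nearest-neighbor Tanimoto similarity: %.2f" % sim)
if sim < 0.35:  # arbitrary cutoff, purely for illustration
    print("Probably outside the model's DOA; treat its predictions with suspicion.")
```

A check along these lines, run before anyone synthesizes anything, at least tells you whether you are extrapolating far beyond the chemistry your model has ever seen.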
There are also ways to fix the DOA of a technique, but
because that task involves generating more training data and tweaking the code
accordingly, it’s not something that most end users can do. In the case of docking,
for instance, a DOA failure might result from inadequate sampling or inadequate
scoring. Both of these issues can be fixed in principle through better data and
better force fields, but that’s really something only a methods developer can
do.
When a technique is new it always struggles to establish its
DOA. Unfortunately both technical users and management don’t understand this
and can immediately start proclaiming the method as a cure for all your
problems; they think that just because it has worked well on certain cases it
will do so on most others. The lure of publicity, funding and career
advancement can further encourage this behavior. That certainly happened with
docking and other bottom-up drug design tools in the Wild West of the late 80s
and early 90s. I believe that something similar is happening with machine
learning and deep learning now.
For instance it’s well known that when it comes to problems
like image recognition and natural language processing (NLP), machine learning
can do extremely well. In that case one is clearly operating well within the
DOA. But what about modeling traffic patterns or brain activity or social
networks or SAR data for that matter? What is the DOA of machine learning in
these areas? The honest answer is that we don’t know. Now some users and
developers of machine learning acknowledge this and are actually trying to
circumscribe the right DOA by pressure-testing the algorithms. Others
unfortunately simply take it for granted that more data must translate to
better accuracy; in other words, they assume that the DOA is purely dictated by
data quantity. This is true only in a narrow sense. Yes, too little data can certainly hamper your efforts, but more data is not always necessary, and it is certainly not sufficient. You can have all the data in the world and your technique can still be operating outside its DOA. For example, the presence of
a discontinuous landscape of molecular activity places limitations on using
machine learning in medicinal chemistry. Would more data ameliorate this
problem? We don’t know yet, but this kind of thinking would not be inconsistent
with the new religion of “dataism” which says that data is everything.
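To make the idea of a discontinuous landscape more concrete: its sharpest manifestation in medicinal chemistry is the activity cliff, a pair of nearly identical structures with very different potencies. The toy sketch below, again assuming Python with RDKit, flags such pairs in a small dataset; the similarity and potency thresholds, and the data itself, are invented for illustration.

```python
# Toy probe for a discontinuous activity landscape: flag "activity cliffs",
# i.e. pairs of structurally similar molecules with large potency differences.
# All thresholds and data here are invented for illustration.
from itertools import combinations
from rdkit import Chem
from rdkit.Chem import AllChem, DataStructs

def fingerprint(smiles):
    """2048-bit Morgan fingerprint of radius 2."""
    return AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(smiles), 2, nBits=2048)

def activity_cliffs(data, sim_cutoff=0.7, activity_gap=2.0):
    """Return pairs with Tanimoto similarity >= sim_cutoff whose pIC50 values
    differ by at least activity_gap log units."""
    fps = {smi: fingerprint(smi) for smi, _ in data}
    cliffs = []
    for (smi_a, act_a), (smi_b, act_b) in combinations(data, 2):
        sim = DataStructs.TanimotoSimilarity(fps[smi_a], fps[smi_b])
        if sim >= sim_cutoff and abs(act_a - act_b) >= activity_gap:
            cliffs.append((smi_a, smi_b, sim, abs(act_a - act_b)))
    return cliffs

# Hypothetical (SMILES, pIC50) tuples; not real measurements.
dataset = [
    ("CC(=O)Nc1ccc(O)cc1", 5.1),   # an acetanilide scaffold
    ("CC(=O)Nc1ccc(OC)cc1", 8.4),  # a single O-methylation, invented jump in potency
    ("CCO", 4.0),
]
for a, b, sim, gap in activity_cliffs(dataset):
    print("Cliff: %s vs %s  similarity=%.2f  delta pIC50=%.1f" % (a, b, sim, gap))
```

If a dataset is riddled with such cliffs, a model that interpolates smoothly between neighbors is being asked to do something the underlying landscape does not support, and piling on more of the same data will not necessarily help.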
There are many opportunities to test the DOA of top-down approaches
like deep learning in drug discovery and beyond. But to do this, both
scientists and management must have realistic expectations about the efficacy of the techniques, and more importantly must honestly acknowledge that they don’t know the DOA in the first place. In other words, they need to admit that they don’t yet know whether the technique will work for their specific problem. Unfortunately these kinds of decisions and proclamations are severely subject to hype and the enticement of dollars and drama. Machine learning is seen as a technique with such an outsize potential impact on diverse areas of our lives that many err on the side of wishful thinking. Companies have sunk
billions of dollars into the technology; how many of them would be willing to
admit that the investment was really based on hope rather than reality?
In this context, machine learning can draw some useful
lessons from the cautionary tale of drug design in the 80s, when companies were throwing money at molecular modeling from all directions. Did that money result
in important lessons learnt and egos burnt? Indeed it did, but one might argue
that computational chemists are still suffering from the negative effects of
that hype, both in accurately using their techniques and in communicating the
true value of those techniques to what seem like perpetually skeptical Nervous
Nellies and Debbie Downers. Machine learning could go down the same route and
it would be a real tragedy, not only because the technique is promising but
because it could potentially impact many other aspects of science, technology,
engineering and business and not just pharmaceutical development. And it might
all happen because we were unable or unwilling to acknowledge the DOA of our
methods.
Whether it’s top-down or bottom-up approaches, we can all ultimately benefit from Feynman’s words: “For a successful technology, reality has to take precedence over public relations, for Nature cannot be fooled.” For starters, let’s try not to fool each other.