In
the last post, I talked about how cognitive biases would be especially
prevalent in drug discovery and development because of the complex, information-poor,
tightly time-bound and financially incentivized nature of the field. I talked
about confirmation bias which riddles almost all human activity and which can
manifest itself in drug discovery in the form of highlighting positive data for
one’s favorite belief, metric or technique and rejecting negative data that
does not agree with this belief.
In this post, I will mention a few more
important cognitive biases. All of them are classic examples of getting carried
away by limited patches of data and ignoring important information; often
information on much larger samples. It’s worth noting that not all of them are
equally important; a bias that’s more applicable in other parts of life may be
less applicable in drug discovery, and vice versa. It’s also interesting to see
that a given case may present more than one bias; because the human mind
operates in multiple modes, biases often overlap. In the next post we will look
at a few more biases related to statistics and comparisons.
Anchoring: Anchoring is the tendency
to rely too much on one piece of information or trait, especially if it appears
first. In some sense it’s a ubiquitous phenomenon, and it can also be subtle;
it can be influenced by random things we observe and hear. A classic anchoring
experiment was done by Kahneman and Tversky, who showed participants a spinning wheel that would randomly settle on a number. After the spinning wheel stopped, the participants were asked what percentage of African countries are members of the U.N. It turned out that the percentage quoted by the participants was correlated with the random, unrelated number they saw on the wheel; if they saw a
larger number they quoted a larger percentage, and vice versa. One important
feature of the anchoring effect that this experiment demonstrated was that it
involves random numbers or phenomena
that can be completely irrelevant to
the issue at hand.
It’s
hard to point to specific anchoring biases in drug discovery, but one thing we
know is that scientists can be skewed by numbers all the time, especially if
the numbers are promising and seem very accurate. For instance, being biased by sparse in vitro affinity data for some early hits, leads or series can blind you to the optimization of downstream properties. People sometimes come around, but I have seen even experienced medicinal chemists get obsessed with early leads that have very good affinities but poor properties. In general, random promising numbers relating to affinity, properties, clinical data etc. for particular sets of compounds can lead one to believe that other similar compounds will have similar properties, or that those numbers are very relevant to begin with.
As
has been well-documented, “similarity” itself can be a bias, since every chemist, for instance, will look at different features of compounds to decide whether they are similar or not. Objective computational similarity comparisons can diminish this bias a bit, but since there’s no right way of deciding what the “perfect” computational similarity measure is either (and there are plenty of misleading similarity metrics), this solution carries its own baggage.
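As a rough illustration of what an “objective” computational similarity comparison looks like in practice, here is a minimal sketch using the open-source RDKit toolkit; the two molecules, the Morgan fingerprint settings and the idea of reading much into the resulting number are illustrative assumptions on my part, and swapping in a different fingerprint or metric can change the answer, which is exactly the baggage referred to above.

```python
# Minimal sketch: Tanimoto similarity between two molecules using RDKit
# Morgan fingerprints. The SMILES strings and settings are arbitrary
# illustrative choices, not values from the post.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

smiles_a = "CC(=O)Oc1ccccc1C(=O)O"   # aspirin
smiles_b = "CC(=O)Nc1ccc(O)cc1"      # acetaminophen

mol_a = Chem.MolFromSmiles(smiles_a)
mol_b = Chem.MolFromSmiles(smiles_b)

# Morgan (circular) fingerprints, radius 2, 2048 bits
fp_a = AllChem.GetMorganFingerprintAsBitVect(mol_a, 2, nBits=2048)
fp_b = AllChem.GetMorganFingerprintAsBitVect(mol_b, 2, nBits=2048)

tanimoto = DataStructs.TanimotoSimilarity(fp_a, fp_b)
print(f"Tanimoto similarity: {tanimoto:.2f}")

# Changing the fingerprint type, radius or bit length changes the number,
# so "similarity" is only objective relative to a chosen metric.
```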
You
can also be carried away by measurements (often done using fancy instrumentation)
that can sound very accurate; in reality, they are more likely to simply be
precise. This problem is a subset of a bigger class of problems related to what is called “technological solutionism”: the habit of believing data simply because it was generated by the latest and greatest experimental or computational technique. Such data can anchor our beliefs about drug behavior and lead us to extrapolate when we shouldn’t. The key questions to ask in this regard are: Are
the numbers being measured accurate? Do the numbers actually measure the effect
we think they do and is the effect real and statistically significant? Is the
effect actually relevant to my hypothesis or conclusion? That last question is
probably the most important and not asking it can lead you to squander a lot of
time and resources.
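To make the accuracy-versus-precision distinction concrete, here is a small sketch with invented numbers: measurements with tiny random scatter look reassuringly trustworthy, yet a hypothetical systematic offset (say, from an uncalibrated instrument or a flawed assay) means they are merely precise, and consistently wrong.

```python
# Sketch: precise but inaccurate measurements. All numbers are invented.
import random

random.seed(0)
true_value = 10.0        # the "real" quantity, in arbitrary units
systematic_bias = 1.5    # hypothetical instrument/assay offset
random_noise = 0.05      # very small scatter -> the data *look* trustworthy

measurements = [true_value + systematic_bias + random.gauss(0, random_noise)
                for _ in range(20)]

mean = sum(measurements) / len(measurements)
spread = (sum((m - mean) ** 2 for m in measurements) / len(measurements)) ** 0.5

print(f"measured: {mean:.2f} +/- {spread:.2f}")  # ~11.5 +/- 0.05: very precise
print(f"true value: {true_value}")               # 10.0: not accurate at all
```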
Availability
heuristic:
A bias related to anchoring is availability. This is the tendency to evaluate
new information based on information - especially recent information - that can
be easily recalled. In case of drug
discovery, easily recalled information can include early stage data, data
that’s simply easier to gather, data that’s “popular” or data that’s simply repeated often enough, in the literature or by word of mouth. There are countless reasons why certain information is easily recalled while other information is not, and they can be related to non-scientific variables like emotional impact. Were you feeling particularly happy or sad when you
measured a particular effect? Was the effect validated by groupthink and did it
therefore make you feel vindicated? Was the piece of data described by an
“important” person who you admire? All these factors can contribute to fixing a
certain fact or belief in our minds. Availability of specific information can
cement that information as the best possible or most representative
information.
Everyone
is biased by successful projects they have worked on. They may recall a
particular functional group or synthetic reaction or computational technique
that worked for them and believe that it will work for other cases. This is
also an example of confirmation bias, but the reason it’s an availability
heuristic hinges on the fact that other information - and most notably
information that can counter one’s beliefs - is not easily available. Most of the time we report positive results and not negative ones; this is a general
problem of the scientific literature and research policy. Sometimes gathering
enough data that would tweak the availability of the result is simply too
expensive to do. That’s understandable, but it also means that we should be
more wary about what we choose to believe.
Finally,
the availability heuristic is particularly strong when a recent decision leads
to an important consequence; perhaps installing a fluorine in your molecule
suddenly led to improved pharmacokinetics, or using a certain formulation led
to better half-lives in patients. It is then tempting to believe that the data
that was available is the data that’s generalizable, especially when it has had
a positive emotional impact on your state of mind.
Representativeness: The availability bias is
also closely related to the representativeness fallacy. In one sense the
representativeness fallacy reflects a very common failing of statistical thinking: the tendency to generalize to a large population based on a small sample that seems representative. For instance, a set of “rules” for druglike behavior may
have been drawn from a limited set of studies. It would then be tempting to
think that those rules applied to everything that was not tested in
those studies, simply on the basis of similarity to the cases that were tested.
Representativeness can manifest itself in the myriad definitions of “druglike” used by medicinal chemists, as well as in metrics like ligand efficiency.
A
great example of representativeness comes from Tversky and Kahneman’s test
involving personality traits. Consider the following description of an
individual:
“Linda
is a 55-year-old woman with a family. She likes reading and quiet reflection.
Ever since she was a child, Linda has been non-confrontational, and in a tense
situation prefers tactical retreats to open arguments.”
Given
this information, what’s Linda’s likely profession?
a. Librarian
b. Doctor
Most
people would pick a. since Linda’s introverted qualities seem to align with
one’s mental image of a librarian. But the answer is really likely to be b.
since there are far more doctors than librarians, so even a tiny percentage of
doctors with the aforementioned traits would constitute a bigger number than
librarians.
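A back-of-the-envelope calculation makes the point; the counts and fractions below are purely illustrative assumptions on my part, not survey data.

```python
# Illustrative base-rate arithmetic for the librarian-vs-doctor question.
# All counts and fractions are made up for the sake of the example.
n_doctors = 2_000_000            # assume many more doctors than librarians
n_librarians = 100_000

frac_doctors_fitting = 0.10      # only a modest fraction fit the description
frac_librarians_fitting = 0.80   # most librarians fit it

doctors_fitting = n_doctors * frac_doctors_fitting            # 200,000
librarians_fitting = n_librarians * frac_librarians_fitting   # 80,000

# Despite the much better "fit", librarians are outnumbered because the
# base rate of doctors is so much higher.
print(doctors_fitting > librarians_fitting)   # True
```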
Now
let us apply the same kind of reasoning to a description of a not-so-fictional
molecule:
“Molecule
X is a small organic molecule with a logP value of 3.2, 8 hydrogen bond
acceptors, 4 hydrogen bond donors and a molecular weight of 247. It has shown activity
against cancer cells and was discovered at Novartis using a robotics-enabled
phenotypic screening technique with high throughput.”
Given
this information, what is more likely?
a.
Molecule X is “druglike”.
b.
Molecule X is non-druglike.
The properties I have just described conform to the famous Lipinski’s Rule of 5, which lays down certain rules relating basic physicochemical properties to successful drugs. If you
were dealing with a compound having these properties, you would be more likely
to think it’s a drug. But among the unimaginably vast chemical space of
compounds, the number of druglike compounds is vanishingly small. So there are
far more non-druglike compounds than druglike compounds. Given this fact,
Molecule X is very likely to not be a drug, yet one is likely to use its
description to believe it’s a drug and pursue it.
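For the record, checking Molecule X’s stated numbers against the commonly quoted Rule of 5 cutoffs (molecular weight no more than 500, logP no more than 5, at most 5 hydrogen bond donors and 10 acceptors) takes a few lines; the little helper below is a simplified sketch of my own, and passing it says nothing about the base rate problem just described.

```python
# Sketch: checking Molecule X's stated properties against the commonly used
# Rule of 5 cutoffs (MW <= 500, logP <= 5, H-bond donors <= 5, acceptors <= 10).
# The dictionary simply re-encodes the numbers given in the post.
molecule_x = {"mol_weight": 247, "logp": 3.2, "hbd": 4, "hba": 8}

def passes_rule_of_five(props):
    """Return True if no Rule-of-5 cutoff is violated (simplified form)."""
    return (props["mol_weight"] <= 500 and
            props["logp"] <= 5 and
            props["hbd"] <= 5 and
            props["hba"] <= 10)

print(passes_rule_of_five(molecule_x))   # True -- yet this says nothing about
                                         # the base rate of druglike compounds
```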
I
can also bet that the anchoring effect is at work here: the numbers “3.2” for logP and “247” for molecular weight, which sound very accurate, as well as the fact that a fancy technique at a Big Pharma company found this molecule, are likely to contribute to your belief that you have a great potential drug
molecule at hand. But most of this information is marginally relevant at best
to the real properties of Molecule X. We have again been misled by focusing on
a tiny sample with several irrelevant properties and thinking it to be
representative of a much larger group of data points.
Base rate fallacy: Representativeness leads
us to another statistical fallacy: the base rate fallacy. As we saw above, the
mistake in both the librarian and the druglike examples is that we fail to take
into account the base rate of non-librarians and non-druglike compounds.
The
base rate fallacy is generally defined as the tendency to ignore base rate or
general information and focus only on specific cases. There are at least two ways in which I can see the base rate fallacy manifesting itself:
1. In overestimating HTS/VS hit rates against certain targets or for certain chemotypes without taking base hit rates into account (a back-of-the-envelope calculation of this kind is sketched after this list). In turn, the bias can lead chemists to make fewer compounds than might be necessary to get a hit.
2. The base rate fallacy is more generally related to ignoring how often you might obtain a certain result by chance; for instance, a correlation between expression levels of two proteins or a drug and a protein, or one involving non-specific effects of a druglike compound. The chance result can then feed into the other biases described above like representativeness or availability.
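Here is a hedged sketch of the kind of calculation referred to in the first item; the base rate, sensitivity and false positive rate are invented for illustration. With a 0.5% base rate of true actives, an assay that flags 90% of actives and falsely flags 1% of inactives, Bayes’ rule says only about a third of primary hits are genuine.

```python
# Sketch: how the base rate of true actives shapes what a screening "hit"
# means. All rates below are invented for illustration.
base_rate = 0.005        # fraction of library compounds that are true actives
sensitivity = 0.90       # P(flagged as hit | true active)
false_positive = 0.01    # P(flagged as hit | inactive)

# Bayes' rule: P(true active | flagged as a hit)
p_hit = sensitivity * base_rate + false_positive * (1 - base_rate)
p_active_given_hit = sensitivity * base_rate / p_hit

print(f"P(hit) per compound  = {p_hit:.4f}")               # ~0.0145
print(f"P(true active | hit) = {p_active_given_hit:.2f}")  # ~0.31
```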
Anchoring,
availability, representativeness and the base rate fallacy are classic examples
of both extrapolating from a limited amount of information and ignoring lots of
unknown information. They speak to the shortcuts that our thinking takes when
trying to quickly infer trends, rules and future directions of inquiry based
on incomplete data. A lot of the solutions to these particular biases involve
generating more data or finding it in the literature. Unfortunately this is not
always an achievable goal in the fast-paced and cash-strapped environment of
drug discovery. In that case, one should at least identify the most important
pieces of data one would need to gather in order to update or reject a
hypothesis. For example, one way to overcome the base rate fallacy is to calculate
what kind of sampling might be necessary to improve the confidence in the data
by a certain percentage. If all else fails, one must then regard the data or
belief that he or she has as highly tentative and constantly keep on looking
for evidence that might shore up other beliefs.
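As one illustrative version of such a sampling calculation (the at-least-one-true-hit framing, the independence assumption and the numbers are mine, not a standard recipe), one can ask how many compounds must be screened to have a 95% chance of seeing at least one true hit at an assumed base hit rate.

```python
# Sketch: smallest number of compounds n satisfying 1 - (1 - p)^n >= confidence,
# assuming independent trials and an assumed base hit rate p.
import math

def compounds_needed(base_hit_rate, confidence=0.95):
    """Smallest n with 1 - (1 - p)^n >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - base_hit_rate))

for p in (0.01, 0.005, 0.001):   # assumed base hit rates
    print(f"hit rate {p:.3f} -> screen at least {compounds_needed(p)} compounds")
```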
Cognitive
biases are a very human construct, and they are so relevant to drug discovery
and science in general because these are very human enterprises. In the ideal
world of our imagination, science is an objective process of finding the truth
(and of discovering drugs). In the real world, science is a struggle between
human fallibility and objective reality. Whether in drug discovery or
otherwise, at every step a scientist is struggling to square the data with the
biases in his or her mind. Acknowledging these biases and constantly
interrogating them is a small first step in at least minimizing their impact.