Field of Science

Bottom-up and top-down in drug discovery

There are two approaches to discovering new drugs. In one approach drugs fall in your lap from the sky. In the other you scoop them up from the ocean. Let’s call the first the top-down approach and the second the bottom-up approach.

The bottom-up approach assumes that you can discover drugs by thinking hard about them, by understanding what makes them tick at the molecular level, by deconstructing the dance of atoms orchestrating their interactions with the human body. The top-down approach assumes that you can discover drugs by looking at their effects on biological systems, by gathering enough data about them without understanding their inner lives, by generating numbers through trial and error, by listening to what those numbers are whispering in your ear.

To a large extent, the bottom-up approach assumes knowledge while the top-down approach assumes ignorance. Since human beings have been ignorant for most of their history, for most of the recorded history of drug discovery they have pursued the top-down approach. When you don't know what works, you try things out randomly. The South Americans found out by accident that chewing the bark of the cinchona tree relieved them of the afflictions of malaria. Through the Middle Ages and beyond, people who called themselves physicians prescribed a witches' brew of substances ranging from sulfur to mercury to arsenic to try to cure a corresponding witches' brew of maladies, from consumption to the common cold. More often than not these substances killed patients as readily as the diseases themselves.

The top-down approach may seem crude and primitive, and it was primitive, but it worked surprisingly well. For the longest time it was exemplified by the ancient medical systems of China and India – one of these systems delivered an antimalarial medicine that helped its discoverer bag a Nobel Prize for Medicine. Through fits and starts, scores of failures and a few solid successes, the ancients discovered many treatments that were often lost to the dust of ages. But the philosophy endured. It endured right up to the early 20th century when the German physician Paul Ehrlich tested more than 600 chemical compounds - products of the burgeoning dye industry pioneered by the Germans - and found that compound 606 worked against syphilis. Syphilis was a disease that had so bedeviled people since medieval times that it was often a default diagnosis of death, and cures were desperately needed. Ehrlich's 606 was arsenic-based, unstable and had severe side effects, but the state of medicine was such back then that anything was regarded as a significant improvement over the previous mercury-based compounds.

It was with Ehrlich's discovery that drug discovery started to transition to a more bottom-up discipline, systematically trying to make and test chemical compounds and understand how they worked at the molecular level. But it still took decades before the approach bore fruit. For that we had to await a nexus of great and concomitant advances in theoretical and synthetic organic chemistry, spectroscopy and cell and molecular biology. These advances helped us figure out the structure of druglike organic molecules, they revealed the momentous fact that drugs work by binding to specific target proteins, and they also allowed us to produce these proteins in useful quantity and uncover their structures. Finally at the beginning of the 80s, we thought we had enough understanding of chemistry to design drugs by bottom-up approaches, "rationally", as if everything that had gone on before was simply the product of random flashes of unstructured thought. The advent of personal computers (Apple and Microsoft had launched in the late 70s) and their immense potential left people convinced that it was only a matter of time before drugs were "designed with computers". What the revolution probably found inconvenient to discuss much was that it was the top-down analysis which had preceded it that had produced some very good medicines, from penicillin to thorazine.

Thus began the era of structure-based drug design which tries to design drugs atom by atom from scratch by knowing the protein glove in which these delicate molecular fingers fit. The big assumption is that the hand that fits the glove can deliver the knockout punch to a disease largely on its own. An explosion of scientific knowledge, startups, venture capital funding and interest from Wall Street fueled those heady times, with the upbeat understanding that once we understood the physics of drug binding well and had access to more computing power, we would be on our way to designing drugs more efficiently. Barry Werth's book "The Billion-Dollar Molecule" captured this zeitgeist well; the book is actually quite valuable since it's a rare as-it-happens study and not a more typical retrospective one, and therefore displays the same breathless and naive enthusiasm as its subjects.

And yet, 30 years after the prophecy was enunciated in great detail and to great fanfare, where are we? First, the good news. The bottom-up approach did yield great dividends - most notably in the field of HIV protease inhibitor drugs against AIDS. I actually believe that this contribution from the pharmaceutical industry is one of the greatest public services that capitalism has performed for humanity. Important drugs for lowering blood pressure and controlling heartburn were also the beneficiaries of bottom-up thinking.

The bad news is that the paradigm fell short of the wild expectations that we had of it. Significantly short in fact. And the reason is what it always has been in the annals of human technological failure: ignorance. Human beings simply don't know enough about perturbing a biological system with a small organic molecule. Biological systems are emergent and non-linear, and we simply don't understand how simple inputs result in complex outputs. Ignorance was compounded by hubris in this case. We thought that once we understood how a molecule binds to a particular protein and optimized this binding, we had a drug. But what we had was simply a molecule that bound better to that protein; we still worked on the assumption that that protein was somehow critical for a disease. Also, a molecule that binds well to a protein has to overcome enormous other hurdles of oral bioavailability and safety before it can be called a drug. So even if - and that's a big if - we understood the physics of drug-protein binding well, we still wouldn't be any closer to a drug, because designing a drug involves understanding its interactions with an entire biological system and not just with one or two proteins.

In reality, diseases like cancer manifest themselves through subtle effects on a host of physiological systems involving dozens if not hundreds of proteins. Cancer especially is a wily disease because it activates cells for uncontrolled growth through multiple pathways. Even if one or two proteins were the primary drivers of this process, simply designing a molecule to block their actions would be too simplistic and reductionist. Ideally we would need to block a targeted subset of proteins to produce optimum effect. In reality, either our molecule would not bind even one favored protein sufficiently and lack efficacy, or it would bind the wrong proteins and show toxicity. In fact the reason why no drug can escape at least a few side effects is precisely that it binds to many proteins other than the one we intended it to bind.

Faced with this wall of biological complexity, what do we do? Ironically, what we had done for hundreds of years, only this time armed with far more data and smarter data analysis tools. Simply put, you don't worry about understanding how exactly your molecule interacts with a particular protein; you worry instead only about its visible effects, about how much it impacts your blood pressure or glucose levels, or how much it increases urine output or metabolic activity. These endpoints are agnostic of knowledge of the detailed mechanism of action of a drug. You can also compare these results across a panel of drugs to try to decipher similarities and differences.
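To make the idea of comparing endpoints across a panel of drugs concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the drug names, the four readouts and their values are invented, and Pearson correlation is just one simple choice of similarity measure, not the method of any particular study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of readouts."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical phenotypic profiles. Each list holds the observed change in
# [blood pressure, glucose, urine output, metabolic activity]; values are invented.
profiles = {
    "drug_A": [-12.0, 0.5, 30.0, 1.0],
    "drug_B": [-10.0, 1.0, 26.0, 0.0],
    "drug_C": [1.0, -20.0, 2.0, 8.0],
}

def most_similar(name, profiles):
    """Return (other_drug, correlation) for the profile closest to `name`."""
    scores = [
        (other, pearson(profiles[name], readouts))
        for other, readouts in profiles.items()
        if other != name
    ]
    return max(scores, key=lambda pair: pair[1])

neighbor, score = most_similar("drug_A", profiles)
print(neighbor, round(score, 3))
```

Note that nothing here refers to a protein target: two drugs are called similar purely because their visible effects move together. A real analysis would at least normalize each readout first, since raw values in different units can dominate the correlation.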

This is top-down drug design and discovery, writ large in the era of Big Data and techniques from computer science like machine learning and deep learning. The field is fundamentally steeped in data analysis and takes advantage of new technology that can measure umpteen effects of drugs on biological systems, greatly improved computing power and hardware to analyze these effects, and refined statistical techniques that can separate signal from noise and find trends.

The top-down approach is today characterized mainly by phenotypic screening and machine learning. Phenotypic screening involves simply throwing a drug at a cell, organ or animal and observing its effects. In its primitive form it was used to discover many of today's important drugs; in the field of anxiety medicine for instance, new drugs were discovered by giving them to mice and simply observing how much fear the mice exhibited toward cats. Today's phenotypic screening can be more fine-grained, looking at drug effects on cell size, shape and elasticity. One study I saw looked at potential drugs for wound healing; the most important tool in that study was a high-resolution camera, and the top-down approach manifested itself through image analysis techniques that quantified subtle changes in wound shape, depth and appearance. In all these cases, the exact protein target the drug might be interacting with was a distant horizon and an unknown. The large scale, often visible, effects were what mattered. And finding patterns and subtle differences in these effects - in images, in gene expression data, in patient responses - is what the universal tool of machine learning is supposed to do best. No wonder that every company and lab from Boston to Berkeley is trying feverishly to recruit data and machine learning scientists and build burgeoning data science divisions. These companies have staked their fortunes on a future that is largely imaginary for now.

Currently there seems to be, if not a war, at least a simmering and uneasy peace between top-down and bottom-up approaches in drug discovery. And yet this seems to be mainly a fight where opponents set up false dichotomies and straw men rather than find complementary strengths and limitations. First and foremost, the ultimate proof of the pudding is in the eating, and machine learning's impact on the number of approved new drugs still has to be demonstrated; the field is simply too new. The constellation of techniques has also proven itself to be much better at solving certain problems (mainly image recognition and natural language processing) than others. A lot of early stage medicinal chemistry data contains messy assay results and unexpected structure-activity relationships (SAR) containing "activity cliffs" in which a small change in structure leads to a large change in activity. Machine learning struggles with these discontinuous stimulus-response landscapes. Secondly, there are still technical issues in machine learning such as working with sparse data and noise that have to be resolved. Thirdly, while the result of a top-down approach may be a simple image or change in cell type, the number of potential factors that can lead to that result can be hideously tangled and multifaceted. Finally, there is the perpetual paradigm of garbage-in-garbage-out (GIGO). Your machine learning algorithm is only as good as the data you feed it, and chemical and biological data are notoriously messy and ill-curated; chemical structures might be incorrect, assay conditions might differ in space and time, patient reporting and compliance might be sporadic and erroneous, human error riddles data collection, and there might be very little data to begin with. The machine learning mill can only turn data grist into gold if what it's provided with is grist in the first place.
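The notion of an activity cliff can be illustrated with a small Python sketch. It is only a toy under stated assumptions: the compound names, the sets of structural features and the potency values are all invented, and the similarity and activity thresholds are arbitrary. Real work would use proper chemical fingerprints from a cheminformatics toolkit rather than hand-written feature sets.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of structural features."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def activity_cliffs(compounds, min_similarity=0.6, min_activity_gap=2.0):
    """Find pairs that are structurally similar yet far apart in activity.

    `compounds` maps a name to (feature_set, potency); potency is assumed
    to be on a log scale such as pIC50. Both thresholds are illustrative.
    """
    cliffs = []
    for (name1, (feats1, pot1)), (name2, (feats2, pot2)) in combinations(
        compounds.items(), 2
    ):
        similarity = tanimoto(feats1, feats2)
        gap = abs(pot1 - pot2)
        if similarity >= min_similarity and gap >= min_activity_gap:
            cliffs.append((name1, name2, similarity, gap))
    return cliffs

# Hypothetical data: cmpd_1 and cmpd_2 differ by a single substituent
# but their potencies differ by more than two log units.
compounds = {
    "cmpd_1": ({"phenyl", "amide", "piperidine", "methoxy", "fluoro"}, 8.5),
    "cmpd_2": ({"phenyl", "amide", "piperidine", "methoxy", "chloro"}, 5.9),
    "cmpd_3": ({"indole", "sulfonamide", "morpholine"}, 6.1),
}

for name1, name2, similarity, gap in activity_cliffs(compounds):
    print(f"{name1} vs {name2}: similarity {similarity:.2f}, activity gap {gap:.1f}")
```

A model that assumes similar structures have similar activities would predict nearly the same potency for cmpd_1 and cmpd_2, which is exactly where it would fail.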

In contrast to some of these problems with the top-down paradigm, bottom-up drug design has some distinct advantages. First of all, it has worked, and nothing speaks like success. Also operationally, since you are usually looking at the interactions between a single molecule and protein, the system is much simpler and cleaner, and the techniques to study it are less prone to ambiguous interpretation. Unlike machine learning, which can be a black box, here you can understand exactly what's going on. The amount of data might be smaller, but it may also be more targeted, manageable and reproducible. You don't usually have to deal with the intricacies of data fitting and noise reduction or the curation of data from multiple sources. At the end of the day, if like HIV protease your target does turn out to be the Achilles heel of a deadly disease like AIDS, your atom-by-atom design can be as powerful as Thor's hammer. There is little doubt that bottom-up approaches have worked in selected cases, where the relevance of the target has been validated, and there is little doubt that this will continue to be the case.

Now it's also true that just like top-downers, bottom-uppers have had their burden of computational problems and failures, and both paradigms have been subjected to their fair share of hype. Starting from that "designing drugs using computers" headline in 1981, people have understood that there are fundamental problems in modeling intermolecular interactions: some of these problems are computational and in principle can be overcome with better hardware and software, but others like the poor understanding of water molecules and electrostatic interactions are fundamentally scientific in nature. The downplaying of these issues and the emphasizing of occasional anecdotal successes has led to massive hype in computer-aided drug design. But in the case of machine learning it's even worse in some sense, since hype from applications of the field in other human endeavors is spilling over into drug discovery too; it seems hard for some to avoid claiming that their favorite machine learning system is soon going to cure cancer if it's making inroads in trendy applications like self-driving cars and facial recognition. Unlike machine learning though, the bottom-up take has at least had 30 years of successes and failures to draw on, so there is a sort of lid on hype that skeptics constantly press down.

Ultimately, the biggest advantage of machine learning is that it allows us to bypass detailed understanding of complex molecular interactions and biological feedback and work from the data alone. It's like a system of psychology that studies human behavior purely based on stimuli and responses of human subjects, without understanding how the brain works at a neuronal level. The disadvantage is that the approach can remain a black box; it can lead to occasional predictive success but at the expense of understanding. And a good open question is to ask how long we can keep on predicting without understanding. Knowing how many unexpected events or "Black Swans" exist in drug discovery, how long can top-down approaches keep performing well?

The fact of the matter is that both top-down and bottom-up approaches to drug discovery have strengths and limitations and should therefore be part of an integrated approach to drug discovery. In fact they can hopefully work well together, like members of a relay team. I have heard of at least one successful major project in a leading drug firm in which top-down phenotypic screening yielded a valuable hit which then, midstream, was handed over to a bottom-up team of medicinal chemists, crystallographers and computational chemists who deconvoluted the target and optimized the hit all the way to an NDA (New Drug Application). At the same time, it was clear that the latter would not have been possible without the former. In my view, the old guard of the bottom-up school has been reluctant and cynical in accepting membership in the guild for the young Turks of the top-down school, while the Turks have been similarly guilty of dismissing their predecessors as antiquated and irrelevant. This is a dangerous game of all-or-none in the very complex and challenging landscape of drug discovery and development, where only multiple and diverse approaches are going to allow us to discover the proverbial needle in the haystack. Only together will the two schools thrive, and there are promising signs that they might in fact be stronger together. But we'll never know until we try.

(Image: BenevolentAI)


  1. Irrespective of top-down or bottom-up, my experience working in big pharma suggests to me that "serendipitous" discovery happens all too often despite all the advances.

  2. What happens when quantum computers make their appearance, quite soon presumably? Won't the bottom-up approach, which relies on a fine-grained understanding of the molecules from the tiniest levels, get a spectacular boost?

  3. Great Article! Finally had a chance to read it!

