A new multicenter study on the discovery of kinase inhibitors for DDR1 has been making the rounds. Using a particular flavor of generative model, the authors derive a few potent and selective inhibitors of DDR1, a kinase target that has been implicated in fibrosis.
The paper is an interesting application of generative deep learning models to kinase inhibitor discovery. The authors start with six training datasets including ZINC and several patents, along with a negative dataset of non-kinase inhibitors. After running their generative reinforcement learning model, filtering out reactive compounds and clustering, they select 40 random molecules that have a Tanimoto similarity of less than 0.5 to vendor stocks and the patent literature, and pick 6 of these for testing. Four of the six compounds are reported to show improved potency against DDR1, although for two of these the potency is little improved relative to the parent compound (10 and 21 nM vs 15 nM, which is well within the two- or threefold margin of error in most biological assays). The selectivity of two of the compounds against the undesirable isoform DDR2 is also essentially the same (649 nM vs 1000 nM and 278 nM vs 162 nM; again within the twofold error margin of the assay). So from a potency standpoint, the algorithm seems to find equipotent inhibitors at best; given that these four molecules were culled from a starting set of 30,000, that indicates a hit rate of roughly 0.01%. Good selectivity against a small kinase panel is demonstrated, but selectivity against a larger panel of off-targets is not reported. There also don't seem to be tests for aggregation or non-specific behavior; computational techniques in drug discovery are well known to produce a surfeit of false positives. It would also be really helpful to get some SAR for these compounds to know whether they are one-off non-specific binders or actual lead compounds.
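To make the arithmetic behind that paragraph explicit, here is a minimal sketch that recomputes the fold differences and the hit rate from the numbers quoted above. The function names and the twofold default are my own choices for illustration; the twofold cutoff is only the rule of thumb mentioned in the text, not a property of the paper's specific assay.

```python
# IC50 values in nM, paired exactly as quoted in the text above.
ddr1_pairs = [(10.0, 15.0), (21.0, 15.0)]      # new compound vs parent
ddr2_pairs = [(649.0, 1000.0), (278.0, 162.0)]  # pairs as quoted for DDR2


def fold_change(a, b):
    """Return the fold difference between two IC50 values (always >= 1)."""
    return max(a, b) / min(a, b)


def within_assay_error(a, b, margin=2.0):
    """True if two IC50s differ by no more than `margin`-fold.

    The default of 2.0 is an assumed rule-of-thumb noise level.
    """
    return fold_change(a, b) <= margin


for label, pairs in (("DDR1", ddr1_pairs), ("DDR2", ddr2_pairs)):
    for first, second in pairs:
        print(
            f"{label}: {first:g} vs {second:g} nM -> "
            f"{fold_change(first, second):.2f}-fold, "
            f"within 2x: {within_assay_error(first, second)}"
        )

# Hit rate: 4 improved compounds out of a starting set of 30,000.
hit_rate_percent = 100 * 4 / 30_000
print(f"hit rate: {hit_rate_percent:.3f}%")
```

Every pair comes out below twofold, which is the basis for calling these compounds equipotent rather than improved.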
Now, even equipotent inhibitors can be useful if they show good ADME properties or demonstrate scaffold hops. The group tested the inhibitors in liver microsomal assays, and they seem to have stability similar to that of a group of non-kinase inhibitor controls, although it would be good to see data for other DDR inhibitors alongside. They also tested one of the compounds in a rodent model, and it seems to show satisfactory half-lives; it's again not clear how these compare to other DDR inhibitors. Finally, they build a pharmacophore-based binding model of the inhibitor and compare it to a similar quantum mechanical model, but there are no experimental data (from NMR or mutagenesis, for instance) that would allow a good experimental validation of this binding pose. Pharmacophore models are again notorious for producing false positives, and it's important to demonstrate that the pharmacophore in fact does not also fit the negative data.
The paper claims to have discovered the inhibitors "in 21 days" and tested them in 46. The main issue here - and this is by no means a critique of just this paper - is not that the discovered inhibitors show very modest improvement at best over the reference; it's that there is no baseline comparison, no null model, that can tell us what the true value of the technique is. This has been a longstanding complaint in the computational community. For instance, could regular docking followed by manual picking have found the same compounds in the same time? What about simple comparisons with property-based metrics or 2D metrics? And could a team of expert medicinal chemists brainstorming over beer have looked at the same data and reached the same conclusions much sooner? I am glad that the predictions were actually tested - even this simple follow-up is often missing from computational papers - but 21 days is not as short as it sounds if you start with a vast amount of already-existing, curated data from databases and patents, and if simpler techniques can find the same results sooner. And the reliance on vast amounts of data is of course a well-known Achilles' heel of deep learning techniques, so these techniques will almost certainly not work well on new targets with a paucity of data.
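As a rough illustration of what a null-model comparison could look like, the sketch below asks how often a simpler picker with a given per-compound success rate would match "4 hits out of 6 tested" by chance. The 4 and 6 come from the paper as summarized above; the baseline rates are hypothetical placeholders that I made up for illustration, since no such rates are reported. It also assumes each tested compound is an independent trial, which is a simplification.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p).

    Interpreted here as the chance that a null picker with per-compound
    success rate p gets at least k hits among n tested compounds.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# From the text: 6 compounds tested, 4 reported as improved.
n_tested, n_hits = 6, 4

# HYPOTHETICAL baseline success rates for a simpler method (docking plus
# manual picking, 2D similarity search, chemists' intuition, ...).
# These are placeholders, not measured values.
for baseline_rate in (0.1, 0.3, 0.5):
    p_value = prob_at_least(n_hits, n_tested, baseline_rate)
    print(
        f"baseline rate {baseline_rate:.0%}: "
        f"P(>= {n_hits} of {n_tested} hits) = {p_value:.4f}"
    )
```

The point is not the specific numbers but that the answer depends entirely on the baseline rate, which is exactly the quantity a head-to-head comparison with simpler methods would have to measure.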
Inhibitor discovery is hardly a new problem for computational techniques, and any new method is up against a whole phalanx of structure- and ligand-based methods that have been developed over the last 30+ years. There's a pretty steep curve to surmount if you actually want to proclaim your latest and greatest AI technique as a novel application. As it stands, the issue is not that the generative methods didn't discover anything; it's that it's impossible to actually judge their value in the absence of baseline comparisons.
The AI hype machine is out in absolute full force on this one (see here, here and especially here for instance). I simply don't understand this great desire to proclaim every advance in a field as a breakthrough rather than simply calling it a useful incremental step or constructively criticizing it. And when respected sources like WIRED and Forbes proclaim that there's been a breakthrough in new drug discovery, the non-scientific public, which is unfamiliar with IC50 curves or selectivity profiles or the fact that there's a huge difference between a drug and a lead, will likely think that a new age of drug discovery is upon us. There's enough misleading hype about AI to go around, and adding more to the noise does both the scientific and the non-scientific community a disservice.
Longtime cheminformatics expert Andreas Bender has some similar thoughts here, and of course, Derek at In the Pipeline has an excellent, detailed take here.