Field of Science

What do we need? A Jarvis for molecular modeling. When do we need him? Yesterday.

"Jarvis, is this model statistically validated or am I just
making stuff up?"
One of the major and somewhat underappreciated characters in the Iron Man franchise of movies is Jarvis, Tony Stark’s loyal AI system and indispensable assistant. Jarvis may in fact be the principal character in Iron Man apart from Iron Man himself, since he has saved Tony Stark's life more than once. For our purposes though, Jarvis’s key function is reducing Iron Man’s ideas to practice. In the first Iron Man movie, after Tony Stark cobbles together a primitive Iron Man suit from a bunch of scraps in a cave, it’s Jarvis who helps him turn his newfound ideas into something far more sophisticated. Jarvis has two key capabilities that help him help Stark. One is a superior natural language processing capability that allows him to understand exactly what his creator wants. The other is access to a vast repository of data regarding specs, blueprints, system hybrids and other paraphernalia which he can summon on demand.

What I find most interesting though is Jarvis’s highly interactive nature. He seems to anticipate much of Tony Stark’s thinking, asking questions like “Are you sure you don’t want to use carbon nanotubes instead of titanium, Sir?” or “Do you want me to pull up [system X] which is similar to [system Y] that you are trying to build?”. This interactive capability not only speeds up the progress of projects that Stark is working on, but it also opens up new avenues that he himself may not have thought about.

Why was I thinking about Jarvis? Here’s the thing: I thought about Jarvis when I realized how woefully, drastically uninteractive all of our molecular modeling software is. This criticism does not apply to a specific program or set of tools; it permeates the entire panoply of structure-based and ligand-based drug design tools employed by computational and medicinal chemists. One can debate the pros and cons of specific algorithms for molecular dynamics or docking or QSAR, but I think the one thing we should be able to agree on is that none of these tools anticipates our needs or talks to us even in simple ways. In the sense of being interactive, our modeling software is as primitive as transportation was in the eighteenth century; it sits there, listless and passive, waiting for us to push the buttons and pull the levers.

This is an odd state of affairs. Today we expect most of our electronic devices and software to interact with us; Microsoft had in fact implemented its own primitive version of Jarvis (the unfortunate and doomed ‘Clippy’) in Office back in the 90s. Clippy did not stay around forever, but in principle he was asking the right questions (“It seems you are writing a letter”). What we need is a more sophisticated form of Clippy for our modeling software.

What would a Jarvis or advanced Clippy for modeling look like? For one thing, it would be able to look at a protein or ligand of interest and immediately have at hand a list of similar systems drawn from the academic, industrial and patent literature. Its speed and efficiency assume ready, in fact instant, access to databases like the PDB, GVK and ChEMBL. This task by itself shouldn’t be a problem since it only involves an upfront investment of effort. Once this data has been acquired, our hypothetical Jarvis should then be able to identify the tasks we are embarking on and suggest enhancements, automation or modifications to those tasks. For instance, we may want to identify or probe binding sites in a protein which we want to inhibit. In that case, once we start to run a program like Schrödinger’s SiteMap which accomplishes this, Jarvis should immediately be able to chime in, identify what we are doing, and then retrieve homologous proteins with similar binding sites. It should similarly be able to parse a ligand on the screen and identify similar ligands depending on what we are doing with it. For instance, if we were docking that ligand, it would draw up a list of shape-based binding pocket pharmacophores which are similar to the ones in our protein, telling us what the probability of our ligand inhibiting those other proteins might be. This would give us some idea of the off-target protein interactions which our ligand might be expected to have. All this would be displayed as attractively as the graphics in Tony Stark’s basement lab.
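To make the "identify similar ligands" step a little more concrete, here is a minimal sketch of a similarity lookup using the Tanimoto coefficient. It assumes each ligand has already been reduced to a fingerprint, represented here as a plain set of "on" bit indices; the function names, the toy library and the 0.5 cutoff are all hypothetical and chosen only for illustration. A real system would generate fingerprints with a cheminformatics toolkit and query an actual database.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here: two empty fingerprints are not similar
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def most_similar(query_fp, library, threshold=0.5):
    """Return (name, similarity) pairs at or above the threshold, best first."""
    hits = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    hits = [hit for hit in hits if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical fingerprints; the bit indices carry no chemical meaning.
library = {
    "ligand_a": {1, 2, 3, 4},
    "ligand_b": {1, 2},
    "ligand_c": {9},
}
query = {1, 2, 3, 4}

for name, score in most_similar(query, library):
    print(f"{name}: {score:.2f}")
```

The interesting part of a Jarvis would not be this arithmetic but the decision to run it unprompted, at the moment a ligand appears on the screen.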

In the ligand-based design sphere, a Jarvis for modeling would be especially useful in building QSAR models. One of the biggest pitfalls of QSAR - and in fact of all of computational chemistry - is the existence of spurious or artifactual correlations with biological activity. There are innumerable case studies where people have correlated biological activity of molecules with any number or combination of chemically impenetrable and mathematically dense parameters without first checking whether the activity correlates equally well or better with very simple parameters such as molecular weight, hydrophobicity (logP) or polar surface area. A Jarvis for modeling would make sure that whenever you start building any kind of a QSAR or similar model, he calculates and flashes in front of you a few simple correlations that allow you to make sure that you are not missing simple relationships; only once you are sure that the simple correlations don’t hold up would it make sense to grab a non-linear combination of your favorite ten-dimensional topological index and a parameter from a relativistic quantum chemical calculation. Visualization of this data in a clean, comprehensive and attractive format would again be a key attribute of such a Jarvis.
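The sanity check described above is cheap enough to sketch in a few lines. The example below assumes the simple descriptors (molecular weight, logP and so on) have already been calculated elsewhere, and it compares the squared Pearson correlation of each one against the r² of whatever elaborate model you have built. The function names, the activity values and the descriptor values are all made up for illustration, and flagging any descriptor whose r² matches or beats the model's is just one possible rule, not an established protocol.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation signal
    return sxy / sqrt(sxx * syy)


def simple_baselines(activity, descriptors, model_r2):
    """Return (name, r2, rivals_model) for each simple descriptor, best first."""
    rows = []
    for name, values in descriptors.items():
        r2 = pearson_r(values, activity) ** 2
        rows.append((name, r2, r2 >= model_r2))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical data: four compounds with invented activities and descriptors.
activity = [5.0, 6.0, 7.0, 8.0]
descriptors = {
    "mol_weight": [300.0, 350.0, 400.0, 450.0],
    "logP": [2.0, 1.0, 3.0, 2.5],
}

report = simple_baselines(activity, descriptors, model_r2=0.90)
for name, r2, rivals in report:
    flag = "  <-- does as well as your model" if rivals else ""
    print(f"{name:12s} r^2 = {r2:.2f}{flag}")
```

In this toy set the activity tracks molecular weight exactly, which is precisely the situation where a ten-dimensional topological index would be telling you nothing new.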

The desirability of having a Jarvis for molecular modeling dovetails with similar general thoughts I have had about the lack of sophistication in modeling software. I always find it odd that we expect a lot of interactive sophistication in our iPhones and our Samsung watches, and yet we somehow seem to be content in dealing with software for molecular modeling that simply waits for us to push buttons. Generally speaking we don’t apply the same standards to molecular modeling software that we apply to our smartphones, tablets and other devices. We take Siri and Cortana for granted, and yet we don’t demand Siris, Cortanas and Jarvises in our docking programs. It will be a while before an entity as intelligent as Jarvis is able to help us do better modeling, but that does not mean we shouldn’t start trying right now. I think that demanding this kind of sophistication from scientists, developers and vendors will do a lot of good for the entire community. It’s high time we did.


  1. A prerequisite for a recommender system would probably be a molecular viewer that supports annotation from databases. Aquaria seems to be moving in this direction.

  2. We did experiment with the Amazon “People Who Bought … Also Bought …” algorithm in reagent selection a while ago. Would love to see a wider implementation, as you suggest, Ash.

  3. Both of these are really interesting - thanks! Will check them out.

