"Jarvis, is this model statistically validated or am I just making stuff up?" |
One of the major and somewhat underappreciated characters in
the Iron Man franchise of movies is Jarvis, Tony Stark’s loyal AI system and indispensable assistant.
Jarvis in fact may be the principal character in Iron Man apart from Iron Man
himself, since he has saved Tony Stark's life more than once. For our purposes
though, Jarvis’s key function is reducing Iron Man’s ideas to practice. In
the first Iron Man movie, after Tony Stark cobbles together a primitive Iron
Man suit from a bunch of scraps in a cave, it’s Jarvis who helps him turn his
newfound ideas into something far more sophisticated. Jarvis has two key
capabilities that help him help Stark. One is a superior natural language
processing capability that allows him to understand exactly what his creator
wants. The other is access to a vast repository of data regarding specs,
blueprints, system hybrids and other paraphernalia which he can summon on
demand.
What I find most interesting though is Jarvis’s highly
interactive nature. He seems to anticipate much of Tony Stark’s thinking,
asking questions like “Are you sure you don’t want to use carbon nanotubes
instead of titanium, Sir?” or “Do you want me to pull up [system X] which is
similar to [system Y] that you are trying to build?”. This interactive
capability not only speeds up the progress of projects that Stark is working
on, but it also opens up new avenues that he himself may not have thought
about.
Why was I thinking about Jarvis? Here’s the thing: I thought
about Jarvis when I realized how woefully, drastically uninteractive all of our
molecular modeling software is. This criticism does not apply to a specific
program or set of tools; it permeates the entire panoply of structure and
ligand based drug design tools employed by computational and medicinal chemists.
One can debate the pros and cons of specific algorithms for molecular dynamics
or docking or QSAR, but I think the one thing we should be able to agree on is
that none of these tools anticipate our needs or talk to us even in simple ways.
In the sense of being interactive, our modeling software is as primitive as
transportation was in the eighteenth century; it sits there, listless and
passive, waiting for us to push the buttons and pull the levers.
This is an odd state of affairs. Today we expect most of our
electronic devices and software to interact with us; Microsoft in
fact shipped its own primitive version of Jarvis – the unfortunate and doomed
‘Clippy’ – with Office back in the 90s. Clippy did not stay around forever, but
in principle he was asking the right questions (“It seems you are writing a
letter”). What we need is a more sophisticated form of Clippy for our modeling
software.
What would a Jarvis or advanced Clippy for modeling look
like? For one thing, it would be able to look at a protein or ligand of
interest and immediately have at hand a list of similar systems drawn from the
academic, industrial and patent literature. Its speed and efficiency assume
ready – instant in fact – access to databases like the PDB, GVK and ChEMBL.
This task by itself shouldn’t be a problem since it only involves an upfront
investment of effort. Once this data has been acquired, our hypothetical Jarvis
should then be able to identify the tasks we are embarking on and suggest
enhancements, automation or modifications to those tasks. For instance, we may
want to identify or probe binding sites in a protein which we want to inhibit.
In that case, once we start to run a program like Schrödinger’s SiteMap which
accomplishes this, Jarvis should immediately be able to chime in, identify what
we are doing, and then retrieve homologous proteins with similar binding sites.
It should similarly be able to parse a ligand on the screen and identify
similar ligands depending on what we are doing with it. For instance, if we were
docking that ligand, it would draw up a list of shape-based binding pocket
pharmacophores which are similar to the ones in our protein, telling us what
the probability of our ligand inhibiting those other proteins might be. This
would give us some idea of the off-target protein interactions which our ligand
might be expected to have. All this would be displayed as attractively as the
graphics in Tony Stark’s basement lab.
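To make the "identify similar ligands" step a little more concrete, here is a minimal sketch of one common way to do it: ranking a library by Tanimoto similarity to the query. Everything in it is an assumption for illustration. The fingerprints are made-up sets of on-bit indices, and the names `tanimoto`, `most_similar`, `library` and `query` are hypothetical; a real Jarvis would compute fingerprints with a cheminformatics toolkit and search databases such as ChEMBL rather than a three-entry dictionary.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0  # two empty fingerprints: treat as dissimilar by convention
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def most_similar(query_fp, library, top_n=3, threshold=0.0):
    """Return up to top_n (name, score) pairs with score >= threshold, best first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)[:top_n]


# Hypothetical fingerprints, invented purely for this example.
library = {
    "ligand_A": {1, 4, 7, 9, 12},
    "ligand_B": {1, 4, 7, 20},
    "ligand_C": {30, 31, 32},
}
query = {1, 4, 7, 9}

for name, score in most_similar(query, library, threshold=0.5):
    print(f"{name}: Tanimoto = {score:.2f}")
```

The interesting part is not the arithmetic, which is trivial, but the trigger: the lookup would run unprompted the moment a ligand appears on the screen.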
In the ligand-based design sphere, a Jarvis for modeling
would be especially useful in building QSAR models. One of the biggest pitfalls
of QSAR – and in fact of all of computational chemistry – is the existence of
spurious or artifactual correlations with biological activity. There are
innumerable case studies where people have correlated biological activity of
molecules with any number or combination of chemically impenetrable and
mathematically dense parameters without first checking whether the activity
correlates equally well or better with very simple parameters such as molecular
weight, hydrophobicity (logP) or polar surface area. A Jarvis for modeling
would make sure that whenever you start building any kind of a QSAR or similar
model, he calculates and flashes in front of you a few simple correlations that
allow you to make sure that you are not missing simple relationships; only once
you are sure that the simple correlations don’t hold up would it make sense to
grab a non-linear combination of your favorite ten-dimensional topological
index and a parameter from a relativistic quantum chemical calculation.
Visualization of this data in a clean, comprehensive and attractive format
would again be a key attribute of such a Jarvis.
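The sanity check described above is simple enough to sketch. The example below is only an illustration under stated assumptions: the activity values and descriptor columns are invented, the names `pearson_r` and `baseline_report` are hypothetical, and the 0.1 margin is an arbitrary choice rather than an established cutoff. It computes the squared Pearson correlation of activity against each simple descriptor and flags any descriptor that, on its own, comes close to the r² of the elaborate model.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / sqrt(var_x * var_y)


def baseline_report(activity, simple_descriptors, model_r2, margin=0.1):
    """Compare a model's r^2 with single-descriptor baselines.

    Returns (name, r2, flagged) tuples sorted by r2, highest first. A row is
    flagged when the descriptor alone comes within `margin` of the model's r^2.
    """
    rows = []
    for name, values in simple_descriptors.items():
        r2 = pearson_r(values, activity) ** 2
        rows.append((name, r2, r2 >= model_r2 - margin))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Invented data for illustration only (activity as pIC50).
activity = [5.1, 5.9, 6.4, 7.0, 7.8, 8.3]
descriptors = {
    "mol_weight": [250.0, 290.0, 310.0, 350.0, 390.0, 420.0],
    "logP": [3.1, 1.2, 2.8, 1.9, 3.5, 2.2],
    "polar_surface_area": [80.0, 95.0, 60.0, 110.0, 70.0, 90.0],
}

# model_r2 is the (hypothetical) r^2 claimed by the complicated model.
for name, r2, flagged in baseline_report(activity, descriptors, model_r2=0.85):
    warning = "  <-- check this before trusting the model" if flagged else ""
    print(f"{name:20s} r^2 = {r2:.2f}{warning}")
```

In this made-up data set molecular weight alone tracks activity almost perfectly, which is exactly the kind of thing a Jarvis should flash on the screen before anyone reaches for a ten-dimensional topological index.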
The desirability of having a Jarvis for molecular modeling dovetails with similar general thoughts I have had about the lack of sophistication in modeling software. I always find it odd that we expect a lot of interactive sophistication in our iPhones and our Samsung watches, and yet we somehow seem to be content in dealing with software for molecular modeling that simply waits for us to push buttons. Generally speaking we don’t apply the same standards to molecular modeling software which we apply to our smartphones, tablets and other devices. We take Siri and Cortana for granted, and yet we don’t demand Siris, Cortanas and Jarvises in our docking programs. It will be a while before an entity as intelligent as Jarvis is able to help us do better modeling, but that does not mean we shouldn’t start trying right now. I think that demanding this kind of sophistication from scientists, developers and vendors will do a lot of good for the entire community. It’s high time we did.
A prerequisite for a recommender system would probably be a molecular viewer that supports annotation from databases. Aquaria seems to be moving in this direction.
We did experiment with the Amazon “People Who Bought … Also Bought …” algorithm in reagent selection a while ago (http://www.eyesopen.com/2010_EuroCUP_presentations/EuroCUP4_Bostrom.pdf). Would love to see a wider implementation, as you suggest, Ash.
Both of these are really interesting - thanks! Will check them out.