Field of Science

The devil under the hood: To look or not to look?

Modern biology and chemistry would be unthinkable without the array of instrumental techniques at their disposal; indeed, one can make a case that it was new methods (think NMR, x-ray crystallography, PCR) rather than new concepts that were really responsible for revolutions in these disciplines. The difference between a good paper and a great paper is sometimes the foolproof confirmation of a decisive concept, often made possible only by the application of a novel technique.

Yet the onslaught of these methods has brought with it the burden of responsibility. Ironically, the increasing user-friendliness of the tools has only exacerbated this burden. Today it's all too easy to press a button and communicate a result which may be utter nonsense. In a recent article in Nature titled "Research tools: Understand how it works", David Piston from Vanderbilt laments the fact that many modern instruments and techniques have turned into black boxes, used by students and researchers without an adequate understanding of how they work. While acknowledging the undoubted benefits that automation has brought to the research enterprise, Piston points out the flip side:

Unfortunately, this scenario is becoming all too common in many fields of science: researchers, particularly those in training, use commercial or even lab-built automated tools inappropriately because they have never been taught the details about how they work. Twenty years ago, a scientist wanting to computerize a procedure had to write his or her own program, which forced them to understand every detail. If using a microscope, he or she had to know how to make every adjustment. Today, however, biological science is replete with tools that allow young scientists simply to press a button, send off samples or plug in data — and have a result pop out. There are even high-throughput plate-readers that e-mail the results to the researcher.

Indeed, and as a molecular modeler I can empathize, since modeling presents a classic example of black-box versus nuts-and-bolts approaches. On one hand you have the veteran programmers who did quantum chemistry on punch cards, and on the other hand you have application scientists like me who are much more competent at looking at molecular structures than at code (you also have those who can do both, but these are the chosen few). There's a classic tradeoff here between time spent and benefits accrued. In the old days (which in modeling lingo go back only fifteen years or so), most researchers wrote their own programs, compiled and debugged them, and tested them rigorously on model systems. While this may seem like the ideal training environment, the fact is that in modern research environments, and especially in the pharmaceutical industry, this kind of from-scratch methodology development is often just not possible because of time constraints. If you are a modeler in a biotech or pharma company, your overlords rightly want you to apply existing software to discover new drugs, not spend most of your time writing it. In addition, many modelers (especially in this era of user-friendly software) don't have strong programming skills. So it's considered far better to write a hefty check to a company like Schrodinger or OpenEye, who have the resources to spend all their time perfecting such programs.

The flip side of this, however, is that most of the software coming from these companies is not going to be customized for your particular problem, and you can start counting the ways in which a small change between training and test sets can dramatically impact your results. The only way to truly make these programs work for you is to look under the hood, change the code at the source and reconfigure the software for your unique situation. Unfortunately this runs into the problem stated above, namely the lack of personnel, resources and time for doing that kind of thing.

So how do you solve this problem? The solution is not simple, but the author hints at one possible approach when he suggests providing more graduate-level opportunities to learn the foundations of the techniques. For a field like molecular modeling, there are still very few formal courses available in universities. Implementing such courses would give students a head start on the relevant background, so that they can come to industry at least reasonably well-versed in the foundations and subsequently spend their time actually applying the background instead of acquiring it.

The same principle applies to more standard techniques like NMR and x-ray diffraction. For instance, even today most courses in NMR start with a basic overview of the technique followed by dozens of problems in structure determination. This is good training for a synthetic chemist, but what would be really useful is a judiciously chosen set of case studies from the current literature illustrating the promises and pitfalls of magnetic resonance. Such case studies would show NMR applied to messy, real-world problems rather than ideal cases, and only by working through them can students get a real feel for the kinds of problems for which NMR is truly the best technique.

Thus, gaining a background in the foundations of a particular technique is only one aspect of the problem, and one which in fact strikes me as less important than getting to know the strengths and limitations of the technique. To me it's not as important to formally learn quantum chemistry as it is to get a feel for the kinds of systems for which it works. In addition you want to know what the results really mean, since the numbers in the output can be more or less informative than they look. Learning the details of perturbation theory is not as key as knowing when to apply it. If the latter is your goal, it may be far more fruitful to scour the literature and get a feel for the circumstances in which the chosen technique works than to just take formal classes. And conveying this feel for the strengths and limitations of techniques is again something we are not doing very well in graduate school, and should be.


  1. My initial reaction was that I could only fantasize about the day when NMR becomes a black box - otherwise, why else was I testing the effects of particular sets of capacitors and transmission lines on probe resonant frequencies today?

    On a more serious note, I can understand the author's concerns and his recommendation, although I would have to concur that your recommendation to educate people as to the strengths and weaknesses of techniques, as well as their range of applicability, is probably more critical. In either case, it is scary when people don't understand the (usually straightforward) logic of how a particular molecular biology kit works, or just how much one can extract from a data set without infringing upon its experimental limitations.

    Curiosity compels me - do pharma/biotech companies bother hiring a token computational chemist or two who have that old-school quantum chemistry background with the ability to deal with wonky legacy Fortran or C code? The people I knew in grad school who did do that mix of QC/programming either stayed in academia or ended up working for a company like Schrodinger (as memory serves).

  2. These days Python is all the rage. Good scripting skills generally seem to count for much more than hard-core coding knowledge, unless of course you want to be a developer. Not surprisingly, dedicated coders are in greater demand during the early days of a company when everything has to be built from scratch. Even a company like Schrodinger these days is mostly concerned with hiring scripters and application scientists. But yes, you do find the odd veteran programmer or two and these invariably turn out to be valuable since things never turn out the way you want them to and the need for hacking together miscellaneous bits of code seems to surface annoyingly often.

  3. I guess this is the price of technology. We have black boxes and we believe the numbers they churn out.

    It is not the increase in sophistication of our tools that should daunt us. If we train people merely to twiddle knobs, they will have trouble with the next generation of, say, NMR machines. Rather, they must be trained in the basics of the science so that the results can be interpreted.

    However, "how much science?" seems to be the question. Quantum mechanics is too much for some to learn, so it too becomes part of the black box. Same with statistical mechanics. However, if one works in a field, with time one develops a visualization of phenomena which comes with experience. For example, you do not need to know anything about classical mechanics to catch a ball that someone tosses you. If, however, a photon is fired at a molecule, we cannot visualize what is going on without some deeper understanding. In time, the process of photons interacting with molecules (I mean spectroscopy, of course) becomes natural to the user who has only a scant and localized understanding of quantum mechanics. How s/he uses those results will be the proof of the pudding.

