Field of Science

The 100 most highly cited papers of all time: Tools, not ideas

A rather obscure paper by biochemist
Oliver Lowry is the most highly cited
scientific paper of all time
(Image: beckerarchives)
Nature has published a comprehensive list of the top 100 most highly cited papers of all time, and the list is well worth a look for what it reveals about what's really important in science, what people merely perceive as important, and how science actually progresses.

I like the list because it confirms something that I have written about on this blog a few times: progress in science is as much tool-driven as it is idea-driven. This is evidenced by the fact that almost all of the most highly cited papers on the list describe useful techniques rather than great ideas and discoveries. Thus the expansion of the universe, the structure of DNA and the theory of relativity don't make the list, but Sanger's gene sequencing method, an algorithm for aligning protein and DNA sequences (ClustalW) and a popular statistical technique do. And the dominance of many of these tools came about because of computers, so the selection also speaks volumes about the rise of computer technology.

Now one can argue that the reason the structure of DNA does not rank on the list is that its importance is so obvious that it has become a textbook fact and does not need to be cited. This is true, and the list does nothing to denigrate the significance of such household ideas (nor does it elevate the importance of tools above ideas). But what it does convey is that the really visible papers are those which gave researchers practical tools rather than profound ideas. It also tells us that what's in the textbook is not always what scientists use the most in their day-to-day work, even if it is an important component of their background knowledge.

The selection is dominated by papers from protein biochemistry, bioinformatics and statistics. The most highly cited paper documents the use of the Folin phenol reagent, a method which even researchers consider a "tad outdated". This big enchilada leads the pack with a whopping 305,000 citations; the next best paper, with 213,000 citations, is a dot on the horizon by comparison. The genomics revolution created a huge unmet need both for methods to quantify, isolate and sequence biomolecules and for methods to process, compare and analyze the resulting sequences. Not surprisingly, specific lab protocols for isolating and studying biomolecules and computer algorithms for analyzing their information content rank at the top of the list. The latter phenomenon is a great example of two technological revolutions - cheap software and hardware and the applications they engender - piggybacking on each other.

Statistics and crystallography - two other disciplines with huge practical ramifications - also feature prominently on the list. Computers were again tremendously important in the practical realization of these disciplines. For instance the program SHELX made it possible to analyze complex data from x-ray diffraction experiments. Similarly the original paper describing the Kaplan-Meier estimator is highly cited: the estimator is so popular today partly because it has found critical use in fields like clinical trials, but more importantly because it has been incorporated into popular software tools like Matlab and R which even non-statisticians can use efficiently. As statistics becomes more user-friendly these papers are likely to be cited even more, but the people who cite them might just be using their products the way a gardener uses a lawnmower, without really understanding how it works.
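The Kaplan-Meier estimator itself is simple enough to sketch in a few lines, which is part of why it slots so easily into software packages. Here is a minimal pure-Python version of the product-limit calculation; the toy data at the bottom is invented for illustration and does not come from any real study.

```python
# Minimal sketch of the Kaplan-Meier product-limit estimator.
# At each event time t_i, survival is multiplied by (1 - d_i / n_i),
# where d_i is the number of events and n_i the number still at risk.

def kaplan_meier(times, events):
    """Return (event_times, survival_probs) for right-censored data.

    times  : observation time for each subject
    events : 1 if the event occurred, 0 if the subject was censored
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    n_at_risk = len(times)
    surv = 1.0
    out_times, out_surv = [], []
    i = 0
    while i < len(order):
        t = times[order[i]]
        deaths = removed = 0
        # Group all subjects sharing the same observation time (ties)
        while i < len(order) and times[order[i]] == t:
            deaths += events[order[i]]
            removed += 1
            i += 1
        if deaths:  # censored-only times don't change the curve
            surv *= 1.0 - deaths / n_at_risk
            out_times.append(t)
            out_surv.append(surv)
        n_at_risk -= removed
    return out_times, out_surv

# Toy data: five subjects; 1 = event observed, 0 = censored
t, s = kaplan_meier([1, 2, 3, 4, 5], [1, 1, 0, 1, 0])
# The curve drops at times 1, 2 and 4; survival ~ 0.8, 0.6, 0.3
print(t, s)
```

Note how the censored subject at time 3 shrinks the risk set without dropping the curve; that bookkeeping is exactly what canned routines in R or Matlab hide from the non-statistician user.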

Computational chemistry also makes an appearance: density functional theory (DFT), which has caused a revolution in the fast and accurate calculation of molecular properties, is what raises the profile of the field. The two most highly cited papers in this area include one by Lee, Yang and Parr and another by Becke. The interesting thing about the amalgamation of these two methods is that it has become ensconced in an abbreviation (B3LYP) which is used as part of a recipe by thousands of graduate students, postdocs and professional researchers to do all kinds of quantum chemical calculations, from simple energy determinations to complex reaction studies. But the prominence of B3LYP also goes to the heart of why a paper can be highly cited: it becomes so firmly enmeshed in the standard arsenal of tools that most people cite it blindly and use it as a black box. I have used the technique myself dozens of times, including in a few papers, but I don't remember the last time I looked at the original paper.

One of the most puzzling facts about the list is the almost complete absence of papers from physics and astronomy. Maybe I am missing something here but the absence tempts me to reiterate something that I have noted before: that chemistry and biology, much more so than physics, are about tools rather than ideas. Nonetheless I find the paucity of physics papers puzzling since there is no reason a priori why physics should be devoid of practical tools nor why ideas in physics should not be highly cited. One intriguing point noted in the article is that physicists might be less prone to citing each other's papers than biologists, and if true this is really a cultural phenomenon responsible for the absence of physics papers on that list.

Nevertheless the list is quite readable. It tells us that science is ultimately as much about the mundane use of tools and techniques as it is about the genesis and distribution of profound ideas. Sadly the former view is not half as much appreciated by the general public as the latter. The list provides a great counterpoint to the idea of science as a series of paradigm shifts. It tells us that in science, as in many other fields, what matters ultimately is what we can use.


  1. Does the study account for the fact that some fields have fewer workers, who will then write fewer papers and cite other papers less? Also, some work is so hard to understand that you won't be able to use it. It may be correct. It may be valuable. The fault may lie with the reader, who then won't reference it.

  2. That's an interesting analysis. But after a certain point the "tools" are not cited anymore because they have become standard. I'm not citing how NMR works, or TLC, or chromatography, as those are nowadays standard techniques. I really cannot see how a paper can get >300,000 citations...

  3. I have a feeling that in physics and astronomy "tools" become standard to the point that they are not cited faster than in biology. In the list of most cited papers I notice that numbers 58 and 88 are the original references for two widely used numerical methods: the Metropolis-Hastings and Levenberg-Marquardt algorithms. I've used them both, but I've never cited the original papers. I've always considered them to be textbook material.

    1. Good points, both. Maybe chemists and biologists tend to cite "obvious" developments more than physicists.

    2. Looking at the list, I noticed that a citation to perhaps the most widely used (numerical) tool in science is missing: the Cooley-Tukey FFT paper in Mathematics of Computation in 1965. If it were cited every time the FFT is used, the citation count would be enormous. I guess the FFT has reached the status of a standard tool in all branches of science.

    3. Yes, the Cooley-Tukey transform is a great example of "obvious" research that's so important it's left uncited.
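The Cooley-Tukey FFT discussed in the comments above is a striking case of a tool so standard that nobody cites it, yet its core recursion fits in a dozen lines. Below is a minimal radix-2 sketch in Python (input length must be a power of two), checked against the direct O(n²) DFT it accelerates; the variable names are my own.

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    # Split into even- and odd-indexed samples and recurse on each half
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor combines the two half-size transforms
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def dft(x):
    """Direct O(n^2) discrete Fourier transform, used here as a check."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

signal = [1.0, 2.0, 3.0, 4.0]
# The recursive FFT agrees with the direct DFT to floating-point precision
print(all(abs(a - b) < 1e-9 for a, b in zip(fft(signal), dft(signal))))
```

The divide-and-conquer step is the whole trick: each transform of length n is assembled from two transforms of length n/2, taking the cost from O(n²) to O(n log n).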

