Rutherford on tools and theories (and machine learning)

Ernest Rutherford was the consummate master of experiment, disdaining theoreticians for playing around with their symbols while he and his fellow experimentalists discovered the secrets of the universe. He was said to have used theory and mathematics only twice - once when he discovered the law of radioactive decay and again when he used the theory of scattering to interpret his seminal discovery of the atomic nucleus. But that's where his tinkering with formulae stopped.

Time and time again Rutherford used relatively simple equipment and tools to pull off seemingly miraculous feats. He had already won the Nobel Prize for chemistry by the time he discovered the nucleus - a rare and curious case of a scientist making their most important discovery after winning a Nobel Prize. The nucleus clearly deserved another Nobel, but so did his fulfillment of the alchemists' dream in 1919, when he transmuted nitrogen into oxygen through the artificial disintegration of the nitrogen atom. These achievements fully justified Rutherford's stature as one of perhaps the two greatest experimental physicists in modern history, the other being Michael Faraday. They also demonstrated the primacy of tools in engineering scientific revolutions.

However, Rutherford was shrewd and wise enough to recognize the importance of theory. He famously mentored Niels Bohr, presumably because "Bohr was different; he was a football player." And he was on good terms with both Einstein and Eddington, the doyens of relativity theory in Europe. So it's perhaps not surprising that he made a rather interesting observation about the discovery of radioactivity, one that attests to the importance of theoretical ideas.

As everyone knows, radioactivity in uranium was discovered by Henri Becquerel in 1896, then taken to great heights by the Curies. But as Rutherford points out in a revealing paragraph (Brown, Pais and Pippard, "Twentieth Century Physics", Vol. 1; 1995), it could potentially have been discovered a hundred years earlier. More accurately, it could have been experimentally discovered a hundred years earlier.

Rutherford's basic point is that unless there's an existing theoretical framework for interpreting an experiment - providing the connective tissue, in some sense - the experiment remains merely an observation. Relying on experiments alone to uncover correlations and new facts about the world is therefore like hanging on to a tenuous, uncertain thread that might lead you in the right direction only occasionally, and then by pure chance. In some ways Rutherford is echoing Karl Popper's refrain that even seemingly unbiased observations are "theory-laden"; in the absence of the right theory, there's nothing to ground them.

It strikes me that Rutherford's caveat applies well to machine learning. One goal of machine learning - at least as its most enthusiastic proponents see it - is to find patterns in data, whether the data are dips and rises in the stock market or signals from biochemical networks, by letting the algorithms blindly discover correlations. But simply letting an algorithm loose on data would be like letting gold-leaf electroscopes and other experimental apparatus loose on uranium. Even if the algorithms find some correlations, these won't mean much in the absence of a good intellectual framework connecting them to basic facts. You could find a correlation between two biological responses, for instance, but without a holistic understanding of how the components responsible for these responses fit within the larger framework of the cell and the organism, the correlations would stay just that: correlations without deeper understanding.

What's needed to reach that understanding is machine learning plus theory, whether it's a theory of the mind for neuroscience or a theory of physics for modeling the physical world. This is why efforts that try to supplement machine learning by embedding knowledge of the laws of physics or biology in the algorithms are likely to work. By contrast, efforts that blindly use machine learning to discover truths about natural and artificial systems from correlations alone would be like Rutherford's hypothetical uranium salts from 1806: giving off mysterious radiation that is detected but not interpreted, posing a question that waits for an explanation.


  1. Lovely, but I want to push back a little against your final paragraph's "efforts that try to supplement machine learning by embedding knowledge of the laws of physics or biology in the algorithms are likely to work".
    What I think this means in software terms is that some algorithms are prioritized over others? Taking that as a starting point despite its flaws, that's fine for routine discoveries that are in tune with existing theory, but it will make some radical discoveries less likely if they require only a short sequence of low-priority algorithms but a long sequence of high-priority ones, under a given system of priorities. Not every radical discovery will be made less likely by a given choice of priorities, but this seems to create a specific probability density over the different rat runs through the discovery space.
    I don't understand machine learning well enough, and I understand QML even less, to be confident that you're mistaken in your claim, and I can certainly feel the power of your analogy with Rutherford's claim (in the last clause of your last sentence) that discoveries about uranium salts a century earlier would not have fallen on as fertile a theoretical landscape. I'll end, weakly, by saying only that "are likely to work" is a probabilistic claim, so I suppose it requires a probabilistic argument to support it: for one choice of prioritization it will be true ("true" in whatever probabilistic sense we favor) for a given discovery, and for another I suppose it will not. Implicitly or explicitly, we prioritize, but the consequences are so complex that I suspect it is only by experiment that we will discover which prioritization is good for which discovery.

  2. The 'advantage' of using machine learning is that it allows so-called researchers to concentrate on the minutiae of machine learning fitting rather than learning the application domain or collecting more data. The result is fake research, such as

