First came the numbers, then the symbols encoding the symbols, then symbols encoding the symbols. A festive smattering of metamaniacal creations from the thicket of conjectures populating the hive mind of creative consciousness. Even Kurt GÃ¶del could not grasp the final import of the generations of ideas his self-consuming monster creation would spawn in the future. It would plough a deep, indestructible furrow through biology and computation. Before and after that it would lay men’s ambitions of conquering knowledge to final rest, like a giant thorn that splits open dreams along their wide central artery.

Code. Growing mountains of self-replicating code. Scattered like gems in the weird and wonderful passage of spacetime, stupefying itself with its endless bifurcations. Engrossed in their celebratory outbursts of draconian superiority, humans hardly noticed it. Bits and bytes wending and winding their way through increasingly Byzantine corridors of power, promise and pleasure. Riding on the backs of great expectations, bellowing their heart out without pondering the implications. What do they expect when they are confronted, finally, with the picture-perfect contours of their creations, when the stagehands have finally taken care of the props and the game is finally on? Shantih, shantih, shantih, I say.

Once the convoluted waves of inflated rational expectations subside, the reality kicks in in ways that only celluloid delivered in the past. Machines learning, loving, loving the learning that other machines love to do was only a great charade. The answer arrives in a hurry, whispered and then proudly proclaimed by the stewards of possibility. We can never succeed because we don’t know what success means. How doth the crocodile eat the tasty bits if he can never know where red flesh begins and the sweet lilies end? Who will tell the bards what to sing if the songs of Eden are indistinguishable from the lasts gasps of death? We must brook no certainty here, for the fruit of the tree can sow the seeds of murderous doubt.

Just so often, although not as often as our eager minds would like, science uncovers connections between seemingly unrelated phenomena that point to wholly new ways forward. Last week, a group of mathematicians and computer scientists uncovered a startling connection between logic, set theory and machine learning. Logic and set theory are the purest of mathematics. Machine learning is the most applied of mathematics and statistics. The scientists found a connection between two very different entities in these very different fields – the continuum hypothesis in set theory and the theory of learnability in machine learning.

The continuum hypothesis is related to two different kinds of infinities found in mathematics. When I first heard the fact that infinities can actually be compared, it was as if someone had cracked my mind open by planting a firecracker inside it. There is the first kind of infinity, the “countable infinity”, which is defined as an infinite set that maps one-on-one with the set of natural numbers. Then there’s the second kind of infinity, the “uncountable infinity”, a gnarled forest of limitless complexity, defined as an infinity that cannot be so mapped. Real numbers are an example of such an uncountable infinity. One of the staggering results of mathematics is that the infinite set of real numbers is somehow “larger” than the infinite set of natural numbers. The German mathematician Georg Cantor supplied the proof of the uncountable nature of the real numbers, sometimes called the “diagonal proof”. It is like a beautiful gem that has suddenly fallen from the sky into our lap; reading it gives one intense pleasure.

The continuum hypothesis asks whether there is an infinity whose size is between the countable infinity of the natural numbers and the uncountable infinity of the real numbers. The mathematicians Kurt GÃ¶del and – more notably – Paul Cohen were unable to prove whether the hypothesis is correct or not, but they were able to prove something equally or even more interesting; that the continuum hypothesis cannot be decided one way or another within the axiomatic system of number theory. Thus, there is a world of mathematics in which the hypothesis is true, and there is one in which it is false. And our current understanding of mathematics is consistent with both these worlds.

Fifty years later, the computational mathematicians have found a startling and unexpected connection between the truth or lack thereof of the continuum hypothesis and the idea of learnability in machine learning. Machine learning seeks to learn the details of a small set of data and make correlative predictions for larger datasets based on these details. Learnability means that an algorithm can learn parameters from a small subset of data and accurately make extrapolations to the larger dataset based on these parameters. The recent study found that whether learnability is possible or not for arbitrary, general datasets depends on whether the continuum hypothesis is true. If it is true, then one will always find a subset of data that is representative of the larger, true dataset. If the hypothesis is false, then one will never be able to pick such a dataset. In fact in that case, only the true dataset represents the true dataset, much as only an accused man can best represent himself.

This new result extends both set theory and machine learning into urgent and tantalizing territory. If the continuum hypothesis is false, it means that we will never be able to guarantee being able to train our models on small data and extrapolate to large data. Specific models will still be able to be built, but the general problem will remain unsolvable. This result can have significant implications for the field of artificial intelligence. We are entering an age where it’s possible to seriously contemplate machines controlling others machines, with human oversight not just impossible in practice but also in principle. As code flows through the superhighway of other code and groups and regroups to control other pieces of code, machine learning algorithms will be in charge of building models based on existing data as well as generating new data for new models. Results like the current result might make it impossible for such self-propagating intelligent algorithms to ensure being able to solve all our problems, or solve their own problems to imprison us. The robot apocalypse might be harder than we think.

As Jacob Bronowski memorably put it in his “The Ascent of Man”, one of the major goals of science in the 20th century was to establish the certainty of scientific knowledge. One of the major achievements of science in the 20th century was to prove that this goal is unattainable. In physics, Heisenberg’s uncertainty principle put a fundamental limit on measurement in the world of elementary particles. Einstein’s theory of relativity put a fundamental limit on the speed of light. But most significantly, it was GÃ¶del’s famous incompleteness theorem that put a fundamental limit on what we could prove and know even in the seemingly impregnable world of pure, logical mathematics. Even in logic, that bastion of pure thought, where conjectures and refutations don’t depend on any quantity in the real world, we found that there are certain statements whose truth might forever remain undecidable.

Now the same GÃ¶del has thrown another wrench in the machine, asking us whether we can indeed hold inevitability and eternity in the palm of our hands. As long as the continuum hypothesis remains undecidable, so will the ability of machine learning to transform our world and seize power from human beings. And if we cannot accomplish that feat of extending our knowledge into the infinite unknown, instead of despair we should be filled with the ecstatic joy of living in an open world, a world where all the answers can never be known, a world forever open to exploration and adventure by our children and grandchildren. The traveler comes to a divide, and in front of him lies an unyielding horizon.

You are overstating the importance of this result imo. I mean it was always trivial that we could define some kind of learning style problem the existence of a solution for which is undecidable.

ReplyDeleteTrivial example is this. Say that a computable algorithm learns a function from f: N-> N just if it eventually (finite mistakes) can predict the value of f(n) for all $n$. We know there is an algorithm that succeeds for all primitive recursive functions. What about the function f(n) = 1 if n is less than the length of least proof of contradiction from ZFC and if n is greater than the length of the least proof of contradiction it equals 1 if the n-th Turing machine halts on input n and 0 otherwise. This is trivially learnable iff CON(ZFC) (since if ~CON(ZFC) the function is uncomputable) which by Godel's second theorem is unprovable in ZFC.

This isn't what the proof in the paper did by any means and I think it's interesting that they came up with a relatively natural example. However, there is also a way in which one can import elements that are in some sense intuitively outside the domain of machine learning in defining success in a learning algorithm.

In particular, the very fact that this depends on the continuum hypothesis tells us that the statement which can't be proved can't even be formulated in second order arithmetic. As such I wouldn't call the undecidability intrinsic to the subject of machine learning (e.g. it, in some sense like the trivial example I gave above, isn't about the kind of statements one intuitively regards as asking questions about machine learnability).

It's still a quite interesting result and I don't mean to degrade it in any way but it just doesn't have the kind of epistemic impact in relation to practical applications of machine learning everyone (and your post) seems to be assuming.

Delete