It’s fairly well known that the theories of information and of thermodynamics share the concept of entropy in a meaningful way. It is also known, although less well, that the theories of information and of computation also share this concept. In 1975, Gregory Chaitin published a paper entitled “A Theory of Program Size Formally Identical to Information Theory” in the Journal of the Association for
Computing Machinery. In that paper, Chaitin constructs a measure for the relative size of a computer program such that a program’s relative size obeys all the standard identities on entropy as laid out in classical Shannon-Weaver information theory. A reasonable conclusion is that computing machines have something in common with heat engines, and that computational work has something in common with physical work. This interpretation, however, is my own; Chaitin’s paper is a brilliant piece of mathematics, but it doesn’t really concern itself with these kinds of intuitive interpretations. However, I think the intuition of computational work as physical work is a powerful and potentially very fruitful one. In what follows, I’ll attempt to give an informal but careful justification for this connection, in the hopes of convincing the reader that computation constitutes work in the strict physical sense of that word.
In order to understand the connection between physical work and computational work, it’s necessary to properly understand the concept of entropy. The usual informal definition of entropy as a measure of disorder is vague, misleading, and ultimately unhelpful; put it aside and forget about it. The following definition is much more precise and usable, and effectively ties together thermodynamics, information theory, and computation:
Given some observable phenomenon P, entropy is a measure of how much can be inferred about P from direct observation of P.
Applying this definition to thermodynamics, we have:
Given a thermodynamic system, its entropy is a measure of how many microscopic states are consistent with a macroscopic state.
Applying this definition to information theory, we have:
Given a code, its entropy is a measure of the average number of bits that must be transmitted in order to uniquely specify a code word.
There are three essential parts to all of these definitions: something that can be directly observed, something that cannot be directly observed, and some kind of causal relationship between the two. Entropy is basically a measure of how strong an inference can be made using the causal relationship between those parts of the system that are directly observable and those that are not. The juxtaposition of two frames of observational reference is crucial to the discussion that will follow, and the definition of entropy is a powerful one because it draws a quantitatively precise relationship between these two frames.
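The thermodynamic version of the definition can be made concrete with a toy system of my own devising (it does not appear in Chaitin's paper): take ten coins, let the macrostate be the number of heads, and let the microstates be the individual head/tail assignments. The entropy of a macrostate is then the logarithm of the number of microstates consistent with it, which measures exactly how much the macroscopic observation leaves undetermined.

```python
from math import comb, log2

# Toy thermodynamic system: 10 coins. The macrostate is the number
# of heads; the microstates are the individual head/tail assignments.
N = 10

def entropy(heads: int) -> float:
    """log2 of the number of microstates consistent with the macrostate."""
    return log2(comb(N, heads))

# The all-heads macrostate is consistent with exactly one microstate:
# observing it determines the system completely, so its entropy is zero.
print(entropy(10))  # 0.0

# The half-heads macrostate is consistent with comb(10, 5) = 252
# microstates, so observing it leaves about 8 bits undetermined.
print(entropy(5))
```

The directly observable part (the head count), the unobservable part (the individual coins), and the counting relationship between them are precisely the three essential ingredients identified above.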
Chaitin’s definition of computational entropy is actually very intuitive and sensible in light of these observations. Assume a fixed optimal universal Turing machine U. Informally, Chaitin’s entropy can then be defined as follows:
The entropy of a string s is the length of the shortest program for U that halts with s on the output tape.
The important feature to observe in this definition is that entropy is a measure on the final state of the machine relative to the initial state. In a well-defined sense, the entropy of a final machine-state is a measure of how much information must be given initially in order to produce the specified result. To extend the analogy with information theory, the machine’s initial state (i.e. the program it is given to execute) is a code word, while the machine’s final state (i.e. the result of the computation – which may not be defined in the case of non-termination) is the result of transmitting and decoding that word. To extend the earlier analogy with thermodynamics, the machine’s initial state is like the microscopic configuration of a thermal system (i.e. velocities of the individual particles), while the final state is the directly observable macroscopic configuration of the thermal system (i.e. pressure, volume, and temperature).
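Chaitin's H(s) is uncomputable in general, but it can be bounded from above by any concrete compressor: the compressed length of s, plus the fixed size of the decompressor, is the length of *some* program that outputs s. The following sketch (my illustration, not Chaitin's construction) uses zlib as a crude stand-in for the optimal machine U:

```python
import os
import zlib

def approx_H(s: bytes) -> int:
    """Crude upper bound on program-size entropy: the length of a
    compressed description of s. The true H(s) is uncomputable; this
    only bounds it from above, up to the fixed decompressor size."""
    return len(zlib.compress(s, 9))

# A highly ordered string has a short program ("print 'ab' 500 times"),
# so its compressed description is far shorter than the string itself.
highly_ordered = b"ab" * 500

# Random bytes admit no description much shorter than the data itself.
incompressible = os.urandom(1000)

print(approx_H(highly_ordered))   # a few dozen bytes
print(approx_H(incompressible))   # roughly 1000 bytes
```

The gap between the two cases is the whole content of the definition: entropy measures how much information must be supplied initially to produce the specified result.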
More to the point, Chaitin demonstrates in his paper that the definition of entropy given for computation obeys all the identities on entropy as defined in the classical settings, which means that these analogies are not mere intuitions, but exact correspondences. It’s worth mentioning a few of these. Denote the entropy of s as H(s), and let H(s,t) and H(s∕t) denote the respective entropies of the joint event “s followed by t” and of the conditional event “s given t”. The most important identity that holds for all concepts of entropy is:
H(s,t) = H(s) + H(t∕s) + Δ
which states that there is some bounded constant Δ such that jointly observing s and t conveys the same amount of cumulative information as observing s alone and then observing t in the presence of s. In terms of computation, this means that computing s and then t should entail roughly the same informational burden as computing s alone then passing it as an input to t. It’s also
worth the trouble to think out what the following identities mean in terms of computation:
H(s,t) = H(t,s) + O(1)
H(s) ≤ H(s,t) + O(1)
H(s∕t) ≤ H(s,t) + O(1)
H(s,t) ≤ H(s) + H(t∕s) + O(1)
I’ll leave that to the curious reader.
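As a rough empirical nudge for that curiosity: using compressed length as the upper-bound proxy for H described earlier (again my illustration, with zlib standing in for the optimal machine), the symmetry of joint entropy and its subadditivity can both be observed directly. The O(1) terms become the compressor's small fixed overheads.

```python
import os
import zlib

def C(x: bytes) -> int:
    """Compressed length: an upper-bound proxy for Chaitin's H."""
    return len(zlib.compress(x, 9))

s = os.urandom(2000)
t = s[:1000] + os.urandom(1000)   # t shares half its content with s

# H(s,t) = H(t,s) + O(1): the joint description length should not
# depend on the order of presentation, up to a small constant.
print(C(s + t), C(t + s))

# H(s,t) <= H(s) + H(t) + O(1): a joint description can exploit the
# structure shared between s and t, which separate descriptions of
# s and t must each pay for in full.
print(C(s + t), C(s) + C(t))
```

The joint description of s and t comes out roughly 1000 bytes shorter than the two separate descriptions, because the compressor only pays once for the half of t that is copied from s.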
These are all very nice correspondences, but by themselves don’t do much to further the analogy that motivated this article, namely, an intuitive analogy between physical and computational work. Real machines perform various kinds of energy conversions, and because perfectly efficient energy conversion is impossible they produce waste heat. This phenomenon is a consequence of the often quoted (and often misinterpreted) Second Law of Thermodynamics, which states that the entropy of an isolated system can never decrease with time. Since we have here a meaningful concept of computational entropy, it is only natural to ask whether there is a computational analogue of the Second Law of Thermodynamics.
An important consequence of the Second Law of Thermodynamics is that there are no perfectly reversible physical processes. As such, there is one more important observation that remains to be made for the purposes of this discussion, and it is that computation is an irreversible process. Understanding irreversibility is key to understanding what it means for entropy to increase in a computational system. All non-trivial transition systems, including computational ones, are irreversible in the sense that it is not generally possible to infer initial states from final states. The reason is simple: there are many possible initial states that could lead to any given final state. As a consequence, each step of a computation reduces the amount of available information about the history of that computation – unless that information is saved elsewhere. The significance of this fact becomes immediate in light of the earlier analysis of entropy: a computer’s final state is the directly observable part of the entropic system, its initial state is the unobservable part of the entropic system, and the rules or semantics of the computation specify the causal relationship between the two. Entropy increases as the computation progresses because the computer consumes its input in the very same sense that a physical machine consumes energy in order to operate. The terminology of theoretical computer science is pregnant with these ideas; it seems more than a coincidence that many programming language semantics use the notion of reduction to describe how a computation works.
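The many-to-one character of computation is easy to exhibit directly. In the following minimal sketch (my own toy example), a single machine step simply discards the low bit of a 4-bit state; counting preimages shows that every reachable final state has two possible predecessors, so each step erases exactly one bit of history unless that bit is saved elsewhere.

```python
from collections import Counter

# A toy one-step transition on 4-bit machine states: drop the low bit.
def step(state: int) -> int:
    return state >> 1

# Count how many initial states map to each final state.
preimages = Counter(step(s) for s in range(16))

# Every reachable final state (0..7) has exactly two predecessors:
# from the final state alone, the low bit of the initial state
# cannot be inferred. One step, one bit of history irreversibly lost.
print(dict(preimages))
```

Iterating the step compounds the loss: after k steps, 2^k initial states collapse onto each final state, which is exactly the sense in which entropy increases as the computation progresses.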
This computational analogy to the Second Law of Thermodynamics is basically a statement about the real meanings of computations. A computation is not merely a result; in order to be useful, it must be associated to a computational procedure and an input. A result is meaningless if its observer doesn’t know what program produced it and what the initial input was. In order for the work of computation to be useful, it is necessary that contextual information about initial states and the rules of computation be kept elsewhere, or the program will simply reduce its initial state to a meaningless final state, consuming all contextual information in the process. In this sense, a user that knows a program’s inputs and rules acts as a heat sink to the computational process, and this heat sinking is as necessary to the performance of computational work as it is to the functioning of a real machine. Douglas Adams displayed a keen insight into this important relationship between initial and final states, or question and answer, in the ironic admonition of his fictional Deep Thought computer to those who objected to its answer to The Ultimate Question of Life, the Universe, and Everything: “I think the problem, to be quite honest with you, is that you’ve never actually known what the question was.” Computers can only do work if they are given initial information to consume; if that information does not exist elsewhere, then it is lost. In this sense, the work of an abstract, computational machine is very much like the work of a real, physical machine: both consume their inputs, and both must export the uncertainty they create in order to continue functioning.
I personally find this insight to be very surprising and rich with the possibility of even deeper interpretations. Given the successful mathematization of science since the time of Newton, perhaps it shouldn’t be surprising that abstract constructions and physical phenomena should behave the same way. Up until this point in history, however, it is exclusively abstract notions that have been used to determine the concrete boundaries of the real world, and never the other way around. What is surprising about Chaitin’s result and its analogy to heat engines is that it seems to completely invert this trend; the limited, circumscribed behaviors seen in physical machines seem to pop up unexpectedly in the realm of the abstract, where it has often been supposed that if something can be imagined, it can be done. Arguably, this is not the first such discovery; Turing, Gödel, and Church each independently discovered incomputable functions in the first half of the 20th century, and one could (correctly) argue that Chaitin’s result is simply a radical reformulation of their work. The difference, perhaps, is that early work in the theory of computation seems to leave computation itself in the realm of the purely abstract. (In a curious kind of defiance, there have been some amusing attempts to build real Turing machines, although to the best of my knowledge, no one has ever constructed a partial recursive function or a lambda calculus out of Legos.) In the visions of partial recursive functions, lambda calculi, or even Turing machines, computational work still seems to belong to a realm of Cartesian duality, where there may be constraints and even perhaps a physical substrate, but where, when it comes to the meanings and symbols themselves, the usual rules of the material somehow don’t apply. As it turns out, however, abstract machines function under constraints that are just as real and just as inescapable as physical ones.