A Talk for “Off the Beaten Track”
Posted: 2011/12/16 Filed under: news | Tags: conferences, programming languages, publications, talks Leave a comment »Below is a short synopsis of a talk, loosely inspired by this old post, that I’ll be giving at Off the Beaten Track: Underrepresented Problems for Programming Language Researchers, in Philadelphia, PA on January 28, 2012. If by some odd chance you happen to be there, or if this just sounds interesting, I’d love to hear your thoughts. Unfortunately, I only have five minutes of talking time, and so I won’t be able to unpack the full motivation or elaborate specific proposals during the talk proper, as reflected in the brevity of the synopsis. That said, I do have considerably more to say on both these points, which would make a great future post.
Symbolic computation occupies its own universe with its own laws. Intuitively, such abstract and purely symbolic universes are also extremely orderly. Given this order, it seems only natural that formal logic should provide a powerful tool for reasoning about phenomena in the symbolic universes of computation. This is no idle observation; nearly a century of intellectual labor has developed and analyzed the deep and remarkable relation between logic and computation, yielding useful tools in the form of type systems, theorem provers, model checkers, and novel programming language paradigms. Logic is an essential tool for computer science because it makes precise and clear the ways in which programs are, in their essence, simple and orderly.
In a precise and very well-defined sense, programs are also fundamentally complex and disorderly. This fact should not be completely surprising, or even wholly controversial; there are well-known theoretical limits to what logic can reveal about programs. It is in the face of exactly such complexity and disorder that logic is powerless. The implications of this observation, however, are profound; the pioneering work of Kolmogorov, Solomonoff, and Chaitin illustrated that even purely symbolic, computational universes are subject to the same tendency to complexity and disorder observed in the physical universe. Just as the study of dynamic physical systems promises a deeper understanding of phenomena that heretofore confounded the physical sciences, a deeper exploration of complexity offers the potential for new and powerful tools for analyzing and understanding computational systems.
Unfortunately, most developments in computational complexity have been confined to pure mathematics and to the theory of abstract computation. Tools for precisely analyzing the structure and degree of program complexity, however, would find application in virtually all fields of computer science, and lay the foundations of a methodology that would solidifiy the position of software development as a genuine engineering practice. Programming language research, with its awareness of the breadth and diversity of computational paradigms, its dynamic synthesis of theoretical and practical aspects of computing, and its mastery of program analysis and transformation, is uniquely positioned to realize the theoretical consequences of program complexity as practical tools. In order for these results to take shape, however, researchers must widen their view to encompass both halves of the dual nature of computation, as something essentially simple in structure but inescapably complex in its consequences.
Follow the Bouncing Ball: An Over-Literal Interpretation of ‘Mechanized Logic’
Posted: 2011/11/18 Filed under: conceptual explanations, interesting questions | Tags: abstract machines, algorithmic information theory, decidability, entropy, logic, speculations, theory of computation Leave a comment »Developing your own intuitions is an underappreciated but absolutely essential part of understanding mathematics, computing, or science. In general, more concrete intuitions are also more useful intuitions, as the human mind has a better-developed set of tools for reasoning about familiar objects. (And perhaps this observation constitutes a definition of ‘familiarity’.) While it’s important to develop new intuitions about unfamiliar and sometimes surprising objects (this being a good argument for mathematics taking a place alongside literacy as a pillar of education), straightforward analogy to familiar phenomena can illuminate even the most dense and obscure of abstractions.
I’ve been preoccupied with work elsewhere for a while, and so I’d like to get back into the rhythm of writing here by dusting off an old analogical device that I cooked up a couple of years ago for trying to understand the relationship between formal logic and other branches of mathematics. The degree to which the device succeeds at this aim is questionable, but I think the device itself is interesting, if for no other reason that it brings such a strong spatial-mechanical flavor to such a highly abstract system. This spatial-mechanical flavor may even illuminate some surprising relationships. Whether or this revelation is actually useful, or just amusing isn’t clear, but after all, when does it ever hurt to take a new perspective?
So let’s begin.
One of the most basic constructions of formal logic is the truth table. Truth tables characterize logical predicates as a straightforward tabulation of some number of boolean variables. Formally, a truth table on N variables is a tabular representation of all the values of a function P : {0, 1}^N -> {0, 1}. (Throughout this presentation, P indicates some predicate on N boolean variables.) Informally, a truth table is nothing more than a table that characterizes which of some number of possible things have some property, and which don’t.
Suppose then that we have respective values v[i] for some collection of boolean variables x[i] and we want to determine whether P is true or false under these conditions. (That is, x[i] |-> v[i].) One procedure for determining the truth or falsity of P under these conditions would be to look up the entry v[1] … v[N] in the truth table for P. There are a number of ways this look-up could be structured, however. The most obvious involves a flat ordering of the rows, so that each is scanned exactly once until the sought-after row is reached. This is kind of dull, and moreover requires N look-ups in the worst case. A better organization (read: data representation) would be to arrange the values of P as leaf-nodes in a binary tree, denoted T, where each tier of nodes corresponds to one of the x[i]. By convention, assume that the [i]th tier, counting from top to bottom, corresponds to variable x[i], and that left and right branches from a node correspond to values of 0 and 1 respectively. There is now a one-to-one correspondence between possible combinations of v[i] and paths from the root node of T to the leaf nodes of T. Checking the v[i] one at a time, taking the left branch if v[i] = 0 and the right branch if instead v[i] = 1, a determination of the value of P will take no more comparisons than the height of T, which happens to be log_2(N).
Well, none of this is terribly surprising. What I’ve described above would be completely unremarkable if implemented on a digital computer, and would not even be the most efficient approach for many kinds of problems. Suppose, however, that we wanted to implement the computation of P as a mechanical artifact. While the implementation details would require heroic feats of material craftsmanship, the metaphorical descent of the index [i] through the tree could be implemented as a ball (which I’ll also call [i]) descending very literally under the force of gravity through a series of left-or-right chutes opened and closed e.g. by levers or knobs. At the bottom of the apparatus, which I’ll also call T, would be 2^n boxes, each one labelled according to a value of P given one of the possible combinations of the x[i]. Given values v[i] for the variables x[i], the value of P could then be determined as follows: the ball [i] would be dropped in at the top of the machine and roll through a series of left-or-right chutes until it fell through to one of the boxes at the bottom; if the value v[i] was false, the chutes at level [i] would be opened to allow the ball to roll to the left, otherwise the chutes would be opened to allow the ball to roll to the right.
(As an aside, what I’m describing bears a curious resemblance to the Plinko Machine featured on the popular television gameshow “The Price Is Right”. The difference between the predicate-tree machine featured here and Plinko is that the predicate-tree machine would need to be built for scrupulously predictable behavior, whereas Plinko appears designed to maximize the random and unpredictable bouncing of its [i]-token.)
Is this bouncing-ball mechanism for computing P any more interesting than the abstract binary tree algorithm? One thing that ought to be observed is that the bouncing-ball mechanism (hereafter referred to as B) used to implement the algorithm as a literal apparatus is necessarily constrained in ways that the abstract algorithm does not appear to be. Most conspicuously, the B machine is a physical (albeit imaginary) artifact and so must occupy some non-zero amount of space. The height of the machine will be proportional to the number of variables N (since one tier is needed for each one), and the width of the machine will grow as a function of 2^N, since one box will be needed for each possible combination of values for x[i]. What this means, for instance, is that the machine may become very wide at the bottom very quickly if variables are added to P. One engineering challenge that would certainly present implementers and maintainers of B would likely be finding a space large enough to accommodate the apparatus, especially if N is quite large.
Still, even these facts are not surprising: of course the physical realization of a symbolic system with exponential growth would also grow exponentially in its spatial dimensions. There is, however, a subtlety that has been overlooked. Every feature of the abstract tree algorithm for computing P has to be physically realized in order to implement B, including features that are not usually the subject of time- or space-complexity analyses. Varying these parameters has important and highly suggestive consequences for the interpretation of P that do no necessarily follow from the abstract algorithm. One of these features is the pointer that descends through the nodes of the tree toward the leaf nodes. This pointer is realized as the ball [i], and it too must obviously have non-zero dimension in order to be realized. Denote the diameter of the ball [i] as D. The chutes in B must be at least D wide in order for the machine to function, i.e. for the ball [i] to fall through to a box at the bottom. In the event that B must be fit into a space of fixed side, e.g. a building, there will be some design pressure to make the chutes as narrow as possible, which in turn will require D to be even smaller.
By varying some of the other features of the predicate P and its encoding, along with the dimensions of [i], some interesting analogies pop out. For example, suppose that, in order to control the explosive growth in the width of the machine at its base, the chutes become progressively narrower as they descend. If the ball is too large, it may become stuck before it reaches the bottom. Supposing that the too-large ball [i] is the only one available, is there any way to work around this difficulty? There is, provided that the values of P are “not too random”. For instance, suppose that some number of contiguous boxes all have the same label, say 0, and suppose that there is a horizontal interval I in the machine such that, below a certain depth H, all chutes lead to one of the 0-boxes. Then the machine B could be fitted with an “escape chute” that takes the ball to a single 0-box once it has entered the area bounded by H and I. If all of the 0-boxes could be accommodated in this way, the value of P would be known to be 0 whenever [i] was able to roll all the way out of the machine, and 1 if it got stuck.
These assumptions about the number of 0s and 1s in the value of P and their contiguous arrangement (read: entropy) are pretty strong, and will be invalid in a lot of cases. Things become even more difficult if N is allowed to be infinite. While it might seem strange to allow this in a supposedly mechanical-spatial setting, it’s instructive to consider the consequences. For example, B could still function in a least some such arrangements. If the infinite number of boxes at the bottom of the machine could be rearranged into finitely many regions of all-0s or all-1s, the escape-chute mechanism would allow the ball to roll out of the machine after some finite period of time (read: would allow the machine to terminate). Conversely, if the infinitely-numerous boxes cannot be rearranged into finitely many all-1s or all-0s regions, i.e. if there is at least one box whose label that does not differ from its left- and right-hand-side neighbors, then the ball might never roll to the bottom — either because it became stuck, or because it simply took infinitely long to traverse an infinite distance. Interesting, which of these latter two outcomes depends upon whether the infinite machine is allowed to grow to an infinite height with a non-infinitesimal value of D, or whether the height remains finite while the minimum chute-width D becomes infinitesimally small.
In particular, they won’t work for any cases where N is infinite. Here’s the reason: in the finite case, it will only take a finite length of time for the ball to roll to the bottom. If it’s known exactly how long it take the ball to roll to the bottom, it can be inferred that the ball is stuck whenever it fails to appear at the bottom of the machine within that period of time. Even in the infinite case, there may still be contiguous regions of 0s or 1s that can be compressed into an escape chute out of which the ball might appear after a finite length of time. In the infinite case, however, the ball might require an infinite duration to roll to the bottom, and so it will be impossible in general to tell whether the ball is rolling or stuck by waiting for it to appear at the bottom. Holding the height and width of the machine fixed and finite, D is readily interpreted as the degree of precision possible in determining the value of P, with perfect precision only possible in the case of a rearrangement into finitely many continuous regions of all-0s or all-1s.
The machine B also bears an interesting resemblance to John H. Conway’s reinterpretation of the Dedekind Cut in his classic “On Games and Numbers”. The Dedekind Cut is simply an inductive definition of each real number as a set of the two closest inductively-hypothesized numbers that are respectively less or greater. More concretely, the Dedekind Cut defines a real number as lying at a unique position between a set of lesser numbers and a set of greater numbers. Conway’s construction of each real number resembles the B machine in that it also involves a (possibly infinite) progression through a series of left-or-right decisions. Infinite paths through the machine then correspond quite precisely to Conway’s construction of the transcendental and transfinite numbers.
The possibility of rearranging the boxes at the bottom of B corresponds very precisely to the entropy of the predicate P. While entropy is a widely abused and misunderstood term, it may easily be though of as the degree to which information can be compressed into a smaller form without losing its meaning. It should be obvious that it won’t always be possible to rearrange the boxes at the bottom of B without also changing the value of P, no matter how the variables x[i] are remapped among the tiers. In general, the higher the entropy of P, the further down [i] will have to travel before it can hope to escape through an auxiliary chute; in the case of maximal entropy, [i] no escape-chutes are possible at all.
It’s also interesting, and only natural, to ask whether the B machine has anything to say about whether P is decidable. Clearly, P will be decidable whenever B can be constructed in such a way that [i] will always roll out of the machine after some finite time has passed. What about if P is not decidable? P is clearly undecidable in the infinite case for entropy sufficiently high. It is interesting to ask, however, what relation there may be between the entropy P and the size of D, all of which are very conspicuously tied to the question of decidability. While this question has been examined elsewhere, most conspicuously in Algorithmic Information Theory, I do not know whether there may be any combinatorial or geometric interpretations of interest, e.g. whether D, H, and I obey concrete number-theoretic relationships under certain conditions.
These all seem like interesting questions, and although I am certain that at least some of them are at least partially answered elsewhere, I am also fairly confident that not all of them have been answered completely. While I do not know the answers myself, it would be quite interesting to hear yet another interpretation.
What the Hell is a Monad?
Posted: 2011/09/09 Filed under: conceptual explanations, tutorials | Tags: functional programming, Haskell, monads Leave a comment »
In the programming languages literature, the concept of the monad is widely known and almost as widely misunderstood. Some authors have gone so far as to call monads “the essence of functional programming”, while others seem to think that monads are little more than an especially elaborate kind of syntactic sugar. A consequence of this broad confusion is a lack of clear explanations of what exactly a monad is. Authors seem sharply divided into two camps: experts on monads and category theory, who don’t need to explain anything to other members of their clique, and experts on everything else, who just don’t care much about monads at all. Didactic explanations of monads are notoriously difficult to come by, and where explanations do exist, they are almost always dense with category theory. Category theory’s biggest strength, its abstractness, can also be a severe weakness when it is used in tutorial explanations: something concrete ultimately has to be shown in order to forge an understanding, and it’s hard to find much that’s decidedly concrete in category theory.
What follows is what I personally believe is missing from the literature: a self-contained, intuitive explanation of what monads are and why they can be a useful specification tool. No mathematical background or experience with a specific programming language is presumed, only a basic familiarity with the usual concepts of functional programming – which is what I would tend to expect if you’ve actually chosen to read this article.
What the Hell is an ’Effect?’
The conventional explanation of monads is that they’re used to specify side effects in functional programming languages. This is a correct explanation, but it’s not extremely informative if you’ve never programmed using monads and know nothing else about them. There are two shortcomings. Firstly, although there are a lot of examples and vague intuitions, there seems to be no single, authoritative definition of what precisely is meant by “side-effect” in the general sense. Secondly, even given a good definition of side-effects, monads are not essentially functional, and could be constructed in any adequately expressive language. It’s true that monads are very useful in the context of functional languages for several reasons, but it’s something of a misconception to suppose that monads only make sense in the context of functional languages, and I suspect this misconception contributes to much of the confusion that monads are simply convenient syntax for certain kinds of functional idioms.
Effects are closely tied to the notion of referential transparency. By definition, a computation is referentially transparent whenever it can be replaced by its result (as an expressible value) in the source text without affecting the meaning of the program. All terminating computations in a pure functional language are referentially transparent, as are all primitive expressions in C that do not involve pointers. In general, a computation is side-effecting or effectful whenever it is not referentially transparent, so that its meaning in the context of the program can only be determined by actually performing the computation. It should be noted that the relatively academic terms “referential transparency” and “effectful computation” are very closely related to the notion of orthogonality as familiar to experienced programmers and software engineers. While these terms don’t quite coincide (mostly due to the fact that the former have very precise technical meanings, while the latter is a much higher-level design concept), monadic style is a powerful tool for enforcing orthogonality by making syntactically and semantically explicit the separation between the different purposes and consequences of different computations.
Good design dictates that a distinct block of code should do just one thing, and that it should be clear what that one thing is. Side-effects are anything other than the result that must be considered when running that code. The virtue of monads as a programming tool is in their ability to clearly distinguish effects, and thus factor computations into actions that very clearly do one thing or another.
A First Example: Exceptional Conditions
Even in a pure lambda calculus with a strong type system, anomalous things can happen. Take as an example one of the simplest and most familiar kinds of anomalous conditions, the old divide-by-zero error. Divide-by-zero is a possibility in any language with the usual complement of arithmetic operations, and since the result of a zero-division is undefined, any subsequent computation that uses that result is also undefined. For this reason, processors are usually designed to throw a hardware exception in the event of a divide-by-zero. However, it’s also possible to write programs that internally catch and throw their own divide-by-zero faults and thus circumvent the low-level code that would trigger a hardware exception. In the hardware case, some line in the circuit checks for a zero in one of the operands of a divide instruction and sets an exception if an attempt is made to execute the instruction. In the software case, the program itself examines the data passed to a routine where a division may take place, and returns a special error value if a zero is passed in at the wrong place. Intuitively, both software and hardware divide-by-zero exceptions have something in common: they are both consequences of running a computation, but are not themselves the defined result of that computation. In both cases, the comptutation produces something apart from its defined output value.
An easy way to modify functional programs to handle anomalous conditions, e.g. divide by zero, is to lift susceptible functions into a special kind of sum type:
data Maybe a = Just a | Nothing
where the constructor Just specifies an ordinary value, and Nothing stands for the distinguished error condition. Thus, if f : Int -> Int, and f performs a division with its argument, f can be modified to something equivalent to:
f’ x = if (x == 0) then Nothing else Just (f x)
The new function f’ knows that f will create an anomalous condition if passed a zero, and so circumvents f entirely if this condition is detected. However, this doesn’t quite solve the problem of specifying how a program should behave in the presence of an anomalous condition: the value resulting from an anomalous condition will have to propapate upward through the call tree. While f’ can screen f from anomalous inputs and set an error condition, this error condition will have to be passed to whatever function calls f’, and so on, possibly all the way up to the top-level call. Any or all of these callers may in turn have to handle the error value that originates with f’.
If it is not quite clear that other changes will be necessary throughout the program, consider what the type judgments throughout the program will look like as a result of the change from f to f’. For example, if
g :: Int -> Int
then for any value v,
g (f v)
is well-typed, but
g (f’ v)
is not well-typed, as g is being passed the wrong type of argument.
The difficulty here is that f may do more than one thing; besides doing some arithmetic, f may also cause an error. Although f can be changed into f’, the change will necessitate other changes throughout the program to reflect the possible propagation of the effects of the error state. What we want is a systematic way to map a “pure” i.e. non-effectful program (one that uses f) into one that can account for a divide-by-zero error (one that uses f’) without a lot of extra fuss.
The first thing such a transformation will need is some way of distinguishing computations causing or depending upon effects from computations independent of effects, since only some parts of the program will need to be changed. The data type Maybe provides exactly this with its Just constructor; Just indicates that its argument can be considered as-is because it does not set the error condition. Conversely, anything in Maybe that is not wrapped by Just must cause the error condition. The distinction between effecting and non-effecting computations is thus present in the discrimination between the constructors of the sum type Maybe.
The next thing the transformation will need is some way of specifying how effects propagate through program execution. Recall that this is important because the consequences of an error condition will not necessarily be localized to the function in which it occurs. For these purposes, let’s adopt the simple convention that the error condition is unrecoverable, i.e. that once it has been set, no other function can unset it. Since we have already accepted that Just indicates that a result has been produced free from error conditions, it is relatively straightforward to transform individual functions, using the transformation of f to f’ as above. If g is any function of suitable type and v any value of suitable type, and assuming a call-by-value evaluation strategy, the following transformation should then be performed:
⌈ g (f v) ⌉ = ⌈ f v ⌉ >>= \ v’ -> ⌈ g v’⌉
where the infix operator >>= is defined as:
x >>= f =
case x of
Just u -> f u
Nothing -> Nothing
Take a moment to really understand these definitions. Since evaluation is performed in a call-by-value fashion, f v should be evaluated before applying g, hence the placement of f v on the left-hand-side of the >>= operator. The >>= operator itself will check the result of f v before proceeding with execution, propagating the error condition forward if it is present, and passing through the value normaly if it has not.
The very fastidious reader (or just one already accustomed to monads) may have noted that the transformation ⌈-⌉ above also could have been written:
⌈ g (f v) ⌉ = ⌈ f v ⌉ >>= g
This is fundamentally the same, however, since g is eta-equivalent to \ v’ -> g v’. Including the extra parameter just happens to be a habit of my own, as its presence emphasizes the passing of data between actions, and because it may often be useful to have lexically bound for use in other actions in scope.
What the transformation above provides is a way to add the effect of an error condition to functions that should be considered susceptible, together with a systematic (albeit quite simple) way of propagating any effects through the program without disturbing non-effecting results. A monad is simply a concept of effectfulness (in this case, error condtions) together with a way to distinguish non-effecting computations (in this case the constructors of Maybe) and a way of sequentially propagating effects forward (in this case, the >>= operator). In general, how exactly all of these things are defined will depend upon what notion of effectfulness is chosen. The effect of exceptional conditions embodied by the Maybe monad is one, but it is not the only one, as we shall see.
A Familiar Example: L’Etat C’est Moi
Many programmers migrating from an imperative language to their first functional language are perplexed and frustrated by the absence of anything like assignments. Imperative programming makes very liberal use of state transitions, with very large and complex states changing in complicated ways each time a statement is executed. This is useful for some applications, but not so helpful with others. Good functional programming style tries to isolate uses of state so that possible states are relatively small and sparse, and their transitions are very clear. While ML and Scheme concede reference types that do allow for reading and writing mutable storage locations, accepted practice makes sparse use of these. Haskell, for its part, does not incorporate reference types at all. While there are good reasons for this aversion, it is nonetheless sometimes convenient to have a state that can be read and written as execution proceeds. For example, it is often quite useful to incorporate a counter into your program. While functional style does not allow for direct assignments, its usual variable-binding conventions can nonetheless be harnessed to accomplish the same end.
The naive way to implement a mutable state without references is simply to thread an additional argument through all functions in your program, while making sure that each function returns the new value of the state in addition to its usual return value. While this should work in principle, it creates a real mess in the code, and it is very easy to write a function call that is passed the wrong state argument, which will lead to all kinds of subtle bugs. While you’re welcome (perhaps even encouraged) to go and try writing a non-trivial program this way, you won’t enjoy the experience. As is usually the case in programming, a little bit of careful thought about what you’re actually trying to do will lead to a much better solution.
Suppose we want to implement a program that contains a global counter of type Int. The intuition that state should be threaded through all function calls and returns is basically correct. However, a close inspection will reveal that there is a pattern to how functions are augmented to receive and return state. Firstly, the function needs to be passed an extra argument that binds the current state. Secondly, the function needs to return the new value of the state together with its result value. Although not at first obvious, this can be neatly summarized in a single binary operation on a counter-transition datatype:
type Counter a = Int -> (a, Int)
s >>= f = \c ->
let (v, c’) = s c
in
(f v) c’
(This is a different definition than the >>= presented for Maybe above. The reuse of the notation is deliberate, but don’t be confused.)
The datatype Counter reflects a lifting of “pure” functional values into a setting where they occur together with transitions on the counter state. That is, any value may come with a side-effect on the state. The >>= operator uses this datatype to formalize the procedure of passing in the state argument (from a state transition) to a function and returning a new one with its result. This is actually much less complicated than it appears. Firstly, the lambda that binds the parameter c specifies the extra argument that must be passed in as the current state. Next, the state transition s is applied to that state c to obtain a value and a new state. If f is a side-effecting function, i.e. one that produces not just a value but also a new counter state, then f is passed the value v resulting from the transition s. Closely reading the types, it should be clear that the result of applying f to the value v has type Counter b which is identical to Int -> (b, Int). Thus, (f v) can be applied to the state c’ resulting from the transition x to obtain a tuple consisting of a value and a new state. Take a moment to think through all of this and understand why it should be.
In the case of Maybe, we saw why it was important to specify how effects propagate through the program. The very essence of a mutable state is that effects must propagate globally in the background. The >>= operator specifies how exactly this should work. However, notice that Counter is not a simple sum-type like Maybe; discriminating between side-effecting and non-effecting computations is not as easy as simply discriminating between constructors, as with Just and Nothing. There is, nonetheless, an easy way to do so. Firstly, we must specify which computations have no effect. Since there is no Just constructor, let’s introduce a unary operator return that simply injects an arbitrary value into the stream of state transitions without altering the state. This is easy:
return x = \n -> (x, n)
The transition return is trivial; the state is unaffected. Meanwhile, an arbitrary pure value can be placed in the value-slot of the tuple.
This leaves unanswered the question of what exactly should be done to specify side-effecting computations. However, this matter is open to much more interpretation. While there will always be some need for computations that do just one thing, i.e. that are functionally pure, side-effects may take on a wide array of forms depending upon what’s necessary and convenient. Thus, the actual application in question needs to be considered.
We’ll likely need to read the current value of the counter, which is easy to construct as:
get = \n -> (n, n)
This little function demonstrates an important but somewhat subtle feature of the sequencing operation >>=; the value of state remains in the background unless it is deliberately moved to the foreground as a value. The function get does exactly this, allowing for the value of the counter to be passed to another possibly side-effecting function using >>=.
We’ll certainly want to change the value of the counter as well, otherwise nothing can actually be counted. One way to define this behavior is:
put v = \_ -> ((), v)
(Note that I use Haskell’s notation for a function with a dummy parameter, as ’_’; this is only to stress that changing the state in this coarse way takes no account of the preceding state.)
The function put simply replaces the current state value with a new state. While this may be a useful general-purpose operation, counters are much more often incremented or decremented. As such, we might choose to use inc and dec functions instead of a general-purpose put:
inc = \n -> ((), n + 1)
dec :: Counter ()
dec = \n -> ((), n - 1)
To illustrate how >>= works, notice that these also could have been defined using get and put:
inc’ = get >>= \n -> put (n + 1)
dec’ :: Counter ()
dec’ = get >>= \n -> put (n - 1)
I should emphasize, again, that these are only possible features to throw in; they are chosen and defined as may or may not be expedient. Only return will be necessary in order to ground the distinction between side-effecting and non-effecting computations.
What we now have is a nice way of threading a counter through a program without passing around a lot of extra stuff. Any computations independent of the counter simply must be wrapped by a return in order to be included in the stream of state transitions. What may not, however, be obvious is what the initial value of the counter will be, or how the final result will be obtained. The attentive reader may have noticed that the actions above specify what will happen, but do not actually cause it to happen. The result of chaining many state transitions with >>=, after all, will simply be an action of type Counter a, which is itself a function. In order to actually run the effecting computations, the function must be applied to an initial counter value. This is easy to specify:
run x n = x n
which will return a value and a final counter state. Note, however, that if the value in the left-hand component of the tuple is removed and used in some other, arbitrary functional expression, no information about the effects in x would be propagated. Running the action is necessary to obtain a result, but also means that pure values may escape from the effectful setting in which they were produced.
What we’ve just seen here is a special case of the common state monad. The state monad is virtually identical to the Counter construction above, except that it parameterizes over the type of the state, as:
type State s a = s -> (a, s)
Observing that Counter a is identical to State Int a, it should be quite easy to see how all of the constructions above can then be generalized.
Bringing Together the Meaning of ’Monad’
Thus far, we’ve proceeded using only examples. An effort has been made to allude to some of the patterns and important features of a monad without having to state them in an artificial way. The motivation behind this decision is that, while the formal definition of a monad has deep consequences for its behavior and usefulness, its relation to any sort of concrete meaning or usefulness is quite obscure. That said, it’s important to understand this definition. Hopefully, with the two examples above, the utility of this definition will become clear.
Formally, a monad consists of three things: a functor m on types, a return :: a -> m a (or ’unit’) operator and a >>= :: m a -> (a -> m b) -> m b (or ’bind’) operator. The operators return and >>= must also satisfy the following laws:
| return v >>= f | = | f v | (left-unit) |
| x >>= return | = | x | (right-unit) |
| (x >>= f) >>=g | = | x >>= (λv -> (f v) >>=g) | (associativity) |
Any construction with those three pieces, satisfying those three conditions, is a monad. While this seems highly abstract, the abstraction is of a good kind; it provides a very general framework for describing a very wide range of computational behaviors. To see how exactly this works, it suffices to return to the examples above, and see how they satisfy these definitions.
The purpose of the functor m is to relate pure functional types to some suitable counterpart that reflects the possibility of side-effects. In the case of Maybe, the functor m takes an ordinary type a to a sum-type a + Nothing. In the case of Counter (or State Int if you prefer), the functor takes a type a to a state transition type as Int -> (Int, a).
The purpose of return is to specify computations that are effect-free; this allows ordinary functional computations to interface with the side-effecting constructions. The return operation for Counter is just what was specified above. For Maybe, it is even simpler; the constructor Just behaves in exactly the right way. That is, for Maybe, return x = Just x.
The purpose of >>= is to sequence computations in a way that specifies how effects propagate. Definitions for >>= in both Maybe and Counter were given above; the reader is encouraged to go back and momentarily review them.
What is most important about all these parts, however, is how they fit together. This aspect is what makes constructing a new monad relatively difficult; there are many possible ways that m, return and >>= could be defined, but it is how exactly they relate that makes them into a monad. The nature of this relationship is expressed in the three monad laws above. These laws, however are not purely for the sake of formalism, as they describe exactly the same intuitions we used to construct Maybe and Counter above. A construction that did not satisfy them would, in some concrete sense, fail to realize the necessary notions of the distinction between side-effecting and non-effecting computations, and the expected way in which effects propagate.
First let’s consider the two unit laws. When considering return in the examples above, it was important that return clearly distinguish computations with no effects. Since return v has no effects, there should be nothing to propagate forward when return v is sequenced on the left-hand side of a >>=. Thus, it should be possible simply to cancel out the trivial left-hand side of such a >>=, and consider only the value to be passed to f. This behavior is what the left-unit law expresses. Similarly, since return has no side-effects, it should cancel on the right-hand-side of >>= as well, since it has no effects to add to whatever action precedes it.
Note, however, that not all returns will cancel as the result of a right-unit. Consider, for example:
get >>= \ v -> return (v + 1)
Although return appears to the right of the >>=, the function that properly serves as the right-hand operand to >>= is not simply return; it is (\ v -> return (v + 1)), which is not the same function. Thus the above cannot be simplified to just get. The reader is encouraged to work this through to see why.
The important point, however, is just that the two unit laws formalize the intuition that return must induce no side-effects in terms of the effect-sequencing defined by >>=. This feature is necessary if a clean distinction between side-effecting and non-effecting computations is to be possible. If return couldn’t be erased from a sequence of >>= -chained actions, that would necessarily mean that it had some effect that had to be carried forward, or that depended upon a previous side-effect.
The law of associativity of >>= is less clear, but no less important. Associativity is essential to any meaningful notion of sequentiality, in which one action can be said to precede or follow another in a well-defined way. While the relation between the two may not seem quite clear, recall that parentheses indicate order of operation, and >>= is a binary operator that produces a new action from two existing actions. Writing out a sequence of binds specifies a sequential composition of side-effecting actions. If one pair of actions is parenthesized, then they must be composed first. If the ordering of these compositions matters in addition to the ordering of actions specified by the left-to-right order of the binds themselves, i.e. if associativity does not hold, then >>= would have some effect on its arguments other than simply propagating a single effect forward through execution.
To see why this is the case, consider the Maybe monad above, and suppose that u, is an action in Maybe such that:
(u >>= \ x -> Nothing) >>= \ y -> u ≠ u >>= (\ x -> Nothing >>= \ y -> u)
Reducing the parenthesized binds on the LHS and RHS, this means that:
Nothing >>= \ y -> u ≠ u >>= \ x -> Nothing
which would mean that following a computation with an uncaught sticky exception has a different effect than preceding that computation with a sticky exception. Such a situation does not really make sense. A consequence of this kind of behavior would be that >>= could do something more than just propagate effects forward to the next action; it would also be free to take into account arbitrary other features of successive computations in deciding the result of composing its two arguments. In this case, >>= would not merely carry the effect of an exceptional condition from one action to another; it would have some mysterious, ill-defined power to anticipate an exception in its RHS argument and catch it before it had even happened. What intuition that should correspond to, I am not sure, but it has nothing to do with side-effects as they are generally understood.
What this all comes down to is that the formal definition of a monad specifies a way of lifting a computation into a setting where side-effects may occur at any point in execution, together with well-defined notions of what it means for a computation to occur with no side-effect, and a well-defined notion of what it means for effects to propagate from one action to the next. Some of the difficulty of understanding what exactly a monad is may come from the fact that it captures not just a single data structure or type, but a particular relationship between a data structure, a distinction between pure and side-effecting computations, and a concept of sequential composition. The goal here has been to illustrate that this construction is not artificial, but is rooted in fairly concrete and familiar computer science concepts that it just happens to articulate in a very powerful and flexible way.
And Beyond
The above presentation was driven by examples, but restricted itself to only two. The reason is that there are a quite a few other monads,but most of them rely upon powerful notions with a steep learning curve. The best example of this is the continuation monad:
return x = \ k . k x
x >>= f = \ k . x (\ y . (f a) k))
which makes control-flow itself an effect, so that any computation may have the effect of radically changing the course of execution. As it happens, continuations control flow that is so flexible (perhaps too flexible) that it surpasses even the expressiveness of unrestricted GOTO statements. One way to view continuations, informally, is as a kind of state monad for which executable code itself is the type of the state.
A number of other useful monads are: the resumption monad, a restricted form of continuation; the environment monad, a formalization of the idea of lexical scoping of variables which happens to be applicable to other kinds of nested context behaviors; the parser monad, a concept first proposed by Graham Hutton that later found its way into the powerful and quite useful Parsec library for constructing parsers in Haskell.
The really sharp reader may have also conjectured whether perhaps monads can be combined with one another. As it happens, they can. A down-to-earth explanation of how this happens or what it means is well beyond the scope of the current presentation, but the interested reader should consult Haskell’s monad transformer libraries or Sheng Liang’s “Monad Transformers and Modular Interpreters” for a detailed treatment. In short, monad transformers act to layer different kind of effects over one another, which happens to be quite powerful, though challenging to get one’s head around at first.
The main object, however, has been to present the concept of a monad in a motivated way that presumes minimal prerequisites. Monads turn out to be quite useful in functional programming practice, but their adoption is very arguably hampered by the difficulty of understanding what exactly monads are and how they work. Many practical explanations correctly detail how to use monads, but give little notion of why monads are a suitable structuring technique, or why they should behave as they do, while theoretical explanations require so much background as to be over the heads of most non-specialists. It’s my sincere hope that the reader has found this explanation to lie comfortably between the two, providing just enough examples to remain concrete but just enough generality to make clear the motivation behind the connections made. Thanks for reading.
Four Qualities of a Good Theory
Posted: 2011/09/02 Filed under: conceptual explanations, interesting questions | Tags: epistemology, rationalism, theory of computation Leave a comment »Generally speaking, there are two steps to using any model:
The first step involves knowledge of the particular situation; the second step involves knowledge of the model. A useful analogy is made with raw materials and the tools used to craft them into something: the situation itself is the raw material, the model or theory is the tool, and the end-product is some information. As is to be expected, the quality of the product will depend significantly on the quality of the material, the power of the tools, and the skill of the craftsman who wields them.
The important difference between physical and intellectual work, however, is that intellectual work may produce “tools” that are not at all useful. It might be argued that this difference is not an essential one; there is, after all, no barrier to the manufacture of implements that function in purely imaginary modes. (One real example of this phenomenon is the production of “fantasy weapons”, e.g. extremely ornate blades that look very impressive but carry very little utility as real aids to attack or defense, and were probably never intended to provide such.) That said, intellectual work is able to produce its tools quite quickly, and some of these tools defy any practical attempts to definitively prove their usefulness. Rather than bring down the ire of any one discipline by making all the usual accusations that their basic theories are airy-yet-crude blunders, I’d like to constructively examine the question of what makes for a good useful theory. Here are four simple criteria to consider:
A good theory makes its inapplicability promptly and unambiguously known. This might be the most important feature a theory can have. There is always a very strong temptation to become so enamored with a theory that it becomes difficult to distinguish an elegant demonstration from a completely insubstantial fantasy. Physical chemistry, with its need for a huge plethora of quick-and-dirty theories seems to be quite apt at producing models that speak their applicability up-front and neatly hand off control to their alternatives when their presuppositions fail. Different models predict significantly different behaviors given different spatial scales and different temperature and pressure conditions, sometimes radically. While such a diversity of different views might seem cluttered and confusing from the perspective of assimilating the knowledge of the discipline, it is nonetheless quite easy to determine which model applies to a given situation. Assumptions (e.g. “this system behaves as an Ideal Gas”) are very clear from the start, and even though they incorporate known and deliberate approximations, these are accepted in a way that understands the imprecisions and their consequences.
A good theory approximates objects, not their relationships. An outstanding example of this feature comes from, of all places, political philosophy. John Rawl’s theory of justice (as constructed in his famous book of the same name) proceeds from an extremely idealized view of individual humans and the origins of social organization, as do virtually all other political-philosophical arguments. “A Theory of Justice”, however, stands out as perhaps the most compelling political argument of the 20th century; Rawls became famous following this work, and virtually every other theory that followed was compelled to address Rawls’ Theory in some form. Nonetheless, other theorists very extensively criticized Rawls for some unrealistic features of his model, specifically the extremely strong risk-aversion individual agents are assumed to display. These criticisms, though well-founded and justified, made Rawls’ Theory no less compelling. The reason is that Rawls’ Theory, though it overstates individual aversion to risk, very precisely captures the way in which individuals evaluate their position relative to others in real societies. This emphasis on relations between individuals stands in sharp contrast to traditional Utilitarianism, which presumes that individuals will assent to any social contract that maximizes net social welfare with no consideration as to how they will fare personally, a presumption very clearly at odds with reality.
A good theory tells you what it can’t tell you. A theory that incorporates its own limits can very rapidly and efficiently prune away lines of inquiry that are essentially fruitless. The example par excellance comes from the classical theory of computation, with its results on formally undecidable propositions. A beautiful example of this dynamic at work comes from the use of Turing’s famous result to demonstrate in just a few lines the undecidability of a static information flow safety analysis. While practitioners don’t very frequently encounter results from Godel, Turing, Church, Post, or Skolem, it is arguably because the theoretical foundation of computation very quickly and firmly establishes the limits of what can and can’t be done, so that engineers need never be visited by the insidious temptation to construct the unconstructible.
A good theory rapidly makes new predictions from old predictions. This criterion applies to how much uncertainty is introduced by applying a theory, or alternately, to what degree a theory lends itself to computational procedure. It is precisely this feature that accounts for the unparalleled success of Newton’s mathematization of physics. Translating observed phenomena into readily transformable symbolic representations allows inferences to be easily composed with one another, which means that a theory can readily build on its own successes. While there is some danger that concrete realities will not fit well with their symbolic outlines, i.e. that failures will also build on failures, a theory that can rapidly turn over its findings into new findings will have the opportunity to propagate errors forward in a way that will eventually become conspicuous, and hopefully diagnosable. By contrast, a theory that cannot readily incorporate its own predictions as antecedents to new inferences is more likely to function as a kind of myth or parable than as a real producer of knowledge. While it’s essential to have a conceptual foundation for considering any phenomenon, and while such a foundation is a necessary condition for a theory, it’s easy to see that a theory, as considered here, is more than just a framing device.
While a definitive breakdown of what makes for a good theory is certainly an appealing goal, this short exposition is intended more as an exploration of the issues than as any sort of final word. Much, no doubt, has already been said on the subject. While this may seem too general a subject to consider for a computer science blog, it’s worth reflecting upon for the simple fact that computer science is presently faced with the temptations of a lot of new theories. Unfortunately, few of these new theories have gained any kind of wide use or acceptance outside of academic circles for the simple reason that they have thus far failed to demonstrate their usefulness in any compelling way. I would emphasize, once again, that this is especially the case for security. A good theory of security, hopefully, can make is applicability clearly known, precisely describe relationships between its agents, make clear the fundamental limitations of security (i.e. articulate the existence of fundamental insecurity), and draw useful conclusions.
Work in Progress: IFL 2011
Posted: 2011/08/31 Filed under: news | Tags: conferences, publications Leave a comment »I’ve been unable to post normally for the last couple weeks, as I’ve been working feverishly on a submission to 23rd Symposium on Implementation and Application of Functional Languages (IFL’11). Although I probably shouldn’t discuss the details of the paper ahead of its review, I will hopefully be able to resume writing here on the usual variety of other topics, with some inspiration from all the work I’ve put in on this result.
If you do happen to be attending IFL in Lawrence, Kansas this October, be sure to look for my talk. Otherwise, I look forward to seeing you here!
–s
Resumptions: A Syntax for Trace Semantics?
Posted: 2011/08/17 Filed under: interesting questions | Tags: formal methods, monads, resumptions, speculations Leave a comment »The resumption monad is a surprisingly powerful specification tool with a fairly distinguished history in the literature. Resumptions go back to foundational work in concurrency by Gordon Plotkin, and were subsequently incorporated into Eugenio Moggi’s monadic theory of programming languages. Andrezj Filinski made essential use of the resumption monad in his important work on efficiently representing layered monads. Nikolaos Papaspyrou took resumptions to their logical conclusion in using them to formulate a denotational semantics for a simple imperative language with concurrency and non-determinism. William L. Harrison used resumptions as an expressive, modular framework for specifying the basic scheduling and handling behaviors of a kernel.
The use of resumptions, however, remains something of a niche interest in programming languages. It’s well recognized that resumptions are useful for representing concurrency, but concurrency is a widely studied problem, and its literature already boasts a great many competing theories and calculi. Given the simplicity and power of resumptions, it seems fair to ask whether they may be useful for something more than just describing concurrent behaviors.
Resumptions come in two kinds, and the interaction of these two provides some interesting insight into whether they provide some more general expressiveness. This speculation is partly inspired by Harrison’s work on kernels; any framework powerful enough to specify such an essential low level behavior must have some interesting feature. (Disclosure: Bill is my PhD advisor, and so it is only natural that much of my thinking on this matter has been influenced by his work.) Informally, a kernel can be factored into three major parts: a scheduler, an interrupt handler, and some data structure describing the global state of the system. Harrison’s work employs the relation between the resumption and reactive resumption monads to model the interaction of the scheduler and handler in specifying the set of permitted system executions. It is no stretch at all to observe that assertions of liveness (or lack thereof) are inherent in any scheduler, and it is scarcely any more of a stretch to suggest that the request-response signals used to construct a reactive resumption monad entail a safety property with respect to the threads or processes that employ these signals.
I mentioned in the preceding post that it was known that, in general, all properties can be specified as an intersection of a safety property and a liveness property. If the analogy beteen resumptions to liveness properties and reactive resumptions as liveness properties is indeed a sound one (which is speculative at this point), then the handler-scheduler framework may have very useful expressive power that goes well beyond the construction of kernels. This would not be entirely surprising; in the above-mentioned work, Filinski demonstrates a transformation from an arbitrary layered monad (with a few minor constraints) into a layered resumption-state monad, and any structure expressive enough to allow the powerful control flow behaviors of anything like a scheduler is likely to describe a very wide range of behaviors. One could argue that continuations already provide exactly this amount of expressiveness, but even granting that this is so, they entail a considerable (though not impossible) verification overhead and do not separate behaviors into clearly distinct liveness and safety components.
The ability to give a syntax to the purely semantic trace-based models of traditional formal methods is a fairly attractive prospect. The literature is rife with papers that construct traces, runs, behaviors and all the like, and then proceed prove all kinds of useful properties about these objects. While this work is interesting, it is difficult to apply for the simple reason that they give no sort of language for specifying the objects that they manipulate; real programming does not operate by constructing sets of traces, and it never has. It is interesting to speculate what formal methods results could be realistically applied given an implementation in terms of resumption-monadic specifications.
Further Reading
Andrezj Filinski. Representing Layered Monads. Proceedings of the 26h ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 175-188, 1999.
William L. Harrison. The Essence of Multitasking. Proceedings of 11th International Conference on Algebraic Methdology and Software Technology, LNCS 4019, pages 158-172, 2006.
Eugenio Moggi. An Abstract View of Programming Languages. Tech Report ECS-LFCS-90-113, Department of Computer Science, Edinburgh University, 1990.
Nikos Papaspyrou. A Resumption Monad Transformer and its Applications in the Semantics of Concurrency, Proceedings of the 3rd Panhellenic Logic Symposium, 2001.
Gordon Plotkin. A Powerdomain Construction. SIAM Journal of Computation, volume 5, pages 452-487, 1976.
Security Practice is Not Applied Security Theory, Part 2
Posted: 2011/08/11 Filed under: conceptual explanations, interesting questions | Tags: formal methods, information theory, rationalism, security, specification Leave a comment »Suppose, for the sake of argument, that security is a system property that can always be specified and implemented. Any model of security entails some notion of misuse of the system, and the very possibility of such misuse implies that the system’s basic structure, as specified and implemented, admits some kinds of use that are simply undesired. An important question is whether unauthorized use differs from authorized use in an intrinsic way, i.e. in a way that can be defined purely in terms of the system itself. Security is the problem of allowing use while preventing misuse, which means that security can be specified and implemented only to the extent that authorized and unauthorized uses can be clearly discriminated, either statically or at runtime. If security can always be statically specified and implemented, then it should be possible to construct, even if only in the abstract, ideal systems that can identify and block misuse.
Informally, a system consists of a (possibly infinite) set of possible states, together with a (possibly infinite) set of transitions between those states. Following the well-established and fairly common-sense convention of the formal methods literature, a system specification can be split into two properties: the machine property which describes how the system will function, and a supplemental property that describes any additional constraints that the system is expected to satisfy, which are not necessarily provided for by the machine property. Given a set of initial states, these two properties may be thought of as sets of paths through a directed graph that describes all possible runtime behaviors of the system. More precisely, the machine property describes the space of all possible behaviors, i.e. what the machine itself is actually capable of doing, while the supplementary property further constrains this behavior by excluding some paths in the machine property. Using this vocabulary, the question of whether security is always specifiable and implementable can be phrased as a question of whether security properties can always be reduced to machine properties, or whether a supplemental property is sometimes necessary.
Suppose that S is some system and that P is a security property that S is expected to satisfy. It is known that any property P can be written as the intersection of a safety property and a liveness property. (“The Existence of Refinement Mappings”, Martin Abadi and Leslie Lamport, 1988.) Safety describes a subset of a states in S that are allowable and specifies that execution will only visit members of this subset. Alternately, safety may be thought of as a requirement that certain states are never reached, i.e. that certain system events never take place. Liveness, on the other hand, describes a subset of states in S that must eventually be reached by any execution of S. That is, liveness stipulates that there are events in S that may not be delayed forever. The security property P is thus reducible to a joint guarantee that certain events in S will never happen, while other events in S will always happen eventually. As a consequence, S is secure if and only if certain events in the state space of S never happen, and if certain other events in the state space of S can never be infinitely delayed.
The question to answer is whether it is always possible to transform a possibly insecure system S (i.e. a system that may not satisfy P) into a definitely secure system S’ (i.e. a system that does satisfy P). As a consequence of the fact that P is the intersection of a safety property and a liveness property, S fails to satisfy P if and only if at least one of the following holds:
- There is some execution of S that reaches a state not allowed by P;
- There is some execution of S that never reaches a state required by P.
Assume that S is supplied with at least one non-deterministic input at some stage in its execution. Denote this input by x and let ! denote the set of externally provided inputs to S that may result in either (1) or (2). Since x is non-deterministic, its value can only be determined at runtime. Therefore, S can only satisfy P with certainty if the value of x is checked at runtime before any event that x might cause to transition to a forbidden state or an infinite delay.
Suppose first that (1) holds, and let F be any state in S such that if x is a member of !, the next transition will reach a state F’ that is prohibited by P. Since x is nondeterministic, S can only satisfy P in general if some state is inserted in S that checks the value of x and branches away to an error handler in the event that x is in !. Call this state G. The new state G can only ensure that F’ is never reached if it is possible to compute whether a given input is a member of ! or not, i.e. whether there is a decision procedure that fully describes the set ! of inputs that violate the security property. The question is now whether there is any set of inputs ! that cannot be described by a decision procedure on the set of inputs to S.
The answer to this question is: yes, there is such a set. If the state space of S is infinite, then any infinite subset of states that is selected from the state space by a genuinely random process will be impossible to describe as a decision procedure. A corrollary to this fact is that the quantitative difficulty of preventing the event F’ will correspond to the entropy of the set ! of bad inputs, with impossibility corresponding to the case of true randomness. For the purpose of this argument, however, the existence of a set of inputs to S that cannot be described by a decision procedure implies that, in general, G may fail to transition to either F or to an error handler, since the procedure to determine whether x is in ! may not terminate. Even if G is inserted into the execution graph of S and an attempt made to compute whether x is in ! or not, the computation may not terminate, which in turn will violate any liveness condition included in P. If G is not inserted into the execution graph of S, then some x in ! may be provided as input, resulting in a transition to F’, which in turn violates the safety component of P. If S cannot be transformed in a way that satisfies P for all possible inputs, then S cannot implement P, which means that there is at least one input x that causes an unintended or unauthorized behavior in S.
An important conclusion of this informal argument can be summarized in the following informal theorem:
Theorem (Existence of Fundamentally Insecure Systems): Given some system S and a security property P on S,
- If S takes nondeterministic inputs then S satisfies P only if there is a decision procedure for computing whether an input will violate P.
- There exists a system S and a security property P such that there is no system with functionality identical to that of S satisfies P.
It is worth noting that the fact that P is a security property is in no way essential to this argument; P could be any expressible property.
What has (arguably) been shown is that is is possible for a security property to attempt to constrain system execution in a way that is impossible to implement. A close corollary would seem to be that relative difficulty of securing a system is directly related to the information complexity (read: entropy) of the set of inputs that might cause behaviors deemed insecure. Viewing security this way, it is easy to see why, for instance buffer overflows are simple to prevent: it is always possible, and quite straightforward, to compute the length of a string. It is also easy to see why it is so difficult to prevent malicious programs from being installed on a computer: although malcious programs tend to eventually reveal their malicious intent, and some suspcious behaviors can be detected heurstically, there is no general procedure for deciding whether any given program will eventually do something harmful or forbidden.
In the prequel to this article, I made the argument that security theory and practice oftentimes seem to diverge in their respective approaches because some security problems are hard in an essential way that requires, or at least favors, a heuristic approach. The knowledge that some security properties are generally undecidable is not new; Denning and Denning showed that the detection of information leakage is not in general decidable way back in 1976. This knowledge notwithstanding, relatively little has been said about what makes some kinds of security properties hard to ensure. Traditional formal methods and security theory have usually emphasized two kinds of challenges: identifying and constructing security properties that can be decided, and designing tools and methodologies that facillitate the verifcation or at least the approximation of these properties. This focus, together with the findings above, partly explains why so many formal methods tools (e.g. Coq) are so fiendishly complex in their construction and so thoroughly unusable by non-experts; the trade-off of verifying strong assurances (e.g. decidable inference of dependent types) is a correspondingly complex and elaborate computation that is capable of excluding bad inputs. Arguably, the above findings also partly explain why security practice has come to favor patch-and-penetrate approaches: some classes of security problems (e.g. preventing unauthorized priviledge escalation) are very difficult and perhaps even impossible to enclose within a finite description. What this means is that security theory and practice tend to differ not because one is simply more reactive than the other, but because their traditional assumptions and methods tend to favor different classes of problems.
Security is an old problem and a basic problem, and has long been known to be a very hard problem as well. Relatively little attention has been paid to detailing precisely why security is hard problem, or what characterizes security breaches in general. I have tried in this space to demonstrate two facts, one unsurprising and one quite contentious: firstly, that security is basically a problem of identifying and screening a certain set of inputs to the system, and secondly that the complexity of this set gives a reliable measure of the hardness of a security problem. The existence of input sets with maximal entropy implies that there are some properties that a system can never be modified to satisfy. While this by no means implies that security is an impossible or unsolvable problem, it does indicate that there are strict limits to what can and cannot be secured against.