
On the Turing-Completeness of C

Posted on November 18, 2012 at 12:00

On November 17th, 2012, a link to a Brainfuck interpreter written using the C preprocessor was posted to Hacker News. In the ensuing flurry of HN comments, the question of CPP’s Turing-Completeness came up, to be answered negatively: the proposed interpreter needs to be extended manually for each possible loop upper bound, which makes it at best equivalent to Hofstadter’s BlooP language from Gödel, Escher, Bach.

However, a gem was hidden in the midst of these comments: is the C language itself Turing-Complete?

Despite the appearance of controversy, with some commenters leaning on the side of “yes” and some others on the side of “no,” there is only one correct answer from the perspective of the language specification. But it is not intuitive.

EDIT (Nov 20th): following advice from both Scott and commenter E.G. Daylight, I have written a “cleanup post” that summarizes the argument and focuses the discussion below.

❦❦❦

As commenter Scott Schneider puts it, the answer can only be phrased after one disentangles the language from the implementation:

Turing completeness is about separating the expression of a computation from the machinery that carries out the computation. If your expression language can be mapped to the idealized Turing machine, then your expression language can express all computations that can be computed. (That is, it is Turing Complete.) This allows us to reason about the expression of computation outside of the machinery that carries it out – because the machinery will always have limits.

The subtlety is: where do we draw the line between the expression and the machinery?

That is, assuming the question is “is the C preprocessor Turing Complete?” I answer no. Because implied in that question is that the C preprocessor language is the expression language we are interested in, and the C preprocessor is the machinery. The reason why I say that the C preprocessor expression language is not Turing complete is simple: you cannot actually express recursion or unbounded loops. […] Why? Because the simulated loops are bounded, always. If you can’t express arbitrary loops or recursion, then you can’t compute everything that can be computed.

Note that I stressed express. Yes, an actual C program is clearly limited in how much it can recurse, but the language itself can express unbounded recursion.

I followed and agreed with his reasoning so far. C can indeed express unbounded recursion, and is therefore strictly more expressive than the C preprocessor language.

However, is the ability to express unbounded recursion sufficient to be Turing-Complete? Another commenter worries that the fixed width of pointers in C amounts to fixing the amount of storage reachable from any C program, and thus to losing Turing-Completeness. Scott follows up with:

You don’t need to reason about C’s pointers to consider [it] Turing Complete. That it allows arbitrary storage*, it has conditionals and allows full looping (either unbounded loops or full recursion) is enough. That is, a language that had C’s semantics sans pointers would still be Turing Complete.

* emphasis mine

This argument seems valid at face value. If the language does indeed allow arbitrary storage, it would become possible to implement a bignum library on top of it using only regular recursion and conditionals, and then use integer and pointer types expressed as bignums.

An obstacle remains, however: how do we know that C indeed allows arbitrary storage?

Based on my extensive analysis of the ISO/IEC 9899:1999 and 9899:2011 language standards, I would like to highlight that this is not trivial to determine.

First of all, the notion of “arbitrary storage” cannot be extracted from the use of C arrays and pointers. The reason is that all pointer types are defined in the language to have a static, fixed width, and so do array indices. This excludes using large arrays or dynamic invocations of “malloc” at run-time to grab arbitrarily large amounts of storage.
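
To make the bound concrete, here is a minimal sketch (the printed values are implementation-specific; the point is only that they are translation-time constants):

```c
#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* sizeof(void *) is a translation-time constant, so the number of
     * distinct pointer values, and hence of distinct addressable
     * storage locations, is fixed before the program ever runs. */
    printf("pointer width: %zu bytes, i.e. at most 2^%zu addresses\n",
           sizeof(void *), sizeof(void *) * (size_t)CHAR_BIT);
    return 0;
}
```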

Arbitrary storage, if such a thing indeed exists in C, would need to come from the ability to define an arbitrarily large number of separate objects, each equipped with its own storage. (For clarity, I have summarized what constitutes an object and its distinction from storage in appendix F of this book.)

Obviously, we could reserve ever more space by adding extra object definitions to the program’s source, but this would not be satisfying: we would need to modify the program every time we need more storage; i.e. for a given program specification the amount of storage would be finite. Turing-Completeness can thus only be attained if there exists a way to allocate an arbitrarily large number of new objects at run-time, starting from a static program specification.

The only mechanism available in C that enables dynamically and arbitrarily growing storage is the automatic storage allocation of local variables in unbounded recursion. But even this is not satisfactory: the local variables in one function activation are not directly visible from another function activation, e.g. the next recursion step. To access one level from the next, one would need to take the address of the local variable and pass a pointer to it. However, since pointers have a statically fixed width, this would in turn limit the number of different storage slots that can be used or visible at any level.
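
A minimal sketch of this mechanism and its limitation (the function tape_cell and its shape are mine, purely for illustration):

```c
#include <stdio.h>

/* Each activation of tape_cell() contributes one fresh char of
 * automatic storage, so unbounded recursion yields unbounded storage
 * in total. But to reach a cell from a deeper activation, its address
 * must be passed down as a pointer, and pointers have a statically
 * fixed width. */
static void tape_cell(unsigned long depth, char *prev)
{
    char cell = 0;                /* one storage slot per activation */
    if (prev != NULL)
        cell = (char)(*prev + 1); /* visible only via a fixed-width pointer */
    if (depth > 0)
        tape_cell(depth - 1, &cell);
    else
        printf("deepest cell holds %d\n", cell);
}

int main(void)
{
    tape_cell(5, NULL);           /* recursion depth is unbounded;
                                     pointer width is not */
    return 0;
}
```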

If, instead, C were extended with primitives to navigate between the variables of different function activations at run-time without using pointers, then one could claim that C supports arbitrary storage, and thus Turing-Completeness. Unfortunately, standard C offers no such thing.

I have read and heard arguments that the limitation could be “easily lifted” by simply stating that pointers have dynamically varying sizes. Unfortunately, due to the way aggregation is defined in C, the resulting type system would inductively require that all objects have dynamically varying sizes, and thus that all allocations must happen dynamically. The resulting language would be so different from C that I would deem it quite misleading to keep the same name.

❦❦❦

EDIT (Nov 19th): Following an e-mail conversation with Scott, I now realize a shortcoming of my perspective. Scott highlights that the compiler is part of the “machine,” and thus any compiler-specific constants such as the widths of types are not limiting factors to determine Turing-Completeness.

His argument is as follows: suppose one has written a given terminating program which may require, during its execution, up to some arbitrary amount of storage. Without changing that program, it is possible a posteriori to implement a piece of computer hardware and its C compiler that provide enough storage to accommodate this computation. This may require widening char (via CHAR_BIT) and/or pointers (via size_t), but the program itself would not need to be modified. Because this is possible, C is indeed Turing-Complete for terminating programs.

The tricky part of this argument is that it only works when considering terminating programs. Terminating programs have the nice property that there is a static upper bound on their storage requirements, which one can determine experimentally by running the program over the desired input with increasing storage sizes, until it “fits.”

The reason why I was misled in my train of thought is that I was considering the wider class of “useful” non-terminating programs, e.g. interactive programs with non-deterministic input. Let us consider, for example, a C program that repeatedly reads a value N from the computer’s keyboard, then computes the Fibonacci sequence up to N iterations and stores it in memory. By definition, this program does not terminate and uses up to N memory slots, with N arbitrarily large at run-time. This program would obviously eventually fail for some inputs on any real computer with a finite memory. But it would work for all possible inputs on a (theoretical) Turing Machine, because a Turing Machine has an infinite tape.
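
For concreteness, here is a minimal sketch of such a program (the details are mine for illustration; the Fibonacci values silently wrap around, which is irrelevant to the storage argument):

```c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    unsigned long n;
    /* Loop forever, reading N and materializing N values in memory.
     * On the abstract machine N is unbounded; on any concrete
     * implementation size_t is finite, so malloc() must eventually
     * fail for a large enough N. */
    while (scanf("%lu", &n) == 1 && n > 0) {
        unsigned long *fib = malloc(n * sizeof *fib);
        if (fib == NULL)
            return 1;             /* the finite machine shows through here */
        for (unsigned long i = 0; i < n; i++)
            fib[i] = i < 2 ? i : fib[i - 1] + fib[i - 2];
        printf("fib(%lu) = %lu\n", n - 1, fib[n - 1]);
        free(fib);
    }
    return 0;
}
```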

The question is thus whether one could write such a program in C and compile it to a theoretical Turing Machine. The answer, according to my previous argument, is no: even if we were compiling C to a machine with infinite memory, the finiteness of C’s address space would get in the way.

So the abstract machine of C is equivalent to a Turing Machine when considering only terminating programs, but not when non-terminating programs are also included.

 

  1. That’s a splendid account of a difficult topic. Since I’m very interested in this particular issue myself, I would like to ask a question and make a comment.

    I always assumed that the language C permits access to _any_ part of the computer’s memory. If so, then it should be possible to store and increment an arbitrarily large integer (as long as the computer’s memory remains big enough for the computation at hand) without resorting to mallocs and procedure calls. This, in turn, along with one potentially unbounded ‘while’ loop, suffices to prove that C is a Turing complete language. Since you claim this is not so, I guess my assumption was wrong. Could you elaborate?

    I have written a book about the history of programming languages, and about the influence of Turing’s work in this regard. Chapter 8 covers your account from an historical perspective and in terms of the language Algol, not C. Your readers may thus be interested in my book: http://www.dijkstrascry.com/dawn

    I look forward to hearing from you.

    Many thanks,
    Edgar.

    • Hey Edgar, it’s a pleasure to engage in conversation with you again after two or three years. You may be pleased to know I have cited your paper on Dijkstra’s legacy in my thesis, albeit on a different topic (the struggle between the drive for expressiveness and that for performance).

      But back to the topic at hand.

      Your assumption is wrong at two levels, one maybe less problematic than the other.

      The first error is to assume that C permits access to any part of the computer’s memory. In fact, the C language specification goes to great lengths to ensure that you can only access objects in memory that have been properly allocated: a pointer indirection is only a valid expression if the pointer value (the address) is a valid offset within an existing object. An object is then defined to “exist” throughout its lifetime, which is decided by the visibility of its declaration and the allocation rules. As you probably already know, objects are allocated either statically, using named definitions at global scope or ‘static’ object definitions within C function bodies, or dynamically, either by definition in C function bodies (automatic allocation upon function activation) or by calling ‘malloc’. Although it is syntactically valid to manually construct a pointer to an arbitrary address in memory, using this pointer with the indirection operator would be invalid according to the standard C language semantics. So if you play with standard C you are really limited to the storage of the C objects your program defines, either statically or at run-time.
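
      A minimal sketch of the distinction (the address constant is mine, purely for illustration):

      ```c
      #include <stdlib.h>

      int main(void)
      {
          int *p = malloc(sizeof *p);  /* p points into a live object... */
          if (p != NULL) {
              *p = 42;                 /* ...so this indirection is well defined */
              free(p);
          }

          int *q = (int *)0xDEADBEEF;  /* merely forming this pointer is
                                          implementation-defined... */
          /* *q = 42; */               /* ...and dereferencing it would be
                                          invalid: no object lives there */
          (void)q;
          return 0;
      }
      ```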

      The other error is to assume that having access to any part of the computer’s memory is sufficient to increment an integer arbitrarily. To be specific, C’s concept of “memory” is a big array of cells, each cell having type ‘char’. This concept emerges from the specification that every type in C, and thus every object that can possibly be defined, must have (as per the language specification) a representation as an array of char cells. The C language specification then in turn states that each char cell contains precisely CHAR_BIT binary digits, where CHAR_BIT is a compile-time (i.e. static) constant. This implies that each addressable cell in the memory of C’s abstract machine can only store a finite set of values. To compute and store arbitrarily large values, a program would thus require an object containing an arbitrary number of cells, i.e. of arbitrary size. The problem then is that the C language also specifies an upper bound on the size of a single object, as follows. The standard first says that the size of any object must be measurable with the sizeof operator. It then states that the evaluation of sizeof yields a value that can be represented by the standard type size_t. It finally states that size_t is a standard integer type. All integer types have a fixed size, i.e. are representable with a statically constant number of ‘char’ cells. Therefore, there is an upper bound on the size of any object, and thus a finite number of different values for any possible object in C.
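
      The chain of the argument can be replayed with the constants the standard mandates (a sketch; the printed values depend on the implementation):

      ```c
      #include <limits.h>
      #include <stdint.h>
      #include <stdio.h>

      int main(void)
      {
          /* Every object is representable as an array of char cells of
           * CHAR_BIT bits each; sizeof measures objects in such cells;
           * and sizeof's result must fit in size_t. Hence no single
           * object can span more than SIZE_MAX cells. */
          printf("bits per char cell  : %d\n", CHAR_BIT);
          printf("max cells per object: %ju\n", (uintmax_t)SIZE_MAX);
          return 0;
      }
      ```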

      This is why I found in my argument that the only way to achieve arbitrary storage is to find a way to define an arbitrary number of distinct objects. This is because, although each C object gives you a finite amount of memory, there is no hard limit on the number of distinct objects. Unfortunately, distinct objects in C must have different names, and standard C has neither by-name object indirection nor navigation across the activation frames of an arbitrarily deep recursion. Therefore, it is impossible to re-create a virtual memory out of different C objects, and one cannot achieve Turing-Completeness this way.

  2. Many thanks, that’s very insightful.

    Sticking to the standard-C language semantics, is it then correct to say that

    1) the Halting Problem for an arbitrary standard-C program P is decidable?

    2) the Halting Problem for an arbitrary standard-C program P is decidable, given the additional constraint that P runs on an isolated computer; that is, P does not interact with the outside environment (e.g. with a user on the Internet)?

    best wishes,
    Edgar

    • You may also want to check the extension to my original post, which I posted this morning.

      As to your questions:

      I am still working out what happens (or can possibly happen) with interactive programs. So for #1 I would say “the jury is still out” and probably the halting problem is not decidable if you allow the program to be dynamically recompiled while it is running.

      For #2, I would say the halting problem is indeed decidable once you choose your C compiler, even if your compiler generates code towards a universal Turing Machine with infinite tape. That is, the limitations about storage come from the language specification, not from the final machine where the program runs.

  3. Hi again,

    The following is a reply to your #2 answer and without incorporating your extension (which I might comment on in a separate reply).

    I understand your words: “even if your compiler generates code towards a UTM with infinite tape”. I guess I should have been more explicit in my question, which I reformulate as follows:

    Is the Halting Problem for an arbitrary standard-C program P decidable, given the additional constraint that P runs on an isolated computer and given that we leave the compiler out of the picture?

    In some ideal sense, it should be possible to leave the C compiler out of the equation, but I guess your point is that we cannot do that in this standard-C setting. Can you elaborate?

    best wishes,
    Edgar.

    • As you have determined in your other comment already, the language states that pointers have a finite width *when measured by sizeof() for a single pointer*, which is decided by the compiler. However, the language does not prohibit an exotic implementation of C from dynamically recompiling and reloading the program at run-time using increasingly large pointer sizes. Since the compiler is part of the “machine,” this is conceptually possible. To take this into account in the language specification, you could stretch the interpretation of the language semantics to say that the standard does not explicitly require the width to be constant: in a conceptual “elastic” implementation, which would be very exotic but nonetheless possible, we could keep implementing pointers of increasing sizes and stretch the value returned by sizeof() as necessary.

      BUT I am not sure this conceptual loophole can be exploited with a non-interactive computer. It may be, but I haven’t thought about it properly yet.

  4. Hi once more,

    Concerning the extension to your original post, you write: “Scott highlights that the compiler is part of the “machine”, ” which is fine by me.

    Then you continue: “and thus any compiler-specific constants […] are not limiting factors to determine Turing-Completeness.”

    I understand that the compiler determines the actual value of, say, the width of a particular type. But is it not so that the semantics states that the width (of that particular type) is finite? That is, we don’t know what the exact width is unless we incorporate the compiler, but we do know that it is finite and not larger than an a priori defined number N. The latter suffices to conclude that the standard-C language is not Turing complete. For, you just need to take N to be a sufficiently large number, e.g. the number of atoms in this universe.

    Please feel free to correct me wherever you can.

    best wishes,
    Edgar

    P.S. I have difficulty understanding the “a posteriori” argument.

    • Hmm, now I am not sure I answered your previous questions properly.

      The semantics state that the width is finite for any particular pointer object, but as I realized in my previous comment we could imagine an exotic implementation of C where the whole type system is “elastic” and stretches in the implementation dynamically, at run-time. This would require recompiling the program *while it is executing*, but that seems conceptually OK because “the compiler is part of the machine.”

      Because of this, I would now say that the answer to #1 is “the halting problem is likely not decidable”.

      The question that then remains for me is whether this implementation trick may be used without interaction with the outside world. I am not sure of that. I am leaning towards “yes,” which would make the answer to your question #2 “also not decidable,” but I’d need to think further on that.

    • As for the “a posteriori” argument: it was phrased poorly, sorry for that.

      The argument is that any terminating program has an upper bound on the amount of storage it requires for a given input. It is thus possible to run the program once on some machine with limited storage, see if it fails, and then try again with larger memory until it succeeds, widening the types incrementally as needed. This way we implement the appropriate machine *after* writing the program (a posteriori). But this is only possible if you can re-run the program multiple times, i.e. if it is terminating.

      • Hi,

        The discussion is becoming difficult to follow 🙂
        I would like to simplify the discussion by assuming that

        (i) the semantics states that the width is finite and _fixed_ (i.e., not elastic).
        (ii) the semantics does not state the value of the width and leaves that issue to the compiler.
        (iii) we only consider a non-interactive (i.e. isolated) standard-C program.

        I think that when all these restrictions hold, the Halting Problem for standard-C programs is decidable. The reasons are imo stated in the first part of your post, and in my previous reply. (My previous reply was actually written under the assumption that we do not have elasticity.)

        best wishes,
        Edgar

  5. Yes, under these assumptions I believe you are correct.

  6. Hi,

    Your post is very relevant in the following sense: if I recall correctly, I have read some “fundamental” articles in the past years in which the authors implicitly assumed that languages like C are Turing complete. But, as your post indicates, this is not a trivial matter at all. Furthermore, these authors mention several practical implications of their theoretical findings…

    In the interest of continuing our discussion (perhaps on another occasion), I would like to suggest making some more explicit assumptions which can always be lifted at a later stage.

    First, I suggest concentrating solely on compilers that map the standard-C language onto a fixed and finite machine. The implication is that we do not consider the Universal Turing Machine as a target machine. I think this assumption makes the discussion more relevant and not less theoretical.

    Second, I suggest we stick (as much as possible) to the intended semantics of standard-C. You have mentioned that the “C language specification goes to great extents to ensure that you can only access objects in memory that have been properly allocated”, etc. So, even though the intent may not have been formulated clearly, let us first ignore exotic implementations (like the elastic implementation).

    Given my previous assumptions (in my previous comment) and the two assumptions presented above, I think only one of them is not very realistic and that is the assumption that the standard-C program is not interactive. I furthermore believe that an interactive program can be understood in terms of non-interactive behaviour followed by a specific event followed by non-interactive behaviour followed by an event, etc. In short: truly understanding non-interactive programs will help in our study of interactive programs.

    I am sorry if I got carried away. But I’d definitely like to continue this discussion orally some time in the future. In any case, I look forward to receiving more posts from you.

    best wishes,
    Edgar

