kena

On the Turing Completeness of C – part 2

In lecture on November 20, 2012 at 22:00

Two days ago I started to investigate whether C was Turing Complete. With help from two serious people also interested in the topic, I came to the preliminary conclusion that it is probably not. Here is a summary of the arguments so far.

Turing Completeness is a property of languages (i.e. not machines) which can be phrased as follows: a language is Turing complete if any function that can be computed by a Turing Machine can be expressed in that language. An interesting property is that if a language is turing complete then the halting problem is not decidable for arbitrary programs in that language.

The usual way to determine whether a given language is Turing complete is to prove that it is sufficiently general to express all computable functions. This is the approach indirectly suggested by Scott Schneider in his comment to Hacker News.

Furthermore, Scott proposes a simple test for Turing Completeness: check whether the language supports unbounded recursion, conditionals and arbitrary storage.

“Support for unbounded recursion” has been formally described by R. Péter in 1934 [1]; to simplify, it is the ability to write a recursive function whose recursion depth is not known in advance. The standard C language as defined by ISO 9899:1999 [2] and 9899:2011 [3] does support unbounded recursion, by omitting to specify of a maximum recursion depth. Of course, C also supports conditionals.

The question then came of whether C supports arbitrary storage. Both Scott and I have come to think that this is not the case: due to the finite and fixed width’s of C types, especially pointer types and array indices, the abstract machine that defines the semantics of C has a memory of finite size.

But are we right? This needs to be properly investigated.

As highlighted by commenter Edgard G. Daylight, we should also start by restricting the entire discussion to non-interactive C programs, because Turing Completeness is about the equivalence with Turing machines which are also no-interactive. Interactivity in C comes specifically from the “volatile” keyword and the standard library. we will thus consider only the subset of C defined for a “freestanding” environment (i.e. no library) and without “volatile”, where the input to the program is only defined by the initial value at run-time of global objects left unitialized in the program source.

If we can determine Turing Completeness for this restricted language, then the full C language would also be Turing Complete by construction. If we instead determine that the restricted language is not Turing Complete, then we can organize a separate discussion, later, to determine whether there are features of the full C language that “bring back” Turing Completeness.

❦❦❦

Another point needs to be clarified, too: Scott’s test for Turing Completeness is different from the traditional phrasing. The traditional phrasing does not mention unbounded recursion nor arbitrary storage; instead, it states  that a language is Turing Complete if it supports conditionals, unbounded loops and a finite number of memory cells which each can contain arbitrary large values. This is, for example, the basis for the definition of Hofstadter’s FlooP which is Turing Complete.

Scott’s test, instead, mentions “arbitrary storage” which can be understood in the context of C to correspond to an arbitrarily large number of different cells, each containing a fixed size value.

I am confident (although I haven’t proved it formally) that both phrasings are equivalent. To me, it is intuitively possible to implement unbounded loops with unbounded recursion, as well as cells that can retain arbitrarily large value (i.e. “bignums”) using arbitrary storage. Conversely, with one cell retaining a call stack and an unbounded loop, one can simulate unbounded recursion, and another cell can simulate arbitrary storage.

Even assuming the phrasings are not equivalent, since C supports both unbounded loops and unbounded recursion, the remaining question would then be whether C supports an arbitrarily large number of finite cells or a fixed number of cells that can store arbitrarily large values.

❦❦❦

We can try first to find out whether the individual cells can contain arbitrary values, i.e. whether the type “char” is specified to have a finite width.

A “char” cell is defined to contain exactly CHAR_BIT bits; CHAR_BIT is in turn defined to be greater than or equal to 8, but its exact value is implementation-defined. One can interpret this specification in two ways: either the specification should be understood to implicitly mandate CHAR_BIT to be finite; or it leaves space for an (hypothetic, possibly impractical) implementation of C where CHAR_BIT is not finite. If the latter holds, then C would indeed support arbitrarily large values in each char cell.

To test this, two approaches are possible.

The first is to check whether other statements in the C language specification indirectly constrain CHAR_BIT to be finite. At the point CHAR_BIT is defined (sections 6.5.2 and 5.2.4.2.1), the text of the language specification does not state so explicitly; but a careful reading of the entire specification may find this finiteness elsewhere.

The other approach is to try and actually construct a C program that simulates a universal Turing machine (or graph reduction machine, or stack machine) with CHAR_BIT not assumed to be finite. This succeeds when we can write such a program and cannot find any statement in the language specification that the program is incorrect or that its behaviour is unspecified or undefined.

(The first approach should be used by someone who expects to find this finiteness; the second approach should be used by someone who expects otherwise.)

I do not have a definite answer on this yet.

However, note that finding Turing Completeness in this direction would require a conceptual implementation of C with a “very large”  char data type (“infinitely” large, in fact), which is very far from what most people understand the C language is about.

Which brings me to another, informal answer. We can step out of the frame and check instead the intentions of the various  ISO committees in charge of writing and publishing the C language specification.

It seems obvious (to me, at least) that these people wanted CHAR_BIT represent the capacity of individual cells that can be implemented in memory hardware, expected to stay stable over time and close to the POSIX/Internet standard of 8 bits per “char”. By this argument, the people in charge of standardizing C’s semantics intended a finite storage space in bits per cell.

❦❦❦

Assuming CHAR_BIT is finite, which, irrespective of the language specification, is the most intuitive and practical choice regarding implementation, we can instead try to find whether C supports an arbitrarily large number of cells.

This is actually surprisingly difficult to determine.

Firstly, the language specification states in section 6.2.6 (representation of types) that all types have a representation as a finite array of “char” cells. This implies that all individual objects, which are the mechanism by which C gives access to storage to program writers, have a finite number of cells. Since we assume that each cell can hold only a finite number of different values, this means that any object of any given type also has a finite number of possible values. Since each pointer is an object, this implies that the number of different addressable objects in C is also finite, including the number of cells in arrays.

Therefore, it is not possible to support arbitrary storage by using arrays or variable pointers to different objects.

However, it is also possible in C to construct/define non-addressable objects, namely:

  • objects allocated statically by name (either in the global scope or with “static” on local scopes) whose address is never taken with the & (“address of”) operator; and
  • literal constants (i.e. character literals, enumeration constants, string literals and number literals); and
  • objects defined by name to be allocated automatically upon function activation, i.e. “local variables” in functions, whose address is never taken with the & operator.

Of these three categories, the first two only come in finite supply for any given program (a given, finite program text has a finite supply of different names and literal constants). That leaves us with the remaining question: can one simulate arbitrary storage using only automatic variables and without using arrays nor the & operator?

I am increasingly confident that this is not possible. My informal argument goes as follows: since each object has a finite size, and the program text limits us to a finite number of different object per function definition, the only way to increase the storage space is function recursion: each recursive function call will automatically instantiate new objects, and their number is unbounded because function recursion is unbounded. However, each specific function activation cannot “navigate” the entire storage created this way, like a Turing machine could move its head left or right to arbitrary positions on the tape, because the variables in one activation cannot be accessed from another activation.

It is not possible to introduce this “navigation” feature by means of function arguments, because these would need to be basic C integers or pointers which again would limit the span of the navigation to a finite set of locations.

If my argument is correct, this would imply that C restricts each program to a finite supply of “char” cells. Combined with the finiteness of CHAR_BIT, assumed above, one would then conclude that C is not Turing Complete (under the initial restriction of a freestanding environment without “volatile”).

❦❦❦

The summary so far is thus:

  • if we consider C restricted to a freestanding environment, removing the “volatile” keyword, and assuming CHAR_BIT is finite, then C is (very likely) not Turing Complete;
  • if CHAR_BIT is not finite, then C is (likely) Turing Complete;
  • if CHAR_BIT is finite, there may be features of a hosted C environment and/or C with “volatile” that make it Turing Complete, however we have not looked at this yet.

The topic is really fascinating and I cherish the idea to lay out the discussion into a proper academic publication. But before that I need to clean up my understanding of Turing completeness and how to test for it in an arbitrary language. The first step in that direction was to order a copy of Elements of the Theory of Computation [4] today, which I hope to digest over the Christmas vacation. Stay tuned!

References

[1] Rózsa Péter. Über den Zussammenhang der verschiedenen Begriffe der rekursiven Funktion. Mathematische Annale, 110:612–632, 1934.

[2] International Standards Organization and International Electrotechnical Commission. ISO/IEC 9899:1999(E), Programming Languages – C. American National Standards Institute (ANSI), 11 West 42nd Street, New York, New York 1O036, second edition, December 1999.

[3] International Standards Organization and International Electrotechnical Commission. ISO/IEC 9899:2011, Programming Languages – C. American National Standards Institute (ANSI), 11 West 42nd Street, New York, New York 1O036, first edition, December 2011.

[4] Harry Lewis and Christos H. Papadimitrious. Elements of the Theory of Computation. Prentice Hall, second edition, August 1997. ISBN 978-0132624787.

Advertisements
  1. […] following advice from both Scott and commenter E.G. Daylight, I have followed up with a “cleanup post” that summarizes the argument and focuses the […]

  2. I’d say, “Of course C isn’t Turing complete, because a pointer is a finite size S, so it can only ever be a finite state machine with 2^^S states.” But the point of the tape is that it is unbounded. To emulate that in C, you have to use the filing system: getc() and putc(). If you need a longer tape, buy a bigger disc drive. Voilà!

    • I am glad that you find the issue trivial.

      Note that your assumption that a pointer has a finite size is neither obvious (it is not mandated by the standard, see the discussion on CHAR_BIT) nor relevant (you can address many more objects in C than there are pointer values).

      Meanwhile my post was specifically focussed on freestanding C, ie without access to the standard library. So your suggestion to use getc/putc does not apply. Then even if it did, you would still have a problem: you need to seek left and right on the tape, and file size limits apply (fseek must return the current position, which in turn must fit some integer type).

      So all in all it may seem “obvious” at a first glance but I reckon it is not obvious at all.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: