A Characterization of the Chomsky Hierarchy by String Turing Machines

Hans W. Lang
University of Applied Sciences, Flensburg, Germany

Abstract

A string Turing machine is a variant of a Turing machine designed for easy manipulation of strings. In contrast to the standard Turing machine, a string Turing machine can insert and delete squares on the tape. It is easy to see that the models of standard and string Turing machines are equivalent in computing power. However, in the case of the string Turing machine, imposing certain restrictions on the allowed actions of the machine yields exactly the recognizers for the type-i language classes of the Chomsky hierarchy.

Keywords: Turing machine, Chomsky hierarchy

1. Introduction

In standard textbooks on formal languages and automata such as [1] or [4], the hierarchy of automata corresponding to the Chomsky hierarchy comprises the nondeterministic Turing machine, the linear bounded automaton, the pushdown automaton, and the finite automaton.

In this paper, a variant of the Turing machine, called the string Turing machine, is introduced. A string Turing machine is able to manipulate strings and, in particular, to reduce words to the start symbol according to the productions of a grammar. Restrictions on the form of the productions correspond in a beautiful way to restrictions on the actions of the string Turing machine. Thus, the Chomsky hierarchy, which is based on restrictions on grammars, can also be based on restrictions on string Turing machines.

In the following, the Chomsky hierarchy of language classes as characterized by certain restricted forms of grammars is revisited. Then, the new characterization by certain restrictions of string Turing machines is introduced.

2. The Chomsky hierarchy

The Chomsky hierarchy identifies language classes $\mathcal{L}_0$, $\mathcal{L}_1$, $\mathcal{L}_2$, $\mathcal{L}_3$ where

$$\mathcal{L}_0 \supset \mathcal{L}_1 \supset \mathcal{L}_2 \supset \mathcal{L}_3.$$

All the inclusions are proper. The language classes are referred to as type-i languages for i = 0, 1, 2, 3.
$\mathcal{L}_1$ is also known as the class of context-sensitive languages, $\mathcal{L}_2$ as the class of context-free languages, and $\mathcal{L}_3$ as the class of regular languages, because their languages are generated by context-sensitive, context-free, and regular grammars, respectively.

Definition: A grammar is a tuple $G = (V, T, P, S)$ with

  $V$          the alphabet of variables or nonterminal symbols,
  $T$          the alphabet of terminal symbols, where $V \cap T = \emptyset$; moreover, let $A = V \cup T$,
  $P$          a finite relation with $P \subseteq A^+ \times A^*$; the elements of $P$ are called productions or replacement rules,
  $S \in V$    a special variable, the start symbol.

The replacement rules are written in the form $u \to v$, indicating that the subword u occurring in some word w may be replaced by the subword v.

Given a grammar, the words of a language are generated by applying a sequence of such replacements to the start symbol until a word consisting of terminal symbols only is reached. The sequence of replacements is called a derivation of the word.

Languages can also be recognized by grammars in the sense that all words that can be reduced to the start symbol belong to the language [3]. Such a reduction is a derivation in the opposite direction. The string Turing machine described later makes use of reductions.

3. Special grammars

Certain restrictions on the form of the productions of a grammar may or may not restrict its power to generate a language. However, the restriction $P \subseteq V \times A^*$, which requires that the left side of each production consist of exactly one variable, restricts the languages generated by this kind of grammar to the class $\mathcal{L}_2$, which is a proper subset of $\mathcal{L}_0$.

It turns out that the following restrictions imposed on the form of the productions of a grammar correspond to the language classes of the Chomsky hierarchy. Observe that each form of the productions is a special case of the preceding one.

  Type   Productions of the form           Name
  0      $u \to w$                         Recursively enumerable
  1      $u \to v$ with $|u| \le |v|$      Context-sensitive
  2      $X \to v$                         Context-free
  3      $X \to aY$ or $X \to a$           Regular

where $u, v \in A^+$, $w \in A^*$, $X, Y \in V$, $a \in T$. If necessary, as an exception the production $S \to \varepsilon$ is allowed in order to produce the empty word $\varepsilon$.

4. String Turing machine

A string Turing machine is a device as shown in Figure 1. It has access to the symbols of a string, one at a time. A cursor points to the current position. The machine can read the symbol at the cursor position (the cursor symbol) and it can overwrite that symbol by some other symbol. It can also move the cursor to the left and to the right. Moreover, a string Turing machine can insert a symbol at the cursor position and it can delete the cursor symbol.

[Fig. 1: String Turing machine. A control unit with a cursor pointing into the string $w_0 w_1 \dots w_5$, which is enclosed by the delimiter symbols.]

Initially, the string consists of an input word enclosed by two special delimiter symbols, a left delimiter and the right delimiter &. The string has finite length; however, by inserting symbols it can be made arbitrarily long.

The insert action is performed as depicted in Figure 2(a). The prefix of the string including the cursor symbol is moved one position to the left. A blank symbol is inserted and becomes the new cursor symbol. The delete action is shown in Figure 2(b). The cursor symbol is deleted. The prefix of the string to the left of the cursor is moved one position to the right. Its last symbol becomes the new cursor symbol.

[Fig. 2: (a) Insert action I and (b) delete action D, shown on the string a b c d &; deleting c yields a b d &.]

Formally, a string Turing machine is defined as follows.

Definition: A nondeterministic string Turing machine is a tuple $M = (Z, E, A, d, q, p)$ with

  $Z$          a finite, non-empty set of states,
  $E$          the input alphabet,
  $A$          the string alphabet, where $E \subset A$,
  $d$          the transition relation with $d \subseteq Z \times A \times A' \times Z$, where $A' = A \cup \{L, R, I, D\}$,
  $q \in Z$    the start state,
  $p$          the start position, which is one of the two delimiter symbols.

The string alphabet A contains the two delimiter symbols (the left delimiter and &) and the blank symbol; these symbols do not belong to the input alphabet. The elements of the set {L, R, I, D} do not belong to A; these elements are

called cursor actions. At the beginning, the string Turing machine is in its start state, and the cursor points to one of the two delimiter symbols that enclose the input word.

An element $(s, a, a', s')$ of the transition relation is interpreted in the following way. If the string Turing machine is in state s and reads symbol a at the cursor position, it replaces symbol a by symbol a' and enters state s'. However, if a' is not a symbol but one of the cursor actions, the string Turing machine does not overwrite symbol a but performs the corresponding cursor action: move left (L), move right (R), insert (I) or delete (D).

The string Turing machine accepts an input word w if there is a sequence of transitions that deletes its string completely.

5. Equivalence with standard Turing machine

A standard Turing machine can simulate a string Turing machine. The standard Turing machine uses as tape alphabet the alphabet A of the string Turing machine plus a marked copy $\bar{a}$ of each symbol a in A. It then simulates the insert action as follows. It overwrites the symbol a under its read/write head by the marked symbol $\bar{a}$. Then it moves all symbols to the right of $\bar{a}$ by one position to the right. It returns to $\bar{a}$, overwrites it by a, moves one position to the right and prints a blank symbol.

In a similar way, the standard Turing machine simulates the delete action. It overwrites the symbol a under its read/write head by the marked symbol $\bar{a}$. Then it moves left to the first non-blank symbol on its tape and moves all these symbols by one position to the right until it reaches $\bar{a}$.

All other actions are identical to those of the string Turing machine. The standard Turing machine enters the accepting state when it has deleted all symbols on the tape.

A string Turing machine can simulate a standard Turing machine. It performs the same actions as the standard Turing machine, except when it reaches one of the two delimiter symbols (which do not belong to the tape alphabet of the simulated standard Turing machine). Then it performs insert actions if it needs space. When the standard Turing machine enters an accepting state and stops, the string Turing machine deletes its string so that it accepts, too. Otherwise, the string Turing machine does not accept, because at least the two delimiter symbols remain.

6. Recognition of languages by string Turing machines

Given a grammar G and a nonempty word w as input string, the string Turing machine tries to reduce the word w to the start symbol S of the grammar. It does so by nondeterministically choosing an occurrence of the right side of some production in w and replacing it by the corresponding left side. It repeats this procedure until only the start symbol S remains. Finally, it deletes S and the two delimiter symbols and thereby recognizes the word w. Otherwise, if no such reduction sequence to the start symbol S is possible, it does not recognize the word w.

Example: Let G be a grammar with the productions

  $S \to aSBc \mid abc$
  $cB \to Bc$
  $bB \to bb$

and let the input word be w = aabbcc. The string Turing machine searches nondeterministically for occurrences of right sides of the productions in the input word w. It finds bb and replaces it with bB, yielding aabBcc. Then it replaces Bc with cB, yielding aabcBc. Then it replaces abc with S, yielding aSBc. Finally, it replaces aSBc with S and deletes S and the two delimiter symbols. Thus, it has recognized the word w.

When the right side of a production is longer than the left side, the string Turing machine needs to perform delete actions. When the right side is shorter than the left side, it needs to perform insert actions. However, this last case occurs only in type-0 grammars. Thus, any type-1 language is recognizable by a string Turing machine without insert actions.
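The nondeterministic search for a reduction sequence can be illustrated by an exhaustive search over strings. The following Python sketch is not part of the original construction: it works on plain strings rather than on a tape with a cursor, and the names `PRODUCTIONS` and `reduces_to_start` are chosen here for illustration only. It assumes that the example grammar reads S → aSBc | abc, cB → Bc, bB → bb.

```python
from collections import deque

# Productions of the example grammar as (left side, right side) pairs.
# The data layout is an assumption made for this sketch.
PRODUCTIONS = [
    ("S", "aSBc"),
    ("S", "abc"),
    ("cB", "Bc"),
    ("bB", "bb"),
]


def reduces_to_start(word, productions, start="S"):
    """Return True if `word` can be reduced to `start`.

    Each step replaces one occurrence of a right side by the
    corresponding left side. Termination is only guaranteed for
    monotonic grammars, where a reduction never lengthens the string.
    """
    seen = {word}
    queue = deque([word])
    while queue:
        current = queue.popleft()
        if current == start:
            return True
        for left, right in productions:
            pos = current.find(right)
            while pos != -1:
                # Replace this occurrence of the right side by the left side.
                reduced = current[:pos] + left + current[pos + len(right):]
                if reduced not in seen:
                    seen.add(reduced)
                    queue.append(reduced)
                pos = current.find(right, pos + 1)
    return False


print(reduces_to_start("aabbcc", PRODUCTIONS))  # True
print(reduces_to_start("aabbc", PRODUCTIONS))   # False
```

The breadth-first search replaces the machine's nondeterministic choice by trying every applicable reduction. Because no right side of this grammar is shorter than its left side, every reduced string is at most as long as the input, so the set of reachable strings is finite.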

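The effect of the insert and delete actions on the string and on the cursor, as described in Section 4, can be sketched as a small Python class. The class name `StringTape`, its method names, the character `_` standing in for the blank symbol, and the clamping of the cursor when the leftmost symbol is deleted are all assumptions made for this sketch; the text does not prescribe them.

```python
BLANK = "_"  # stand-in for the blank symbol (assumption)


class StringTape:
    """A string with a cursor and the four cursor actions L, R, I, D."""

    def __init__(self, symbols, cursor=0):
        self.cells = list(symbols)
        self.cursor = cursor

    def read(self):
        return self.cells[self.cursor]

    def write(self, symbol):
        self.cells[self.cursor] = symbol

    def left(self):
        self.cursor -= 1

    def right(self):
        self.cursor += 1

    def insert(self):
        # The prefix up to and including the cursor symbol shifts left;
        # a blank fills the freed square and becomes the cursor symbol.
        self.cursor += 1
        self.cells.insert(self.cursor, BLANK)

    def delete(self):
        # The cursor symbol disappears; the symbol to its left becomes
        # the new cursor symbol.
        del self.cells[self.cursor]
        # Assumption: stay at index 0 if the leftmost symbol was deleted.
        self.cursor = max(self.cursor - 1, 0)

    def is_empty(self):
        return not self.cells


tape = StringTape("abcd&", cursor=2)  # cursor on c
tape.insert()
print("".join(tape.cells), tape.read())  # abc_d& _
tape.delete()                            # removes the blank again
tape.delete()                            # removes c
print("".join(tape.cells), tape.read())  # abd& b
```

In this representation a reduction step with a right side longer than the left side needs only `write` and `delete`, whereas a right side shorter than the left side needs `insert` to make room.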
The observation that type-1 languages can be recognized without insert actions is part of the following hierarchy theorem for string Turing machines. Recall that the cursor actions L, R, I, D denote left move, right move, insert and delete, respectively.

Theorem:
  Any type-0 language is recognizable by a string Turing machine with cursor actions {I, L, R, D}.
  Any type-1 language is recognizable by a string Turing machine with cursor actions {L, R, D}.
  Any type-2 language is recognizable by a string Turing machine with cursor actions {R, D}.
  Any type-3 language is recognizable by a string Turing machine with cursor actions {D}.

Proof sketch: The string Turing machine recognizes a word w of a type-0 language by applying reduction steps in a nondeterministic way until it reaches the start symbol S of the grammar, which it subsequently deletes together with the two delimiter symbols.

In the same way, a string Turing machine recognizes a word w of a type-1 language. However, every type-1 language has a monotonic grammar, i.e. a grammar where no right side of a production is shorter than the left side. Thus, each reduction step can be performed without insert actions.

Again in the same way, a string Turing machine recognizes a word w of a type-2 language. Every type-2 (context-free) language has a grammar in reverse Greibach normal form, with each production of the form

$$X \to Y_{k-1} \dots Y_0\, a$$

where $k \in \mathbb{N}_0$, $X, Y_0, \dots, Y_{k-1} \in V$ and $a \in T$. Observe that when k = 0 the production has the form $X \to a$.

Recognition of a context-free language is based on such a grammar in reverse Greibach normal form. When processing an input word, the string Turing machine applies a production after each terminal symbol read. It does so by matching and deleting the right side of the production from right to left, except for the last symbol reached (the leftmost symbol of the right side), which it overwrites by the left side of the production. Cursor actions {R, D} suffice for this. In this way, the string Turing machine simulates a pushdown automaton. The prefix of the string including the cursor position corresponds to the stack of the pushdown automaton.

Every type-3 language is generated by a right-linear grammar with productions of the form $X \to aY$ or $X \to a$ (Section 3). A string Turing machine with cursor actions {D} starts at the delimiter symbol & and reads the input word from right to left. When reading the last symbol a of the word, it applies some production $X \to a$ and enters state X. Then it reads the preceding symbol b, applies some production $Y \to bX$ and enters state Y, and so on. If it reads the left delimiter symbol and is in the state of the start symbol S, it deletes the delimiter and accepts; otherwise it overwrites the delimiter, leaving the string nonempty, and rejects.

It may seem strange that in the case of type-3 languages the string Turing machine processes the input word from right to left. However, if acceptance by empty string is required, there is no other choice. Another possibility would have been to define acceptance by final state and to restrict the type-3 cursor actions to {R}.

The following example illustrates the way a word of a context-free language is recognized by a string Turing machine.

Example: Consider the grammar

  $S \to Xb \mid XSb$
  $X \to a$

which is in reverse Greibach normal form. Let w = aabb be the input word. The string Turing machine first reduces each a to X by the production $X \to a$, yielding the string XXbb. When it reads the first b, it chooses the production $S \to Xb$. It deletes b, with the result that X appears at the cursor position. It overwrites X by the left side of the production, S, yielding the string XSb. It then moves the cursor to the right, reads b, and applies the production $S \to XSb$. Namely, it deletes b, deletes S, and overwrites X by the left side S. Now it has reduced the input word w to the start symbol S. In order to make sure that the current word is just S, it moves the

cursor to the right, deletes the delimiter symbol &, deletes S, and deletes the other delimiter symbol. Since the string is now empty, the string Turing machine accepts the input word.

7. Language classes recognized by string Turing machines

We have seen that every type-i language is recognizable by a corresponding type of nondeterministic string Turing machine, which we call a type-i string Turing machine. We now show that, vice versa, any language recognized by a type-i string Turing machine is a type-i language.

Theorem: Any language recognized by a type-i string Turing machine is a type-i language (i = 0, 1, 2, 3).

Proof: We show that a type-i string Turing machine for i = 0, 1, 2, 3 can be simulated by a nondeterministic standard Turing machine, linear bounded automaton, pushdown automaton, and finite automaton, respectively. Thus, if a language is recognized, for instance, by a type-2 string Turing machine, it is recognized by a pushdown automaton and is therefore context-free, i.e. type-2.

For string Turing machines of type i = 0, 1 the construction of a corresponding standard Turing machine and linear bounded automaton, respectively, is obvious. We show in detail the construction of a nondeterministic pushdown automaton from a type-2 string Turing machine and of a nondeterministic finite automaton from a type-3 string Turing machine.

The input alphabet of the pushdown automaton is the same as that of the string Turing machine, the stack alphabet corresponds to the string alphabet, the sets of states are identical, and so is the start state. It is assumed that the stack initially contains the left delimiter symbol.

The transition relation of a pushdown automaton consists of 5-tuples of the form $(s, a, h, h', s')$ where s is the current state, a is the symbol read, h is the topmost stack symbol that is popped from the stack, h' is the symbol pushed to the stack, and s' is the next state. Any of a, h, and h' may be the empty word $\varepsilon$.

We construct the transition relation d' of the pushdown automaton from the transition relation d of the string Turing machine in the following way.

1) For all elements $(s, a, R, s')$ and $(s', b, b', s'')$ of d where $b \in E$, let $(s, b, \varepsilon, b, s')$ be an element of d'. That is, whenever the string Turing machine moves its cursor to the right, the pushdown automaton reads the next input symbol b and pushes it to the stack. Moreover, the tuple $(s, \varepsilon, \varepsilon, \&, s')$ is added; the pushdown automaton may choose this transition when it has read the input completely.

2) For each element $(s, a, a', s')$ of d with $a' \in A$, let $(s, \varepsilon, a, a', s')$ be an element of d'. That is, when the string Turing machine overwrites symbol a by symbol a', the pushdown automaton pops symbol a from the stack and pushes symbol a' to the stack.

3) For each element $(s, a, D, s')$ of d, let $(s, \varepsilon, a, \varepsilon, s')$ be an element of d'. That is, when the string Turing machine deletes symbol a, the pushdown automaton pops symbol a from the stack.

If the string Turing machine processes a word w, then the pushdown automaton processes w in a corresponding way. The pushdown automaton accepts with empty stack if and only if the string Turing machine has reduced its string to the empty string and accepts. Thus, the pushdown automaton recognizes the same language. Any language recognized by a pushdown automaton is context-free. Therefore, any language recognized by a type-2 string Turing machine is context-free.

A type-3 string Turing machine essentially acts like a nondeterministic finite automaton that processes the input word from right to left. Thus, if L is the language accepted by the string Turing machine, then $\&L^R$ is the language accepted by the nondeterministic finite automaton, where $L^R$ is the mirror image of L. Since $\&L^R$ is regular, so is $L^R$, and so is L, since the mirror image of a regular language is regular.
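The right-to-left behaviour of a delete-only machine can be pictured as a finite automaton whose states are the variables of the grammar. The Python sketch below is an illustration under stated assumptions, not the paper's construction: it assumes a right-linear grammar with productions X → aY or X → a (as in the table of Section 3), single-character variables and terminals, and it tracks the set of all variables the machine could be in instead of guessing one. The grammar `RIGHT_LINEAR` (for the language a*b+) and the function name are hypothetical.

```python
# Hypothetical right-linear grammar for a*b+:
#   S -> aS | bB | b,   B -> bB | b
RIGHT_LINEAR = {
    "S": ["aS", "bB", "b"],
    "B": ["bB", "b"],
}


def accepts_right_to_left(word, productions, start="S"):
    """Read `word` from right to left, tracking all possible variables.

    `productions` maps a variable X to its right sides, each either a
    single terminal "a" (X -> a) or a terminal followed by a variable
    "aY" (X -> aY).
    """
    if not word:
        return False  # the empty word would need the exception S -> epsilon
    # Last symbol of the word: only productions X -> a can apply.
    states = {x for x, rights in productions.items()
              for r in rights if r == word[-1]}
    # Remaining symbols, right to left: Y -> bX with X already reachable.
    for symbol in reversed(word[:-1]):
        states = {y for y, rights in productions.items()
                  for r in rights
                  if len(r) == 2 and r[0] == symbol and r[1] in states}
    # Reaching the left delimiter in the state of the start symbol accepts.
    return start in states


print(accepts_right_to_left("aabb", RIGHT_LINEAR))  # True
print(accepts_right_to_left("aba", RIGHT_LINEAR))   # False
```

Each loop iteration corresponds to one delete action that exposes the next symbol to the left; no other cursor action is needed because the machine never revisits a symbol.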

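For the type-2 case, the correspondence between the string prefix and the pushdown stack can be made concrete with a short Python sketch. It assumes a grammar in reverse Greibach normal form whose right sides are given as strings, with single-character variables and terminals, and it replaces nondeterminism by backtracking. The names `RGNF_PRODUCTIONS` and `recognizes` are hypothetical, and the grammar is assumed to be S → Xb | XSb, X → a.

```python
# Productions as (left side, right side); every right side ends in a terminal.
RGNF_PRODUCTIONS = [
    ("S", "Xb"),
    ("S", "XSb"),
    ("X", "a"),
]


def recognizes(word, productions, start="S"):
    """Backtracking search that mimics a type-2 string Turing machine.

    `stack` plays the role of the string prefix up to the cursor. After
    each terminal read (cursor action R), exactly one production is
    applied by matching its right side against the top of the stack.
    """
    def search(stack, rest):
        if not rest:
            # All input consumed: only the start symbol may remain.
            return stack == [start]
        stack = stack + [rest[0]]  # move right onto the next input symbol
        for left, right in productions:
            n = len(right)
            if len(stack) >= n and stack[-n:] == list(right):
                # Delete all matched symbols but the leftmost one and
                # overwrite that one by the left side of the production.
                if search(stack[:-n] + [left], rest[1:]):
                    return True
        return False

    return search([], word)


print(recognizes("aabb", RGNF_PRODUCTIONS))  # True
print(recognizes("aab", RGNF_PRODUCTIONS))   # False
```

The sketch omits the delimiter symbols; the final test `stack == [start]` stands in for the step in which the machine deletes &, S and the left delimiter and accepts with an empty string.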
8. Conclusions

We have introduced the concept of the string Turing machine as a natural device for recognizing languages. A string Turing machine can recognize a word by applying reduction steps according to the productions of some grammar. Depending on the type of the grammar, a corresponding type of string Turing machine suffices to recognize the language.

We have defined type-i string Turing machines for i = 0, 1, 2, 3, where each type i is a special case of type i − 1, using only a proper subset of the cursor actions {I, L, R, D}. The main result is that the (nondeterministic) type-i string Turing machines correspond exactly to the type-i languages of the Chomsky hierarchy.

References

[1] J.E. Hopcroft, R. Motwani, J.D. Ullman, Introduction to Automata Theory, Languages, and Computation, 3rd edition, Addison-Wesley, 2006.
[2] H.R. Lewis, C.H. Papadimitriou, Elements of the Theory of Computation, Prentice Hall, 1981.
[3] A. Salomaa, Formal Languages, Academic Press, 1973.
[4] M. Sipser, Introduction to the Theory of Computation, PWS Publishing Company, 1996.