A Note on the Succinctness of Descriptions of Deterministic Languages

Similar documents
A Characterization of the Chomsky Hierarchy by String Turing Machines

On the Recursion-Theoretic Complexity of Relative Succinctness of Representations of Languages

Formal languages and computation models

Theory of Computations Spring 2016 Practice Final Exam Solutions

Regular Languages (14 points) Solution: Problem 1 (6 points) Minimize the following automaton M. Show that the resulting DFA is minimal.

Source of Slides: Introduction to Automata Theory, Languages, and Computation By John E. Hopcroft, Rajeev Motwani and Jeffrey D.

TAFL 1 (ECS-403) Unit- V. 5.1 Turing Machine. 5.2 TM as computer of Integer Function

UNIT I PART A PART B


Decidable Problems. We examine the problems for which there is an algorithm.

Multiple Choice Questions

Limited Automata and Unary Languages

(a) R=01[((10)*+111)*+0]*1 (b) ((01+10)*00)*. [8+8] 4. (a) Find the left most and right most derivations for the word abba in the grammar

University of Nevada, Las Vegas Computer Science 456/656 Fall 2016

Reflection in the Chomsky Hierarchy

1. [5 points each] True or False. If the question is currently open, write O or Open.

CS402 - Theory of Automata Glossary By

CT32 COMPUTER NETWORKS DEC 2015

We show that the composite function h, h(x) = g(f(x)) is a reduction h: A m C.

ONE-STACK AUTOMATA AS ACCEPTORS OF CONTEXT-FREE LANGUAGES *

From Theorem 8.5, page 223, we have that the intersection of a context-free language with a regular language is context-free. Therefore, the language

Theory of Computation

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2016

R10 SET a) Construct a DFA that accepts an identifier of a C programming language. b) Differentiate between NFA and DFA?

Midterm Exam II CIS 341: Foundations of Computer Science II Spring 2006, day section Prof. Marvin K. Nakayama

Specifying Syntax COMP360

Computer Sciences Department

Decision Properties for Context-free Languages

Turing Machine Languages

Theory of Programming Languages COMP360

Formal Languages and Automata

Chapter 14: Pushdown Automata

JNTUWORLD. Code No: R

Limitations of Algorithmic Solvability In this Chapter we investigate the power of algorithms to solve problems Some can be solved algorithmically and

Compiler Construction

TOPIC PAGE NO. UNIT-I FINITE AUTOMATA

Theory of Computations Spring 2016 Practice Final

Learn Smart and Grow with world

The Turing Machine. Unsolvable Problems. Undecidability. The Church-Turing Thesis (1936) Decision Problem. Decision Problems

Material from Recitation 1

Finite-State Transducers in Language and Speech Processing

where is the transition function, Pushdown Automata is the set of accept states. is the start state, and is the set of states, is the input alphabet,

Introduction to Automata Theory. BİL405 - Automata Theory and Formal Languages 1

Turing Machines. A transducer is a finite state machine (FST) whose output is a string and not just accept or reject.

1. Draw the state graphs for the finite automata which accept sets of strings composed of zeros and ones which:

Closure Properties of CFLs; Introducing TMs. CS154 Chris Pollett Apr 9, 2007.

CS525 Winter 2012 \ Class Assignment #2 Preparation

Computation Engineering Applied Automata Theory and Logic. Ganesh Gopalakrishnan University of Utah. ^J Springer

CS5371 Theory of Computation. Lecture 8: Automata Theory VI (PDA, PDA = CFG)

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

Theory of Computation, Homework 3 Sample Solution

Definition: A context-free grammar (CFG) is a 4- tuple. variables = nonterminals, terminals, rules = productions,,

Skyup's Media. PART-B 2) Construct a Mealy machine which is equivalent to the Moore machine given in table.

THEORY OF COMPUTATION

LECTURE NOTES THEORY OF COMPUTATION

CS210 THEORY OF COMPUTATION QUESTION BANK PART -A UNIT- I

Final Course Review. Reading: Chapters 1-9

Theory of Languages and Automata

Name: Finite Automata

LECTURE NOTES THEORY OF COMPUTATION

CS 44 Exam #2 February 14, 2001

PDA s. and Formal Languages. Automata Theory CS 573. Outline of equivalence of PDA s and CFG s. (see Theorem 5.3)

6.045J/18.400J: Automata, Computability and Complexity. Practice Quiz 2

ECS 120 Lesson 16 Turing Machines, Pt. 2

I have read and understand all of the instructions below, and I will obey the Academic Honor Code.

Context Free Grammars. CS154 Chris Pollett Mar 1, 2006.

Yet More CFLs; Turing Machines. CS154 Chris Pollett Mar 8, 2006.

6. Advanced Topics in Computability

Outline. Language Hierarchy

Problems, Languages, Machines, Computability, Complexity

CS6160 Theory of Computation Problem Set 2 Department of Computer Science, University of Virginia

Regular Languages and Regular Expressions

Universal Turing Machine Chomsky Hierarchy Decidability Reducibility Uncomputable Functions Rice s Theorem Decidability Continued

Ambiguous Grammars and Compactification

Lexical Analysis - 2

Theory of Computation Dr. Weiss Extra Practice Exam Solutions

We can create PDAs with multiple stacks. At each step we look at the current state, the current input symbol, and the top of each stack.

CS402 Theory of Automata Solved Subjective From Midterm Papers. MIDTERM SPRING 2012 CS402 Theory of Automata

Introduction to Computers & Programming

1. (10 points) Draw the state diagram of the DFA that recognizes the language over Σ = {0, 1}

Finite Automata. Dr. Nadeem Akhtar. Assistant Professor Department of Computer Science & IT The Islamia University of Bahawalpur

QUESTION BANK. Formal Languages and Automata Theory(10CS56)

Context Free Languages and Pushdown Automata

1 Parsing (25 pts, 5 each)

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

(Refer Slide Time: 0:19)

Equivalence of NTMs and TMs

Dr. D.M. Akbar Hussain

The Big Picture. Chapter 3

Context-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5

T.E. (Computer Engineering) (Semester I) Examination, 2013 THEORY OF COMPUTATION (2008 Course)

Variants of Turing Machines

CSE 105 THEORY OF COMPUTATION

LL(1) predictive parsing

Optimizing Finite Automata

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

ALGORITHMIC DECIDABILITY OF COMPUTER PROGRAM-FUNCTIONS LANGUAGE PROPERTIES. Nikolay Kosovskiy

AUBER (Models of Computation, Languages and Automata) EXERCISES

Enumerations and Turing Machines

Transcription:

INFORMATION AND CONTROL 32, 139-145 (1976) A Note on the Succinctness of Descriptions of Deterministic Languages LESLIE G. VALIANT Centre for Computer Studies, University of Leeds, Leeds, United Kingdom It is shown that the relative succinctness that may be achieved by describing deterministic context-free languages by general unambiguous grammars rather than by deterministic pushdown automata is not bounded by any recursive function. l. INTRODUCTION The result proved in this paper is that for the elements of some infinite class of deterministic context-free languages the size of deterministic pushdown amomata needed to describe them is not recursively bounded by the size of the smallest unambiguous context-free grammars that generate them. This is a quantitative explanation of the fact that some languages require large descriptions in terms of LR(1) grammars (Knuth, 1965; Aho and Ullman, 1972), or strict deterministic grammars (Harrison and Havel, 1973), even though they can be described very succinctly in terms of general, even unambiguous, context-free grammars. It therefore illustrates one of the tangible advantages of using parsing mechanisms more powerful than a single pushdown stack (e.g., Earley, 1970) even for languages for which that may be sufficient. The most closely related result previously known is that a similar nonrecursive relationship exists between the succinctness of descriptions of regular sets by finite automata and (ambiguous) context-free languages respectively (Meyer and Fischer, 1971). In contrast a fairly precise recursive relationship is known to exist between finite automata and deterministic pushdown automata (Stearns, 1967; Meyer and Fischer, 1971; Valiant, 1975). However, the two further questions of analogously relating finite automata with unambiguous grammars, and unambiguous grammars with ambiguous ones, both remain open. Copyright 1976 by Academic Press, Inc. All rights of reproduction in any form reserved. 643/3z/z-4 139

140 LESLIE G. VALIANT 2. PRELIMINARIES Comparisons between the succinctness of different formalisms can be quantified within the following general framework. Let X be a set on which there is an equivalence relation :-, and a size measure ~ mapping X into the positive integers. Then the maximum succinctness of X~ over Xi for X~, X: C X can be expressed by the function Sij defined by S,:(n) = max{m,(x) [ a(x) ~ n, x ~ X~}, where Mi(x) = min{a(y) [ y ~ Xi, y :" x}, and min and max take the value zero for the empty set. For the case Xi C X s C X~ the following general relationships follow immediately from the definitions. LEMMA 1. LEMMA 2. For all n, S~,(n) ~ Sie(n). For all n, S,lc(n) ~ Si,(Ssle(n)). For our applications we want X a, X 2, X a, X 4 to correspond to the class of finite automata, deterministic pushdown automata, unambiguous contextfree grammars, and general context-free grammars, respectively. The equivalence relation,-, will hold between two elements if and only if they describe the same language. For uniformity we define X 1 to be the class of Chomsky type 3 grammars obtained from finite automata (as in Hopcroft and Ullman, 1969, p. 34) and X 2 to be the union of X 1 and class of canonical grammars obtained from deterministic pushdown automata (e.g., Hopcroft and Ullman, 1969, p. 76). It is well known that for all conventional size measures, both classes of grammars are recursively related to their corresponding automata. We therefore have four classes of grammars (X 1 C X 2 C X a C )24) in which we can define a uniformly. From among a number of possible size measures for a grammar G (e.g., Gruska, 1972; Ginsburg and Lynch, 1975) we choose a(g) to be the total number of occurrences of terminal and nonterminal symbols in the productions of G. The following definition of a deterministic pushdown automaton (dpda) is a convenient standard form into which all the other customary formulations can be recursively translated. A dpda M is specified by a sextuple (Q, 1", Z, A, cs, F), where Q, F, 2: are finite sets of states {s,...}, stack symbols {d,...}, and input symbols {a,...}, and A, cs, and F are the transitions, starting configuration, and accepting modes, respectively, as defined below.

A SUCCINCTNESS RESULT 141 Typically we denote words from N* and Z* by oj and % respectively, and their lengths by ] oj I and [~ I. A configuration c is a pair (s, ~o) from O F*, and its height I c I is defined to be I co I- A mode, designated either a reading mode or an E-mode, is a pair from Q (F k3 {f2}), where E2 is a special empty stack symbol. A is a set of transitions, each of the form (s, A) ~ (~', ~), where rr ~ 27 t3 { } and ] w ] ~< 2, such that (i) if (s, A) is a reading mode then for each a ~27 it has a unique transition with ~ = a but none with 7r = e; and 7T ~e. (ii) if (s, _//) is an e-mode then it has just one transition, and in this This machine makes a move (~, ~A) ", (s', ~') if and only if there is some transition (s, A) -~+ (s', oj). If ~ ~ 27 then this symbol is considered to have been read. A derivation is a sequence of such moves through successive configurations and is said to be an s-derivation if ~ is the concatenation of the symbols read by the constituent moves, c is a stacking configuration of the derivation if all the configurations following it have height > ] c 1. It is a popping configuration of the derivation if all the configurations preceding it have height >lcl. A special configuration of height 1 is designated the starting configuration c s. The set F of accepting modes is a set of reading modes. A word ~ is accepted by M if there is an s-derivation from c s to some c' with mode (i.e., state, top stack symbol) belonging to F. For a dpda M we denote the cardinalities of Q and _P by q and t, respectively. 3. PROPERTIES OF PUSHDOWN J~UTOMATA We start with a lemma that illustrates a basic technical argument applicable to deterministic pushdown automata.

142 LESLIE G. VALIANT LEMMA 3. Suppose for some configurations Co, cl, and c 2 of dpda M such that! c o I, I c2 I < m, ] q ] > n, and n -- m > q2t there is y-derivation from c o to q and a 8-derivation from q to c 2. Then for some fi strictly shorter than 78 there is a fi derivation from c o to c 2. Proof. Since n--m>q2t two integers i, j (m~<i<j <~n) clearly exist such that the modes of the stacking configurations in the y-derivation are the same at heights i + 1 and j + 1, and the states of the popping configurations of heights i and j in the 8-derivation are also identical. By removing from y8 the substrings that induce the subderivations between these two pairs of configurations a shorter string fi with the desired properties is clearly obtained. I Using the above style of argument in various ways we can prove the following. LENtMA 4. There is a positive constant k such that if for some dpda M with q states and t stack symbols ~ is a shortest string such that both ~a and ~b are accepted, then qt >/(log [ ~ [)1~. Proof. Consider the derivation of M from c s that reads ~ and reaches a reading mode c. Let % be one of the configurations in this derivation of maximal height. Let %, cb be the configurations with reading modes in F reached when c~a, c~b, respectively, are read. If ~ has the assumed minimality property then clearly no configuration in the c~-derivation can repeat. Consequently [ ~1 ~< qtlc< if t > 1 and [~I ~q']%] if t = 1. To prove the lemma it remains to show that I% ] <~ 4q at which then implies that qt > (1). (log l c ~ ])1/3 for all q, t. We assume the contrary, that [% [ > 4qat and deduce that in all the following four cases, which are clearly exhaustive, some substrings of can be removed to give shorter strings still with the required property: (i) If [%] --I c] >q2t then Lemma 3 (with c o =e,, c 1-~cm, ce : c, y3 = c~) can be applied directly to find a/3 ([ fi[ < [~ t) such that fa and fib are still both accepted. (ii) If Ic[ -- max{l ca I, [cb 1} > qat (i.e., the readings of both ~a and ~b are followed by long e-derivations) then extending the argument of Lemma 3 to find a stack segment corresponding to state repetitions in the popping configurations of both e-derivations as well as mode repetitions in the stacking configurations leads to a contradiction similar to (i). (iii) If max{[c a[,/c~[}-min{ic al,lcb[} >q 2t a stack segment exists that corresponds to state repetitions in the popping configurations

A SUCCINCTNESS RESULT 143 of the longer e-derivation and mode repetitions in the stacking configurations. This yields the required contradiction. (iv) If min{[ c, I, I cb 1} > qt then a substring of ~ corresponding to mode repetitions in the stacking configurations can be removed to give the contradiction. 4. SUCCINCTNESS THEOREM To prove our main result we use the idea (due to Meyer and Fischer, 1971) of encoding large Turing machine computations in small grammars. THEOREM. The function S~z is not recursively bounded. Proof. Let T be a TM wkh IT I quadruples that on null input halts in Z steps. Let the function Nextr(x) be defined at x if and only if x is an instantaneous description (ID) of T, and to have value y if y is the ID following x in a computation of T. Let x 0 be the ID for the starting configuration with null tape and let the set {$, a, b} be disjoint from the alphabet describing the ID's. Let L' be the set of strings of the form $xosxl$'"$x~$, where x~ is a halting ID of T, such that for all k (0 ~< h ~< (n -- 1)/2) X21c+lR = Nextr(X2~ ). Let L" be the set of strings of the same form under the different restriction that for all k (1 <~ k <~ n/2) x.)1~ = Nextr(x~k_l). Now let L = L'a kj L"b. It is easily verified that L', L" are both recognized by dpda's (say M', M", respectively) of size polynomial in [ T [. It follows that L'a and L"b are also both so recognizable and therefore both generated by unambiguous grammars of similar size. Since these languages are disjoint their union L is also generated by an unambiguous grammar of size recursive in F T I. However, L itself is recognized by a dpda. The dpda uses its finite state control to deal with strings of length no more than Z 2 and for longer strings acts as follows. It reads the first Z 2 or so characters of the input c and determines to which of L'a or L"b c may still possibly belong. This initial segment cannot be a prefix of words in both languages for that would imply that a computation of T of more than Z steps on null input exists. If it does not prefix words in either language ~ is rejected. Otherwise the dpda simulates the rest of ~ on M' (or M" as appropriate) and accepts it if after reaching an accepting mode of M' (or M") an a (or b) follows.

144 LESLIE G. VALIANT It remains to observe that for any dpda recognizing L, qt > (log Zf ~. This is because the string corresponding to the correct computation sequence of T on null input satisfies the condition of Lemma 4. From the undecidability of the halting problem for TM's we know that Z is not recursively bounded by I T I, and hence neither is qt. The family of languages obtained from halting TM's in the manner of L above therefore ensures that $23 is not recursively bounded. Note that since the reversal of L is recognized by a dpda of size recursive in ] T 1, this example also shows that the relative sizes of dpda's required to recognize a language and its reversal are not recursively bounded. As a further application the example can also be used to deduce that it is undecidable whether an unambiguous grammar generates a deterministic language. 5. SUMMARY We have investigated within a general framework the succinctness relationships among four classes of context-free grammars. Our result is that S~3 is not recursively bounded. This follows an analogous result for $14 by Meyer and Fischer (1971). The recursiveness of $1~ was first proved by Stearns (1967) and that it is in fact double exponential follows from Meyer and Fischer (1971) and Valiant (1975). That $2~ is not recursively bounded is immediate from the result for $14 together with Lemma 2, or alternatively from that for S~3 together with Lemma 1. The natures of Sla and of Sa4 remain unknown, although the result for $14 and Lemma 2 together implies that at least one of them is not recursively bounded. ACKNOWLEDGMENT The author is grateful to Albert Meyer and Yoshihide Igarashi for their helpful comments on this paper. RECEIVED: September 25, 1975; REVISED: January 29, 1976 REFERENCES AHO, A.V., AND ULLMAN, J. D. (1972), "The Theory of Parsing, Translation and Compiling," Prentice-Hall, Englewood Cliffs, N. J. EAaLEY, J. (1970), An efficient context-free parsing algorithm, Comm. Assoc. Comput. Mach. 13, 94-104.

A SUCCINCTNESS RESULT 145 GINSBURG, S., AND LYNCH, N. (1975), Comparative complexity of grammar forms, in "7th ACM Symposium on Theory of Computing," Albuquerque, N. M., May 5-7, 1975. GRUSKA, J. (1972), On the size of context-free grammars, Kybernetica 8, 213-218. H~mSON, M. A., AND HAVEL, I. M. (1973), Strict deterministic grammars, J. Comput. System Sci. 7, 237-277. HoPcaOF% J. E., AND ULLMAN, J. D. (1969), "Formal Languages and Their Relation to Automata," Addison-Wesley, Reading, Mass. KNUTH, D. E. (1965), On the translation of languages from left to right, Inform. Contr. 8, 607-639. MEYER, A. R., AND FISCHEa, M. J. (1971), Economy of descriptions by automata, grammars and formal systems, in "12th Annual Symposium on Switching and Automata Theory," pp. 188-191. STEARNS, R. E. (1967), A regularity test for pushdown machines, Inform. Contr. 11, 323-340. VALIANT, L. G. (1975), Regularity and related problems for deterministic pushdown automata, J. Assoc. Comput. Mach. 22, 1-10.