Problem Set 6 (Optional) Due: 5pm on Thursday, December 20

Similar documents
Languages and Automata

Problem Set 1 Due: 11:59pm Wednesday, February 7

CS 4120 Introduction to Compilers

University of Nevada, Las Vegas Computer Science 456/656 Fall 2016

Review main idea syntax-directed evaluation and translation. Recall syntax-directed interpretation in recursive descent parsers

Recursively Enumerable Languages, Turing Machines, and Decidability

Class Information ANNOUCEMENTS

Chapter 3: Lexing and Parsing

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Problem Set 3 Due: 11:59 pm on Thursday, September 25

CSE 401 Midterm Exam 11/5/10

Part III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers.

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

Compiler Construction

Parser. Larissa von Witte. 11. Januar Institut für Softwaretechnik und Programmiersprachen. L. v. Witte 11. Januar /23

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Bottom-Up Parsing. Lecture 11-12

Lecture Notes on Bottom-Up LR Parsing

Midterm I - Solution CS164, Spring 2014

Programming Languages (CS 550) Lecture 4 Summary Scanner and Parser Generators. Jeremy R. Johnson

1. [5 points each] True or False. If the question is currently open, write O or Open.

CS 164 Handout 11. Midterm Examination. There are seven questions on the exam, each worth between 10 and 20 points.

Topic 3: Syntax Analysis I

CS 406/534 Compiler Construction Putting It All Together

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

Comp 411 Principles of Programming Languages Lecture 3 Parsing. Corky Cartwright January 11, 2019

Syntax Analysis, V Bottom-up Parsing & The Magic of Handles Comp 412

A lexical analyzer generator for Standard ML. Version 1.6.0, October 1994

Introduction to Lexing and Parsing

Context-Free Grammar. Concepts Introduced in Chapter 2. Parse Trees. Example Grammar and Derivation

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

LR Parsing LALR Parser Generators

Syntax Analysis Part I

Theory of Computations Spring 2016 Practice Final Exam Solutions

Compiler Design Aug 1996

Midterm Exam II CIS 341: Foundations of Computer Science II Spring 2006, day section Prof. Marvin K. Nakayama

Question Points Score

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

Lecture Notes on Bottom-Up LR Parsing

Building a Parser III. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

CSCI312 Principles of Programming Languages!

Syntax-Directed Translation. Lecture 14

PSD3A Principles of Compiler Design Unit : I-V. PSD3A- Principles of Compiler Design

Name: CS 341 Practice Final Exam. 1 a 20 b 20 c 20 d 20 e 20 f 20 g Total 207

Computer Sciences Department

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

Specialized Grammars. Overview. Linear Grammars. Linear Grammars and Normal Forms. Yet Another Way to Specify Regular Languages

BSCS Fall Mid Term Examination December 2012

Compiler phases. Non-tokens

CS606- compiler instruction Solved MCQS From Midterm Papers

Faculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Specifying Syntax COMP360

CMSC 330, Fall 2009, Practice Problem 3 Solutions

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2016

Compiler Construction

CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution

Regular Expressions. A Language for Specifying Languages. Thursday, September 20, 2007 Reading: Stoughton 3.1

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

Lecture 7: Deterministic Bottom-Up Parsing

Theory Bridge Exam Example Questions Version of June 6, 2008

15 212: Principles of Programming. Some Notes on Grammars and Parsing

Context-free grammars

Bottom-Up Parsing. Lecture 11-12

Problem Set 6 Due: 11:59 Sunday, April 29

Syntax Analysis, VII One more LR(1) example, plus some more stuff. Comp 412 COMP 412 FALL Chapter 3 in EaC2e. target code.

LR Parsing LALR Parser Generators

CS 536 Midterm Exam Spring 2013

Compiler Construction: Parsing

Lecture 8: Deterministic Bottom-Up Parsing

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

Torben./Egidius Mogensen. Introduction. to Compiler Design. ^ Springer

Fall Compiler Principles Lecture 4: Parsing part 3. Roman Manevich Ben-Gurion University of the Negev

About the Authors... iii Introduction... xvii. Chapter 1: System Software... 1

Syntax Analysis, III Comp 412

CSE3322 Programming Languages and Implementation

1. true / false By a compiler we mean a program that translates to code that will run natively on some machine.

Midterm I (Solutions) CS164, Spring 2002

Monday, September 13, Parsers

Formal Languages and Compilers Lecture VII Part 3: Syntactic A

Syntax Analysis, VI Examples from LR Parsing. Comp 412

Syntax. 2.1 Terminology

Wednesday, August 31, Parsers

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Top down vs. bottom up parsing

Conflicts in LR Parsing and More LR Parsing Types

(a) R=01[((10)*+111)*+0]*1 (b) ((01+10)*00)*. [8+8] 4. (a) Find the left most and right most derivations for the word abba in the grammar

Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1

Course Project 2 Regular Expressions

SYED AMMAL ENGINEERING COLLEGE (An ISO 9001:2008 Certified Institution) Dr. E.M. Abdullah Campus, Ramanathapuram

CS5371 Theory of Computation. Lecture 8: Automata Theory VI (PDA, PDA = CFG)

I have read and understand all of the instructions below, and I will obey the Academic Honor Code.

The Big Picture. Chapter 3

CSE3322 Programming Languages and Implementation

PS3 - Comments. Describe precisely the language accepted by this nondeterministic PDA.

ADTS, GRAMMARS, PARSING, TREE TRAVERSALS. Regrades 10/6/15. Prelim 1. Prelim 1. Expression trees. Pointers to material

Appendix Set Notation and Concepts

CSE 401 Midterm Exam Sample Solution 2/11/15

Transcription:

CS235 Languages and Automata Handout # 13 Prof. Lyn Turbak December 12, 2007 Wellesley College Problem Set 6 (Optional) Due: 5pm on Thursday, December 20 All problems on this problem are completely optional. Any points you score on submitted problems can only help your grade, but not doing any problems cannot hurt your grade. You can use these problems to study the material from the final segment of the course for the final exam. If you prefer to see the solutions to certain problems without submitting them, that s OK; just let me know, and I will put them in your ps6 drop folder (as soon as they re ready, that is.) Problems that are particularly worthwhile to study for the final are: Problem 2 (Closure Properties), Problem 4 (Decidable Languages), Problem 5 (Undecidable Languages), and Problem 6 (Parsing). In terms of studying for the exam, you should focus on the non-programming parts of Problem 6 and only do the programming parts later if/when you have time. Notes: There are actually more parts to the parsing problem (Problem 6) than I have written up, but they are less important in terms of exam studying than the parts I have written up. So I m posting the problem now rather than holding onto to it. I may flesh out the remaining parts later if I have time. Also in the if I have time category: I may post an additional optional problem on an ML-Yacc parser for the Kitty language along with some ML-Yacc notes. This is not material that is tested on the final exam. Reading: Sipser, Chapters 3, 4, 5.1, 5.3; Appel Chapter 3; Stoughton Chapter 5. Submission: Submit a hardcopy submission packet of all problems that you decide to submit. If you do any of the programming problems, you should also submit a softcopy (consisting of your final ps6 directory) to the drop directory ~cs235/drop/ps6/username, where username is your username. To do this, execute the following commands in Linux: cd /students/username/cs235 cp -R ps6 ~cs235/drop/ps6/username/ Problem 1 [45]: Turing Machines a [25] Using the graphical transition notation presented in Lecture #36, give the definition of a Turing Machine that decides the language {a n b n c n n Nat}. You need not show an explicit reject state and may assume that any transition not listed goes to the reject state. b [20] Give high-level English description of Turing machines that decide the following languages: L 1 = {w w {a, b, c, d} and w has an equal number of as and bs and w has an equal number of cs and ds} L 2 = {a n b 2n a n n Nat} L 3 = {w w {a} and w = n 2 for some n Nat.} 1

Problem 2 [10]: Closure Properties a [5] Show that Dec (the set of recursive = Turing-decidable languages) is closed under intersection. b [5] Show that RE (the set of recursively enumerable = Turing-accceptable languages) is closed under intersection. Problem 3 [15]: Diagonalization Recall the following languages from lecture: HALT T M = { M, w M halts on w} ACCEP T T M = { M, w M accepts w} In lecture, we used diagonalization to prove that HALT T M is undecidable, and then used reduction of HALT T M to ACCEP T T M to prove that ACCEP T T M is undecidable. In this problem, use diagonalization to directly prove that ACCEP T T M is undecidable. Problem 4 [75]: Decidable Languages Show that the following languages are decidable by describing high-level algorithms for deciding the languages. Recall that if X is a specification of a language (regular expression, finite automata, context-free grammar, etc.), then L(X) is the language described by that specification. a BALANCED P ALINDROME = {w w {a, b}, w = w R, and the number of as in w = the number of bs in w} b ACCEP T AB Reg = { Reg Reg is a regular expression and ab L(Reg)} c ACCEP T AB NP DA = { NP DA NP DA is a nondeterministic pushdown automaton and ab L(NP DA)} d EIT HER CF G = { CF G 1, CF G 2, w CF G 1 and CF G 2 are context-free grammars and w L(CF G 1 ) or w L(CF G 2 )} e SUBSET DF A = { DF A 1, DF A 2 DF A 1 and DF A 2 are deterministic finite automata and L(DF A 1 ) L(DF A 2 )} f INF INIT E DF A = { DF A DF A is a deterministic finite automaton and L(DF A) is infinite} g P ALINDROMIC DF A = { DF A DF A is a deterministic finite automaton and L(DF A) = (L(DF A)) R } h EMP T Y CF G = { CF G CF G is a context-free grammar L(CF G) = } i SOME BS CF G = { CF G CF G is a context-free grammar over {a, b}* and L(b*) L(CF G) } j ALL BS CF G = { CF G CF G is a context-free grammar over {a, b}* and L(b*) L(CF G)} k EMP T Y T AKES 235 T M = { M M is a Turing Machine and M takes more than 235 steps on input %} l SOME T AKE 235 T M = { M M is a Turing Machine and M takes more than 235 steps on some input} m ALL T AKE 235 T M = { M M is a Turing Machine and M takes more than 235 steps on all inputs} 2

Problem 5 [75]: Undecidable Languages a Use reducibility to show that the following languages are undecidable. Although many of these results are a consequence of Rice s theorem, you should not invoke Rice s theorem in any of the parts. In your reducibility arguments You may assume that the following languages discussed in lecture are undecidable: HALT T M = { M, w M is a Turing Machine and M halts on w} ACCEP T T M = { M, w M is a Turing Machine and w L(M)} EMP T Y T M = { M M is a Turing Machine and L(M) = } EQ T M = { M 1, M 2 M 1 and M 2 are Turing Machines and L(M 1 ) = L(M 2 )} i. CF L T M = { M M is a Turing Machine and L(M) is a context-free language} ii. DEC T M = { M M is a Turing Machine and L(M) is a recursive (Turing-decidable) language} iii. INCLUDES A T M = { M M is a Turing Machine over {a, b}* and a L(M)} iv. ALL T M = { M M is a Turing Machine over {a, b}* and L(M) = L((a+b)*)} v. INF INIT E T M = { M M is a Turing Machine and L(M) is infinite.} vi. SOME EV EN T M = { M M is a Turing Machine over {a, b}* and L(((a+b)(a+b))*) L(M) } vii. ALL EV EN T M = { M M is a Turing Machine over {a, b}* and L(((a+b)(a+b))*) L(M)} viii. SUBSET T M = { M 1, M 2 M 1 and M 2 are Turing Machines and and L(M 1 ) L(M 2 )} b For each of the following languages mentioned above, indicate whether it is semi-decidable+ (i.e., in RE Dec), semi-decidable- (i.e., in co-re Dec), or not-even-semidecidable (i.e., in Lan (RE co-re)). i. INCLUDES A T M ii. iii. iv. ALL T M INF INIT E T M SOME EV EN T M v. ALL EV EN T M vi. SUBSET T M 3

Problem 6 [65]: Parsing Types This problem involves parsing types in a type language that is simple subset of Standard ML types. Here is a context-free grammar for these types: T BASE(t) T list T * T T -> T (T) In this grammar, BASE(t) stands for a token that is one of the base types of the type system: unit, bool, int, real, char, string the type T list stands for a type whose values are lists with elements of type T. the type T 1 * T 2 stands for a type whose values are pairs whose first component has type T 1 and whose second component has type T 2. (Unlike SML, this simple language does not have a notion of tuple types with more than two components.) the type T 1 -> T 2 stands for a type whose values are functions with argument type T 1 and result type T 2. Although not explicit in the grammar, it is intended that: list has a higher precdence than * and * has a higher precedence than ->. For example, int * char list -> bool list is parsed as if it were written (int * (char list)) -> (bool list) * is left-associative and -> is right-associative. For example, int * real * char -> string -> bool is parsed as if it were written ((int * real) * char) -> (string -> bool) Explicit parentheses can be used to override the default precedences and associativities. E.g., int * real * char list -> string list -> bool list would normally be parsed as ((int * real) * (char list)) -> ((string list) -> (bool list)) but we could explicilty parenthesize it instead as ((int * ((((real * char) list) -> string) list)) -> bool) list We will call the above grammar the infix grammar for types because the binary operators for products (*) and arrows (->) are placed between the two operaands. 4

In this problem, we explore analyzing and constructing parsers for several variants of this type language. All files mentioned can be found in the ps6 directory. a [10]: Ambiguity The above grammar is ambiguous. Demonstrate this by constructing all the possible parse for the type int * char -> bool list. b [20]: A Prefix Syntax for Types One way to remove ambiguity is to change the syntax for types. In this part we consider a prefix syntax for our types, in which all type operators come before their operands: E BASE(t) list E * E E -> E E i. [5] Explain why the prefix grammar LL(1) i.e., it can be used for predictive parsing by a recursive descent parser using only 1 token of lookahead. ii. [15] In the file TypesParserPrefix.sml, flesh out the definition of the function eattyp : unit -> AST.typ so that it performs recursive descent parsing on types written in prefix syntax. Your function should consume tokens from the implicit token stream in an imperative way using the nexttoken() function, which removes the first token from the token stream and returns it. Tokens are elements of the token data type from the Token module, which is presented in Fig. 1. The scanner for these tokens and all other supporting code has already been written for you; all you need to write is the eattyp function. Your function should return a parse tree expressed using the AST.typ data type from the AST module (Fig. 2). For example, parsing the string "-> * int char list bool" should yield the AST Notes: Arrow (Prod (Base Int,Base Char),List (Base Bool)) Load all the parser files for this problem using use "load-types-parser.sml"; You indicate an error in SML by raising an exception. The simplest exception to raise is a failure exception, which can be accomplished as follows: raise Fail error-message-string-goes-here You can test your prefix parser with the functions illustrated in the following example: TypesParserPrefix.stringToTyp "-> * int char list bool"; val it = Arrow (Prod (Base Int,Base Char),List (Base Bool)) : AST.typ - TypesParserPrefix.stringToTypString "-> * int char list bool"; val it = "((int * char) -> (bool list))" : string If the full types are not displaying, don t forget to set Control.Print.printDepth to a large value. E.g. Control.Print.printDepth := 1000; 5

structure Token = struct end datatype basety = Unit Bool Int Real Char String datatype token = EOF BASE of basety PROD ARROW LIST LPAREN RPAREN fun eof() = EOF fun iseof(eof) = true iseof(_) = false fun basetytostring(unit) = "unit" basetytostring(bool) = "bool" basetytostring(int) = "int" basetytostring(real) = "real" basetytostring(char) = "char" basetytostring(string) = "string" fun tostring(eof) = "[EOF]" tostring(base(b)) = "[" ^ (basetytostring(b)) ^ "]" tostring(prod) = "*" tostring(arrow) = "[->]" tostring(list) = "[list]" tostring(lparen) = "[(]" tostring(rparen) = "[)]" Figure 1: The Token module for the type language. structure AST = struct end datatype typ = Base of Token.baseTy List of typ Prod of typ * typ Arrow of typ * typ (* Construct a fully-parenthesized infix type string *) fun typtostring (Base(b)) = Token.baseTyToString b typtostring (List(t)) = "(" ^ (typtostring t) ^ " list)" typtostring (Prod(t1,t2)) = "(" ^ (typtostring t1) ^ " * " ^ (typtostring t2) ^ ")" typtostring (Arrow(t1,t2)) = "(" ^ (typtostring t1) ^ " -> " ^ (typtostring t2) ^ ")" (* Construct a string for a list of types *) fun typlisttostring t = "[" ^ (typeltstostring t) ^ "]" and typeltstostring [] = "" typeltstostring [t] = typtostring t typeltstostring (t::ts) = (typtostring t) ^ "," ^ (typeltstostring ts) Figure 2: The AST module for the type language. 6

c [35]: A Postfix Syntax for Types Another way to remove ambiguity is to change the syntax for types to a prefix syntax, in which all type operators come after their operands: O BASE(t) O list O O * O O -> i. [5] Explain why this postfix grammar is not predictive i.e., it is not LL(k) for any k. ii. [15] The postfix grammar is LR(0) i.e., it can be implemented by a shift-reduce parser in which the decision to shift or reduce depends only on the parser state, not on any tokens of lookahead. Give an informal explanation of how a shift-reduce parser can process a stream of tokens for the postfix grammar. (You do not have to construct parser states or parser tables for this part!) In which situations should the parser shift tokens to the stack? In which situations should the parser reduce elements on the stack? As part of your explanation, explain the steps of parsing the following token stream: int char * bool list -> EOF iii. [15] In the file TypesParserPostfix.sml, flesh out the definition of the function processtoken : (Token.token * AST.typ list) -> AST.typ list so that it performs shift-reduce parsing on types written in postfix syntax. As in TypesParserPrefix, you should consume tokens an implicit token stream in an imperative fashion via the nexttoken() function. In this file, a stack is represented as a list of type ASTs, with the top of the stack at the front of the list. In a more general shift-reduce parser, it would be necessary for the stack to hold both ASTs and tokens, but this parser is so simple that it combines each shift with the subsequent reduce into one step and can get by with just having a stack of type ASTs. The processtoken function takes the first token from the token stream and the current stack of type ASTs and returns the stack of type ASTs that results from processing the token (i.e., effectively pushing it on the stack and then immediately performing the appropriate reduction). You can test your postfix parser with the functions illustrated in the following example: - TypesParserPostfix.stringToTyp "int char * bool list ->"; val it = Arrow (Prod (Base Int,Base Char),List (Base Bool)) : AST.typ - TypesParserPostfix.stringToTypString "int char * bool list ->"; val it = "((int * char) -> (bool list))" : string 7