CS143 Handout 20 Summer 2012 July 18th, 2012 Practice CS143 Midterm Exam


This midterm exam is open-book, open-note, open-computer, but closed-network. This means that if you want to have your laptop with you when you take the exam, that's perfectly fine, but you must not use a network connection. You should only use your computer to look at notes you've downloaded from online in advance.

SUNetID:
Last Name:
First Name:

I accept both the letter and the spirit of the honor code. I have not received any assistance on this test, nor will I give any.

(signed)

You have two hours to complete this midterm. There are sixty total points, which means that as a rough heuristic you might want to spend about two minutes per point. This midterm is worth 20% of your total grade in this course. Good luck!

    Question                         Points   Grader
    (1) C-Style Comments (8)           /8
    (2) LL Parsing (16)                /16
    (3) LR Parsing (20)                /20
    (4) Right-to-Left Parsing (16)     /16
    Total (60)                         /60

The actual midterm exam will have space in which you can write your answers. In order to save paper, I've taken most of the whitespace out of this handout.

Problem 1: C-Style Comments (8 points total)

Perhaps the hardest part of the first programming assignment was correctly handling C-style comments, which begin with /* and end with */. The challenge with matching these comments is, as you probably realized, that the obvious regular expression for C-style comments:

    ("/*").*("*/")

does not interact correctly with maximal-munch. In particular, if this regular expression is run using maximal-munch on the string

    /* Comment. */ "No Comment." /* Comment. */

it will match the entire string as a comment, even though the intended tokenization is a comment, then a string literal, then another comment. In this question, you'll consider two proposed solutions to the problem.

(i) flex with Custom NFAs

Suppose that an open-source contributor adds an extension to flex that allows you to describe lexemes to match using finite automata instead of regular expressions. The rules of maximal-munch still apply to these automata, so if an automaton can match a string in multiple ways, flex will pick the longest such match.

In the space below, construct an NFA that correctly accepts C-style comments, even when using maximal-munch. For simplicity, you can assume that the alphabet consists of /, *, and the character a (which represents something other than "slash" or "star"). Be sure to label the start state and any accepting states. You do not need to construct the automaton using the RE-to-NFA algorithm from class.
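The over-munching behavior described above is easy to reproduce with an ordinary regex engine, since a greedy .* behaves like maximal-munch on this input. A quick Python sketch, offered as a study aid rather than as part of the exam:

```python
import re

source = '/* Comment. */ "No Comment." /* Comment. */'

# Greedy .* mirrors maximal-munch: it consumes as much as possible, so
# the "comment" swallows the string literal and the second comment too.
greedy = re.match(r'/\*.*\*/', source)
print(greedy.group())    # matches the entire input as one comment

# A non-greedy .*? stops at the first */, yielding the intended first token.
lazy = re.match(r'/\*.*?\*/', source)
print(lazy.group())      # matches just: /* Comment. */
```

This is exactly why the question asks for an NFA (or a negated regular expression) rather than plain .*: classic regular expressions have no non-greedy operator, so the fix must be built into the automaton itself.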

(ii) Negated Regular Expressions

Another potential solution to the problem would be to support negated regular expressions. We augment the standard regular expression operators with a new operator, !, which matches anything except the given regular expression. For example, the regular expression a! matches any string except the single-character string a, while

    ("/*")((.*"*/".*)!)("*/")

would match C-style comments by matching the open comment, then any string that doesn't contain a close-comment symbol, and finally a close-comment symbol.

In order for this operator to be useful in flex, we need to come up with a construction that will convert regular expressions that use the ! operator into automata. Consider the following proposed construction for converting a negated regular expression R! to an automaton:

1. Construct the automaton for R using the algorithm covered in class.
2. Create a new accepting state.
3. For each state in R except for R's accepting state, add an ε-transition from that state to the new accepting state.
4. Make the old accepting state of R no longer accepting.

For example, here is a side-by-side comparison of the automata for the regular expressions a and a!, along with those for ab and (ab)!:

[Diagrams of the four automata appear here in the original handout.]

This construction contains an error that, for certain regular expressions using !, will cause the generated automaton to be incorrect. Find such a regular expression, show the resulting automaton, and explain why it is incorrect.

Problem 2: LL(1) Parsing (16 points total)

Consider the following context-free grammar, which describes C-style function declarations involving pointers:

    Function  → Type id ( Arguments )
    Type      → id
    Type      → Type *
    Arguments → ArgList
    Arguments → ε
    ArgList   → Type id , ArgList
    ArgList   → Type id

For example, this grammar could generate declarations like these:

    id* id()
    id** id(id*** id)
    id id(id id, id* id)

For reference, the terminals in this grammar are

    id  (  )  *  ,  $

where $ is the end-of-input marker, and the nonterminals are

    Function  Type  Arguments  ArgList

(i) The First Step is Admitting It (2 Points)

As written, this grammar is not LL(1). Identify the conflicts in the grammar that make it not LL(1) and explain each.

(ii) Repairing the Grammar

Based on the conflicts you identified above, modify the grammar so that it is LL(1). You may introduce new nonterminals if you wish, but do not change any rules besides those you identified in part (i) as causing the problem. Show the resulting grammar below.

(iii) FIRST sets (3 Points)

Construct the FIRST sets for each nonterminal in the modified grammar. Show your result below.

(iv) FOLLOW sets (3 Points)

Construct the FOLLOW sets for each nonterminal in the modified grammar. Show your result below.

(v) Building the Parse Table

Using your results from parts (ii), (iii), and (iv), construct the LL(1) parse table for your updated grammar. If you identify any LL(1) conflicts, don't worry; just put all applicable entries in the table. In other words, if you discover now that your grammar is not LL(1), we won't hold that against you for this part of the problem.
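As a study aid (not part of the exam), the standard fixpoint algorithms for FIRST and FOLLOW can be sketched in a few lines of Python and used to check answers. The sketch below runs on Problem 2's original, unrepaired grammar; the encoding is illustrative, and FIRST/FOLLOW are well defined for any CFG, even one that is not LL(1):

```python
# Fixpoint computation of FIRST and FOLLOW for a CFG, encoded as
# (head, body) pairs; an empty body encodes an epsilon production.
EPS = 'eps'
grammar = [
    ('Function',  ['Type', 'id', '(', 'Arguments', ')']),
    ('Type',      ['id']),
    ('Type',      ['Type', '*']),
    ('Arguments', ['ArgList']),
    ('Arguments', []),                             # Arguments -> epsilon
    ('ArgList',   ['Type', 'id', ',', 'ArgList']),
    ('ArgList',   ['Type', 'id']),
]
nonterminals = {head for head, _ in grammar}

def first_sets(prods):
    """FIRST(A) for every nonterminal A, by iterating to a fixpoint."""
    first = {nt: set() for nt in nonterminals}
    changed = True
    while changed:
        changed = False
        for head, body in prods:
            nullable = True                        # body so far derives epsilon?
            for sym in body:
                add = {sym} if sym not in nonterminals else first[sym] - {EPS}
                if not add <= first[head]:
                    first[head] |= add
                    changed = True
                if sym not in nonterminals or EPS not in first[sym]:
                    nullable = False
                    break
            if nullable and EPS not in first[head]:
                first[head].add(EPS)
                changed = True
    return first

def follow_sets(prods, first, start='Function'):
    """FOLLOW(A) for every nonterminal A; the start symbol is followed by $."""
    follow = {nt: set() for nt in nonterminals}
    follow[start].add('$')
    changed = True
    while changed:
        changed = False
        for head, body in prods:
            for i, sym in enumerate(body):
                if sym not in nonterminals:
                    continue
                tail_nullable = True               # everything after sym nullable?
                for nxt in body[i + 1:]:
                    add = {nxt} if nxt not in nonterminals else first[nxt] - {EPS}
                    if not add <= follow[sym]:
                        follow[sym] |= add
                        changed = True
                    if nxt not in nonterminals or EPS not in first[nxt]:
                        tail_nullable = False
                        break
                if tail_nullable and not follow[head] <= follow[sym]:
                    follow[sym] |= follow[head]
                    changed = True
    return follow

first = first_sets(grammar)
follow = follow_sets(grammar, first)
# FIRST(Type) = {id};  FOLLOW(Type) = {id, *};  FOLLOW(Arguments) = {)}
```

Swapping in the repaired grammar from part (ii) and rerunning the same two functions produces the sets needed for the LL(1) table in part (v).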

Problem 3: LR Parsing (20 points total)

In this problem, we will explore how several different LR parsing algorithms handle, or fail to handle, a particular grammar:

    S → X
    X → DE
    D → PD
    D → ε
    E → QE
    E → ε
    P → id id
    Q → id = id

Below is an LR(0) automaton for this language. We've numbered the states for your convenience.

[The LR(0) automaton diagram, with states numbered 1 through 13, appears here in the original handout; the item sets did not survive transcription.]

The terminals in this grammar are

    id  =  $

where $ is the end-of-input symbol. Feel free to tear out this sheet as a reference.

(i) LR(0) (2 Points)

This grammar is not LR(0). Why not?

(ii) SLR(1)

Is this grammar SLR(1)? If so, why? If not, why not?

(iii) LALR(1), Part One

In this next section, you'll determine the lookaheads for each reduce item to see if the grammar is LALR(1) by using the LALR(1)-by-SLR(1) algorithm. As a first step, create the augmented LALR(1) grammar for this language using the algorithm covered in lecture. Show your result.

(iv) LALR(1), Part Two (3 Points)

Compute the FOLLOW sets for each nonterminal in the LALR(1)-augmented grammar. Show your result below.

(v) LALR(1), Part Three

Is this grammar LALR(1)? If so, use your answers from parts (iii) and (iv) to prove why. If not, use your answers from parts (iii) and (iv) to prove why not. (Even if you already know for a fact whether this grammar is LALR(1) or not, you must use the results from the last two parts to back up your claim.)

(vi) LR(1) (3 Points)

Is this grammar LR(1)? If so, why? If not, why not?

Problem 4: Right-to-Left Parsing (16 points total)

In lecture, we talked about seven parsing algorithms that work from left to right: Leftmost DFS, LL(1), LR(0), SLR(1), LALR(1), LR(1), and Earley. In the first written assignment, you had the opportunity to see what happens when you change the left-to-right scanning algorithm we covered in class to work from right to left. In this question, you'll get to see what happens when you change the left-to-right parsing algorithms we covered in class to work from right to left.

(i) RR(1) Parsing

An RR(1) parser is similar to an LL(1) parser except that it works from right to left, tracing out a top-down, rightmost derivation. The parser continuously consults the rightmost symbol of the derivation and the rightmost symbol of the input string that has not yet been matched to make its decisions about what production to use. The construction and operation of this parser is analogous to the LL(1) case, except that instead of having FIRST and FOLLOW sets we have LAST and PRECEDE sets, where LAST(A) is the set of terminals that could come at the end of a production of A (plus ε if A can derive the empty string), and PRECEDE(A) is the set of the terminals that could come before a production of A in some sentential form. Similarly, instead of having an end-of-input marker, we have a beginning-of-input marker, which we denote $.

Is there a grammar that is RR(1) but not LL(1)? If so, show what it is and explain both why it is not LL(1) and why it is RR(1). If not, explain why not.

(ii) RL(0) Parsing

An RL(0) parser is similar to an LR(0) parser except that it scans the input from right to left. We can speak of RL(0) items as productions with a dot marking the position of what has been matched so far from the right side instead of the left. For example, if we have an RL(0) item of the form A → x • y, it means that we have matched y so far and are looking forward to matching x. Similarly, whereas an LR(0) reduce item has the form A → v •, with the dot at the end, an RL(0) reduce item has the form A → • v, with a dot at the beginning, since it means we've matched all of v in reverse order.

Is there a grammar that is RL(0) but not LR(0)? If so, show what it is and explain both why it is not LR(0) and why it is RL(0). If not, explain why not.
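The LAST sets described in part (i) are the mirror image of FIRST sets: the last terminal of a string derived from A is the first terminal of that string read backwards, so running the ordinary FIRST fixpoint over the grammar with every production body reversed yields LAST. A small Python sketch of this observation, using an illustrative toy grammar that is not one of the exam's grammars:

```python
# LAST(A) via FIRST on the reversed grammar. The toy grammar below
# (S -> Ab | c, A -> aS | epsilon) is illustrative only.
EPS = 'eps'
grammar = [
    ('S', ['A', 'b']),
    ('S', ['c']),
    ('A', ['a', 'S']),
    ('A', []),                                   # A -> epsilon
]
nonterminals = {head for head, _ in grammar}
reversed_grammar = [(head, body[::-1]) for head, body in grammar]

def first_sets(prods):
    """Standard FIRST fixpoint; on reversed_grammar it computes LAST."""
    first = {nt: set() for nt in nonterminals}
    changed = True
    while changed:
        changed = False
        for head, body in prods:
            nullable = True
            for sym in body:
                add = {sym} if sym not in nonterminals else first[sym] - {EPS}
                if not add <= first[head]:
                    first[head] |= add
                    changed = True
                if sym not in nonterminals or EPS not in first[sym]:
                    nullable = False
                    break
            if nullable and EPS not in first[head]:
                first[head].add(EPS)
                changed = True
    return first

last = first_sets(reversed_grammar)
# LAST(S) = {b, c};  LAST(A) = {b, c, eps} since A can derive epsilon
```

PRECEDE sets could be obtained the same way: run the usual FOLLOW computation over the reversed grammar, seeding the start symbol with the beginning-of-input marker instead of the end-of-input marker.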

(iii) Reverse Earley Parsing

A reverse Earley parser is like a standard Earley parser, except that the SCAN step moves from right to left, the PREDICT step makes predictions based on nonterminals appearing before the dot, and the COMPLETE step looks at items with the dot at the beginning rather than the end. Instead of having each item track its start position, items track their end positions. Also, rather than putting the item S → • E @1 into the first item set at the start of the algorithm, we place the analogous item S → E • @(n+1) into the (n+1)st item set at the start. The parser reports that the string has been parsed successfully if it finds S → • E @(n+1) in the first item set at the end of parsing.

Is there a grammar that can be parsed with a reverse Earley parser but not an Earley parser? If so, show what it is and explain why it could be parsed with the reverse Earley parser but not the Earley parser. If not, explain why not.

(iv) Comparison with Right-to-Left Scanning (4 points)

In the first problem set, you saw that certain sets of regular expressions would tokenize particular strings differently based on whether the scan was done from left to right or from right to left. In this question, you will consider the analogous question for right-to-left parsers. Is there a grammar that has all three of the following properties?

1. The grammar is LR(0).
2. The grammar is RL(0).
3. The parse tree produced by the LR(0) parser is not the same as the parse tree produced by the RL(0) parser.

If so, give an example of one and exhibit the parse trees produced by the LR(0) and RL(0) parsers. If not, explain why not.