CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution


Exam Facts

Format
Wednesday, July 20th, from 11:00 a.m. to 1:00 p.m. in Gates B01. The exam is designed to take roughly 90 minutes, though you will have two hours to complete it. The exam is open notes, open book, open computer, but closed network, meaning that you can download any of the handouts and slides that you'd like in advance, but must not use the Internet, IM, email, etc. during the exam.

Material
The midterm will cover material from the lectures beginning at scanning and ending at Earley parsing. This means you should fully understand the lexical and syntax analysis phases of traditional compilation. Material similar to the homework will be emphasized, although you are responsible for all topics presented in lecture and in the handouts. Make sure that you have an intuition for the concepts as well as an understanding, since some of the questions will ask you to think critically about the different scanning and parsing algorithms. Possible topics for the midterm include:

- Overview of a compiler: terminology, general task breakdown
- Regular expressions: writing and understanding regular expressions, NFAs, and DFAs; conversion from regular expressions to NFAs; and conversion from NFAs to DFAs
- Lexical analysis: scanner generators, flex
- Grammars: parse trees, derivations, ambiguity, writing simple grammars
- Top-down parsing: LL(1) grammars, grammar conditioning (removing ambiguity, left recursion, left factoring), FIRST and FOLLOW set computation, table-driven predictive parsing
- Shift-reduce parsing: building LR(0) and LR(1) configurating sets; parse tables; tracing parser operation; shift/reduce and reduce/reduce conflicts; differences between LR, SLR, and LALR; and construction of SLR and LALR lookaheads

- Earley parsing: scanning, completion, and prediction; recognition versus parsing; parse forest grammars; the Earley-on-DFA algorithm
- Comparisons between parsing techniques: advantages and disadvantages in grammar restrictions, space/time efficiency, error handling, etc.

The rest of this handout is a modified and extended version of a midterm that was given a few years ago. Solutions to all problems are given at the end of this handout, but we encourage you not to look at them until you have worked through the problems yourself. Understand that we are in no way obligated to mimic the problem format of that exam; everything up to and including Earley parsing is fair game. The actual midterm will not have as many questions as are covered here. This is mostly designed to give you a sample of what we might ask you.

Problem 1: Finite Automata and Regular Grammars
Draw a deterministic finite automaton that accepts the set of all strings over (a + b)* that contain either aba or bb (or both) as substrings. Then present a context-free grammar that generates the same exact language.

Problem 2: LL(1) Parsing
Consider the following grammar for simple LISP expressions:

    List → ( Sequence )
    Sequence → Sequence Cell
    Sequence → ε
    Cell → List
    Cell → Atom
    Atom → a

where a, (, and ) are terminal symbols, all other symbols are nonterminal symbols, and List is the start symbol.

a) Why is the grammar as given not LL(1)?
b) Transform the grammar into a form that is LL(1). Do not introduce any new nonterminals.
c) Compute the FIRST and FOLLOW sets for all nonterminals in your LL(1) grammar.
d) Construct an LL(1) parse table for your resulting grammar.

Problem 3: LR(1) Configurating Sets
Given the following already-augmented grammar:

    S' → S
    S → aAB | ecS | aBb
    A → aA | a
    B → cd | Be | ε

Draw a goto graph of exactly those states of an LR(1) parser that are pushed on the parse stack during a parse of the input aaaaee (this will be a subset of all configurating sets). If you accidentally construct irrelevant states, cross them out. Do not add or change any productions in the grammar.

Problem 4: bison and Ambiguity
The following input file is a bison specification for strings of balanced parentheses. The scanner associated with it just returns each character read as an individual token.

    %%
    S : S S
      | '(' S ')'
      | '(' ')'
      ;

a) Show that the grammar as given is ambiguous by providing two different parse trees for the smallest string that illustrates the ambiguity.
b) The ambiguity creates a conflict in building the bison parser. Identify at what state in the parse the conflict is encountered, on which input tokens, and what kind of conflict it is.
c) The conflict can be resolved by choosing one of the two conflicting actions. Which of the two choices will create a more efficient parser? Briefly explain why.
d) Which of the two choices will be chosen by bison's default conflict resolution rules?
e) Add the necessary bison precedence directives to use the alternate choice to resolve the conflict.

Problem 5: Compare and Contrast
Consider the bottom-up parsing algorithms SLR(1) and LALR(1). Briefly note what distinguishes these techniques in terms of constructing the parser and their behavior on erroneous input. (Do not describe what is the same, only what is different.)

Problem 6: Earley Parsing
(i) Trace the execution of the Earley recognizer on the string baab for the grammar

    S → E
    E → bEb | aEa | ε

Show the resulting item sets. Does the recognizer accept this string?

(ii) A prefix of a string T is a (possibly empty) substring of T that starts at the first character. For example, graph is a prefix of graphical, but hica is not. The empty string is a prefix of all strings. Using your result from part (i), determine all prefixes of baab that are in the language of the grammar. Justify your result.

(iii) Suppose that we wanted to consider the programming language JavaLite, which is like Java but without the keywords double or float. To do this, we will use the Earley-on-DFA algorithm to filter the standard Java grammar through a finite automaton that accepts only strings not containing these tokens. Assuming that T is the set of legal tokens in Java (this includes tokens like identifiers, keywords, and specifically the keywords double and float), devise a DFA that we could feed into the Earley-on-DFA algorithm to disallow any program using the keywords double or float. Justify your answer.

Solution 1: Finite Automata and Regular Grammars
Draw a deterministic finite automaton that accepts the set of all strings that contain either aba or bb (or both) as substrings. Then present a context-free grammar that generates the same exact language.

Here's the DFA, given as a transition table. Each state is named for the relevant suffix seen so far; F is the accepting state:

    state    on a    on b
    start    a       b
    a        a       ab
    b        a       F
    ab       F       F
    F        F       F

Note that the ab state moves to F on either symbol: an a completes aba, and a b completes abb, which contains bb. My automaton accepts the language, though it's better phrased as the language of strings that begin with bb together with all strings containing either aba or abb as substrings.

The most straightforward context-free grammar is

    S → R aba R
    S → R bb R
    R → bR
    R → aR
    R → ε

S sets the constraint that either aba or bb must sit between two arbitrary strings of length zero or more. If I were truly evil, I would have constrained your grammar to be right-regular, in which case you would have needed to emulate your DFA so that a leftmost derivation corresponds to some path through the automaton. Here's the right-regular grammar that maps to the machine above:

    S → aA
    S → bB
    A → aA
    A → baF
    A → bbF
    B → aA
    B → bF
    F → aF
    F → bF
    F → ε
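The DFA can be sanity-checked with a short simulation. Here is a sketch in Python; the state names and table encoding are my own, not part of the handout:

```python
# Transition table for the DFA accepting strings over {a, b} that
# contain aba or bb as a substring. States are named by the relevant
# suffix read so far; "F" is the single accepting (trap) state.
DELTA = {
    ("start", "a"): "a",  ("start", "b"): "b",
    ("a", "a"): "a",      ("a", "b"): "ab",
    ("b", "a"): "a",      ("b", "b"): "F",   # bb just completed
    ("ab", "a"): "F",     ("ab", "b"): "F",  # aba, or abb (hence bb)
    ("F", "a"): "F",      ("F", "b"): "F",
}

def accepts(s: str) -> bool:
    """Run the DFA over s and report whether it ends in the accept state."""
    state = "start"
    for ch in s:
        state = DELTA[(state, ch)]
    return state == "F"
```

Feeding it a few strings (aba, bb, bab, ...) confirms that acceptance coincides exactly with containing aba or bb as a substring.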

Solution 2: LL(1) Parsing
Consider the following grammar for simple LISP expressions:

    List → ( Sequence )
    Sequence → Sequence Cell
    Sequence → ε
    Cell → List
    Cell → Atom
    Atom → a

where a, (, and ) are terminal symbols, all other symbols are nonterminal symbols, and List is the start symbol.

a) Why is the grammar as given not LL(1)?

The grammar as specified is left-recursive, and that interferes with a predictive parser's ability to do its job given just one lookahead token.

b) Transform the grammar into a form that is LL(1). Do not introduce any new nonterminals.

Left recursion: bad. Right recursion: not bad. Had the cells been comma-delimited, things would have been more complicated.

    List → ( Sequence )
    Sequence → Cell Sequence
    Sequence → ε
    Cell → List
    Cell → Atom
    Atom → a

c) Compute the FIRST and FOLLOW sets for all nonterminals in your LL(1) grammar.

    FIRST(List) = { ( }
    FIRST(Sequence) = { a, (, ε }
    FIRST(Cell) = { a, ( }
    FIRST(Atom) = { a }

    FOLLOW(List) = { a, (, ), $ }
    FOLLOW(Sequence) = { ) }
    FOLLOW(Cell) = { a, (, ) }
    FOLLOW(Atom) = { a, (, ) }
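These FIRST and FOLLOW sets can be double-checked mechanically with the standard fixpoint computation. A sketch in Python follows; the grammar encoding, the EPS marker, and the function names are my own illustration, not from the handout:

```python
# Transformed LL(1) grammar: nonterminal -> list of right-hand sides.
# The empty list encodes the ε-production; "$" marks end of input.
EPS = "ε"
GRAMMAR = {
    "List": [["(", "Sequence", ")"]],
    "Sequence": [["Cell", "Sequence"], []],
    "Cell": [["List"], ["Atom"]],
    "Atom": [["a"]],
}

def first_of_seq(seq, first):
    """FIRST of a symbol sequence, given FIRST sets for the nonterminals."""
    out = set()
    for sym in seq:
        if sym in GRAMMAR:
            out |= first[sym] - {EPS}
            if EPS not in first[sym]:
                return out
        else:                      # terminal: it begins every derivation
            out.add(sym)
            return out
    out.add(EPS)                   # every symbol in seq was nullable
    return out

def first_sets():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:                 # iterate to a fixpoint
        changed = False
        for nt, alts in GRAMMAR.items():
            for rhs in alts:
                f = first_of_seq(rhs, first)
                if not f <= first[nt]:
                    first[nt] |= f
                    changed = True
    return first

def follow_sets(first, start="List"):
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for nt, alts in GRAMMAR.items():
            for rhs in alts:
                for i, sym in enumerate(rhs):
                    if sym not in GRAMMAR:
                        continue
                    tail = first_of_seq(rhs[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:        # nullable tail: inherit FOLLOW(nt)
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow
```

Running `follow_sets(first_sets())` reproduces exactly the sets listed above.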

d) Construct an LL(1) parse table for your resulting grammar.

If we number the productions as

    (1) List → ( Sequence )
    (2) Sequence → Cell Sequence
    (3) Sequence → ε
    (4) Cell → List
    (5) Cell → Atom
    (6) Atom → a

then the table looks as follows (blank entries are errors):

                 (      )      a      $
    List         1
    Sequence     2      3      2
    Cell         4             5
    Atom                       6
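The numbered productions and their table entries can be exercised by a small table-driven predictive parser. This is a sketch of the standard stack-based LL(1) driver; the encoding and function names are illustrative, not from the handout:

```python
# Table-driven LL(1) parser for the transformed LISP grammar.
# PRODUCTIONS follows the numbering above; TABLE maps
# (nonterminal, lookahead) -> production number.
PRODUCTIONS = {
    1: ("List", ["(", "Sequence", ")"]),
    2: ("Sequence", ["Cell", "Sequence"]),
    3: ("Sequence", []),
    4: ("Cell", ["List"]),
    5: ("Cell", ["Atom"]),
    6: ("Atom", ["a"]),
}
TABLE = {
    ("List", "("): 1,
    ("Sequence", "("): 2, ("Sequence", ")"): 3, ("Sequence", "a"): 2,
    ("Cell", "("): 4, ("Cell", "a"): 5,
    ("Atom", "a"): 6,
}
NONTERMINALS = {"List", "Sequence", "Cell", "Atom"}

def parse(tokens: str) -> bool:
    """Return True iff tokens (e.g. '(aa)') is a valid List."""
    stack = ["$", "List"]              # parse stack, start symbol on top
    toks = list(tokens) + ["$"]
    i = 0
    while stack:
        top = stack.pop()
        if top in NONTERMINALS:
            prod = TABLE.get((top, toks[i]))
            if prod is None:
                return False           # blank table entry: syntax error
            stack.extend(reversed(PRODUCTIONS[prod][1]))
        else:
            if top != toks[i]:
                return False           # terminal mismatch
            i += 1
    return i == len(toks)
```

For example, parse("(aa)") and parse("((a)a)") succeed, while parse("(a") and parse("a") fail (a bare atom is not a List).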

Solution 3: LR(1) Configurating Sets
Given the augmented grammar

    S' → S
    S → aAB | ecS | aBb
    A → aA | a
    B → cd | Be | ε

the parse of aaaaee pushes the states below. Each LR(1) item is written as "production with dot, lookaheads":

State 0 (start state):
    S' → •S, $
    S → •aAB, $
    S → •ecS, $
    S → •aBb, $

State 1 = goto(0, a):
    S → a•AB, $
    S → a•Bb, $
    A → •aA, c/e/$
    A → •a, c/e/$
    B → •cd, b/e
    B → •Be, b/e
    B → •, b/e

State 2 = goto(1, a) = goto(2, a):
    A → a•A, c/e/$
    A → a•, c/e/$
    A → •aA, c/e/$
    A → •a, c/e/$

State 3 = goto(2, A):
    A → aA•, c/e/$

State 4 = goto(1, A):
    S → aA•B, $
    B → •cd, e/$
    B → •Be, e/$
    B → •, e/$

State 5 = goto(4, B):
    S → aAB•, $
    B → B•e, e/$

State 6 = goto(5, e):
    B → Be•, e/$

State 7 = goto(0, S):
    S' → S•, $

The parse shifts through states 0, 1, 2, 2, 2 on the four a's; reduces A → a and then A → aA twice (bouncing through state 3 and back into state 2) until goto on A from state 1 reaches state 4; reduces B → ε to reach state 5; then for each e shifts to state 6 and reduces B → Be back to state 5. Finally it reduces S → aAB and accepts in state 7.

Solution 4: bison and Ambiguity
The following input file is a bison specification for strings of balanced parentheses. The scanner associated with it just returns each character read as an individual token.

    %%
    S : S S
      | '(' S ')'
      | '(' ')'
      ;

a) Show that the grammar as given is ambiguous by providing two different parse trees for the smallest string that illustrates the ambiguity.

The string ()()() has two different parse trees. Both have S → S S at the root, but in one tree the root's left child derives ()() (via another S → S S) and its right child derives (), while in the other the left child derives () and the right child derives ()().

b) The ambiguity creates a conflict in building the bison parser. Identify at what state in the parse the conflict is encountered, on which input tokens, and what kind of conflict it is.

You can almost bet up front that it'll be a shift/reduce conflict, since reduce/reduce conflicts are rare. But we need to be a little more scientific than that, so we need to figure out which configurating set causes the problem. At some point during the parse, two or more S's will be on top of the stack, and '(' will be the lookahead token. We could shift the '(' and work toward what's supposed to be another S, or we could reduce S → S S, popping the top two states off the stack and replacing them with a new one. That's a shift/reduce conflict.
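The claim in part (a) can be verified by counting parse trees directly. This short Python sketch (my own illustration, not part of the handout) counts the distinct parse trees the grammar assigns to a string of parentheses:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_trees(s: str) -> int:
    """Number of parse trees deriving s from S, for the grammar
    S -> S S | '(' S ')' | '(' ')'."""
    n = 0
    if s == "()":                      # S -> '(' ')'
        n += 1
    if len(s) >= 4 and s[0] == "(" and s[-1] == ")":
        n += count_trees(s[1:-1])      # S -> '(' S ')'
    for k in range(2, len(s) - 1, 2):  # S -> S S, at every even split
        n += count_trees(s[:k]) * count_trees(s[k:])
    return n
```

It reports one tree each for ()(), and (()), but two trees for ()()(), confirming that ()()() is the smallest ambiguous string.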

c) The conflict can be resolved by choosing one of the two conflicting actions. Which of the two choices will create a more efficient parser? Briefly explain why.

The more efficient option would be to reduce, since it would keep the stack depth to a minimum.

d) Which of the two choices will be chosen by bison's default conflict resolution rules?

bison always chooses the shift by default, and it tells you so.

e) Add the necessary bison precedence directives to use the alternate choice to resolve the conflict.

To force bison to reduce instead, we need to set the precedence of the rule being reduced to be higher than that of the token being shifted. Add these precedence directives:

    %nonassoc '('
    %nonassoc Higher

    S : S S %prec Higher

You can also use a single precedence directive, as long as it is left-associative:

    %left '('

    S : S S %prec '('

Solution 5: Compare and Contrast
Consider the bottom-up parsing algorithms SLR(1) and LALR(1). Briefly note what distinguishes these techniques in terms of constructing the parser and their behavior on erroneous input. (Do not describe what is the same, only what is different.)

Table construction: An SLR(1) parser uses only LR(0) configurations, which carry no context and don't require computing and propagating lookaheads; reduce actions are entered in the table for every terminal in the follow set. An LALR(1) parser uses LR(1) configurating sets, merging states whenever it can, to arrive at a goto graph that is isomorphic to that of the SLR(1) parser. Its table, however, only includes a reduce action when the lookahead token is in the item's lookahead set (as opposed to the full follow set), so the LALR(1) parsing table is more sparsely populated than the corresponding SLR(1) table.

Erroneous input: An LALR(1) parser never shifts an erroneous token and never reduces a production in error. Any error is detected immediately at first sight of the erroneous token, and no further actions are taken; because it has more precise context, an LALR(1) parser may be able to report the error a little more accurately. An SLR(1) parser also never shifts an erroneous token, but it may make fruitless reductions first: if the next input token is a member of the follow set but not of the particular lookahead set, the SLR(1) parser will reduce anyway.

Solution 6: Earley Parsing
(i) Trace the execution of the Earley recognizer on the string baab for the grammar

    S → E
    E → bEb | aEa | ε

Show the resulting item sets. Does the recognizer accept this string?

The item sets are as follows (• marks the dot; @n records the item set in which the item started):

Item Set 1 (before any input):
    S → •E, @1
    E → •bEb, @1
    E → •aEa, @1
    E → •, @1
    S → E•, @1

Item Set 2 (after b):
    E → b•Eb, @1
    E → •bEb, @2
    E → •aEa, @2
    E → •, @2
    E → bE•b, @1

Item Set 3 (after ba):
    E → a•Ea, @2
    E → •aEa, @3
    E → •bEb, @3
    E → •, @3
    E → aE•a, @2

Item Set 4 (after baa):
    E → a•Ea, @3
    E → aEa•, @2
    E → bE•b, @1
    E → •aEa, @4
    E → •bEb, @4
    E → •, @4
    E → aE•a, @3

Item Set 5 (after baab):
    E → bEb•, @1
    E → b•Eb, @4
    S → E•, @1
    E → •aEa, @5
    E → •bEb, @5
    E → •, @5
    E → bE•b, @4

Because the last item set contains the item S → E•, @1, the recognizer accepts the string.

(ii) A prefix of a string T is a (possibly empty) substring of T that starts at the first character. For example, graph is a prefix of graphical, but hica is not. The empty string is a prefix of all strings. Using your result from part (i), determine all prefixes of baab that are in the language of the grammar. Justify your result.

Any item set containing S → E•, @1 represents a prefix of the string that is in the language. Since this item appears only in the very first and very last item sets, the only prefixes of baab that are accepted are the empty string and the complete string baab.
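The trace above can be reproduced with a compact Earley recognizer. Here is a sketch in Python; the item encoding and function names are my own, and chart[i] corresponds to the handout's Item Set i+1 (the chart is 0-indexed by tokens consumed):

```python
# A minimal Earley recognizer for S -> E, E -> bEb | aEa | ε.
# An item is (lhs, rhs, dot, origin); chart[i] holds the items valid
# after reading i tokens.
GRAMMAR = {"S": [("E",)], "E": [("b", "E", "b"), ("a", "E", "a"), ()]}

def chart_for(tokens):
    n = len(tokens)
    chart = [set() for _ in range(n + 1)]
    chart[0].add(("S", ("E",), 0, 0))
    for i in range(n + 1):
        changed = True
        while changed:                   # predict/complete to a fixpoint
            changed = False
            for lhs, rhs, dot, org in list(chart[i]):
                if dot < len(rhs) and rhs[dot] in GRAMMAR:   # predict
                    for alt in GRAMMAR[rhs[dot]]:
                        item = (rhs[dot], alt, 0, i)
                        if item not in chart[i]:
                            chart[i].add(item)
                            changed = True
                elif dot == len(rhs):                        # complete
                    for l2, r2, d2, o2 in list(chart[org]):
                        if d2 < len(r2) and r2[d2] == lhs:
                            item = (l2, r2, d2 + 1, o2)
                            if item not in chart[i]:
                                chart[i].add(item)
                                changed = True
        if i < n:                                            # scan
            for lhs, rhs, dot, org in list(chart[i]):
                if dot < len(rhs) and rhs[dot] == tokens[i]:
                    chart[i + 1].add((lhs, rhs, dot + 1, org))
    return chart

def accepts(tokens: str) -> bool:
    """True iff the completed item S -> E•, started at 0, spans the input."""
    return ("S", ("E",), 1, 0) in chart_for(tokens)[len(tokens)]
```

Checking which chart entries contain the completed start item also answers part (ii): for baab, only positions 0 and 4 do, i.e. only the empty prefix and the whole string are in the language.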

(iii) Suppose that we wanted to consider the programming language JavaLite, which is like Java but without the keywords double or float. To do this, we will use the Earley-on-DFA algorithm to filter the standard Java grammar through a finite automaton that accepts only strings not containing these tokens. Assuming that T is the set of legal tokens in Java (this includes tokens like identifiers, keywords, and specifically the keywords double and float), devise a DFA that we could feed into the Earley-on-DFA algorithm to disallow any program using the keywords double or float. Justify your answer.

One such DFA has a single state, which is both the start state and an accepting state, with a self-loop on every token in T except double and float, and no transitions on double or float. The automaton dies if it sees double or float, so any string containing those tokens will be rejected. The only remaining state is accepting, so all other strings are accepted.
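The one-state DFA can be simulated directly over a token stream. A trivial sketch (the token spellings below merely stand in for members of Java's token set T):

```python
# One-state DFA for JavaLite: the single state is accepting and loops
# on every token except "double"/"float"; on those two keywords there
# is no outgoing transition, so the automaton dies and rejects.
FORBIDDEN = {"double", "float"}

def javalite_accepts(tokens) -> bool:
    for tok in tokens:
        if tok in FORBIDDEN:   # no transition: the automaton dies
            return False
    return True                # still sitting in the accepting state
```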