Fall Compiler Principles Lecture 4: Parsing part 3. Roman Manevich Ben-Gurion University of the Negev

Similar documents
Fall Compiler Principles Lecture 5: Parsing part 4. Roman Manevich Ben-Gurion University

Agenda. Previously. Tentative syllabus. Fall Compiler Principles Lecture 5: Parsing part 4 12/2/2015. Roman Manevich Ben-Gurion University

Compilers. Bottom-up Parsing. (original slides by Sam

COMP 181. Prelude. Prelude. Summary of parsing. A Hierarchy of Grammar Classes. More power? Syntax-directed translation. Analysis

Compilers and Language Processing Tools

Last Time. What do we want? When do we want it? An AST. Now!

Context-free grammars

Abstract Syntax. Mooly Sagiv. html://

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Compiler Construction: Parsing

Simple LR (SLR) LR(0) Drawbacks LR(1) SLR Parse. LR(1) Start State and Reduce. LR(1) Items 10/3/2012

Using JFlex. "Linux Gazette...making Linux just a little more fun!" by Christopher Lopes, student at Eastern Washington University April 26, 1999

In One Slide. Outline. LR Parsing. Table Construction

Conflicts in LR Parsing and More LR Parsing Types

Compilation 2013 Parser Generators, Conflict Management, and ML-Yacc

CUP. Lecture 18 CUP User s Manual (online) Robb T. Koether. Hampden-Sydney College. Fri, Feb 27, 2015

JavaCUP. There are also many parser generators written in Java

For example, we might have productions such as:

Bottom-Up Parsing. Lecture 11-12

Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1

MIT Parse Table Construction. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

Lecture 8: Deterministic Bottom-Up Parsing

Let us construct the LR(1) items for the grammar given below to construct the LALR parsing table.

A clarification on terminology: Recognizer: accepts or rejects strings in a language. Parser: recognizes and generates parse trees (imminent topic)

LR Parsing - The Items

Lecture 7: Deterministic Bottom-Up Parsing

Bottom-Up Parsing. Lecture 11-12

Lecture Bottom-Up Parsing

Fall Compiler Principles Lecture 2: LL parsing. Roman Manevich Ben-Gurion University of the Negev

LR Parsing LALR Parser Generators

Lexical and Syntax Analysis. Bottom-Up Parsing

Parsing. Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1)

LR Parsing LALR Parser Generators

Topic 5: Syntax Analysis III

LR Parsing. Table Construction

CSE 401 Midterm Exam Sample Solution 11/4/11

Formal Languages and Compilers Lecture VII Part 3: Syntactic A

CS606- compiler instruction Solved MCQS From Midterm Papers

Topic 3: Syntax Analysis I

SLR parsers. LR(0) items

shift-reduce parsing

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

Lexical Analysis. Textbook:Modern Compiler Design Chapter 2.1

S Y N T A X A N A L Y S I S LR

Syntax-Directed Translation. Lecture 14

Monday, September 13, Parsers

Formal Languages and Compilers Lecture VII Part 4: Syntactic A

How do LL(1) Parsers Build Syntax Trees?

CS 11 Ocaml track: lecture 6

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Syntax Analysis. Amitabha Sanyal. ( as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

Fall Compiler Principles Lecture 3: Parsing part 2. Roman Manevich Ben-Gurion University

CSE302: Compiler Design

RYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 96 INSTRUCTIONS

10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

Principle of Compilers Lecture IV Part 4: Syntactic Analysis. Alessandro Artale

Fall Compiler Principles Lecture 2: LL parsing. Roman Manevich Ben-Gurion University of the Negev

Wednesday, August 31, Parsers

FROWN An LALR(k) Parser Generator

Programming Languages (CS 550) Lecture 4 Summary Scanner and Parser Generators. Jeremy R. Johnson

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1

Compiler Construction

Properties of Regular Expressions and Finite Automata

CSE 401 Compilers. LR Parsing Hal Perkins Autumn /10/ Hal Perkins & UW CSE D-1

Fall Compiler Principles Lecture 6: Intermediate Representation. Roman Manevich Ben-Gurion University of the Negev

Review: Shift-Reduce Parsing. Bottom-up parsing uses two actions: Bottom-Up Parsing II. Shift ABC xyz ABCx yz. Lecture 8. Reduce Cbxy ijk CbA ijk

CSE302: Compiler Design

Compiler Construction

Review of CFGs and Parsing II Bottom-up Parsers. Lecture 5. Review slides 1

LR Parsers. Aditi Raste, CCOEW

Last time. What are compilers? Phases of a compiler. Scanner. Parser. Semantic Routines. Optimizer. Code Generation. Sunday, August 29, 2010

Lexical Analysis. Chapter 1, Section Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Lexical Analysis. Textbook:Modern Compiler Design Chapter 2.1.

Lexical and Syntax Analysis

Example CFG. Lectures 16 & 17 Bottom-Up Parsing. LL(1) Predictor Table Review. Stacks in LR Parsing 1. Sʹ " S. 2. S " AyB. 3. A " ab. 4.

CSE 401 Midterm Exam 11/5/10

Fall Compiler Principles Lecture 5: Intermediate Representation. Roman Manevich Ben-Gurion University of the Negev

Context-Free Grammars

A left-sentential form is a sentential form that occurs in the leftmost derivation of some sentence.

Syntax and Parsing COMS W4115. Prof. Stephen A. Edwards Fall 2003 Columbia University Department of Computer Science


3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

Syntax-Directed Translation

Context-Free Grammar. Concepts Introduced in Chapter 2. Parse Trees. Example Grammar and Derivation

CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution

Concepts Introduced in Chapter 4

CS 4120 Introduction to Compilers

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form

A simple syntax-directed

Lecture Notes on Bottom-Up LR Parsing

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

Compiler Design 1. Bottom-UP Parsing. Goutam Biswas. Lect 6

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

Bottom-Up Parsing II (Different types of Shift-Reduce Conflicts) Lecture 10. Prof. Aiken (Modified by Professor Vijay Ganesh.

3. Parsing. Oscar Nierstrasz

Transcription:

Fall 2016-2017 Compiler Principles Lecture 4: Parsing part 3 Roman Manevich Ben-Gurion University of the Negev

Tentative syllabus Front End Intermediate Representation Optimizations Code Generation Scanning Operational Semantics Dataflow Analysis Register Allocation Top-down Parsing (LL) Lowering Loop Optimizations Energy Optimization Bottom-up Parsing (LR) Instruction Selection mid-term exam 2

Previously LR(0) parsing Running the parser Constructing transition diagram Constructing parser table Detecting conflicts SLR(0) Eliminating conflicts via FOLLOW sets 3

Agenda LR(1) LALR(1) Automatic LR parser generation Handling ambiguities 4

Going beyond SLR(0) Some common language constructs introduce conflicts even for SLR (0) S S (1) S L = R (2) S R (3) L * R (4) L id (5) R L 5

q0 S S S L = R S R L * R L id R L R S L q3 q1 q2 S R S S S L = R R L = q9 S L = R q6 R * * q4 id id q5 L id id S L = R R L L * R L id L * R R L L * R L id * L q8 L R L R L * R q7 6

shift/reduce conflict (0) S S (1) S L = R (2) S R (3) L * R (4) L id (5) R L q2 S L = R R L = q6 S L = R vs. R L FOLLOW(R) contains = S L = R * R = R SLR cannot resolve conflict S L = R R L L * R L id 7

Inputs requiring shift/reduce (0) S S (1) S L = R (2) S R (3) L * R (4) L id q2 (5) R L S L = R R L For the input id the rightmost derivation S S R L id requires reducing in q2 For the input id = id S S L = R L = L L = id id = id requires shifting = q6 S L = R R L L * R L id 8

LR(1) grammars In SLR: a reduce item N α is applicable only when the lookahead is in FOLLOW(N) But for a given context (state) are all tokens in FOLLOW(N) indeed possible? Not always We can compute a context-sensitive (i.e., specific to a given state) subset of FOLLOW(N) and use it to remove even more conflicts LR(1) keeps lookahead with each LR item Idea: a more refined notion of FOLLOW computed per item 9

LR(1) item Already matched To be matched Input N α β, t Hypothesis about αβ being a possible handle: so far we ve matched α, expecting to see β and after reducing N we expect to see the token t 10

LR(1) items LR(1) item is a pair LR(0) item Lookahead token Meaning We matched the part left of the dot, looking to match the part on the right of the dot, followed by the lookahead token Example The production L id yields the following LR(1) items LR(1) items (0) S S (1) S L = R (2) S R (3) L * R (4) L id (5) R L LR(0) items [L id] [L id ] [L id, *] [L id, =] [L id, id] [L id, $] [L id, *] [L id, =] [L id, id] [L id, $] 11

Computing Closure for LR(1) For every [A α Bβ, c] in S for every production B δ and every token b in the grammar such that b FIRST(βc) Add [B δ, b] to S 12

* q0 (S S, $) (S L = R, $) (S R, $) (L * R, = ) (L id, = ) (R L, $ ) (L id, $ ) (L * R, $ ) q4 * (L * R, =) (R L, =) (L * R, =) (L id, =) (L * R, $) (R L, $) (L * R, $) (L id, $) id R id q5 R L S q3 (S R, $) q1 (S S, $) q2 (S L = R, $) (R L, $) (L id, $) (L id, =) q7 L (L * R, =) (L * R, $) q6 q9 (S L = R, $) = R (S L = R, $) (R L, $) (L * R, $) (L id, $) q8 (R L, =) (R L, $) L * q11 (L id, $) id q12 (R L, $) q13 q10 (L * R, $) (R L, $) (L * R, $) (L id, $) R (L * R, $) id 13

Back to the conflict q2 (S L = R, $) (R L, $) = q6 (S L = R, $) (R L, $) (L * R, $) (L id, $) Is there a conflict now? 14

LALR(1) LR(1) tables have huge number of entries Often don t need such refined observation (and cost) Idea: find states with the same LR(0) component and merge their lookaheads component as long as there are no conflicts LALR(1) not as powerful as LR(1) in theory but works quite well in practice Merging may not introduce new shift-reduce conflicts, only reduce-reduce, which is unlikely in practice 15

* q0 (S S, $) (S L = R, $) (S R, $) (L * R, = ) (L id, = ) (R L, $ ) (L id, $ ) (L * R, $ ) q4 * (L * R, =) (R L, =) (L * R, =) (L id, =) (L * R, $) (R L, $) (L * R, $) (L id, $) id R id q5 R L S q3 (S R, $) q1 (S S, $) q2 (S L = R, $) (R L, $) (L id, $) (L id, =) q7 L (L * R, =) (L * R, $) q6 q9 (S L = R, $) = R (S L = R, $) (R L, $) (L * R, $) (L id, $) q8 (R L, =) (R L, $) L * q11 (L id, $) id q12 (R L, $) q13 q10 (L * R, $) (R L, $) (L * R, $) (L id, $) R (L * R, $) id 16

* q0 (S S, $) (S L = R, $) (S R, $) (L * R, = ) (L id, = ) (R L, $ ) (L id, $ ) (L * R, $ ) q4 * (L * R, =) (R L, =) (L * R, =) (L id, =) (L * R, $) (R L, $) (L * R, $) (L id, $) id R id q5 R L S q3 (S R, $) q1 (S S, $) q2 (S L = R, $) (R L, $) (L id, $) (L id, =) q7 L (L * R, =) (L * R, $) q6 q9 (S L = R, $) = R (S L = R, $) (R L, $) (L * R, $) (L id, $) q8 (R L, =) (R L, $) L * q11 (L id, $) id q12 (R L, $) q13 q10 (L * R, $) (R L, $) (L * R, $) (L id, $) R (L * R, $) id 17

* q0 (S S, $) (S L = R, $) (S R, $) (L * R, = ) (L id, = ) (R L, $ ) (L id, $ ) (L * R, $ ) q4 * (L * R, =) (R L, =) (L * R, =) (L id, =) (L * R, $) (R L, $) (L * R, $) (L id, $) id R id q5 R L S q3 (S R, $) q1 (S S, $) q2 (S L = R, $) (R L, $) (L id, $) (L id, =) q7 L id (L * R, =) (L * R, $) q6 q9 (S L = R, $) = R (S L = R, $) (R L, $) (L * R, $) (L id, $) q8 (R L, =) (R L, $) L * q10 (L * R, $) (R L, $) (L * R, $) (L id, $) R id 18

Left/Right- recursion At home: create a simple grammar with left-recursion and one with right-recursion Construct corresponding LR(0) parser Any conflicts? Run on simple input and observe behavior Attempt to generalize observation for long inputs 19

Example: non-lr(1) grammar (1) S Y b c $ (2) S Z b d $ (3) Y a (4) Z a S Y b c, $ S Y b c, $ Y a, b Z a, b a Y a, b Z a, b reduce-reduce conflict on lookahead b 20

Automated parser generation (via CUP) 21

High-level structure text Lexer spec JFlex.java javac Lexical analyzer LANG.lex Lexer.java tokens (Token.java) Parser spec CUP.java javac Parser LANG.cup Parser.java sym.java AST 22

Expression calculator expr expr + expr expr - expr expr * expr expr / expr - expr ( expr ) number Goals of expression calculator parser: Is 2+3+4+5 a valid expression? What is the meaning (value) of this expression? 23

Syntax analysis with CUP CUP parser generator Generates an LALR(1) Parser Input: spec file Output: a syntax analyzer Can dump automaton and table tokens Parser spec CUP.java javac Parser AST 24

CUP spec file Package and import specifications User code components Symbol (terminal and non-terminal) lists Terminals go to sym.java Types of AST nodes Precedence declarations The grammar Semantic actions to construct AST 25

Parsing ambiguous grammars 26

Expression Calculator 1 st Attempt terminal Integer NUMBER; terminal PLUS, MINUS, MULT, DIV; terminal LPAREN, RPAREN; non terminal Integer expr; Symbol type explained later expr ::= expr PLUS expr expr MINUS expr expr MULT expr expr DIV expr MINUS expr LPAREN expr RPAREN NUMBER ; 27

Ambiguities + * * a b c + a + b * c + a b c + + a b c a + b + c + a b c 28

Ambiguities as conflicts for LR(1) + * + * a b c a + b * c + a b c + + a b c a + b + c + a b c 29

terminal Integer NUMBER; terminal PLUS,MINUS,MULT,DIV; terminal LPAREN, RPAREN; terminal UMINUS; non terminal Integer expr; Expression Calculator 2 nd Attempt precedence left PLUS, MINUS; precedence left DIV, MULT; precedence left UMINUS; Increasing precedence expr ::= expr PLUS expr expr MINUS expr expr MULT expr expr DIV expr MINUS expr %prec UMINUS LPAREN expr RPAREN NUMBER ; Contextual precedence 30

Parsing ambiguous grammars using precedence declarations Each terminal assigned with precedence By default all terminals have lowest precedence User can assign his own precedence CUP assigns each production a precedence Precedence of rightmost terminal in production or user-specified contextual precedence On shift/reduce conflict resolve ambiguity by comparing precedence of terminal and production and decides whether to shift or reduce In case of equal precedences left/right help resolve conflicts left means reduce right means shift More information on precedence declarations in CUP s manual 31

Resolving ambiguity (associativity) precedence left PLUS + + + a b c + a b c a + b + c 32

Resolving ambiguity (op. precedence) precedence left PLUS precedence left MULT + * * a b c + a b c a + b * c 33

Resolving ambiguity (contextual) precedence left MULT MINUS expr %prec UMINUS * - - a b a * b - a * b 34

Resolving ambiguity terminal Integer NUMBER; terminal PLUS,MINUS,MULT,DIV; terminal LPAREN, RPAREN; terminal UMINUS; UMINUS never returned by scanner (used only to define precedence) precedence left PLUS, MINUS; precedence left DIV, MULT; precedence left UMINUS; expr ::= expr PLUS expr expr MINUS expr expr MULT expr expr DIV expr MINUS expr %prec UMINUS LPAREN expr RPAREN NUMBER ; Rule has precedence of UMINUS 35

More CUP directives precedence nonassoc NEQ Non-associative operators: < > ==!= etc. 1<2<3 identified as an error (semantic error?) start non-terminal Specifies start non-terminal other than first non-terminal Can change to test parts of grammar Getting internal representation Command line options: -dump_grammar -dump_states -dump_tables -dump 36

Scanner integration import java_cup.runtime.*; %% %cup Generated from token %eofval{ declarations in.cup file return new Symbol(sym.EOF); %eofval} NUMBER=[0-9]+ %% <YYINITIAL> + { return new Symbol(sym.PLUS); } <YYINITIAL> - { return new Symbol(sym.MINUS); } <YYINITIAL> * { return new Symbol(sym.MULT); } <YYINITIAL> / { return new Symbol(sym.DIV); } <YYINITIAL> ( { return new Symbol(sym.LPAREN); } <YYINITIAL> ) { return new Symbol(sym.RPAREN); } <YYINITIAL>{NUMBER} { return new Symbol(sym.NUMBER, new Integer(yytext())); } <YYINITIAL>\n { } <YYINITIAL>. { } Parser gets terminals from the scanner 37

Recap Package and import specifications and user code components Symbol (terminal and non-terminal) lists Define building-blocks of the grammar Precedence declarations May help resolve conflicts The grammar May introduce conflicts that have to be resolved 38

Abstract syntax tree construction 39

Assigning meaning expr ::= expr PLUS expr expr MINUS expr expr MULT expr expr DIV expr MINUS expr %prec UMINUS LPAREN expr RPAREN NUMBER ; So far, only validation Add Java code implementing semantic actions 40

non terminal Integer expr; Assigning meaning expr ::= expr:e1 PLUS expr:e2 {: RESULT = new Integer(e1.intValue() + e2.intvalue()); :} expr:e1 MINUS expr:e2 {: RESULT = new Integer(e1.intValue() - e2.intvalue()); :} expr:e1 MULT expr:e2 {: RESULT = new Integer(e1.intValue() * e2.intvalue()); :} expr:e1 DIV expr:e2 {: RESULT = new Integer(e1.intValue() / e2.intvalue()); :} MINUS expr:e1 {: RESULT = new Integer(0 - e1.intvalue(); :} %prec UMINUS LPAREN expr:e1 RPAREN {: RESULT = e1; :} NUMBER:n {: RESULT = n; :} ; Symbol labels used to name variables RESULT names the left-hand side symbol 41

Abstract Syntax Trees More useful representation of syntax tree Less clutter Actual level of detail depends on your design Basis for semantic analysis Later annotated with various information Type information Computed values Technically a class hierarchy of abstract syntax tree nodes 42

Parse tree vs. AST expr + expr + expr expr expr 1 + ( 2 ) + ( 3 ) 1 2 3 43

AST hierarchy example expr int_const plus minus times divide 44

AST construction AST Nodes constructed during parsing Stored in push-down stack Bottom-up parser Grammar rules annotated with actions for AST construction When node is constructed all children available (already constructed) Node (RESULT) pushed on stack 45

AST construction 1 + (2) + (3) expr + (2) + (3) expr + (expr) + (3) expr + (3) expr + (expr) expr expr expr ::= expr:e1 PLUS expr:e2 {: RESULT = new plus(e1,e2); :} LPAREN expr:e RPAREN {: RESULT = e; :} INT_CONST:i {: RESULT = new int_const(, i); :} plus e1 e2 expr plus e1 e2 expr expr expr 1 + ( 2 ) + ( 3 ) int_const val = 1 int_const val = 2 int_const val = 3 46

Example of lists terminal Integer NUMBER; terminal PLUS,MINUS,MULT,DIV,LPAREN,RPAREN,SEMI; terminal UMINUS; non terminal Integer expr; non terminal expr_list, expr_part; precedence left PLUS, MINUS; precedence left DIV, MULT; precedence left UMINUS; Executed when e is shifted expr_list ::= expr_list expr_part expr_part ; expr_part ::= expr:e {: System.out.println("= " + e); :} SEMI ; expr ::= expr PLUS expr expr MINUS expr expr MULT expr expr DIV expr MINUS expr %prec UMINUS LPAREN expr RPAREN NUMBER ; 47

Next lecture: IR and Operational Semantics