CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1


CSE P 501 Compilers Parsing & Context-Free Grammars Hal Perkins Spring 2018 UW CSE P 501 Spring 2018 C-1

Administrivia
Project partner signup: please find a partner and fill out the signup form by noon tomorrow if not done yet (only one form per group, please). Who's still looking for a partner?
Watch for spam from CSE GitLab as repos are set up (save and ignore for now).
Written HW2 out tonight or tomorrow, due Monday. HW1 solution posted this weekend.
First part of the project (scanner) out later this week, due in two weeks. Programming is fairly simple; this is the infrastructure shakedown cruise. More about this next week.

Deadlines & office hours
From the discussion board: what should we do about assignment deadlines and office hours?
Deadlines: early in the week, to finish off old stuff before class and the next assignment, or later in the week, to catch office hours on Tuesdays?
We could add virtual or maybe in-person office hours on Sunday. Useful? Not? How does it affect deadline issues?

Communications
Yikes! With luck we've sorted out most of the discussion group issues. Let's see if we can channel things like this:
  Google discussion group: general traffic about the class. Staff will monitor it regularly and post as needed, but everyone should join in.
  Email to csep501-staff[at]cs: for things not appropriate for posting.
  Class mailing list: for announcements from staff only.
There are lots of other things that generate email to instructor/staff when clicked; if we can minimize that, it would help. Thanks.

Agenda for Today
Parsing overview
Context-free grammars
Ambiguous grammars
Reading: Cooper & Torczon 3.1-3.2. The Dragon book is also particularly strong on grammars and languages.

Syntactic Analysis / Parsing
Goal: convert the token stream to an abstract syntax tree
Abstract syntax tree (AST):
  Captures the structural features of the program
  Primary data structure for the next phases of compilation
Plan:
  Study how context-free grammars specify syntax
  Study algorithms for parsing and building ASTs

Context-free Grammars
The syntax of most programming languages can be specified by a context-free grammar (CFG)
Compromise between
  REs: can't nest or specify recursive structure
  General grammars: too powerful, undecidable
Context-free grammars are a sweet spot
  Powerful enough to describe nesting, recursion
  Easy to parse; restrictions on general CFGs improve speed
Not perfect
  Cannot capture semantics, like "must declare every variable" or "must be int"; this requires a later semantic pass
  Can be ambiguous (something we'll deal with)

Derivations and Parse Trees
Derivation: a sequence of expansion steps, beginning with a start symbol and leading to a sequence of terminals
Parsing: inverse of derivation
  Given a sequence of terminals (aka tokens), recover (discover) the nonterminals and structure, i.e., the parse tree (concrete syntax)

Old Example
Grammar G:
  program ::= statement | program statement
  statement ::= assignstmt | ifstmt
  assignstmt ::= id = expr ;
  ifstmt ::= if ( expr ) statement
  expr ::= id | int | expr + expr
  id ::= a | b | c | i | j | k | n | x | y | z
  int ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Sentence w: a = 1 ; if ( a + 1 ) b = 2 ;
[Figure: parse tree for w under G, with program at the root, an assignstmt and an ifstmt beneath it, and the tokens of w at the leaves.]

Parsing
Parsing: given a grammar G and a sentence w in L(G), traverse the derivation (parse tree) for w in some standard order and do something useful at each node
  The tree might not be produced explicitly, but the control flow of the parser will correspond to a traversal

Standard Order
For practical reasons we want the parser to be deterministic (no backtracking), and we want to examine the source program from left to right (i.e., parse the program in linear time, in the order it appears in the source file).

Common Orderings
Top-down
  Start with the root
  Traverse the parse tree depth-first, left-to-right (leftmost derivation)
  LL(k), recursive-descent
Bottom-up
  Start at the leaves and build up to the root
  Effectively a rightmost derivation in reverse(!)
  LR(k) and subsets (LALR(k), SLR(k), etc.)

Something Useful
At each point (node) in the traversal, perform some semantic action
  Construct nodes of the full parse tree (rare)
  Construct an abstract syntax tree (AST) (common)
  Construct a linear, lower-level representation (often produced in later phases of production compilers by traversing the initial AST)
  Generate target code on the fly (done in 1-pass compilers; not common in production compilers)
    Can't generate great code in one pass, but useful if you need a quick 'n' dirty working compiler

Context-Free Grammars
Formally, a grammar G is a tuple <N, Σ, P, S> where
  N is a finite set of non-terminal symbols
  Σ is a finite set of terminal symbols (alphabet)
  P is a finite set of productions
    A subset of N × (N ∪ Σ)*
  S is the start symbol, a distinguished element of N
    If not specified otherwise, this is usually assumed to be the non-terminal on the left of the first production
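As a rough illustration of the tuple definition, the grammar from the Old Example slide can be written down as plain Python data. The names `Grammar`, `make_grammar`, and `OLD_EXAMPLE` are made up for this sketch, and it assumes that `id` and `int` arrive as single tokens from the scanner, so they are treated as terminals here.

```python
from typing import NamedTuple

class Grammar(NamedTuple):
    nonterminals: frozenset   # N
    terminals: frozenset      # Σ
    productions: tuple        # P: pairs (A, rhs), A in N, rhs a tuple over N ∪ Σ
    start: str                # S

def make_grammar(productions, start=None):
    """Build <N, Σ, P, S> from a list of (lhs, rhs) pairs.

    Every symbol that appears on a left-hand side is a nonterminal; every
    other symbol is a terminal. If no start symbol is given, use the
    left-hand side of the first production.
    """
    prods = tuple((lhs, tuple(rhs)) for lhs, rhs in productions)
    n = frozenset(lhs for lhs, _ in prods)
    sigma = frozenset(s for _, rhs in prods for s in rhs) - n
    return Grammar(n, sigma, prods, start if start is not None else prods[0][0])

# Assumption: id and int are scanner tokens, so they count as terminals.
OLD_EXAMPLE = make_grammar([
    ("program", ["statement"]),
    ("program", ["program", "statement"]),
    ("statement", ["assignstmt"]),
    ("statement", ["ifstmt"]),
    ("assignstmt", ["id", "=", "expr", ";"]),
    ("ifstmt", ["if", "(", "expr", ")", "statement"]),
    ("expr", ["id"]),
    ("expr", ["int"]),
    ("expr", ["expr", "+", "expr"]),
])

print(sorted(OLD_EXAMPLE.nonterminals))
print(OLD_EXAMPLE.start)
```

Inferring N from the left-hand sides keeps the sketch short; a real tool would usually require explicit token declarations.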

Standard Notations
a, b, c: elements of Σ
w, x, y, z: elements of Σ*
A, B, C: elements of N
X, Y, Z: elements of N ∪ Σ
α, β, γ: elements of (N ∪ Σ)*
A → α or A ::= α if <A, α> ∈ P

Derivation Relations (1)
α A γ => α β γ iff A ::= β in P
  "=>" is read "derives"
A =>* α if there is a chain of productions starting with A that generates α
  transitive closure

Derivation Relations (2)
w A γ =>lm w β γ iff A ::= β in P
  derives leftmost
α A w =>rm α β w iff A ::= β in P
  derives rightmost
We will only be interested in leftmost and rightmost derivations, not random orderings
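A small sketch may help make the "derives leftmost" relation concrete. The function `leftmost_step` and the constants below are hypothetical names, and the grammar is just the `expr` rules from the Old Example with `id` and `int` treated as terminals.

```python
def leftmost_step(form, production, nonterminals):
    """Apply production (A, β) to the leftmost nonterminal of a sentential form.

    This mirrors w A γ =>lm w β γ: everything to the left of the rewritten A
    must be terminals, so A has to be the first nonterminal in the form.
    """
    lhs, rhs = production
    for i, symbol in enumerate(form):
        if symbol in nonterminals:
            if symbol != lhs:
                raise ValueError(f"leftmost nonterminal is {symbol}, not {lhs}")
            return form[:i] + tuple(rhs) + form[i + 1:]
    raise ValueError("no nonterminal left to expand")

NONTERMINALS = {"expr"}
PLUS = ("expr", ("expr", "+", "expr"))
ID = ("expr", ("id",))
INT = ("expr", ("int",))

# Leftmost derivation of "id + int".
form = ("expr",)
steps = [form]
for production in [PLUS, ID, INT]:
    form = leftmost_step(form, production, NONTERMINALS)
    steps.append(form)

for step in steps:
    print(" ".join(step))
```

A rightmost version would scan the form from the right end instead; nothing else changes.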

Languages
For A in N, L(A) = { w | A =>* w }
If S is the start symbol of grammar G, define L(G) = L(S)
  The nonterminal on the left of the first rule is taken to be the start symbol if one is not specified explicitly

Reduced Grammars
Grammar G is reduced iff for every production A ::= α in G there is a derivation
  S =>* x A z => x α z =>* xyz
i.e., no production is useless
Convention: we will use only reduced grammars
  There are algorithms for pruning useless productions from grammars; see a formal language or compiler book for details
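The slide leaves the pruning algorithm to the textbooks. One common textbook approach, sketched here under the assumption that any symbol never appearing on a left-hand side is a terminal, is to first drop productions that mention nonterminals that cannot derive a terminal string, and then drop whatever is unreachable from the start symbol. The function name and the toy grammar are invented for this example.

```python
def reduce_grammar(productions, start):
    """Return the productions that take part in some derivation S =>* w."""
    nonterminals = {lhs for lhs, _ in productions}

    def usable(rhs, productive):
        # A right-hand side is usable if every symbol is a terminal
        # or an already-known productive nonterminal.
        return all(s in productive or s not in nonterminals for s in rhs)

    # Pass 1: find productive nonterminals (those deriving a terminal string).
    productive = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if lhs not in productive and usable(rhs, productive):
                productive.add(lhs)
                changed = True
    kept = [(lhs, rhs) for lhs, rhs in productions if usable(rhs, productive)]

    # Pass 2: keep only what is reachable from the start symbol.
    reachable = {start}
    worklist = [start]
    while worklist:
        current = worklist.pop()
        for lhs, rhs in kept:
            if lhs == current:
                for s in rhs:
                    if s in nonterminals and s not in reachable:
                        reachable.add(s)
                        worklist.append(s)
    return [(lhs, rhs) for lhs, rhs in kept if lhs in reachable]

EXAMPLE = [
    ("S", ("A", "b")),
    ("S", ("B",)),       # useless: B never derives a terminal string
    ("A", ("a",)),
    ("B", ("B", "c")),   # useless: B only rewrites to more B
    ("C", ("d",)),       # useless: C is unreachable from S
]
reduced = reduce_grammar(EXAMPLE, "S")
print(reduced)
```

The order of the two passes matters: removing unproductive rules can make further symbols unreachable, but not the other way around.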

Ambiguity
Grammar G is unambiguous iff every w in L(G) has a unique leftmost (or rightmost) derivation
  Fact: unique leftmost or unique rightmost implies the other
A grammar without this property is ambiguous
  Note that other grammars that generate the same language may be unambiguous, i.e., ambiguity is a property of grammars, not languages
We need unambiguous grammars for parsing

Example: Ambiguous Grammar for Arithmetic Expressions
  expr ::= expr + expr | expr - expr | expr * expr | expr / expr | int
  int ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Exercise: show that this is ambiguous
  How? Show two different leftmost or rightmost derivations for the same string
  Equivalently: show two different parse trees for the same string

Example (cont)
  expr ::= expr + expr | expr - expr | expr * expr | expr / expr | int
  int ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Give a leftmost derivation of 2+3*4 and show the parse tree

Example (cont)
  expr ::= expr + expr | expr - expr | expr * expr | expr / expr | int
  int ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Give a different leftmost derivation of 2+3*4 and show the parse tree

Another example
  expr ::= expr + expr | expr - expr | expr * expr | expr / expr | int
  int ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Give two different derivations of 5+6+7
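One way to check answers to these exercises is to count parse trees mechanically. The sketch below is an illustration, not part of the lecture: it counts the parse trees of a string of single digits and operators under the ambiguous grammar by trying every operator as the top-level one. A count above 1 means the string has more than one leftmost derivation. The helper name `count_parse_trees` is made up.

```python
from functools import lru_cache

OPERATORS = set("+-*/")

def count_parse_trees(text):
    """Count parse trees for text under expr ::= expr op expr | int."""
    tokens = tuple(text.replace(" ", ""))

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from expr.
        if hi - lo == 1:
            return 1 if tokens[lo].isdigit() else 0
        total = 0
        for k in range(lo + 1, hi - 1):
            if tokens[k] in OPERATORS:
                # expr ::= expr op expr with the top-level operator at k
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))

for sample in ["7", "2+3*4", "5+6+7", "1+2+3+4"]:
    print(sample, count_parse_trees(sample))
```

For 2+3*4 the two trees correspond to choosing + or * as the root, which is exactly the precedence question the grammar fails to settle.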

What's going on here?
The grammar has no notion of precedence or associativity
Traditional solution
  Create a non-terminal for each level of precedence
  Isolate the corresponding part of the grammar
  Force the parser to recognize higher-precedence subexpressions first
  Use left- or right-recursion for left- or right-associative operators (non-associative operators are not recursive)

Classic Expression Grammar (first used in ALGOL 60)
  expr ::= expr + term | expr - term | term
  term ::= term * factor | term / factor | factor
  factor ::= int | ( expr )
  int ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7

Check: Derive 2 + 3 * 4
  expr ::= expr + term | expr - term | term
  term ::= term * factor | term / factor | factor
  factor ::= int | ( expr )
  int ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7

Check: Derive 5 + 6 + 7
  expr ::= expr + term | expr - term | term
  term ::= term * factor | term / factor | factor
  factor ::= int | ( expr )
  int ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
Note the interaction between left- vs right-recursive rules and the resulting associativity

Check: Derive 5 + (6 + 7)
  expr ::= expr + term | expr - term | term
  term ::= term * factor | term / factor | factor
  factor ::= int | ( expr )
  int ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
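The three checks can also be run mechanically. The following is a minimal hand-written parser for the classic grammar, offered as a sketch rather than the course's method. A recursive-descent parser cannot call itself on a left-recursive rule, so the left recursion is replaced by a loop (term followed by any number of "+ term" or "- term"); building the tree leftward inside the loop preserves left associativity. Tokens are single characters, and only the digits 0 through 7 are accepted, matching the slide.

```python
def parse_expr(text):
    """Parse text with the classic expression grammar; return a nested-tuple tree."""
    tokens = list(text.replace(" ", ""))
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def expr():
        # expr ::= expr + term | expr - term | term, as a loop over terms
        node = term()
        while peek() in ("+", "-"):
            op = peek()
            advance()
            node = (op, node, term())   # grow to the left: left-associative
        return node

    def term():
        # term ::= term * factor | term / factor | factor, as a loop over factors
        node = factor()
        while peek() in ("*", "/"):
            op = peek()
            advance()
            node = (op, node, factor())
        return node

    def factor():
        # factor ::= int | ( expr )
        tok = peek()
        if tok is not None and tok in "01234567":
            advance()
            return int(tok)
        if tok == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

for sample in ["2+3*4", "5+6+7", "5+(6+7)"]:
    print(sample, "->", parse_expr(sample))
```

Because `*` is only handled inside `term`, the multiplication in 2+3*4 ends up below the addition, and the parentheses in 5+(6+7) are the only way to get a right-leaning tree for `+`.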

Another Classic Example
Grammar for conditional statements
  stmt ::= if ( expr ) stmt
         | if ( expr ) stmt else stmt
(This is the "dangling else" problem found in many, many grammars for languages beginning with Algol 60)
Exercise: show that this is ambiguous
  How?

One Derivation
  stmt ::= if ( expr ) stmt | if ( expr ) stmt else stmt
Sentence: if ( expr ) if ( expr ) stmt else stmt

Another Derivation
  stmt ::= if ( expr ) stmt | if ( expr ) stmt else stmt
Sentence: if ( expr ) if ( expr ) stmt else stmt

Solving "if" Ambiguity
Fix the grammar to separate if statements with an else clause from if statements with no else
  Done in the Java reference grammar
  Adds lots of non-terminals
or, change the language
  But it'd better be ok to do this: you need to "own" the language or get permission from the owner
or, use some ad-hoc rule in the parser
  "else matches closest unpaired if"
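The ad-hoc rule is easy to see in a hand-written parser. In this sketch (all names hypothetical), the token `expr` stands for any condition and any other non-keyword token, such as `s1` or `s2`, stands for a simple statement. After parsing the then-branch, the parser takes an `else` immediately if one is there, so the innermost pending `if` claims it.

```python
def parse_stmt(tokens, pos=0):
    """Parse stmt ::= if ( expr ) stmt [ else stmt ] | simple-statement token.

    Returns (tree, next_pos).
    """
    def expect(i, tok):
        if i >= len(tokens) or tokens[i] != tok:
            raise SyntaxError(f"expected {tok!r} at token {i}")
        return i + 1

    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] in ("else", "(", ")", "expr"):
        raise SyntaxError(f"unexpected {tokens[pos]!r} at token {pos}")
    if tokens[pos] != "if":
        # Any other token stands in for a simple (non-if) statement.
        return tokens[pos], pos + 1

    pos = expect(pos, "if")
    pos = expect(pos, "(")
    pos = expect(pos, "expr")
    pos = expect(pos, ")")
    then_branch, pos = parse_stmt(tokens, pos)
    if pos < len(tokens) and tokens[pos] == "else":
        # Greedy choice: the closest unpaired if takes the else.
        else_branch, pos = parse_stmt(tokens, pos + 1)
        return ("if", then_branch, else_branch), pos
    return ("if", then_branch), pos

tree, end = parse_stmt("if ( expr ) if ( expr ) s1 else s2".split())
print(tree)
```

The resulting tree nests the two-branch `if` inside the one-branch `if`, i.e. the `else` belongs to the inner statement.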

Resolving Ambiguity with Grammar (1)
  Stmt ::= MatchedStmt | UnmatchedStmt
  MatchedStmt ::= ... | if ( Expr ) MatchedStmt else MatchedStmt
  UnmatchedStmt ::= if ( Expr ) Stmt | if ( Expr ) MatchedStmt else UnmatchedStmt
Formal, with no additional rules beyond syntax, but can be more obscure than the original grammar

Check
  Stmt ::= MatchedStmt | UnmatchedStmt
  MatchedStmt ::= ... | if ( Expr ) MatchedStmt else MatchedStmt
  UnmatchedStmt ::= if ( Expr ) Stmt | if ( Expr ) MatchedStmt else UnmatchedStmt
Sentence: if ( expr ) if ( expr ) stmt else stmt

Resolving Ambiguity with Grammar (2)
If you can (re-)design the language, just avoid the problem entirely
  Stmt ::= ... | if Expr then Stmt end | if Expr then Stmt else Stmt end
Formal, clear, elegant
Allows a sequence of Stmts in then and else branches, no { , } needed
Extra end required for every if (but maybe this is a good idea anyway?)

Parser Tools and Operators
Most parser tools can cope with ambiguous grammars
  Makes life simpler if used with discipline
Usually can specify precedence & associativity
  Allows a simpler, ambiguous grammar with fewer nonterminals as the basis for the parser; let the tool handle the details (but only when it makes sense)
  (i.e., expr ::= expr+expr | expr*expr with assoc. & precedence declarations can be the best solution)
Take advantage of this to simplify the grammar when using parser-generator tools
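To give a feel for what precedence and associativity declarations buy, here is a sketch that uses precedence climbing, a hand-written technique swapped in for illustration; it is not how any particular parser generator works internally. The `OPS` table plays the role of the declarations and is hypothetical, as is the function name. The grammar itself stays the flat, ambiguous expr ::= expr op expr | int, with single-digit operands.

```python
# Hypothetical declaration table: operator -> (precedence level, associativity).
OPS = {
    "+": (1, "left"),
    "-": (1, "left"),
    "*": (2, "left"),
    "/": (2, "left"),
}

def parse_with_declarations(text):
    """Parse single-digit operands and the operators in OPS into a nested tuple."""
    tokens = list(text.replace(" ", ""))
    pos = 0

    def parse_level(min_prec):
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError("expected a digit")
        node = int(tokens[pos])
        pos += 1
        while pos < len(tokens) and tokens[pos] in OPS:
            op = tokens[pos]
            prec, assoc = OPS[op]
            if prec < min_prec:
                break   # this operator belongs to an enclosing, looser level
            pos += 1
            # Left-associative: the right operand may only contain tighter
            # operators. Right-associative: it may contain the same level.
            rhs = parse_level(prec + 1 if assoc == "left" else prec)
            node = (op, node, rhs)
        return node

    tree = parse_level(1)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

for sample in ["2+3*4", "5+6+7", "2*3+4"]:
    print(sample, "->", parse_with_declarations(sample))
```

Changing an entry in `OPS` changes the tree shape without touching the grammar, which is the convenience the slide describes.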

Parser Tools and Ambiguous Grammars
Possible rules for resolving other problems:
  Earlier productions in the grammar preferred to later ones (danger here if parser input changed)
  Longest match used if there is a choice (good solution for dangling if)
Parser tools normally allow for this
  But be sure that what the tool does is really what you want
  And that it's part of the tool spec, so that v2 won't do something different (that you don't want!)

Coming Attractions
Next topic: LR parsing
  Continue reading ch. 3