LL(k) Compiler Construction. Choice points in EBNF grammar. Left recursive grammar

Similar documents
LL(k) Compiler Construction. Top-down Parsing. LL(1) parsing engine. LL engine ID, $ S 0 E 1 T 2 3

EDA180: Compiler Construc6on. More Top- Down Parsing Abstract Syntax Trees Görel Hedin Revised:

.jj file with actions

EDA180: Compiler Construc6on. Top- down parsing. Görel Hedin Revised: a

Semantic analysis. Compiler Construction. An expression evaluator. Examples of computations. Computations on ASTs Aspect oriented programming

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

Visitors. Move functionality

Context-free grammars (CFG s)

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

Building Compilers with Phoenix

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

Programming Assignment 2 LALR Parsing and Building ASTs

Context-Free Grammar. Concepts Introduced in Chapter 2. Parse Trees. Example Grammar and Derivation

Parsing Algorithms. Parsing: continued. Top Down Parsing. Predictive Parser. David Notkin Autumn 2008

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

A programming language requires two major definitions A simple one pass compiler

Building Compilers with Phoenix

Programming Assignment 2 LALR Parsing and Building ASTs

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1

Chapter 3. Parsing #1

A Simple Syntax-Directed Translator

CPS 506 Comparative Programming Languages. Syntax Specification

LL parsing Nullable, FIRST, and FOLLOW

CS 314 Principles of Programming Languages

Syntactic Analysis. Syntactic analysis, or parsing, is the second phase of compilation: The token file is converted to an abstract syntax tree.

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

Compiler Passes. Syntactic Analysis. Context-free Grammars. Syntactic Analysis / Parsing. EBNF Syntax of initial MiniJava.

CSCI 1260: Compilers and Program Analysis Steven Reiss Fall Lecture 4: Syntax Analysis I

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Lexical and Syntax Analysis (2)

Table-Driven Top-Down Parsers

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

Comp 411 Principles of Programming Languages Lecture 3 Parsing. Corky Cartwright January 11, 2019

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

EDA180: Compiler Construc6on Context- free grammars. Görel Hedin Revised:

Context-free grammars

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Syntax Analysis Part I

Examination in Compilers, EDAN65

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

Part III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers.

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

CSE 3302 Programming Languages Lecture 2: Syntax

JavaCC Parser. The Compilation Task. Automated? JavaCC Parser

CSCI312 Principles of Programming Languages!

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

Question Points Score

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

Topic 3: Syntax Analysis I

Compiler Construction: Parsing

How do LL(1) Parsers Build Syntax Trees?

3. Parsing. Oscar Nierstrasz

Simple LR (SLR) LR(0) Drawbacks LR(1) SLR Parse. LR(1) Start State and Reduce. LR(1) Items 10/3/2012

Earlier edition Dragon book has been revised. Course Outline Contact Room 124, tel , rvvliet(at)liacs(dot)nl

COMP3131/9102: Programming Languages and Compilers

Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

CSE431 Translation of Computer Languages

Syntax/semantics. Program <> program execution Compiler/interpreter Syntax Grammars Syntax diagrams Automata/State Machines Scanning/Parsing

Lexical and Syntax Analysis. Top-Down Parsing

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

Lecture 8: Deterministic Bottom-Up Parsing

CSCI312 Principles of Programming Languages

RYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 96 INSTRUCTIONS

Compiler phases. Non-tokens

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Compilers. Predictive Parsing. Alex Aiken

Building a Parser III. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

Parsing #1. Leonidas Fegaras. CSE 5317/4305 L3: Parsing #1 1

ASTs, Objective CAML, and Ocamlyacc

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

Action Table for CSX-Lite. LALR Parser Driver. Example of LALR(1) Parsing. GoTo Table for CSX-Lite

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

Syntax and Parsing COMS W4115. Prof. Stephen A. Edwards Fall 2003 Columbia University Department of Computer Science

Lecture 7: Deterministic Bottom-Up Parsing

Top down vs. bottom up parsing

CA Compiler Construction

Course Overview. Introduction (Chapter 1) Compiler Frontend: Today. Compiler Backend:

G53CMP: Lecture 4. Syntactic Analysis: Parser Generators. Henrik Nilsson. University of Nottingham, UK. G53CMP: Lecture 4 p.1/32

Types of parsing. CMSC 430 Lecture 4, Page 1

CSE P 501 Exam 11/17/05 Sample Solution


Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

Bottom-Up Parsing. Lecture 11-12

Defining syntax using CFGs

Project 2 Interpreter for Snail. 2 The Snail Programming Language

Syntax-Directed Translation. Lecture 14

Abstract Syntax Trees & Top-Down Parsing

CSE P 501 Compilers. Implementing ASTs (in Java) Hal Perkins Autumn /20/ Hal Perkins & UW CSE H-1

In this simple example, it is quite clear that there are exactly two strings that match the above grammar, namely: abc and abcc

ECE251 Midterm practice questions, Fall 2010

COP4020 Programming Assignment 2 Spring 2011

Properties of Regular Expressions and Finite Automata

Optimizing Finite Automata

Transcription:

LL(k) Compiler Construction More LL parsing Abstract syntax trees Lennart Andersson Revision 2012 01 31 2012 Related names top-down the parse tree is constructed top-down recursive descent if it is implemented using recursive methods LL Left-to-right scanning of input using Leftmost derivation predictive the next k tokens are used to predict (decide) which production to use. k is often 1. Compiler Construction 2012 F05-1 Left recursive grammar Compiler Construction 2012 F05-2 Choice points in EBNF grammar p 1 : term term * factor p 2 : term factor p 3 : factor ID p 4 : factor INT NLBL FIRST term F ID INT factor F ID INT ID INT for each production p : X γ for each t FIRST (γ) add p to table[x, t] if NULLABLE(γ) for each s FOLLOW (X ) add p to table[x, s] term factor ID INT multiple productions for some nonterminal alternatives optionals [ ] or? iteration (... ) or (... ) + The recursive descent parser must be able to make the correct choice looking at the next token (or the next k tokens). term p 1 p 2 p 1 p 2 factor p 3 p 4 A left recursive grammar is never LL(k). If JavaCC suggests using more lookahead then compute FIRST and FOLLOW sets at the choice point before following the advice. Compiler Construction 2012 F05-3 Compiler Construction 2012 F05-4

Grammar with common prefix Another example p 1 : expr factor * expr p 2 : expr factor p 3 : factor ID p 4 : factor INT p 5 : factor ( expr ) NLBL FIRST expr F ID INT ( term F ID INT ( factor F ID INT ( p 1 : statement ID ( idlist ) p 2 : statement ID = expr * ID INT ( ) expr p 1, p 2 p 1, p 2 p 1, p 2 factor p 3 p 4 p 5 statement p 1, p 2... ID ( ) = A grammar with common prefixes is not LL(1). It may be LL(k). This one is not. Why? Compiler Construction 2012 F05-5 LL(2) table Compiler Construction 2012 F05-6 LL(1) with local lookahead p 1 : statement ID ( idlist ) p 2 : statement ID = expr p 1 : statement ID ( idlist ) p 2 : statement ID = expr ID ID ID ( ID ( ID = ( (... statement p 1 p 2... statement... ID ( ) = ( = p 1 p 2 Compiler Construction 2012 F05-7 Compiler Construction 2012 F05-8

JavaCC LOOKAHEAD(k) in JavaCC can handle general LL(k) k=1 is the default global lookahead k 2 must be specified; may be inefficient local lookahead may be specified for individual productions; may improve readability at a reasonable cost. local lookahead may be specified as number of tokens, grammatical expression,... JavaCC detects choice points where LOOKAHEAD(1) is not sufficient and suggests that the LOOKAHEAD should be increased After inserting LOOKAHEAD(k) with some k JavaCC will not check if this is sufficient but use the first rule that matches. Make sure that you understand what you are doing. If the grammar is ambiguous and you insert a LOOKAHEAD then JavaCC will not complain. Compiler Construction 2012 F05-9 JavaCC example void stmt() : <ID> "(" idlist() ")" ";" <ID> <ASSIGN> expr() > javacc PL0.jj Reading from file PL0.jj... Warning: Choice conflict involving two expansions at line 25, column 9 and line 26, column 9 respectively. A common prefix is: <ID> Consider using a lookahead of 2 for earlier expansion. Parser generated with 0 errors and 1 warnings. > How will the generated parser handle this conflict? Compiler Construction 2012 F05-11 Compiler Construction 2012 F05-10 Use of local lookahead in JavaCC void statement() : LOOKAHEAD(2) <ID> "(" idlist() ")" ";" <ID> "=" expr()... JavaCC will try the alternatives in order using lookahead 2 for the first one and lookahead=1 for the second... Compiler Construction 2012 F05-12

JavaCC dangling else Syntactic analysis void ifstmt(): <IF> expr <THEN> statement [<ELSE> statement] source code scanner tokens DFA described by RE lexical analysis Warning: Choice conflict in [...] construct at line 35, column 30. Expansion nested within construct and expansion following construct have common prefixes, one of which is: <ELSE> Consider using a lookahead of 2 or more for nested expansion. parser (parse tree) instrumented with semantic actions implicit How will the generated parser handle this conflict? LOOKAHEAD(1) before <ELSE> removes the warning. AST described by abstract grammar syntactic analysis Compiler Construction 2012 F05-13 What is an AST? Compiler Construction 2012 F05-14 Abstract vs. concrete grammar An abstract representation of a program revealing the structure suitable for analysis without unnecessary information required by for parsing independent of the parsing algorithm described by an abstract grammar The concrete grammar describes the external representation, a string (sequence) of tokens. It defines a language. It is used backwards to find a derivation for a given string. It can be used to build an abstract representation. The abstract grammar describes the internal representation, a labeled tree. It defines a data type. It is used forwards to build tree. Compiler Construction 2012 F05-15 Compiler Construction 2012 F05-16

Concrete/Abstract grammar Java model expr term ( + term)* term factor ( * factor)* factor ID INT ( expr ) Add: Expr ::= Expr Expr Mul: Expr ::= Expr Expr IdExpr: Expr ::= ID IntExpr: Expr ::= INT abstract class Expr class Add extends Expr Expr expr1, expr2; class Mul extends Expr Expr expr1, expr2; class IdExpr extends Expr String ID; class IntExpr extends Expr String INT; Compiler Construction 2012 F05-17 Java model with computation Compiler Construction 2012 F05-18 JastAdd specification class Add extends Expr Expr expr1, expr2; String tostring() return expr1.tostring() + " + " + expr2.tostring(); class IdExpr extends Expr String ID; String tostring() return ID; abstract Expr; Add: Expr ::= Expr Expr; Mul: Expr ::= Expr Expr; IdExpr: Expr ::= <ID>; IntExpr: Expr ::= <INT>; It is easy to write a program that converts a jastadd specification to Java classes. Compiler Construction 2012 F05-19 Compiler Construction 2012 F05-20

Parse tree, AST, Object diagram expr term + term factor ID a factor ID b * factor ID c Id a Add Id b Mul Id c Id a Add Id b Mul parse tree AST object diagram Id c Example, grammar CFG (EBNF): program begin [ stmt ( ; stmt)* ] end stmt ID = expr stmt if expr then stmt [ else stmt ] expr ID INT JastAdd grammar Program ::= Stmt*; abstract Stmt; Assinment: Stmt ::= <ID> Expr; IfStmt: Stmt ::= Expr Then: Stmt [Else: Stmt]; abstract Expr; IdExpr: Expr ::= <ID>; IntExpr: Expr ::= <INT>; Compiler Construction 2012 F05-21 Generated Java classes Compiler Construction 2012 F05-22 Generated Java classes Program ::= Stmt*; abstract Stmt; Assignment: Stmt ::= IdExpr Expr; IfStmt: Stmt ::= Expr Then:Stmt [Else:Stmt]; class Program extends ASTNode int getnumstmt(); Stmt getstmt(int i); abstract class Stmt extends ASTNode; class Assignment extends Stmt IdExpr getidexpr(); Expr getexpr(); class IfStmt extends Stmt Expr getexpr(); Stmt getthen(); boolean haselse(); Stmt getelse(); Compiler Construction 2012 F05-23 Compiler Construction 2012 F05-24

Generated Java classes ASTNode and SimpleNode abstract Expr; IdExpr: Expr ::= <ID:String>; IntExpr: Expr ::= <INT:String>; abstract class Expr extends ASTNode; class IdExpr extends Expr String getid(); class IntExpr extends Expr String getint(); class ASTNode extends SimpleNode int getnumchild(); ASTNode getchild(int i); ASTNode getparent(); class SimpleNode implements Node void dump(string prefix); Compiler Construction 2012 F05-25 Ambiguity Can an abstract grammar be ambiguous? No, an abstract grammar is essentially a data type In Java it is implemented by a set of classes that can be used to build ASTs. Different trees are just different. Compiler Construction 2012 F05-28 JastAdd notation overview notation generated Java code A1; class A1 extends ASTNode A2: B; class A2 extends B A3: B ::= C D; class A3 extends B C getc()... D getd()... abstract A4... ; abstract class A4... A5: B ::= [C]; class A5 extends B boolean hasc()... C getc()... A6: B ::= C*; class A6 extends B int getnumc()... C getc(int i)... A7: B ::= <C: D>; class A7 extends B D getc()... void setc(d d)... Compiler Construction 2012 F05-29 Compiler Construction 2012 F05-30

JastAdd notation overview notation generated Java code A8: B ::= Left: C Right: C; class A8 extends B C getleft()... C getright()... A10: A9 ::=... ; class A10 extends A9 A11: A ::= B C* [D]; class A11 extends A B getb()... int getnumc()...... Other JastAdd classes generated Java code class ASTNode extends SimpleNode int getnumchild()... ASTNode getchild(int index)... ASTNode getparent()... class List extends ASTNode... class Opt extends ASTNode... Avoid using getchild(int)! getchild and getparent will traverse List and Opt nodes explicitly. With Item: A ::= A B* [C] item.getb(2) and item.getc() are much clearer than (B) item.getchild(1).getchild(2) and (C) item.getchild(2).getchild(0) Compiler Construction 2012 F05-31 Inheriting right-hand side structure Compiler Construction 2012 F05-33 Defining an abstract grammar Useful for, e.g., binary operations and the Template Method design pattern: abstract BinExpr : Expr ::= Left:Expr Right:Expr; Add : BinExpr; Sub : BinExpr;... This is object oriented modeling. Which are the objects? Program, statement, statement list, assignment, if-statement, expression, identifier, numeral, a sum (+ with two operands),... Which are the generalizations (abstract classes)? Statement, expression,... Which are the components of an object? An assignment has an identifier and an expression. An if-statement has an expression and two statement lists, an then part and an else part. Compiler Construction 2012 F05-34 Compiler Construction 2012 F05-35

Use good names Recommendations when you write... the following should make sense A: B ::=... an A is a special kind of B C ::= D E F a C has a D, an E, and an F D ::= X:E Y:E a D has one E called X and another E called Y E ::= [F] an E may have an F F ::= <K:T> an F has a token called K with a value of type T L ::= M* an L has a number of Ms Keep things together reflecting the logical structure. Keep the rules simple. Introduce new classes for structures that appear in several places. Don t try to model restrictions that are better implemented by the concrete grammar or later semantic checks. A list with at least one element should be modeled by a simple list. Compiler Construction 2012 F05-36 Defining a concrete LL grammar Compiler Construction 2012 F05-37... Defining a concrete LL grammar Define the abstract grammar before the concrete grammar if this is an option. Use * and [] in favor of recursion and ɛ. For high level constructs like procedures and statements introduce keywords and separators to make parsing simple and unambiguous, LL(1). There will often be one grammar production for each AST class; the abstract classes will correspond to a selection of the alternatives for the subclasses. For low level constructs, primarily expressions, the concrete grammar should instead be based upon a verbal description making precedences clear: An expression is a sequence of one or more terms separated by + or - signs. A term is a sequence of one or more factors separated by * or / signs, etc. Focus on recognizing the structure of the program to make the AST generation easy. Don t try to detect semantic errors such as type errors with the CFG. It is usually impossible to detect all of them with the grammar. A parsing grammar will often be much simpler than a language description to be read by humans, since the latter will often describe the grammar and the meaning of constructs at the same time. Compiler Construction 2012 F05-38 Compiler Construction 2012 F05-39

Building the AST Hand-coded parser without actions A grammar production can be augmented with semantic actions. A semantic action is a piece of code that is executed when the production is used. Semantic actions can be used to build the AST. A parser generator may have special support for building ASTs. void stmt() switch(token) case IF: accept(if); expr(); accept(then); stmt(); break; case ID: accept(id); accept(eq); expr(); break; default: error(); Compiler Construction 2012 F05-40 Hand-coded parser with semantic actions Stmt stmt() switch(token.kind) case IF: accept(if); Expr e = expr(); accept(then); Stmt s = stmt(); return new IfStmt(e, s); case ID: IdExpr id = new IdExpr(token.image); accept(id); accept(eq); Expr e = expr(); return new Assignment(id, e); Compiler Construction 2012 F05-41.jj file with actions Stmt stmt() : Stmt s; (s = ifstmt() s = assignment()) return s; IfStmt ifstmt() : Expr e; Stmt s; "if" e = expr() "then" s = stmt() return new IfStmt(e, s); Assignment assignment() : Token t; Expr e; t = <ID> "=" e = expr() return new Assignment(new IdExpr(t.image), e); Compiler Construction 2012 F05-42 Compiler Construction 2012 F05-43

Building trees with JJTree jjtree stack jjtree is a preprocessor to javacc using a.jjt file grammar productions with tree building directives #... jjtree generates a.jj file with tree building actions builds a parse tree by default (Assignment 2) with option NODE DEFAULT VOID=true building requires # directives jjtree has a stack for collecting the children of a node. It is easy to build a node corresponding to a nonterminal ignore building a node for a terminal/nonterminal build List and Opt nodes build left recursive trees when the grammar is right recursive build trees when the grammar has common prefixes Compiler Construction 2012 F05-44 Building an AST node for a nonterminal Add a # directive to the corresponding AST class void ifstmt() #IfStmt : <IF> expr() <THEN> stmt() <FI> IfStmt: Stmt ::= Expr Stmt the current top of stack is marked expr() pushes an Expr node stmt() pushes a Stmt node a new IfStmt node is created the nodes above the mark will be popped and become children of the IfStmt node this node will be pushed on the stack Compiler Construction 2012 F05-45 Skipping nodes A terminal or nonterminal without a # directive will be skipped void ifstmt(): <IF> expr() <THEN> stmt() <FI> expr() pushes an Expr node if it has a # directive no node is pushed for THEN stmt() pushes a Stmt node if it has a # directive no nodes will be popped by this production no IfStmt is created this is probably an error Compiler Construction 2012 F05-46 Compiler Construction 2012 F05-47

Building a token node IdExpr: Expr ::= <ID:String>; void idexpr() #IdExpr : Token t; t = <ID> jjtthis.setid(t.image); jjtthis refers to the new IdExpr node the setid(string) method has been generated by jastadd Compiler Construction 2012 F05-48 Why #List(true)? the List will be built even it is empty with #List(false) nothing will be build for an empty list with jastadd empty trees must be generated Program Building a List node void program() #Program : "begin" ([stmt() (";" stmt())*]) #List(true) "end" Program ::= Stmt*; a marker for Program is put on the stack begin is parsed without pushing anything a marker for List is put on the stack a node is pushed on the stack for each stmt the parser reaches #List(true) and all nodes above the List marker are popped and become children of a new List node this node is pushed on the stack the end token is accepted the only node above the Program mark is popped an becomes the child of a new Program node this node is pushed on the stack Compiler Construction 2012 F05-49 Optional node void classdecl() #ClassDecl : "class" classid() (["extends" classid()]) #Opt(true) ClassBody() an Opt node will be built even if there is no extends clause with jastadd empty Opt nodes must be generated List ClassDecl Stmt... Stmt ClassId Opt ClassBody 0 or more may be empty... Compiler Construction 2012 F05-50 Compiler Construction 2012 F05-51

Building left-recursive trees with jjtree abstract Expr; BinExpr: Expr ::= Left:Expr Right:Expr; Mul: BinExpr; Div: BinExpr; void expr() : factor() ( "*" factor() #Mul(2) "/" factor() #Div(2))* #Mul(2) builds a Mul node pops two nodes from the stack and turns them into children pushes the new Mul node to the stack Compiler Construction 2012 F05-52 Returning the AST to the client Creating and using the parser Parser parser = new Parser(...); Start start = parser.start(); jjtree specification Start start() #Start : expr(); return jjtthis; jjtthis refers to the Start node created by start() Compiler Construction 2012 F05-53