LL(k). Compiler Construction. Top-down Parsing. LL(1) parsing engine.

Compiler Construction
More LL parsing. Abstract syntax trees
Lennart Andersson
Revision 2011-01-31
Compiler Construction 2010

LL(k)

Related names:
- top-down: the parse tree is constructed top-down
- recursive descent: if it is implemented using one (recursive) method for each nonterminal
- LL: Left-to-right scanning of input using Leftmost derivation
- predictive: the next k tokens are used to predict (decide) which production to use. k is often 1.

Top-down Parsing

Recursive descent
- is easy to understand and to implement
- most ambiguities need to be treated by rewriting the grammar
- cannot handle some common situations (left recursion, common prefixes). In these cases the grammar needs to be rewritten slightly.

LL(1) parsing engine

[Figure: the LL engine with its stack (holding S) and the input ID , ID $]

Productions:
    0: S → E $
    1: E → ID T
    2: T → , ID T
    3: T → ε

Parse table:
         ID    ,    $
    S    0
    E    1
    T          2    3
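The LL(1) parsing engine slide can be sketched as a small table-driven loop. This is only an illustration under assumptions: the class name `LlEngineDemo`, the string encoding of tokens, and the `parse` method are hypothetical and not from the slides; only the four productions and the table entries are taken from the slide.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Sketch of a table-driven LL(1) engine for the grammar
//   0: S -> E $   1: E -> ID T   2: T -> , ID T   3: T -> epsilon
// All names here are illustrative assumptions.
class LlEngineDemo {
    // Right-hand sides indexed by production number.
    static final List<List<String>> RHS = List.of(
        List.of("E", "$"),
        List.of("ID", "T"),
        List.of(",", "ID", "T"),
        List.of());

    // Parse table: nonterminal -> (lookahead token -> production number).
    static final Map<String, Map<String, Integer>> TABLE = Map.of(
        "S", Map.of("ID", 0),
        "E", Map.of("ID", 1),
        "T", Map.of(",", 2, "$", 3));

    static boolean parse(List<String> input) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("S");
        int pos = 0;
        while (!stack.isEmpty()) {
            String top = stack.pop();
            String look = pos < input.size() ? input.get(pos) : null;
            if (TABLE.containsKey(top)) {
                // Nonterminal on top: consult the table with one lookahead token.
                Integer p = look == null ? null : TABLE.get(top).get(look);
                if (p == null) return false;          // empty table entry: syntax error
                List<String> rhs = RHS.get(p);
                for (int i = rhs.size() - 1; i >= 0; i--) {
                    stack.push(rhs.get(i));           // push the RHS in reverse order
                }
            } else {
                // Terminal on top: it must match the next input token.
                if (!top.equals(look)) return false;
                pos++;
            }
        }
        return pos == input.size();
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("ID", ",", "ID", "$")));
        System.out.println(parse(List.of("ID", "ID", "$")));
    }
}
```

The right-hand side is pushed in reverse so that its leftmost symbol ends up on top of the stack, which is what makes the derivation leftmost.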

Choice points in EBNF grammar

- multiple productions for some nonterminal
- alternatives: |
- optionals: [ ] or ?
- iteration: (...)* or (...)+

The recursive descent parser must be able to make the correct choice looking at the next token (or the next k tokens).

If JavaCC suggests using more lookahead then compute FIRST and FOLLOW sets at the choice point before following the advice.

Left recursive grammar

    p1: term → term "*" factor
    p2: term → factor
    p3: factor → ID
    p4: factor → INT

              NLBL   FIRST
    term      F      ID INT
    factor    F      ID INT

              ID        INT
    term      p1, p2    p1, p2
    factor    p3        p4

A left recursive grammar is not LL(k).

Grammar with common prefix

    p1: expr → factor "*" expr
    p2: expr → factor
    p3: factor → ID
    p4: factor → INT
    p5: factor → "(" expr ")"

              NLBL   FIRST
    expr      F      ID INT (
    factor    F      ID INT (

              *    ID        INT       (         )
    expr           p1, p2    p1, p2    p1, p2
    factor         p3        p4        p5

A grammar with common prefixes is not LL(1). It may be LL(k). This one is not. Why?

Another example

    p1: statement → ID "(" idlist ")"
    p2: statement → ID "=" expr

                 ...   ID        (    )    =
    statement          p1, p2
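As a hedged illustration of computing FIRST sets at a choice point, the sketch below handles the "Grammar with common prefix" example (p1 to p5). It is simplified on purpose: it assumes no nonterminal is nullable, which holds for this grammar (NLBL is F everywhere), so FOLLOW sets are not needed. The class and method names are hypothetical.

```java
import java.util.*;

// Sketch: FIRST sets and LL(1) conflict detection for
//   p1: expr -> factor "*" expr   p2: expr -> factor
//   p3: factor -> ID   p4: factor -> INT   p5: factor -> "(" expr ")"
// Assumes no nullable symbols; names are illustrative assumptions.
class FirstSetsDemo {
    // Each production is {lhs, rhs symbol 1, rhs symbol 2, ...}.
    static final String[][] PRODUCTIONS = {
        {"expr", "factor", "*", "expr"},
        {"expr", "factor"},
        {"factor", "ID"},
        {"factor", "INT"},
        {"factor", "(", "expr", ")"},
    };

    static boolean isNonterminal(String s) {
        for (String[] p : PRODUCTIONS) if (p[0].equals(s)) return true;
        return false;
    }

    // Fixed-point iteration: FIRST(lhs) grows by FIRST of the first RHS symbol.
    static Map<String, Set<String>> firstSets() {
        Map<String, Set<String>> first = new TreeMap<>();
        for (String[] p : PRODUCTIONS) first.putIfAbsent(p[0], new TreeSet<>());
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] p : PRODUCTIONS) {
                String head = p[1];
                Set<String> add = isNonterminal(head)
                    ? new TreeSet<>(first.get(head))
                    : Set.of(head);
                changed |= first.get(p[0]).addAll(add);
            }
        }
        return first;
    }

    // LL(1) conflict: two productions of one nonterminal share a FIRST token.
    static boolean hasConflict(String nonterminal) {
        Map<String, Set<String>> first = firstSets();
        Set<String> seen = new HashSet<>();
        for (String[] p : PRODUCTIONS) {
            if (!p[0].equals(nonterminal)) continue;
            Set<String> f = isNonterminal(p[1]) ? first.get(p[1]) : Set.of(p[1]);
            for (String t : f) if (!seen.add(t)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(firstSets());
        System.out.println("expr conflict: " + hasConflict("expr"));
        System.out.println("factor conflict: " + hasConflict("factor"));
    }
}
```

The conflict for expr corresponds to the table cells holding both p1 and p2; factor has one production per token and no conflict.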

LL(2) table

    p1: statement → ID "(" idlist ")"
    p2: statement → ID "=" expr

                 ID ID    ID (    ID =    ( (    ...
    statement             p1      p2

LL(1) with local lookahead

    p1: statement → ID "(" idlist ")"
    p2: statement → ID "=" expr

                 ...   ID            (    )    =
    statement          ( : p1
                       = : p2

The entry for ID is itself a small table, indexed by the second token.

JavaCC

- can handle general LL(k)
- k=1 is the default
- global lookahead k ≥ 2 must be specified; may be inefficient
- local lookahead may be specified for individual productions; may improve readability at a reasonable cost
- local lookahead may be specified as number of tokens, grammatical expression, ...

LOOKAHEAD(k) in JavaCC

JavaCC detects choice points where LOOKAHEAD(1) is not sufficient and suggests that the LOOKAHEAD should be increased.

After inserting LOOKAHEAD(k) with some k, JavaCC will not check if this is sufficient but will use the first rule that matches. Make sure that you understand what you are doing.

If the grammar is ambiguous and you insert a LOOKAHEAD then JavaCC will not complain.
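The idea behind local lookahead for the statement grammar (p1 and p2 share the prefix ID) can be sketched in hand-written Java: the parser inspects the second token only at this one choice point. This is not JavaCC output; the class `LookaheadDemo` and the string token encoding are assumptions made for illustration.

```java
import java.util.List;

// Sketch: choosing between
//   p1: statement -> ID "(" idlist ")"
//   p2: statement -> ID "=" expr
// by peeking at the second token. Names are illustrative assumptions.
class LookaheadDemo {
    static String chooseProduction(List<String> tokens) {
        // The first token is the common prefix and cannot decide anything.
        if (tokens.size() < 2 || !tokens.get(0).equals("ID")) {
            throw new IllegalArgumentException("syntax error");
        }
        // The second token decides: this is the local lookahead of 2.
        switch (tokens.get(1)) {
            case "(": return "p1";
            case "=": return "p2";
            default: throw new IllegalArgumentException("syntax error");
        }
    }

    public static void main(String[] args) {
        System.out.println(chooseProduction(List.of("ID", "(", "ID", ")")));
        System.out.println(chooseProduction(List.of("ID", "=", "INT")));
    }
}
```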

JavaCC example

    void stmt() : {}
    {
        <ID> "(" idlist() ")" ";"
      | <ID> <ASSIGN> expr()
    }

    > javacc PL0.jj
    Reading from file PL0.jj...
    Warning: Choice conflict involving two expansions at
             line 25, column 9 and line 26, column 9 respectively.
             A common prefix is: <ID>
             Consider using a lookahead of 2 for earlier expansion.
    Parser generated with 0 errors and 1 warnings.
    >

How will the generated parser handle this conflict?

Use of local lookahead in JavaCC

    void statement() : {}
    {
        LOOKAHEAD(2) <ID> "(" idlist() ")" ";"
      | <ID> "=" expr()
      | ...
    }

JavaCC will try the alternatives in order, using lookahead 2 for the first one and lookahead 1 for the second...

JavaCC dangling else

    void ifstmt() : {}
    {
        <IF> expr() <THEN> statement() [ <ELSE> statement() ]
    }

    Warning: Choice conflict in [...] construct at line 35, column 30.
             Expansion nested within construct and expansion following construct
             have common prefixes, one of which is: <ELSE>
             Consider using a lookahead of 2 or more for nested expansion.

How will the generated parser handle this conflict?

LOOKAHEAD(1) before <ELSE> removes the warning.

Syntactic analysis

[Figure: source code → scanner (lexical analysis; a DFA described by REs) → tokens → parser (syntactic analysis; instrumented with semantic actions; the parse tree is implicit) → AST (described by an abstract grammar)]

What is an AST?

An abstract representation of a program
- revealing the structure
- suitable for analysis
- without unnecessary information required for parsing
- independent of the parsing algorithm
- described by an abstract grammar

Abstract vs. concrete grammar

The concrete grammar describes the external representation, a string (sequence) of tokens. It defines a language. It is used backwards to find a derivation for a given string. It can be used to build an abstract representation.

The abstract grammar describes the internal representation, a labeled tree. It defines a data type. It is used forwards to build a tree.

Concrete/Abstract grammar

Concrete grammar:
    expr → term ("+" term)*
    term → factor ("*" factor)*
    factor → ID | INT | "(" expr ")"

Abstract grammar:
    Add: Expr ::= Expr Expr
    Mul: Expr ::= Expr Expr
    IdExpr: Expr ::= ID
    IntExpr: Expr ::= INT

Java model

    abstract class Expr { }
    class Add extends Expr { Expr expr1, expr2; }
    class Mul extends Expr { Expr expr1, expr2; }
    class IdExpr extends Expr { String ID; }
    class IntExpr extends Expr { String INT; }

Java model with computation

    class Add extends Expr {
        Expr expr1, expr2;
        public String toString() {
            return expr1.toString() + expr2.toString();
        }
    }
    class IdExpr extends Expr {
        String ID;
        public String toString() { return ID; }
    }

JastAdd specification

    abstract Expr;
    Add: Expr ::= Expr Expr;
    Mul: Expr ::= Expr Expr;
    IdExpr: Expr ::= <ID>;
    IntExpr: Expr ::= <INT>;

It is easy to write a program that converts a JastAdd specification to Java classes.

Parse tree, AST, Object diagram

[Figure: parse tree, AST, and object diagram for a + b * c. The parse tree contains expr, term, and factor nodes above ID a, ID b, and ID c. The AST and the object diagram consist of an Add node whose children are Id a and a Mul node with the children Id b and Id c.]

Example, grammar

CFG (EBNF):
    program → "begin" [ stmt ( ";" stmt )* ] "end"
    stmt → ID "=" expr
    stmt → "if" expr "then" stmt [ "else" stmt ]
    expr → ID | INT

JastAdd grammar:
    Program ::= Stmt*;
    abstract Stmt;
    Assignment: Stmt ::= <ID> Expr;
    IfStmt: Stmt ::= Expr Then: Stmt [Else: Stmt];
    abstract Expr;
    IdExpr: Expr ::= <ID>;
    IntExpr: Expr ::= <INT>;
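A runnable variant of the Java model with computation might look as follows. It builds the AST for a + b * c from the figure by hand. The constructors, the operator symbols and parentheses in `toString()`, and the class `AstModelDemo` are additions for illustration, not part of the slides.

```java
// Sketch of the Java model with a toString() computation.
// Constructors and the printed operator symbols are illustrative assumptions.
abstract class Expr { }

class Add extends Expr {
    final Expr expr1, expr2;
    Add(Expr expr1, Expr expr2) { this.expr1 = expr1; this.expr2 = expr2; }
    @Override public String toString() { return "(" + expr1 + " + " + expr2 + ")"; }
}

class Mul extends Expr {
    final Expr expr1, expr2;
    Mul(Expr expr1, Expr expr2) { this.expr1 = expr1; this.expr2 = expr2; }
    @Override public String toString() { return "(" + expr1 + " * " + expr2 + ")"; }
}

class IdExpr extends Expr {
    final String ID;
    IdExpr(String ID) { this.ID = ID; }
    @Override public String toString() { return ID; }
}

class AstModelDemo {
    // The AST for a + b * c: the Mul node is a child of the Add node.
    static Expr example() {
        return new Add(new IdExpr("a"), new Mul(new IdExpr("b"), new IdExpr("c")));
    }

    public static void main(String[] args) {
        System.out.println(example()); // prints (a + (b * c))
    }
}
```

The parentheses in the output make the tree structure visible. The AST itself stores no parentheses, since precedence is encoded by nesting.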

Generated Java classes

    Program ::= Stmt*;
    abstract Stmt;

    class Program extends ASTNode {
        int getNumStmt();
        Stmt getStmt(int i);
    }
    abstract class Stmt extends ASTNode { }

Generated Java classes

    Assignment: Stmt ::= IdExpr Expr;
    IfStmt: Stmt ::= Expr Then:Stmt [Else:Stmt];

    class Assignment extends Stmt {
        IdExpr getIdExpr();
        Expr getExpr();
    }
    class IfStmt extends Stmt {
        Expr getExpr();
        Stmt getThen();
        boolean hasElse();
        Stmt getElse();
    }

Generated Java classes

    abstract Expr;
    IdExpr: Expr ::= <ID:String>;
    IntExpr: Expr ::= <INT:String>;

    abstract class Expr extends ASTNode { }
    class IdExpr extends Expr {
        String getID();
    }
    class IntExpr extends Expr {
        String getINT();
    }

ASTNode and SimpleNode

    class ASTNode extends SimpleNode {
        int getNumChild();
        ASTNode getChild(int i);
        ASTNode getParent();
    }
    class SimpleNode implements Node {
        void dump(String prefix);
    }

Ambiguity

Can an abstract grammar be ambiguous?

No, an abstract grammar is essentially a data type. In Java it is implemented by a set of classes that can be used to build ASTs. Different trees are just different.

JastAdd notation overview

    notation                        generated Java code
    A1;                             class A1 extends ASTNode
    A2: B;                          class A2 extends B
    A3: B ::= C D;                  class A3 extends B
                                        C getC() ...
                                        D getD() ...
    abstract A4 ... ;               abstract class A4 ...
    A5: B ::= [C];                  class A5 extends B
                                        boolean hasC() ...
                                        C getC() ...
    A6: B ::= C*;                   class A6 extends B
                                        int getNumC() ...
                                        C getC(int i) ...
    A7: B ::= <C: D>;               class A7 extends B
                                        D getC() ...
                                        void setC(D d) ...
    A8: B ::= Left: C Right: C;     class A8 extends B
                                        C getLeft() ...
                                        C getRight() ...
    A10: A9 ::= ... ;               class A10 extends A9
    A11: A ::= B C* [D];            class A11 extends A
                                        B getB() ...
                                        int getNumC() ...
                                        ...

Other JastAdd classes

Generated Java code:

    class ASTNode extends SimpleNode
        int getNumChild() ...
        ASTNode getChild(int index) ...
        ASTNode getParent() ...
    class List extends ASTNode ...
    class Opt extends ASTNode ...

Avoid using getChild(int)!

getChild and getParent will traverse List and Opt nodes explicitly. With

    Item: A ::= A B* [C]

item.getB(2) and item.getC() are much clearer than

    (B) item.getChild(1).getChild(2)
    (C) item.getChild(2).getChild(0)
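To make the advice about getChild(int) concrete, here is a hand-written sketch of how an Item node with the right-hand side A B* [C] could be laid out, with List and Opt as intermediate nodes. It is not JastAdd-generated code: the simplified `ASTNode` class, the label strings, and the `Item` constructor are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for a JastAdd ASTNode (an assumption, not generated code).
class ASTNode {
    final String label;
    final List<ASTNode> children = new ArrayList<>();
    ASTNode(String label, ASTNode... kids) {
        this.label = label;
        children.addAll(List.of(kids));
    }
    int getNumChild() { return children.size(); }
    ASTNode getChild(int i) { return children.get(i); }
}

// Item with children: child 0 = A, child 1 = List of B, child 2 = Opt of C.
class Item extends ASTNode {
    Item(ASTNode a, ASTNode listOfB, ASTNode optOfC) {
        super("Item", a, listOfB, optOfC);
    }
    // Typed accessors hide the intermediate List and Opt nodes.
    ASTNode getB(int i) { return getChild(1).getChild(i); }
    boolean hasC() { return getChild(2).getNumChild() > 0; }
    ASTNode getC() { return getChild(2).getChild(0); }
}

class ChildAccessDemo {
    static Item example() {
        ASTNode list = new ASTNode("List",
            new ASTNode("B0"), new ASTNode("B1"), new ASTNode("B2"));
        ASTNode opt = new ASTNode("Opt", new ASTNode("C"));
        return new Item(new ASTNode("A"), list, opt);
    }

    public static void main(String[] args) {
        Item item = example();
        System.out.println(item.getB(2).label);                // prints B2
        System.out.println(item.getChild(1).getChild(2).label); // same node, less readable
    }
}
```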

Inheriting right-hand side structure

Useful for, e.g., binary operations and the Template Method design pattern:

    abstract BinExpr : Expr ::= Left:Expr Right:Expr;
    Add : BinExpr;
    Sub : BinExpr;
    ...

Defining an abstract grammar

This is object-oriented modeling.

Which are the objects? Program, statement, statement list, assignment, if-statement, expression, identifier, numeral, a sum (+ with two operands), ...

Which are the generalizations (abstract classes)? Statement, expression, ...

Which are the components of an object? An assignment has an identifier and an expression. An if-statement has an expression and two statement lists, a then part and an else part.

Use good names

    when you write...      the following should make sense
    A: B ::= ...           an A is a special kind of B
    C ::= D E F            a C has a D, an E, and an F
    D ::= X:E Y:E          a D has one E called X and another E called Y
    E ::= [F]              an E may have an F
    F ::= <K:T>            an F has a token called K with a value of type T
    L ::= M*               an L has a number of Ms

Recommendations

- Keep things together reflecting the logical structure.
- Keep the rules simple.
- Introduce new classes for structures that appear in several places.
- Don't try to model restrictions that are better implemented by the concrete grammar or later semantic checks. A list with at least one element should be modeled by a simple list.
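The remark on the "Inheriting right-hand side structure" slide about the Template Method pattern can be illustrated in plain Java. In this sketch BinExpr owns the Left and Right children and a template method, while Add and Sub supply only the operation. The `value()` and `apply()` methods and the integer `IntExpr` leaf are hypothetical; JastAdd would generate the accessors, not this evaluation code.

```java
// Sketch: subclasses inherit the Left/Right structure from BinExpr.
// value() and apply() are illustrative assumptions, not generated code.
abstract class Expr {
    abstract int value();
}

class IntExpr extends Expr {
    private final int n;
    IntExpr(int n) { this.n = n; }
    @Override int value() { return n; }
}

abstract class BinExpr extends Expr {
    private final Expr left, right;
    BinExpr(Expr left, Expr right) { this.left = left; this.right = right; }
    Expr getLeft() { return left; }
    Expr getRight() { return right; }

    // Template method: the traversal is fixed here, the operation varies.
    @Override final int value() {
        return apply(getLeft().value(), getRight().value());
    }
    abstract int apply(int l, int r);
}

class Add extends BinExpr {
    Add(Expr left, Expr right) { super(left, right); }
    @Override int apply(int l, int r) { return l + r; }
}

class Sub extends BinExpr {
    Sub(Expr left, Expr right) { super(left, right); }
    @Override int apply(int l, int r) { return l - r; }
}

class BinExprDemo {
    public static void main(String[] args) {
        Expr e = new Sub(new Add(new IntExpr(1), new IntExpr(2)), new IntExpr(4));
        System.out.println(e.value()); // (1 + 2) - 4 = -1
    }
}
```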

Defining a concrete LL grammar

- Define the abstract grammar before the concrete grammar if this is an option.
- Use * and [] in favor of recursion and ε.
- For high level constructs like procedures and statements, introduce keywords and separators to make parsing simple and unambiguous, LL(1).
- There will often be one grammar production for each AST class; the abstract classes will correspond to a selection of the alternatives for the subclasses.
- For low level constructs, primarily expressions, the concrete grammar should instead be based upon a verbal description making precedences clear: An expression is a sequence of one or more terms separated by + or - signs. A term is a sequence of one or more factors separated by * or / signs, etc.

Defining a concrete LL grammar...

- Focus on recognizing the structure of the program to make the AST generation easy.
- Don't try to detect semantic errors such as type errors with the CFG. It is usually impossible to detect all of them with the grammar.
- A parsing grammar will often be much simpler than a language description to be read by humans, since the latter will often describe the grammar and the meaning of constructs at the same time.

Building the AST

A grammar production can be augmented with semantic actions. A semantic action is a piece of code that is executed when the production is used. Semantic actions can be used to build the AST. A parser generator may have special support for building ASTs.

Hand-coded parser without actions

    void stmt() {
        switch (token) {
            case IF:
                accept(IF); expr(); accept(THEN); stmt(); break;
            case ID:
                accept(ID); accept(EQ); expr(); break;
            default:
                error();
        }
    }

Hand-coded parser with semantic actions

    Stmt stmt() {
        switch (token.kind) {
            case IF: {
                accept(IF);
                Expr e = expr();
                accept(THEN);
                Stmt s = stmt();
                return new IfStmt(e, s);
            }
            case ID: {
                IdExpr id = new IdExpr(token.image);
                accept(ID);
                accept(EQ);
                Expr e = expr();
                return new Assignment(id, e);
            }
        }
    }

.jj file with actions

    Stmt stmt() :
    { Stmt s; }
    {
        ( s = ifstmt() | s = assignment() )
        { return s; }
    }

    IfStmt ifstmt() :
    { Expr e; Stmt s; }
    {
        "if" e = expr() "then" s = stmt()
        { return new IfStmt(e, s); }
    }

    Assignment assignment() :
    { Token t; Expr e; }
    {
        t = <ID> "=" e = expr()
        { return new Assignment(new IdExpr(t.image), e); }
    }

Building trees with JJTree

jjtree is a preprocessor to javacc
- using a .jjt file: grammar productions with tree building directives, #...
- jjtree generates a .jj file with tree building actions
- builds a parse tree by default (Assignment 2)
- with option NODE_DEFAULT_VOID=true, building requires # directives

jjtree stack

jjtree has a stack for collecting the children of a node. It is easy to
- build a node corresponding to a nonterminal
- ignore building a node for a terminal/nonterminal
- build List and Opt nodes
- build left recursive trees when the grammar is right recursive
- build trees when the grammar has common prefixes
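A self-contained version of the hand-coded parser with semantic actions could look like the sketch below. The token representation (`Token` with string `kind` and `image`), the error handling through exceptions, the `toString()` methods, and the `expr()` rule (ID or INT, as in the example grammar) are assumptions added so that the code runs on its own. The AST classes are minimal hand-written stand-ins, not JastAdd output.

```java
import java.util.List;

// Minimal stand-ins for tokens and AST classes (illustrative assumptions).
class Token {
    final String kind, image;
    Token(String kind, String image) { this.kind = kind; this.image = image; }
}
abstract class Stmt { }
abstract class Expr { }
class IdExpr extends Expr {
    final String id;
    IdExpr(String id) { this.id = id; }
    @Override public String toString() { return id; }
}
class IntExpr extends Expr {
    final String value;
    IntExpr(String value) { this.value = value; }
    @Override public String toString() { return value; }
}
class Assignment extends Stmt {
    final IdExpr id; final Expr e;
    Assignment(IdExpr id, Expr e) { this.id = id; this.e = e; }
    @Override public String toString() { return id + " = " + e; }
}
class IfStmt extends Stmt {
    final Expr cond; final Stmt then;
    IfStmt(Expr cond, Stmt then) { this.cond = cond; this.then = then; }
    @Override public String toString() { return "if " + cond + " then " + then; }
}

class StmtParser {
    private final List<Token> tokens;
    private int pos = 0;
    StmtParser(List<Token> tokens) { this.tokens = tokens; }

    // Current token, or a synthetic EOF token at the end of input.
    private Token token() {
        return pos < tokens.size() ? tokens.get(pos) : new Token("EOF", "");
    }
    private void accept(String kind) {
        if (!token().kind.equals(kind)) {
            throw new IllegalStateException("expected " + kind + ", found " + token().kind);
        }
        pos++;
    }

    // stmt -> "if" expr "then" stmt | ID "=" expr
    Stmt stmt() {
        switch (token().kind) {
            case "IF": {
                accept("IF");
                Expr e = expr();
                accept("THEN");
                Stmt s = stmt();
                return new IfStmt(e, s);      // semantic action: build the node
            }
            case "ID": {
                IdExpr id = new IdExpr(token().image);
                accept("ID");
                accept("EQ");
                Expr e = expr();
                return new Assignment(id, e);
            }
            default:
                throw new IllegalStateException("unexpected " + token().kind);
        }
    }

    // expr -> ID | INT
    Expr expr() {
        Token t = token();
        switch (t.kind) {
            case "ID": accept("ID"); return new IdExpr(t.image);
            case "INT": accept("INT"); return new IntExpr(t.image);
            default: throw new IllegalStateException("unexpected " + t.kind);
        }
    }
}
```

Each parsing method returns the AST node for the construct it recognized, so the tree is assembled bottom-up as the recursive calls return.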

Building an AST node for a nonterminal

Add a # directive naming the corresponding AST class:

    void ifstmt() #IfStmt : {}
    {
        <IF> expr() <THEN> stmt() <FI>
    }

    IfStmt: Stmt ::= Expr Stmt;

- the current top of stack is marked
- expr() pushes an Expr node
- stmt() pushes a Stmt node
- a new IfStmt node is created
- the nodes above the mark will be popped and become children of the IfStmt node
- this node will be pushed on the stack

Skipping nodes

A terminal or nonterminal without a # directive will be skipped.

    void ifstmt() : {}
    {
        <IF> expr() <THEN> stmt() <FI>
    }

- expr() pushes an Expr node if it has a # directive
- no node is pushed for THEN
- stmt() pushes a Stmt node if it has a # directive
- no nodes will be popped by this production
- no IfStmt is created; this is probably an error

Building a token node

    IdExpr: Expr ::= <ID:String>;

    void idexpr() #IdExpr :
    { Token t; }
    {
        t = <ID> { jjtThis.setID(t.image); }
    }

- jjtThis refers to the new IdExpr node
- the setID(String) method has been generated by jastadd

Building a List node

    void program() #Program : {}
    {
        "begin" ( [ stmt() ( ";" stmt() )* ] ) #List(true) "end"
    }

    Program ::= Stmt*;

- a marker for Program is put on the stack
- "begin" is parsed without pushing anything
- a marker for List is put on the stack
- a node is pushed on the stack for each stmt
- the parser reaches #List(true) and all nodes above the List marker are popped and become children of a new List node
- this node is pushed on the stack
- the "end" token is accepted
- the only node above the Program mark is popped and becomes the child of a new Program node
- this node is pushed on the stack
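The marker mechanism described for "Building a List node" can be simulated in a few lines of Java. This is a simplified model of the idea, not JJTree's actual implementation: the classes `Node`, `NodeStack`, and `JjtreeStackDemo` and their methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a node stack with markers (an assumption, not JJTree code).
class Node {
    final String label;
    final List<Node> children = new ArrayList<>();
    Node(String label) { this.label = label; }
    @Override public String toString() {
        return children.isEmpty() ? label : label + children;
    }
}

class NodeStack {
    private final List<Node> nodes = new ArrayList<>();
    private final List<Integer> marks = new ArrayList<>();

    // Remember the current height of the stack.
    void mark() { marks.add(nodes.size()); }

    void push(Node n) { nodes.add(n); }

    // Pop all nodes above the latest mark, make them children of a
    // new node, and push that node.
    Node close(String label) {
        int from = marks.remove(marks.size() - 1);
        Node parent = new Node(label);
        List<Node> above = nodes.subList(from, nodes.size());
        parent.children.addAll(above);
        above.clear();
        nodes.add(parent);
        return parent;
    }
}

class JjtreeStackDemo {
    // Mirrors the steps for: "begin" ([stmt() (";" stmt())*]) #List(true) "end"
    static Node buildProgram(int statements) {
        NodeStack stack = new NodeStack();
        stack.mark();                          // marker for Program
        stack.mark();                          // marker for List
        for (int i = 0; i < statements; i++) {
            stack.push(new Node("Stmt"));      // each stmt() pushes a node
        }
        stack.close("List");                   // built even when empty, like #List(true)
        return stack.close("Program");
    }

    public static void main(String[] args) {
        System.out.println(buildProgram(2));   // prints Program[List[Stmt, Stmt]]
        System.out.println(buildProgram(0));   // prints Program[List]
    }
}
```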

Why #List(true)?

- the List will be built even if it is empty
- with #List(false) nothing will be built for an empty list
- with jastadd empty trees must be generated

[Figure: a Program node with a List child, which has 0 or more Stmt children and may be empty]

Optional node

    void classdecl() #ClassDecl : {}
    {
        "class" classid() ( [ "extends" classid() ] ) #Opt(true) ClassBody()
    }

- an Opt node will be built even if there is no extends clause
- with jastadd empty Opt nodes must be generated

[Figure: a ClassDecl node with the children ClassId, Opt (may be empty), and ClassBody]

Building left-recursive trees with jjtree

    abstract Expr;
    BinExpr: Expr ::= Left:Expr Right:Expr;
    Mul: BinExpr;
    Div: BinExpr;

    void expr() : {}
    {
        factor() ( "*" factor() #Mul(2) | "/" factor() #Div(2) )*
    }

#Mul(2)
- builds a Mul node
- pops two nodes from the stack and turns them into children
- pushes the new Mul node to the stack

Returning the AST to the client

Creating and using the parser:

    Parser parser = new Parser(...);
    Start start = parser.start();

jjtree specification:

    Start start() #Start : {}
    {
        expr() { return jjtThis; }
    }

jjtThis refers to the Start node created by start().
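The effect of #Mul(2) and #Div(2) inside the iteration can be traced with a small stack simulation. To keep it short, nodes are represented as strings and the input is a flat list alternating operands and operators. Both choices, and the class name `LeftAssocDemo`, are assumptions for illustration; this is not JJTree-generated code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Sketch: how factor() ( "*" factor() #Mul(2) | "/" factor() #Div(2) )*
// yields a left-leaning tree. Nodes are plain strings for brevity.
class LeftAssocDemo {
    // tokens alternate: operand, operator, operand, operator, operand, ...
    static String build(List<String> tokens) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push(tokens.get(0));                       // first factor()
        for (int i = 1; i + 1 < tokens.size(); i += 2) {
            String label = tokens.get(i).equals("*") ? "Mul" : "Div";
            stack.push(tokens.get(i + 1));               // next factor()
            String right = stack.pop();                  // pop two nodes ...
            String left = stack.pop();
            stack.push(label + "(" + left + ", " + right + ")"); // ... push the new node
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        System.out.println(build(List.of("a", "*", "b", "/", "c"))); // prints Div(Mul(a, b), c)
    }
}
```

Because each new node immediately becomes the left operand of the next operator, the iteration in the grammar produces the same tree shape as a left-recursive production would.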