MiniJava Parser and AST. Winter /27/ Hal Perkins & UW CSE E1-1

Similar documents
Syntactic Analysis. Syntactic analysis, or parsing, is the second phase of compilation: The token file is converted to an abstract syntax tree.

Compiler Passes. Syntactic Analysis. Context-free Grammars. Syntactic Analysis / Parsing. EBNF Syntax of initial MiniJava.

Context-free grammars (CFG s)

Compiler Passes. Semantic Analysis. Semantic Analysis/Checking. Symbol Tables. An Example

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

CPS 506 Comparative Programming Languages. Syntax Specification

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

CSE P 501 Compilers. Implementing ASTs (in Java) Hal Perkins Autumn /20/ Hal Perkins & UW CSE H-1

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1

CSE P 501 Compilers. Implementing ASTs (in Java) Hal Perkins Winter /22/ Hal Perkins & UW CSE H-1

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1

Intermediate Representation. With the fully analyzed program expressed as an annotated AST, it s time to translate it into code

CSE P 501 Compilers. Static Semantics Hal Perkins Winter /22/ Hal Perkins & UW CSE I-1

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

Syntax. In Text: Chapter 3

Languages, Automata, Regular Expressions & Scanners. Winter /8/ Hal Perkins & UW CSE B-1

LL(k) Compiler Construction. Choice points in EBNF grammar. Left recursive grammar

A Short Summary of Javali

Principles of Programming Languages COMP251: Syntax and Grammars

LL(k) Compiler Construction. Top-down Parsing. LL(1) parsing engine. LL engine ID, $ S 0 E 1 T 2 3

IC Language Specification

Last Time. What do we want? When do we want it? An AST. Now!

Lecture 4 Abstract syntax

Today. Assignments. Lecture Notes CPSC 326 (Spring 2019) Quiz 4. Syntax Analysis: Abstract Syntax Trees. HW3 due. HW4 out (due next Thurs)

Chapter 4. Abstract Syntax

Decaf Language Reference Manual

Question Points Score

Program Representations

Project 2 Interpreter for Snail. 2 The Snail Programming Language

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter UW CSE P 501 Winter 2016 C-1

Syntax Intro and Overview. Syntax

CMPS Programming Languages. Dr. Chengwei Lei CEECS California State University, Bakersfield

CSCI 1260: Compilers and Program Analysis Steven Reiss Fall Lecture 4: Syntax Analysis I

The Decaf Language. 1 Lexical considerations

Note first midterm date: Wed. evening, Feb. 23 Today's class: Types in OCaml and abstract syntax

Semantic actions for declarations and expressions

The Abstract Syntax Tree Differences Between Tolmach's and Porter's Representations

MIT Semantic Analysis. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

Intermediate Formats. for object oriented languages

CMPT 379 Compilers. Parse trees

Specifying Syntax. An English Grammar. Components of a Grammar. Language Specification. Types of Grammars. 1. Terminal symbols or terminals, Σ

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

1 Lexical Considerations

EDA180: Compiler Construc6on. More Top- Down Parsing Abstract Syntax Trees Görel Hedin Revised:

Lexical Considerations

CSE 401/M501 Compilers

Lexical Considerations

CSE 401 Midterm Exam Sample Solution 2/11/15

Chapter 3. Describing Syntax and Semantics ISBN

A Simple Syntax-Directed Translator

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

EDA180: Compiler Construc6on. Top- down parsing. Görel Hedin Revised: a

Syntax-Directed Translation. Lecture 14

Static Semantics. Winter /3/ Hal Perkins & UW CSE I-1

What is a compiler? var a var b mov 3 a mov 4 r1 cmpi a r1 jge l_e mov 2 b jmp l_d l_e: mov 3 b l_d: ;done

Intermediate Formats. for object oriented languages

The Decaf language 1

Lexical and Syntax Analysis

Parser and syntax analyzer. Context-Free Grammar Definition. Scanning and parsing. How bottom-up parsing works: Shift/Reduce tecnique.

Building lexical and syntactic analyzers. Chapter 3. Syntactic sugar causes cancer of the semicolon. A. Perlis. Chomsky Hierarchy

(Not Quite) Minijava

Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1

CSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions

Syntax and Grammars 1 / 21

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

CA4003 Compiler Construction Assignment Language Definition

A programming language requires two major definitions A simple one pass compiler

Lecture 14 Sections Mon, Mar 2, 2009

Defining syntax using CFGs

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

Table-Driven Top-Down Parsers

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Syntax Errors; Static Semantics

CIT 3136 Lecture 7. Top-Down Parsing

CSE 3302 Programming Languages Lecture 2: Syntax

JavaCC Parser. The Compilation Task. Automated? JavaCC Parser

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

CS 11 Ocaml track: lecture 6

Parsing a primer. Ralf Lämmel Software Languages Team University of Koblenz-Landau

Today Winter Compiler Construction T5 AST Mooly Sagiv and Roman Manevich School of Computer Science Tel-Aviv University

UNIVERSITY OF CALIFORNIA Department of Electrical Engineering and Computer Sciences Computer Science Division

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

CSE 340 Fall 2014 Project 4

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

A clarification on terminology: Recognizer: accepts or rejects strings in a language. Parser: recognizes and generates parse trees (imminent topic)

Intermediate Formats. Martin Rinard Laboratory for Computer Science

Error Detection in LALR Parsers. LALR is More Powerful. { b + c = a; } Eof. Expr Expr + id Expr id we can first match an id:

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

Syntax. A. Bellaachia Page: 1

CS143 Handout 03 Summer 2012 June 27, 2012 Decaf Specification

Compilers. Compiler Construction Tutorial The Front-end

For example, we might have productions such as:

COMP520 - GoLite Type Checking Specification

CSCE 314 Programming Languages

Chapter 3. Syntax - the form or structure of the expressions, statements, and program units

Describing Syntax and Semantics

CSE 401 Midterm Exam Sample Solution 11/4/11

Overview of Assignment 5

Decaf Language Reference

Lab 2 Tutorial (An Informative Guide)

Transcription:

CSE 401 Compilers MiniJava Parser and AST Hal Perkins Winter 2009 1/27/2009 2002-09 Hal Perkins & UW CSE E1-1

Abstract Syntax Trees The parser s output is an abstract syntax tree (AST) representing the grammatical structure of the parsed input ASTs represent only semantically meaningful aspects of input program, unlike concrete syntax trees which record the complete textual form of the input There s no need to record keywords or punctuation like (), ;, else The rest of compiler only cares about the abstract structure 1/27/2009 2002-09 Hal Perkins & UW CSE E1-2

MiniJava AST Node Classes Each node in an AST is an instance of an AST class IfStmt, AssignStmt, AddExpr, VarDecl, etc. Each AST class declares its own instance variables holding its AST subtrees IfStmt has testexpr, thenstmt, and elsestmt AssignStmt has lhsvar and rhsexpr AddExpr has arg1expr and arg2expr VarDecl has typeexpr and varname CSE401 Wi09 3

AST Class Hierarchy AST classes are organized into an inheritance hierarchy based on commonalities of meaning and structure Each "abstract non-terminal" that has multiple alternative concrete forms will have an abstract class that s the superclass of the various alternative forms Stmt is abstract superclass of IfStmt, AssignStmt, etc. Expr is abstract superclass of AddExpr, VarExpr, etc. Type is abstract superclass of IntType, ClassType, etc. CSE401 Wi09 4

AST Extensions For Project New variable declarations: StaticVarDecl New types: DoubleType ArrayType yp New/changed statements: IfStmt can omit else branch ForStmt BreakStmt ArrayAssignStmt New expressions: DoubleLiteralExpr OE OrExpr ArrayLookupExpr ArrayLengthExpr ArrayNewExpr CSE401 Wi09 5

Automatic Parser Generation in MiniJava We use the CUP tool to automatically create a parser from a specification file, Parser/minijava.cup The MiniJava Makefile automatically rebuilds the parser whenever its specification file fl changes A CUP file has several sections: introductory declarations included with the generated parser declarations of the terminals and nonterminals with ihtheir types The AST node or other value returned when finished parsing that nonterminal or terminal precedence declarations productions + actions CSE401 Wi09 6

Terminal Declarations Terminal declarations we saw before: /* reserved words: */ terminal CLASS, PUBLIC, STATIC, EXTENDS;... /* tokens with values: */ terminal String IDENTIFIER; terminal Integer INT_LITERAL; CSE401 Wi09 7

Nonterminal Declarations Nonterminals are similar: nonterminal Program Program; nonterminal MainClassDecl ass MainClassDecl; ass nonterminal List/*<...>*/ ClassDecls; nonterminal RegularClassDecl ClassDecl;... nonterminal List/*<Stmt>*/ Stmts; nonterminal Stmt Stmt; nonterminal List/*<Expr>*/ Exprs; nonterminal List/*<Expr>*/ MoreExprs; nonterminal Expr Expr; nonterminal String Identifier; CSE401 Wi09 8

Java Generics and MiniJava CUP did not support Java generics when the MiniJava starter code was written, so there are some hacks An example: we d like to write nonterminal List<Expr> Exprs; but instead the code has nonterminal List/*<Expr>*/ Exprs; There are other hacks. Deal with them as gracefully as you can Don t make pointless changes to the code save your energy for more interesting things 1/27/2009 2002-09 Hal Perkins & UW CSE E1-9

Precedence Declarations Can specify precedence and associativity of operators equal precedence in a single declaration lowest precedence textually first specify left, right, or nonassoc with each declaration Examples: precedence left AND_AND; precedence nonassoc EQUALS_EQUALS, EQUALS, EXCLAIM_EQUALS; EQUALS; precedence left LESSTHAN, LESSEQUAL, GREATEREQUAL, GREATERTHAN; precedence left PLUS, MINUS; precedence left STAR, SLASH; precedence left EXCLAIM; precedence left PERIOD; CSE401 Wi09 10

Productions All of the form: LHS ::= RHS1 {: Java code 1 :} RHS2 {: Java code 2 :}... RHSn {: Java code n :}; Can label symbols in RHS with :var suffix to refer to its result value in Java code varleft is set to line in input where var symbol was CSE401 Wi09 11

Productions (cont.) Example Expr ::= Expr:arg1 PLUS Expr:arg2 {: RESULT = new AddExpr(arg1, arg2, arg1left);:} INT_LITERAL:value{: RESULT = new IntLiteralExpr( value.intvalue(),valueleft);:} Expr:rcvr PERIOD Identifier:message OPEN_PAREN Exprs:args CLOSE_PAREN {: RESULT = new MethodCallExpr( rcvr,message,args,rcvrleft);:}... ; CSE401 Wi09 12

Error Handling How to handle syntax error? Option 1: quit compilation + easy - inconvenient for programmer Option 2: error recovery + try to catch as many errors as possible on one compile - difficult cut to avoid od streams s of spurious errors Option 3: error correction + fix syntax errors as part of compilation - hard!! CSE401 Wi09 13

Panic Mode Error Recovery When finding a syntax error, skip tokens until reaching a landmark landmarks in MiniJava: ;,),} FOLLOW sets can be a useful source of landmarks once a landmark is found, hope to have gotten back on track In top-down parser, maintain set of landmark tokens as recursive descent proceeds landmarks selected from terminals later in production as parsing proceeds, set of landmarks will change, depending on the parsing context CSE401 Wi09 14

Panic Mode Error Recovery In bottom-up parser, can add special error nonterminals, followed by landmarks if syntax error, then will skip tokens till seeing landmark, then reduce and continue normally E.g. Stmt ::=... error ; { error } Expr ::=... ( error ) CSE401 Wi09 15

EBNF Syntax of initial MiniJava Program ::= MainClassDecl { ClassDecl } MainClassDecl ::= class ID { public static void main ( String [ ] ID ) { { Stmt } } ClassDecl ::= class ID [ extends ID ] { { ClassVarDecl } { MethodDecl } } ClassVarDecl ::= Type ID ; MethodDecl ::= public Type ID ( [ Formal {, Formal } ] ) { { Stmt } return Expr ; } Formal ::= Type ID Type ::= int boolean l ID CSE401 Wi09 16

Initial minijava [continued] Stmt ::= Type ID ; { {Stmt} } if ( Expr ) Stmt else Stmt while ( Expr ) Stmt System.out.println ( Expr ) ; ID = Expr ; Expr ::= Expr Op Expr! Expr Expr. ID( [ Expr {, Expr } ] ) ID this Integer true false ( Expr ) Op ::= + - * / < <= >= > ==!= && CSE401 Wi09 17