Languages and Compilers (SProg og Oversættere)

Similar documents
Course Overview. Introduction (Chapter 1) Compiler Frontend: Today. Compiler Backend:

Languages and Compilers (SProg og Oversættere) Lecture 3 recap Bent Thomsen Department of Computer Science Aalborg University

The Phases of a Compiler. Course Overview. In Chapter 4. Syntax Analysis. Syntax Analysis. Multi Pass Compiler. PART I: overview material

Parsing. Parsing. Bottom Up Parsing. Bottom Up Parsing. Bottom Up Parsing. Bottom Up Parsing

Abstract Syntax Trees and Contextual Analysis. Roland Backhouse March 8, 2001

Course Overview. Levels of Programming Languages. Compilers and other translators. Tombstone Diagrams. Syntax Specification

Syntax Analysis. Chapter 4

COP 3402 Systems Software Top Down Parsing (Recursive Descent)

Introduction to Syntax Analysis. Compiler Design Syntax Analysis s.l. dr. ing. Ciprian-Bogdan Chirila

CSE 3302 Programming Languages Lecture 2: Syntax

CPS 506 Comparative Programming Languages. Syntax Specification

CSE302: Compiler Design

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

Chapter 4: Syntax Analyzer

Context-free grammars (CFG s)

Lexical and Syntax Analysis. Top-Down Parsing

Syntactic Analysis. The Big Picture Again. Grammar. ICS312 Machine-Level and Systems Programming

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

Programming Language Specification and Translation. ICOM 4036 Fall Lecture 3

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Chapter 3. Parsing #1

Syntax. In Text: Chapter 3

Syntax Intro and Overview. Syntax

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

Specifying Syntax. An English Grammar. Components of a Grammar. Language Specification. Types of Grammars. 1. Terminal symbols or terminals, Σ

ICOM 4036 Spring 2004

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Week 2: Syntax Specification, Grammars

Chapter 3. Describing Syntax and Semantics ISBN

4. Lexical and Syntax Analysis

A simple syntax-directed

EDA180: Compiler Construc6on. Top- down parsing. Görel Hedin Revised: a

Lecture 4: Syntax Specification

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

4. Lexical and Syntax Analysis

Principles of Programming Languages COMP251: Syntax and Grammars

Syntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

Lexical and Syntax Analysis

Non-deterministic Finite Automata (NFA)

Chapter 4. Lexical and Syntax Analysis

JavaCC Parser. The Compilation Task. Automated? JavaCC Parser

Syntax and Grammars 1 / 21

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

Languages and Compilers (SProg og Oversættere) Lecture 15 (2) Bent Thomsen Department of Computer Science Aalborg University

Topic 3: Syntax Analysis I

Building Compilers with Phoenix

The University of Nottingham SCHOOL OF COMPUTER SCIENCE A LEVEL 2 MODULE, SPRING SEMESTER COMPILERS ANSWERS. Time allowed TWO hours

A programming language requires two major definitions A simple one pass compiler

10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis

Homework & Announcements

Last time. What are compilers? Phases of a compiler. Scanner. Parser. Semantic Routines. Optimizer. Code Generation. Sunday, August 29, 2010

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

Syntax-Directed Translation. Lecture 14

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

COP 3402 Systems Software Syntax Analysis (Parser)

3. Parsing. Oscar Nierstrasz

COMP3131/9102: Programming Languages and Compilers

CMPS Programming Languages. Dr. Chengwei Lei CEECS California State University, Bakersfield

Compiler Design Concepts. Syntax Analysis

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

CSCI 1260: Compilers and Program Analysis Steven Reiss Fall Lecture 4: Syntax Analysis I

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

Programming Languages & Translators PARSING. Baishakhi Ray. Fall These slides are motivated from Prof. Alex Aiken: Compilers (Stanford)

CS 403: Scanning and Parsing

COMP3131/9102: Programming Languages and Compilers

Parsing II Top-down parsing. Comp 412

Grammars and Parsing. Paul Klint. Grammars and Parsing

Syntax and Semantics

Context-Free Grammars

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Compilers - Chapter 2: An introduction to syntax analysis (and a complete toy compiler)

CSCI312 Principles of Programming Languages

CSE 413 Final Exam Spring 2011 Sample Solution. Strings of alternating 0 s and 1 s that begin and end with the same character, either 0 or 1.

Context-free grammars

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

announcements CSE 311: Foundations of Computing review: regular expressions review: languages---sets of strings

Related Course Objec6ves

Describing Syntax and Semantics

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

Some Basic Definitions. Some Basic Definitions. Some Basic Definitions. Language Processing Systems. Syntax Analysis (Parsing) Prof.

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

CMSC 330: Organization of Programming Languages. Context Free Grammars

Wednesday, August 31, Parsers

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

Compiler Construction: Parsing

Introduction to Lexical Analysis

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Syntactic Analysis. CS345H: Programming Languages. Lecture 3: Lexical Analysis. Outline. Lexical Analysis. What is a Token? Tokens

Lexical and Syntax Analysis

Transcription:

Languages and Compilers (SProg og Oversættere) Bent Thomsen Department of Computer Science Aalborg University With acknowledgement to Norm Hutchinson whose slides this lecture is based on. 1

Syntax Analysis In This Lecture (Scanning: recognize words or tokens in the input) Parsing: recognize phrase structure Different parsing strategies How to construct a recursive descent parser AST Construction Theoretical Tools : Regular Expressions Grammars Extended BNF notation Beware this lecture is a tour de force of the front-end, but should help you get started with your projects. 2

The Phases of a Compiler Source Program Syntax Analysis This lecture Error Reports Contextual Analysis Abstract Syntax Tree Error Reports Code Generation Decorated Abstract Syntax Tree Object Code 3

Syntax Analysis The job of syntax analysis is to read the source text and determine its phrase structure. Subphases Scanning Parsing Construct an internal representation of the source text that reifies the phrase structure (usually an AST) Note: A single-pass compiler usually does not construct an AST. 4

calls Multi Pass Compiler A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases. Dependency diagram of a typical Multi Pass Compiler: This lecture Compiler Driver calls calls Syntactic Analyzer input output Contextual Analyzer input output Code Generator input output Source Text AST Decorated AST Object Code 5

Syntax Analysis Dataflow chart Source Program Stream of Characters Scanner Error Reports Stream of Tokens This lecture Parser Error Reports Abstract Syntax Tree 6

1) Scan: Divide Input into Tokens An example mini Triangle source program: let var y: Integer in!new year y := y+1 scanner Tokens are words in the input, for example keywords, operators, identifiers, literals, etc. let let var var ident. y colon : ident. Integer in in...... ident. y becomes := ident. y op. + intlit 1 eot 7

2) Parse: Determine phrase structure Parser analyzes the phrase structure of the token stream with respect to the grammar of the language. Program single-command Declaration single-declaration single-command Expression primary-exp Type Denoter V-Name V-Name primary-exp Ident Ident Ident Ident Op. Int.Lit let var id. col. id. in id. bec. id. op intlit eot let var y : Int in y := y + 1 8

3) AST Construction Program LetCommand AssignCommand VarDecl BinaryExpr SimpleT SimpleV. VNameExp Int.Expr SimpleV Ident Ident Ident Ident Op Int.Lit y Integer y y + 1 9

RECAP: Grammars The Syntax of a Language can be specified by means of a CFG (Context Free Grammar). CFG can be expressed in BNF (Bachus-Naur Formalism) Example: Mini Triangle grammar in BNF Program ::= single-command Command ::= single-command Command ; single-command single-command ::= V-name := := Expression begin Command end... 10

Grammars (ctd.) For our convenience, we will use EBNF or Extended BNF rather than simple BNF. EBNF = BNF + regular expressions Example: Mini Triangle in EBNF * means 0 or more occurrences of Program ::= single-command Command ::= ( Command ; )* )* single-command single-command ::= V-name := := Expression begin Command end... 11

Regular Expressions RE are a notation for expressing a set of strings of terminal symbols. Different kinds of RE: ε The empty string t Generates only the string t X Y Generates any string xy such that x is generated by x and y is generated by Y X Y Generates any string which generated either by X or by Y X* The concatenation of zero or more strings generated by X (X) For grouping, 12

Regular Expressions The languages that can be defined by RE and CFG have been extensively studied by theoretical computer scientists. These are some important conclusions / terminology RE is a weaker formalism than CFG: Any language expressible by a RE can be expressed by CFG but not the other way around! The languages expressible as RE are called regular languages Generally: a language that exhibits self embedding cannot be expressed by RE. Programming languages exhibit self embedding. (Example: an expression can contain an (other) expression). 13

Extended BNF Extended BNF combines BNF with RE A production in EBNF looks like LHS ::= RHS where LHS is a non terminal symbol and RHS is an extended regular expression An extended RE is just like a regular expression except it is composed of terminals and non terminals of the grammar. Simply put... EBNF adds to BNF the notation of (...) for the purpose of grouping and * for denoting 0 or more repetitions of ( + for denoting 1 or more repetitions of ) ( [ ] for denoting (ε ) ) 14

Extended BNF: an Example Example: a simple expression language Expression ::= PrimaryExp (Operator PrimaryExp)* PrimaryExp ::= Literal Identifier ( Expression ) Identifier ::= Letter (Letter Digit)* Literal ::= Digit Digit* Letter ::= a b c... z z Digit ::= 0 1 2 3 4... 9 15

A little bit of useful theory We will now look at a few useful bits of theory. These will be necessary later when we implement parsers. Grammar transformations A grammar can be transformed in a number of ways without changing the meaning (i.e. the set of strings that it defines) The definition and computation of starter sets 16

Left factorization 1) Grammar Transformations X Y X Z X ( Y Z ) X Y= ε Z Example: single-command ::= V-name := := Expression if if Expression then single-command if if Expression then single-command else single-command single-command ::= V-name := := Expression if if Expression then single-command ( ε else single-command) 17

1) Grammar Transformations (ctd) Elimination of Left Recursion N ::= X N Y N ::= X Y* Example: Identifier ::= Letter Identifier Letter Identifier Digit Identifier ::= Letter Identifier (Letter Digit) Identifier ::= Letter (Letter Digit)* 18

1) Grammar Transformations (ctd) Substitution of non-terminal symbols N ::= X M ::= α N β Example: N ::= X M ::= α X β single-command ::= for contrvar := := Expression to-or-dt Expression do do single-command to-or-dt ::= to to downto single-command ::= for contrvar := := Expression (to downto) Expression do do single-command 19

2) Starter Sets Informal Definition: The starter set of a RE X is the set of terminal symbols that can occur as the start of any string generated by X Example : starters[ (+ - ε)(0 1 9)* ] = {+,-, 0,1,,9} Formal Definition: starters[ε] ={} starters[t] ={t} (where t is a terminal symbol) starters[x Y] = starters[x] starters[y] (if X generates ε) starters[x Y] = starters[x] (if not X generates ε) starters[x Y] = starters[x] starters[y] starters[x*] = starters[x] 20

2) Starter Sets (ctd) Informal Definition: The starter set of RE can be generalized to extended BNF Formal Definition: starters[n] = starters[x] (for production rules N ::= X) Example : starters[expression] = starters[primaryexp (Operator PrimaryExp)*] = starters[primaryexp] = starters[identifiers] starters[(expression)] = starters[a b c... z] {(} = {a, b, c,, z, (} 21

We will now look at parsing. Topics: Parsing Some terminology Different types of parsing strategies bottom up top down Recursive descent parsing What is it How to implement one given an EBNF specification (How to generate one using tools in Lecture 4) (Bottom up parsing algorithms in Lecture 5) 22

Recognition Parsing: Some Terminology To answer the question does the input conform to the syntax of the language Parsing Recognition + determine phrase structure (for example by generating AST data structures) (Un)ambiguous grammar: A grammar is unambiguous if there is only at most one way to parse any input. (i.e. for syntactically correct program there is precisely one parse tree) 23

Different kinds of Parsing Algorithms Two big groups of algorithms can be distinguished: bottom up strategies top down strategies Example parsing of Micro-English Sentence ::= Subject Verb Object. Subject ::= I a Noun the Noun Object ::= me me a Noun the Noun Noun ::= cat mat rat Verb ::= like is is see sees The cat sees the rat. The rat sees me. I like a cat The rat like me. I see the rat. I sees a rat. 24

Top-down parsing The parse tree is constructed starting at the top (root). Sentence Subject Verb Object. Noun Noun The cat sees a rat. 25

Bottom up parsing The parse tree grows from the bottom (leafs) up to the top (root). Sentence Subject Object Noun Verb Noun The cat sees a rat. 26

Top-Down vs. Bottom-Up parsing LL-Analyse (Top-Down) Left-to-Right Left Derivative LR-Analyse (Bottom-Up) Left-to-Right Right Derivative Reduction Derivation Look-Ahead Look-Ahead 27

Recursive Descent Parsing Recursive descent parsing is a straightforward top-down parsing algorithm. We will now look at how to develop a recursive descent parser from an EBNF specification. Idea: the parse tree structure corresponds to the call graph structure of parsing procedures that call each other recursively. 28

Recursive Descent Parsing Sentence ::= Subject Verb Object. Subject ::= I a Noun the Noun Object ::= me me a Noun the Noun Noun ::= cat mat rat Verb ::= like is is see sees Define a procedure parsen for each non-terminal N private void parsesentence() ; private void parsesubject(); private void parseobject(); private void parsenoun(); private void parseverb(); 29

Recursive Descent Parsing public class MicroEnglishParser {{ private TerminalSymbol currentterminal; //Auxiliary methods will go go here...... //Parsing methods will go go here...... 30

Recursive Descent Parsing: Auxiliary Methods public class MicroEnglishParser {{ private TerminalSymbol currentterminal private void void accept(terminalsymbol expected) {{ if if (currentterminal matches expected) currentterminal = next next input terminal ;; else else report a syntax error...... 31

Recursive Descent Parsing: Parsing Methods Sentence ::= Subject Verb Object. private void void parsesentence() {{ parsesubject(); parseverb(); parseobject(); accept(. ); 32

Recursive Descent Parsing: Parsing Methods Subject ::= I a Noun the Noun private void void parsesubject() {{ if if (currentterminal matches I ) I ) accept( I ); else else if if (currentterminal matches a ) a ){{ accept( a ); parsenoun(); else else if if (currentterminal matches the ) {{ accept( the ); parsenoun(); else else report a syntax error 33

Recursive Descent Parsing: Parsing Methods Noun ::= cat mat rat private void void parsenoun() {{ if if (currentterminal matches cat ) accept( cat ); else else if if (currentterminal matches mat ) accept( mat ); else else if if (currentterminal matches rat ) accept( rat ); else else report a syntax error 34

Developing RD Parser for Mini Triangle Before we begin: The following non-terminals are recognized by the scanner They will be returned as tokens by the scanner Identifier := := Letter (Letter Digit)* Integer-Literal ::= Digit Digit* Operator ::= + - * / < > = Comment ::=! Graphic* eol Assume scanner produces instances of: public class Token {{ byte kind; String spelling; final static byte IDENTIFIER = 0, 0, INTLITERAL = 1; 1;...... 35

(1+2) Express grammar in EBNF and factorize... Program ::= single-command Command ::= single-command Command Left ; single-command recursion elimination needed single-command Left factorization needed ::= V-name := := Expression Identifier ( Expression ) if if Expression then single-command else single-command while Expression do do single-command let Declaration in in single-command begin Command end V-name ::= Identifier... 36

(1+2)Express grammar in EBNF and factorize... After factorization etc. we get: Program ::= single-command Command ::= single-command (;single-command)* single-command ::= Identifier ( := := Expression ( Expression ) ) if if Expression then single-command else single-command while Expression do do single-command let Declaration in in single-command begin Command end V-name ::= Identifier... 37

Developing RD Parser for Mini Triangle Expression Left recursion elimination needed ::= primary-expression Expression Operator primary-expression primary-expression ::= Integer-Literal V-name Operator primary-expression ( Expression ) Declaration Left recursion elimination needed ::= single-declaration Declaration ; single-declaration single-declaration ::= const Identifier ~ Expression var Identifier : Type-denoter Type-denoter ::= Identifier 38

(1+2) Express grammar in EBNF and factorize... After factorization and recursion elimination : Expression ::= primary-expression ( Operator primary-expression )* )* primary-expression ::= Integer-Literal Identifier Operator primary-expression ( Expression ) Declaration ::= single-declaration (;single-declaration)* single-declaration ::= const Identifier ~ Expression var Identifier : Type-denoter Type-denoter ::= Identifier 39

(3) Create a parser class with... public class Parser {{ private Token currenttoken; private void void accept(byte expectedkind) {{ if if (currenttoken.kind == == expectedkind) currenttoken = scanner.scan(); else else report syntax error private void void acceptit() {{ currenttoken = scanner.scan(); public void void parse() {{ acceptit(); //Get the the first first token parseprogram(); if if (currenttoken.kind!=!= Token.EOT) report syntax error...... 40

(4) Implement private parsing methods: Program ::= single-command private void void parseprogram() {{ parsesinglecommand(); 41

(4) Implement private parsing methods: single-command ::= Identifier ( := := Expression ( Expression ) ) if if Expression then single-command else single-command... more alternatives... private void void parsesinglecommand() {{ switch (currenttoken.kind) {{ case Token.IDENTIFIER ::... case Token.IF ::......... more cases...... default: report a syntax error 42

(4) Implement private parsing methods: single-command ::= Identifier ( := := Expression ( Expression ) ) if if Expression then single-command else single-command while Expression do do single-command let Declaration in in single-command begin Command end From the above we can straightforwardly derive the entire implementation of parsesinglecommand (much as we did in the microenglish example) 43

Algorithm to convert EBNF into a RD parser The conversion of an EBNF specification into a Java implementation for a recursive descent parser is so mechanical that it can easily be automated! => JavaCC Java Compiler Compiler We can describe the algorithm by a set of mechanical rewrite rules N ::= ::= X private void void parsen() {{ parse X 44

Algorithm to convert EBNF into a RD parser parse tt accept(t); parse N parsen(); where ttis is a terminal where N is is a non-terminal parse ε // // a dummy statement parse XY XY parse X parse Y 45

Algorithm to convert EBNF into a RD parser parse X* X* while (currenttoken.kind is is in in starters[x]) {{ parse X parse X Y X Y switch (currenttoken.kind) {{ cases in instarters[x]: parse X break; cases in instarters[y]: parse Y break; default: report syntax error 46

Example: Generation of parsecommand Command ::= ::= single-command ((;; single-command )* )* private void void parsecommand() { { parsesinglecommand(); single-command ((;; single-command )* )* while parse (currenttoken.kind==token.semicolon) ( ( ; ; single-command )* )* { { } } parse acceptit(); ; ; single-command parsesinglecommand(); single-command } } 47

Example: Generation of parsesingledeclaration single-declaration ::= ::= const Identifier ~ Type-denoter var varidentifier :: Expression private void void parsesingledeclaration() {{ switch (currenttoken.kind) {{ private case void void parsesingledeclaration() { Token.CONST: { parse switch acceptit(); const (currenttoken.kind) Identifier ~ Type-denoter { { case parseidentifier(); var var Token.CONST: :: Expression parse acceptit(); acceptit(token.is); const Identifier ~ Type-denoter case parsetypedenoter(); parseidentifier(); Token.VAR: case parse acceptit(token.is); parse ~ var varidentifier :: Expression Token.VAR: default: parsetypedenoter(); acceptit(); Type-denoter report syntax error case parseidentifier(); Token.VAR: parse acceptit(token.colon); var var Identifier : : Expression default: parseexpression(); report syntax error } } default: report syntax error } } 48

LL(1) Grammars The presented algorithm to convert EBNF into a parser does not work for all possible grammars. It only works for so called LL(1) grammars. What grammars are LL(1)? Basically, an LL(1) grammar is a grammar which can be parsed with a top-down parser with a lookahead (in the input stream of tokens) of one token. How can we recognize that a grammar is (or is not) LL(1)? There is a formal definition which we will skip for now We can deduce the necessary conditions from the parser generation algorithm. 49

parse X* X* LL(1) Grammars while (currenttoken.kind is is in in starters[x]) {{ parse X parse X Y X Y switch (currenttoken.kind) {{ cases in instarters[x]: parse X break; cases in instarters[y]: parse Y break; default: report syntax error Condition: starters[x] must be be disjoint from the the set set of of tokens that can immediately follow X * Condition: starters[x] and starters[y] must be be disjoint sets. 50

LL(1) grammars and left factorisation The original mini-triangle grammar is not LL(1): For example: single-command ::= V-name := := Expression Identifier ( Expression )... V-name ::= Identifier Starters[V-name := Expression] = Starters[V-name] = Starters[Identifier] Starters[Identifier ( Expression )] = Starters[Identifier] NOT DISJOINT! 51

LL(1) grammars: left factorization What happens when we generate a RD parser from a non LL(1) grammar? single-command ::= V-name := := Expression Identifier ( Expression )... private void void parsesinglecommand() {{ switch (currenttoken.kind) {{ case Token.IDENTIFIER: parse V-name := := Expression case Token.IDENTIFIER: parse Identifier ( Expression )...other cases... default: report syntax error wrong: overlapping cases 52

LL(1) grammars: left factorization single-command ::= V-name := := Expression Identifier ( Expression )... Left factorization (and substitution of V-name) single-command ::= Identifier ( := := Expression ( Expression ) )... 53

LL1 Grammars: left recursion elimination Command ::= single-command Command ; single-command What happens if we don t perform left-recursion elimination? public void voidparsecommand() {{ switch (currenttoken.kind) {{ case in in starters[single-command] parsesinglecommand(); case in in starters[command] parsecommand(); accept(token.semicolon); parsesinglecommand(); default: report syntax error wrong: overlapping cases 54

LL1 Grammars: left recursion elimination Command ::= single-command Command ; single-command Left recursion elimination Command ::= single-command (; (; single-command)* 55

Systematic Development of RD Parser (1) Express grammar in EBNF (2) Grammar Transformations: Left factorization and Left recursion elimination (3) Create a parser class with private variable currenttoken methods to call the scanner: accept and acceptit (4) Implement private parsing methods: add private parsen method for each non terminal N public parse method that gets the first token form the scanner calls parses (S is the start symbol of the grammar) 56

Abstract Syntax Trees So far we have talked about how to build a recursive descent parser which recognizes a given language described by an LL(1) EBNF grammar. Now we will look at how to represent AST as data structures. how to refine a recognizer to construct an AST data structure. 57

AST Representation: Possible Tree Shapes The possible form of AST structures is completely determined by an AST grammar (as described before in lecture 1-2) Example: remember the Mini-triangle abstract syntax Command ::= V-name := := Expression AssignCmd Identifier ( Expression ) CallCmd if if Expression then Command else Command IfCmd while Expression do do Command WhileCmd let Declaration in in Command LetCmd Command ; Command SequentialCmd 58

AST Representation: Possible Tree Shapes Example: remember the Mini-triangle AST (excerpt below) Command ::= VName := := Expression AssignCmd... AssignCmd V E 59

AST Representation: Possible Tree Shapes Example: remember the Mini-triangle AST (excerpt below) Command ::=... Identifier ( Expression ) CallCmd... CallCmd Identifier Spelling E 60

AST Representation: Possible Tree Shapes Example: remember the Mini-triangle AST (excerpt below) Command ::=... if if Expression then Command else Command IfCmd... IfCmd E C1 C2 61

AST Representation: Java Data Structures Example: Java classes to represent Mini-Triangle AST s 1) A common (abstract) super class for all AST nodes public abstract class AST {... } 2) A Java class for each type of node. abstract as well as concrete node types LHS ::=... Tag1... Tag2 abstract AST abstract LHS concrete Tag1 Tag2 62

Example: Mini Triangle Commands ASTs Command ::= V-name := := Expression AssignCmd Identifier ( Expression ) CallCmd if if Expression then Command else Command IfCmd while Expression do do Command WhileCmd let Declaration in in Command LetCmd Command ; Command SequentialCmd public abstract class Command extends AST AST {{... public class AssignCommand extends Command {{... public class CallCommand extends Command {{... public class IfCommand extends Command {{... etc. etc. 63

Example: Mini Triangle Command ASTs Command ::= V-name := := Expression AssignCmd Identifier ( Expression ) CallCmd... public class AssignCommand extends Command {{ public Vname V; V; // // assign to to what variable? public Expression E; E; // // what to to assign?...... public class CallCommand extends Command {{ public Identifier I; I; //procedure name public Expression E; E; //actual parameter............ 64

AST Terminal Nodes public abstract class Terminal extends AST AST {{ public String spelling;...... public class Identifier extends Terminal {{... public class IntegerLiteral extends Terminal {{... public class Operator extends Terminal {{... 65

AST Construction First, every concrete AST class of course needs a constructor. Examples: public class AssignCommand extends Command {{ public Vname V; V; // // Left Left side side variable public Expression E; E; // // right right side side expression public AssignCommand(Vname V; V; Expression E) E) {{ this.v = V; V; this.e=e;...... public class Identifier extends Terminal {{ public class Identifier(String spelling) {{ this.spelling = spelling;...... 66

AST Construction We will now show how to refine our recursive descent parser to actually construct an AST. N ::= X private N parsen() {{ N itsast; parse X at at the same time constructing itsast return itsast; 67

Example: Construction Mini-Triangle ASTs Command ::= ::= single-command ((;; single-command )* )* // // old AST-generating old (recognizing version only) version: private Command void void parsecommand() {{ {{ parsesinglecommand(); itsast; while itsast (currenttoken.kind==token.semicolon) = parsesinglecommand(); {{ while acceptit(); (currenttoken.kind==token.semicolon) {{ parsesinglecommand(); acceptit(); Command extracmd = parsesinglecommand(); itsast = new new SequentialCommand(itsAST,extraCmd); return itsast; 68

Example: Construction Mini-Triangle ASTs single-command ::= Identifier ( := := Expression ( Expression ) ) if if Expression then single-command else single-command while Expression do do single-command let Declaration in in single-command begin Command end private Command parsesinglecommand() {{ Command comast; parse it it and and construct AST AST return comast; 69

Example: Construction Mini-Triangle ASTs private Command parsesinglecommand() {{ Command comast; switch (currenttoken.kind) {{ case Token.IDENTIFIER: parse Identifier ((:= := Expression (( Expression )))) case Token.IF: parse if if Expression then single-command else else single-command case Token.WHILE: parse while Expression do do single-command case Token.LET: parse let let Declaration in in single-command case Token.BEGIN: parse begin Command end end return comast; 70

Example: Construction Mini-Triangle ASTs...... case Token.IDENTIFIER: //parse Identifier ((:= := Expression // // (( Expression )))) Identifier iast = parseidentifier(); switch (currenttoken.kind) {{ case Token.BECOMES: acceptit(); Expression east = parseexpression(); comast = new new AssignmentCommand(iAST,eAST); break; case Token.LPAREN: acceptit(); Expression east = parseexpression(); comast = new new CallCommand(iAST,eAST); accept(token.rparen); break; break;...... 71

Example: Construction Mini-Triangle ASTs...... break; case Token.IF: //parse if if Expression then single-command // // else else single-command acceptit(); Expression east = parseexpression(); accept(token.then); Command thnast = parsesinglecommand(); accept(token.else); Command elsast = parsesinglecommand(); comast = new new IfCommand(eAST,thnAST,elsAST); break; case Token.WHILE:...... 72

Example: Construction Mini-Triangle ASTs...... break; case Token.BEGIN: //parse begin Command end end acceptit(); comast = parsecommand(); accept(token.end); break; default: report a syntax error; return comast; 73

Syntax Error Handling Example: 1. let 2. var x:integer; 3. var y:integer; 4. func max(i:integer ; j:integer) : Integer; 5.! return maximum of integers I and j 6. begin 7. if I > j then max := I ; 8. else max := j 9. end; 10. in 11. getint (x);getint(y); 12. puttint (max(x,y)) 13. end. 74

Common Punctuation Errors Using a semicolon instead of a comma in the argument list of a function declaration (line 4) and ending the line with semicolon Leaving out a mandatory tilde (~) at the end of a line (line 4) Undeclared identifier I (should have been i) (line 7) Using an extraneous semicolon before an else (line 7) Common Operator Error : Using = instead of := (line 7 or 8) Misspelling keywords : puttint instead of putint (line 12) Missing begin or end (line 9 missing), usually difficult to repair. 75

Error Reporting A common technique is to print the offending line with a pointer to the position of the error. The parser might add a diagnostic message like semicolon missing at this position if it knows what the likely error is. The way the parser is written may influence error reporting is: private void parsesingledeclaration () { switch (currenttoken.kind) { case Token.CONST: { acceptit(); } break; case Token.VAR: { acceptit(); } break; default: report a syntax error } } 76

Error Reporting private void parsesingledeclaration () { if (currenttoken.kind == Token.CONST) { acceptit(); } else { acceptit(); } } Ex: d ~ 7 above would report missing var token 77

How to handle Syntax errors Error Recovery : The parser should try to recover from an error quickly so subsequent errors can be reported. If the parser doesn t recover correctly it may report spurious errors. Possible strategies: Panic mode Phase-level Recovery Error Productions 78

Panic-mode Recovery Discard input tokens until a synchronizing token (like; or end) is found. Simple but may skip a considerable amount of input before checking for errors again. Will not generate an infinite loop. 79

Perform local corrections Phase-level Recovery Replace the prefix of the remaining input with some string to allow the parser to continue. Examples: replace a comma with a semicolon, delete an extraneous semicolon or insert a missing semicolon. Must be careful not to get into an infinite loop. 80

Recovery with Error Productions Augment the grammar with productions to handle common errors Example: parameter_list ::= identifier_list : type parameter_list, identifier_list : type parameter_list; error ( comma should be a semicolon ) identifier_list : type 81

Syntactic analysis Prepare the grammar Grammar transformations Left-factoring Left-recursion removal Substitution (Lexical analysis) Next lecture Quick review Parsing - Phrase structure analysis Group words into sentences, paragraphs and complete programs Top-Down and Bottom-Up Recursive Decent Parser Construction of AST Note: You will need (at least) two grammars One for Humans to read and understand (may be ambiguous, left recursive, have more productions than necessary, ) One for constructing the parser 82