Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

Similar documents
Context-free grammars (CFG s)

Syntactic Analysis. Syntactic analysis, or parsing, is the second phase of compilation: The token file is converted to an abstract syntax tree.

Compiler Passes. Syntactic Analysis. Context-free Grammars. Syntactic Analysis / Parsing. EBNF Syntax of initial MiniJava.

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1

CSE 401: Introduction to Compiler Construction. Course Outline. Grading. Project

3. Parsing. Oscar Nierstrasz

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:


Top down vs. bottom up parsing

Monday, September 13, Parsers

Lexical and Syntax Analysis. Top-Down Parsing

Parsing Algorithms. Parsing: continued. Top Down Parsing. Predictive Parser. David Notkin Autumn 2008

A programming language requires two major definitions A simple one pass compiler

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

CSE 3302 Programming Languages Lecture 2: Syntax

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

Lexical and Syntax Analysis

Syntax. In Text: Chapter 3

Principles of Programming Languages COMP251: Syntax and Grammars

COP 3402 Systems Software Syntax Analysis (Parser)

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

Defining syntax using CFGs

Project 2 Interpreter for Snail. 2 The Snail Programming Language

Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter UW CSE P 501 Winter 2016 C-1

CMPS Programming Languages. Dr. Chengwei Lei CEECS California State University, Bakersfield

LL(k) Compiler Construction. Top-down Parsing. LL(1) parsing engine. LL engine ID, $ S 0 E 1 T 2 3

Context-Free Grammars

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

CMPT 755 Compilers. Anoop Sarkar.

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

CPS 506 Comparative Programming Languages. Syntax Specification

COP 3402 Systems Software Top Down Parsing (Recursive Descent)

Wednesday, August 31, Parsers

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Programming Language Syntax and Analysis

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1

Chapter 3. Describing Syntax and Semantics ISBN

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

EDA180: Compiler Construc6on. Top- down parsing. Görel Hedin Revised: a

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

CSCI312 Principles of Programming Languages

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

Parsing II Top-down parsing. Comp 412

Review of CFGs and Parsing II Bottom-up Parsers. Lecture 5. Review slides 1

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017

CSCI 1260: Compilers and Program Analysis Steven Reiss Fall Lecture 4: Syntax Analysis I

How do LL(1) Parsers Build Syntax Trees?

LL(k) Compiler Construction. Choice points in EBNF grammar. Left recursive grammar

CMSC 330: Organization of Programming Languages

Compilers. Predictive Parsing. Alex Aiken

VIVA QUESTIONS WITH ANSWERS

CSCE 314 Programming Languages

Syntax/semantics. Program <> program execution Compiler/interpreter Syntax Grammars Syntax diagrams Automata/State Machines Scanning/Parsing

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Bottom-Up Parsing. Lecture 11-12

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

CS 314 Principles of Programming Languages

Chapter 3. Parsing #1

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

Syntactic Analysis. Top-Down Parsing

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Syntax. A. Bellaachia Page: 1

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Syntax Analysis Part I

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

Using an LALR(1) Parser Generator

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

CS 406/534 Compiler Construction Parsing Part I

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

Building Compilers with Phoenix

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

EECS 6083 Intro to Parsing Context Free Grammars

CA Compiler Construction

Context-free grammars

Context-Free Grammar. Concepts Introduced in Chapter 2. Parse Trees. Example Grammar and Derivation

Dr. D.M. Akbar Hussain

Abstract Syntax Trees & Top-Down Parsing

Parsing III. (Top-down parsing: recursive descent & LL(1) )

CS2210: Compiler Construction Syntax Analysis Syntax Analysis

MiniJava Parser and AST. Winter /27/ Hal Perkins & UW CSE E1-1

CIT Lecture 5 Context-Free Grammars and Parsing 4/2/2003 1

CMSC 330: Organization of Programming Languages

Compiler Construction: Parsing

Chapter 3. Describing Syntax and Semantics

Programming Language Specification and Translation. ICOM 4036 Fall Lecture 3

CMSC 330: Organization of Programming Languages. Context Free Grammars

Syntax-Directed Translation. Lecture 14

Concepts Introduced in Chapter 4

Lecture 8: Deterministic Bottom-Up Parsing

Transcription:

Susan Eggers 1 CSE 401 Syntax Analysis/Parsing Context-free grammars (CFG s) Purpose: determine if tokens have the right form for the language (right syntactic structure) stream of tokens abstract syntax tree (AST) AST: captures hierarchical structure of input program a primary representation of program Syntax specified using CFG s capture important structural characteristics Notation for CFG s: Backus Normal (Naur) Form (BNF) set of terminal symbols (tokens from lexical analysis) set of nonterminals (sequences of terminals &/or nonterminals): impose the hierarchical structure set of productions combine terminals & nonterminals nonterminal ::= nonterminals &/or terminals start symbol: nonterminal that denotes the language CFG: set of productions that define a language Susan Eggers 2 CSE 401 BNF description of PL/0 syntax Program ::= module Id Block Id. Block ::= DeclList begin StmtList end DeclList ::= { Decl Decl ::= ConstDecl ProcDecl VarDecl ConstDecl ::= const ConstDeclItem {, ConstDeclItem ConstDeclItem ::=Id : Type = ConstExpr ConstExpr ::= Id Integer VarDecl ::= var VarDeclItem {, VarDeclItem VarDeclItem ::= Id : Type ProcDecl ::= procedure Id ( [ FormalDecl {, FormalDecl ] ) Block Id FormalDecl ::= Id : Type Type ::= int StmtList ::= { Stmt Stmt ::= CallStmt AssignStmt OutStmt IfStmt WhileStmt CallStmt ::= Id ( [ Exprs ] ) AssignStmt ::= LValue := Expr LValue ::= Id OutStmt ::= output := Expr IfStmt ::= if Test then StmtList end WhileStmt ::= while Test do StmtList end Test ::= odd Sum Sum Relop Sum Relop ::= <= <> < >= > = Exprs ::= Expr {, Expr Expr ::= Sum Sum ::= Term { (+ -) Term Term ::= Factor { (* /) Factor Factor ::= - Factor LValue Integer input ( Expr ) Context-free grammars vs. Regular Expressions CFG can check everything a RE can but: not need CFG power for lexical analysis REs are a more concise notation for tokens lexical analyzers constructed automatically are more efficient more modular front end RE s not powerful enough for parsing nested constructs recursion Susan Eggers 3 CSE 401 Susan Eggers 4 CSE 401 Derivations & Parse Trees grammar Derivation: define the language specified by the grammar sequence of expansion steps, beginning with start symbol, leading to a string of terminals production seen as rewriting rule: nonterminal replaced by the rhs Parsing: inverse of derivation given target string of terminals (tokens), want to recover nonterminals representing structure Can represent derivation as a: parse tree (concrete syntax tree) graphical representation for a derivation keeps the grammar symbols don t record the expansion order abstract syntax tree (AST) simpler representation precedence implied by the hierarchy E ::= E Op E - E ( E ) id Op ::= + - * / (a + b * -c) * d E -> E op E -> E op id -> E * id -> (E) * id -> (E op E) * id -> (E op -E) * id -> (E op -id) * id -> (E * -id) * id -> (E op E * -id) * id -> (E op id * -id) * id -> (E + id * -id) * id -> (id + id * -id) * id Susan Eggers 5 CSE 401 Susan Eggers 6 CSE 401

Susan Eggers 7 CSE 401 AST s AST representations in project Abstract syntax trees represent only important aspects of concrete syntax trees no need for sign posts like ( ),, do, end rest of compiler only cares about abstract structure can regenerate concrete syntax tree from AST when needed Susan Eggers 8 CSE 401 AST extensions in project Expressions: true and false constants array index expression function call expression and, or operators tests are expressions constant expressions Statements: for statement return stmt if stmt with else array assignment stmt (similar to array index expression) Parsing algorithms Given grammar, want to see if an input program can be generated by it check legality produce AST representing structure be efficient Kinds of parsing algorithms: top-down LL(k) grammar bottom-up LR(k) grammar Left to right scan on input Leftmost/Rightmost derivation can see k tokens at once Declarations: procedures with result type var parameters (passed by reference) Types: boolean type array type Susan Eggers 9 CSE 401 Susan Eggers 10 CSE 401 Top-down parsing Predictive parsing Build parse tree for input program from the top (start symbol) down to leaves (terminals) find leftmost derivation for an input string (replace the leftmost nonterminal at each step) create parse tree nodes in preorder Basic issue: when replacing a nonterminal with some rhs, how to pick which rhs? E.g. Stmt ::= Call Assign If While Call ::= Id Assign ::= Id := Expr If ::= if Test then Stmts end if Test then Stmts else Stmts end While ::= while Test do Stmts end Predictive parser: top-down parser that can select correct rhs looking at at most k input tokens (the look-ahead) Efficient: no backtracking needed linear time to parse Implementation of predictive parsers: table-driven parser like table-driven FSA plus stack to hold productions, recursively recursive descent parser each nonterminal parsed by a procedure call other procedures to parse sub-nonterminals, recursively Solution: look at input tokens to help decide Susan Eggers 11 CSE 401 Susan Eggers 12 CSE 401

Susan Eggers 13 CSE 401 LL(k) grammars Ambiguity Can construct predictive parser automatically/easily if grammar is LL(k) Left-to-right scan of input, Leftmost derivation k tokens of look-ahead needed (k 1) to make parsing decisions Some restrictions: no ambiguity >1 parse tree (derivation) for a sentence in the language no common prefixes of length k S ::= if Test then Ss end if Test then Ss else Ss end no left recursion E ::= E Op E a few others Some grammars are ambiguous: multiple parse trees with same final string produces more than one leftmost or rightmost derivation Structure of parse tree captures much of meaning of program ambiguity multiple possible meanings for same program Solutions: 1) add meta-rules 2) change the grammar 3) change the language Restrictions guarantee that, given k input tokens, can always select correct rhs to expand nonterminal Susan Eggers 14 CSE 401 Famous ambiguities: dangling else Stmt ::= if Expr then Stmt if Expr then Stmt else Stmt if e 1 then if e 2 then s 1 else s 2 Option 1: add a meta-rule e.g. else associates with closest previous then + works + keeps original grammar intact - ad hoc and informal Susan Eggers 15 CSE 401 Susan Eggers 16 CSE 401 Option 2: rewrite the grammar to resolve ambiguity explicitly Stmt ::= MatchedStmt UnmatchedStmt MatchedStmt ::= if Expr then MatchedStmt else MatchedStmt UnmatchedStmt ::= if Expr then Stmt if Expr then MatchedStmt else UnmatchedStmt Option 3: redesign the language to remove the ambiguity Stmt ::= if Expr then Stmt end if Expr then Stmt else Stmt end extra end required for every if + formal, clear, elegant - changing the language + formal, no additional rules beyond syntax - sometimes obscures original grammar Susan Eggers 17 CSE 401 Susan Eggers 18 CSE 401

Susan Eggers 19 CSE 401 Another famous ambiguity: expressions E ::= E Op E - E ( E ) id Op ::= + - * / Option 1: add some meta-rules, e.g. precedence and associativity rules a + b * c : E ::= E Op E - E ( E ) id Op ::= + - * / = < and or operator precedence associativity prefix - highest right *, / left +, - left =, < none and left or lowest left Susan Eggers 20 CSE 401 Option 2: modify the grammar to explicitly resolve the ambiguity Strategy: create a nonterminal for each precedence level expr is lowest precedence nonterminal each nonterminal can be rewritten with a higher precedence operator highest precedence operator includes terminals at each precedence level, use: left recursion for left-associative operators right recursion for right-associative operators no recursion for non-associative operators Expr Expr0 Expr1 Expr2 Expr3 Expr4 Expr5 Expr6 ::= Expr0 ::= Expr0 or Expr1 Expr1 ::= Expr1 and Expr2 Expr2 ::= Expr3 (= <) Expr3 Expr3 ::= Expr3 (+ -) Expr4 Expr4 ::= Expr4 (* /) Expr5 Expr5 ::= -Expr5 Expr6 ::= id int (Expr0) Susan Eggers 21 CSE 401 Susan Eggers 22 CSE 401 Eliminating common prefixes Eliminating left recursion Can left factor common prefixes to eliminate them create new nonterminal for common prefix and/or different suffixes Before: If ::= if Test then Stmts end if Test then Stmts else Stmts end After: If IfCont ::= if Test then Stmts IfCont ::= end else Stmts end Grammar a bit uglier Easy to do by hand in recursive-descent parser Can rewrite grammar to eliminate left recursion Before: E ::= E + T T T ::= T * F F F ::= id After: E ::= T ECont ECont ::= + T ECont ε T ::= F TCont TCont ::= * F TCont ε F ::= id right-recursive productions Susan Eggers 23 CSE 401 Susan Eggers 24 CSE 401

Susan Eggers 25 CSE 401 Transition Diagrams Table-driven predictive parser Railroad diagrams another more graphical notation for CFGs diagram per nonterminal look like FSAs, where arcs can be labelled with nonterminals as well as terminals if terminal: follow the arc parser gets a new token & compares it with the terminal on the arc if nonterminal: go to new diagram parser calls the procedure for the nonterminal (recursive descent parser) Can automatically convert grammar into parsing table PREDICT(nonterminal, input-sym) production selects the right production to take given a nonterminal to expand and the next token of the input : stmt ::= if expr then stmt else stmt while expr do stmt begin stmts end stmts ::= stmt stmts ε expr ::= id Parsing table if then else while do begin end id stmt 1 2 3 stmts 1 1 1 2 expr 1 Susan Eggers 26 CSE 401 Table-driven predictive parser Stack implementation depends on top of stack & current input token initial stack configuration: Start $ top of stack = input token pop terminal, advance to next token top of stack = nonterminal pop nonterminal pick production from parsing table push production top of stack = input token = $ halt FOLLOW Definition: For all nonterminals B in N, FOLLOW(B) is the set of terminals (or $) that can follow B in a derivation. I.e., FOLLOW(B) = { a in (T u {$) S$ ==>* α B a β for some α, β in (N u T u {$) Computing FOLLOW + Add $ to FOLLOW(S) + Repeat until no change: For all rules A -> α B β (i) add (FIRST(β)-{ε) to FOLLOW(B) (ii) if ε in FIRST(β) [e.g. if β = ε] add FOLLOW(A) to FOLLOW(B) Errors: input token not match terminal on top of stack table entry empty Susan Eggers 27 CSE 401 Susan Eggers 28 CSE 401 Constructing PREDICT table Constructing PREDICT table S ::= if E then S else S while E do S begin Ss end Ss ::= S Ss ε E ::= id FIRST FOLLOW Start ::= S $ S ::= id If If ::= if E then S IfCont IfCont ::= else S ε FIRST Start ::= S $ S ::= id ::= If If ::= if E then S IfCont IfCont ::= else S ::= ε FOLLOW if then else while do begin end id stmt 1 2 3 stmts 1 1 1 2 expr 1 Start S If IfCont id if then else $ Susan Eggers 29 CSE 401 Susan Eggers 30 CSE 401

Susan Eggers 31 CSE 401 Another example PREDICT and LL(1) S ::= E $ E ::= T E E ::= (+ -) T E ε T ::= F T T ::= (* /) F T ε F ::= - F id ( E ) If PREDICT table has at most one entry in each cell, then grammar is LL(1) always exactly one right choice fast to parse and easy to implement LL(1) each column labelled by 1 token S ::= E $ E ::= T E E ::= (+ -) T E ε T ::= F T T ::= (* /) F T ε F ::= - F id ( E ) FIRST (RHS) FOLLOW (X) Can have multiple entries in each cell common prefixes left recursion ambiguity Susan Eggers 32 CSE 401 Recursive descent parsers Write subroutine for each non-terminal each subroutine first selects correct r.h.s. by peeking at input tokens then consume r.h.s. if terminal symbol, verify that it s next & then advance if nonterminal, call corresponding subroutine construct & return AST representing r.h.s. PL/0 parser is recursive descent PL/0 scanner routines: Token* Get() Token* Peek() Token* Read(SYMBOL expected_kind) bool CondRead(SYMBOL expected_kind) ParseExpr => ParseSum => ParseTerm => ParseFactor Sum ::= Term { (+ -) Term Term ::= Factor { (* /) Factor Expr* Parser::ParseSum() { Expr* expr = ParseTerm() for () { Token* t = scanner->peek() if (t->kind() == PLUS t->kind() == MINUS) {scanner->get() else { break Expr* Parser::ParseTerm() { Expr* expr = ParseFactor() for () { Token* t = scanner->peek() if (t->kind() == MUL t->kind() == DIVIDE) {scanner->get() else { break Susan Eggers 33 CSE 401 Susan Eggers 34 CSE 401 If ::= if Expr then Stmt [else Stmt] end IfStmt* Parser::ParseIfStmt() { scanner->read(if) Expr* test = ParseExpr() scanner->read(then) Stmt* then_stmt = ParseStmt() Stmt* else_stmt if (scanner->condread(else)) { else_stmt = ParseStmt() else { else_stmt = NULL scanner->read(semicolon) return new IfStmt(test, then_stmt, else_stmt) Stmt ::= IdStmt IdStmt ::= CallStmt AssignStmt CallStmt ::= IDENT ( Expr ) AssignStmt ::= IDENT := Expr Stmt* Parser::ParseIdentStmt() { Token* t = scanner->read(ident) if (scanner->condread(lparen)) { // call stmt: parse argument list args = ParseExprs() scanner->read(rparen) else { // assign stmt: parse the rest scanner->read(gets) Expr* expr = ParseExpr() Susan Eggers 35 CSE 401 Susan Eggers 36 CSE 401

Susan Eggers 37 CSE 401 Yacc Yacc input grammar yacc: yet another compiler-compiler Input: grammar possibly augmented with action code Output: C functions to parse grammar and perform actions LALR parser generator practical bottom-up parser more powerful than LL(1) used for parser generators yacc++, bison, byacc are modern updates of yacc declaration: %{ #include <stdio.h> % %token INTEGER grammar productions: %% assignstmt: IDENT GETS expr ifstmt: IF test THEN stmts END IF test THEN stmts ELSE stmts END expr: term expr '+' term expr '-' term factor: '-' factor IDENT INTEGER INPUT '(' expr ')' %% Susan Eggers 38 CSE 401 Yacc with semantic actions Error handling grammar productions: assignstmt: IDENT GETS expr { $$ = new AssignStmt($1, $3) ifstmt: IF test THEN stmtlist END { $$ = new IfStmt($2, $4) IF test THEN stmts ELSE stmts END { $$ = new IfElseStmt($2, $4, $6) expr: term { $$ = $1 expr '+' term { $$ = new BinOp(PLUS, $1, $3) expr '-' term { $$ = new BinOp(MINUS, $1, $3) How to handle syntax error: error recovery Option 1: quit compilation PL/0 + easy - inconvenient for programmer Option 2: do more before quit + try to catch as many errors as possible on one compile - avoid streams of spurious errors Option 3: error correction + fix syntax errors as part of compilation - hard! factor: '-' factor { $$ = new UnOp(MINUS, $2) IDENT { $$ = new VarRef($1) INTEGER { $$ = new IntLiteral($1) INPUT { $$ = new InputExpr '(' expr ')' { $$ = $2 Susan Eggers 39 CSE 401 Susan Eggers 40 CSE 401