Programming Languages and Compilers (CS 421): Disambiguating a Grammar. Steps to Grammar Disambiguation

Transcription:

Programming Languages and Compilers (CS 421)
Sasa Misailovic, 4110 SC, UIUC
https://courses.engr.illinois.edu/cs421/fa2017/cs421a
Based in part on slides by Mattox Beckman, as updated by Vikram Adve, Gul Agha, and Elsa L Gunter
11/7/2017

Example
- Ambiguous grammar: <exp> ::= 0 | 1 | <exp> + <exp> | <exp> * <exp>
- Strings with more than one parse: 0 + 1 + 0 and 1 * 1 + 1
- Source of ambiguity: associativity and precedence

Disambiguating a Grammar
- Given an ambiguous grammar G with start symbol S, find a grammar G' with the same start symbol such that language of G = language of G'
- Not always possible
- No algorithm in general

Disambiguating a Grammar
- Idea: each non-terminal represents all strings having some property
- Identify these properties (often in terms of things that can't happen)
- Use these properties to inductively guarantee every string in the language has a unique parse

Steps to Grammar Disambiguation
- Identify the rules and a smallest use that display ambiguity
- Decide which parse to keep; why should others be thrown out?
- What syntactic restrictions on subexpressions are needed to throw out the bad (while keeping the good)?
- Add a new non-terminal and rules to describe this set of restricted subexpressions (called stratifying, or refactoring)
- Replace old rules to use new non-terminals
- Rinse and repeat

Two Major Sources of Ambiguity
- Lack of determination of operator precedence
- Lack of determination of operator associativity
- Not the only sources of ambiguity

How to Enforce Associativity
- Have at most one recursive call per production
- When two or more recursive calls would be natural, leave the right-most one for right associativity, the left-most one for left associativity

Example
  <Sum> ::= 0 | 1 | <Sum> + <Sum> | (<Sum>)
Becomes
  <Sum> ::= <Num> | <Num> + <Sum>
  <Num> ::= 0 | 1 | (<Sum>)

Operator Precedence
- Operators of highest precedence are evaluated first (bind more tightly)
- Precedence for infix binary operators is given in a precedence table
- Needs to be reflected in the grammar

Precedence in Grammar
- Higher precedence translates to a longer derivation chain
- Example:
  <exp> ::= 0 | 1 | <exp> + <exp> | <exp> * <exp>
- Becomes
  <exp> ::= <mult_exp> | <exp> + <mult_exp>
  <mult_exp> ::= <id> | <mult_exp> * <id>
  <id> ::= 0 | 1

Disambiguating a Grammar
<exp> ::= 0 | 1 | b<exp> | <exp>a | <exp>m<exp>
Want a to have higher precedence than b, which in turn has higher precedence than m, and such that m associates to the left.

<exp> ::= <exp> m <not m> | <not m>
<not m> ::= b <not m> | <not b m>
<not b m> ::= <not b m>a | 0 | 1
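One way to see the effect of stratifying is to count parse trees. The sketch below is not from the slides: it is a small brute-force counter, written for illustration, that counts the parses of a token list under the ambiguous <exp> grammar and under the <exp> / <mult_exp> / <id> replacement. The names token, digit, count_ambiguous and count_stratified are hypothetical.

```ocaml
type token = Zero | One | Plus | Times

(* 1 if the span [i, j) is a single digit token, else 0. *)
let digit (a : token array) i j =
  if j - i = 1 && (a.(i) = Zero || a.(i) = One) then 1 else 0

(* Number of parse trees under the ambiguous grammar
   <exp> ::= 0 | 1 | <exp> + <exp> | <exp> * <exp> *)
let count_ambiguous toks =
  let a = Array.of_list toks in
  let rec exp i j =
    let total = ref (digit a i j) in
    (* try every operator position k as the root of the tree *)
    for k = i + 1 to j - 2 do
      if a.(k) = Plus || a.(k) = Times then
        total := !total + exp i k * exp (k + 1) j
    done;
    !total
  in
  exp 0 (Array.length a)

(* Number of parse trees under the stratified grammar
   <exp> ::= <mult_exp> | <exp> + <mult_exp>
   <mult_exp> ::= <id> | <mult_exp> * <id>
   <id> ::= 0 | 1 *)
let count_stratified toks =
  let a = Array.of_list toks in
  let rec mult i j =
    let total = ref (digit a i j) in
    for k = i + 1 to j - 2 do
      if a.(k) = Times then total := !total + mult i k * digit a (k + 1) j
    done;
    !total
  in
  let rec exp i j =
    let total = ref (mult i j) in
    for k = i + 1 to j - 2 do
      if a.(k) = Plus then total := !total + exp i k * mult (k + 1) j
    done;
    !total
  in
  exp 0 (Array.length a)

let () =
  let s1 = [Zero; Plus; One; Plus; Zero] in   (* 0 + 1 + 0 *)
  let s2 = [One; Times; One; Plus; One] in    (* 1 * 1 + 1 *)
  Printf.printf "%d %d\n" (count_ambiguous s1) (count_stratified s1);
  Printf.printf "%d %d\n" (count_ambiguous s2) (count_stratified s2)
```

For both example strings the ambiguous grammar yields two parses and the stratified grammar yields one. The counter is exponential in the worst case, so it is only meant for strings of this size.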

Disambiguating a Grammar, Take 2
<exp> ::= 0 | 1 | b<exp> | <exp>a | <exp>m<exp>
Want b to have higher precedence than m, which in turn has higher precedence than a, and such that m associates to the right.

<exp> ::= <no a m> | <no m> m <no a> | <exp> a
<no a> ::= <no a m> | <no a m> m <no a>
<no m> ::= <no a m> | <exp> a
<no a m> ::= b <no a m> | 0 | 1

LR Parsing
- Read tokens left to right (L)
- Create a rightmost derivation (R)
- How is this possible?
- Start at the bottom (left) and work your way up
- The last step has only one non-terminal to be replaced, so it is right-most
- Working backwards, replace mixed strings by non-terminals
- Always proceed so that there are no non-terminals to the right of the string to be replaced

LR Parsing Tables
- Build a pair of tables, Action and Goto, from the grammar
- This is the hardest part; we omit it here
- Rows labeled by states
- For Action, columns labeled by terminals and the end-of-tokens marker (more generally, strings of terminals of fixed length)
- For Goto, columns labeled by non-terminals

Action and Goto Tables
- Given a state and the next input, the Action table says one of:
  - shift and go to state n
  - reduce by production k (explained in a bit)
  - accept
  - error
- Given a state and a non-terminal, the Goto table says go to state m

LR(i) Parsing Algorithm
- Based on push-down automata
- Uses states and transitions (as recorded in the Action and Goto tables)
- Uses a stack containing states, terminals and non-terminals

LR(i) Parsing Algorithm
0. Ensure the token stream ends in a special end-of-tokens symbol
1. Start in state 1 with an empty stack
2. Push state(1) onto the stack
3. Look at the next i tokens from the token stream (toks) (don't remove them yet)
4. If the top symbol on the stack is state(n), look up the action in the Action table at (n, toks)
5. If action = shift m,
   a) Remove the top token from the token stream and push it onto the stack
   b) Push state(m) onto the stack
   c) Go to step 3
6. If action = reduce k, where production k is E ::= u,
   a) Remove 2 * length(u) symbols from the stack (u and all the interleaved states)
   b) If the new top symbol on the stack is state(m), look up the new state p in Goto(m, E)
   c) Push E onto the stack, then push state(p) onto the stack
   d) Go to step 3
7. If action = accept, stop parsing and return success
8. If action = error, stop parsing and return failure

Example
<Sum> ::= 0 | 1 | (<Sum>) | <Sum> + <Sum>

Parsing ( 0 + 1 ) + 0. Read the trace from the bottom up. The marker ● separates the part already on the stack (left) from the remaining input (right), and each line is labeled with the action taken at that point.

<Sum> =>
    <Sum> + <Sum> ●               reduce
 => <Sum> + 0 ●                   reduce
 =  <Sum> + ● 0                   shift
 =  <Sum> ● + 0                   shift
 => ( <Sum> ) ● + 0               reduce
 =  ( <Sum> ● ) + 0               shift
 => ( <Sum> + <Sum> ● ) + 0       reduce
 => ( <Sum> + 1 ● ) + 0           reduce
 =  ( <Sum> + ● 1 ) + 0           shift
 =  ( <Sum> ● + 1 ) + 0           shift
 => ( 0 ● + 1 ) + 0               reduce
 =  ( ● 0 + 1 ) + 0               shift
 =  ● ( 0 + 1 ) + 0               shift
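The numbered steps of the LR(i) algorithm can be sketched in OCaml for i = 1. The slides omit table construction, so this sketch assumes a much smaller grammar than <Sum>, namely S ::= ( S ) | x, with LR(0) Action and Goto tables built by hand for illustration. The grammar, the state numbering, and all names (terminal, entry, action, goto, run, parse) are assumptions of this sketch, not part of the course material.

```ocaml
type terminal = LParen | RParen | X | Eof     (* Eof is the end-of-tokens marker *)
type nonterminal = S
type action = Shift of int | Reduce of int | Accept | Error

(* Stack entries: states interleaved with grammar symbols. *)
type entry = State of int | Term of terminal | Nonterm of nonterminal

(* Production 1 is S ::= ( S ), production 2 is S ::= x.
   Returns the left-hand side and the length of the right-hand side. *)
let production = function
  | 1 -> (S, 3)
  | 2 -> (S, 1)
  | _ -> failwith "unknown production"

(* Hand-built Action table for the toy grammar. *)
let action state tok =
  match state, tok with
  | (1 | 3), LParen -> Shift 3
  | (1 | 3), X -> Shift 4
  | 2, Eof -> Accept
  | 4, _ -> Reduce 2
  | 5, RParen -> Shift 6
  | 6, _ -> Reduce 1
  | _ -> Error

(* Hand-built Goto table for the toy grammar. *)
let goto state nt =
  match state, nt with
  | 1, S -> 2
  | 3, S -> 5
  | _ -> failwith "no goto entry"

(* Remove n entries from the top of the stack. *)
let rec drop n stack =
  if n = 0 then stack
  else match stack with
    | _ :: rest -> drop (n - 1) rest
    | [] -> failwith "stack underflow"

(* Steps 3 to 8; the head of the list is the top of the stack. *)
let rec run stack toks =
  match stack, toks with
  | State n :: _, tok :: rest ->
    (match action n tok with
     | Shift m -> run (State m :: Term tok :: stack) rest
     | Reduce k ->
       let lhs, len = production k in
       (match drop (2 * len) stack with
        | State m :: _ as stack' ->
          run (State (goto m lhs) :: Nonterm lhs :: stack') toks
        | _ -> false)
     | Accept -> true
     | Error -> false)
  | _ -> false

(* Steps 0 to 2: append the end marker and start with state 1 on the stack. *)
let parse toks = run [State 1] (toks @ [Eof])
```

With these tables, parse [LParen; X; RParen] returns true and parse [LParen; X] returns false. A reduce pops twice the length of the right-hand side because every symbol on the stack is paired with the state pushed after it.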


Shift-Reduce Conflicts
- Problem: can't decide whether the action for a state and input character should be shift or reduce
- Caused by ambiguity in the grammar
- Usually caused by lack of associativity or precedence information in the grammar

Example
<Sum> ::= 0 | 1 | (<Sum>) | <Sum> + <Sum>

    ● 0 + 1 + 0                shift
 -> 0 ● + 1 + 0                reduce
 -> <Sum> ● + 1 + 0            shift
 -> <Sum> + ● 1 + 0            shift
 -> <Sum> + 1 ● + 0            reduce
 -> <Sum> + <Sum> ● + 0

Example (cont.)
- Problem: shift or reduce?
- You can shift-shift-reduce-reduce or reduce-shift-shift-reduce
- Shift first: right associative
- Reduce first: left associative

Reduce-Reduce Conflicts
- Problem: can't decide between two different rules to reduce by
- Again caused by ambiguity in the grammar
- Symptom: RHS of one production is a suffix of another
- Requires examining the grammar and rewriting it
- Harder to solve than shift-reduce errors

Example
S ::= A | aB
A ::= abc
B ::= bc

    ● abc        shift
    a ● bc       shift
    ab ● c       shift
    abc ●

Problem: reduce by B ::= bc and then by S ::= aB, or by A ::= abc and then by S ::= A?

Recursive Descent Parsing
- Recursive descent parsers are a class of parsers derived fairly directly from BNF grammars
- A recursive descent parser traces out a parse tree in top-down order, corresponding to a left-most derivation (LL: left-to-right scanning, leftmost derivation)
- Each non-terminal in the grammar has a subprogram associated with it; the subprogram parses all phrases that the non-terminal can generate
- Each non-terminal in the right-hand side of a rule corresponds to a recursive call to the associated subprogram
- Each subprogram must be able to decide how to begin parsing by looking at the leftmost character in the string to be parsed
  - May do so directly, or indirectly by calling another parsing subprogram
- Recursive descent parsers, like other top-down parsers, cannot be built from left-recursive grammars
  - Sometimes can modify the grammar to suit

Sample Grammar
<expr> ::= <term> | <term> + <expr> | <term> - <expr>
<term> ::= <factor> | <factor> * <term> | <factor> / <term>
<factor> ::= <id> | ( <expr> )

Tokens as OCaml Types
+ - * / ( ) <id>
Becomes an OCaml datatype

    type token =
        Id_token of string
      | Left_parenthesis | Right_parenthesis
      | Times_token | Divide_token
      | Plus_token | Minus_token

Parse Trees as Datatypes
<expr> ::= <term> | <term> + <expr> | <term> - <expr>

    type expr =
        Term_as_Expr of term
      | Plus_Expr of (term * expr)
      | Minus_Expr of (term * expr)

<term> ::= <factor> | <factor> * <term> | <factor> / <term>

    and term =
        Factor_as_Term of factor
      | Mult_Term of (factor * term)
      | Div_Term of (factor * term)

<factor> ::= <id> | ( <expr> )

    and factor =
        Id_as_Factor of string
      | Parenthesized_Expr_as_Factor of expr

Parsing Lists of Tokens
- Will create three mutually recursive functions:
  expr : token list -> (expr * token list)
  term : token list -> (term * token list)
  factor : token list -> (factor * token list)
- Each parses what it can and gives back the parse and the remaining tokens

Parsing an Expression
<expr> ::= <term> [( + | - ) <expr> ]

    let rec expr tokens =
      (match term tokens
       with (term_parse, tokens_after_term) ->
         (match tokens_after_term
          with (Plus_token :: tokens_after_plus) ->

Parsing a Plus Expression
<expr> ::= <term> + <expr>

            (match expr tokens_after_plus
             with (expr_parse, tokens_after_expr) ->
               (Plus_Expr (term_parse, expr_parse), tokens_after_expr))

Parsing a Minus Expression
<expr> ::= <term> - <expr>

          | (Minus_token :: tokens_after_minus) ->
            (match expr tokens_after_minus
             with (expr_parse, tokens_after_expr) ->
               (Minus_Expr (term_parse, expr_parse), tokens_after_expr))

Parsing an Expression as a Term
<expr> ::= <term>

          | _ -> (Term_as_Expr term_parse, tokens_after_term)))

Code for term is the same except for replacing addition with multiplication and subtraction with division.

Parsing Factor as Id
<factor> ::= <id>

    and factor tokens =
      (match tokens
       with (Id_token id_name :: tokens_after_id) ->
         (Id_as_Factor id_name, tokens_after_id)

Parsing Factor as Parenthesized Expression
<factor> ::= ( <expr> )

       | (Left_parenthesis :: tokens) ->
         (match expr tokens
          with (expr_parse, tokens_after_expr) ->
            (match tokens_after_expr
             with Right_parenthesis :: tokens_after_rparen ->
               (Parenthesized_Expr_as_Factor expr_parse, tokens_after_rparen)

Error Cases
- What if there is no matching right parenthesis?

             | _ -> raise (Failure "No matching rparen")))

- What if there is no leading id or left parenthesis?

       | _ -> raise (Failure "No id or lparen"));;

( a + b ) * c - d

    # expr [Left_parenthesis; Id_token "a"; Plus_token; Id_token "b";
            Right_parenthesis; Times_token; Id_token "c"; Minus_token;
            Id_token "d"];;
    - : expr * token list =
    (Minus_Expr
      (Mult_Term
        (Parenthesized_Expr_as_Factor
          (Plus_Expr
            (Factor_as_Term (Id_as_Factor "a"),
             Term_as_Expr (Factor_as_Term (Id_as_Factor "b")))),
         Factor_as_Term (Id_as_Factor "c")),
       Term_as_Expr (Factor_as_Term (Id_as_Factor "d"))),
     [])

Parse tree for ( a + b ) * c - d:

    <expr>
      <term>
        <factor>
          (
          <expr>
            <term>
              <factor>
                <id> a
            +
            <expr>
              <term>
                <factor>
                  <id> b
          )
        *
        <term>
          <factor>
            <id> c
      -
      <expr>
        <term>
          <factor>
            <id> d

a + b * c - d

    # expr [Id_token "a"; Plus_token; Id_token "b"; Times_token;
            Id_token "c"; Minus_token; Id_token "d"];;
    - : expr * token list =
    (Plus_Expr
      (Factor_as_Term (Id_as_Factor "a"),
       Minus_Expr
        (Mult_Term (Id_as_Factor "b", Factor_as_Term (Id_as_Factor "c")),
         Term_as_Expr (Factor_as_Term (Id_as_Factor "d")))),
     [])

Parse tree for a + b * c - d:

    <expr>
      <term>
        <factor>
          <id> a
      +
      <expr>
        <term>
          <factor>
            <id> b
          *
          <term>
            <factor>
              <id> c
        -
        <expr>
          <term>
            <factor>
              <id> d

( a + b * c - d

    # expr [Left_parenthesis; Id_token "a"; Plus_token; Id_token "b";
            Times_token; Id_token "c"; Minus_token; Id_token "d"];;
    Exception: Failure "No matching rparen".

Can't parse because it was expecting a right parenthesis but got to the end without finding one.

a + b ) * c - d (

    # expr [Id_token "a"; Plus_token; Id_token "b"; Right_parenthesis;
            Times_token; Id_token "c"; Minus_token; Id_token "d";
            Left_parenthesis];;
    - : expr * token list =
    (Plus_Expr
      (Factor_as_Term (Id_as_Factor "a"),
       Term_as_Expr (Factor_as_Term (Id_as_Factor "b"))),
     [Right_parenthesis; Times_token; Id_token "c"; Minus_token;
      Id_token "d"; Left_parenthesis])

Parsing Whole String
- Q: How to guarantee the whole string parses?
- A: Check that the returned token list is empty

    let parse tokens =
      match expr tokens
      with (expr_parse, []) -> expr_parse
      | _ -> raise (Failure "No parse");;

- Fixes <expr> as the start symbol

Streams in Place of Lists
- More realistically, we don't want to create the entire list of tokens before we can start parsing
- We want to generate one token at a time and use it to make one step in parsing
- Can use (token * (unit -> token)) or (token * (unit -> token option)) in place of token list

Problems for Recursive-Descent Parsing
- Left Recursion: A ::= Aw translates to a subroutine that loops forever
- Indirect Left Recursion: A ::= Bw, B ::= Av causes the same problem
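The expr, factor and parse fragments from the slides can be assembled into one compilable unit. The sketch below keeps the slides' type and constructor names, writes out term (which the slides only describe as analogous to expr), and uses let-destructuring in place of the nested single-case matches. That restructuring is a stylistic choice of this sketch, not the course's code.

```ocaml
type token =
  | Id_token of string
  | Left_parenthesis | Right_parenthesis
  | Times_token | Divide_token
  | Plus_token | Minus_token

type expr =
  | Term_as_Expr of term
  | Plus_Expr of (term * expr)
  | Minus_Expr of (term * expr)
and term =
  | Factor_as_Term of factor
  | Mult_Term of (factor * term)
  | Div_Term of (factor * term)
and factor =
  | Id_as_Factor of string
  | Parenthesized_Expr_as_Factor of expr

(* <expr> ::= <term> [( + | - ) <expr>] *)
let rec expr tokens =
  let (term_parse, tokens_after_term) = term tokens in
  match tokens_after_term with
  | Plus_token :: tokens_after_plus ->
    let (expr_parse, tokens_after_expr) = expr tokens_after_plus in
    (Plus_Expr (term_parse, expr_parse), tokens_after_expr)
  | Minus_token :: tokens_after_minus ->
    let (expr_parse, tokens_after_expr) = expr tokens_after_minus in
    (Minus_Expr (term_parse, expr_parse), tokens_after_expr)
  | _ -> (Term_as_Expr term_parse, tokens_after_term)

(* <term> ::= <factor> [( * | / ) <term>] *)
and term tokens =
  let (factor_parse, tokens_after_factor) = factor tokens in
  match tokens_after_factor with
  | Times_token :: tokens_after_times ->
    let (term_parse, tokens_after_term) = term tokens_after_times in
    (Mult_Term (factor_parse, term_parse), tokens_after_term)
  | Divide_token :: tokens_after_divide ->
    let (term_parse, tokens_after_term) = term tokens_after_divide in
    (Div_Term (factor_parse, term_parse), tokens_after_term)
  | _ -> (Factor_as_Term factor_parse, tokens_after_factor)

(* <factor> ::= <id> | ( <expr> ) *)
and factor tokens =
  match tokens with
  | Id_token id_name :: tokens_after_id ->
    (Id_as_Factor id_name, tokens_after_id)
  | Left_parenthesis :: tokens_after_lparen ->
    let (expr_parse, tokens_after_expr) = expr tokens_after_lparen in
    (match tokens_after_expr with
     | Right_parenthesis :: tokens_after_rparen ->
       (Parenthesized_Expr_as_Factor expr_parse, tokens_after_rparen)
     | _ -> raise (Failure "No matching rparen"))
  | _ -> raise (Failure "No id or lparen")

(* Succeeds only if the whole token list is consumed. *)
let parse tokens =
  match expr tokens with
  | (expr_parse, []) -> expr_parse
  | _ -> raise (Failure "No parse")
```

Because each function returns the unconsumed tokens, expr alone accepts a prefix of its input, as in the a + b ) * c - d ( example. The parse wrapper is what rejects leftover tokens.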

Problems for Recursive-Descent Parsing
- Parser must always be able to choose the next action based only on the very next token
- Pairwise Disjointedness Test: can we always determine which rule (in the non-extended BNF) to choose based on just the first token?

Pairwise Disjointedness Test
- For each rule A ::= y, calculate
  FIRST(y) = {a | y =>* aw} ∪ {ε | if y =>* ε}
- For each pair of rules A ::= y and A ::= z, require
  FIRST(y) ∩ FIRST(z) = { }

Example
Grammar:
  <S> ::= <A> a <B> b
  <A> ::= <A> b | b
  <B> ::= a <B> | a
FIRST(<A> b) = {b}
FIRST(b) = {b}
Rules for <A> are not pairwise disjoint

Eliminating Left Recursion
- Rewrite the grammar to shift left recursion to right recursion
  - Changes associativity
- Given
  <expr> ::= <expr> + <term> and <expr> ::= <term>
- Add a new non-terminal <e> and replace the above rules with
  <expr> ::= <term><e>
  <e> ::= + <term><e> | ε

Factoring Grammar
- Test too strong: can't handle
  <expr> ::= <term> [ ( + | - ) <expr> ]
- Answer: add a new non-terminal and replace the above rules by
  <expr> ::= <term><e>
  <e> ::= + <term><e>
  <e> ::= - <term><e>
  <e> ::= ε
- You are delaying the decision point

Example
Both <A> and <B> have problems:
  <S> ::= <A> a <B> b
  <A> ::= <A> b | b
  <B> ::= a <B> | a
Transform the grammar to:
  <S> ::= <A> a <B> b
  <A> ::= b<A1>
  <A1> ::= b<A1> | ε
  <B> ::= a<B1>
  <B1> ::= a<B1> | ε
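A recursive-descent parser for the transformed rules <expr> ::= <term><e> can still produce left-associative trees if the function for <e> carries the tree built so far as an argument. The sketch below illustrates this under simplifying assumptions: <term> is reduced to an integer literal, and the names token, ast, term, e, expr and eval are hypothetical rather than taken from the slides.

```ocaml
type token = Num of int | Plus | Minus

type ast =
  | Lit of int
  | Add of ast * ast
  | Sub of ast * ast

(* <term>: just a number literal in this sketch. *)
let term = function
  | Num n :: rest -> (Lit n, rest)
  | _ -> failwith "expected a number"

(* <e> ::= + <term><e> | - <term><e> | (empty)
   The three alternatives start with +, with -, or are empty, so one token
   of lookahead is enough to choose between them.
   [left] is the tree built so far; folding each new term into it keeps the
   operators left associative although the grammar is right recursive. *)
let rec e left tokens =
  match tokens with
  | Plus :: rest ->
    let (t, rest') = term rest in
    e (Add (left, t)) rest'
  | Minus :: rest ->
    let (t, rest') = term rest in
    e (Sub (left, t)) rest'
  | _ -> (left, tokens)

(* <expr> ::= <term><e> *)
let expr tokens =
  let (t, rest) = term tokens in
  e t rest

(* Evaluate a tree, to make the associativity observable. *)
let rec eval = function
  | Lit n -> n
  | Add (a, b) -> eval a + eval b
  | Sub (a, b) -> eval a - eval b
```

For 10 - 3 - 2 this builds Sub (Sub (Lit 10, Lit 3), Lit 2), which evaluates to 5. Building the tree directly along the right-recursive rules would instead group it as 10 - (3 - 2).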

Semantics
- Expresses the meaning of syntax
- Static semantics
  - Meaning based only on the form of the expression without executing it
  - Usually restricted to type checking / type inference

Dynamic semantics
- Method of describing the meaning of executing a program
- Several different types:
  - Operational Semantics
  - Axiomatic Semantics
  - Denotational Semantics

Dynamic Semantics
- Different languages are better suited to different types of semantics
- Different types of semantics serve different purposes

Operational Semantics
- Start with a simple notion of machine
- Describe how to execute (implement) programs of the language on a virtual machine, by describing how to execute each program statement (i.e., following the structure of the program)
- Meaning of a program is how its execution changes the state of the machine
- Useful as a basis for implementations

Axiomatic Semantics
- Also called Floyd-Hoare Logic
- Based on formal logic (first order predicate calculus)
- Axiomatic Semantics is a logical system built from axioms and inference rules
- Mainly suited to simple imperative programming languages
- Used to formally prove a property (post-condition) of the state (the values of the program variables) after the execution of a program, assuming another property (pre-condition) of the state before execution
- Written: {Precondition} Program {Postcondition}
- Source of the idea of loop invariant

Denotational Semantics
- Construct a function M assigning a mathematical meaning to each program construct
- Lambda calculus is often used as the range of the meaning function
- Meaning function is compositional: meaning of a construct is built from the meaning of its parts
- Useful for proving properties of programs

Natural Semantics
- Aka Structural Operational Semantics, aka "Big Step Semantics"
- Provide a value for a program by rules and derivations, similar to type derivations
- Rule conclusions look like
  (C, m) ⇓ m'  or  (E, m) ⇓ v

Simple Imperative Programming Language
  I ∈ Identifiers
  N ∈ Numerals
  B ::= true | false | B & B | B or B | not B | E < E | E = E
  E ::= N | I | E + E | E * E | E - E | - E
  C ::= skip | C; C | I ::= E | if B then C else C fi | while B do C od

Natural Semantics of Atomic Expressions
- Identifiers: (I, m) ⇓ m(I)
- Numerals are values: (N, m) ⇓ N
- Booleans: (true, m) ⇓ true
            (false, m) ⇓ false

Booleans:

    (B, m) ⇓ false             (B, m) ⇓ true   (B', m) ⇓ b
    --------------------       ----------------------------
    (B & B', m) ⇓ false        (B & B', m) ⇓ b

    (B, m) ⇓ true              (B, m) ⇓ false   (B', m) ⇓ b
    --------------------       -----------------------------
    (B or B', m) ⇓ true        (B or B', m) ⇓ b

    (B, m) ⇓ true              (B, m) ⇓ false
    --------------------       --------------------
    (not B, m) ⇓ false         (not B, m) ⇓ true

Relations

    (E, m) ⇓ U   (E', m) ⇓ V   U ~ V = b
    -------------------------------------
    (E ~ E', m) ⇓ b

- By U ~ V = b, we mean: does (the meaning of) the relation ~ hold on the meaning of U and V?
- May be specified by a mathematical expression/equation or rules matching U and V

Arithmetic Expressions

    (E, m) ⇓ U   (E', m) ⇓ V   U op V = N
    --------------------------------------
    (E op E', m) ⇓ N

where N is the specified value for U op V

Commands

Skip:          (skip, m) ⇓ m

Assignment:    (E, m) ⇓ V
               ------------------------
               (I ::= E, m) ⇓ m[I <-- V]

Sequencing:    (C, m) ⇓ m'   (C', m') ⇓ m''
               ----------------------------
               (C; C', m) ⇓ m''

If Then Else Command

    (B, m) ⇓ true   (C, m) ⇓ m'
    -----------------------------------
    (if B then C else C' fi, m) ⇓ m'

    (B, m) ⇓ false   (C', m) ⇓ m'
    -----------------------------------
    (if B then C else C' fi, m) ⇓ m'

While Command

    (B, m) ⇓ false
    ---------------------------
    (while B do C od, m) ⇓ m

    (B, m) ⇓ true   (C, m) ⇓ m'   (while B do C od, m') ⇓ m''
    ----------------------------------------------------------
    (while B do C od, m) ⇓ m''

Example: If Then Else Rule
Goal: find the memory that results from

    (if x > 5 then y := 2 + 3 else y := 3 + 4 fi, {x -> 7}) ⇓ ?

Example (cont.)
The derivation is built from the conclusion upward. The rules are applied in this order: If Then Else Rule, Arith Relation, Identifier(s) and Numerals, Arith Relation (7 > 5 = true), If Then Else Rule, Assignment, Arith Op, Numerals, Arith Op (2 + 3 = 5), Assignment, If Then Else Rule. The completed derivation:

                                          2 + 3 = 5
                                          (2,{x->7}) ⇓ 2   (3,{x->7}) ⇓ 3
    7 > 5 = true                          -------------------------------
    (x,{x->7}) ⇓ 7   (5,{x->7}) ⇓ 5       (2+3, {x->7}) ⇓ 5
    -------------------------------       -----------------------------------
    (x > 5, {x->7}) ⇓ true                (y := 2 + 3, {x->7}) ⇓ {x->7, y->5}
    --------------------------------------------------------------------------
    (if x > 5 then y := 2 + 3 else y := 3 + 4 fi, {x->7}) ⇓ {x->7, y->5}

Let in Command

    (E, m) ⇓ v   (C, m[I <- v]) ⇓ m'
    ---------------------------------
    (let I = E in C, m) ⇓ m''

where m''(y) = m'(y) for y ≠ I, and m''(I) = m(I) if m(I) is defined, and m''(I) is undefined otherwise

Example

                                     (x,{x->5}) ⇓ 5   (3,{x->5}) ⇓ 3
                                     -------------------------------
                                     (x+3, {x->5}) ⇓ 8
                                     ---------------------------
    (5, {x->17}) ⇓ 5                 (x := x+3, {x->5}) ⇓ {x->8}
    ------------------------------------------------------------
    (let x = 5 in (x := x+3), {x->17}) ⇓ ?

Example (completed)

                                     (x,{x->5}) ⇓ 5   (3,{x->5}) ⇓ 3
                                     -------------------------------
                                     (x+3, {x->5}) ⇓ 8
                                     ---------------------------
    (5, {x->17}) ⇓ 5                 (x := x+3, {x->5}) ⇓ {x->8}
    ------------------------------------------------------------
    (let x = 5 in (x := x+3), {x->17}) ⇓ {x->17}

Comment
- The Simple Imperative Programming Language introduces variables implicitly through assignment
- The let-in command introduces scoped variables explicitly
- The clash of constructs is apparent in the awkward semantics

Interpretation Versus Compilation
- A compiler from language L1 to language L2 is a program that takes an L1 program and for each piece of code in L1 generates a piece of code in L2 of the same meaning
- An interpreter of L1 in L2 is an L2 program that executes the meaning of a given L1 program
- A compiler would examine the body of a loop once; an interpreter would examine it every time the loop was executed

Interpreter
- An interpreter represents the operational semantics of a language L1 (source language) in the language of implementation L2 (target language)
- Built incrementally
  - Start with literals
  - Variables
  - Primitive operations
  - Evaluation of expressions
  - Evaluation of commands/declarations
- Takes abstract syntax trees as input
  - In simple cases could be just strings
- One procedure for each syntactic category (non-terminal)
  - e.g. one for expressions, another for commands
- If Natural semantics is used, tells how to compute the final value from code
- If Transition semantics is used, tells how to compute the next "state"
  - To get the final value, put in a loop

Natural Semantics

    compute_exp (Var(v), m) = look_up v m
    compute_exp (Int(n), _) = Num (n)

    compute_com (IfExp(b,c1,c2), m) =
        if compute_exp (b,m) = Bool(true)
        then compute_com (c1,m)
        else compute_com (c2,m)

    compute_com (While(b,c), m) =
        if compute_exp (b,m) = Bool(false)
        then m
        else compute_com (While(b,c), compute_com(c,m))

- May fail to terminate (exceed stack limits)
- Returns no useful information then
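The compute_exp and compute_com equations can be filled out into a small runnable evaluator. The sketch below keeps those two function names and the Var, Int, Num, Bool, IfExp and While constructors from the slides. Everything else is an assumption made for illustration: the remaining constructors, the association-list memory, the update helper, and the restriction of expressions to + and <. It also covers the let-in rule.

```ocaml
type value = Num of int | Bool of bool

type exp =
  | Var of string
  | Int of int
  | BoolConst of bool
  | Plus of exp * exp
  | Less of exp * exp

type com =
  | Skip
  | Assign of string * exp
  | Seq of com * com
  | IfExp of exp * com * com
  | While of exp * com
  | Let of string * exp * com

(* Memory is an association list holding at most one binding per name. *)
let look_up v m =
  match List.assoc_opt v m with
  | Some x -> x
  | None -> failwith ("unbound identifier " ^ v)

let update v x m = (v, x) :: List.remove_assoc v m

let rec compute_exp (e, m) =
  match e with
  | Var v -> look_up v m
  | Int n -> Num n
  | BoolConst b -> Bool b
  | Plus (e1, e2) ->
    (match compute_exp (e1, m), compute_exp (e2, m) with
     | Num u, Num v -> Num (u + v)
     | _ -> failwith "+ expects numbers")
  | Less (e1, e2) ->
    (match compute_exp (e1, m), compute_exp (e2, m) with
     | Num u, Num v -> Bool (u < v)
     | _ -> failwith "< expects numbers")

let rec compute_com (c, m) =
  match c with
  | Skip -> m
  | Assign (i, e) -> update i (compute_exp (e, m)) m
  | Seq (c1, c2) -> compute_com (c2, compute_com (c1, m))
  | IfExp (b, c1, c2) ->
    if compute_exp (b, m) = Bool true
    then compute_com (c1, m)
    else compute_com (c2, m)
  | While (b, c1) ->
    if compute_exp (b, m) = Bool false then m
    else compute_com (While (b, c1), compute_com (c1, m))
  | Let (i, e, c1) ->
    let m' = compute_com (c1, update i (compute_exp (e, m)) m) in
    (* Restore the outer binding of i, or remove i if it had none. *)
    (match List.assoc_opt i m with
     | Some old -> update i old m'
     | None -> List.remove_assoc i m')
```

The if-then-else example can be run by writing x > 5 as Less (Int 5, Var "x"), since this sketch has no greater-than constructor. Starting from [("x", Num 7)], the resulting memory binds y to Num 5. Running Let ("x", Int 5, Assign ("x", Plus (Var "x", Int 3))) from [("x", Num 17)] returns [("x", Num 17)], matching the let-in derivation.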