
Parser Code
Programming Languages and Compilers (CS 421)
Elsa L Gunter
2112 SC, UIUC
http://courses.engr.illinois.edu/cs421
Based in part on slides by Mattox Beckman, as updated by Vikram Adve and Gul Agha

- <grammar>.ml defines one parsing function per entry point
- Each parsing function takes a lexing function (lexer buffer to token) and a lexer buffer as arguments
- Returns the semantic attribute of the corresponding entry point

Ocamlyacc Input

- File format:
    %{ <header> %}
    <declarations>
    %%
    <rules>
    %%
    <trailer>

Ocamlyacc <header>

- Contains arbitrary OCaml code
- Typically used to give types and functions needed for the semantic actions of rules, and to give specialized error recovery
- May be omitted
- <trailer> is similar; possibly used to call the parser

Ocamlyacc <declarations>

- %token symbol ... symbol
  Declare the given symbols as tokens
- %token <type> symbol ... symbol
  Declare the given symbols as token constructors, taking an argument of type <type>
- %start symbol ... symbol
  Declare the given symbols as entry points; functions of the same names appear in <grammar>.ml
- %type <type> symbol ... symbol
  Specify the type of attributes for the given symbols. Mandatory for start symbols
- %left symbol ... symbol
- %right symbol ... symbol
- %nonassoc symbol ... symbol
  Associate precedence and associativity with the given symbols. Same line, same precedence; earlier line, lower precedence (broadest scope)
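As an illustrative sketch (not the example developed below) of how the precedence and associativity declarations fit into a declarations section, one might write:

    %{ open Expr %}                    /* header: arbitrary OCaml code */
    %token <string> Id_token           /* token carrying a string attribute */
    %token Plus_token Minus_token Times_token Divide_token
    %token Left_parenthesis Right_parenthesis EOL
    %left Plus_token Minus_token       /* earlier line: lower precedence */
    %left Times_token Divide_token     /* later line: higher precedence */
    %start main
    %type <Expr.expr> main
    %%

With declarations like these, an ambiguous rule such as expr: expr Plus_token expr could be written directly, since the %left lines tell ocamlyacc how to resolve the resulting shift-reduce conflicts (conflicts are discussed later in this lecture).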

Ocamlyacc <rules>

- Rule format:
    nonterminal : symbol ... symbol { semantic_action }
      | ...
      | symbol ... symbol { semantic_action }
      ;
- Semantic actions are arbitrary OCaml expressions
- Must be of the same type as declared (or inferred) for the nonterminal
- Access semantic attributes (values) of symbols by position: $1 for the first symbol, $2 for the second, and so on

Example - Base types

(* File: expr.ml *)
type expr =
    Term_as_Expr of term
  | Plus_Expr of (term * expr)
  | Minus_Expr of (term * expr)
and term =
    Factor_as_Term of factor
  | Mult_Term of (factor * term)
  | Div_Term of (factor * term)
and factor =
    Id_as_Factor of string
  | Parenthesized_Expr_as_Factor of expr

Example - Lexer (exprlex.mll)

{ (*open Exprparse*) }
let numeric = ['0' - '9']
let letter = ['a' - 'z' 'A' - 'Z']
rule token = parse
    "+" { Plus_token }
  | "-" { Minus_token }
  | "*" { Times_token }
  | "/" { Divide_token }
  | "(" { Left_parenthesis }
  | ")" { Right_parenthesis }
  | letter (letter | numeric | "_")* as id { Id_token id }
  | [' ' '\t' '\n'] { token lexbuf }
  | eof { EOL }

Example - Parser (exprparse.mly)

%{ open Expr %}
%token <string> Id_token
%token Left_parenthesis Right_parenthesis
%token Times_token Divide_token
%token Plus_token Minus_token
%token EOL
%start main
%type <expr> main
%%

expr:
    term                      { Term_as_Expr $1 }
  | term Plus_token expr      { Plus_Expr ($1, $3) }
  | term Minus_token expr     { Minus_Expr ($1, $3) }

term:
    factor                    { Factor_as_Term $1 }
  | factor Times_token term   { Mult_Term ($1, $3) }
  | factor Divide_token term  { Div_Term ($1, $3) }

Example - Parser (exprparse.mly)

factor:
    Id_token                                  { Id_as_Factor $1 }
  | Left_parenthesis expr Right_parenthesis   { Parenthesized_Expr_as_Factor $2 }
main:
    expr EOL                                  { $1 }

Example - Using Parser

# #use "expr.ml";;
# #use "exprparse.ml";;
# #use "exprlex.ml";;
# let test s =
    let lexbuf = Lexing.from_string (s ^ "\n")
    in main token lexbuf;;

(exprparse.ml and exprlex.ml are the files generated from exprparse.mly and exprlex.mll by ocamlyacc and ocamllex.)

# test "a + b";;
- : expr =
  Plus_Expr
    (Factor_as_Term (Id_as_Factor "a"),
     Term_as_Expr (Factor_as_Term (Id_as_Factor "b")))

LR Parsing

- Read tokens left to right (L)
- Create a rightmost derivation (R)
- How is this possible?
- Start at the bottom (left) and work your way up
- The last step has only one non-terminal to be replaced, so it is rightmost
- Working backwards, replace mixed strings by non-terminals
- Always proceed so that there are no nonterminals to the right of the string to be replaced

Example: <Sum> ::= 0 | 1 | ( <Sum> ) | <Sum> + <Sum>

Shift-reduce parse of ( 0 + 1 ) + 0, which the next several slides build up one step at a time ("=" marks a shift, "=>" marks a reduce; reading the "=>" lines from bottom to top gives the rightmost derivation):

    =  ( 0 + 1 ) + 0          shift
    =  ( 0 + 1 ) + 0          shift
    => ( <Sum> + 1 ) + 0      reduce
    =  ( <Sum> + 1 ) + 0      shift
    =  ( <Sum> + 1 ) + 0      shift
    => ( <Sum> + <Sum> ) + 0  reduce
    => ( <Sum> ) + 0          reduce
    =  ( <Sum> ) + 0          shift
    => <Sum> + 0              reduce
    =  <Sum> + 0              shift
    =  <Sum> + 0              shift
    => <Sum> + <Sum>          reduce
    => <Sum>                  reduce
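The parse tree this produces can be written as an OCaml value over a hypothetical datatype for the <Sum> grammar (a small sketch, not from the slides):

    (* Hypothetical datatype for <Sum> ::= 0 | 1 | ( <Sum> ) | <Sum> + <Sum> *)
    type sum = Zero | One | Paren of sum | Plus of sum * sum

    (* parse tree for  ( 0 + 1 ) + 0  *)
    let tree = Plus (Paren (Plus (Zero, One)), Zero)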


LR Parsing Tables

- Build a pair of tables, Action and Goto, from the grammar
- This is the hardest part; we omit it here
- Rows are labeled by states
- For Action, columns are labeled by terminals and the end-of-tokens marker
  (more generally, by strings of terminals of fixed length)
- For Goto, columns are labeled by nonterminals

Action and Goto Tables

- Given a state and the next input, the Action table says either
  - shift and go to state n, or
  - reduce by production k (explained in a bit), or
  - accept or error
- Given a state and a non-terminal, the Goto table says
  - go to state m

LR(i) Parsing Algorithm

- Based on push-down automata
- Uses states and transitions (as recorded in the Action and Goto tables)
- Uses a stack containing states, terminals and non-terminals

LR(i) Parsing Algorithm

0. Ensure the token stream ends in a special end-of-tokens symbol
1. Start in state 1 with an empty stack
2. Push state(1) onto the stack
3. Look at the next i tokens (toks) from the token stream (don't remove them yet)
4. If the top symbol on the stack is state(n), look up the action in the Action table at (n, toks)
5. If action = shift m:
   a) Remove the top token from the token stream and push it onto the stack
   b) Push state(m) onto the stack
   c) Go to step 3
6. If action = reduce k, where production k is E ::= u:
   a) Remove 2 * length(u) symbols from the stack (u and all the interleaved states)
   b) If the new top symbol on the stack is state(m), look up the new state p in Goto(m, E)
   c) Push E onto the stack, then push state(p) onto the stack
   d) Go to step 3
7. If action = accept, stop parsing and return success
8. If action = error, stop parsing and return failure

Adding Synthesized Attributes

- Add to each reduce a rule for calculating the new synthesized attribute from the component attributes
- Add to each non-terminal pushed onto the stack the attribute calculated for it
- When performing a reduce:
  - gather the recorded attributes from each nonterminal popped from the stack
  - compute the new attribute for the non-terminal pushed onto the stack
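To make the loop concrete, here is a minimal OCaml sketch of steps 1-8 for i = 1 (not from the slides; the token, symbol, and table representations are illustrative assumptions, and constructing the Action and Goto tables is exactly the part omitted above):

    (* Grammar symbols and what we push on the stack *)
    type symbol = Term of string | NonTerm of string
    type stack_elt = State of int | Sym of symbol
    type action = Shift of int | Reduce of int | Accept | Error

    (* production k is  lhs ::= rhs *)
    type production = { lhs : string; rhs : symbol list }

    let rec run ~action_tbl ~goto_tbl ~prods stack toks =
      match stack with
      | State n :: _ ->
          (match action_tbl n (List.hd toks) with      (* step 4: peek, don't remove *)
           | Shift m ->                                 (* step 5: push token, then state m *)
               run ~action_tbl ~goto_tbl ~prods
                 (State m :: Sym (Term (List.hd toks)) :: stack) (List.tl toks)
           | Reduce k ->                                (* step 6 *)
               let p = prods.(k) in
               (* pop 2 * length(rhs) elements: rhs symbols and interleaved states *)
               let rec drop i l = if i = 0 then l else drop (i - 1) (List.tl l) in
               let stack' = drop (2 * List.length p.rhs) stack in
               (match stack' with
                | State m :: _ ->
                    run ~action_tbl ~goto_tbl ~prods
                      (State (goto_tbl m p.lhs) :: Sym (NonTerm p.lhs) :: stack') toks
                | _ -> false)
           | Accept -> true                             (* step 7 *)
           | Error -> false)                            (* step 8 *)
      | _ -> false

    (* steps 0-2: append an end marker (here "$") and start in state 1 *)
    let parse ~action_tbl ~goto_tbl ~prods toks =
      run ~action_tbl ~goto_tbl ~prods [State 1] (toks @ ["$"])

Here action_tbl : int -> string -> action and goto_tbl : int -> string -> int stand in for the two tables; the attribute stack of the previous slide could be added by pairing each pushed symbol with its computed attribute.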

Shift-Reduce Conflicts

- Problem: can't decide whether the action for a state and input character should be shift or reduce
- Caused by ambiguity in the grammar
- Usually caused by lack of associativity or precedence information in the grammar

Example: <Sum> ::= 0 | 1 | ( <Sum> ) | <Sum> + <Sum>,  input  0 + 1 + 0

    =  0 + 1 + 0          shift
    => <Sum> + 1 + 0      reduce
    =  <Sum> + 1 + 0      shift
    =  <Sum> + 1 + 0      shift
    => <Sum> + <Sum> + 0  reduce
       <Sum> + <Sum> + 0  shift or reduce?

- Problem: shift or reduce?
- You can shift-shift-reduce-reduce or reduce-shift-shift-reduce
- Shift first: right associative
- Reduce first: left associative

Reduce-Reduce Conflicts

- Problem: can't decide between two different rules to reduce by
- Again caused by ambiguity in the grammar
- Symptom: the RHS of one production is a suffix of another
- Requires examining the grammar and rewriting it
- Harder to solve than shift-reduce conflicts

Example:  S ::= A | aB    A ::= abc    B ::= bc

        abc    shift
    a    bc    shift
    ab    c    shift
    abc        reduce, but by which rule?

- Problem: reduce by B ::= bc and then by S ::= aB, or by A ::= abc and then by S ::= A?

Recursive Descent Parsing

- Recursive descent parsers are a class of parsers derived fairly directly from BNF grammars
- A recursive descent parser traces out a parse tree in top-down order, corresponding to a leftmost derivation (LL: left-to-right scanning, leftmost derivation)
- Each nonterminal in the grammar has a subprogram associated with it; the subprogram parses all phrases that the nonterminal can generate
- Each nonterminal in the right-hand side of a rule corresponds to a recursive call to the associated subprogram

Recursive Descent Parsing

- Each subprogram must be able to decide how to begin parsing by looking at the leftmost character in the string to be parsed
- May do so directly, or indirectly by calling another parsing subprogram
- Recursive descent parsers, like other top-down parsers, cannot be built from left-recursive grammars
- Sometimes the grammar can be modified to suit

Sample Grammar

<expr>   ::= <term> | <term> + <expr> | <term> - <expr>
<term>   ::= <factor> | <factor> * <term> | <factor> / <term>
<factor> ::= <id> | ( <expr> )

Tokens as OCaml Types

- Tokens: + - * / ( ) <id>
- Become an OCaml datatype:

type token =
    Id_token of string
  | Left_parenthesis
  | Right_parenthesis
  | Times_token
  | Divide_token
  | Plus_token
  | Minus_token

Parse Trees as Datatypes

<expr> ::= <term> | <term> + <expr> | <term> - <expr>

type expr =
    Term_as_Expr of term
  | Plus_Expr of (term * expr)
  | Minus_Expr of (term * expr)

<term> ::= <factor> | <factor> * <term> | <factor> / <term>

and term =
    Factor_as_Term of factor
  | Mult_Term of (factor * term)
  | Div_Term of (factor * term)

<factor> ::= <id> | ( <expr> )

and factor =
    Id_as_Factor of string
  | Parenthesized_Expr_as_Factor of expr
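As a small illustration of how these datatypes represent parses (this example is not on the slides, but it is consistent with the types above and with the test runs shown later), the input a * b corresponds to the following token list and parse tree value:

    (* assumes the token and parse-tree types defined just above *)
    let toks = [Id_token "a"; Times_token; Id_token "b"]

    let tree =
      Term_as_Expr
        (Mult_Term (Id_as_Factor "a", Factor_as_Term (Id_as_Factor "b")))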

Parsing Lists of Tokens

- We will create three mutually recursive functions:
  - expr   : token list -> (expr * token list)
  - term   : token list -> (term * token list)
  - factor : token list -> (factor * token list)
- Each parses what it can and gives back the parse together with the remaining tokens

Parsing an Expression

<expr> ::= <term> [ ( + | - ) <expr> ]

let rec expr tokens =
  (match term tokens
   with (term_parse, tokens_after_term) ->
     (match tokens_after_term
      with (Plus_token :: tokens_after_plus) ->

Parsing a Plus Expression

<expr> ::= <term> + <expr>

             (match expr tokens_after_plus
              with (expr_parse, tokens_after_expr) ->
                (Plus_Expr (term_parse, expr_parse), tokens_after_expr))

Parsing a Minus Expression

<expr> ::= <term> - <expr>

         | (Minus_token :: tokens_after_minus) ->
             (match expr tokens_after_minus
              with (expr_parse, tokens_after_expr) ->
                (Minus_Expr (term_parse, expr_parse), tokens_after_expr))

Parsing an Expression as a Term

<expr> ::= <term>

         | _ -> (Term_as_Expr term_parse, tokens_after_term)))

- The code for term is the same except for replacing addition with multiplication and subtraction with division

Parsing Factor as Id

<factor> ::= <id>

and factor tokens =
  (match tokens
   with (Id_token id_name :: tokens_after_id) ->
          (Id_as_Factor id_name, tokens_after_id)

Parsing Factor as Parenthesized Expression

<factor> ::= ( <expr> )

      | (Left_parenthesis :: tokens) ->
          (match expr tokens
           with (expr_parse, tokens_after_expr) ->
             (match tokens_after_expr
              with Right_parenthesis :: tokens_after_rparen ->
                     (Parenthesized_Expr_as_Factor expr_parse, tokens_after_rparen)

Error Cases

- What if there is no matching right parenthesis?

                 | _ -> raise (Failure "No matching rparen")))

- What if there is no leading id or left parenthesis?

      | _ -> raise (Failure "No id or lparen"));;

( a + b ) * c - d

# expr [Left_parenthesis; Id_token "a"; Plus_token; Id_token "b";
        Right_parenthesis; Times_token; Id_token "c"; Minus_token; Id_token "d"];;
- : expr * token list =
  (Minus_Expr
     (Mult_Term
        (Parenthesized_Expr_as_Factor
           (Plus_Expr
              (Factor_as_Term (Id_as_Factor "a"),
               Term_as_Expr (Factor_as_Term (Id_as_Factor "b")))),
         Factor_as_Term (Id_as_Factor "c")),
      Term_as_Expr (Factor_as_Term (Id_as_Factor "d"))),
   [])

( a + b ) * c - d

[parse tree diagram for ( a + b ) * c - d]

a + b * c - d

# expr [Id_token "a"; Plus_token; Id_token "b"; Times_token;
        Id_token "c"; Minus_token; Id_token "d"];;
- : expr * token list =
  (Plus_Expr
     (Factor_as_Term (Id_as_Factor "a"),
      Minus_Expr
        (Mult_Term (Id_as_Factor "b", Factor_as_Term (Id_as_Factor "c")),
         Term_as_Expr (Factor_as_Term (Id_as_Factor "d")))),
   [])

[parse tree diagram for a + b * c - d]

( a + b * c - d

# expr [Left_parenthesis; Id_token "a"; Plus_token; Id_token "b"; Times_token;
        Id_token "c"; Minus_token; Id_token "d"];;
Exception: Failure "No matching rparen".

- Can't parse: the parser was expecting a right parenthesis but reached the end of the input without finding one

a + b ) * c - d

# expr [Id_token "a"; Plus_token; Id_token "b"; Right_parenthesis; Times_token;
        Id_token "c"; Minus_token; Id_token "d"];;
- : expr * token list =
  (Plus_Expr
     (Factor_as_Term (Id_as_Factor "a"),
      Term_as_Expr (Factor_as_Term (Id_as_Factor "b"))),
   [Right_parenthesis; Times_token; Id_token "c"; Minus_token; Id_token "d"])

Parsing Whole String

- Q: How do we guarantee that the whole string parses?
- A: Check that the returned token list is empty

let parse tokens =
  match expr tokens
  with (expr_parse, []) -> expr_parse
     | _ -> raise (Failure "No parse");;

- This fixes <expr> as the start symbol
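Pulling the fragments above together into one listing (assembled from the code shown piecemeal on the slides; the term function is completed here by analogy, as the slides describe, and the token and parse-tree types defined earlier are assumed):

    (* Consolidated recursive-descent parser for the sample grammar *)
    let rec expr tokens =
      (match term tokens
       with (term_parse, tokens_after_term) ->
         (match tokens_after_term
          with (Plus_token :: tokens_after_plus) ->
                 (match expr tokens_after_plus
                  with (expr_parse, tokens_after_expr) ->
                    (Plus_Expr (term_parse, expr_parse), tokens_after_expr))
             | (Minus_token :: tokens_after_minus) ->
                 (match expr tokens_after_minus
                  with (expr_parse, tokens_after_expr) ->
                    (Minus_Expr (term_parse, expr_parse), tokens_after_expr))
             | _ -> (Term_as_Expr term_parse, tokens_after_term)))

    and term tokens =
      (match factor tokens
       with (factor_parse, tokens_after_factor) ->
         (match tokens_after_factor
          with (Times_token :: tokens_after_times) ->
                 (match term tokens_after_times
                  with (term_parse, tokens_after_term) ->
                    (Mult_Term (factor_parse, term_parse), tokens_after_term))
             | (Divide_token :: tokens_after_divide) ->
                 (match term tokens_after_divide
                  with (term_parse, tokens_after_term) ->
                    (Div_Term (factor_parse, term_parse), tokens_after_term))
             | _ -> (Factor_as_Term factor_parse, tokens_after_factor)))

    and factor tokens =
      (match tokens
       with (Id_token id_name :: tokens_after_id) ->
              (Id_as_Factor id_name, tokens_after_id)
          | (Left_parenthesis :: tokens) ->
              (match expr tokens
               with (expr_parse, tokens_after_expr) ->
                 (match tokens_after_expr
                  with Right_parenthesis :: tokens_after_rparen ->
                         (Parenthesized_Expr_as_Factor expr_parse, tokens_after_rparen)
                     | _ -> raise (Failure "No matching rparen")))
          | _ -> raise (Failure "No id or lparen"))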

Streams in Place of Lists

- More realistically, we don't want to create the entire list of tokens before we can start parsing
- We want to generate one token at a time and use it to make one step in parsing
- Will use (token * (unit -> token)) or (token * (unit -> token option)) in place of token list

Problems for Recursive-Descent Parsing

- Left recursion: A ::= Aw translates to a subroutine that loops forever
- Indirect left recursion: A ::= Bw, B ::= Av causes the same problem
- The parser must always be able to choose the next action based only on the very next token
- Pairwise Disjointedness Test: can we always determine which rule (in the non-extended BNF) to choose based on just the first token?

Pairwise Disjointedness Test

- For each rule A ::= y, calculate
  FIRST(y) = { a | y =>* aw } ∪ { ε | y =>* ε }
- For each pair of rules A ::= y and A ::= z, require
  FIRST(y) ∩ FIRST(z) = { }

Grammar:
<S> ::= <A> a | <B> b
<A> ::= <A> b | b
<B> ::= a <B> | a

FIRST(<A> b) = {b}
FIRST(b) = {b}
The rules for <A> are not pairwise disjoint

Eliminating Left Recursion

- Rewrite the grammar to shift left recursion to right recursion
- This changes associativity
- Given <expr> ::= <expr> + <term> and <expr> ::= <term>
- Add a new non-terminal <e> and replace the rules above with (see the sketch below):
  <expr> ::= <term> <e>
  <e> ::= + <term> <e> | ε
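A minimal sketch (not from the slides) of what the rewritten rules look like as recursive-descent code, over a hypothetical token type with integer literals; the accumulator keeps evaluation left-associative even though the grammar is now right-recursive:

    (* Hypothetical token type: integer literals and "+" only *)
    type tok = INT of int | PLUS

    (* <term> here is just an integer literal *)
    let term toks =
      match toks with
      | INT n :: rest -> (n, rest)
      | _ -> failwith "expected an integer"

    (* <expr> ::= <term> <e>      <e> ::= + <term> <e> | epsilon
       The right-recursive <e> no longer loops forever; the accumulator acc
       re-establishes left-associative evaluation. *)
    let rec expr toks =
      let (v, rest) = term toks in
      e v rest
    and e acc toks =
      match toks with
      | PLUS :: rest ->
          let (v, rest') = term rest in
          e (acc + v) rest'
      | _ -> (acc, toks)

    (* expr [INT 1; PLUS; INT 2; PLUS; INT 3]   evaluates to   (6, []) *)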

Factoring Grammar

- The test is too strong: it can't handle <expr> ::= <term> [ ( + | - ) <expr> ]
- Answer: add a new non-terminal and replace the rules above by
  <expr> ::= <term> <e>
  <e> ::= + <term> <e>
  <e> ::= - <term> <e>
  <e> ::= ε
- You are delaying the decision point

Both <A> and <B> have problems:
<S> ::= <A> a | <B> b
<A> ::= <A> b | b
<B> ::= a <B> | a

Transform the grammar to (see the sketch below):
<S> ::= <A> a | <B> b
<A> ::= b <A1>
<A1> ::= b <A1> | ε
<B> ::= a <B1>
<B1> ::= a <B1> | ε
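A sketch (not from the slides) of recursive-descent functions for the transformed grammar, over a hypothetical two-token alphabet; each function returns the tokens remaining after the phrase it parsed, and each can now choose its rule by looking only at the first token:

    type letter = Tok_a | Tok_b

    let rec s toks =                        (* <S> ::= <A> a | <B> b *)
      match toks with
      | Tok_b :: _ ->                       (* FIRST(<A> a) = {b} *)
          (match nt_a toks with
           | Tok_a :: rest -> rest
           | _ -> failwith "expected a")
      | Tok_a :: _ ->                       (* FIRST(<B> b) = {a} *)
          (match nt_b toks with
           | Tok_b :: rest -> rest
           | _ -> failwith "expected b")
      | [] -> failwith "unexpected end of input"
    and nt_a toks =                         (* <A> ::= b <A1> *)
      match toks with
      | Tok_b :: rest -> nt_a1 rest
      | _ -> failwith "expected b"
    and nt_a1 toks =                        (* <A1> ::= b <A1> | epsilon *)
      match toks with
      | Tok_b :: rest -> nt_a1 rest
      | _ -> toks
    and nt_b toks =                         (* <B> ::= a <B1> *)
      match toks with
      | Tok_a :: rest -> nt_b1 rest
      | _ -> failwith "expected a"
    and nt_b1 toks =                        (* <B1> ::= a <B1> | epsilon *)
      match toks with
      | Tok_a :: rest -> nt_b1 rest
      | _ -> toks

    (* s [Tok_b; Tok_b; Tok_a]  parses "b b a" and returns []  *)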
