Compilation 2012 ocamllex and menhir

Compilation 2012 Jan Midtgaard Michael I. Schwartzbach Aarhus University

The tools

ocamllex is a lexer/scanner generator
    Input: a sequence of token definitions
    Output: a lexer/scanner available as an OCaml module

menhir is a parser generator
    Input: an LR(1) context-free grammar
    Output: a parser for the defined language available as an OCaml module

Our Favorite Grammar

lexer.mll:

    {
      open Parser
      let get = Lexing.lexeme
    }

    (* Helpers *)
    let tab = '\009'
    let cr  = '\013'
    let lf  = '\010'
    let eol = cr | lf | cr lf

    rule token = parse
      | eol                 { token lexbuf }
      | (' ' | tab)         { token lexbuf }
      | eof                 { EOF }
      | '+'                 { PLUS }
      | '-'                 { MINUS }
      | '*'                 { STAR }
      | '/'                 { SLASH }
      | '('                 { LPAR }
      | ')'                 { RPAR }
      | ('x' | 'y' | 'z')   { ID (get lexbuf) }

parser.mly:

    %{
      (* Put OCaml helper functions here *)
    %}

    %token EOF
    %token PLUS MINUS STAR SLASH LPAR RPAR
    %token <string> ID

    %start <unit> start   /* entry point */

    %%

    /* Productions */
    start  : expr EOF            { };

    expr   : expr PLUS term      { }
           | expr MINUS term     { }
           | term                { };

    term   : term STAR factor    { }
           | term SLASH factor   { }
           | factor              { };

    factor : ID                  { }
           | LPAR expr RPAR      { };

Generated Modules

    -rw-r--r-- 1 jmi users 6384 Sep 1 22:53 lexer.ml
    -rw-r--r-- 1 jmi users  594 Sep 1 22:20 lexer.mll
    -rw-r--r-- 1 jmi users  317 Sep 1 22:20 main.ml
    -rw-r--r-- 1 jmi users 8554 Sep 1 22:53 parser.ml
    -rw-r--r-- 1 jmi users  170 Sep 1 22:53 parser.mli
    -rw-r--r-- 1 jmi users  480 Sep 1 22:20 parser.mly
    -rw-r--r-- 1 jmi users 2416 Sep 1 22:53 parser.automaton
    -rw-r--r-- 1 jmi users 2416 Sep 1 22:53 parser.conflicts

The input files are lexer.mll, parser.mly, and main.ml.
The output files are lexer.ml, parser.ml, parser.mli, and parser.automaton (run menhir with -v).
Menhir generates parser.conflicts in case of conflicts.
parser.automaton and parser.conflicts are useful for debugging conflicts.

The Main Application

    let lexbuf = Lexing.from_channel stdin in
    try
      Parser.start Lexer.token lexbuf
    with
    | Failure msg  -> print_endline ("Failure in " ^ msg)
    | Parser.Error -> print_endline "Parse error"
    | End_of_file  -> print_endline "Parse error: unexpected end of string"

An Ambiguous Grammar

    X → ε | a X | a a X

Any string in this language has exponentially many different parse trees: the string a a ... a of length n has exactly Fib(n) parse trees.
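The count can be checked with a short OCaml sketch. A parse tree for a string of n a's starts either with the production a X (leaving n-1 a's) or with a a X (leaving n-2 a's), which gives the Fibonacci recurrence. The function name count_parse_trees and the indexing convention Fib(0) = Fib(1) = 1 are assumptions made here for illustration.

```ocaml
(* Number of parse trees of a^n under  X -> (empty) | a X | a a X.
   t(0) = 1, t(1) = 1, t(n) = t(n-1) + t(n-2). *)
let count_parse_trees n =
  if n < 0 then invalid_arg "count_parse_trees";
  (* Invariant: a = t(i) and b = t(i+1). *)
  let rec go i a b =
    if i = n then a else go (i + 1) b (a + b)
  in
  go 0 1 1

let () =
  List.iter
    (fun n -> Printf.printf "n = %d: %d parse trees\n" n (count_parse_trees n))
    [0; 1; 2; 3; 4; 5; 10]
```

With these definitions the string aaaa has 5 parse trees and a string of ten a's has 89.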

The ocamllex + menhir version

lexer.mll:

    { open Parser }

    rule token = parse
      | 'a' { A }

parser.mly:

    %{ %}

    %token A EOF
    %start <unit> main

    %%

    main : x EOF   { };

    x    :         { }
         | A x     { }
         | A A x   { };

menhir is not happy

    $ menhir -v parser.mly
    Warning: one state has reduce/reduce conflicts.
    Warning: one reduce/reduce conflict was arbitrarily resolved.
    File "parser.mly", line 11, characters 6-14:
    Warning: production x -> A A x is never reduced.
    Warning: in total, 1 productions are never reduced.
    $

The parser table in parser.automaton contains conflicting actions!

parser.conflicts is more informative:

    ** Conflict (reduce/reduce) in state 3.
    ** Token involved: EOF
    ** This state is reached from main after reading:

    A A x

    ** The derivations that appear below have the following common factor:
    ** (The question mark symbol (?) represents the spot where the derivations begin to differ.)

    main
    x EOF // lookahead token appears
    (?)

    ** In state 3, looking ahead at EOF, reducing production
    ** x -> A x
    ** is permitted because of the following sub-derivation:

    A x // lookahead token is inherited
      A x .

    ** In state 3, looking ahead at EOF, reducing production
    ** x -> A A x
    ** is permitted because of the following sub-derivation:

    A A x .

Solution: Less Stupid Grammar

lexer.mll:

    { open Parser }

    rule token = parse
      | 'a' { A }

parser.mly:

    %{ %}

    %token A EOF
    %start <unit> main

    %%

    main : x EOF   { };

    x    :         { }
         | A x     { };

A Grammar for If-Statements

lexer.mll:

    { open Parser }

    let tab = '\009'
    let lf  = '\010'
    let cr  = '\013'
    let eol = cr | lf | cr lf

    rule token = parse
      | (eol | ' ' | tab)   { token lexbuf }
      | eof                 { EOF }
      | "exp"               { EXP }
      | "if"                { IF }
      | "then"              { THEN }
      | "else"              { ELSE }
      | "assign"            { ASSIGN }

parser.mly:

    %{ %}

    %token EOF
    %token EXP IF THEN ELSE ASSIGN
    %start <unit> main

    %%

    main : stm EOF                       { };

    stm  : IF EXP THEN stm               { }
         | IF EXP THEN stm ELSE stm      { }
         | ASSIGN                        { };

menhir is not happy

    $ menhir -v parser.mly
    Warning: one state has shift/reduce conflicts.
    Warning: one shift/reduce conflict was arbitrarily resolved.
    $

But the grammar does not appear to be stupid...

Again parser.conflicts is your friend:

    ** Conflict (shift/reduce) in state 5.
    ** Token involved: ELSE
    ** This state is reached from main after reading:

    IF EXP THEN IF EXP THEN stm

    ** The derivations that appear below have the following common factor:
    ** (The question mark symbol (?) represents the spot where the derivations begin to differ.)

    main
    stm EOF
    (?)

    ** In state 5, looking ahead at ELSE, reducing production
    ** stm -> IF EXP THEN stm
    ** is permitted because of the following sub-derivation:

    IF EXP THEN stm ELSE stm // lookahead token appears
                IF EXP THEN stm .

    ** In state 5, looking ahead at ELSE, shifting is permitted
    ** because of the following sub-derivation:

    IF EXP THEN stm
                IF EXP THEN stm . ELSE stm

Solution: Less Natural Grammar

lexer.mll:

    { open Parser }

    let tab = '\009'
    let lf  = '\010'
    let cr  = '\013'
    let eol = cr | lf | cr lf

    rule token = parse
      | (eol | ' ' | tab)   { token lexbuf }
      | eof                 { EOF }
      | "exp"               { EXP }
      | "if"                { IF }
      | "then"              { THEN }
      | "else"              { ELSE }
      | "assign"            { ASSIGN }

parser.mly:

    %{ %}

    %token EOF
    %token EXP IF THEN ELSE ASSIGN
    %start <unit> main

    %%

    main : stm EOF                        { };

    stm  : IF EXP THEN stm                { }
         | IF EXP THEN stm2 ELSE stm      { }
         | ASSIGN                         { };

    stm2 : IF EXP THEN stm2 ELSE stm2     { }
         | ASSIGN                         { };

Dangling Else Problem

An example statement:

    if exp then if exp then assign else assign

To which if does the else belong? The first grammar is ambiguous. Our modified grammar parses the string as:

    if exp then ( if exp then assign else assign )
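The nearest-if binding can be made concrete with a hand-written recursive-descent sketch. The token and stm types and the function names below are illustrative assumptions, not code produced by menhir. The sketch only shows that letting the innermost open if consume a following ELSE yields, for the example statement, the same grouping as the modified grammar.

```ocaml
(* Tokens and syntax trees for the if-statement language (illustrative names). *)
type token = IF | EXP | THEN | ELSE | ASSIGN

type stm =
  | Assign
  | If of stm              (* if exp then s *)
  | IfElse of stm * stm    (* if exp then s1 else s2 *)

(* Parse one statement and return it with the remaining tokens.
   An ELSE is always taken by the innermost "if" that is still open. *)
let rec parse_stm tokens =
  match tokens with
  | ASSIGN :: rest -> (Assign, rest)
  | IF :: EXP :: THEN :: rest ->
    let (s1, rest) = parse_stm rest in
    (match rest with
     | ELSE :: rest ->
       let (s2, rest) = parse_stm rest in
       (IfElse (s1, s2), rest)
     | _ -> (If s1, rest))
  | _ -> failwith "parse error"

(* Parse a complete token list; trailing tokens are an error. *)
let parse tokens =
  match parse_stm tokens with
  | (s, []) -> s
  | _ -> failwith "parse error: trailing tokens"

(* if exp then if exp then assign else assign *)
let example = parse [IF; EXP; THEN; IF; EXP; THEN; ASSIGN; ELSE; ASSIGN]
```

Here example evaluates to If (IfElse (Assign, Assign)): the else is attached to the inner if.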

The Palindrome Grammar

lexer.mll:

    { open Parser }

    rule token = parse
      | '0' { Zero }
      | '1' { One }

parser.mly:

    %{ %}

    %token Zero One EOF
    %start <unit> main

    %%

    main : pal EOF         { };

    pal  :                 { }
         | One             { }
         | Zero            { }
         | One pal One     { }
         | Zero pal Zero   { };

menhir is not happy

    $ menhir -v parser.mly
    Warning: 2 states have shift/reduce conflicts.
    Warning: 2 states have reduce/reduce conflicts.
    Warning: 6 shift/reduce conflicts were arbitrarily resolved.
    $

Again parser.conflicts is descriptive:

    ** Conflict (shift/reduce) in state 2.
    ** Tokens involved: Zero One
    ** The following explanations concentrate on token One.
    ** This state is reached from main after reading:

    One

    ** The derivations that appear below have the following common factor:
    ** (The question mark symbol (?) represents the spot where the derivations begin to differ.)

    main
    pal EOF
    (?)

    ** In state 2, looking ahead at One, reducing production
    ** pal ->
    ** is permitted because of the following sub-derivation:

    One pal One // lookahead token appears
        .

No Solution!

There is no LR(1) grammar for this language.

Some grammars are not LR(1), and some languages are not LR(1).
Some grammars are ambiguous, and some languages are ambiguous.
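The difficulty is specific to deterministic left-to-right parsing with one token of lookahead: the parser cannot tell where the middle of the palindrome lies. A recognizer that may inspect both ends of the input has no such problem. The sketch below is not an LR technique, and the function name is_palindrome is an assumption; it mirrors the palindrome productions by stripping a matching pair of end symbols at each step.

```ocaml
(* Recognize palindromes over the alphabet {0, 1}. *)
let is_palindrome (s : string) : bool =
  (* One pal One and Zero pal Zero strip a matching pair of ends;
     the empty string and a single symbol are the base cases (i >= j). *)
  let rec go i j =
    i >= j || (s.[i] = s.[j] && go (i + 1) (j - 1))
  in
  (* Reject any character outside the alphabet. *)
  let binary = ref true in
  String.iter (fun c -> if c <> '0' && c <> '1' then binary := false) s;
  !binary && go 0 (String.length s - 1)

let () =
  List.iter
    (fun s -> Printf.printf "%S: %b\n" s (is_palindrome s))
    [""; "0"; "0110"; "010"; "01"]
```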

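Language Containments

[Figure: nested language classes, Context-Free ⊃ Unambiguous ⊃ LR(1). The language { a^i b^j c^k | i=j or j=k } is context-free but not unambiguous; the palindromes are unambiguous but not LR(1).]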

EBNF Features

Some parser generators (e.g., menhir and SableCC) allow right-hand side abbreviations:

    Optional:        x : y?
    List:            x : y*
    Non-empty list:  x : y+

This has benefits:
    shorter
    less error-prone
    fewer names must be invented

EBNF Example

    block : LBRACE decl* stm+ RBRACE    { };

    decl  : type id init? SEMICOLON     { };

    init  : EQUALS exp                  { };

EBNF Expansion

    x : y?    expands to    x : | y ;
    x : y*    expands to    x : | y x ;
    x : y+    expands to    x : y | y x ;
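The three expansions can be mirrored by small functions over token lists. This is only a sketch: the parse_fn type and the names opt, many, many1, and tok are assumptions made for illustration and say nothing about how menhir implements these abbreviations. Each function follows the corresponding expanded production, trying the alternative that starts with y first.

```ocaml
(* A parse function consumes a prefix of the token list and returns a value
   together with the remaining tokens, or None on failure. *)
type ('t, 'a) parse_fn = 't list -> ('a * 't list) option

(* x : y?   expands to   x : (empty) | y *)
let opt (y : ('t, 'a) parse_fn) : ('t, 'a option) parse_fn = fun ts ->
  match y ts with
  | Some (v, rest) -> Some (Some v, rest)
  | None -> Some (None, ts)

(* x : y*   expands to   x : (empty) | y x
   Note: loops forever if y can succeed without consuming a token. *)
let rec many (y : ('t, 'a) parse_fn) : ('t, 'a list) parse_fn = fun ts ->
  match y ts with
  | None -> Some ([], ts)
  | Some (v, rest) ->
    (match many y rest with
     | Some (vs, rest') -> Some (v :: vs, rest')
     | None -> Some ([v], rest))

(* x : y+   expands to   x : y | y x *)
let many1 (y : ('t, 'a) parse_fn) : ('t, 'a list) parse_fn = fun ts ->
  match y ts with
  | None -> None
  | Some (v, rest) ->
    (match many y rest with
     | Some (vs, rest') -> Some (v :: vs, rest')
     | None -> Some ([v], rest))

(* Accept exactly the token t. *)
let tok (t : 't) : ('t, 't) parse_fn = function
  | t' :: rest when t' = t -> Some (t, rest)
  | _ -> None
```

For example, many (tok 'a') applied to ['a'; 'a'; 'b'] returns Some (['a'; 'a'], ['b']), while many1 (tok 'a') applied to ['b'] returns None.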