FROWN An LALR(k) Parser Generator

Similar documents
EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

S Y N T A X A N A L Y S I S LR

Bottom-Up Parsing. Lecture 11-12

Bottom-Up Parsing. Lecture 11-12

Review of CFGs and Parsing II Bottom-up Parsers. Lecture 5. Review slides 1

UNIT III & IV. Bottom up parsing

In One Slide. Outline. LR Parsing. Table Construction

Lexical and Syntax Analysis. Bottom-Up Parsing

MIT Parse Table Construction. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

Lecture Bottom-Up Parsing

LR Parsing LALR Parser Generators

Context-free grammars

Chapter 4. Lexical and Syntax Analysis

Bottom-Up Parsing II. Lecture 8

Wednesday, August 31, Parsers

Compiler Construction: Parsing

Compiler Construction 2016/2017 Syntax Analysis

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1

Using an LALR(1) Parser Generator

Compilers. Bottom-up Parsing. (original slides by Sam

A left-sentential form is a sentential form that occurs in the leftmost derivation of some sentence.

Review: Shift-Reduce Parsing. Bottom-up parsing uses two actions: Bottom-Up Parsing II. Shift ABC xyz ABCx yz. Lecture 8. Reduce Cbxy ijk CbA ijk

Lecture 7: Deterministic Bottom-Up Parsing

Let us construct the LR(1) items for the grammar given below to construct the LALR parsing table.

LALR stands for look ahead left right. It is a technique for deciding when reductions have to be made in shift/reduce parsing. Often, it can make the

LR Parsing. Table Construction

Lecture 8: Deterministic Bottom-Up Parsing

LR Parsing Techniques

Monday, September 13, Parsers

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

LR Parsing LALR Parser Generators

Bottom-Up Parsing II (Different types of Shift-Reduce Conflicts) Lecture 10. Prof. Aiken (Modified by Professor Vijay Ganesh.

Intro to Bottom-up Parsing. Lecture 9

CS 4120 Introduction to Compilers

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

MODULE 14 SLR PARSER LR(0) ITEMS

The Parsing Problem (cont d) Recursive-Descent Parsing. Recursive-Descent Parsing (cont d) ICOM 4036 Programming Languages. The Complexity of Parsing

LR Parsing. Leftmost and Rightmost Derivations. Compiler Design CSE 504. Derivations for id + id: T id = id+id. 1 Shift-Reduce Parsing.

4. Lexical and Syntax Analysis

shift-reduce parsing

LR Parsing, Part 2. Constructing Parse Tables. An NFA Recognizing Viable Prefixes. Computing the Closure. GOTO Function and DFA States

Chapter 4: LR Parsing

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form

4. Lexical and Syntax Analysis

Bottom up parsing. The sentential forms happen to be a right most derivation in the reverse order. S a A B e a A d e. a A d e a A B e S.

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 4. Y.N. Srikant

Parsing. Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1)

Programming Language Syntax and Analysis

CIS552: Advanced Programming

Outline. The strategy: shift-reduce parsing. Introduction to Bottom-Up Parsing. A key concept: handles

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Example CFG. Lectures 16 & 17 Bottom-Up Parsing. LL(1) Predictor Table Review. Stacks in LR Parsing 1. Sʹ " S. 2. S " AyB. 3. A " ab. 4.

Introduction to Bottom-Up Parsing

LR Parsing - The Items

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

Fall Compiler Principles Lecture 4: Parsing part 3. Roman Manevich Ben-Gurion University of the Negev

CSCI312 Principles of Programming Languages


PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

Downloaded from Page 1. LR Parsing

LALR Parsing. What Yacc and most compilers employ.

CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution

CSE302: Compiler Design

Parsing Techniques. CS152. Chris Pollett. Sep. 24, 2008.

Compiler Design 1. Bottom-UP Parsing. Goutam Biswas. Lect 6

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

CS 11 Ocaml track: lecture 6

Lecture Notes on Bottom-Up LR Parsing

Syntax-Directed Translation. Lecture 14

Chapter 3. Describing Syntax and Semantics ISBN

Bottom Up Parsing. Shift and Reduce. Sentential Form. Handle. Parse Tree. Bottom Up Parsing 9/26/2012. Also known as Shift-Reduce parsing

CSC 4181 Compiler Construction. Parsing. Outline. Introduction

Parsing Wrapup. Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing. This lecture LR(1) parsing

Formal Languages and Compilers Lecture VII Part 4: Syntactic A

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Fall Compiler Principles Lecture 5: Parsing part 4. Roman Manevich Ben-Gurion University

Syntax Analysis. Amitabha Sanyal. ( as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

G53CMP: Lecture 4. Syntactic Analysis: Parser Generators. Henrik Nilsson. University of Nottingham, UK. G53CMP: Lecture 4 p.1/32

Principle of Compilers Lecture IV Part 4: Syntactic Analysis. Alessandro Artale

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

CSE 401 Compilers. LR Parsing Hal Perkins Autumn /10/ Hal Perkins & UW CSE D-1

Configuration Sets for CSX- Lite. Parser Action Table

Parser Generation. Bottom-Up Parsing. Constructing LR Parser. LR Parsing. Construct parse tree bottom-up --- from leaves to the root

Syntax Analysis, V Bottom-up Parsing & The Magic of Handles Comp 412

CIT Lecture 5 Context-Free Grammars and Parsing 4/2/2003 1

Syntax Analysis Part I

SLR parsers. LR(0) items

Parsing II Top-down parsing. Comp 412

JavaCC Parser. The Compilation Task. Automated? JavaCC Parser

Building Compilers with Phoenix

Concepts Introduced in Chapter 4

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

Chapter 3: Lexing and Parsing

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

CS 314 Principles of Programming Languages

Transcription:

FROWN An LALR(k) Parser Generator RALF HINZE Institute of Information and Computing Sciences Utrecht University Email: ralf@cs.uu.nl Homepage: http://www.cs.uu.nl/~ralf/ September, 2001 (Pick the slides at.../~ralf/talks.html#t30.) Joint work with Ross Paterson and Doaitse Swierstra.

Outline Usage Recap: LR parsing Implementation Features Facts and figures 1

Running example: well-formed parentheses data Tree = Node [Tree ] empty = Node [ ] join t (Node us) = Node (t : us) %{ Terminal = ( ) ; Nonterminal = Expr{Tree }; Expr{join t u } : (, Expr{t }, ), Expr{u }; {empty } ; }% frown ts = fail "syntax error" 2

Usage Frown :-( is invoked as follows: frown Paren.g This generates a Haskell source file (Paren.hs) that contains (among other things) the desired parser: expr :: (Monad m) [Char ] m Tree Here, Char is the type of terminals and Tree is the type of semantic values associated with Expr. 3

The standard example: arithmetic expressions type Op = Int Int Int %{ Terminal = Nat{Int } Add{Op } Mul{Op } L R; Nonterminal = Expr{Int } Term{Int } Factor{Int }; Expr{v 1 op v 2 } : Expr{v 1 }, Add{op }, Term{v 2 }; {e } Term{e }; Term{v 1 op v 2 } : Term{v 1 }, Mul{op }, Factor{v 2 }; {e } Factor{e }; Factor{e } : L, Expr{e }, R; {n } Nat{n }; }% frown ts = fail ("syntax error: " + show ts) 4

data Terminal = Nat Int Add Op Mul Op L R lexer :: String [Terminal ] lexer [ ] = [ ] lexer ( + : cs) = Add (+) : lexer cs lexer ( - : cs) = Add ( ) : lexer cs lexer ( * : cs) = Mul ( ) : lexer cs lexer ( / : cs) = Mul div : lexer cs lexer ( ( : cs) = L : lexer cs lexer ( ) : cs) = R : lexer cs lexer (c : cs) isdigit c = let (n, cs ) = span isdigit cs in Nat (read (c : n)) : lexer cs otherwise = lexer cs 5

Things to note The terminal symbols are arbitrary Haskell patterns (of the same Terminal type). Both terminal and nonterminal symbols may carry multiple semantic values (or no value). The parser generated for start symbol Start{T 1 }... {T n } has type start :: (Monad m) [Terminal ] m (T 1,..., T n ) A grammar may have several start symbols. 6

Outline Usage Recap: LR parsing Implementation Features Facts and figures 7

Shift-reduce parsing The parsers that are generated by Frown :-( are so-called LALR(k) parsers ( LA = lookahead, L = left-to-right scanning of input, R = constructing a rightmost derivation in reverse). LR parsing is a general method of shift-reduce parsing. General idea: reduce the input string to the start symbol of the grammar. Either shift a terminal symbol onto the stack or reduce the RHS of a production on top of the stack to the LHS. History: LR parsers were first introduced by Knuth in 1965. When DeRemer devised the LALR method in 1969, the LR technique became the method of choice for parser generators. 8

Running example The grammar is first augmented by an EOF symbol (here $ ) and a new start symbol (here Start ). %{ Terminal = ( ) $ ; Nonterminal = Start{Tree } Expr{Tree }; [0] Start{t } : Expr{t }, $ ; [1] Expr{join t u } : (, Expr{t }, ), Expr{u }; [2] Expr{empty } : ; }% 9

A non-deterministic parser data Stack = Stack Symbol parse :: (Monad m) Stack [Terminal ] m Tree parse = shift reduce 0 reduce 1 reduce 2 shift st (t : tr) = parse (st t) tr reduce 0 (st Expr{t } $ ) = return t reduce 1 (st ( Expr{t } ) Expr{u }) = parse (st Expr{join t u }) reduce 2 st = parse (st Expr{empty }) 10

A sample parse ()()$ shift ( )()$ reduce 2 (E )()$ shift (E) ()$ shift (E)( )$ reduce 2 (E)(E )$ shift (E)(E) $ reduce 2 (E)(E)E $ reduce 1 (E)E $ reduce 1 (E)E $ reduce 1 E $ shift E$ reduce 0 S 11

Recognition of handles Problem: How can we decide which action to take? In particular, how can we efficiently determine which RHS resides on top of the stack? This is another language recognition problem! A handle is the RHS of a production preceeded by a left context. S = r αnω = r αβω = r ω Here, αβ is a handle of the right-sentential form αβω (ω and ω contain only terminals). 12

Grammar of handles and left contexts H(p) is the language of handles for production p; L(N ) is the language of left contexts of N. H(0) : L(Start), Expr, $ ; H(1) : L(Expr), (, Expr, ), Expr; H(2) : L(Expr); L(Start) : ; L(Expr) : L(Start); L(Expr), ( ; L(Expr), (, Expr, ) ; By construction, this grammar is left-recursive. Thus, H(p) and L(N ) generate regular languages! 13

LR(0) automaton Regular languages can be recognized by deterministic finite state machines (set of states construction; N : α β is called an item). [1] [4] ( S : E$ E : (E)E E : ( E : ( E)E E : (E)E E : E [2] S : E $ $ [3] S : E$ E [5] E : (E )E ) [6] ( E : (E) E E : (E)E E : E [7] E : (E)E 14

Outline Usage Recap: LR parsing Implementation Features Facts and figures 15

Towards Haskell The LR(0) automaton can be directly encoded as a functional program. The stack records the transitions of the LR(0) automaton. data GStack = VStack State data VStack = GStack Symbol NB. s t s is meant to resemble the transition s t s. 16

Shift and reduce states State 2 is a shift state. parse 2 st ( $ : tr) = parse 3 (st $ 3) tr State 7 is a reduce state. parse 7 (st 1 ( 4 E{t } 5 ) 6 E{u } 7) = parse 2 (st 1 E{join t u } 2) parse 7 (st 4 ( 4 E{t } 5 ) 6 E{u } 7) = parse 5 (st 4 E{join t u } 5) parse 7 (st 6 ( 4 E{t } 5 ) 6 E{u } 7) = parse 7 (st 6 E{join t u } 7) 17

Conflicts We have a shift/reduce conflict if a state contains both a shift and a reduce action (states 1, 4, and 6 in our running example). A shift/reduce conflict can possibly be resolved using one token of lookahead. We have a reduce/reduce conflict if a state contains several reduce actions. A reduce/reduce conflict can possibly be resolved using k tokens of lookahead. 18

Computation of lookahead information Idea: partially execute the LR(0) machine at compile time to determine the shifts that might follow a reduce action. [1] E : 1 {$} [3] S : 1 E 2 $ 3 {} [4] E : 4 {)} [6] E : 6 {$, )} [7] E : 1 ( 4 E 5 ) 6 E 7 {$} E : 4 ( 4 E 5 ) 6 E 7 {)} E : 6 ( 4 E 5 ) 6 E 7 {$, )} NB. LALR(k) parsers merge the lookahead information for each production. In other words, the parsers generated by Frown :-( are slightly more general than LALR. 19

In Haskell: representation of the stack For each transition we introduce a constructor. data Stack = Empty St 1 4 Stack St 1 2 Stack (Tree) St 2 3 Stack St 4 4 Stack St 4 5 Stack (Tree) St 5 6 Stack St 6 4 Stack St 6 7 Stack (Tree) 20

In Haskell: LR(0) machine expr tr = parse 1 tr Empty parse 1 ts@[ ] st = parse 2 ts (St 1 2 st (empty)) parse 1 ( ( : tr) st = parse 4 tr (St 1 4 st) parse 1 ts st = frown ts parse 2 tr@[ ] st = parse 3 tr (St 2 3 st) parse 2 ts st = frown ts parse 3 ts (St 2 3 (St 1 2 st (v0 ))) = return (v0 ) parse 4 ( ( : tr) st = parse 4 tr (St 4 4 st) parse 4 ts@( ) : tr) st = parse 5 ts (St 4 5 st (empty)) parse 4 ts st = frown ts 21

parse 5 ( ) : tr) st = parse 6 tr (St 5 6 st) parse 5 ts st = frown ts parse 6 ( ( : tr) st = parse 4 tr (St 6 4 st) parse 6 ts st = parse 7 ts (St 6 7 st (empty)) parse 7 ts (St 6 7 (St 5 6 (St 4 5 (St 1 4 st) (t))) (u)) = parse 2 ts (St 1 2 st (join t u)) parse 7 ts (St 6 7 (St 5 6 (St 4 5 (St 4 4 st) (t))) (u)) = parse 5 ts (St 4 5 st (join t u)) parse 7 ts (St 6 7 (St 5 6 (St 4 5 (St 6 4 st) (t))) (u)) = parse 7 ts (St 6 7 st (join t u)) 22

Outline Usage Recap: LR parsing Implementation Features Facts and figures 23

Features Multiple entry points (start symbols). Multiple attribute values. Precedences to resolve conflicts. --lookahead=k: Use additional lookahead to resolve reduce/reduce conflicts (only used where needed). --backtrack: Produce a backtracking parser. start :: (MonadPlus m) [Terminal ] m (T 1,..., T n ) --monadic: Monadic semantic actions. 24

--lexer: Use monadic lexer instead of list of terminal symbols: get :: (Monad m) m Terminal start :: (Monad m) m (T 1,..., T n ) --expected: In case of error pass the set of expected tokens to the error routine. frown :: (Monad m) [Terminal ] [Terminal ] m a Four different parser schemes (standard, LALR-like, stackless, combinator-based). 25

Outline Usage Recap: LR parsing Implementation Features Facts and figures 26

Facts and figures: expression parser The expression grammar has 5 terminals, 4 nonterminals, and 7 productions. The LR automaton has 13 states and 23 transitions. Running time (expr ("a+b" + concat (replicate n "*c+d"))). 1.000 10.000 100.000 happy 0.01 0.29 5.20 happy -a 0.02 0.40 5.92 happy -c -g 0.01 0.25 4.44 happy -a -c -g 0.01 0.23 3.85 frown 0.01 0.10 1.98 frown -oc 0.01 0.11 2.01 frown -s 0.01 0.10 2.03 27

Facts and figures: Haskell parser The Haskell grammar has 61 terminals, 121 nonterminals, and 277 productions. The LR automaton has 490 states and 2961 transitions. Compilation time. time.hs space happy 210K 150M -fno-cpr happy -c -g 4.5 246K 180M -fno-cpr happy -a -c -g 4.5 164K 100M frown -oc 4.8 300K 160M.o a.out stripped happy 1.511K 2.536K 1.246K happy -c -g 1.093K 2.202K 1.134K happy -a -c -g 346K 1.711K 894K frown -oc 1.489K 2.547K 1.362K 28

Running time. 18K 300K 1653K lex only 0.01 0.55 4.8 happy 0.23 2.68 12.1 happy -c -g 0.14 2.23 11.6 happy -a -c -g 0.07 1.83 10.6 frown -oc 0.14 1.95 8.5 29

Outline Usage Recap: LR parsing Implementation Features Facts and figures 30

Conclusion Encoding the LR(0) automaton as a set of mutually recursive functions rather than as a huge table gives more flexibility (use lookahead only where it is needed). Frown :-( generates good code (huge but fast). The output is fairly readable. With some additional work the generated parsers can produce good error messages (monadic lexer that keeps track of line numbers). 31