Structurally Recursive Descent Parsing. Nils Anders Danielsson (Nottingham) Joint work with Ulf Norell (Chalmers)

Similar documents
CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

Total Parser Combinators

Parsing II Top-down parsing. Comp 412

MIT Top-Down Parsing. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

Parsing CSP-CASL with Parsec

Compilers. Compiler Construction Tutorial The Front-end

Compilers. Predictive Parsing. Alex Aiken

A programming language requires two major definitions A simple one pass compiler

Monads and all that III Applicative Functors. John Hughes Chalmers University/Quviq AB

LECTURE 7. Lex and Intro to Parsing

University of Utrecht. 1992; Fokker, 1995), the use of monads to structure functional programs (Wadler,

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

Peace cannot be kept by force; it can only be achieved by understanding. Albert Einstein

Syntax Analysis, III Comp 412

Introduction to Parsing

Fusion-powered EDSLs

UNIVERSITY OF CALIFORNIA

CIS552: Advanced Programming

Syntax Analysis, III Comp 412

Parsing. Zhenjiang Hu. May 31, June 7, June 14, All Right Reserved. National Institute of Informatics

Midterm 1. CMSC 430 Introduction to Compilers Spring Instructions Total 100. Name: March 14, 2012

Error Handling Syntax-Directed Translation Recursive Descent Parsing

Building a Parser II. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

Type Checking. Outline. General properties of type systems. Types in programming languages. Notation for type rules.

CSCE 314 Programming Languages. Monadic Parsing

COP 3402 Systems Software Top Down Parsing (Recursive Descent)

The Algebra of Programming in Haskell

Outline. General properties of type systems. Types in programming languages. Notation for type rules. Common type rules. Logical rules of inference

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

Building a Parser III. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

COP5621 Exam 3 - Spring 2005

Compiler Construction

Administrativia. WA1 due on Thu PA2 in a week. Building a Parser III. Slides on the web site. CS164 3:30-5:00 TT 10 Evans.

Compiler Construction

CSCE 314 Programming Languages. Monadic Parsing

Lexical and Syntax Analysis. Top-Down Parsing

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Principles of Programming Languages COMP251: Syntax and Grammars

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

Lisp: Lab Information. Donald F. Ross

CS 406/534 Compiler Construction Parsing Part I

Type Soundness. Type soundness is a theorem of the form If e : τ, then running e never produces an error

Parsing III. (Top-down parsing: recursive descent & LL(1) )

JavaCC Parser. The Compilation Task. Automated? JavaCC Parser

Parsing Expression Grammars and Packrat Parsing. Aaron Moss

Introduction to Haskell

CSCE 314 Programming Languages

Error Handling Syntax-Directed Translation Recursive Descent Parsing

Parsing Techniques. CS152. Chris Pollett. Sep. 24, 2008.

ASTs, Objective CAML, and Ocamlyacc

Parsing Part II (Top-down parsing, left-recursion removal)

CSCI312 Principles of Programming Languages

CS 230 Programming Languages

COL728 Minor1 Exam Compiler Design Sem II, Answer all 5 questions Max. Marks: 20

Error Handling Syntax-Directed Translation Recursive Descent Parsing

SEMANTIC ANALYSIS TYPES AND DECLARATIONS

Parsing II Top-down parsing. Comp 412

Grammars and Parsing, second week

Programming with Universes, Generically

Types of parsing. CMSC 430 Lecture 4, Page 1

LECTURE 3. Compiler Phases

Alonzo a Compiler for Agda

7. Introduction to Denotational Semantics. Oscar Nierstrasz

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

Parsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)

EECS 700 Functional Programming

CMSC 330: Organization of Programming Languages

CSCE 314 Programming Languages. Functional Parsers

EECS 6083 Intro to Parsing Context Free Grammars

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Thoughts on Assignment 4 Haskell: Flow of Control

CMSC 330: Organization of Programming Languages. Formal Semantics of a Prog. Lang. Specifying Syntax, Semantics

Recap: Typical Compiler Structure. G53CMP: Lecture 2 A Complete (Albeit Small) Compiler. This Lecture (2) This Lecture (1)

CSE 413 Languages & Implementation. Hal Perkins Winter 2019 Structs, Implementing Languages (credits: Dan Grossman, CSE 341)

Administrativia. PA2 assigned today. WA1 assigned today. Building a Parser II. CS164 3:30-5:00 TT 10 Evans. First midterm. Grammars.

Computer Science 160 Translation of Programming Languages

Tail Calls. CMSC 330: Organization of Programming Languages. Tail Recursion. Tail Recursion (cont d) Names and Binding. Tail Recursion (cont d)

Explicit Recursion in Generic Haskell

Haskell, Do You Read Me?

Principles of Programming Languages 2017W, Functional Programming

A Third Look At ML. Chapter Nine Modern Programming Languages, 2nd ed. 1

INFOB3TC Solutions for the Exam

FROWN An LALR(k) Parser Generator

Parsing a primer. Ralf Lämmel Software Languages Team University of Koblenz-Landau

Syntactic Analysis. Top-Down Parsing

Syntax-Directed Translation. Lecture 14

Topics Covered Thus Far. CMSC 330: Organization of Programming Languages. Language Features Covered Thus Far. Programming Languages Revisited

Topic of the day: Ch. 4: Top-down parsing

Context-Free Grammar (CFG)

Parser Tools: lex and yacc-style Parsing

CS 314 Principles of Programming Languages. Lecture 9

Abstract Syntax Trees & Top-Down Parsing

Prelude COMP 181 Tufts University Computer Science Last time Grammar issues Key structure meaning Tufts University Computer Science

Lecture 6: Top-Down Parsing. Beating Grammars into Programs. Expression Recognizer with Actions. Example: Lisp Expression Recognizer

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

Introduction to Syntax Analysis Recursive-Descent Parsing

Principles of Programming Languages COMP251: Syntax and Grammars

Transcription:

Structurally Recursive Descent Parsing Nils Anders Danielsson (Nottingham) Joint work with Ulf Norell (Chalmers)

Parser combinators Parser combinator libraries are great! Elegant code. Executable grammars. Easy to abstract out recurring patterns. Light-weight. Nowadays often fast enough.

Simple example expr = + $ term < sym + < > expr term term =...

Simple example expr = + $ expr < sym + < > term term term =...

Risk of non-termination Combinator parsing is not guaranteed to terminate. Most combinator parsers fail for left-recursive grammars. Executable grammars? Some errors are not caught at compile-time.

Another problem All interesting grammars are cyclic: expr = + $ term < sym + < > expr term Cyclic values are hard to understand and reason about. How do you implement combinator parsing in a language which requires structural recursion?

Our solution Remove cycles by representing grammars as functions from non-terminals to parsers: Grammar tok nt = nt res Parser tok nt res Rule out left recursion by restricting the types.

Examples

Example Non-terminals: data NT : ParserType where expr : NT N term : NT N Result type: N. Indices ensuring termination:. Inferred automatically.

Example g : Grammar Char NT g expr = + $! term < sym + < >! expr! term g term =... Note: g is not recursive.! turns a non-terminal into a parser.

Example g : Grammar Char NT g expr = + $! term < sym + < >! expr! term g term =... Uses applicative functor interface. Monadic interface also possible.

Abstraction Much of the flavour of ordinary combinator parsers is preserved. Abstraction requires a little work, though.

Abstraction data NT : ParserType where lib : L.Nonterminal NT i r NT r expr : NT N term : NT N op : NT (N N N) open L.Combinators lib g : Grammar Char NT g (lib p) = librarygrammar p g expr = chainl 1 (! term) (! op) g term = number parenthesised (! expr) g op = + <$ sym + <$ sym -

Abstraction data NT : ParserType where lib : L.Nonterminal NT i r NT r expr : NT N term : NT N op : NT (N N N) open L.Combinators lib g : Grammar Char NT g (lib p) = librarygrammar p g expr = chainl 1 (! term) (! op) g term = number parenthesised (! expr) g op = + <$ sym + <$ sym -

Running a parser parse : Parser tok nt i result Grammar tok nt [tok ] [result [tok ]]

How does it work?

Indices Parsers are indexed on two things: Index = Empty Corners Empty Does the parser accept the empty string? Corners A tree representation of the proper left corners of the parser.

Indices Empty Does the parser accept the empty string? Empty = Bool Corners Represents all positions in the grammar in which the parser must not recurse to itself. data Corners : Set where leaf : Corners step : Corners Corners node : Corners Corners Corners

Some basic combinators < > : Parser (e 1, c 1 ) Parser (e 2, c 2 ) Parser (e 1 e 2, if e 1 then node c 1 c 2 else c 1 ) : Parser (e 1, c 1 ) Parser (e 2, c 2 ) Parser (e 1 e 2, node c 1 c 2 )! : nt (e, c) Parser (e, step c)

Some basic combinators < > : Parser (e 1, c 1 ) Parser (e 2, c 2 ) Parser (e 1 e 2, if e 1 then node c 1 c 2 else c 1 ) : Parser (e 1, c 1 ) Parser (e 2, c 2 ) Parser (e 1 e 2, node c 1 c 2 )! : nt (e, c) Parser (e, step c)

Some basic combinators < > : Parser (e 1, c 1 ) Parser (e 2, c 2 ) Parser (e 1 e 2, if e 1 then node c 1 c 2 else c 1 ) : Parser (e 1, c 1 ) Parser (e 2, c 2 ) Parser (e 1 e 2, node c 1 c 2 )! : nt (e, c) Parser (e, step c) This does not type check: grammar : nt (e, c) res Parser tok nt (e, c) res grammar rec =! rec Reason: c step c.

Some basic combinators < > : Parser (e 1, c 1 ) Parser (e 2, c 2 ) Parser (e 1 e 2, if e 1 then node c 1 c 2 else c 1 ) : Parser (e 1, c 1 ) Parser (e 2, c 2 ) Parser (e 1 e 2, node c 1 c 2 )! : nt (e, c) Parser (e, step c) This works, though: grammar : nt (e, c) res Parser tok nt (e, c) res grammar rec = sym c >! rec Reason: sym c must consume a token.

Some basic combinators < > : Parser (e 1, c 1 ) Parser (e 2, c 2 ) Parser (e 1 e 2, if e 1 then node c 1 c 2 else c 1 ) : Parser (e 1, c 1 ) Parser (e 2, c 2 ) Parser (e 1 e 2, node c 1 c 2 )! : nt (e, c) Parser (e, step c) Indirect left recursion also fails: grammar : nt (e, c) res Parser tok nt (e, c) res grammar rec =! other < >... grammar other = many p < >! rec

Indices can be useful anyway Parsing zero or more things: many : Parser tok nt (false, c) r Parser tok nt [r ] Note that the input parser must not accept the empty string. Even if the backend can handle many empty it seems reasonable to assume that it is a bug.

Backend Simple backtracking implementation. (So far.) Lexicographic structural recursion over: 1. An upper bound on the length of the input string. 2. The Corners index. 3. The structure of the parser.

Expressive power? Can define grammars with an infinite number of non-terminals: data NT : ParserType where a 1+ : N NT Unit g : Grammar Char NT g (a 1+ zero) = sym a > return unit g (a 1+ (suc n)) = sym a >! (a 1+ n) Can use this to define non-context-free languages: a n b n c n.

Expressive power? Can define grammars with an infinite number of non-terminals: data NT : ParserType where a 1+ : N NT Unit g : Grammar Char NT g (a 1+ zero) = sym a > return unit g (a 1+ (suc n)) = sym a >! (a 1+ n) Careful! Types can become really complicated: nt : (n : N) NT (f n) Unit

Expressive power? Can define grammars with an infinite number of non-terminals: data NT : ParserType where a 1+ : N NT Unit g : Grammar Char NT g (a 1+ zero) = sym a > return unit g (a 1+ (suc n)) = sym a >! (a 1+ n) The same warning applies when defining libraries. Libraries

Conclusions Structurally recursive descent parsing. Termination guaranteed. Errors caught at compile-time. Still feels like combinator parsing. More complicated types, but the overhead for the user is usually small.

Possible future work More efficient backend. Use backend which can handle left recursion less complicated types. But the types can be nice to have anyway. And who needs left recursion? chainl is more high-level.

?

Extra slides

Defining a library: Non-terminals The non-terminals are parameterised on the outer grammar s non-terminals: data NT (nt : ParserType) : ParserType where many : Parser tok nt (false, c) r NT nt [r ] many 1 : Parser tok nt (false, c) r NT nt [r ]

Defining a library: Combinators Combinators parameterised on a lib constructor: module Combinators (lib : NT nt i r nt i r) where : Parser tok nt (false, c) r Parser tok nt [r ] p =! lib (many p) + : Parser tok nt (false, c) r Parser tok nt [r ] p + =! lib (many 1 p) library : NT nt i r Parser tok nt i r library (many p) = return [ ] p + library (many 1 p) = :: $ p < > p

Defining a library: Combinators Wrappers (to ease use of the library): module Combinators (lib : NT nt i r nt i r) where : Parser tok nt (false, c) r Parser tok nt [r ] p =! lib (many p) + : Parser tok nt (false, c) r Parser tok nt [r ] p + =! lib (many 1 p) library : NT nt i r Parser tok nt i r library (many p) = return [ ] p + library (many 1 p) = :: $ p < > p

Defining a library: Combinators Grammar (as before): module Combinators (lib : NT nt i r nt i r) where : Parser tok nt (false, c) r Parser tok nt [r ] p =! lib (many p) + : Parser tok nt (false, c) r Parser tok nt [r ] p + =! lib (many 1 p) library : NT nt i r Parser tok nt i r library (many p) = return [ ] p + library (many 1 p) = :: $ p < > p Go back