CSCE 314 TAMU Fall CSCE 314: Programming Languages Dr. Flemming Andersen. Functional Parsers

Similar documents
CSCE 314 Programming Languages. Functional Parsers

CSCE 314 Programming Languages. Monadic Parsing

CSCE 314 Programming Languages. Monadic Parsing

EECS 700 Functional Programming

Parsing. Zhenjiang Hu. May 31, June 7, June 14, All Right Reserved. National Institute of Informatics

Monadic Parsing. 1 Chapter 13 in the book Programming in Haskell (2nd ed.) by. 2 Chapters 10 and 16 in the book Real World Haskell by Bryan

CIS552: Advanced Programming

CSCE 314 Programming Languages

CSCE 314 Programming Languages

Shell CSCE 314 TAMU. Higher Order Functions

University of Utrecht. 1992; Fokker, 1995), the use of monads to structure functional programs (Wadler,

Standard prelude. Appendix A. A.1 Classes

Informatics 1 Functional Programming Lecture 5. Function properties. Don Sannella University of Edinburgh

A programming language requires two major definitions A simple one pass compiler

Building a Parser II. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

Expressions. Parsing Expressions. Showing and Reading. Parsing. Parsing Numbers. Expressions. Such as. Can be modelled as a datatype

Parsing Expressions. Slides by Koen Lindström Claessen & David Sands

CSCE 314 TAMU Fall CSCE 314: Programming Languages Dr. Flemming Andersen. Haskell Functions

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

CS 209 Functional Programming

Administrativia. PA2 assigned today. WA1 assigned today. Building a Parser II. CS164 3:30-5:00 TT 10 Evans. First midterm. Grammars.

CMSC 330: Organization of Programming Languages

CSCE 314 Programming Languages. Interactive Programming: I/O

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

Optimizing Finite Automata

CS 320: Concepts of Programming Languages

CMSC 330: Organization of Programming Languages. Context Free Grammars

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

CMSC 330: Organization of Programming Languages. Context Free Grammars

Haske k ll An introduction to Functional functional programming using Haskell Purely Lazy Example: QuickSort in Java Example: QuickSort in Haskell

Informatics 1 Functional Programming Lectures 15 and 16. IO and Monads. Don Sannella University of Edinburgh

CPS 506 Comparative Programming Languages. Syntax Specification

CMSC 330: Organization of Programming Languages

An introduction introduction to functional functional programming programming using usin Haskell

CMSC 330: Organization of Programming Languages

Informatics 1 Functional Programming 19 Tuesday 23 November IO and Monads. Philip Wadler University of Edinburgh

ITEC2620 Introduction to Data Structures

ITEC2620 Introduction to Data Structures

Today. Assignments. Lecture Notes CPSC 326 (Spring 2019) Quiz 2. Lexer design. Syntax Analysis: Context-Free Grammars. HW2 (out, due Tues)

Lexical Analysis. Introduction

PROGRAMMING IN HASKELL. Chapter 2 - First Steps

A general introduction to Functional Programming using Haskell

Lecture 19: Functions, Types and Data Structures in Haskell

CSE302: Compiler Design

Principles of Programming Languages COMP251: Syntax and Grammars

Haskell Types, Classes, and Functions, Currying, and Polymorphism

Syntax-Directed Translation. Lecture 14

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

Haskell: Lists. CS F331 Programming Languages CSCE A331 Programming Language Concepts Lecture Slides Friday, February 24, Glenn G.

INFOB3TC Solutions for the Exam

Syntax and Grammars 1 / 21

CS 314 Principles of Programming Languages

LL Parsing: A piece of cake after LR

CMSC 330: Organization of Programming Languages. Functional Programming with Lists

Properties of Regular Expressions and Finite Automata

CSE 3302 Programming Languages Lecture 2: Syntax

CMSC 330: Organization of Programming Languages. Formal Semantics of a Prog. Lang. Specifying Syntax, Semantics

2.2 Syntax Definition

A Simple Syntax-Directed Translator

Compiler Construction

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

This book is licensed under a Creative Commons Attribution 3.0 License

Dr. D.M. Akbar Hussain

Some Basic Definitions. Some Basic Definitions. Some Basic Definitions. Language Processing Systems. Syntax Analysis (Parsing) Prof.

Part I: Multiple Choice Questions (40 points)

Programming Languages Fall 2013

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

A Tour of Language Implementation

Semantic analysis and intermediate representations. Which methods / formalisms are used in the various phases during the analysis?

Context-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5

Talen en Compilers. Jurriaan Hage , period 2. November 13, Department of Information and Computing Sciences Utrecht University

Thoughts on Assignment 4 Haskell: Flow of Control

7. String Methods. Methods. Methods. Data + Functions Together. Designing count as a Function. Three String Methods 1/22/2016

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Compilers. Compiler Construction Tutorial The Front-end

Shell CSCE 314 TAMU. Haskell Functions

CSCE 314 Programming Languages

CSCE 314 TAMU Fall CSCE 314: Programming Languages Dr. Flemming Andersen. Haskell Basics

Introduction to Parsing

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

Principles of Programming Languages COMP251: Syntax and Grammars

([1-9] 1[0-2]):[0-5][0-9](AM PM)? What does the above match? Matches clock time, may or may not be told if it is AM or PM.

CMSC 330: Organization of Programming Languages. Context Free Grammars

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Lexical Analysis (ASU Ch 3, Fig 3.1)

SML A F unctional Functional Language Language Lecture 19

Principles of Programming Languages 2017W, Functional Programming

Programming Language Definition. Regular Expressions

Higher-Order Functions for Parsing

CSE 311 Lecture 21: Context-Free Grammars. Emina Torlak and Kevin Zatloukal

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)

Introduction to Parsing. Lecture 8

Haskell Introduction Lists Other Structures Data Structures. Haskell Introduction. Mark Snyder

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

Compilers - Chapter 2: An introduction to syntax analysis (and a complete toy compiler)

Monads seen so far: IO vs Gen

Part 5 Program Analysis Principles and Techniques

Gujarat Technological University Sankalchand Patel College of Engineering, Visnagar B.E. Semester VII (CE) July-Nov Compiler Design (170701)

Transcription:

1 CSCE 314: Programming Languages Dr. Flemming Andersen Functional Parsers

What is a Parser? A parser is a program that takes a text (set of tokens) and determines its syntactic structure. String or [Token] Parser syntactic structure 2 3+4 Means Structure is made explicit in our data structure 2

The Parser Type In a functional language such as Haskell, parsers can naturally be viewed as functions. type Parser = String Tree A parser is a function that takes a string and returns some form of tree. However, a parser might not require all of its input string, so we also return any unused input: type Parser = String (Tree,String) A string might be parsable in many ways, including none, so we generalize to a list of results: type Parser = String [(Tree,String)] 3

4 Furthermore, a parser might not always produce a tree, so we generalize the result to a value of any type: type Parser res = String [(res,string)] Finally, a parser might take token streams of symbols instead of character streams: type TokenParser symb res = [symb] [(res,[symb])] Note: For simplicity, we will only consider parsers that either fail and return the empty list of results, or succeed and return a singleton list.

5 Approach and Rationale We are going produce code that will parse complex expressions. The code will be built (like most functional programming) by building from the bottom upwards. This requires a temporary shift in view because we look and think of parsed output from the top down. Let us analyze the following string: To find out how we can parse the syntax and extract its token elements

6 Approach and Rationale We are going produce code that will parse complex expressions. The code will be built (like most functional programming) by building from the bottom upwards. This requires a temporary shift in view because we look and think of parsed output from the top down.

7 Approach and Rationale

protocol ::= http https ftp port ::= 0 1 2 65535 host-domain-name ::= host ::= char char * host. sub-domain. domain sub-domain ::= char *. sub-domain domain ::= com edu gov 8

9 Approach and Rationale We are going produce code that will parse complex expressions. The code will be built (like most functional programming) by building from the bottom upwards. This requires a temporary shift in view because we look and think of parsed output from the top down. We start with very simple parsers When we will glue these together in two ways: One after the other: first an X then a Y afterwards Choice: either an X or a Y. Then this process can be applied recursively. protocol, domain, etc. sub-domain domain 2 25 255 2555

10 Approach and Rationale We are going to produce code that will parse complex expressions. The code will be built (like most functional programming) by building from the bottom upwards. This requires a temporary shift in view because we look and think of parsed output from the top down. High We Level start with Overview very simple parsers When we will glue these together in two ways: We want to be able to write a parser One after the other: first an X then a Y that closely resembles the grammatical structure afterwards the defines the language we re parsing. Choice: The either approach an X or described a Y. Then this process can be applied recursively. next, is involved, but it allows us to do that! protocol, domain, etc. sub-domain domain 2 25 255 2555

11 Basic Parsers (Building Blocks) The parser symboldot consumes the first character and succeeds if it is a dot, and fails otherwise: symboldot :: Parser Char :: String -> [(Char, String)] :: [Char] -> [(Char, [Char])] symboldot (x:xs) (x ==. ) = [(x, xs)] otherwise = [] Example: *Main> symboldot.com [(., com )]

12 Basic Parsers (Building Blocks) The parser item fails if the input is empty, and consumes the first character otherwise: item :: Parser Char :: String -> [(Char, String)] :: [Char] -> [(Char, [Char])] item = \inp -> case inp of [] -> [] (x:xs) -> [(x,xs)] Example: *Main> item "parse this" [('p',"arse this")]

13 The parser return v always succeeds, returning the value v without consuming any input: return :: a -> Parser a return v = \inp -> [(v,inp)] The parser failure always fails: failure :: Parser a failure = \inp -> [] Example: *Main> Main.return 7 "parse this" [(7,"parse this")] *Main> failure "parse this" []

14 We can make it more explicit by letting the function parse apply a parser to a string: parse :: Parser a String [(a,string)] parse p inp = p inp -- essentially id function Example: *Main> parse item "parse this" [('p',"arse this")]

Choice What if we have to backtrack? First try to parse p, then q? The parser p +++ q behaves as the parser p if p succeeds, and as the parser q otherwise. (+++) :: Parser a -> Parser a -> Parser a p +++ q = \inp -> case p inp of [] -> parse q inp [(v,out)] -> [(v,out)] Example: *Main> parse failure "abc" [] *Main> parse (failure +++ item) "abc" [('a',"bc")] 15

16 Examples > parse item "" [] > parse item "abc" [('a',"bc")] > parse failure "abc" [] > parse (return 1) "abc" [(1,"abc")] > parse (item +++ return 'd') "abc" [('a',"bc")] > parse (failure +++ return 'd') "abc" [('d',"abc")]

17 Note: The library file Parsing.hs is available as a link on the course webpage in the lecture table here. The Parser type is a monad, a mathematical structure that has proved useful for modeling many different kinds of computations.

Sequencing Often we want to sequence parsers, e.g., the following grammar: if-stmt ::= if ( expr ) then stmt First parse if, then (, then expr, A sequence of parsers can be combined as a single composite parser using the keyword do. For example: p :: Parser (Char,Char) p = do x item item y item return (x,y) Meaning: The value of x is generated by the item parser. 18

19 Note: Each parser must begin in precisely the same column. That is, the layout rule applies. The values returned by intermediate parsers are discarded by default, but if required can be named using the operator. The value returned by the last parser is the value returned by the sequence as a whole.

20 If any parser in a sequence of parsers fails, then the sequence as a whole fails. For example: > parse p "abcdef" [(( a, c ),"def")] > parse p "ab" [] The do notation is not specific to the Parser type, but can be used with any monadic type.

The Monadic Way Parser sequencing operator (>>=) :: Parser a -> (a -> Parser b) -> Parser b p >>= f = \inp -> case parse p inp of [] -> [] [(v, out)] -> parse (f v) out p >>= f fails if p fails Example otherwise applies f to the result of p this results in a new parser, which is then applied > parse ((failure +++ item) >>= (\_ -> item)) "abc" [('b',"c")] 21

22 Sequencing Typical parser structure p1 >>= \v1 -> p2 >>= \v2 ->... pn >>= \vn -> return (f v1 v2... vn) Using do notation do v1 <- p1 v2 <- p2... vn <- pn return (f v1 v2... vn) If some vi is not needed, vi <- pi can be written as pi, which corresponds to pi >>= \_ ->...

23 Example Typical parser structure rev3 = item >>= \v1 -> item >>= \v2 -> item >>= \_ -> item >>= \v3 -> return $ reverse (v1:v2:v3:[]) Using do notation rev3 = do v1 <- item v2 <- item item v3 <- item return $ reverse (v1:v2:v3:[]) > rev3 abcdef [( dba, ef )] > (rev3 >>= (\_ -> item)) abcde [( e, )] > (rev3 >>= (\_ -> item)) abcd []

24 Key benefit: The result of first parse is available for the subsequent parsers parse (item >>= (\x -> item >>= (\y -> return (y:[x])))) ab [( ba, )]

Derived Primitives Parsing a character that satisfies a predicate: sat :: (Char -> Bool) -> Parser Char sat p = do x <- item if p x then return x else failure Examples > parse (sat (== a )) abc [( a, bc )] > parse (sat (== b )) abc [] > parse (sat islower) abc [( a, bc )] > parse (sat isupper) abc [] 25

Derived Parsers from sat digit, letter, alphanum :: Parser Char digit = sat isdigit letter = sat isalpha alphanum = sat isalphanum lower, upper :: Parser Char lower = sat islower upper = sat isupper char :: Char Parser Char char x = sat (== x) 26

27 To accept a particular string Use sequencing recursively: string :: String -> Parser String string [] = return [] string (x:xs) = do char x string xs return (x:xs) Entire parse fails if any of the recursive calls fail > parse (string "if [") "if (a<b) return;" [] > parse (string "if (") "if (a<b) return;" [("if (","a<b) return;")]

28 many applies the same parser many times many :: Parser a -> Parser [a] many p = many1 p +++ return [] many1 :: Parser a -> Parser [a] many1 p = do v <- p vs <- many p return (v:vs) Examples > parse (many digit) "123ab" [("123","ab")] > parse (many digit) "ab123ab" [("","ab123ab")] > parse (many alphanum) "ab123ab" [("ab123ab","")]

Example We can now define a parser that consumes a list of one or more digits of correct format from a string: p :: Parser String p = do char '[' d digit ds many (do char ',' digit) char ']' return (d:ds) > parse p "[1,2,3,4]" [("1234","")] > parse p "[1,2,3,4" [] Note: More sophisticated parsing libraries can indicate and/or recover from errors in the input string. 29

30 Arithmetic Expressions Consider a simple form of expressions built up from single digits using the operations of addition + and multiplication *, together with parentheses. We also assume that: * and + associate to the right. * has higher priority than +.

Example: Parsing a token space :: Parser () space = many (sat isspace) >> return () token :: Parser a -> Parser a token p = space >> p >>= \v -> space >> return v identifier :: Parser String identifier = token ident ident :: Parser String ident = sat islower >>= \x -> many (sat isalphanum) >>= \xs -> return (x:xs) 31

32 Formally, the syntax of such expressions is defined by the following context free grammar: expr term '+' expr term term factor '*' term factor factor digit '(' expr ')' digit '0' '1' '9'

33 However, for reasons of efficiency, it is important to factorize the rules for expr and term: expr term ('+' expr ε) term factor ('*' term ε) Note: The symbol ε denotes the empty string.

34 It is now easy to translate the grammar into a parser that evaluates expressions, by simply rewriting the grammar rules using the parsing primitives. That is, we have: expr :: Parser Int expr = do t term do char '+' e expr return (t + e) +++ return t expr term ('+' expr ε) term factor ('*' term ε)

35 term :: Parser Int term = do f factor do char '*' t term return (f * t) +++ return f expr term ('+' expr ε) term factor ('*' term ε) factor :: Parser Int factor = do d digit return (digittoint d) +++ do char '(' e expr char ')' return e

36 Finally, if we define eval :: String Int eval xs = fst (head (parse expr xs)) then we try out some examples: > eval "2*3+4" 10 > eval "2*(3+4)" 14 > eval "2+5-" 7 > eval "+5-" *** Exception: Prelude.head: empty list