EECS 700 Functional Programming Dr. Andy Gill University of Kansas February 16, 2010 1 / 41
Parsing A parser is a program that analyses a piece of text to determine its syntactic structure. The expression 1 + 2 * 3 means + 1 2 3 2 / 41
The Parser Type In a functional language such as Haskell, parsers can naturally be viewed as functions. type Parser = String -> Tree A parser is a function that takes a string and returns some form of tree. Tree could be defined data Tree = Val Int Add Tree Tree Mul Tree Tree 3 / 41
However, a parser might not require all of its input string, so we also return any unused input: type Parser = String -> (Tree,String) A string might not be possible to parse, so we generalize to an optional result: type Parser = String -> Maybe (Tree,String) Finally, a parser might not always produce a tree, so we generalize to a value of any type: type Parser a = String -> Maybe (a,string) 4 / 41
Basic Parsers The parser item fails if the input is empty, and consumes the first character otherwise: item :: Parser Char item = \ inp -> case inp of [] -> Nothing (x:xs) -> Just (x,xs) 5 / 41
Basic Parsers (2) The parser failure always fails: failure :: Parser a failure = \ inp -> Nothing 6 / 41
Basic Parsers (3) The parser return v always succeeds, returning the value v without consuming any input: return :: a -> Parser a return v = \ inp -> Just (v,inp) 7 / 41
The parser p +++ q behaves as the parser p if it succeeds, and as the parser q otherwise: (+++) :: Parser a -> Parser a -> Parser a p +++ q = \ inp -> case p inp of Nothing -> q inp Just (v,out) -> Just (v,out) The function parse applies a parser to a string: parse :: Parser a -> String -> Maybe (a,string) parse p inp = p inp 8 / 41
Examples The behavior of the five parsing primitives can be illustrated with some simple examples: > parse item "" Nothing > parse item "abc" Just ( a,"bc") 9 / 41
> parse failure "abc" Nothing > parse (return 1) "abc" Just (1,"abc") > parse (item +++ return d ) "abc" Just ( a,"bc") > parse (failure +++ return d ) "abc" Just ( d,"abc") 10 / 41
Sequencing parsers Can a sequence of parsers can be combined as a single composite parser using the keyword do? p :: Parser (Char,Char) p = do x <- item item y <- item return (x,y) Yes, if we can figure out how to write >>= 11 / 41
Monads Both IO and Parser are Monads. Do not be scared by the name. There are many more Monads. All monads have the type XYZ a. All monads have a way of building a side-effect free XYZ a from an a, called return. All monads have a way of sequencing XYZ, called >>= Thats all there is! 12 / 41
Sequencing parsers (>>=) :: Parser a -> (a -> Parser b) -> Parser b p1 >>= p2 = \ inp -> case p1 inp of Nothing -> Nothing Just (v,out) -> p2 v out If the first parser fails, then fail, else run the second parser, using the result of the first parser. 13 / 41
Sequencing parsers If we can tell do to use our new >>=, then we can write p :: Parser (Char,Char) p = do x <- item item y <- item return (x,y) > parse p "abcdef" Just ((a,c),"def") > parse p "ab" Nothing Alas, this overloading only words for data, and Parser is not data. 14 / 41
Abstract Parser Our solution is put the Parser function inside a data constructor. data Parser a = P (String -> Maybe (a,string)) -- Deconstructs a Parser to its function runp :: Parser a -> (String -> Maybe (a,string)) runp (P p) = p We need to rewrite our (small number of) primitives to use this box. There is always a tension between using type, and making things simple and concrete, and using data, complicating things slightly, but using the abstraction. Abstractions keep you honest! 15 / 41
item :: Parser Char item = P (\ inp -> case inp of [] -> Nothing (x:xs) -> Just (x,xs)) failure :: Parser a failure = P (\ inp -> Nothing) (+++) :: Parser a -> Parser a -> Parser a p +++ q = P (\ inp -> case runp p inp of Nothing -> runp q inp Just (v,out) -> Just (v,out)) parse :: Parser a -> String -> Maybe (a,string) parse p = \ inp -> runp p inp 16 / 41
The keyword instance is used to tell the compiler that Parser is a Monad. instance Monad Parser where -- (>>=) :: Parser a -> (a -> Parser b) -> Parser b p1 >>= p2 = P (\ inp -> case runp p1 inp of Nothing -> Nothing Just (v,out) -> runp (p2 v) out) -- return :: a -> Parser a return v = P (\ inp -> Just (v,inp)) 17 / 41
Sequencing parsers With this instance in place, we can use the do notation p :: Parser (Char,Char) p = do x <- item item y <- item return (x,y) > parse p "abcdef" Just ((a,c),"def") > parse p "ab" Nothing 18 / 41
Derived Primitives Parsing a character that satisfies a predicate: sat :: (Char -> Bool) -> Parser Char sat p = do x <- item if p x then return x else failure We are building on our primitives, and the Parser abstraction is being used transparently. 19 / 41
Parsing a digit and specific characters: digit :: Parser Char digit = sat isdigit char :: Char -> Parser Char char x = sat (x ==) 20 / 41
Applying a parser zero or more times: many :: Parser a -> Parser [a] many p = many1 p +++ return [] Applying a parser one or more times: many1 :: Parser a -> Parser [a] many1 p = do v <- p vs <- many p return (v:vs) 21 / 41
Finally, we can now define a parser that consumes a list of one or more digits from a string: p :: Parser String p = do char [ d <- digit ds <- many (do char, digit) char ] return (d:ds) 22 / 41
For example: Note: > parse p "[1,2,3,4]" [("1234","")] > parse p "[1,2,3,4" Nothing More sophisticated parsing libraries can indicate and/or recover from errors in the input string. 23 / 41
Arithmetic Expressions Consider a simple form of expressions built up from single digits using the operations of addition + and multiplication *, together with parentheses. We also assume that: * and + associate to the right; * has higher priority than +. 24 / 41
Formally, the syntax of such expressions is defined by the following context free grammar: expr term + expr term term factor term factor factor digit ( expr ) digit 0 1... 9 25 / 41
However, for reasons of efficiency, it is important to factorize the rules for expr and term: Note: expr term ( + expr ɛ ) term factor ( term ɛ ) factor digit ( expr ) digit 0 1... 9 The symbol ɛ denotes the empty string. 26 / 41
It is now easy to translate the grammar into a parser that evaluates expressions, by simply rewriting the grammar rules using the parsing primitives. That is, we have: expr :: Parser Int expr = do t <- term do char + e <- expr return (t + e) +++ return t 27 / 41
term :: Parser Int term = do f <- factor do char * t <- term return (f * t) +++ return f factor :: Parser Int factor = do d <- digit return (digittoint d) +++ do char ( e <- expr char ) return e 28 / 41
Finally, if we define eval :: String -> Int eval xs = case parse expr xs of Just (v,_) -> v Nothing -> error "failed" then we try out some examples: > eval "2*3+4" 10 > eval "2*(3+4)" 14 > eval "1(2)" Error 29 / 41
Monads Both IO and Parser are Monads. Do not be scared by the name. There are many more Monads. All monads have the type XYZ a. All monads have a way of building a side-effect free XYZ a from an a, called return. All monads have a way of sequencing XYZ, called bind Thats all there is! 30 / 41
Monads Both IO and Parser are Monads. Do not be scared by the name. There are many more Monads. All monads have the type XYZ a. All monads have a way of building a side-effect free XYZ a from an a, called return. All monads have a way of sequencing XYZ, called bind Thats all there is! 31 / 41
Monads Laws How do we enforce sequencing? Right identity m >>= (\ a -> return a) = m do a <- m return a do a <- m return a = do m = m 32 / 41
Monads Laws How do we enforce sequencing? Left identity return v >>= (\ a ->...) = let a = v in... do a <- return v = do let a = v...... do a <- return v = let a = v... in do... 33 / 41
Monads Laws How do we enforce sequencing? Associativity (m >>= f) >>= g = m >>= (\ x -> f x >>= g) do y <- do x <- m = do x <- m f x do y <- f x...... do y <- do x <- m = do x <- m f x y <- f x...... 34 / 41
Monads Laws in Action f :: IO Char f = do a <- getchar return a By right identity f :: IO Char f = getchar 35 / 41
Monads Laws in Action g :: IO () g = do a <- return 4 print a By left identity OR g :: IO () g = do let a = 4 print a g :: IO () g = let a = 4 in print a 36 / 41
Monads Laws in Action h :: IO () h = do y <- do x <- getchar return (toupper x) print y By associativity h :: IO () h = do x <- getchar y <- return (toupper x) print y 37 / 41
Exception Monads instance Monad Maybe where -- return :: a -> Maybe a return a = Just a -- (>>=) :: Maybe a -- -> (a -> Maybe b) -- -> Maybe b m >>= k = case m of Nothing -> Nothing Just v -> k v -- fail :: String -> Maybe a fail msg = Nothing 38 / 41
Exception Monad Example gooddiv :: Int -> Int -> Maybe Int gooddiv x y y == 0 = fail "div by zero" otherwise = return (x div y) silly :: Int -> Int -> Int -> Maybe Int silly a b c = do v1 <- 32 gooddiv a v2 <- 16 gooddiv b v3 <- 9 gooddiv c return (a + b + c) 39 / 41
catch :: Maybe a -> Maybe a -> Maybe a catch (Just a) _ = Just a catch Nothing other = other -- Alternate way of writing catch catch :: Maybe a -> Maybe a -> Maybe a catch good@(just _) _ = good catch Nothing other = other runmaybe :: Maybe a -> a runmaybe Nothing = error "Nothing" runmaybe (Just v) = v 40 / 41
> runmaybe (30 gooddiv 9) 3 > runmaybe (30 gooddiv 0) Error > runmaybe ((30 gooddiv 0) catch return 1234) 1234 41 / 41