Parser Combinators 11/3/2003 IPT, ICS 1
Parser combinator library

    Similar to those from Grammars & Parsing
    But more efficient, self-analysing error recovery
Basic combinators

Similar to (E)BNF

    EBNF      combinator      usage
    's'       psym 's'        symbol
    x | y     x <|> y         alternatives
    x y       x <*> y         sequence
    x?        x `opt` ()      optional
    ε         psucceed ()     empty
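To make the table concrete, here is a minimal sketch of how such combinators could be defined from scratch. It is not the library's implementation: the `Parser` type below is an assumption chosen for brevity, it returns `Nothing` on failure, and it has no error recovery. Only the names `psym`, `psucceed` and `opt` are taken from the slides.

```haskell
import Control.Applicative (Alternative (..))

-- Assumed representation: consume a prefix of the input and return a result
-- plus the remaining input, or Nothing on failure (no error recovery here).
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \inp -> case p inp of
    Nothing        -> Nothing
    Just (a, rest) -> Just (f a, rest)

instance Applicative Parser where
  -- "empty" in the table: succeed without consuming input
  pure a = Parser $ \inp -> Just (a, inp)
  -- "sequence": run the first parser, then the second on what is left
  Parser pf <*> Parser pa = Parser $ \inp -> case pf inp of
    Nothing        -> Nothing
    Just (f, rest) -> case pa rest of
      Nothing         -> Nothing
      Just (a, rest') -> Just (f a, rest')

instance Alternative Parser where
  empty = Parser (const Nothing)
  -- "alternatives": try the left parser, fall back to the right one
  Parser p <|> Parser q = Parser $ \inp -> case p inp of
    Nothing -> q inp
    r       -> r

-- "symbol": accept exactly the given character
psym :: Char -> Parser Char
psym s = Parser $ \inp -> case inp of
  (c : rest) | c == s -> Just (c, rest)
  _                   -> Nothing

psucceed :: a -> Parser a
psucceed = pure

-- "optional": use p if it succeeds, otherwise return the default value
opt :: Parser a -> a -> Parser a
p `opt` v = p <|> psucceed v
```

With these definitions, `runParser (psym 'a' `opt` 'b') "c"` yields `Just ('b',"c")`: the default is returned and the unused input is handed back rather than reported as an error.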
Symbol

symbol:

    > parse (psym 'a') "a"
    ('a',"")

error recovery:

    > parse (psym 'a') "b"
    ('a'," Deleted : 'b' before eof\n Inserted: 'a' before eof\n")

Test helper:

    t p inp -- test parser p on input inp
      = do let (res, msgs) = parse p inp
           putStr (if null msgs then "" else "Errors:\n" ++ msgs)
           putStr ("\n" ++ show res)

    > t (psym 'a') "b"
    Errors:
     Deleted : 'b' before eof
     Inserted: 'a' before eof

    'a'
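The repair shown above (delete the unexpected symbol, insert the expected one) can be imitated for a single symbol with a very naive strategy. The sketch below is only an illustration under that assumption: `recoverSym` is a hypothetical helper, not part of the library, and the library's actual recovery algorithm is not described on the slides.

```haskell
-- Hypothetical helper: parse one expected symbol, repairing the input if needed.
-- Returns the result, the remaining input, and a list of repair messages.
recoverSym :: Char -> String -> (Char, String, [String])
recoverSym s (c : cs)
  | c == s    = (s, cs, [])                       -- expected symbol found
  | otherwise =                                   -- skip (delete) c and retry
      let (r, rest, msgs) = recoverSym s cs
      in  (r, rest, ("Deleted : " ++ show c) : msgs)
recoverSym s [] =                                 -- input exhausted: insert s
  (s, [], ["Inserted: " ++ show s ++ " before eof"])
```

For example, `recoverSym 'a' "b"` returns `('a', "", ["Deleted : 'b'", "Inserted: 'a' before eof"])`, so a result is always produced and the messages record how the input was repaired.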
Empty & optional

empty:

    > t (psucceed 'a') "a"
    Errors:
     Not used: 'a'
    'a'

optional:

    > t (psym 'a' `opt` 'b') "a"
    'a'
    > t (psym 'a' `opt` 'b') "c"
    Errors:
     Not used: 'c'
    'b'
Sequence

sequence:

    t ( psucceed (\a b -> [b]++[a]) <*> psym 'a' <*> psym 'b' ) "ab"
    "ba"

application:

    f <$> p = psucceed f <*> p

    t ( (\a b -> [b]++[a]) <$> psym 'a' <*> psym 'b' ) "ab"
    "ba"
Alternative & range

alternative:

    > t (psym 'a' <|> psym 'b') "a"
    'a'
    > t (psym 'a' <|> psym 'b') "b"
    'b'
    > t (psym 'a' <|> psym 'b') "c"
    Errors:
     Deleted : 'c' before eof
     Inserted: 'a' before eof
    'a'

    > t (panysym "ab") "a"
    > t (panysym "ab") "b"

    pany f l  = foldr1 (<|>) (map f l)
    panysym l = pany psym l

range:

    > t ('a' <..> 'b') "a"
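The `pany` fold can be tried out with the `ReadP` combinators that ship with GHC's base library. This is a sketch under assumptions: `ReadP` stands in for the slides' parser type, `char` plays the role of `psym`, `+++` plays the role of the choice operator, and `prange` is a hypothetical named function replacing the `<..>` operator. `ReadP` does no error recovery, so a mismatch simply yields no parse.

```haskell
import Text.ParserCombinators.ReadP (ReadP, char, readP_to_S, (+++))

-- Choice over a non-empty list of alternatives, each built by f.
-- foldr1 fails on an empty list, so callers must pass at least one element.
pany :: (a -> ReadP b) -> [a] -> ReadP b
pany f l = foldr1 (+++) (map f l)

-- Accept any one of the given characters.
panysym :: String -> ReadP Char
panysym = pany char

-- Accept any character in the inclusive range lo..hi (hypothetical name).
prange :: Char -> Char -> ReadP Char
prange lo hi = panysym [lo .. hi]
```

Here `readP_to_S (panysym "ab") "b"` gives `[('b',"")]`, while input `"c"` gives the empty list of parses.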
Derived parsers

Created by combining basic parsers (hence the name combinators)

    parser     EBNF        usage
    pfoldr     x*          sequence of x with result folding
    plist      x*          sequence of x, result is list
    pchainr    x (y x)*    sequence of x, separated by y
Repetition

sequence folding:

    > t (pfoldr ((+),0)
          ( (\x -> ord x - ord '0') <$> '0' <..> '9' )
        ) "34521"
    15

sequence listing:

    plist p = pfoldr ((:),[]) p

    t ( foldr (+) 0 . map (\x -> ord x - ord '0')
          <$> plist ('0' <..> '9')
      ) "34521"

    t ( foldr (+) 0
          <$> plist ( (\x -> ord x - ord '0') <$> '0' <..> '9' )
      ) "34521"
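How `pfoldr` and `plist` can be built from the basic combinators is sketched below, again using base's `ReadP` as a stand-in for the slides' parser type (an assumption). The left-biased choice `<++` makes the repetition greedy; `pdigit` is a helper name introduced for this sketch.

```haskell
import Data.Char (isDigit, ord)
import Text.ParserCombinators.ReadP (ReadP, readP_to_S, satisfy, (<++))

-- Zero or more occurrences of p, folded from the right with op,
-- starting from e when no further p can be parsed.
pfoldr :: (a -> b -> b, b) -> ReadP a -> ReadP b
pfoldr (op, e) p = go
  where
    go = (op <$> p <*> go) <++ pure e

-- Collecting the results in a list is a special case of folding.
plist :: ReadP a -> ReadP [a]
plist = pfoldr ((:), [])

-- A single decimal digit, converted to its numeric value.
pdigit :: ReadP Int
pdigit = (\c -> ord c - ord '0') <$> satisfy isDigit
```

Both `readP_to_S (pfoldr ((+), 0) pdigit) "34521"` and `readP_to_S (sum <$> plist pdigit) "34521"` give `[(15,"")]`, matching the digit sum on the slide.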
Chain

chaining:

    t (pchainr
         ( (+) <$ psym '+' <|> (-) <$ psym '-' )
         ( (\x -> ord x - ord '0') <$> '0' <..> '9' )
      ) "3+4-5+2-1"
    1

Evaluates 3+(4-(5+(2-1)))
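The effect of right-associative chaining is easiest to see next to its left-associative counterpart. The sketch below uses `chainr1` and `chainl1` from base's `ReadP` as stand-ins for `pchainr` and `pchainl` (an assumption; the names `pdigit`, `paddop`, `evalRight` and `evalLeft` are introduced here for illustration).

```haskell
import Data.Char (isDigit, ord)
import Text.ParserCombinators.ReadP
  (ReadP, chainl1, chainr1, char, eof, readP_to_S, satisfy, (+++))

-- A single decimal digit as a number.
pdigit :: ReadP Int
pdigit = (\c -> ord c - ord '0') <$> satisfy isDigit

-- An additive operator, returned as the function it denotes.
paddop :: ReadP (Int -> Int -> Int)
paddop = ((+) <$ char '+') +++ ((-) <$ char '-')

-- Right-associative: 3+(4-(5+(2-1)))
evalRight :: String -> [(Int, String)]
evalRight = readP_to_S (chainr1 pdigit paddop <* eof)

-- Left-associative: (((3+4)-5)+2)-1
evalLeft :: String -> [(Int, String)]
evalLeft = readP_to_S (chainl1 pdigit paddop <* eof)
```

On `"3+4-5+2-1"`, `evalRight` gives `[(1,"")]` as on the slide, whereas `evalLeft` gives `[(3,"")]`, which is the conventional arithmetic reading.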
Ambiguity and greediness

The 1st or the 2nd? Greedy: the 1st takes it all

    t ((,) <$> plist (psym 'a') <*> plist (psym 'a')
      ) "aaaa"
    ("aaaa","")

Non-greedy variant:

    t ((,) <$> plist_ng (psym 'a') <*> plist (psym 'a')
      ) "aaaa"
    ("","aaaa")
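The ambiguity can be made visible by asking for every parse instead of one. The sketch below uses base's `ReadP`, whose `many` explores all alternatives, as a stand-in for the slides' library (an assumption). The `greedy` and `nonGreedy` functions are hypothetical names; they resolve the choice with the committed, left-biased operator `<++`, which is a simplification of whatever `plist` and `plist_ng` do internally.

```haskell
import Text.ParserCombinators.ReadP (ReadP, char, eof, many, readP_to_S, (<++))

-- Every way of dividing the input between two lists of 'a'.
allSplits :: String -> [(String, String)]
allSplits inp =
  map fst (readP_to_S ((,) <$> many (char 'a') <*> many (char 'a') <* eof) inp)

-- Greedy: prefer taking another element over stopping.
greedy :: ReadP a -> ReadP [a]
greedy p = ((:) <$> p <*> greedy p) <++ pure []

-- Non-greedy: prefer stopping. With committed choice this always stops
-- immediately, which is enough to reproduce the result on the slide.
nonGreedy :: ReadP a -> ReadP [a]
nonGreedy p = pure [] <++ ((:) <$> p <*> nonGreedy p)
```

`allSplits "aaaa"` contains five pairs, one per split point. With `greedy` as the first list the only parse is `("aaaa","")`; with `nonGreedy` it is `("","aaaa")`.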
Example: expression parser

    module Expr where

    import Data.Char (ord)   -- needed for ord in pdigit
    import UU_Parsing_Core
    import UU_Parsing_Derived

    instance Symbol Char

    pparens p = psym '(' *> p <* psym ')'
    pdigit    = (\d -> ord d - ord '0') <$> panysym ['0'..'9']
    pnat      = foldl (\a b -> a*10 + b) 0 <$> plist1 pdigit
    pfact     = pnat <|> pparens pexpr
    pterm     = pchainl ((*) <$ psym '*' <|> div <$ psym '/' ) pfact
    pexpr     = pchainl ((+) <$ psym '+' <|> (-) <$ psym '-' ) pterm

    on :: Show a => Parser Char a -> [Char] -> IO ()
    on p inp -- run parser p on input inp
      = do let (res, msgs) = parse p inp
           putStr (if null msgs then "" else "Errors:\n" ++ show msgs)
           putStr ("\n" ++ show res ++ "\n")

    main :: IO ()
    main = do putStr "Enter expression: "
              inp <- getLine
              pexpr `on` inp
              main
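For readers without the UU parsing modules at hand, the same grammar can be tried with base's `ReadP`. This is a sketch under assumptions, not a port of the library: `between`, `munch1` and `chainl1` replace the hand-written helpers, `evalExpr` is a hypothetical wrapper name, whitespace is not skipped, and malformed input yields `Nothing` instead of a repaired result with error messages.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP
  (ReadP, between, chainl1, char, eof, munch1, readP_to_S, (+++))

-- A parser enclosed in parentheses.
pparens :: ReadP a -> ReadP a
pparens = between (char '(') (char ')')

-- A natural number: one or more digits, taken greedily.
pnat :: ReadP Int
pnat = read <$> munch1 isDigit

-- factor ::= nat | '(' expr ')'
-- term   ::= factor (('*' | '/') factor)*
-- expr   ::= term   (('+' | '-') term)*
pfact, pterm, pexpr :: ReadP Int
pfact = pnat +++ pparens pexpr
pterm = chainl1 pfact (((*) <$ char '*') +++ (div <$ char '/'))
pexpr = chainl1 pterm (((+) <$ char '+') +++ ((-) <$ char '-'))

-- Evaluate a complete expression; Nothing if the input does not parse fully.
evalExpr :: String -> Maybe Int
evalExpr inp = case readP_to_S (pexpr <* eof) inp of
  [(v, "")] -> Just v
  _         -> Nothing
```

For instance, `evalExpr "2+3*(4-1)"` is `Just 11` and `evalExpr "2+"` is `Nothing`. Division by zero is not guarded against and raises an exception when the result is evaluated.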
Why use parser combinators?

Reasons not to use:

    because Haskell is so weird
    everybody elsewhere uses (e.g.) Java

Reason(s) to use:

    parser combinators are simple compared to...
Example using JavaCC

JavaCC generates Java source code from a grammar specification:

    javacc Expr.jj
    javac *.java
    java Expr

JavaCC is used for Sun's Java compiler.
Example: Expr.jj

    PARSER_BEGIN(Expr)
    public class Expr {
      static int total;
      static java.util.Stack argstack = new java.util.Stack();

      public static void main(String args[]) throws ParseException {
        Expr parser = new Expr(System.in);
        while (true) {
          System.out.print("Enter Expression: ");
          System.out.flush();
          try {
            switch (parser.one_line()) {
              case -1:
                System.exit(0);
              case 0:
                break;
              case 1:
                Object x = argstack.pop();
                System.out.println("Evaluation result = " + x.toString());
                break;
            }
          } catch (ParseException x) {
            System.out.println("Exiting.");
            throw x;
          }
        }
      }
    }
    PARSER_END(Expr)

    SKIP :
    { " " | "\r" | "\t" }

    TOKEN :
    { < EOL: "\n" > }

    TOKEN : /* OPERATORS */
    { < PLUS: "+" >
    | < MINUS: "-" >
    | < MULTIPLY: "*" >
    | < DIVIDE: "/" >
    }

    TOKEN :
    { < CONSTANT: ( <DIGIT> )+ >
    | < #DIGIT: ["0" - "9"] >
    }
Example: Expr.jj (continued)

    int one_line() :
    {}
    {
        expr() <EOL> { return 1; }
      | <EOL>        { return 0; }
      | <EOF>        { return -1; }
    }

    void expr() :
    { Token x; }
    {
      term()
      ( ( x = <PLUS> | x = <MINUS> ) term()
        {
          int a = ((Integer) argstack.pop()).intValue();
          int b = ((Integer) argstack.pop()).intValue();
          if ( x.kind == PLUS )
            argstack.push(new Integer(b + a));
          else
            argstack.push(new Integer(b - a));
        }
      )*
    }

    void term() :
    { Token x; }
    {
      factor()
      ( ( x = <MULTIPLY> | x = <DIVIDE> ) factor()
        {
          int a = ((Integer) argstack.pop()).intValue();
          int b = ((Integer) argstack.pop()).intValue();
          if ( x.kind == MULTIPLY )
            argstack.push(new Integer(b * a));
          else
            argstack.push(new Integer(b / a));
        }
      )*
    }

    void factor() :
    {}
    {
        <CONSTANT>
        {
          try {
            int x = Integer.parseInt(token.image);
            argstack.push(new Integer(x));
          } catch (NumberFormatException ee) {
            argstack.push(new Integer(0));
          }
        }
      | "(" expr() ")"
    }