HW8 Use Lex/Yacc to Turn this: Into this: Lex and Yacc. Lex / Yacc History. A Quick Tour. if myvar == 6.02e23**2 then f(..!

Similar documents
Lex and Yacc. A Quick Tour

Lex and Yacc. More Details

Syntax Analysis Part IV

Lexical and Syntax Analysis

Syntax Analysis The Parser Generator (BYacc/J)

Using an LALR(1) Parser Generator

Big Picture: Compilation Process. CSCI: 4500/6500 Programming Languages. Big Picture: Compilation Process. Big Picture: Compilation Process.

Introduction to Lex & Yacc. (flex & bison)

Lex & Yacc (GNU distribution - flex & bison) Jeonghwan Park

Introduction to Yacc. General Description Input file Output files Parsing conflicts Pseudovariables Examples. Principles of Compilers - 16/03/2006

Yacc: A Syntactic Analysers Generator

PRACTICAL CLASS: Flex & Bison

Compiler Lab. Introduction to tools Lex and Yacc

Big Picture: Compilation Process. CSCI: 4500/6500 Programming Languages. Big Picture: Compilation Process. Big Picture: Compilation Process

Lexical and Parser Tools

Compiler Design 1. Yacc/Bison. Goutam Biswas. Lect 8

Principles of Programming Languages

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

TDDD55 - Compilers and Interpreters Lesson 3

CSE302: Compiler Design

COMPILERS AND INTERPRETERS Lesson 4 TDDD16

LECTURE 11. Semantic Analysis and Yacc

CS143 Handout 12 Summer 2011 July 1 st, 2011 Introduction to bison

Chapter 3 Lexical Analysis

Lex & Yacc. By H. Altay Güvenir. A compiler or an interpreter performs its task in 3 stages:

Compiler course. Chapter 3 Lexical Analysis

Programming Languages (CS 550) Lecture 4 Summary Scanner and Parser Generators. Jeremy R. Johnson

Lex & Yacc. by H. Altay Güvenir. A compiler or an interpreter performs its task in 3 stages:

Yacc Yet Another Compiler Compiler

An Introduction to LEX and YACC. SYSC Programming Languages

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Syntax-Directed Translation

CSC 467 Lecture 3: Regular Expressions

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Simple LR (SLR) LR(0) Drawbacks LR(1) SLR Parse. LR(1) Start State and Reduce. LR(1) Items 10/3/2012

Topic 5: Syntax Analysis III

CSCI Compiler Design

JFlex Regular Expressions

Preparing for the ACW Languages & Compilers

As we have seen, token attribute values are supplied via yylval, as in. More on Yacc s value stack

Lexical Analysis. Chapter 1, Section Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input.

CS664 Compiler Theory and Design LIU 1 of 16 ANTLR. Christopher League* 17 February Figure 1: ANTLR plugin installer

JFlex. Lecture 16 Section 3.5, JFlex Manual. Robb T. Koether. Hampden-Sydney College. Mon, Feb 23, 2015

Etienne Bernard eb/textes/minimanlexyacc-english.html

Syntax Errors; Static Semantics

Compil M1 : Front-End

Syntax Analysis Part VIII

Ray Pereda Unicon Technical Report UTR-03. February 25, Abstract

Gechstudentszone.wordpress.com

Chapter 3 -- Scanner (Lexical Analyzer)

COP 3402 Systems Software Syntax Analysis (Parser)

Yacc. Generator of LALR(1) parsers. YACC = Yet Another Compiler Compiler symptom of two facts: Compiler. Compiler. Parser

Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

Lab 2. Lexing and Parsing with Flex and Bison - 2 labs

Chapter 3: Lexing and Parsing

LECTURE 7. Lex and Intro to Parsing

Lexical Analyzer Scanner

CSE 3302 Programming Languages Lecture 2: Syntax

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Parsing. COMP 520: Compiler Design (4 credits) Professor Laurie Hendren.

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Automatic Scanning and Parsing using LEX and YACC

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Compiler Construction

Compiler Construction

A lexical analyzer generator for Standard ML. Version 1.6.0, October 1994

Context-Free Grammar. Concepts Introduced in Chapter 2. Parse Trees. Example Grammar and Derivation

TDDD55- Compilers and Interpreters Lesson 3

Lexical Analyzer Scanner

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

POLITECNICO DI TORINO. Formal Languages and Compilers. Laboratory N 1. Laboratory N 1. Languages?

B The SLLGEN Parsing System

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES

Formal Languages and Compilers

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Context-free grammars

Parser Tools: lex and yacc-style Parsing

Introduction to Compiler Design

Module 8 - Lexical Analyzer Generator. 8.1 Need for a Tool. 8.2 Lexical Analyzer Generator Tool

Parser Tools: lex and yacc-style Parsing

Parsing How parser works?

An introduction to Flex

Compiler Construction: Parsing

A Bison Manual. You build a text file of the production (format in the next section); traditionally this file ends in.y, although bison doesn t care.

Languages and Compilers

Lesson 10. CDT301 Compiler Theory, Spring 2011 Teacher: Linus Källberg

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

CS 403: Scanning and Parsing

Do not write in this area EC TOTAL. Maximum possible points:

Lex Spec Example. Int installid() {/* code to put id lexeme into string table*/}

CS143 Handout 04 Summer 2011 June 22, 2011 flex In A Nutshell

Lexical Analysis - Flex

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

CSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions

Chapter 2: Syntax Directed Translation and YACC

LEX/Flex Scanner Generator

Building Compilers with Phoenix

EXPERIMENT NO : M/C Lenovo Think center M700 Ci3,6100,6th Gen. H81, 4GB RAM,500GB HDD

Applications of Context-Free Grammars (CFG)

Transcription:

Lex and Yacc A Quick Tour HW8 Use Lex/Yacc to Turn this: Into this: <P> Here's a list: <UL> <LI> This is item one of a list <LI>This is item two. Lists should be indented four spaces, with each item marked by a "*" two spaces left of fourspace margin. Lists may contain nested lists, like this:<ul><li> Hi, I'm item one of an inner list. <LI>Me two. <LI> Item 3, inner. </UL><LI> Item 3, outer list.</ul> This is outside both lists; should be back to no indent. <P><P> Final suggestions Here's a list: * This is item one of a list * This is item two. Lists should be indented four spaces, with each item marked by a "*" two spaces left of four-space margin. Lists may contain nested lists, like this: * Hi, I'm item one of an inner list. * Me two. * Item 3, inner. * Item 3, outer list. This is outside both lists; should be back to no indent. Final suggestions: 2 if myvar == 6.02e23**2 then f(..! char stream LEX token stream if myvar == 6.02e23**2 then f(! tokenstream YACC parse tree if-stmt == fun call var ** Arg 1 Arg 2 Lex / Yacc History! Origin early 1970 s at Bell Labs! Many versions & many similar tools! Lex, flex, jflex, posix,! Yacc, bison, byacc, CUP, posix,! Targets C, C++, C#, Python, Ruby, ML,! We ll use jflex & byacc/j, targeting java (but for simplicity, I usually just say lex/yacc) float-lit int-lit...! 3 4

Uses Lex: A Lexical Analyzer Generator! Front end of many real compilers! E.g., gcc! Little languages :! Many special purpose utilities evolve some clumsy, ad hoc, syntax! Often easier, simpler, cleaner and more flexible to use lex/yacc or similar tools from the start! Input:! Regular exprs defining "tokens"! Fragments of declarations & code! Output:! A java program yylex.java! Use:! Compile & link with your main()! Calls to yylex() read chars & return successive tokens. my.flex jflex yylex.java 5 7 yacc: A Parser Generator! Input:! A context-free grammar! Fragments of declarations & code! Output:! A java program & some header files! Use:! Compile & link it with your main()! Call yyparse() to parse the entire input ParserVal.java! yyparse() calls yylex() to get successive tokens my.y byaccj Parser.java 9 %: Lex section delims Rules/ regexps + {Actions} Lex Input: "mylexer.flex" // java stuff %byaccj %{ Declarations & code: most copied verbatim to java pgm public foo() %} [a-za-z]+ Token code {foo(); return(42); } [ \t\n] {; /* skip whitespace */} No action 11

Lex Regular Expressions! Letters & numbers match themselves! Ditto \n, \t, \r! Punctuation often has special meaning! But can be escaped: \* matches *! Union, Concatenation and Star! r s, rs, r*; also r+, r?; parens for grouping! Character groups! [ab*c] == [*cab], [a-z2648aeiou], [^abc]! ^ for not only in char groups, not complementation! 12 Java decls Yacc decls Rules and {Actions} Java code Yacc Input: expr.y S! E E! E+n E-n n %{ import java.io.*; %} Parser.java %token NUM VAR Parser.java stmt: exp { printf( %d\n,$1);} ; exp : exp + NUM { $$ = $1 + $3; } exp - NUM { $$ = $1 - $3; } NUM { $$ = $1; } ; C code; java ex later public static void main( Parser.java 14 Expression lexer: expr.l Lex/Yacc Interface: Compile Time %{ #include "y.tab.h" %} y.tab.h: #define NUM 258 #define VAR 259 #define YYSTYPE int extern YYSTYPE yylval; [0-9]+ { yylval = atoi(yytext); return NUM;} [ \t] { /* ignore whitespace */ } \n { return 0; /* logical EOF */ }. { return yytext[0]; /* +-*, etc. */ } yyerror(char *msg){printf("%s,%s\n",msg,yytext);} int yywrap(){return 1;} my.y byaccj Parser.java ParserVal.java javac Parser.class my.flex jflex Yylex.java more.java 15 17

Lex/Yacc Interface: Run Time Parser Value class Token code main() yyparse() yylex() Myaction:... yylval =...... return(code) Token value yylval 18 public class ParserVal { public int ival; public double dval; public String sval; public Object obj; public ParserVal(int val) { ival=val; } public ParserVal(double val) { dval=val; } public ParserVal(String val) { sval=val; } public ParserVal(Object val) { obj=val; } }//end class! //then do! yylval = new ParserVal(3.14); yylval = new ParserVal(42); //...or something like... yylval = new ParserVal(new mytypeofobject()); // in yacc actions, e.g.:! $$.ival = $1.ival + $2.ival; $$.dval = $1.dval - $2.dval;! 20 Token names & types Nonterm names & types Start sym More Yacc Declarations %token BHTML BHEAD BTITLE BBODY P BR LI %token EHTML EHEAD ETITLE EBODY %token <sval> TEXT Type of yylval (if any) %type <obj> page head title %type <obj> words list item items %start page Calculator example From http://byaccj.sourceforge.net/ On this & next 3 slides, some details may be %{! import java.lang.math;! import java.io.*;! import java.util.stringtokenizer;! % missing or wrong, but the big picture is OK /* YACC Declarations; mainly op prec & assoc */! %token NUM! %left '-' '+! %left '*' '/! %left NEG /* negation--unary minus */! %right '^' /* exponentiation */! /* Grammar follows */!!...! 22 25

...! /* Grammar follows */!! input: /* empty string */! input line! ;! line: \n! exp \n { System.out.println(" + $1.dval + " "); ;! exp: NUM!{ $$ = $1; }! exp '+' exp!{ $$ = new ParserVal($1.dval + $3.dval); exp '-' exp!{ $$ = new ParserVal($1.dval - $3.dval); exp '*' exp!{ $$ = new ParserVal($1.dval * $3.dval); exp '/' exp!{ $$ = new ParserVal($1.dval / $3.dval); '-' exp %prec NEG!{ $$ = new ParserVal(-$2.dval); exp '^' exp!{ $$=new ParserVal(Math.pow($1.dval, $3.dval)); '(' exp ')'!{ $$ = $2; ;!!...! input is one expression per line; output is its value Ambiguous grammar; prec/assoc decls are a (smart) hack to fix that. 26 token code via return! String ins;! StringTokenizer st;! void yyerror(string s){! System.out.println("par:"+s);! boolean newline;! int yylex(){! String s; int tok; Double d;! if (!st.hasmoretokens())! NOT using lex; barehanded lexer with same interface if (!newline) {! newline=true;! return \n'; //As in classic YACC example! } else return 0;! s = st.nexttoken();! value via yylval try {! d = Double.valueOf(s); /*this may fail*/! yylval = new ParserVal(d.doubleValue());! tok = NUM; }! catch (Exception e) {! See slide 20 tok = s.charat(0);/*if not float, return char*/! } return tok;! 27 void dotest(){! BufferedReader in = new BufferedReader(new InputStreamReader(System.in));! System.out.println("BYACC/J Calculator Demo");! System.out.println("Note: Since this example uses the StringTokenizer");! System.out.println("for simplicity, you will need to separate the items");! System.out.println("with spaces, i.e.: '( 3 + 5 ) * 2'");! while (true) { System.out.print("expression:");! try {! ins = in.readline();! }! catch (Exception e) { }! st = new StringTokenizer(ins);! newline=false;! yyparse();! Lex and Yacc More Details public static void main(string args[]){! Parser par = new Parser(false);! par.dotest();! 28

# set following 3 lines to the relevant paths on your system JFLEX = ~ruzzo/src/jflex-1.4.3/jflex-1.4.3/bin/jflex BYACCJ = ~ruzzo/src/byaccj/yacc.macosx JAVAC = javac LEXDEBUG = 0 # set to 1 for token dump # targets: run: Parser.class java Parser $(LEXDEBUG) test.ratml Parser.class: Yylex.java Parser.java Makefile test.ratml $(JAVAC) Parser.java Yylex.java: jratml.flex $(JFLEX) jratml.flex Parser.java: jratml.y $(BYACCJ) -J jratml.y clean: rm -f *~ *.class *.java Makefile: Not required, but convenient General form A: B C! (tab) D! Means A depends on B & C and is built by running D Parser states Not exactly elements of PDA s Q, but similar A yacc "state" is a set of "dotted rules" rules in G with a "dot (or _ ) somewhere in the right hand side. In a state, "A! "_#" means this rule, up to and including " is consistent with input seen so far; next terminal in the input must derive from the left end of some such #. E.g., before reading any input, "S! _ #" is consistent, for every rule S! # " (S = start symbol) Yacc deduces legal shift/goto actions from terminals/ nonterminals following dot; reduce actions from rules with dot at rightmost end. See examples below 31 32 State Diagram (partial) 0 $accept : S $end! 1 S : 'a' 'b' C 'd'! 2 'a' 'e' F 'g'! 3 C : 'h C! 4 'h'! 5 F : 'h' F! 6 'h'! h state 5! C : 'h'. C! C : 'h'.! 'h' shift 5! 'd' reduce 4! C goto 9! C state 9! C : 'h' C.!. reduce 3! h state 0! $acc :. S $end! S :. 'a' 'b' C 'd'! S :. 'a' 'e' F 'g'! 'a' shift 1! S goto 2! state 3! S : 'a' 'b'. C 'd' (1)! 'h' shift 5! C goto 6! state 1! S : 'a'. 'b' C 'd (1)! S : 'a'. 'e' F 'g (2)! state 6! S : 'a' 'b' C. 'd' (1)! 'd' shift 10! a 'b' shift 3! 'e' shift 4! b C d e S state 4! S : 'a' 'e'. F 'g' (2)! 'h' shift 7! F goto 8! state 2! $acc : S. $end! state 10! S : 'a' 'b' C 'd'. (1)!. reduce 1! $end accept! $end accept! 33 Yacc Output: Same Example# 0 $accept : S $end! 1 S : 'a' 'b' C 'd'! 2 'a' 'e' F 'g'! 3 C : 'h' C! 4 'h'! 5 F : 'h' F! 6 'h'! state 0! $accept :. S $end (0)! 'a' shift 1! S goto 2! state 1! S : 'a'. 'b' C 'd' (1)! S : 'a'. 'e' F 'g' (2)! 'b' shift 3! 'e' shift 4! state 2! $accept : S. $end (0)! $end accept! state 3! S : 'a' 'b'. C 'd' (1)! 'h' shift 5! C goto 6! F goto 11! state 4! state 8! S : 'a' 'e'. F 'g' (2)! S : 'a' 'e' F. 'g' (2)! 'h' shift 7! F goto 8! state 5! C : 'h'. C (3)! C : 'h'. (4)! 'h' shift 5! 'd' reduce 4! C goto 9! state 6! S : 'a' 'b' C. 'd' (1)! 'd' shift 10! state 7! F : 'h'. F (5)! F : 'h'. (6)! 'h' shift 7! 'g' reduce 6! 'g' shift 12! state 9! C : 'h' C. (3)!. reduce 3! state 10! S : 'a' 'b' C 'd'. (1)!. reduce 1! state 11! F : 'h' F. (5)!. reduce 5! state 12! S : 'a' 'e' F 'g'. (2)!. reduce 2! 34

Yacc In Action PDA stack: alternates between "states" and initially, push state 0 symbols from (V % $). while not done { let S be the state on top of the stack; let i in $ be the next input symbol; look at the action defined in S for i: if "accept", halt and accept; if "error", halt and signal a syntax error; if "shift to state T", push i then T onto the stack; if "reduce via rule r (A! " )", then: pop exactly 2* " symbols (the 1st, 3rd,... will be states, and the 2nd, 4th,... will be the letters of "); let T = the state now exposed on top of the stack; T's action for A is "goto state U" for some U; push A, then U onto the stack. } Implementation note: given the tables, it s deterministic, and fast -- just table lookups, push/pop. 35 Yacc "Parser Table" expr: expr '+' term term ; term: term '*' fact fact ; fact: '(' expr ')' 'A' ; State Dotted Rules Shift Actions Goto Actions A + * ( ) $end expr term fact (default) 0 $accept : _expr $end 5 4 1 2 3 error 1 $accept : expr_$end expr : expr_+ term 6 accept error 2 expr : term_ (2) term : term_* fact 7 reduce 2 3 term : fact_ (4) reduce 4 4 fact : (_expr ) 5 4 8 2 3 error 5 fact : A_ (6) reduce 6 6 expr : expr +_term 5 4 9 3 error 7 term : term *_fact 5 4 10 error 8 expr : expr_+ term fact : ( expr_) 6 11 error 9 expr : expr + term_ (1) term : term_* fact 7 reduce 1 10 term : term * fact_ (3) reduce 3 11 fact : ( expr )_ (5) reduce 5 36 Yacc Output shift/goto # # is a state # reduce # # is a rule # A : # _ (#) # is this rule #. default action Implicit Dotted Rules state 0 $accept : _expr $end ( shift 4 A shift 5. error expr goto 1 term goto 2 fact goto 3 state 1 $accept : expr_$end expr : expr_+ term $end accept + shift 6. error state 2 expr : term_ (2) term : term_* fact... * shift 7. reduce 2 37 state 0 $accept : _expr $end ( shift 4 A shift 5. error expr goto 1 term goto 2 fact goto 3 $accept: _ expr $end expr: _ expr '+ term expr: _ term term: _ term '*' fact term: _ fact fact: _ '(' expr ')' fact: _ 'A' 38

Goto & Lookahead state 0 $accept : _expr $end ( shift 4 A shift 5. error expr goto 1 term goto 2 fact goto 3 $accept: _ expr $end expr: _ expr '+ term expr: _ term term: _ term '*' fact term: _ fact fact: _ '(' expr ')' fact: _ 'A' 39 using the unambiguous expression grammar here & parse table on slide 36 Action: Stack: Input: shift 5 reduce fact! A, go 3 reduce fact! term, go 2 reduce expr! term, go 1 shift 6 Example: input "A + A $end" state 5 says reduce rule 6 on +; state 0 (exposed on pop) says goto 3 on fact expr: expr '+' term term ; term: term '*' fact fact ; fact: '(' expr ')' 'A' ; 0 A + A $end 0 A 5 + A $end 0 fact 3 + A $end 0 term 2 + A $end 0 expr 1 + A $end 40 Action: Stack: Input: shift 6 0 expr 1 + 6 A $end shift 5 0 expr 1 + 6 A 5 $end reduce fact! A, go 3 0 expr 1 + 6 fact 3 $end reduce term! fact, go 9 0 expr 1 + 6 term 9 $end reduce expr! expr + term, go 1 0 expr 1 $end accept An Error Case: "A ) $end": Action: Stack: Input: 0 A ) $end shift 5 0 A 5 ) $end reduce fact! A, go 3 0 fact 3 ) $end reduce fact! term, go 2 0 term 2 ) $end reduce expr! term, go 1 0 expr 1 ) $end error 41 42

Q: Do you have any advice for up-and-coming programmers? A:... One more piece of advice take a theoretician to lunch... From the end of a 2008 interview with Steve Johnson, creator of yacc http://www.techworld.com.au/article/252319/a-z_programming_languages_yacc 44