JavaCUP

JavaCUP (Constructor of Useful Parsers) is a parser generator. It is written in Java and produces a parser written in Java.
There are many parser generators, for example YACC (Yet Another Compiler-Compiler) for the C programming language (Dragon Book, section 4.9).
There are also many parser generators written in Java, such as JavaCC and ANTLR.

More on the classification of Java parser generators

Bottom-up parser generators:
    JavaCUP
    SableCC, the Sable Compiler Compiler (www.sablecc.org)
Top-down parser generators:
    ANTLR, ANother Tool for Language Recognition (www.antlr.org)
    JavaCC, Java Compiler Compiler (www.webgain.com/java_cc)

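What is a parser generator

[Figure: the scanner turns the character stream T o t a l : = p r i c e + t a x ; into the tokens Total := price + tax ;. The parser turns the tokens into a parse tree: an assignment of the form id := Expr, where Expr is Expr + id. The parser generator (JavaCup) produces the parser from a context-free grammar.]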
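The scanner step in the picture can be sketched in plain Java. This is only an illustration: the class TinyTokenizer and its regular expression are hypothetical and are not part of JavaCUP or JLex, which generate a scanner from a specification instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of JavaCUP or JLex.
class TinyTokenizer {
    // One alternative per token kind: identifier, number, ":=",
    // or a single operator/punctuation character.
    private static final Pattern TOKEN =
        Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|[0-9]+|:=|[-+*/();]");

    // Collects every token found in the input. Characters that match no
    // alternative (such as blanks) are skipped by Matcher.find().
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [Total, :=, price, +, tax, ;]
        System.out.println(tokenize("Total:=price+tax;"));
    }
}
```

A real scanner would also attach a token type to each string, which is what the Symbol class does later in these slides.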

Steps to use JavaCup

1. Write a JavaCup specification (cup file). It defines the grammar and the actions in a file (say, calc.cup).
2. Run JavaCup to generate a parser:
       java java_cup.Main calc.cup
   Note the package prefix java_cup before Main. This generates parser.java and sym.java (default class names, which can be changed).
3. Write a program that uses the parser, for example UseParser.java.
4. Compile and run your program.

Example 1: parse an expression and evaluate it

Grammar for arithmetic expressions:
    expr → expr + expr
         | expr - expr
         | expr * expr
         | expr / expr
         | ( expr )
         | number
Example: (2+4)*3 is an expression.
Our tasks:
    Tell whether an expression like (2+4)*3 is syntactically correct.
    Evaluate the expression (we are actually producing an interpreter for the "expression language").

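The overall picture

The java_cup.runtime package provides the interface Scanner and the classes Symbol and lr_parser:

    public interface Scanner {
        public Symbol next_token() throws java.lang.Exception;
    }

JLex generates CalcScanner from calc.lex; CalcScanner implements Scanner.
JavaCUP generates CalcParser from calc.cup; CalcParser extends lr_parser.
CalcScanner turns the expression (2+4)*3 into tokens (Symbol objects), CalcParser consumes the tokens, and CalcParserUser runs the parser and receives the result.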

Calculator JavaCup specification (calc.cup)

    terminal PLUS, MINUS, TIMES, DIVIDE, LPAREN, RPAREN;
    terminal Integer NUMBER;
    non terminal Integer expr;
    precedence left PLUS, MINUS;
    precedence left TIMES, DIVIDE;
    expr ::= expr PLUS expr
           | expr MINUS expr
           | expr TIMES expr
           | expr DIVIDE expr
           | LPAREN expr RPAREN
           | NUMBER
           ;

Is the grammar ambiguous? Yes, so we add precedence and associativity:
    "left" means that a + b + c is parsed as (a + b) + c.
    The lowest precedence comes first, so a + b * c is parsed as a + (b * c).
How can we get PLUS, NUMBER, ...? They are the terminals returned by the scanner.
How do we connect the parser with the scanner? The scanner specification (calc.lex) answers this.
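Why the precedence and associativity declarations matter can be seen by evaluating the two possible groupings by hand. The sketch below uses a hypothetical mini tree (Node, Num, BinOp, GroupingDemo are names invented for this illustration, not classes generated by JavaCUP) and builds both groupings explicitly.

```java
// Hypothetical mini tree used only for this illustration.
abstract class Node {
    abstract int eval();
}

class Num extends Node {
    final int value;
    Num(int value) { this.value = value; }
    int eval() { return value; }
    public String toString() { return Integer.toString(value); }
}

class BinOp extends Node {
    final Node left, right;
    final char op;
    BinOp(Node left, char op, Node right) {
        this.left = left; this.op = op; this.right = right;
    }
    int eval() {
        int l = left.eval(), r = right.eval();
        switch (op) {
            case '+': return l + r;
            case '-': return l - r;
            case '*': return l * r;
            case '/': return l / r;
            default: throw new IllegalArgumentException("unknown operator " + op);
        }
    }
    // Fully parenthesized form, so the grouping is visible.
    public String toString() { return "(" + left + " " + op + " " + right + ")"; }
}

class GroupingDemo {
    static Node n(int v) { return new Num(v); }

    // 8 - 3 - 2 grouped the way "precedence left MINUS" asks for.
    static Node leftAssoc()  { return new BinOp(new BinOp(n(8), '-', n(3)), '-', n(2)); }
    // The other grouping that the ambiguous grammar also allows.
    static Node rightAssoc() { return new BinOp(n(8), '-', new BinOp(n(3), '-', n(2))); }
    // 2 + 4 * 3 with TIMES binding tighter than PLUS.
    static Node timesFirst() { return new BinOp(n(2), '+', new BinOp(n(4), '*', n(3))); }
    // The grouping obtained if PLUS bound tighter instead.
    static Node plusFirst()  { return new BinOp(new BinOp(n(2), '+', n(4)), '*', n(3)); }

    public static void main(String[] args) {
        System.out.println(leftAssoc()  + " = " + leftAssoc().eval());   // ((8 - 3) - 2) = 3
        System.out.println(rightAssoc() + " = " + rightAssoc().eval());  // (8 - (3 - 2)) = 7
        System.out.println(timesFirst() + " = " + timesFirst().eval());  // (2 + (4 * 3)) = 14
        System.out.println(plusFirst()  + " = " + plusFirst().eval());   // ((2 + 4) * 3) = 18
    }
}
```

For PLUS alone both groupings give the same value, but for MINUS and DIVIDE they do not, which is why the declarations are needed for a correct calculator.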

Ambiguous grammar error

If we enter the grammar as below:
    Expression ::= Expression PLUS Expression;
without precedence, JavaCUP will tell us:
    Shift/Reduce conflict found in state #4
    between Expression ::= Expression PLUS Expression (*)
    and     Expression ::= Expression (*) PLUS Expression
    under symbol PLUS
    Resolved in favor of shifting.
The grammar is ambiguous! Telling JavaCUP that PLUS is left associative resolves the conflict.

Corresponding scanner specification (calc.lex)

    import java_cup.runtime.Symbol;
    import java_cup.runtime.Scanner;
    %%
    %implements java_cup.runtime.Scanner
    %type Symbol
    %function next_token
    %class CalcScanner
    %eofval{
      return null;
    %eofval}
    NUMBER = [0-9]+
    %%
    "+" { return new Symbol(CalcSymbol.PLUS); }
    "-" { return new Symbol(CalcSymbol.MINUS); }
    "*" { return new Symbol(CalcSymbol.TIMES); }
    "/" { return new Symbol(CalcSymbol.DIVIDE); }
    "(" { return new Symbol(CalcSymbol.LPAREN); }
    ")" { return new Symbol(CalcSymbol.RPAREN); }
    {NUMBER} { return new Symbol(CalcSymbol.NUMBER, new Integer(yytext())); }
    \r|\n|. {}

Connection with the parser:
    The specification imports Symbol and Scanner from java_cup.runtime.
    The generated class implements Scanner.
    next_token is the method defined in the Scanner interface.
    CalcSymbol supplies the token codes PLUS, MINUS, ...
    new Integer(yytext()) supplies the lexical value of a NUMBER token.

Run JLex

    D:\214>java JLex.Main calc.lex
Note the package prefix JLex. Program text generated: calc.lex.java

    D:\214>javac calc.lex.java
Classes generated: CalcScanner.class

Generated CalcScanner class

    import java_cup.runtime.Symbol;
    import java_cup.runtime.Scanner;
    class CalcScanner implements java_cup.runtime.Scanner {
        ......
        public Symbol next_token () {
            ......
            case 3: { return new Symbol(CalcSymbol.MINUS); }
            case 6: { return new Symbol(CalcSymbol.NUMBER, new Integer(yytext())); }
            ......
        }
    }

The interface Scanner is defined in the java_cup.runtime package:

    public interface Scanner {
        public Symbol next_token() throws java.lang.Exception;
    }

Run JavaCup

Run JavaCup to generate the parser:
    D:\214>java java_cup.Main -parser CalcParser -symbols CalcSymbol calc.cup
Classes generated: CalcParser, CalcSymbol.
Compile the parser and the relevant classes:
    D:\214>javac CalcParser.java CalcSymbol.java CalcParserUser.java
Use the parser:
    D:\214>java CalcParserUser

The token class Symbol.java

    public class Symbol {
        public int sym, left, right;
        public Object value;
        public Symbol(int id, int l, int r, Object o) {
            this(id); left = l; right = r; value = o;
        }
        ......
        public Symbol(int id, Object o) { this(id, -1, -1, o); }
        public String toString() { return "#" + sym; }
    }

Instance variables:
    sym: the symbol type;
    left: left position in the original input file;
    right: right position in the original input file;
    value: the lexical value.
Recall the action in the lex file:
    return new Symbol(CalcSymbol.NUMBER, new Integer(yytext()));

CalcSymbol.java (default name is sym.java)

    public class CalcSymbol {
        public static final int MINUS = 3;
        public static final int DIVIDE = 5;
        public static final int NUMBER = 8;
        public static final int EOF = 0;
        public static final int PLUS = 2;
        public static final int error = 1;
        public static final int RPAREN = 7;
        public static final int TIMES = 4;
        public static final int LPAREN = 6;
    }

It contains the token declarations, one for each token (terminal). It is generated from the terminal list in the cup file:
    terminal PLUS, MINUS, TIMES, DIVIDE, LPAREN, RPAREN;
    terminal Integer NUMBER;
It is used by the scanner to refer to symbol types, e.g.:
    return new Symbol(CalcSymbol.PLUS);
The class name comes from the -symbols option:
    java java_cup.Main -parser CalcParser -symbols CalcSymbol calc.cup
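To make the contract between scanner and parser concrete, here is a hand-written scanner for the calculator tokens. It is a sketch under assumptions: SketchSymbol, SketchScanner and SketchCalcSymbol are stand-ins written for this example so that it compiles without the CUP runtime; they only mirror the shape of Symbol, Scanner and CalcSymbol shown in these slides. HandCalcScanner is likewise hypothetical and is not what JLex generates.

```java
// Stand-in for java_cup.runtime.Symbol (token type plus optional value).
class SketchSymbol {
    final int sym;       // token type, one of the SketchCalcSymbol codes
    final Object value;  // lexical value, or null when the token has none
    SketchSymbol(int sym) { this(sym, null); }
    SketchSymbol(int sym, Object value) { this.sym = sym; this.value = value; }
}

// Stand-in for the java_cup.runtime.Scanner interface.
interface SketchScanner {
    SketchSymbol next_token() throws Exception;
}

// Token codes copied from the CalcSymbol listing in these slides.
class SketchCalcSymbol {
    static final int EOF = 0, PLUS = 2, MINUS = 3, TIMES = 4,
                     DIVIDE = 5, LPAREN = 6, RPAREN = 7, NUMBER = 8;
}

// Hypothetical hand-written equivalent of the scanner described by calc.lex.
class HandCalcScanner implements SketchScanner {
    private final String input;
    private int pos = 0;

    HandCalcScanner(String input) { this.input = input; }

    public SketchSymbol next_token() {
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (Character.isDigit(c)) {
                // Longest run of digits becomes one NUMBER token with an Integer value.
                int start = pos;
                while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
                return new SketchSymbol(SketchCalcSymbol.NUMBER,
                                        Integer.valueOf(input.substring(start, pos)));
            }
            pos++;
            switch (c) {
                case '+': return new SketchSymbol(SketchCalcSymbol.PLUS);
                case '-': return new SketchSymbol(SketchCalcSymbol.MINUS);
                case '*': return new SketchSymbol(SketchCalcSymbol.TIMES);
                case '/': return new SketchSymbol(SketchCalcSymbol.DIVIDE);
                case '(': return new SketchSymbol(SketchCalcSymbol.LPAREN);
                case ')': return new SketchSymbol(SketchCalcSymbol.RPAREN);
                default: break;  // ignore anything else, like the catch-all lex rule
            }
        }
        return null;  // end of input, as in the %eofval directive of calc.lex
    }

    public static void main(String[] args) {
        HandCalcScanner scanner = new HandCalcScanner("(2+4)*3");
        for (SketchSymbol t = scanner.next_token(); t != null; t = scanner.next_token()) {
            System.out.println("#" + t.sym + (t.value == null ? "" : " value=" + t.value));
        }
    }
}
```

The parser only ever sees the integer codes and the attached values, which is why the scanner and the parser must agree on one symbol class.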

The program that uses the CalcParser

    import java.io.*;
    class CalcParserUser {
        public static void main(String[] args) throws Exception {
            File inputFile = new File("d:/214/calc.input");
            CalcParser parser = new CalcParser(
                new CalcScanner(new FileInputStream(inputFile)));
            parser.parse();
        }
    }

The input text to be parsed can be any input stream (in this example it is a FileInputStream).
The first step is to construct a parser object. A parser can be constructed from a scanner; this is how the scanner and the parser get connected.
If no error is reported, the expression in the input file is correct.

Recap

To write a parser, how many things do you need to write?
    a cup file;
    a lex file;
    a program that uses the parser.
To run a parser, how many things do you need to do?
    Run JavaCup to generate the parser.
    Run JLex to generate the scanner.
    Compile the scanner, the parser, the relevant classes (CalcSymbol, Symbol), and the class that uses the parser.
    Run the class that uses the parser.

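Recap (cont.)

[Figure: java_cup.runtime provides Scanner, Symbol and lr_parser. JLex generates CalcScanner (implements Scanner) from calc.lex; JavaCUP generates CalcParser (extends lr_parser) from calc.cup. Tokens are coded as Symbol objects and use CalcSymbol. CalcScanner turns the expression 2+(3*5) into tokens for CalcParser, and CalcParserUser obtains the result.]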

Evaluate the expression

The previous specification only indicates whether parsing succeeds or fails. No semantic action is associated with the grammar rules.
To calculate the expression, we must add Java code to the grammar to carry out actions at various points.
Form of the semantic action:
    expr:e1 PLUS expr:e2
        {: RESULT = new Integer(e1.intValue() + e2.intValue()); :}
Actions (Java code) are enclosed within a pair {: :}.
Labels e1, e2: the objects that represent the corresponding terminal or non-terminal.
RESULT: its type is the type of the non-terminal on the left-hand side of the rule. For example, expr is of type Integer, so RESULT is of type Integer.
In the cup file, you need to specify that expr is of Integer type:
    non terminal Integer expr;

Change the calc.cup

    terminal PLUS, MINUS, TIMES, DIVIDE, LPAREN, RPAREN;
    terminal Integer NUMBER;
    non terminal Integer expr;
    precedence left PLUS, MINUS;
    precedence left TIMES, DIVIDE;
    expr ::= expr:e1 PLUS expr:e2
               {: RESULT = new Integer(e1.intValue() + e2.intValue()); :}
           | expr:e1 MINUS expr:e2
               {: RESULT = new Integer(e1.intValue() - e2.intValue()); :}
           | expr:e1 TIMES expr:e2
               {: RESULT = new Integer(e1.intValue() * e2.intValue()); :}
           | expr:e1 DIVIDE expr:e2
               {: RESULT = new Integer(e1.intValue() / e2.intValue()); :}
           | LPAREN expr:e RPAREN {: RESULT = e; :}
           | NUMBER:e {: RESULT = e; :}
           ;

How do you guarantee that NUMBER is of Integer type? yytext() returns a String, so the scanner converts it:
    {NUMBER} { return new Symbol(CalcSymbol.NUMBER, new Integer(yytext())); }
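What an action does at the moment a rule is reduced can be pictured with a value stack. The following is a simplified sketch, not CUP's actual implementation: the real generated parser keeps Symbol objects and parser states on its stack, while this hypothetical ReduceSketch class keeps only values.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified illustration only; the generated parser stores Symbol objects
// and parser states, not bare values.
class ReduceSketch {
    // Mimics the action attached to  expr ::= expr:e1 PLUS expr:e2.
    // The right-hand side is popped from right to left, then RESULT is pushed.
    static void reducePlus(Deque<Object> stack) {
        Integer e2 = (Integer) stack.pop();  // value labelled e2
        stack.pop();                         // PLUS carries no value we need
        Integer e1 = (Integer) stack.pop();  // value labelled e1
        Integer RESULT = Integer.valueOf(e1.intValue() + e2.intValue());
        stack.push(RESULT);                  // becomes the value of the new expr
    }

    public static void main(String[] args) {
        Deque<Object> stack = new ArrayDeque<>();
        stack.push(Integer.valueOf(2));  // expr, already reduced from NUMBER
        stack.push("PLUS");
        stack.push(Integer.valueOf(4));  // expr, already reduced from NUMBER
        reducePlus(stack);
        System.out.println(stack.peek());  // prints 6
    }
}
```

The labels e1 and e2 in the cup file play the role of the two popped values, and RESULT is what later reductions see as the value of expr.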

Change CalcParserUser

    import java.io.*;
    class CalcParserUser {
        public static void main(String[] a) throws Exception {
            CalcParser parser = new CalcParser(
                new CalcScanner(new FileReader("calc.input")));
            Integer result = (Integer) parser.parse().value;
            System.out.println("result is " + result);
        }
    }

Why can the result of parser.parse().value be cast to an Integer? Can we cast it to other types?
This is determined by the type of expr, which is the head of the first production in the JavaCup specification:
    non terminal Integer expr;
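The cast question can be tried out without a generated parser. In this sketch, ResultSymbol and fakeParse are hypothetical stand-ins for Symbol and parser.parse(); the only assumption carried over from the slides is that the value field is declared as Object and holds an Integer for this grammar.

```java
class CastSketch {
    // Stand-in for the Symbol returned by parse(): the value is a plain Object.
    static class ResultSymbol {
        final Object value;
        ResultSymbol(Object value) { this.value = value; }
    }

    // Pretends to be parser.parse() for a grammar whose start symbol is
    // declared as "non terminal Integer expr;".
    static ResultSymbol fakeParse() {
        return new ResultSymbol(Integer.valueOf(18));
    }

    // Returns true if casting the result to String fails at run time.
    static boolean castToStringFails() {
        try {
            String s = (String) fakeParse().value;  // compiles, but the object is an Integer
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Integer result = (Integer) fakeParse().value;  // matches the declared type
        System.out.println("result is " + result);     // prints result is 18
        System.out.println("cast to String fails: " + castToStringFails());  // true
    }
}
```

The compiler accepts any cast from Object, so a wrong cast only shows up as a ClassCastException when the program runs.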

Calc: second round

Calc program syntax:
    program    → statement | statement program
    statement  → assignment SEMI
    assignment → ID EQUAL expr
    expr       → expr PLUS expr
               | expr MULTI expr
               | LPAREN expr RPAREN
               | NUMBER
               | ID
Example program:
    x=1; y=2; z=x+y*2;
Task: generate and display the parse tree in XML.

Abstract syntax tree

Tree for x=1; y=2; z=x+y*2;

    Program
        Statement
            Assignment
                ID
                Expr
                    NUMBER
        Statement
            Assignment
                ID
                Expr
                    NUMBER
        Statement
            Assignment
                ID
                Expr
                    PLUS
                    Expr
                        ID
                    Expr
                        MULTI
                        Expr
                            ID
                        Expr
                            NUMBER

OO Design Rationale

Write a class for every non-terminal: Program, Statement, Assignment, Expr.
Write an abstract class for a non-terminal that has alternatives. Given a rule
    statement → assignment | ifstatement
Statement should be an abstract class, and Assignment should extend Statement.
The semantic part of the CUP file constructs the objects:
    assignment ::= ID:e1 EQUAL expr:e2
        {: RESULT = new Assignment(e1, e2); :}
The first rule returns the top-level object (the Program object), so the result of parsing is a Program object.
This is similar to an XML DOM parser.
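The design rule for a non-terminal with alternatives can be sketched as a small class hierarchy. Everything here is hypothetical and simplified for illustration: the names Stmt, AssignStmt, IfStmt and StmtDemo are invented, expressions and conditions are plain strings, and the XML element names are not taken from the slides.

```java
// Abstract class for the non-terminal "statement", which has alternatives.
abstract class Stmt {
    abstract String toXML();
}

// One concrete subclass per alternative.
class AssignStmt extends Stmt {
    private final String lhs;
    private final String rhs;  // simplified: the right-hand side is kept as text
    AssignStmt(String lhs, String rhs) { this.lhs = lhs; this.rhs = rhs; }
    String toXML() {
        return "<assignment><lhs>" + lhs + "</lhs><rhs>" + rhs + "</rhs></assignment>";
    }
}

class IfStmt extends Stmt {
    private final String condition;  // simplified: the condition is kept as text
    private final Stmt body;
    IfStmt(String condition, Stmt body) { this.condition = condition; this.body = body; }
    String toXML() {
        return "<if><condition>" + condition + "</condition>" + body.toXML() + "</if>";
    }
}

class StmtDemo {
    public static void main(String[] args) {
        // Code that holds a Stmt does not need to know which alternative it is.
        Stmt s = new IfStmt("x", new AssignStmt("y", "2"));
        System.out.println(s.toXML());
    }
}
```

With this shape, a rule such as statement ::= assignment:e can simply pass the Assignment object up, because it already is a Statement.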

Calc2.cup

    terminal String ID, LPAREN, RPAREN, EQUAL, SEMI, PLUS, MULTI;
    terminal Integer NUMBER;
    non terminal Expr expr;
    non terminal Statement statement;
    non terminal Program program;
    non terminal Assignment assignment;
    precedence left PLUS;
    precedence left MULTI;
    program ::= statement:e {: RESULT = new Program(e); :}
              | statement:e1 program:e2 {: RESULT = new Program(e1, e2); :}
              ;
    statement ::= assignment:e SEMI {: RESULT = e; :}
              ;
    assignment ::= ID:e1 EQUAL expr:e2
                {: RESULT = new Assignment(e1, e2); :}
              ;
    expr ::= expr:e1 PLUS:e expr:e2 {: RESULT = new Expr(e1, e2, e); :}
           | expr:e1 MULTI:e expr:e2 {: RESULT = new Expr(e1, e2, e); :}
           | LPAREN expr:e RPAREN {: RESULT = e; :}
           | NUMBER:e {: RESULT = new Expr(e); :}
           | ID:e {: RESULT = new Expr(e); :}
           ;

Common bugs in assignments involve the ; terminator and the {: :} delimiters.

Program class

    import java.util.*;
    public class Program {
        private Vector statements;
        public Program(Statement s) {
            statements = new Vector();
            statements.add(s);
        }
        public Program(Statement s, Program p) {
            statements = p.getStatements();
            // s precedes the statements of p in the source,
            // so insert it at the front to keep source order
            statements.add(0, s);
        }
        public Vector getStatements() { return statements; }
        public String toXML() {...... }
    }

The constructors correspond to the two alternatives of the program rule:
    program ::= statement:e {: RESULT = new Program(e); :}
              | statement:e1 program:e2 {: RESULT = new Program(e1, e2); :}

Assignment statement class

    class Assignment extends Statement {
        private String lhs;
        private Expr rhs;
        public Assignment(String l, Expr r) {
            lhs = l;
            rhs = r;
        }
        String toXML() {
            String result = "<assignment>";
            result += "<lhs>" + lhs + "</lhs>";
            result += rhs.toXML();
            result += "</assignment>";
            return result;
        }
    }

The constructor matches the action of the assignment rule:
    assignment ::= ID:e1 EQUAL expr:e2
        {: RESULT = new Assignment(e1, e2); :}

Expr class

    public class Expr {
        private int value;
        private String id;
        private Expr left;
        private Expr right;
        private String op;
        public Expr(Expr l, Expr r, String o) { left = l; right = r; op = o; }
        public Expr(Integer i) { value = i.intValue(); }
        public Expr(String i) { id = i; }
        public String toXML() {... }
    }

There is one constructor for each kind of action in the expr rule:
    expr ::= expr:e1 PLUS:e expr:e2 {: RESULT = new Expr(e1, e2, e); :}
           | expr:e1 MULTI:e expr:e2 {: RESULT = new Expr(e1, e2, e); :}
           | LPAREN expr:e RPAREN {: RESULT = e; :}
           | NUMBER:e {: RESULT = new Expr(e); :}
           | ID:e {: RESULT = new Expr(e); :}
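The slides leave the body of toXML() elided. One possible way to write it is sketched below; this is an assumption, not the author's code. The class is renamed XmlExpr to make clear it is a stand-in, the number is stored as an Integer so that a leaf can be told apart from an unset field, and the element and attribute names are invented.

```java
// Hypothetical stand-in for the Expr class, with one possible toXML().
class XmlExpr {
    private Integer value;       // set for NUMBER leaves
    private String id;           // set for ID leaves
    private XmlExpr left, right; // set for operator nodes
    private String op;           // "+" or "*" for operator nodes

    XmlExpr(XmlExpr l, XmlExpr r, String o) { left = l; right = r; op = o; }
    XmlExpr(Integer i) { value = i; }
    XmlExpr(String i) { id = i; }

    // Recursively renders the tree; which fields are set decides the node kind.
    String toXML() {
        if (op != null) {
            return "<expr op=\"" + op + "\">" + left.toXML() + right.toXML() + "</expr>";
        }
        if (id != null) {
            return "<id>" + id + "</id>";
        }
        return "<number>" + value + "</number>";
    }

    public static void main(String[] args) {
        // Tree for x + y * 2, built the way the cup actions would build it.
        XmlExpr product = new XmlExpr(new XmlExpr("y"), new XmlExpr(Integer.valueOf(2)), "*");
        XmlExpr sum = new XmlExpr(new XmlExpr("x"), product, "+");
        System.out.println(sum.toXML());
    }
}
```

Because MULTI has higher precedence than PLUS, the product is reduced first and ends up as the right child of the sum.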

Calc2.lex

    import java_cup.runtime.*;
    %%
    %implements java_cup.runtime.Scanner
    %type Symbol
    %function next_token
    %class Calc2Scanner
    %eofval{
      return null;
    %eofval}
    IDENTIFIER = [a-zA-Z][a-zA-Z0-9_]*
    NUMBER = [0-9]+
    %%
    "+" { return new Symbol(Calc2Symbol.PLUS, yytext()); }
    "*" { return new Symbol(Calc2Symbol.MULTI, yytext()); }
    "=" { return new Symbol(Calc2Symbol.EQUAL, yytext()); }
    ";" { return new Symbol(Calc2Symbol.SEMI, yytext()); }
    "(" { return new Symbol(Calc2Symbol.LPAREN, yytext()); }
    ")" { return new Symbol(Calc2Symbol.RPAREN, yytext()); }
    {IDENTIFIER} { return new Symbol(Calc2Symbol.ID, yytext()); }
    {NUMBER} { return new Symbol(Calc2Symbol.NUMBER, new Integer(yytext())); }
    \n|\r|. { }

Calc2Parser User

    import java.io.*;
    class ProgramProcessor {
        public static void main(String[] args) throws Exception {
            File inputFile = new File("d:/214/calc2.input");
            Calc2Parser parser = new Calc2Parser(
                new Calc2Scanner(new FileInputStream(inputFile)));
            Program pm = (Program) parser.debug_parse().value;
            String xml = pm.toXML();
            System.out.println("result is " + xml);
        }
    }

debug_parse() prints debug information, such as the current token being processed and the rule being applied. It is useful for debugging a JavaCup specification.
The parsing result value is of Program type; this is decided by the type of the program rule:
    program ::= statement:e {: RESULT = new Program(e); :}
              | statement:e1 program:e2 {: RESULT = new Program(e1, e2); :}
              ;

Another way to define the expression syntax

    terminal PLUS, MINUS, TIMES, DIV, LPAREN, RPAREN;
    terminal NUMLIT;
    non terminal Expression, Term, Factor;
    start with Expression;
    Expression ::= Expression PLUS Term
                 | Expression MINUS Term
                 | Term
                 ;
    Term ::= Term TIMES Factor
           | Term DIV Factor
           | Factor
           ;
    Factor ::= NUMLIT
             | LPAREN Expression RPAREN
             ;
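In this layered grammar, precedence and associativity are encoded in the rules themselves, so no precedence declarations are needed. The hand-written evaluator below follows the same three levels to show the effect. It is a sketch with a hypothetical class name (StratifiedEval) and uses a different technique from JavaCUP: a top-down evaluator in which each left-recursive rule is written as a loop, whereas JavaCUP builds a bottom-up LALR parser.

```java
// Hypothetical hand-written evaluator mirroring Expression / Term / Factor.
class StratifiedEval {
    private final String src;
    private int pos = 0;

    private StratifiedEval(String src) { this.src = src.replace(" ", ""); }

    static int eval(String src) {
        StratifiedEval p = new StratifiedEval(src);
        int v = p.expression();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at position " + p.pos);
        }
        return v;
    }

    // Expression ::= Expression PLUS Term | Expression MINUS Term | Term
    // The left recursion becomes a loop, which yields left associativity.
    private int expression() {
        int v = term();
        while (pos < src.length() && (src.charAt(pos) == '+' || src.charAt(pos) == '-')) {
            char op = src.charAt(pos++);
            int r = term();
            v = (op == '+') ? v + r : v - r;
        }
        return v;
    }

    // Term ::= Term TIMES Factor | Term DIV Factor | Factor
    private int term() {
        int v = factor();
        while (pos < src.length() && (src.charAt(pos) == '*' || src.charAt(pos) == '/')) {
            char op = src.charAt(pos++);
            int r = factor();
            v = (op == '*') ? v * r : v / r;
        }
        return v;
    }

    // Factor ::= NUMLIT | LPAREN Expression RPAREN
    private int factor() {
        if (pos < src.length() && src.charAt(pos) == '(') {
            pos++;
            int v = expression();
            if (pos >= src.length() || src.charAt(pos) != ')') {
                throw new IllegalArgumentException("')' expected at position " + pos);
            }
            pos++;
            return v;
        }
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (start == pos) {
            throw new IllegalArgumentException("number expected at position " + pos);
        }
        return Integer.parseInt(src.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(eval("2+4*3"));    // prints 14
        System.out.println(eval("(2+4)*3"));  // prints 18
        System.out.println(eval("10-4-3"));   // prints 3
    }
}
```

Since Term sits below Expression, a product is always completed before it takes part in a sum, which is the same effect the precedence declarations had in calc.cup.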

Debug the grammar

    import java.io.*;
    class A3User {
        public static void main(String[] args) throws Exception {
            File inputFile = new File("A3.tiny");
            A3Parser parser = new A3Parser(
                new A3Scanner(new FileInputStream(inputFile)));
            Integer result = (Integer) parser.debug_parse().value;
            FileWriter fw = new FileWriter(new File("A3.output"));
            fw.write("number of methods: " + result.intValue());
            fw.close();
        }
    }

The parser will print out the processed symbols and the current symbol that is causing the problem.

Run all the programs using one command

Save the following into a file:
    java JLex.Main A3.lex
    java java_cup.Main -parser A3Parser -symbols A3Symbol < A3.cup
    javac A3.lex.java A3Parser.java A3Symbol.java A3User.java
    java A3User
Under Unix:
    The file can have any name, say run214.
    Type: chmod 755 run214
    Type: run214
Under Windows:
    Save it as run214.bat
    Type: run214
This is script programming.

More flexible

Script program (say, named run214):
    java JLex.Main $1.lex
    mv $1.lex.java $1Scanner.java
    java java_cup.Main -parser $1Parser -symbols $1Symbol $1.cup
    javac $1Scanner.java $1Parser.java $1Symbol.java $1User.java
    java $1User
    more $1.output
Run the script program with a parameter:
    > run214 A3