Lab 2 Tutorial (An Informative Guide)

Lab 2 Tutorial (An Informative Guide) Jon Eyolfson University of Waterloo October 18 - October 22, 2010

Outline
- Introduction
- Good News
- Lexer and Parser Infrastructure
- Your Task
- Example
- Conclusion

Jon Eyolfson (UW) Lab 2 Tutorial Oct. 18-22, 2010

Introduction
- In this lab you will implement a lexer and a parser without ANTLR.
- The grammar is a subset of SQL, which you can find in the handout.
- You will have to write a recursive descent parser; for further explanation refer to Wikipedia (just kidding!).

The Good News
- A skeleton of the Java code is provided.
- The time commitment is far less than for Lab 1 (you only need to write two classes).
- All the information you need should be in the lab handout; this tutorial should provide additional insight.

Lexer Infrastructure (1)
- As expected, the lexer reads the input and separates it into tokens.
- The TokenStream class contains two methods: hasMoreElements, which returns a boolean, and nextElement, which returns a Token (this is because it implements the Enumeration<Token> interface).
- Don't mind the reader object; the details are not important.
- hasMoreElements returns true if the file has not been completely read, and false otherwise.
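The Enumeration contract described above can be sketched with a small stand-in. WordStream and WordStreamDemo are hypothetical names, not part of the lab skeleton, and the stand-in yields whitespace-separated Strings rather than real Token objects; only the hasMoreElements/nextElement loop is the point here.

```java
import java.util.Enumeration;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the lab's TokenStream: it yields plain Strings
// split on whitespace instead of real Token objects.
class WordStream implements Enumeration<String> {
    private final String[] words;
    private int next = 0;

    WordStream(String input) {
        String trimmed = input.trim();
        // "".split(...) would yield one empty string, so handle empty input explicitly.
        this.words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    @Override
    public boolean hasMoreElements() {
        return next < words.length;
    }

    @Override
    public String nextElement() {
        if (!hasMoreElements()) {
            throw new NoSuchElementException();
        }
        return words[next++];
    }
}

class WordStreamDemo {
    // Drains the stream the same way a driver would drain a TokenStream.
    static int count(String input) {
        WordStream stream = new WordStream(input);
        int n = 0;
        while (stream.hasMoreElements()) {
            stream.nextElement();
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        WordStream stream = new WordStream("CREATE TABLE foo");
        while (stream.hasMoreElements()) {
            System.out.println(stream.nextElement());
        }
    }
}
```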

Lexer Infrastructure (2)
- nextElement creates a LexerToken object, which is a String with line and column information, from the previously unread input.
- Using the LexerToken object, it creates a Token object, which determines and stores its type (the Token class has a constructor that accepts a LexerToken).
- The Token object holds a reference to the LexerToken so it can print the String read from the file along with the line and column information; it also uses the String to determine its type.

Lexer Infrastructure (3)
- Token contains an enum that keeps track of its type. The enum has a constructor that accepts a String representing a regular expression (in the skeleton this is initially set to <BLANK>).
- The constructor creates a regular expression pattern from the String you provide.
- To determine the type, the Token constructor loops over every Type in order and attempts to match its regular expression using a case-insensitive match.
- On the first match it stops and sets the type to the Type whose regular expression matched (note: the order is important!).
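A minimal sketch of the mechanism described above, assuming a small made-up subset of token types. TokenTypeSketch, its Type constants, their regular expressions, and classify are illustrative assumptions, not the skeleton's actual code. It shows why declaration order matters: if ID were listed first, the input "create" would be classified as an identifier.

```java
import java.util.regex.Pattern;

class TokenTypeSketch {
    // Hypothetical subset of token types. Keywords come before ID because
    // the first matching Type wins.
    enum Type {
        CREATE("create"),
        TABLE("table"),
        LPAREN("\\("),
        ID("[a-z_][a-z0-9_]*");

        private final Pattern pattern;

        // The enum constructor compiles the regular expression once per constant.
        Type(String regex) {
            this.pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        }

        boolean matches(String text) {
            // matches() requires the whole text to match, not just a prefix.
            return pattern.matcher(text).matches();
        }
    }

    // Loops over the types in declaration order and returns the first match,
    // or null when nothing matches.
    static Type classify(String text) {
        for (Type type : Type.values()) {
            if (type.matches(text)) {
                return type;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(classify("Create")); // CREATE
        System.out.println(classify("foo"));    // ID
    }
}
```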

Task 1
- Write the regular expression for each token; most of them will simply be the keyword.
- Note: if you want to use a \ character to escape a regular expression character, you must first escape it in the Java string (you must also escape ").
- Example: to match the + character you need to escape it so that the regular expression doesn't treat it as an operator. The regular expression is \+ and the corresponding Java String is "\\+".
- You can test your regular expressions using TokenDriver as the main class, which expects a filename. Run
      java ca.uwaterloo.ece251.TokenDriver <filename>
  in a terminal, or set the main class and argument in Eclipse.
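The double-escaping rule can be checked directly with java.util.regex. The class name EscapeSketch and the three patterns are illustrative choices, not patterns required by the lab; Pattern.quote is shown as a standard-library alternative to escaping by hand.

```java
import java.util.regex.Pattern;

class EscapeSketch {
    // Regex \+ (a literal plus sign); the backslash is doubled in Java source.
    static final Pattern PLUS = Pattern.compile("\\+");

    // Regex "[^"]*" (a double-quoted string); each " is written as \" in Java source.
    static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    // Pattern.quote builds a regex that matches its argument literally.
    static final Pattern STAR = Pattern.compile(Pattern.quote("*"));

    public static void main(String[] args) {
        System.out.println(PLUS.matcher("+").matches());         // true
        System.out.println(QUOTED.matcher("\"abc\"").matches()); // true
        System.out.println(STAR.matcher("*").matches());         // true
    }
}
```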

Lexer Example
Input
    CREATE TABLE foo (name);
Output
    t: [CREATE: ( CREATE ), line 1, col 0 6]
    t: [TABLE: ( TABLE ), line 1, col 7 12]
    t: [ID: ( foo ), line 1, col 13 16]
    t: [LPAREN: ( ( ), line 1, col 17 17]
    t: [ID: ( name ), line 1, col 18 21]
    t: [RPAREN: ( ) ), line 1, col 22 22]
    t: [SEMICOLON: ( ; ), line 1, col 23 23]

Parser Infrastructure
- The Parser class uses a TokenStream and parses the tokens using the stmt() method, which corresponds to the start production (as in Lab 1).
- stmt() returns a Stmt, which will be an instance of a class that extends Stmt (you'll see a clearer picture later).
- The XMLVisitor is responsible for printing the parse tree, which is done after a successful parse.
- Note: the visitor also starts printing from the start production.
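To illustrate how a visitor can print a tree returned by the parser, here is a miniature of the node/visitor arrangement. Every name in it (SketchVisitor, SketchNode, VacuumNode, DropTableNode, XmlSketchVisitor, VisitorDemo) and the XML element names are assumptions made for this sketch; the lab's real ASTNode, Visitor, and XMLVisitor classes may look different.

```java
// Hypothetical miniature of the ASTNode/Visitor arrangement.
interface SketchVisitor {
    void visit(VacuumNode node);
    void visit(DropTableNode node);
}

abstract class SketchNode {
    // Each concrete node calls back into the visitor with its own static type.
    abstract void accept(SketchVisitor visitor);
}

class VacuumNode extends SketchNode {
    @Override
    void accept(SketchVisitor visitor) {
        visitor.visit(this);
    }
}

class DropTableNode extends SketchNode {
    final String table;

    DropTableNode(String table) {
        this.table = table;
    }

    @Override
    void accept(SketchVisitor visitor) {
        visitor.visit(this);
    }
}

// Collects an XML-like rendering; the element names are made up for the sketch.
class XmlSketchVisitor implements SketchVisitor {
    final StringBuilder out = new StringBuilder();

    @Override
    public void visit(VacuumNode node) {
        out.append("<vacuum />");
    }

    @Override
    public void visit(DropTableNode node) {
        out.append("<droptable tn=\"").append(node.table).append("\" />");
    }
}

class VisitorDemo {
    // Starts the traversal at the root, as the driver does after a successful parse.
    static String render(SketchNode root) {
        XmlSketchVisitor visitor = new XmlSketchVisitor();
        root.accept(visitor);
        return visitor.out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(new DropTableNode("foo")));
    }
}
```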

AST Class Reference (1)
Legend: classes marked (abstract) are abstract classes; all others are concrete classes.
ASTNode (abstract)
    void accept(Visitor)
Stmt (abstract)
    boolean explain
    boolean queryplan
Expr (abstract)
SingleSource (abstract)
ResultColumn (abstract)
IdExprPair
    IdExprPair(String id, Expr)
ColumnDef
    ColumnDef(String id, Type)
Type (enum)
    NULL, INTEGER, REAL, TEXT, BLOB, UNKNOWN
Literal (abstract)
    BooleanLiteral
        BooleanLiteral(String)
    NumericLiteral
        NumericLiteral(String)
    StringLiteral
        StringLiteral(String)
QualifiedTableName
    QualifiedTableName(String db, String table)

AST Class Reference (2)
Stmt
    boolean explain
    boolean queryplan
    CreateTableStmt
        CreateTableStmt(boolean temporary, boolean ifnotexists, QualifiedTableName, List<ColumnDef>, SelectStmt)
    DropTableStmt
        DropTableStmt(boolean ifexists, QualifiedTableName)
    InsertStmt
        InsertStmt(boolean insert, boolean replace, QualifiedTableName, List<String> columns, List<Expr> values, SelectStmt, boolean defaultvalues)
    SelectStmt
        SelectStmt(boolean distinct, boolean all, List<ResultColumn> cols, List<SingleSource> js, Expr where, Expr limit)
    UpdateStmt
        UpdateStmt(QualifiedTableName, List<IdExprPair> sets, Expr where)
    VacuumStmt
        VacuumStmt()

AST Class Reference (3)
Expr
    BinaryOpExpr
        static String PLUS, MINUS, TIMES, DIV, MOD, LT, LE, GT, GE, EQ
        BinaryOpExpr(String op, Expr left, Expr right)
    LiteralExpr
        LiteralExpr(Literal value)
    QualifiedTableNameExpr
        QualifiedTableNameExpr(QualifiedTableName tn)
    UnaryOpExpr
        static String PLUS, MINUS, ISNULL, NOTNULL
        UnaryOpExpr(String op, Expr)

AST Class Reference (4)
SingleSource
    JoinSingleSource
        JoinSingleSource(List<SingleSource>)
    QualifiedTableNameSingleSource
        QualifiedTableNameSingleSource(QualifiedTableName tn, String as)
    SelectSingleSource
        SelectSingleSource(SelectStmt select, String as)
ResultColumn
    ExprResultColumn
        ExprResultColumn(Expr e)
    StarResultColumn
        StarResultColumn(String table)

Task 2 (1)
- You will implement your recursive descent parser in Parser.java.
- To test your parser, use ParserDriver as your main class.
- As with ANTLR, each production should have its own method, which returns an object representing the rule.
- The grammar is in LL(1) form (left-to-right scan, leftmost derivation, one token of lookahead) for the most part; you will have to eliminate left recursion and handle operator precedence.
- For methods that return a List<ClassName>, you can create a LinkedList<ClassName> and add objects using its add(ClassName object) method.
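One common way to remove left recursion and encode precedence is to give each precedence level its own method and replace the recursion with a loop. The sketch below assumes a small arithmetic grammar that is not taken from the handout, and it builds parenthesized Strings instead of Expr objects so that the grouping is visible. PrecedenceSketch and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

class PrecedenceSketch {
    private final List<String> tokens;
    private int pos = 0;

    PrecedenceSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    // One token of lookahead; the empty string stands for end of input.
    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "";
    }

    // Left-recursive form:  expr -> expr ('+' | '-') term | term
    // Rewritten as a loop:  expr -> term (('+' | '-') term)*
    String expr() {
        String left = term();
        while (peek().equals("+") || peek().equals("-")) {
            String op = tokens.get(pos++);
            String right = term();
            left = "(" + left + " " + op + " " + right + ")";
        }
        return left;
    }

    // term -> atom (('*' | '/') atom)*
    // Being called from expr() is what makes '*' and '/' bind tighter.
    String term() {
        String left = atom();
        while (peek().equals("*") || peek().equals("/")) {
            String op = tokens.get(pos++);
            String right = atom();
            left = "(" + left + " " + op + " " + right + ")";
        }
        return left;
    }

    // atom -> any single token; the sketch does no validation here.
    String atom() {
        if (pos >= tokens.size()) {
            throw new IllegalStateException("unexpected end of input");
        }
        return tokens.get(pos++);
    }

    static String parse(String... toks) {
        return new PrecedenceSketch(Arrays.asList(toks)).expr();
    }

    public static void main(String[] args) {
        // prints ((1 - (2 * 3)) - 4)
        System.out.println(parse("1", "-", "2", "*", "3", "-", "4"));
    }
}
```

The loop keeps the operators left-associative, which a naive right-recursive rewrite of the same grammar would not.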

Task 2 (2)
- There are three helper functions in Parser.java for you to use: accept(Token.Type), consume(Token.Type) and syntax_error().
- accept returns true if the next token matches the type you pass as an argument, and false otherwise.
- consume checks that the next token matches the type you pass as an argument. If it does not, it generates a syntax error and quits the program; otherwise it advances the parser to the next token and returns a reference to the token that was matched.
- syntax_error prints a message indicating that a syntax error has occurred, along with the current token at the point of failure, and then exits the program.
- Remember: your Token class has a getText() method, which returns the String of the actual token.
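The following sketch imitates the three helpers on a list of pre-built tokens, to show how a production method typically combines them: accept to decide on an optional part, consume to require a token. HelperSketch, Kind, Tok, and the DROP TABLE rule are assumptions for illustration (the real rule is in the handout), and syntaxError throws an exception here instead of exiting so the sketch can be tested.

```java
import java.util.Arrays;
import java.util.List;

class HelperSketch {
    // Hypothetical token kinds and token class, standing in for Token.Type and Token.
    enum Kind { DROP, TABLE, IF, EXISTS, ID, SEMICOLON, EOF }

    static final class Tok {
        final Kind kind;
        final String text;

        Tok(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    private final List<Tok> toks;
    private int pos = 0;

    HelperSketch(List<Tok> toks) {
        this.toks = toks;
    }

    private Tok current() {
        return pos < toks.size() ? toks.get(pos) : new Tok(Kind.EOF, "<eof>");
    }

    // Peeks at the next token without advancing.
    boolean accept(Kind kind) {
        return current().kind == kind;
    }

    // Requires the next token to have the given kind, then advances past it.
    Tok consume(Kind kind) {
        Tok tok = current();
        if (tok.kind != kind) {
            syntaxError();
        }
        pos++;
        return tok;
    }

    // The lab's helper exits the program; this sketch throws instead.
    void syntaxError() {
        throw new IllegalStateException("syntax error at '" + current().text + "'");
    }

    // Assumed rule: drop_table_stmt -> DROP TABLE [IF EXISTS] ID ';'
    String dropTable() {
        consume(Kind.DROP);
        consume(Kind.TABLE);
        boolean ifExists = false;
        if (accept(Kind.IF)) {
            consume(Kind.IF);
            consume(Kind.EXISTS);
            ifExists = true;
        }
        String name = consume(Kind.ID).text;
        consume(Kind.SEMICOLON);
        return "drop " + name + " ifExists=" + ifExists;
    }

    static String parse(Tok... toks) {
        return new HelperSketch(Arrays.asList(toks)).dropTable();
    }

    public static void main(String[] args) {
        // prints drop foo ifExists=false
        System.out.println(parse(
            new Tok(Kind.DROP, "DROP"), new Tok(Kind.TABLE, "TABLE"),
            new Tok(Kind.ID, "foo"), new Tok(Kind.SEMICOLON, ";")));
    }
}
```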

Parser Example
Input
    CREATE TABLE foo (name);
Output
    <createtable tn=foo>
        <columndef id=name type=unknown />
    </createtable>
Flow of events
    QualifiedTableName tn = new QualifiedTableName("foo");
    List<ColumnDef> cdl = new LinkedList<ColumnDef>();
    ColumnDef cd = new ColumnDef("name", Type.UNKNOWN);
    cdl.add(cd);
    Stmt s = new CreateTableStmt(false, false, tn, cdl, null);

Conclusion
- The overall program structure is almost identical to Lab 1, minus the interpreter; the difference is that you are writing the lexer and parser yourself.
- To complete the lexer, you just have to fill in the token types with their corresponding regular expressions.
- For the parser, you will implement your own recursive descent parser using the given helper functions.
- Use the reference section of this tutorial to speed up the coding process.