Compiler Construction 1

JavaCC

JavaCC is a lexical analyser generator and a parser generator.

Input: a set of regular expressions (each of which describes a type of token in the language), and a grammar defined using these tokens as constants.

Output: a lexical analyser, which reads an input file and separates it into tokens, and a parser, which reads an input file and performs a syntax analysis on it.

JavaCC files have names of the form MyParser.jj. They are processed by JavaCC using a command of the form

    javacc MyParser.jj

The following files (among others) will be generated for MyParser:

    MyParser.java               the generated parser
    MyParserTokenManager.java   the generated token manager (lexical analyser)
    MyParserConstants.java      a set of useful constants

JavaCC File Structure

The structure of a JavaCC file is as follows:

    options
    {
      /* Code to set various options flags */
    }

    PARSER_BEGIN(MyParser)

    public class MyParser
    {
      /* Java program is placed here */
    }

    PARSER_END(MyParser)

    TOKEN_MGR_DECLS :
    {
      /* Declarations used by lexical analyser */
    }

    /* Token Rules & Actions */

The name following PARSER_BEGIN and PARSER_END must be the same. A Java program is placed between PARSER_BEGIN and PARSER_END. It must contain a class declaration with the same name as the generated parser.

JavaCC Regular Expressions

All characters and strings must be in quotation marks:

    "else"
    "*"
    ("a" | "b")

All regular expressions involving * must be parenthesised: ("a")*, not "a"*.

JavaCC Shorthand

    ["a","b","c","d"]   =  ("a" | "b" | "c" | "d")
    ["d"-"g"]           =  ["d","e","f","g"]  =  ("d" | "e" | "f" | "g")
    ["d"-"f","m"-"o"]   =  ["d","e","f","m","n","o"]  =  ("d" | "e" | "f" | "m" | "n" | "o")
    (α)?                =  optionally α, i.e. (α | ε)
    (α)+                =  α(α)*
    (~["a","b"])        =  any character except a or b

~ can only be used with the [] notation: ~(a(a|b)*b) is not legal.

Examples:

    Regular expression                     Language
    "if"                                   if
    ["a"-"z"](["0"-"9","a"-"z"])*          Set of legal identifiers
    (["0"-"9"])+                           Set of integers (with leading zeroes)
    (["0"-"9"])+ "." (["0"-"9"])*          Set of real numbers
      | (["0"-"9"])* "." (["0"-"9"])+
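The example expressions above can be tried out without running JavaCC by translating them by hand into java.util.regex patterns. The following is only a sketch: the class name RegexShorthandDemo and the helper matches are hypothetical, and java.util.regex is a different notation from JavaCC's (it needs no quotation marks, and its postfix operators do not need the extra parentheses).

```java
import java.util.regex.Pattern;

// Hypothetical demo class: hand-translated equivalents of the JavaCC examples.
class RegexShorthandDemo {
    // JavaCC: ["a"-"z"](["0"-"9","a"-"z"])*
    static final Pattern IDENTIFIER = Pattern.compile("[a-z][0-9a-z]*");

    // JavaCC: (["0"-"9"])+
    static final Pattern INTEGER = Pattern.compile("[0-9]+");

    // JavaCC: (["0"-"9"])+ "." (["0"-"9"])* | (["0"-"9"])* "." (["0"-"9"])+
    static final Pattern REAL = Pattern.compile("[0-9]+\\.[0-9]*|[0-9]*\\.[0-9]+");

    // JavaCC: ~["a","b"]
    static final Pattern NOT_A_OR_B = Pattern.compile("[^ab]");

    // True if the whole string s belongs to the language described by p.
    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println("x27 is an identifier: " + matches(IDENTIFIER, "x27"));
        System.out.println("007 is an integer:    " + matches(INTEGER, "007"));
        System.out.println(".5 is a real number:  " + matches(REAL, ".5"));
        System.out.println(". is a real number:   " + matches(REAL, "."));
    }
}
```

Note that a lone "." is rejected by the real-number pattern, because each alternative demands at least one digit on one side of the point.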

Token Rules in JavaCC

Tokens are described by rules with the following syntax:

    TOKEN :
    {
      <TOKEN_NAME: RegularExpression>
    }

TOKEN_NAME is the name of the token being described. RegularExpression is a regular expression that describes the token.

Examples:

    TOKEN :
    {
      <ELSE: "else">
    }

    TOKEN :
    {
      <INTEGER_LITERAL: (["0"-"9"])+>
    }

Several different tokens can be described in the same TOKEN block, with token descriptions separated by |. For example:

    TOKEN :
    {
      <ELSE: "else">
    | <INTEGER_LITERAL: (["0"-"9"])+>
    | <SEMICOLON: ";">
    }

getNextToken

When we run javacc on the input file MyParser.jj, it creates the class MyParserTokenManager. This class contains the static method getNextToken(). Every call to getNextToken() returns the next token in the input stream: a regular expression is found that matches the next characters in the input stream, and the corresponding token is returned.

What if more than one regular expression matches the input stream? For example:

    TOKEN :
    {
      <ELSE: "else">
    | <IDENTIFIER: (["a"-"z"])+>
    }

Use the longest match: "elsed" should match IDENTIFIER, not ELSE followed by the identifier "d".

If two matches have the same length, use the rule that appears first in the .jj file: "else" should match ELSE, not IDENTIFIER.
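The two disambiguation rules (longest match first, then earliest rule) can be imitated in a few lines of plain Java. This is a sketch under stated assumptions, not the code JavaCC generates: the class LongestMatchDemo and its tokenise method are hypothetical, the two rules are written as java.util.regex patterns, and single spaces are simply skipped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a token manager; not what JavaCC generates.
class LongestMatchDemo {
    // Rules kept in "file order": ELSE is listed before IDENTIFIER.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("ELSE", Pattern.compile("else"));
        RULES.put("IDENTIFIER", Pattern.compile("[a-z]+"));
    }

    // Splits the input into "KIND(image)" strings, skipping single spaces.
    static List<String> tokenise(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (input.charAt(pos) == ' ') {
                pos++;
                continue;
            }
            String bestKind = null;
            int bestLength = 0;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Only a strictly longer match replaces the current best,
                // so on a tie the rule that appears first is kept.
                if (m.lookingAt() && m.end() - pos > bestLength) {
                    bestKind = rule.getKey();
                    bestLength = m.end() - pos;
                }
            }
            if (bestKind == null) {
                throw new IllegalArgumentException("No rule matches at position " + pos);
            }
            tokens.add(bestKind + "(" + input.substring(pos, pos + bestLength) + ")");
            pos += bestLength;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "else" ties between both rules; "elsed" is longer as an IDENTIFIER.
        System.out.println(tokenise("else elsed"));
    }
}
```

The strict comparison `>` is what encodes the tie-break: swapping the order in which the two rules are registered would make "else" come out as an IDENTIFIER.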

Tokens

Each call to getNextToken() returns a Token object. The Token class is automatically created by JavaCC. Objects of type Token contain the following public fields:

    public int beginLine, beginColumn, endLine, endColumn;
        - the location of the token in the input file
    public String image;
        - the text that was matched to create the token
    public int kind;
        - the type of token

When javacc is run on the file MyParser.jj, a file MyParserConstants.java is created, which contains the symbolic names for each token kind:

    public interface MyParserConstants
    {
      int EOF = 0;
      int SEMIC = 1;
      int ASSIGN = 2;
      int PRINT = 3;
      ...
    }

Example

The following straight-line programming language from Appel's book (Chapter 1) will be used to illustrate some examples of using JavaCC.

    Stm     → Stm ; Stm             (CompoundStm)
    Stm     → id := Exp             (AssignStm)
    Stm     → print ( ExpList )     (PrintStm)
    Exp     → id                    (IdExp)
    Exp     → num                   (NumExp)
    Exp     → Exp Binop Exp         (OpExp)
    Exp     → ( Stm , Exp )         (EseqExp)
    ExpList → Exp , ExpList         (PairExpList)
    ExpList → Exp                   (LastExpList)
    Binop   → +                     (Plus)
    Binop   → -                     (Minus)
    Binop   → ×                     (Times)
    Binop   → /                     (Div)

Specifying Tokens For the Straight Line Programming Language

    TOKEN : /* Keywords and punctuation */
    {
      < SEMIC : ";" >
    | < ASSIGN : ":=" >
    | < PRINT : "print" >
    | < LBR : "(" >
    | < RBR : ")" >
    | < COMMA : "," >
    | < PLUS_SIGN : "+" >
    | < MINUS_SIGN : "-" >
    | < MULT_SIGN : "*" >
    | < DIV_SIGN : "/" >
    }

    TOKEN : /* Numbers and identifiers */
    {
      < NUM : (<DIGIT>)+ >
    | < #DIGIT : ["0" - "9"] >
    | < ID : (<LETTER>)+ >
    | < #LETTER : ["a" - "z", "A" - "Z"] >
    }

    TOKEN : /* Anything not recognised so far */
    {
      < OTHER : ~[] >
    }

SKIP Rules

Tell JavaCC what to ignore (typically whitespace) using SKIP rules. A SKIP rule is just like a TOKEN rule, except that no token is returned.

    SKIP :
    {
      < regularexpression1 >
    | < regularexpression2 >
    | ...
    | < regularexpressionn >
    }

For example:

    SKIP : /* Ignoring spaces/tabs/newlines */
    {
      " "
    | "\t"
    | "\n"
    | "\r"
    | "\f"
    }

JavaCC States

Comments can be dealt with using SKIP rules. How could we skip over one-line C++ style comments?

    // This is a comment

Using a SKIP rule:

    SKIP :
    {
      < "//" (~["\n"])* "\n" >
    }

Writing a regular expression to match multiline comments (using /* and */) is much more difficult. Writing a regular expression to match nested comments is impossible. Therefore we need to use JavaCC states.

We can label each TOKEN and SKIP rule with a state. Unlabelled TOKEN and SKIP rules are assumed to be in the default state (named DEFAULT). We can switch to a new state after matching a TOKEN or SKIP rule using the : NEWSTATE notation. For example:

    SKIP :
    {
      < "/*" > : IN_COMMENT
    }

    <IN_COMMENT> SKIP :
    {
      < "*/" > : DEFAULT
    | < ~[] >
    }

Actions in TOKEN and SKIP

How do we deal with nested comments? Through the use of actions. We can add Java code to any SKIP or TOKEN rule. That code will be executed when the SKIP or TOKEN rule is matched. Any methods or variables defined in the TOKEN_MGR_DECLS section can be used by these actions. For example:

    TOKEN_MGR_DECLS :
    {
      static int commentNesting = 0;
    }

    SKIP : /* COMMENTS */
    {
      "/*" { commentNesting++; } : IN_COMMENT
    }

    <IN_COMMENT> SKIP :
    {
      "/*" { commentNesting++; }
    | "*/" { commentNesting--;
             if (commentNesting == 0)
               SwitchTo(DEFAULT);
           }
    | <~[]>
    }

The TokenManager Code

    /*******************************
     ***** SECTION 1 - OPTIONS *****
     *******************************/

    options
    {
      JAVA_UNICODE_ESCAPE = true;
    }

    /*********************************
     ***** SECTION 2 - USER CODE *****
     *********************************/

    PARSER_BEGIN(SLPTokeniser)

    public class SLPTokeniser
    {
      public static void main(String args[])
      {
        SLPTokeniser tokeniser;
        if (args.length == 0)
        {
          System.out.println("Reading from standard input...");
          tokeniser = new SLPTokeniser(System.in);
        }
        else if (args.length == 1)
        {
          try
          {
            tokeniser = new SLPTokeniser(new java.io.FileInputStream(args[0]));
          }
          catch (java.io.FileNotFoundException e)
          {
            System.err.println("File " + args[0] + " not found.");
            return;
          }
        }
        else
        {
          System.out.println("SLP Tokeniser: Usage is one of:");
          System.out.println(" java SLPTokeniser < inputfile");
          System.out.println("OR");
          System.out.println(" java SLPTokeniser inputfile");
          return;
        }

        /*
         * We've now initialised the tokeniser to read from the appropriate place,
         * so just keep reading tokens and printing them until we hit EOF
         */
        for (Token t = getNextToken(); t.kind != EOF; t = getNextToken())
        {
          // Print out the actual text for the constants, identifiers etc.
          if (t.kind == NUM)
          {
            System.out.print("Number");
            System.out.print("(" + t.image + ") ");
          }
          else if (t.kind == ID)
          {
            System.out.print("Identifier");
            System.out.print("(" + t.image + ") ");
          }
          else
            System.out.print(t.image + " ");
        }
      }
    }

    PARSER_END(SLPTokeniser)

Following the file structure given earlier, the TOKEN_MGR_DECLS section and the SKIP and TOKEN rules are placed after PARSER_END(SLPTokeniser).
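The interplay of the IN_COMMENT state and the comment-nesting counter used in the SKIP actions can be traced in ordinary Java. The sketch below is an illustration only: the class NestedCommentDemo, the State enum and the method stripComments are hypothetical names, and the loop is a hand-written approximation of what the three IN_COMMENT rules do, not code produced by JavaCC.

```java
// Hypothetical demo of state switching plus a nesting counter.
class NestedCommentDemo {
    private enum State { DEFAULT, IN_COMMENT }

    // Returns the input with (possibly nested) block comments removed.
    static String stripComments(String input) {
        StringBuilder kept = new StringBuilder();
        State state = State.DEFAULT;
        int commentNesting = 0;
        int pos = 0;
        while (pos < input.length()) {
            if (input.startsWith("/*", pos)) {
                // Comment opener: valid in both states, always deepens the nesting.
                commentNesting++;
                state = State.IN_COMMENT;
                pos += 2;
            } else if (state == State.IN_COMMENT && input.startsWith("*/", pos)) {
                // Comment closer: only the outermost one returns to DEFAULT.
                commentNesting--;
                if (commentNesting == 0) {
                    state = State.DEFAULT;
                }
                pos += 2;
            } else {
                // Any other character: kept in DEFAULT, skipped in IN_COMMENT.
                if (state == State.DEFAULT) {
                    kept.append(input.charAt(pos));
                }
                pos++;
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripComments("print(/* outer /* inner */ still outer */x)"));
    }
}
```

Without the counter, the first closing delimiter would switch back to DEFAULT and the text "still outer" would leak into the token stream. This sketch silently discards an unterminated comment; a real tokeniser would probably want to report it.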