ECS 142 Spring Project (part 2): Lexical and Syntactic Analysis Due Date: April 22, 2011: 11:59 PM.

Similar documents
Object-oriented design Goal: construct a representation for program. Approach: Advantage:

The MaSH Programming Language At the Statements Level

1 Lexical Considerations

Lexical Considerations

Lexical Considerations

An Analyzer for Java. W. M. Waite W. E. Munsil. September 11, 2008

Typescript on LLVM Language Reference Manual

MATVEC: MATRIX-VECTOR COMPUTATION LANGUAGE REFERENCE MANUAL. John C. Murphy jcm2105 Programming Languages and Translators Professor Stephen Edwards

CS164: Programming Assignment 2 Dlex Lexer Generator and Decaf Lexer

A Short Summary of Javali

Programming in C++ 4. The lexical basis of C++

The SPL Programming Language Reference Manual

Contents 1 Introduction 4 2 Programming Language Design Concerns Rationale for the Selection of Java....

Decaf Language Reference Manual

Syntax Intro and Overview. Syntax

Syntax. A. Bellaachia Page: 1

IC Language Specification

CPS 506 Comparative Programming Languages. Syntax Specification

Project 1: Scheme Pretty-Printer

Contents. Jairo Pava COMS W4115 June 28, 2013 LEARN: Language Reference Manual

Standard 11. Lesson 9. Introduction to C++( Up to Operators) 2. List any two benefits of learning C++?(Any two points)

IPCoreL. Phillip Duane Douglas, Jr. 11/3/2010

Sprite an animation manipulation language Language Reference Manual

Decaf Language Reference

VENTURE. Section 1. Lexical Elements. 1.1 Identifiers. 1.2 Keywords. 1.3 Literals

A programming language requires two major definitions A simple one pass compiler

Computational Expression

Reference Grammar Meta-notation: hfooi means foo is a nonterminal. foo (in bold font) means that foo is a terminal i.e., a token or a part of a token.

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

CSCI312 Principles of Programming Languages!

A simple syntax-directed

Types, Values and Variables (Chapter 4, JLS)

Features of C. Portable Procedural / Modular Structured Language Statically typed Middle level language

Ordinary Differential Equation Solver Language (ODESL) Reference Manual

VLC : Language Reference Manual

Language Reference Manual simplicity

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

B.V. Patel Institute of BMC & IT, UTU 2014

Object oriented programming. Instructor: Masoud Asghari Web page: Ch: 3

ARG! Language Reference Manual

CSC 467 Lecture 3: Regular Expressions

COMS W4115 Programming Languages & Translators FALL2008. Reference Manual. VideO Processing Language (VOPL) Oct.19 th, 2008

Compiler Construction D7011E

A lexical analyzer generator for Standard ML. Version 1.6.0, October 1994

Compilers. Compiler Construction Tutorial The Front-end

The Warhol Language Reference Manual

Reference Grammar Meta-notation: hfooi means foo is a nonterminal. foo (in bold font) means that foo is a terminal i.e., a token or a part of a token.

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

Introduction to Programming Using Java (98-388)

Project 2 Interpreter for Snail. 2 The Snail Programming Language

KU Compilerbau - Programming Assignment

DECLARATIONS. Character Set, Keywords, Identifiers, Constants, Variables. Designed by Parul Khurana, LIECA.

A Simple Syntax-Directed Translator

9. The Compiler I Syntax Analysis 1

Java Notes. 10th ICSE. Saravanan Ganesh

Full file at

SFPL Reference Manual

GraphQuil Language Reference Manual COMS W4115

The PCAT Programming Language Reference Manual

High Level Languages. Java (Object Oriented) This Course. Jython in Java. Relation. ASP RDF (Horn Clause Deduction, Semantic Web) Dr.

BIT Java Programming. Sem 1 Session 2011/12. Chapter 2 JAVA. basic

CA4003 Compiler Construction Assignment Language Definition

The Decaf Language. 1 Lexical considerations

Building lexical and syntactic analyzers. Chapter 3. Syntactic sugar causes cancer of the semicolon. A. Perlis. Chomsky Hierarchy

CSE450. Translation of Programming Languages. Lecture 11: Semantic Analysis: Types & Type Checking

An Introduction to LEX and YACC. SYSC Programming Languages

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

JME Language Reference Manual

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

Java-Programm. Klassen. CompilitionUnit ::= PackageDeclaration opt ImportDeclarations opt TypeDeclarations

Programming Languages & Translators. XML Document Manipulation Language (XDML) Language Reference Manual

BASIC ELEMENTS OF A COMPUTER PROGRAM

ASML Language Reference Manual

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

Renovating an Open Source Project: RECODER

Language Reference Manual

SFU CMPT 379 Compilers Spring 2018 Milestone 1. Milestone due Friday, January 26, by 11:59 pm.

The pixelman Language Reference Manual. Anthony Chan, Teresa Choe, Gabriel Kramer-Garcia, Brian Tsau

Pac Man Game Programming Language. Reference Manual. Chun Kang Chen (cc3260) Hui Hsiang Kuo (hk2604) Shuwei Cao (sc3331) Wenxin Zhu (wz2203)

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

QUark Language Reference Manual

6.096 Introduction to C++ January (IAP) 2009

Chapter 3. Describing Syntax and Semantics ISBN

Principles of Programming Languages COMP251: Syntax and Grammars

Chapter 4. Lexical and Syntax Analysis

More Assigned Reading and Exercises on Syntax (for Exam 2)

Using an LALR(1) Parser Generator

PYTHON- AN INNOVATION

Lexical and Syntax Analysis

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

COMS W4115 Programming Languages & Translators GIRAPHE. Language Reference Manual

CSCI312 Principles of Programming Languages!

Overview. - General Data Types - Categories of Words. - Define Before Use. - The Three S s. - End of Statement - My First Program

Expressions and Data Types CSC 121 Spring 2015 Howard Rosenthal

Lexical Analysis. Introduction

EECS483 D1: Project 1 Overview

UNIVERSITY OF COLUMBIA ADL++ Architecture Description Language. Alan Khara 5/14/2014

corgi Language Reference Manual COMS W4115

COP4020 Programming Assignment 1 CALC Interpreter/Translator Due March 4, 2015

Transcription:

Project (part 2): Lexical and Syntactic Analysis Due Date: April 22, 2011: 11:59 PM. 1 Overview This course requires you to write a compiler that takes a program written in a language, Java, and constructs an equivalent program written in assembly instructions for MIPS architecture family. Java is a subset of the Java programming language that inherits types, methods, expressions, and the notion of single inheritance from Java. However, it does not include more complex concepts such as concurrency, exception handling, packaging mechanism and interfaces. Further, it includes only a subset of Java s operators and expression building mechanisms. Finally, we have removed arrays from the language, since their implementation is very similar to that of classes. The primary objective in designing this project has been to select those features that will facilitate your understanding of how specific language features can be implemented in a compiler. This document first describes the lexical structure of the tokens of Java, followed by its syntactic structure. The semantics of the various constructs will be described in a separate document. 2 Lexical Structure Java derives much of its lexical structure from Java. The different tokens of the language are described below: 2.1 Comments and White Space Spaces, end of line, and comments may occur anywhere in a program except within a basic symbol. At least one space, end of line or comment must occur between any two adjacent identifiers or constants. Java supports single-line comments that begin with // and terminate at the end of the line. 2.2 Reserved words Java reserves the following identifiers: boolean char class continue else extends false if input output int new null return super string this true while void 2.3 Constants The language supports the following set of constants: 1. Integers: Integer constants denote any number of digits between 0 and 9. -1-

2. Char: Char constants begin and end with a single quote ( ) and may contain any ascii characters. In addition, the following (non)printable characters must be escaped using the following convention: Character Newline \n Tabulator \t Backspace \b Form feed \f \ \\ " \" Escape 3. Strings: Strings begin and end with a double quote ( ) and may contain any sequence of characters including newlines. 2.4 Identifiers An identifier is a sequence of letters, digits and the underscore character. It must start with a letter, and cannot be one of the reserved words. 2.5 Operators Java defines the following operators:,. ; ( ) { } + - * / = &&! > < ==!= >= <= 3 Syntax Description We use the EBNF notation to describe the syntax of Java. A few notes about the notation: Nonterminals begin with uppercase letters. Terminals that are grammar symbols ( [ for instance) or other non-identifier symbols ( ;, {, *, +, etc.) are enclosed in single quotes. Keywords (such as class and boolean ) are represented directly by their corresponding string representations. 3.1 Operator precedences and associativity The rules of composition specify operator precedences. The operators (from lowest precedence to the highest) are: Certain tokens such as identifier, integer and string constants are represented respectively by id, integer constant and string constant tokens. Your lexical analyzer will return them as tokens. -2-

&& ==!= < > <= >= + - * /! Unary - Unary + All operators other than!, Unary - and Unary + are left associative. Operators!, Unary - and Unary + are right associative. 3.2 EBNF representation The following describes the syntactic structure of Java. Program is the start symbol for the grammar. Program ::= ClassDeclaration* ClassDeclaration ::= class id [Extends] ClassBody Extends ::= extends ClassType ClassBody ::= { ClassBodyDeclaration* } ClassBodyDeclaration ::= ClassMemberDeclaration ClassMemberDeclaration ::= FieldDeclaration MethodDeclaration Type ::= PrimitiveType ReferenceType PrimitiveType ::= int boolean char string ReferenceType ::= ClassType ClassType ::= SimpleName Name ::= SimpleName QualifiedName SimpleName ::= id QualifiedName ::= Name. id FieldDeclaration ::= Type VariableDeclarators ; VariableDeclarators ::= VariableDeclarator (, VariableDeclarator)* VariableDeclarator ::= id MethodDeclaration ::= MethodHeader MethodBody MethodHeader ::= Type MethodDeclarator void MethodDeclarator MethodDeclarator ::= id ( [FormalParameterList] ) FormalParameterList ::= FormalParameter (, FormalParameter)* FormalParameter ::= Type id MethodBody ::= { LocalVariableDeclarationStatement* Statement* } ; LocalVariableDeclarationStatement ::= Type VariableDeclarators ; Statement ::= IfThenStatement IfThenElseStatement -3-

WhileStatement SimpleBlock EmptyStatement ExpressionStatement ContinueStatement ReturnStatement IOStatement SimpleBlock ::= { Statement* } EmptyStatement ::= ; ExpressionStatement ::= StatementExpression ; StatementExpression ::= MethodInvocation Assignment ClassInstanceCreationExpression IfThenStatement ::= if ( Expression ) Statement IfThenElseStatement ::= else Statement if ( Expression ) Statement WhileStatement ::= while ( Expression ) Statement ContinueStatement ::= continue ; ReturnStatement ::= return [Expression] ; IOStatement ::= (input output) Expression ; Primary ::= this Literal ( Expression ) ClassInstanceCreationExpression FieldAccess MethodInvocation ClassInstanceCreationExpression ::= new ClassType ( [ArgumentList] ) ArgumentList ::= Expression (, Expression)* FieldAccess ::= Primary. id super. id MethodInvocation ::= Name ( [ArgumentList] ) Primary. id ( [ArgumentList] ) super. id ( [ArgumentList] ) PrimitiveExpression ::= Name Expression ::= Primary Expression * Expression Expression / Expression Expression + Expression Expression - Expression Expression && Expression Expression Expression Expression == Expression Expression!= Expression Expression < Expression Expression > Expression Expression <= Expression Expression >= Expression -4-

Assignment - Expression + Expression! Expression PrimitiveExpression Assignment ::= LeftHandSide = Expression LeftHandSide ::= Name FieldAccess Literal ::= integer_constant string_constant null true false 4 Project Description For this phase of the project, you will implement a lexical analyzer and a parser for Java. Use Lex to construct the lexical analyzer, and Yacc to construct the syntactic analyzer. The output of your compiler will mainly involve reporting any lexical or parsing errors. Your lexical analyzer must detect the following set of errors: 1. unterminated character strings 2. unterminated comments, and 3. illegal characters. Your parser should also recognize any syntactic errors. It should handle errors by reporting the error along with the line number where the error occurred and then exit. In other words, your compiler will only report the first error in the source program. The grammar provided above can be used with Yacc with minor modifications. It has been presented in an expanded form to make it clearer. Feel free to modify or optimize it, as long as it generates the same language. -5-