CS 2210 Programming Project (Part I)

Similar documents
COP4020 Programming Assignment 1 CALC Interpreter/Translator Due March 4, 2015

CS2210 Compiler Construction Spring Part II: Syntax Analysis

CS 541 Spring Programming Assignment 2 CSX Scanner

Programming Assignment I Due Thursday, October 9, 2008 at 11:59pm

Programming Assignment I Due Thursday, October 7, 2010 at 11:59pm

Programming Assignment II

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }

EXPERIMENT NO : M/C Lenovo Think center M700 Ci3,6100,6th Gen. H81, 4GB RAM,500GB HDD

CS 6353 Compiler Construction Project Assignments

An Introduction to LEX and YACC. SYSC Programming Languages

Lecture Outline. COMP-421 Compiler Design. What is Lex? Lex Specification. ! Lexical Analyzer Lex. ! Lex Examples. Presented by Dr Ioanna Dionysiou

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }

Module 8 - Lexical Analyzer Generator. 8.1 Need for a Tool. 8.2 Lexical Analyzer Generator Tool

Yacc: A Syntactic Analysers Generator

Lex & Yacc. by H. Altay Güvenir. A compiler or an interpreter performs its task in 3 stages:

CS 6353 Compiler Construction Project Assignments

CS143 Handout 05 Summer 2011 June 22, 2011 Programming Project 1: Lexical Analysis

EXPERIMENT NO : M/C Lenovo Think center M700 Ci3,6100,6th Gen. H81, 4GB RAM,500GB HDD

Big Picture: Compilation Process. CSCI: 4500/6500 Programming Languages. Big Picture: Compilation Process. Big Picture: Compilation Process.

CS 2210 Programming Project (Part IV)

Figure 2.1: Role of Lexical Analyzer

CS 426 Fall Machine Problem 1. Machine Problem 1. CS 426 Compiler Construction Fall Semester 2017

Using Lex or Flex. Prof. James L. Frankel Harvard University

1 The Var Shell (vsh)

Principles of Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

Introduction to Lex & Yacc. (flex & bison)

Lex & Yacc. By H. Altay Güvenir. A compiler or an interpreter performs its task in 3 stages:

Chapter 3 -- Scanner (Lexical Analyzer)

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 3

Typescript on LLVM Language Reference Manual

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input.

Programming Project 1: Lexical Analyzer (Scanner)

Big Picture: Compilation Process. CSCI: 4500/6500 Programming Languages. Big Picture: Compilation Process. Big Picture: Compilation Process

Handout 7, Lex (5/30/2001)

CSC 467 Lecture 3: Regular Expressions

More Examples. Lex/Flex/JLex

Standard 11. Lesson 9. Introduction to C++( Up to Operators) 2. List any two benefits of learning C++?(Any two points)

CS164: Programming Assignment 2 Dlex Lexer Generator and Decaf Lexer

The Structure of a Syntax-Directed Compiler

Project 1: Scheme Pretty-Printer

Programming in C++ 4. The lexical basis of C++

Contents. Jairo Pava COMS W4115 June 28, 2013 LEARN: Language Reference Manual

Features of C. Portable Procedural / Modular Structured Language Statically typed Middle level language

KU Compilerbau - Programming Assignment

IBM. UNIX System Services Programming Tools. z/os. Version 2 Release 3 SA

Ray Pereda Unicon Technical Report UTR-02. February 25, Abstract

CS1622. Semantic Analysis. The Compiler So Far. Lecture 15 Semantic Analysis. How to build symbol tables How to use them to find

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 2

The Structure of a Syntax-Directed Compiler

A Simple Syntax-Directed Translator

Languages and Compilers

Parsing and Pattern Recognition

Lex Spec Example. Int installid() {/* code to put id lexeme into string table*/}

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Assoc. Prof. Dr. Marenglen Biba. (C) 2010 Pearson Education, Inc. All rights reserved.

CMSC445 Compiler design Blaheta. Project 2: Lexer. Due: 15 February 2012

CS 4201 Compilers 2014/2015 Handout: Lab 1

CS4850 SummerII Lex Primer. Usage Paradigm of Lex. Lex is a tool for creating lexical analyzers. Lexical analyzers tokenize input streams.

12/22/11. Java How to Program, 9/e. Help you get started with Eclipse and NetBeans integrated development environments.

Flex and lexical analysis. October 25, 2016

Assignment 1 (Lexical Analyzer)

L L G E N. Generator of syntax analyzier (parser)

ECS 142 Spring Project (part 2): Lexical and Syntactic Analysis Due Date: April 22, 2011: 11:59 PM.

JFlex Regular Expressions

CS 297 Report. By Yan Yao

Table of Contents Date(s) Title/Topic Page #s. Abstraction

Assignment 1 (Lexical Analyzer)

Syntactic Analysis. CS345H: Programming Languages. Lecture 3: Lexical Analysis. Outline. Lexical Analysis. What is a Token? Tokens

CS143 Handout 04 Summer 2011 June 22, 2011 flex In A Nutshell

Marcello Bersani Ed. 22, via Golgi 42, 3 piano 3769

Using an LALR(1) Parser Generator

VENTURE. Section 1. Lexical Elements. 1.1 Identifiers. 1.2 Keywords. 1.3 Literals

MP 3 A Lexer for MiniJava

Reading Assignment. Scanner. Read Chapter 3 of Crafting a Compiler.

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

Project 2 Interpreter for Snail. 2 The Snail Programming Language

CS/IT 114 Introduction to Java, Part 1 FALL 2016 CLASS 3: SEP. 13TH INSTRUCTOR: JIAYIN WANG

C Language, Token, Keywords, Constant, variable

Full file at C How to Program, 6/e Multiple Choice Test Bank

Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

Compiler Construction D7011E

DOID: A Lexical Analyzer for Understanding Mid-Level Compilation Processes

The Language for Specifying Lexical Analyzer

c) Comments do not cause any machine language object code to be generated. d) Lengthy comments can cause poor execution-time performance.

BASIC ELEMENTS OF A COMPUTER PROGRAM

Compiler Construction Assignment 3 Spring 2018

Ulex: A Lexical Analyzer Generator for Unicon

LECTURE 11. Semantic Analysis and Yacc

TDDD55 - Compilers and Interpreters Lesson 3

MP 3 A Lexer for MiniJava

POLITECNICO DI TORINO. Formal Languages and Compilers. Laboratory N 1. Laboratory N 1. Languages?

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR

COMPILERS AND INTERPRETERS Lesson 4 TDDD16

Formal Languages and Compilers

Part 5 Program Analysis Principles and Techniques

Alternation. Kleene Closure. Definition of Regular Expressions

Decaf Language Reference Manual

Assignment 4. Overview. Prof. Stewart Weiss. CSci 335 Software Design and Analysis III Assignment 4

CSCI 2010 Principles of Computer Science. Data and Expressions 08/09/2013 CSCI

Transcription:

CS 2210 Programming Project (Part I) January 17, 2018 Lexical Analyzer In this phase of the project, you will write a lexical analyzer for the CS 2210 programming language, MINI-JAVA. The analyzer will consist of a scanner, written in LEX, and routines to manage a lexical table, written in C. The rest of the compiler project will communicate with these program modules. Due date The assignment is due January 31, 2018 at the beginning of the class. Token specification Figure 1 defines the tokens that must be recognized, with their associated symbolic names. All multi-symbol tokens are separated by blanks, tabs, newlines, comments or delimiters. Comments are enclosed in /*... */ and cannot be nested. An identifier is a sequence of (upper or lower case) letters or digits, beginning with a letter. Identifiers are case sensitive (i.e. the identifier ABC is different from Abc). There is no limit on the length of identifiers. However, you may impose limits on the total number of distinct identifiers and string lexemes and on the total number of characters in all distinct identifiers and strings taken together (the size of the string table). If defined, these limits should be defined as follows. # d e f i n e LIMIT1 500 # d e f i n e LIMIT2 4096 There should be no other limitation on the number of lexemes that the lexical analyzer will process. An integer constant is an unsigned sequence of digits representing a base 10 number. A string constant is a sequence of characters surrounded also by single quotes, e.g. Hello, world. Hardto-type or invisible characters can be represented in character and string constants by escape sequences; these sequences look like two characters, but represent only one. The escape sequences support by the MINI-JAVA language are \n for newline, \t for tab, \ for the single quote and \\ for the backslash. Any other character following a backslash is not treated as escape sequence. 1

Token attributes A unique identification of each token (integer aliased with the symbolic token name) must be returned by the lexical analyzer. In addition, the lexical analyzer must pass extra information about some tokens to the parser (the lexeme). This extra information is passed to the parser as a single value, namely an integer, through a global variable as described below. For integer constants, the numeric value of the constant is passed. In order to allow other passes of the compiler to access the original identifier lexeme, the lexical analyzer passes an integer uniquely identifying an identifier (other than reserved words). String constants are treated in the same way, with a unique identifying number being passed. The unique identifying number for both identifiers and string constants should be an index (pointer) into a string table created by the lexical analyzer to record the lexemes. Same identifiers should return the same index. Implementation The central routine of the scanner is yylex, an integer function that returns a token number, indicating the type (identifier, integer constant, semicolon, etc.), of the next token in the input stream. In addition to the token type, yylex must set the global variables yyline and yycolumn to the line and column number at which that token appears. In the case of integer and string constants, store the value into the global integer variable yylval. Lex will write yylex for you, using the patterns and rules defined in your lex input file (which should be called lexer.l. Your rules must include the code to maintain yyline, yycolumn and yylval. In the case of identifiers and string constant, yylval contains an index into the string table that contains the real string followed by a null( \0 ) character. The same index should be returned for the same identifier that appear at different places. Similarly the same index is returned for the same string. Also identifiers and string constants need not be differentiated in the string table (i.e. abc and abc can have the same index in the string table). Reserved words may be handled as regular expressions or stored as part of the id table. For example, reserved words may be pre-stored in the string table so your program can determine a reserve word from an identifier by the section of the table in which the lexeme is found. Efficiency should be a factor in the management of the lexical and string table. You are to write a routine ReportError that takes a message and line and column numbers and reports an error, printing the message and indicating the position of the error. You need only print the line and column number to indicate the position. The #define mechanism should be used to allow the lexical analyzer to return token numbers symbolically. In order to avoid using token names that are reserved or significant in C or in the parser, the token names have been specified for you in Figure 1. The parser and the lexical analyzer must agree on the token number to ensure correct communication between them. The token numbers can be chosen by you, as the compiler writer, or, by default, by Yacc (a parser generator to be used in the next assignment). The default token number for a literal character (i.e. a one-character token) is the numerical value of the character in the local character set. Other names are assigned token numbers starting at 257 in the order they are declared in your Yacc specification. Regardless of how token numbers are chosen, the end-marker must has token number 0 or negative, and thus your lexical analyzer must return a 0 ( or a negative) as a token number upon reaching the end of input. 2

Token name Symbolic Name Token Name Symbolic Name ANDnum && CLASSnum class ASSGNnum := COMMAnum, DECLARATIONnum declarations DIVIDEnum / DOTnum. ELSEnum else ENDDECLARATIONSnum enddeclarations EQnum == EQUALnum = GEnum >= GTnum > ICONSTnum integerconstant IDnum identifier IFnum if INTnum int LBRACEnum { LBRACnum [ LEnum <= LPARENnum ( LTnum < METHODnum method MINUSnum - NEnum!= NOTnum! ORnum PLUSnum + PROGRAMnum program RBRACEnum RBRACnum ] RETURNnum return RPARENnum ) SCONSTnum stringconstant SEMInum ; TIMESnum * VALnum val VOIDnum void WHILEnum while EOFnum end of file Figure 1: Defined Tokens in Mini Java. Temporary driver In order to test your lexical analyzer without a parser, you will have to write a simple driver program which calls your lexical analyzer and print each token with its value as the input is scanned. For ease in combining the lexical analyzer and parser in the second assignment, the lexical analyzer function should be put in a file by itself. The following shows the structure of a driver. If you use it, please remember to break from the endless loop after recognizing the end of file token. main() {... while (1) { switch (yylex()) { case ICONSTnum:...... Error handling Your lexical analyzer should recover from all malformed lexemes, as well as such things as string constants that extend across a line boundary or comments that are never terminated. Specifically, an identifier which starts with a digit is considered to be an error and should be reported. 3

An example program with output The program: / Example 1 : A h e l l o world program / program xyz ; c l a s s T e s t { method void main ( ) { System. p r i n t l n ( H e l l o World!!! ) ; The output of Lexical Analyzer: Line Column Token Index in String table 2 13 PROGRAMnum 2 17 IDnum 0 2 18 SEMInum 3 11 CLASSnum 4 3 16 IDnum 3 18 LBRACEnum 4 18 METHODnum 4 23 VOIDnum 4 28 IDnum 9 4 29 LPARENnum 4 30 RPARENnum 4 32 LBRACEnum 5 22 IDnum 14 5 23 DOTnum 5 30 IDnum 21 5 31 LPARENnum 5 48 SCONSTnum 29 5 49 RPARENnum 5 50 SEMInum 6 13 RBRACEnum 7 7 RBRACEnum EOFnum String Table : xyz test main system println Hello World!!! Assignment submission When you are done, create a gzipped tarball of your commented source files. You must include a file that shows how to compile/execute your code named Readme.txt. Preferably, include a makefile named Makefile. The submission should be a compressed file that contains your project source code and readme (no executable please). On Linux, this can be done with the command tar zcvf USERNAME proj1.tar.gz *. Copy your archive to the directory: wahn/submit/2210/. 4

Make sure you name the file with your username, and that you have your name in the comments of your source file. Note that this directory is insert-only, you may not delete or modify your submission once in the directory. You can however check that the file has been copied by listing the contents of that directory. If you have made a mistake, before the deadline, resubmit with a number suffix like USERNAME proj1 2.tar.gz. Your most recent submission will be graded. 5