Assignment 1 (Lexical Analyzer)

Similar documents
Assignment 1 (Lexical Analyzer)

Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday

CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2]

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Lexical Analysis. Chapter 1, Section Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual

Structure of Programming Languages Lecture 3

Implementation of Lexical Analysis

Lexical Analysis. Sukree Sinthupinyo July Chulalongkorn University

2010: Compilers REVIEW: REGULAR EXPRESSIONS HOW TO USE REGULAR EXPRESSIONS

Figure 2.1: Role of Lexical Analyzer

Chapter 3: Lexical Analysis

Projects for Compilers

Lexical and Syntax Analysis

CS164: Programming Assignment 2 Dlex Lexer Generator and Decaf Lexer

Lexical Analysis. Implementation: Finite Automata

Implementation of Lexical Analysis

Formal Languages and Compilers Lecture VI: Lexical Analysis

Lexical Analysis. Introduction

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Chapter 3 Lexical Analysis

MANIPAL INSTITUTE OF TECHNOLOGY (Constituent Institute of MANIPAL University) MANIPAL

CS415 Compilers. Lexical Analysis

Building lexical and syntactic analyzers. Chapter 3. Syntactic sugar causes cancer of the semicolon. A. Perlis. Chomsky Hierarchy

Compiler course. Chapter 3 Lexical Analysis

Recognition of Tokens

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language?

Implementation of Lexical Analysis

Compiler Construction

WARNING for Autumn 2004:

Lexical Analysis. Lecture 3. January 10, 2018

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

Compiler phases. Non-tokens

Compilers: table driven scanning Spring 2008

Front End: Lexical Analysis. The Structure of a Compiler

UNIT -2 LEXICAL ANALYSIS

Outline. 1 Scanning Tokens. 2 Regular Expresssions. 3 Finite State Automata

2. λ is a regular expression and denotes the set {λ} 4. If r and s are regular expressions denoting the languages R and S, respectively

Lexical Analysis. Lecture 3-4

CS 403: Scanning and Parsing

Lexical Analysis 1 / 52

Finite Automata and Scanners

MP 3 A Lexer for MiniJava

CS 310: State Transition Diagrams

Lexical Analysis. Chapter 2

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

Monday, August 26, 13. Scanners

Lexical Analysis. Lecture 2-4

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

Wednesday, September 3, 14. Scanners

CS164: Midterm I. Fall 2003

CS 314 Principles of Programming Languages. Lecture 3

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES

CMSC 350: COMPILER DESIGN

The Language for Specifying Lexical Analyzer

Scanners. Xiaokang Qiu Purdue University. August 24, ECE 468 Adapted from Kulkarni 2012

Lexical analysis. Syntactical analysis. Semantical analysis. Intermediate code generation. Optimization. Code generation. Target specific optimization

Unit-II Programming and Problem Solving (BE1/4 CSE-2)

MP 3 A Lexer for MiniJava

Implementation of Lexical Analysis

Implementation of Lexical Analysis

Lexical Analysis (ASU Ch 3, Fig 3.1)

Implementation of Lexical Analysis

CSE302: Compiler Design

Name ( ) Person Number ( )

Compiler Construction

Lexical and Syntax Analysis

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

Lecture 9 CIS 341: COMPILERS

1. Lexical Analysis Phase

A simple syntax-directed

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

Compiling Regular Expressions COMP360

CSc 453 Lexical Analysis (Scanning)

Program Analysis ( 软件源代码分析技术 ) ZHENG LI ( 李征 )

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

Lexical Analysis - An Introduction. Lecture 4 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Introduction to Lexical Analysis

Compiler Construction D7011E

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

Syntax Intro and Overview. Syntax

CSc 453 Compilers and Systems Software

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

Compiler Construction LECTURE # 3

Lexical Scanning COMP360

CS321 Languages and Compiler Design I. Winter 2012 Lecture 4

PRINCIPLES OF COMPILER DESIGN UNIT II LEXICAL ANALYSIS 2.1 Lexical Analysis - The Role of the Lexical Analyzer

CSCI312 Principles of Programming Languages!

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

Automated Tools. The Compilation Task. Automated? Automated? Easier ways to create parsers. The final stages of compilation are language dependant

Lexical and Syntactic Analysis

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

CS 415 Midterm Exam Spring SOLUTION

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }

Programming in C++ 4. The lexical basis of C++

CS308 Compiler Principles Lexical Analyzer Li Jiang

EECS483 D1: Project 1 Overview

Week 2: Syntax Specification, Grammars

Principles of Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

Transcription:

Assignment 1 (Lexical Analyzer) Compiler Construction CS4435 (Spring 2015) University of Lahore Maryam Bashir Assigned: Saturday, March 14, 2015. Due: Monday 23rd March 2015 11:59 PM Lexical analysis Lexical analysis is the process of reading in the stream of characters making up the source code of a program and dividing the input into tokens. In this assignment, you will use regular expressions and DFAs to implement a lexical analyzer for a subset of C programming language. Your Task Your task is to write a program that reads an input text file, and constructs a list of tokens in that file. Your program may be written in C, C++, Java or other programming language. Assuming that the input file contains the following code string: 2.1 Sample Input (C++ code) void main () { int sum = 0; while(sum == 0) { sum = sum + 10.43 + 34E4 + 45.34E-4 + E43 +.34; } } 2.2 Sample Output The output of the program should be similar to the following: Class : Lexeme keyword : void identifier : main 1

( : ( ) : ) { : { keyword : int = : = num : 0 ; : ; keyword : while ( : ( relop : == num : 0 ) : ) { : { = : = num : 10.43 num : 34.E4 num : 45.34E-4 identifier : E43 Error :. num : 34 ; : ; } : } } : } Note: Program prints error on. and considers 34 a separate number. E34 is also treated as identifier and not a number. Valid Tokens Programs in this language are composed of tokens displayed in table 1 : 2

Token Type Lexical Specification keyword One of the strings while, if, else, return, break, continue, int, float, void Token Id for identifiers matches a letter followed by letters or digits or underscore: identifier letter [A-Z a-z] digit [0-9] id letter (letter digit ) Token num matches unsigned numbers: digits digit digit num optional-fraction (. digits) ɛ optional-exponent (E(+ ɛ) digits ) ɛ num digits optional-fraction optional-exponent addop +, - mulop, / relop <, >, <=, >=, ==,! = assignop = and && or not! ) ) ( ( { { } } [ [ ] ] ; ; Table 1: Table of lexical specifications for tokens 3

Construction of Deterministic Finite Automata (DFA) or Transition Diagrams You can construct single DFA (also called transition diagram) for recognizing all tokens in this language by combining individual DFAs for each type of token. For example you can construct transition diagram for identifiers and numbers and them merge the start states of two diagrams to create a single transition diagram that recognizes both numbers and identifiers. For identifying keywords, you can store keywords in some data structure (hashmap or string array). Whenever your program recognizes a token of identifier, you can check your map or string array if this identifier matches keyword. If it matches keyword then consider it keyword otherwise consider it identifier. Implementation of Deterministic Finite Automata (DFA) I have written a small lexical analyser for a simple DFA shown in figure 1. You can find its source code ( Lexer.java ) on Piazza site. You can either make changes to this code or write your own lexical analyser from scratch. You can also write C++ version of this Java code if you want to program in C++. Please note that this DFA is not correct DFA for this assignment. It is just a sample DFA to give you an idea of how to convert your DFA to code. For example this DFA only allows letters in an identifier whereas in your C language you can use digits and underscore in identifier as well. Submission Email a zip file containing your complete project (all source files, along with whatever other files are needed to compile them; sample input and output files) to the following address: maryam.bashir@cs.uol.edu.pk The name of zip file should be roll numbers of all students in the group as follows: RollNumber1-RollNumber2 You should try to work on this assignment individually. If you think your programming is very weak then you can work in group size of maximum 2 students. 4

whitespace 7 8 other 6 error letter whitespace other 0 letter 1 digit other = 3 5 digit 4 2 Figure 1: DFA 1 5