Assignment 1 (Lexical Analyzer)

Similar documents
Assignment 1 (Lexical Analyzer)

CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2]

Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday

Lexical Analysis. Chapter 1, Section Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual

Structure of Programming Languages Lecture 3

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Implementation of Lexical Analysis

Projects for Compilers

Lexical Analysis. Introduction

CS164: Programming Assignment 2 Dlex Lexer Generator and Decaf Lexer

2010: Compilers REVIEW: REGULAR EXPRESSIONS HOW TO USE REGULAR EXPRESSIONS

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Chapter 3 Lexical Analysis

Implementation of Lexical Analysis

MANIPAL INSTITUTE OF TECHNOLOGY (Constituent Institute of MANIPAL University) MANIPAL

Building lexical and syntactic analyzers. Chapter 3. Syntactic sugar causes cancer of the semicolon. A. Perlis. Chomsky Hierarchy

Outline. 1 Scanning Tokens. 2 Regular Expresssions. 3 Finite State Automata

Formal Languages and Compilers Lecture VI: Lexical Analysis

Compiler course. Chapter 3 Lexical Analysis

Lexical Analysis 1 / 52

Lexical and Syntax Analysis

Compiler Construction

Figure 2.1: Role of Lexical Analyzer

CS415 Compilers. Lexical Analysis

Lexical Analysis. Sukree Sinthupinyo July Chulalongkorn University

Chapter 3: Lexical Analysis

Lexical Analysis. Implementation: Finite Automata

Recognition of Tokens

MP 3 A Lexer for MiniJava

Monday, August 26, 13. Scanners

WARNING for Autumn 2004:

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

Wednesday, September 3, 14. Scanners

Lexical Analysis. Lecture 3. January 10, 2018

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

CS 314 Principles of Programming Languages. Lecture 3

Compiler phases. Non-tokens

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

Lexical Analysis. Lecture 3-4

Compilers: table driven scanning Spring 2008

Front End: Lexical Analysis. The Structure of a Compiler

Scanners. Xiaokang Qiu Purdue University. August 24, ECE 468 Adapted from Kulkarni 2012

2. λ is a regular expression and denotes the set {λ} 4. If r and s are regular expressions denoting the languages R and S, respectively

UNIT -2 LEXICAL ANALYSIS

Lexical Analysis. Chapter 2

MP 3 A Lexer for MiniJava

Finite Automata and Scanners

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language?

CS 403: Scanning and Parsing

Lexical Analysis. Lecture 2-4

CS 310: State Transition Diagrams

Compiler Construction

CS164: Midterm I. Fall 2003

CSc 453 Lexical Analysis (Scanning)

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

CMSC 350: COMPILER DESIGN

Implementation of Lexical Analysis

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES

The Language for Specifying Lexical Analyzer

Compiler Construction LECTURE # 3

Programming in C++ 4. The lexical basis of C++

Lexical Analysis - An Introduction. Lecture 4 Spring 2005 Department of Computer Science University of Alabama Joel Jones

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

CSCI312 Principles of Programming Languages!

Implementation of Lexical Analysis

Unit-II Programming and Problem Solving (BE1/4 CSE-2)

Lexical analysis. Syntactical analysis. Semantical analysis. Intermediate code generation. Optimization. Code generation. Target specific optimization

CS321 Languages and Compiler Design I. Winter 2012 Lecture 4

Automated Tools. The Compilation Task. Automated? Automated? Easier ways to create parsers. The final stages of compilation are language dependant

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

CS 415 Midterm Exam Spring SOLUTION

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

Computational Expression

Syntax Intro and Overview. Syntax

CSE302: Compiler Design

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }

Lecture 9 CIS 341: COMPILERS

Name ( ) Person Number ( )

PRINCIPLES OF COMPILER DESIGN UNIT II LEXICAL ANALYSIS 2.1 Lexical Analysis - The Role of the Lexical Analyzer

Compiling Regular Expressions COMP360

A simple syntax-directed

Week 2: Syntax Specification, Grammars

Principles of Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

1. Lexical Analysis Phase

CS308 Compiler Principles Lexical Analyzer Li Jiang

A Scanner should create a token stream from the source code. Here are 4 ways to do this:

Writing a Lexical Analyzer in Haskell (part II)

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

Introduction to Lexical Analysis

Variables Data types Variable I/O. C introduction. Variables. Variables 1 / 14

ML 4 A Lexer for OCaml s Type System

Programming Languages & Compilers. Programming Languages and Compilers (CS 421) Programming Languages & Compilers. Major Phases of a Compiler

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

Programming Languages and Compilers (CS 421)

CSc 453 Compilers and Systems Software

Compiler Construction D7011E

Programming Languages & Compilers. Programming Languages and Compilers (CS 421) I. Major Phases of a Compiler. Programming Languages & Compilers

Compilers CS S-01 Compiler Basics & Lexical Analysis

Transcription:

Assignment 1 (Lexical Analyzer) Compiler Construction CS4435 (Spring 2015) University of Lahore Maryam Bashir Assigned: Saturday, March 14, 2015. Due: Monday 23rd March 2015 11:59 PM Lexical analysis Lexical analysis is the process of reading in the stream of characters making up the source code of a program and dividing the input into tokens. In this assignment, you will use regular expressions and DFAs to implement a lexical analyzer for a subset of C programming language. Your Task Your task is to write a program that reads an input text file, and constructs a list of tokens in that file. Your program may be written in C, C++, Java or programming language. Assuming that the input file contains the following code string: 2.1 Sample Input (C++ code) void main () { int sum = 0; for(int j=0; j < 10; j=j+1) { sum = sum + j + 10.43 + 34E4 + 45.34E-4 + E43 +.34; } } 2.2 Sample Output The output of the program should be similar to the following: Class : Lexeme keyword : void identifier : main 1

( : ( ) : ) { : { keyword : int identifier : sum = : = num : 0 ; : ; keyword : for ( : ( keyword : int identifier : j = : = num : 0 ; : ; identifier : j < : < num : 10 ; : ; identifier : j = : = identifier : j + : + num : 1 ) : ) { : { identifier : sum = : = identifier : sum + : + identifier : j + : + num : 10.43 + : + num : 34.E4 + : + num : 45.34E-4 + : + identifier : E43 + : + Error :. num : 34 ; : ; } : } } : } 2

Token Type Lexical Specification keyword One of the strings while, if, else, return, break, continue, int, float, void Token Id for identifiers matches a letter followed by letters or digits or underscore: identifier letter [A-Z a-z] digit [0-9] id letter (letter digit ) Token num matches unsigned numbers: digits digit digit num optional-fraction (. digits) ɛ optional-exponent (E(+ ɛ) digits ) ɛ num digits optional-fraction optional-exponent addop +, - mulop, / relop <, >, <=, >=, ==,! = and && or not! ) ) ( ( { { } } [ [ ] ] Table 1: Table of lexical specifications for tokens Valid Tokens Programs in this language are composed of tokens displayed in table 1 : Construction of Deterministic Finite Automata (DFA) or Transition Diagrams You can construct single DFA (also called transition diagram) for recognizing all tokens in this language by combining individual DFAs for each type of token. For example you can construct transition diagram for identifiers and numbers and them merge the start states of two diagrams to create a single transition diagram that recognizes both numbers and identifiers. For identifying keywords, you can store keywords in some data structure (hashmap or string array). Whenever your program recognizes a token of identifier, you can check your map or string 3

array if this identifier matches keyword. If it matches keyword then consider it keyword wise consider it identifier. Implementation of Deterministic Finite Automata (DFA) I have written a small lexical analyser for a simple DFA shown in figure 1. You can find its source code ( Lexer.java ) on Piazza site. You can either make changes to this code or write your own lexical analyser from scratch. You can also write C++ version of this Java code if you want to program in C++. Please note that this DFA is not correct DFA for this assignment. It is just a sample DFA to give you an idea of how to convert your DFA to code. For example this DFA only allows letters in an identifier whereas in your C language you can use digits and underscore in identifier as well. whitespace 7 8 6 error letter whitespace 0 letter 1 digit = 3 5 digit 4 2 Figure 1: DFA 1 4

Submission Email a zip file containing your complete project (all source files, along with whatever files are needed to compile them; sample input and output files) to the following address: maryam.bashir@cs.uol.edu.pk The name of zip file should be roll numbers of all students in the group as follows: RollNumber1-RollNumber2 You should try to work on this assignment individually. If you think your programming is very weak then you can work in group size of maximum 2 students. 5

A Transition diagram for identifiers

A transition diagram for whitespace

A transition diagram for relational operators

A transition diagram for unsigned numbers

How to Merge Multiple Transition Diagrams Step 1: Merge start states of all transition diagrams Step2: Make a new transition on input from final state of each transition diagram to the new start state

Transition Diagram for Identifiers made of only letters letter 0 letter 1 2

Transition Diagram for integers 0 digit 3 4 digit

Transition Diagram for whitespace whitespace 7 6 whitespace 0

Transition Diagram for = operator = 0 5

Transition Diagram for error error 0 8

whitespace 7 8 6 error whitespace 0 letter digit = 3 letter 1 4 2 5 digit

whitespace 7 8 6 error whitespace 0 letter digit = 5 3 digit letter 1 4 2