Chapter 3: Lexical Analysis

A simple way to build a lexical analyzer is to construct a diagram that illustrates the structure of the tokens of the source language, and then to hand-translate the diagram into a program for finding tokens. Pattern-directed programming is used in many areas other than compilers, such as query languages and information retrieval systems.

The Role of the Lexical Analyzer
The main task of the lexical analyzer is to read the input characters and produce as output a sequence of tokens that the parser uses for syntax analysis. It also strips out white space, and it keeps track of the number of newline characters seen so that a line number can be associated with an error message when applicable.
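The white-space and line-counting duties described above can be sketched in C as follows. This is a minimal illustration, not the course's implementation: the `struct scanner` type and the `skip_white_space` name are assumptions introduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scanner state: the input text, a read position, a line counter. */
struct scanner {
    const char *src;  /* NUL-terminated source text */
    size_t pos;       /* index of the next unread character */
    int line;         /* current line number, starting at 1 */
};

/* Skip blanks, tabs and newlines, counting every newline that is skipped. */
static void skip_white_space(struct scanner *s)
{
    assert(s != NULL && s->src != NULL);
    for (;;) {
        char c = s->src[s->pos];
        if (c == '\n')
            s->line++;                    /* remember the line for error messages */
        else if (c != ' ' && c != '\t')
            break;                        /* first character of the next token */
        s->pos++;
    }
}
```

After the call, `s->line` holds the line number to report if the token starting at `s->pos` turns out to be erroneous.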

Some lexical analyzers are divided into a cascade of two phases:
- Scanning: the simple tasks (for example, eliminating spaces).
- Lexical analysis: the more complex tasks.

Why separate the analysis phase of compiling into lexical analysis and parsing?
- Simplicity
- Efficiency
- Portability

Tokens, Patterns, Lexemes
- Token: a name for a set of strings (for example id, num, opr).
- Lexeme: a sequence of characters in the source program that is matched by the pattern for a token.
- Pattern: a rule describing the set of lexemes that belong to a token.

Examples:
    opr: +, *, /, -, >, <, <=, >=, <>, =
    id:  a letter followed by letters or digits

Lexical Errors
When the remaining input matches no pattern, an error has occurred. The simplest recovery strategy is to delete successive characters until a well-formed token is found.
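The delete-until-a-token-is-found strategy can be sketched as below. The helper names `can_start_token` and `panic_skip` are hypothetical, and the set of characters that may begin a token is an assumption based on the id, num and opr examples above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if c can begin an id, a num or an opr of the toy language (assumed set). */
static int can_start_token(char c)
{
    return isalpha((unsigned char)c) || isdigit((unsigned char)c) ||
           (c != '\0' && strchr("+*/-><=", c) != NULL);
}

/* Return how many leading characters of input must be deleted
   before a character that can start a token is reached. */
static size_t panic_skip(const char *input)
{
    assert(input != NULL);
    size_t n = 0;
    while (input[n] != '\0' && !can_start_token(input[n]))
        n++;
    return n;
}
```

For the input `#@$x1` the function reports three characters to delete, after which scanning can resume at `x1`.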

Error Recovery
Other possible error-recovery actions are:
- Deleting an extraneous character.
- Inserting a missing character.
- Replacing an incorrect character by a correct one.
- Transposing two adjacent characters.

Specification of Tokens
Strings and languages:
- Alphabet: a finite set of symbols.
- String: a finite sequence of symbols drawn from the alphabet.
- Language: a set of strings.
Regular expressions are an important notation for specifying patterns.

Regular Expressions
Example: let Σ = {a, b}.
- The regular expression a | b denotes the set {a, b}.
- (a | b)(a | b) denotes {aa, ab, ba, bb}, which can also be written aa | ab | ba | bb.
- a* denotes the set of all strings of zero or more a's.
- (a | b)* denotes the set of all strings containing zero or more instances of a or b.

A regular definition of id:
    letter → A | B | ... | Z | a | b | ... | z
    digit  → 0 | 1 | ... | 9
    id     → letter (letter | digit)*
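As an illustration of how the regular definition of id maps onto code, here is a small C sketch. The function name `match_id` is an assumption, and the standard `isalpha`/`isalnum` functions in the default "C" locale are used for letter and letter | digit.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Length of the longest prefix of s matching  letter (letter | digit)* ,
   or 0 if s does not start with a letter. */
static size_t match_id(const char *s)
{
    assert(s != NULL);
    if (!isalpha((unsigned char)s[0]))
        return 0;                          /* id must begin with a letter */
    size_t n = 1;
    while (isalnum((unsigned char)s[n]))   /* (letter | digit)* */
        n++;
    return n;
}
```

For example, `match_id("x1 := 2")` consumes the two characters `x1`, while `match_id("9lives")` returns 0 because an identifier cannot begin with a digit.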

Recognition of Tokens
Identifier:
    letter → A | B | ... | Z | a | b | ... | z
    digit  → 0 | 1 | ... | 9
    id     → letter (letter | digit)*

Unsigned numbers in Pascal are strings such as 5280, 39.37, 6.336E4 and 1.894E-4:
    digit             → 0 | 1 | ... | 9
    digits            → digit digit*
    optional-fraction → . digits | empty
    optional-exponent → ( E ( + | - | empty ) digits ) | empty
    num               → digits optional-fraction optional-exponent

The same definition of num, written with the shorthand r? (which is the same as r | empty):
    digit             → 0 | 1 | ... | 9
    digits            → digit digit*
    optional-fraction → ( . digits )?
    optional-exponent → ( E ( + | - )? digits )?
    num               → digits optional-fraction optional-exponent

We assume that lexemes are separated by white space, consisting of a non-null sequence of blanks, tabs and newlines:
    delim → blank | tab | newline
    ws    → delim+
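The definition of num can be followed almost line by line in code. The sketch below is one possible reading, with the hypothetical names `match_digits` and `match_num`. It is an assumption of this sketch that an incomplete optional part (such as the `.` in `12.x`) is simply left unconsumed rather than reported as an error.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* digits -> digit digit* : count the leading digits of s (0 if none). */
static size_t match_digits(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* num -> digits optional-fraction optional-exponent.
   Returns the length of the longest matching prefix, or 0 if there is none. */
static size_t match_num(const char *s)
{
    assert(s != NULL);
    size_t n = match_digits(s);
    if (n == 0)
        return 0;
    if (s[n] == '.') {                       /* optional-fraction: (. digits)? */
        size_t f = match_digits(s + n + 1);
        if (f > 0)
            n += 1 + f;
    }
    if (s[n] == 'E') {                       /* optional-exponent: (E (+|-)? digits)? */
        size_t k = n + 1;
        if (s[k] == '+' || s[k] == '-')
            k++;
        size_t e = match_digits(s + k);
        if (e > 0)
            n = k + e;
    }
    return n;
}
```

All four example strings from the text (5280, 39.37, 6.336E4, 1.894E-4) are matched in full by `match_num`.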

Transition diagram for >= (figure not reproduced in this transcript)

Transition diagram for RelOp (figure not reproduced in this transcript)
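Since the diagrams themselves are not available here, the following C sketch shows what a recognizer for the relational operators <, <=, <>, =, >, >= might look like when written directly from such a diagram. The enum and the function name `match_relop` are assumptions. Looking at `s[1]` without consuming it plays the role of the retract step in a transition diagram.

```c
#include <assert.h>
#include <stddef.h>

enum relop { RELOP_NONE, RELOP_LT, RELOP_LE, RELOP_NE,
             RELOP_EQ, RELOP_GT, RELOP_GE };

/* Recognize a relational operator at the start of s.
   *len receives the number of characters consumed (0 if no operator). */
static enum relop match_relop(const char *s, size_t *len)
{
    assert(s != NULL && len != NULL);
    switch (s[0]) {
    case '<':
        if (s[1] == '=') { *len = 2; return RELOP_LE; }
        if (s[1] == '>') { *len = 2; return RELOP_NE; }
        *len = 1;                 /* the second character belongs to the next token */
        return RELOP_LT;
    case '=':
        *len = 1;
        return RELOP_EQ;
    case '>':
        if (s[1] == '=') { *len = 2; return RELOP_GE; }
        *len = 1;
        return RELOP_GT;
    default:
        *len = 0;
        return RELOP_NONE;
    }
}
```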

Identifiers
Remember that keywords are treated as identifiers, rather than being encoded into the transition diagrams (TDs).

The return statement of the accepting state uses two routines:
- gettoken(): looks up the lexeme in the symbol table. If the lexeme is a keyword, KW is returned; otherwise the token ID is returned.
- install_id(): if gettoken() returns KW, it returns 0. If the lexeme is found in the symbol table, it returns a pointer to the existing entry. If the lexeme is not found in the symbol table, it is installed as a variable and a pointer to the new entry is returned.
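A possible shape for these two routines is sketched below. The table layout, the `lookup` helper, the size limits and the choice to preload a few keywords into the symbol table are all assumptions made for illustration. Only the names gettoken, install_id, KW and ID come from the text, and here the lexeme is passed explicitly as a parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum token { KW, ID };

#define MAX_ENTRIES 64
#define MAX_LEXEME  32

struct entry { char lexeme[MAX_LEXEME]; enum token token; };

/* Symbol table, preloaded with a few keywords (assumed for this sketch). */
static struct entry table[MAX_ENTRIES] = {
    { "begin", KW }, { "end", KW }, { "if", KW }, { "then", KW },
};
static int n_entries = 4;

/* Hypothetical helper: linear search for a lexeme, NULL if absent. */
static struct entry *lookup(const char *lexeme)
{
    for (int i = 0; i < n_entries; i++)
        if (strcmp(table[i].lexeme, lexeme) == 0)
            return &table[i];
    return NULL;
}

/* KW if the lexeme is a keyword, otherwise ID. */
static enum token gettoken(const char *lexeme)
{
    struct entry *e = lookup(lexeme);
    return (e != NULL && e->token == KW) ? KW : ID;
}

/* NULL (the "0" of the text) for keywords; otherwise a pointer to the
   existing or newly installed entry for the identifier. */
static struct entry *install_id(const char *lexeme)
{
    assert(lexeme != NULL && strlen(lexeme) < MAX_LEXEME);
    if (gettoken(lexeme) == KW)
        return NULL;
    struct entry *e = lookup(lexeme);
    if (e != NULL)
        return e;                      /* already installed */
    assert(n_entries < MAX_ENTRIES);
    e = &table[n_entries++];
    strcpy(e->lexeme, lexeme);
    e->token = ID;
    return e;
}
```

Installing the same identifier twice returns the same entry, so every occurrence of a variable refers to one symbol-table record.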

Numbers (transition diagrams not reproduced in this transcript)

How to handle errors?
- Case 1: an unrecognized character. None of the case options fires, and the error message at the end of the cases is issued.
- Case 2: an unexpected character, such as #, in the middle of a token. An ERROR message is issued and the analyzer starts looking for a new token.

How to handle comments: (transition diagram not reproduced in this transcript)

    /* ERROR: report the error and start looking for a new token. */
    Lexical_analyzer()
    {
        state = 0;                      /* every token starts in state 0 */
        while (!EOF(input)) {
            switch (state) {
            case 0:
                c = nextchar();         /* c is the lookahead character */
                if (c == ' ' || c == '\t' || c == '\n')
                    state = 0;          /* skip white space */
                else if (c == '<') state = 1;
                else if (c == '=') state = 5;
                else if (c == '>') state = 6;
                else state = 9;
                break;
            case 1:
                c = nextchar();
                if (c == '=') state = 2;
                else if (c == '>') state = 3;
                else state = 4;
                break;
            case 2:  TokenVal = LE; TokenType = Relop; return TokenType;
            case 3:  /* not shown on the slides; inferred from the <> operator */
                     TokenVal = NE; TokenType = Relop; return TokenType;
            case 4:  retract(1); TokenVal = LT; TokenType = Relop; return TokenType;
            case 5:  TokenVal = EQ; TokenType = Relop; return TokenType;
            case 6:
                c = nextchar();
                if (c == '=') state = 7;
                else state = 8;
                break;
            case 7:  TokenVal = GE; TokenType = Relop; return TokenType;
            case 8:  retract(1); TokenVal = GT; TokenType = Relop; return TokenType;
            case 9:
                if (isletter(c)) state = 10;
                else state = 12;
                break;
            case 10:
                c = nextchar();
                if (isletter(c) || isdigit(c)) state = 10;
                else state = 11;
                break;
            case 11:
                retract(1);
                TokenVal = install_id(); TokenType = gettoken();
                return TokenType;
            case 12: /* not shown on the slides; assumed by analogy with state 9 */
                if (isdigit(c)) state = 13;
                else state = 22;
                break;
            case 13:
                c = nextchar();
                if (isdigit(c)) state = 13;
                else if (c == '.') state = 14;
                else if (c == 'E') state = 16;
                else state = 20;
                break;
            case 14:
                c = nextchar();
                if (isdigit(c)) state = 15;
                else ERROR;
                break;
            case 15:
                c = nextchar();
                if (isdigit(c)) state = 15;
                else if (c == 'E') state = 16;
                else state = 21;
                break;
            case 16:
                c = nextchar();
                if (c == '+' || c == '-') state = 17;
                else if (isdigit(c)) state = 18;
                else ERROR;
                break;
            case 17:
                c = nextchar();
                if (isdigit(c)) state = 18;
                else ERROR;
                break;
            case 18:
                c = nextchar();
                if (isdigit(c)) state = 18;
                else state = 19;
                break;
            case 19: case 20: case 21:
                retract(1); TokenVal = Lexeme; TokenType = NUM; return TokenType;
            case 22:
                if (c == '+') state = 23;
                else if (c == '-') state = 24;
                else if (c == '*') state = 25;
                else if (c == '/') state = 26;
                else state = 27;
                break;
            case 23: TokenVal = Add; TokenType = opr; return TokenType;
            case 24: TokenVal = Sub; TokenType = opr; return TokenType;
            case 25: TokenVal = Mul; TokenType = opr; return TokenType;
            case 26: TokenVal = Div; TokenType = opr; return TokenType;
            case 27:
                if (c == ';' || c == ',') state = 28;
                else ERROR;             /* unrecognized character */
                break;
            case 28: TokenVal = c; TokenType = pun; return TokenType;
            }
        }
    }
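The analyzer above relies on nextchar() and retract(), which the slides do not define. A minimal sketch of the two over an in-memory string might look like this. The `set_input` helper, the `forward` index, the `size_t` parameter of retract and the convention of returning '\0' at the end of the input are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char *input = "";   /* text being scanned */
static size_t input_len = 0;     /* its length */
static size_t forward = 0;       /* index of the next character to read */

/* Hypothetical helper: start scanning a new NUL-terminated text. */
static void set_input(const char *text)
{
    assert(text != NULL);
    input = text;
    input_len = strlen(text);
    forward = 0;
}

/* Return the next character and advance. At the end of the input, '\0' is
   returned and forward still advances, so a later retract(1) stays consistent. */
static int nextchar(void)
{
    int c = (forward < input_len) ? (unsigned char)input[forward] : '\0';
    forward++;
    return c;
}

/* Push the last n characters back onto the input. */
static void retract(size_t n)
{
    assert(n <= forward);
    forward -= n;
}
```

With these definitions, reading `<` followed by `5` and then calling `retract(1)` leaves `5` as the first character of the next token, which is what states 4 and 8 depend on.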