The Language for Specifying a Lexical Analyzer

We shall now study how to build a lexical analyzer from a specification of tokens in the form of a list of regular expressions. The discussion centers on the design of an existing tool called LEX, which automatically generates a lexical analyzer program.

A LEX source program is a specification of a lexical analyzer, consisting of a set of regular expressions together with an action for each regular expression. The action is a piece of code to be executed whenever a token specified by the corresponding regular expression is recognized. The output of LEX is a lexical analyzer program constructed from the LEX source specification.

Unlike most programming languages, a source program for LEX does not supply all the details of the intended computation. Rather, LEX supplies, as part of its output, a program that simulates a finite automaton. This program takes a transition table as data; the transition table is the portion of LEX's output that stems directly from LEX's input. The situation is shown in Fig. (23), where the lexical analyzer L is the transition table plus the program that simulates an arbitrary finite automaton expressed as a transition table. Only L is to be included in the compiler being built.

    LEX Source  --->  LEX Compiler  --->  Lexical Analyzer L
    Input Text  --->  Lexical Analyzer L  --->  Sequence of tokens

Fig. (23): The Role of LEX
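To make the "fixed simulator plus table" idea concrete, here is a minimal sketch (not LEX's actual output) in which the simulating program never changes and only the transition table does. The table below is a hypothetical DFA for identifiers of the form Letter (Letter | Digit)*; the state numbering and `char_class` helper are illustrative assumptions.

```python
# States: 0 = start, 1 = inside an identifier (accepting).
# A missing table entry means "no transition" (the dead state).

def char_class(c):
    # Collapse the input alphabet into the classes the table distinguishes.
    if c.isalpha():
        return "letter"
    if c.isdigit():
        return "digit"
    return "other"

TRANSITIONS = {
    (0, "letter"): 1,
    (1, "letter"): 1,
    (1, "digit"): 1,
}
ACCEPTING = {1}

def simulate(table, accepting, text):
    """Generic simulator: swapping the table data changes the language."""
    state = 0
    for c in text:
        state = table.get((state, char_class(c)))
        if state is None:          # no transition: reject immediately
            return False
    return state in accepting

print(simulate(TRANSITIONS, ACCEPTING, "x25"))   # True
print(simulate(TRANSITIONS, ACCEPTING, "25x"))   # False
```

The `simulate` function plays the role of the fixed program shipped with LEX's output, while `TRANSITIONS` plays the role of the table derived from the LEX source.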

A LEX source program consists of two parts: a sequence of auxiliary definitions followed by a sequence of translation rules.

Auxiliary Definitions

The auxiliary definitions are statements of the form:

    D1 = R1
    D2 = R2
    ...
    Dn = Rn

where each Di is a distinct name, and each Ri is a regular expression whose symbols are chosen from Σ ∪ {D1, D2, ..., D(i-1)}, i.e., characters or previously defined names. The Di's are shorthand names for regular expressions, and Σ is our input symbol alphabet.

Example: We can define the class of identifiers for a typical programming language with the following sequence of auxiliary definitions:

    Letter = A | B | ... | Z
    Digit = 0 | 1 | ... | 9
    Identifier = Letter (Letter | Digit)*
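Since each Ri may use only previously defined names, the auxiliary definitions can be expanded by simple textual substitution, in order. A sketch of that idea, using Python's `re` syntax as a stand-in for LEX's regular-expression notation (the character classes are an assumed encoding of A | B | ... | Z and 0 | 1 | ... | 9):

```python
import re

# Each name is substituted by its already-expanded regular expression.
defs = {}
defs["Letter"] = "[A-Z]"        # Letter = A | B | ... | Z
defs["Digit"] = "[0-9]"         # Digit  = 0 | 1 | ... | 9
# Identifier = Letter (Letter | Digit)* -- refers only to earlier names.
defs["Identifier"] = "{Letter}({Letter}|{Digit})*".format(**defs)

identifier = re.compile(defs["Identifier"])
print(bool(identifier.fullmatch("X2AB")))   # True:  starts with a letter
print(bool(identifier.fullmatch("2AB")))    # False: starts with a digit
```

The ordering restriction (Ri over Σ ∪ {D1, ..., D(i-1)}) is exactly what makes this one-pass expansion possible.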

Translation Rules

The translation rules of a LEX program are statements of the form:

    P1 {A1}
    P2 {A2}
    ...
    Pm {Am}

where each Pi is a regular expression called a pattern, over the alphabet consisting of Σ and the auxiliary definition names. The patterns describe the form of the tokens. Each Ai is a program fragment describing what action the lexical analyzer should take when a token matching Pi is found. The Ai's are written in a conventional programming language; rather than commit to any particular language, we use a pseudo-language here. To create the lexical analyzer L, each of the Ai's must be compiled into machine code.

The lexical analyzer L created by LEX behaves in the following manner. L reads its input, one character at a time, until it has found the longest prefix of the input that matches one of the regular expressions Pi. Once L has found that prefix, L removes it from the input and places it in a buffer called TOKEN. (Actually, TOKEN may be a pair of pointers to the beginning and end of the matched string in the input buffer itself.) L then executes the action Ai.

It is possible that none of the regular expressions denoting the tokens matches any prefix of the input. In that case, an error has occurred, and L transfers control to some error-handling routine. It is also possible that two or more patterns match the same longest prefix of the remaining input. If that is the case, L breaks the tie in favor of the token that comes first in the list of translation rules.

Example: Consider the collection of tokens defined in Fig. (5); the corresponding LEX program is shown in Fig. (24).

    AUXILIARY DEFINITIONS
    Letter = A | B | ... | Z
    Digit = 0 | 1 | ... | 9

    TRANSLATION RULES
    BEGIN                       {return 1}
    END                         {return 2}
    IF                          {return 3}
    THEN                        {return 4}
    ELSE                        {return 5}
    Letter (Letter | Digit)*    {LEXVAL := INSTALL( ); return 6}
    Digit+                      {LEXVAL := INSTALL( ); return 7}
    <                           {LEXVAL := 1; return 8}
    <=                          {LEXVAL := 2; return 8}
    =                           {LEXVAL := 3; return 8}
    <>                          {LEXVAL := 4; return 8}
    >                           {LEXVAL := 5; return 8}
    >=                          {LEXVAL := 6; return 8}

Fig. (24): A LEX Program
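The matching policy just described — take the longest matching prefix, and on a tie prefer the pattern listed first — can be sketched directly. The following is an illustrative Python version of L's inner loop for the rules of Fig. (24); the token codes mirror the `return` values above, and the patterns are assumed translations of the LEX notation (the actions that set LEXVAL are omitted for brevity):

```python
import re

RULES = [
    (re.compile(r"BEGIN"), 1),
    (re.compile(r"END"), 2),
    (re.compile(r"IF"), 3),
    (re.compile(r"THEN"), 4),
    (re.compile(r"ELSE"), 5),
    (re.compile(r"[A-Za-z][A-Za-z0-9]*"), 6),   # identifier
    (re.compile(r"[0-9]+"), 7),                 # constant
    (re.compile(r"<=|<>|>=|<|>|="), 8),         # relational operators
]

def next_token(text, pos):
    """Return (code, lexeme, new_pos) for the longest matching prefix."""
    best = None
    for pattern, code in RULES:            # earlier rules win ties
        m = pattern.match(text, pos)
        if m and (best is None or m.end() > best[0].end()):
            best = (m, code)
    if best is None:                       # no pattern matches: error routine
        raise SyntaxError("no pattern matches at position %d" % pos)
    m, code = best
    return code, m.group(), m.end()

print(next_token("BEGIN", 0))   # (1, 'BEGIN', 5) -- keyword beats identifier
print(next_token("<=", 0))      # (8, '<=', 2)    -- longest match wins
```

Because the tie-breaking test uses a strict `>`, a later pattern replaces an earlier one only when its match is strictly longer, which is exactly the first-rule-wins rule for equal-length matches.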

Suppose the lexical analyzer resulting from the above rules is given the input BEGIN followed by a blank. Both the first and the sixth patterns match BEGIN, and no pattern matches a longer string. Since the pattern for the keyword BEGIN precedes the pattern for identifiers in the list, the conflict is resolved in favor of the keyword.

For another example, suppose <= are the first two characters read. While the pattern < matches the first character, it is not the longest pattern matching a prefix of the input. Thus LEX's strategy of selecting the longest prefix that matches a pattern makes it easy to resolve the conflict between < and <=: <= is chosen as the next token.
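Both conflicts in the examples above can be checked with a small self-contained scanning loop. The rule names and patterns here are illustrative stand-ins for the Fig. (24) rules, with blanks skipped between tokens:

```python
import re

RULES = [
    ("KEYWORD", re.compile(r"BEGIN|END|IF|THEN|ELSE")),
    ("ID", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("RELOP", re.compile(r"<=|<>|>=|<|>|=")),
]

def scan(text):
    """Tokenize using longest match, with earlier rules winning ties."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():            # skip blanks between tokens
            pos += 1
            continue
        best = None
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise SyntaxError("no pattern matches at position %d" % pos)
        name, m = best
        tokens.append((name, m.group()))
        pos = m.end()
    return tokens

print(scan("BEGIN <= x"))
# [('KEYWORD', 'BEGIN'), ('RELOP', '<='), ('ID', 'x')]
```

Note that on the input BEGINX the identifier rule would win despite being listed later, because its match (BEGINX) is strictly longer than the keyword match (BEGIN) — longest match is applied before rule order.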