Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres dgriol@inf.uc3m.es

Introduction: Definitions. Lexical analysis or scanning: to read a source program from left to right and divide it into a sequence of tokens (first phase of a compiler). Token: a sequence of characters with a collective syntactic meaning. Objectives: to simplify the syntax analyzer; to facilitate the portability of the compiler.

Introduction: Definitions. Further objectives: it may identify errors in the source program; it may strip comments and white space characters (tab, newline, space) from the source program; it may also associate a line number from the source program with a given error message.

The role of the Lexical Analyzer. [Figure: block diagram with the labels Source Program, Lexical Analyzer, Token stream, Syntax Analyzer, Control (Next), Error messages, and Symbol Table.]

Introduction: Definitions. Tokens: reserved words (if, while), identifiers (a23, var53d), special symbols (+, *, >=). Lexemes: particular instances of tokens. Patterns: rules that describe the lexemes of a token. Analogy with natural language: tokens: subject, verb, predicate; lexemes of the token verb: go, be, belong, arrive; pattern: go | be | belong | arrive.

The role of the Lexical Analyzer. Errors that can be detected: the scanner has no information about context, so it can detect only illegal characters and unterminated comments. It can eliminate comments, white space, etc. It correlates error messages from the compiler with the source program.

The role of the Lexical Analyzer. It does not detect: garbled sequences, tokens out of place, undeclared identifiers, misspelled keywords, mismatched types.

Scanner Implementation. There are basically two methods for implementing a scanner: 1. A program that is hard-coded to perform the lexical analysis (loop and switch). 2. Methods for defining and recognizing patterns in sequences of characters: regular expressions and finite automata theory.

Scanner Implementation. Method 1, a program that is hard-coded to perform the lexical analysis (loop and switch), admits two variants. Either write the lexical analyzer in a conventional programming or scripting language, using the I/O facilities of that language to read the input (a good candidate is Perl, with the rich pattern-matching capabilities it offers), or write the lexical analyzer in assembly language and explicitly manage the reading of input.

Scanner Implementation: Loop and Switch. Main loop: reads characters one by one from the input file and uses a switch statement to process the character(s) just read. Output: a list of tokens and lexemes from the source program. Such scanners are ad hoc (written for a specific problem). Example: the GCC C lexer is over 2,500 lines of code.
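The loop-and-switch idea can be illustrated with a minimal Python sketch. It is not the scanner from the slides: the token names, the `RESERVED` set, and the handled symbols are assumptions chosen to match the examples given earlier (if, while, a23, >=), and an `if`/`elif` chain stands in for the switch statement.

```python
RESERVED = {"if", "while", "do"}  # assumed reserved words, taken from the examples


def scan(source):
    """Hand-coded scanner: returns a list of (token, lexeme) pairs."""
    tokens = []
    i = 0
    while i < len(source):              # main loop over the input characters
        ch = source[i]
        if ch.isspace():                # white space is stripped, not returned
            i += 1
        elif ch.isalpha():              # identifier or reserved word
            start = i
            while i < len(source) and source[i].isalnum():
                i += 1
            lexeme = source[start:i]
            kind = "RESERVED" if lexeme in RESERVED else "IDENTIFIER"
            tokens.append((kind, lexeme))
        elif ch.isdigit():              # natural number
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(("NUMBER", source[start:i]))
        elif source.startswith(">=", i):
            tokens.append(("SYMBOL", ">="))
            i += 2
        elif ch in "+*>":
            tokens.append(("SYMBOL", ch))
            i += 1
        else:                           # illegal character: a lexical error
            raise ValueError(f"illegal character {ch!r} at position {i}")
    return tokens
```

For instance, `scan("while a23 >= 5")` yields one reserved word, one identifier, one symbol, and one number. Every new token kind needs another branch, which is why such scanners tend to grow large.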

Scanner Implementation. There are basically two methods for implementing a scanner. The second one: 2. Using methods to define and recognize patterns in sequences of characters: regular expressions and finite automata theory.

Regular expression review. Given an alphabet Σ, the rules that define the regular expressions over Σ are: every symbol a ∈ Σ is a regular expression; λ is a regular expression; if r and s are regular expressions, then (r), rs, r | s, and r* are regular expressions; nothing else is a regular expression.

Regular expression review. Axioms: r | s = s | r; r | (s | t) = (r | s) | t; (rs)t = r(st); r(s | t) = rs | rt; λr = r; rλ = r; r* = (r | λ)*; r** = r*.

Regular expression review. Notation: one or more: + (r+ = r r*); zero or one: ?; zero or more: *; any character: .; any other character (complement): ~; classes: a | b | c | ... | z = [a-z].

Regular expressions for tokens. Numbers: nat = [0-9]+; natwithsign = (+ | -)? nat; number = natwithsign ("." nat)? (E natwithsign)?
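As a rough check of the number pattern, the three definitions can be transcribed into Python's `re` syntax. The constant and function names are assumptions of this sketch; only the structure of the pattern comes from the slide.

```python
import re

NAT = r"[0-9]+"                       # nat
NAT_WITH_SIGN = rf"[+-]?{NAT}"        # natwithsign = (+ | -)? nat
# number = natwithsign ("." nat)? (E natwithsign)?
NUMBER = re.compile(rf"{NAT_WITH_SIGN}(\.{NAT})?(E{NAT_WITH_SIGN})?")


def is_number(text):
    """True if the whole string is a lexeme of the token number."""
    return NUMBER.fullmatch(text) is not None
```

With this transcription `is_number("-3.14E+10")` holds, while `"3."` and `".5"` are rejected because the pattern requires digits on both sides of the decimal point.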

Regular expressions for tokens. Identifiers and reserved words: reserved = if | while | do; letter = [a-zA-Z]; digit = [0-9]; identifier = letter (letter | digit)*
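Every reserved word also matches the identifier pattern, so a scanner has to decide which token wins. One common approach, sketched here as an assumption rather than as the method of the slides, is to look the lexeme up in the reserved-word set first. The function name `classify` is hypothetical.

```python
import re

RESERVED = {"if", "while", "do"}                    # reserved = if | while | do
IDENTIFIER = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")    # letter (letter | digit)*


def classify(lexeme):
    """Return 'reserved', 'identifier', or None if the lexeme matches neither."""
    if lexeme in RESERVED:          # reserved words take priority
        return "reserved"
    if IDENTIFIER.fullmatch(lexeme):
        return "identifier"
    return None
```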

Regular expressions for tokens. Comments: {this is a comment in Pascal}. comment = "{" (~"}")* "}"
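The comment pattern translates directly into a character-class complement in Python's `re` syntax. The sketch below strips such comments from a source string; `strip_comments` is a hypothetical helper name.

```python
import re

# "{" followed by any characters other than "}" and then "}"
COMMENT = re.compile(r"\{[^}]*\}")


def strip_comments(source):
    """Remove every brace-delimited comment from the source text."""
    return COMMENT.sub("", source)
```

An unterminated comment (an opening brace with no closing brace) does not match the pattern and is left in place, where a scanner could report it as a lexical error.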

Finite automata review. Once all the tokens are defined using regular expressions, a finite automaton can be created for recognizing them. A finite automaton consists of: a finite set of states, including a start state and some final states; an alphabet of possible input symbols; a finite set of transitions.


Finite automata review (IV). Example: (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9)+. [Figure: state diagram with transitions labeled digit and not digit, and a state labeled f.]

Finite automata review (VI). Deterministic finite automaton (DFA): DFA = (Σ, Q, f, q0, F), where Σ is the alphabet of possible input symbols; Q is the set of states; q0 ∈ Q is the start state; F ⊆ Q is the set of final states; f is the transition function f : Q × Σ → Q.
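The five components map naturally onto a small data structure. The following sketch assumes a dictionary encoding and, for brevity, lets the alphabet consist of character classes (letter, digit, other) instead of single characters. The transition function is stored only partially: a missing entry stands for an implicit error state.

```python
def char_class(ch):
    """Map a character to the input symbol used by the automaton."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


# Assumed example: a DFA for identifier = letter (letter | digit)*
IDENTIFIER_DFA = {
    "start": 1,                      # q0
    "finals": {2},                   # F
    "f": {(1, "letter"): 2,          # transition function f
          (2, "letter"): 2,
          (2, "digit"): 2},
}


def accepts(dfa, text):
    """Run the DFA over the whole text and report whether it ends in a final state."""
    state = dfa["start"]
    for ch in text:
        state = dfa["f"].get((state, char_class(ch)))
        if state is None:            # no transition: implicit error state
            return False
    return state in dfa["finals"]
```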

Finite automata review (VII). Nondeterministic finite automaton (NFA): NFA = (Σ, Q, f, q0, F), where Σ is the alphabet of possible input symbols; Q is the set of states; q0 ∈ Q is the start state; F ⊆ Q is the set of final states; f is the transition function f : Q × (Σ ∪ {λ}) → P(Q).

Finite automata review (VIII). Deterministic finite automaton (DFA): 1. There are no moves on input λ. 2. For each state s and input symbol a, there is exactly one edge out of s labeled a. Nondeterministic finite automaton (NFA): 1. More than one edge with the same label from any state is allowed. 2. States for which certain input symbols have no edge are allowed. 3. λ-NFA: λ-transitions are allowed.
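A λ-NFA can be simulated directly by tracking the set of states it might be in. The sketch below is one way to do this, assuming a dictionary that maps (state, symbol) to a set of states and using `None` as the label of λ-transitions. The example automaton for ab | a is hand-built for illustration and is not taken from the slides.

```python
LAMBDA = None  # label used in this sketch for λ-transitions


def closure(delta, states):
    """All states reachable from `states` using only λ-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, LAMBDA), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def nfa_accepts(delta, start, finals, text):
    """Simulate the λ-NFA on the whole text."""
    current = closure(delta, {start})
    for ch in text:
        moved = {t for s in current for t in delta.get((s, ch), ())}
        current = closure(delta, moved)
    return bool(current & finals)


# Hypothetical λ-NFA for ab | a: state 0 branches by λ to both alternatives.
DELTA = {(0, LAMBDA): {1, 4}, (1, "a"): {2}, (2, "b"): {3}, (4, "a"): {5}}
FINALS = {3, 5}
```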

Implementing the scanner. Regular expression → NFA → DFA → Program. From regular expressions to NFA: Thompson's construction. From NFA to DFA: subset construction. From DFA to program: specific purpose programs or transition tables.

Regular expression → NFA → DFA → Program: Thompson's construction. Input: a regular expression r over an alphabet T. Output: an NFA N accepting the language L(r). Method: first parse r and split it into sub-expressions; then create NFAs for the basic symbols appearing in the regular expression; finally, combine the basic fragments into an NFA that represents the entire expression.

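Regular expression → NFA → DFA → Program: Thompson's construction. Basic regular expressions (λ, a). [Figure: two NFA fragments, each with a start state i and a final state f, joined by a single transition labeled λ and a, respectively.]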

Regular expression → NFA → DFA → Program: Thompson's construction. Concatenation rs. [Figure: the NFA fragment for r followed by the NFA fragment for s.]

Regular expression → NFA → DFA → Program: Thompson's construction. Selection r | s. [Figure: the NFA fragments for r and s combined as alternatives.]

Regular expression → NFA → DFA → Program: Thompson's construction. Repetition r*. [Figure: the NFA fragment for r wrapped for repetition.]
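The four cases above can be sketched in Python as follows. Several choices here are assumptions of the sketch, not details from the slides: the regular expression is given already parsed, as nested tuples such as `("alt", r, s)`; λ-transitions are labeled `None`; and concatenation links the two fragments with a λ-edge instead of merging the final state of r with the start state of s, which accepts the same language with one extra state. The helper `matches` exists only to exercise the result.

```python
from itertools import count

LAMBDA = None  # label used in this sketch for λ-transitions


def thompson(regex):
    """Build a λ-NFA (delta, start, final) from a regex given as nested tuples."""
    delta, fresh = {}, count()

    def edge(src, label, dst):
        delta.setdefault((src, label), set()).add(dst)

    def build(node):
        kind = node[0]
        if kind == "cat":
            # Concatenation rs: link the final state of r to the start state of s.
            start, r_final = build(node[1])
            s_start, final = build(node[2])
            edge(r_final, LAMBDA, s_start)
            return start, final
        start, final = next(fresh), next(fresh)
        if kind == "lambda":                    # basic expression λ
            edge(start, LAMBDA, final)
        elif kind == "sym":                     # basic expression a
            edge(start, node[1], final)
        elif kind == "alt":                     # selection r | s
            for sub in node[1:]:
                sub_start, sub_final = build(sub)
                edge(start, LAMBDA, sub_start)
                edge(sub_final, LAMBDA, final)
        elif kind == "star":                    # repetition r*
            sub_start, sub_final = build(node[1])
            edge(start, LAMBDA, sub_start)
            edge(sub_final, LAMBDA, final)
            edge(start, LAMBDA, final)          # zero repetitions
            edge(sub_final, LAMBDA, sub_start)  # one more repetition
        else:
            raise ValueError(f"unknown node kind: {kind!r}")
        return start, final

    start, final = build(regex)
    return delta, start, final


def matches(regex, text):
    """Simulate the constructed λ-NFA on the whole text."""
    delta, start, final = thompson(regex)

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            for t in delta.get((stack.pop(), LAMBDA), ()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    current = closure({start})
    for ch in text:
        current = closure({t for s in current for t in delta.get((s, ch), ())})
    return final in current


# The expression ab | a written as nested tuples.
AB_OR_A = ("alt", ("cat", ("sym", "a"), ("sym", "b")), ("sym", "a"))
```

Each fragment has exactly one start state and one final state, which is what lets the fragments be combined mechanically.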

Regular expression → NFA → DFA → Program. Example 1: ab | a. [Figure: step-by-step construction of the NFA for ab | a from the fragments for a, b, and ab.]

Regular expression → NFA → DFA → Program: conversion of a λ-NFA into a DFA (subset construction).

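Regular expression → NFA → DFA → Program: conversion of a λ-NFA into a DFA (subset construction).
Let N be the set of states of the NFA and s0 its start state.
For s ∈ N, λ-closure(s) = {t ∈ N : t is reachable from s using only λ-transitions}, with s itself included.
For T ⊆ N, λ-closure(T) = the union of λ-closure(si) over all si ∈ T.
For T ⊆ N, move(T, a) = the union, over all si ∈ T, of the states of N to which there is an a-transition from si.
Algorithm: construction of the set of DFA states D_E and of the transition table D_T.
1. Initially, D_E contains only λ-closure(s0), unmarked.
2. While there is an unmarked state T in D_E:
   1. Mark T.
   2. For each input symbol a:
      1. U = λ-closure(move(T, a)).
      2. If U is not in D_E, add U to D_E as an unmarked state.
      3. D_T(T, a) = U.
3. A state of D_E is final if it contains at least one final state of the NFA.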
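The algorithm can be sketched in Python with frozensets as DFA states. The encoding is an assumption of this sketch: the NFA is a dictionary from (state, symbol) to a set of states, with `None` labeling λ-transitions. Unlike the algorithm as stated, the sketch does not add the empty set as an explicit dead state; it leaves the corresponding table entries undefined. The example NFA for ab | a is hand-built for illustration.

```python
LAMBDA = None  # label used in this sketch for λ-transitions


def closure(delta, states):
    """λ-closure(T): states reachable from T using only λ-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, LAMBDA), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return frozenset(seen)


def move(delta, states, a):
    """move(T, a): states reachable from T by one a-transition."""
    return frozenset(t for s in states for t in delta.get((s, a), ()))


def subset_construction(delta, start, alphabet):
    """Return (d0, d_states, d_table) for the equivalent DFA."""
    d0 = closure(delta, {start})
    d_states, d_table, unmarked = {d0}, {}, [d0]
    while unmarked:
        T = unmarked.pop()                  # mark T
        for a in alphabet:
            U = closure(delta, move(delta, T, a))
            if not U:                       # no transition: implicit dead state
                continue
            if U not in d_states:
                d_states.add(U)
                unmarked.append(U)
            d_table[(T, a)] = U             # recorded whether or not U is new
    return d0, d_states, d_table


# Hypothetical λ-NFA for ab | a with final states 3 and 5.
DELTA = {(0, LAMBDA): {1, 4}, (1, "a"): {2}, (2, "b"): {3}, (4, "a"): {5}}
D0, D_STATES, D_TABLE = subset_construction(DELTA, 0, "ab")
```

For this example the DFA has three states: {0, 1, 4}, {2, 5}, and {3}. The last two are final because they contain the NFA final states 5 and 3.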

Regular expression → NFA → DFA → Program: minimizing the number of states of a DFA.
Goal: construct a DFA M' accepting the same language as M and having as few states as possible.
1. Construct an initial partition P with two groups: F (accepting states) and S (non-accepting states).
2. Construct P_n: for each group G of P, partition G into subgroups such that two states s and t are in the same subgroup if and only if, for every input a, s and t have transitions on a to states in the same group of P.
3. If P_n = P, go to the next step. Otherwise, repeat the previous step with P := P_n.
4. The groups in P are the states of M': construct the transition table and eliminate unreachable states.
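Steps 1 to 3 amount to a partition refinement loop, sketched below. The function names and the dictionary encoding of the transition function are assumptions of this sketch. It returns only the final partition and does not build the transition table of M' or remove unreachable states. The example is the well-known five-state DFA for (a | b)*abb, used here purely as test data.

```python
def group_of(partition, state):
    """Index of the group containing `state`, or None for a missing transition."""
    for index, group in enumerate(partition):
        if state in group:
            return index
    return None


def minimize(states, alphabet, delta, finals):
    """Return the set of groups (frozensets) of equivalent states."""
    # Step 1: accepting versus non-accepting states (empty groups dropped).
    partition = [g for g in (frozenset(finals), frozenset(states - finals)) if g]
    while True:
        new_partition = []
        for group in partition:
            # Step 2: states stay together only if, for every input symbol,
            # their transitions lead into the same group of the partition.
            buckets = {}
            for s in group:
                key = tuple(group_of(partition, delta.get((s, a))) for a in alphabet)
                buckets.setdefault(key, set()).add(s)
            new_partition.extend(frozenset(b) for b in buckets.values())
        # Step 3: refinement only splits groups, so equal size means P_n = P.
        if len(new_partition) == len(partition):
            return set(new_partition)
        partition = new_partition


# Example DFA for (a | b)*abb; E is the only final state.
DELTA = {("A", "a"): "B", ("A", "b"): "C",
         ("B", "a"): "B", ("B", "b"): "D",
         ("C", "a"): "B", ("C", "b"): "C",
         ("D", "a"): "B", ("D", "b"): "E",
         ("E", "a"): "B", ("E", "b"): "C"}
GROUPS = minimize({"A", "B", "C", "D", "E"}, "ab", DELTA, {"E"})
```

Here states A and C end up in the same group, so the minimal DFA has four states.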

Regular expression → NFA → DFA → Program: specific purpose programs (I).
[Figure: DFA with states 1, 2, 3: 1 → 2 on letter; 2 → 2 on digit or letter; 2 → 3 on [other].]
    {start: state 1}
    if nextchar is a letter then
        read newchar; {now in state 2}
        while nextchar is a letter or a digit do
            read newchar; {stay in state 2}
        end while;
        {go to state 3 without reading newchar}
        accept;
    else
        {error or other cases}
    end if;
Only suitable for a small number of states. Each DFA has its specific implementation.
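A Python rendering of this pseudocode might look as follows. The function name and the convention of returning the end position of the lexeme (or `None` on failure) are assumptions of this sketch. The states appear only as positions in the code.

```python
def recognize_identifier(text, pos=0):
    """Return the index just past the identifier starting at pos, or None."""
    # State 1: the first character must be a letter.
    if pos < len(text) and text[pos].isalpha():
        pos += 1                                  # now in state 2
        while pos < len(text) and text[pos].isalnum():
            pos += 1                              # stay in state 2
        # State 3: reached without consuming the next character.
        return pos
    return None                                   # error or other cases
```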

Regular expression → NFA → DFA → Program: specific purpose programs (II).
[Figure: DFA with states 1, 2, 3: 1 → 2 on letter; 2 → 2 on digit or letter; 2 → 3 on [other].]
    state := 1; {initial state}
    while state = 1 or state = 2 do
        case state of
            1: case inputchar of
                   letter: read newchar; state := 2;
                   else state := {error or another};
               end case;
            2: case inputchar of
                   letter, digit: read newchar; state := 2;
                   else state := 3;
               end case;
        end case;
    end while;
    if state = 3 then accept else error;
Introduces a variable that denotes the state. Case selections represent the transitions.
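The same recognizer with an explicit state variable can be sketched in Python as below. The names are assumptions of the sketch, the empty string stands for end of input, and an `if`/`else` on the state plays the role of the outer case statement.

```python
def recognize_identifier(text, pos=0):
    """Return the index just past the identifier starting at pos, or None."""
    state = 1                                     # initial state
    while state in (1, 2):
        ch = text[pos] if pos < len(text) else ""  # "" marks end of input
        if state == 1:
            if ch.isalpha():
                pos += 1                          # read the character
                state = 2
            else:
                state = "error"
        else:                                     # state 2
            if ch.isalnum():
                pos += 1                          # read the character, stay in 2
            else:
                state = 3                         # leave without reading
    return pos if state == 3 else None
```

Adding a state means adding a branch, so the code grows with the automaton, but its shape no longer depends on the particular DFA.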

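Regular expression → NFA → DFA → Program: transition tables.
[Figure: DFA with states 1, 2, 3: 1 → 2 on letter; 2 → 2 on digit or letter; 2 → 3 on [other].]
Transition table (columns are input character classes; "-" marks an empty entry):
state   letter   digit   other   Accept?
1       2        -       -       no
2       2        2       [3]     no
3       -        -       -       yes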

Regular expression → NFA → DFA → Program: transition tables.
[Figure: DFA with states 1, 2, 3: 1 → 2 on letter; 2 → 2 on digit or letter; 2 → 3 on [other].]
    state := 1;
    ch := next input character;
    while not Accept[state] and not error(state) do
        newstate := T[state, ch];
        if Advance[state, ch] then ch := next input char;
        state := newstate;
    end while;
    if Accept[state] then accept;
The code is reduced. It can be used for many different problems. It is easy to modify.
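A table-driven version of the identifier recognizer could be sketched as follows. The encodings are assumptions of this sketch: the table `T` is a nested dictionary over character classes, `ERROR` is a sentinel state for empty table entries, and `NO_ADVANCE` lists the transitions that do not consume the current character (the bracketed entry of the table).

```python
ERROR = 0                                         # sentinel for empty table entries
T = {1: {"letter": 2},
     2: {"letter": 2, "digit": 2, "other": 3}}
NO_ADVANCE = {(2, "other")}                       # the bracketed [3] entry
ACCEPT = {3}


def char_class(ch):
    """Map a character (or "" for end of input) to a table column."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


def run(text, pos=0):
    """Return the index just past the recognized lexeme, or None on error."""
    state = 1
    ch = text[pos] if pos < len(text) else ""
    while state not in ACCEPT and state != ERROR:
        cls = char_class(ch)
        new_state = T.get(state, {}).get(cls, ERROR)
        if new_state != ERROR and (state, cls) not in NO_ADVANCE:
            pos += 1                              # advance to the next character
            ch = text[pos] if pos < len(text) else ""
        state = new_state
    return pos if state in ACCEPT else None
```

The driver loop contains nothing specific to identifiers; recognizing a different token only requires different tables.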