DFA: Automata where the next state is uniquely given by the current state and the current input character.

Similar documents
We use L i to stand for LL L (i times). It is logical to define L 0 to be { }. The union of languages L and M is given by

1.0 Languages, Expressions, Automata

Lexical Analysis. Implementation: Finite Automata

Implementation of Lexical Analysis

Finite automata. III. Finite automata: language recognizers. Nondeterministic Finite Automata. Nondeterministic Finite Automata with λ-moves

Lexical Analysis. Chapter 2

CD Assignment I. 1. Explain the various phases of the compiler with a simple example.

David Griol Barres Computer Science Department Carlos III University of Madrid Leganés (Spain)

Lexical Analysis. Lecture 2-4

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

Alternation. Kleene Closure. Definition of Regular Expressions

Introduction to Lexical Analysis

CSE450. Translation of Programming Languages. Lecture 20: Automata and Regular Expressions

Front End: Lexical Analysis. The Structure of a Compiler

Formal Languages and Compilers Lecture VI: Lexical Analysis

Finite automata. We have looked at using Lex to build a scanner on the basis of regular expressions.

Lexical Analysis - 2

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Lexical Analysis. Lecture 3-4

Implementation of Lexical Analysis

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Implementation of Lexical Analysis

CSc 453 Lexical Analysis (Scanning)

Implementation of Lexical Analysis. Lecture 4

CSE302: Compiler Design

Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday

UNIT -2 LEXICAL ANALYSIS

Writing a Lexical Analyzer in Haskell (part II)

Lexical Analysis. Lecture 3. January 10, 2018

Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions

Finite Automata and Scanners

Figure 2.1: Role of Lexical Analyzer

CMSC 350: COMPILER DESIGN

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Scanners. Xiaokang Qiu Purdue University. August 24, ECE 468 Adapted from Kulkarni 2012

Compiler phases. Non-tokens

Wednesday, September 3, 14. Scanners

CS415 Compilers. Lexical Analysis

CSE450. Translation of Programming Languages. Automata, Simple Language Design Principles

Implementation of Lexical Analysis

Chapter 3 Lexical Analysis

Implementation of Lexical Analysis

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

Compiler course. Chapter 3 Lexical Analysis

T Parallel and Distributed Systems (4 ECTS)

CSE302: Compiler Design

Lexical Analysis. Chapter 1, Section Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual

Monday, August 26, 13. Scanners

T.E. (Computer Engineering) (Semester I) Examination, 2013 THEORY OF COMPUTATION (2008 Course)

Lexical Analysis. Introduction

1. Lexical Analysis Phase

1. (10 points) Draw the state diagram of the DFA that recognizes the language over Σ = {0, 1}

CSE 105 THEORY OF COMPUTATION

Compiler Construction

CS 432 Fall Mike Lam, Professor. Finite Automata Conversions and Lexing

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

Lecture 9 CIS 341: COMPILERS

TENTAMEN / EXAM. General instructions

Lexical Error Recovery

Lecture 2 Finite Automata

Lexical Error Recovery

Chapter 3: Lexing and Parsing

CS308 Compiler Principles Lexical Analyzer Li Jiang

Administrivia. Lexical Analysis. Lecture 2-4. Outline. The Structure of a Compiler. Informal sketch of lexical analysis. Issues in lexical analysis

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES

Compiler Construction

CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2]

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 3

lec3:nondeterministic finite state automata

Bottom Up Parsing. Shift and Reduce. Sentential Form. Handle. Parse Tree. Bottom Up Parsing 9/26/2012. Also known as Shift-Reduce parsing

Lexical Analysis. Finite Automata. (Part 2 of 2)

Finite Automata. Dr. Nadeem Akhtar. Assistant Professor Department of Computer Science & IT The Islamia University of Bahawalpur

Lexical Analyzer Scanner

G52LAC Languages and Computation Lecture 6

CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution

WARNING for Autumn 2004:

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

Zhizheng Zhang. Southeast University

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

PRINCIPLES OF COMPILER DESIGN UNIT II LEXICAL ANALYSIS 2.1 Lexical Analysis - The Role of the Lexical Analyzer

Lexical Analysis. Finite Automata

Recognition of Tokens

CS 314 Principles of Programming Languages. Lecture 3

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

Languages, Automata, Regular Expressions & Scanners. Winter /8/ Hal Perkins & UW CSE B-1

CSc 453 Compilers and Systems Software

CS2 Language Processing note 3

A Scanner should create a token stream from the source code. Here are 4 ways to do this:

Lexical and Syntax Analysis

UNIT II LEXICAL ANALYSIS

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language?

2010: Compilers REVIEW: REGULAR EXPRESSIONS HOW TO USE REGULAR EXPRESSIONS

CS 403: Scanning and Parsing

Interpreter. Scanner. Parser. Tree Walker. read. request token. send token. send AST I/O. Console

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

Lexical Analyzer Scanner

Compiler Construction

CS 310: State Transition Diagrams

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

Compiler Construction

Transcription:

Chapter : SCANNING (Lexical Analysis).3 Finite Automata Introduction to Finite Automata Finite automata (finite-state machines) are a mathematical way of descriing particular kinds of algorithms. A strong relationship etween finite automata and regular expression Identifier letter (letter digit)* Transition: Record a change from one state to another upon a match of the character or characters y which they are laeled. Start state: The recognition process egin Drawing an unlaeled arrowed line to it coming from nowhere Accepting states: Represent the end of the recognition process. Drawing a doule-line order around the state in the diagram.3.1 Definition of Deterministic Finite Automata 1. The Concept of DFA DFA: Automata where the next state is uniquely given y the current state and the current input character. Definition of a DFA: A DFA (Deterministic Finite Automation) M consist of (1) an alphaet, () A set of states S, (3) a transition function T : S S, (4) a start state s0 S, (5)And a set of accepting states A S The language accepted y a DFA M, written L(M), is defined to e the set of strings of characters c1cc3.cn with each ci such that there exist states s1 t(s0,c1),s t(s1,c), sn T(sn-1,cn) with sn an element of A (i.e. an accepting state). Accepting state sn means the same thing as the diagram: c1 c cn s0 s1 s sn-1 sn. Some differences etween definition of DFA and the diagram: 1) The definition does not restrict the set of states to numers ) We have not laeled the transitions with characters ut with names representing a set of characters 1

3) definitions T: S S, T(s, c) must have a value for every s and c. In the diagram, T (start, c) defined only if c is a letter, T(in_id, c) is defined only if c is a letter or a digit. Error transitions are not drawn in the diagram ut are simply assumed to always exist. 3. Examples of DFA Example.6: exactly accept one Not Not Example.7: at most one not not Example.8:digit [0-9] nat digit + signednat (+ -)? nat Numer singednat(. nat)?(e signednat)? A DFA of Numer:.3. Lookahead, Backtracking, and Nondeterministic Automata 1. A Typical Action of DFA Algorithm Making a transition: move the character from the input string to a string that accumulates the characters elonging to a single token (the token string value or lexeme of the token) Reaching an accepting state: return the token just recognized, along with any associated attriutes. Reaching an error state: either ack up in the input (acktracking) or to generate an error token.

. Finite automaton for an identifier with delimiter and return value The error state represents the fact that either an identifier is not to e recognized (if came from the start state) or a delimiter has een seen and we should now accept and generate an identifier-token. [other]: indicate that the delimiting character should e considered look-ahead, it should e returned to the input string and not consumed. This diagram also expresses the principle of longest su-string : the DFA continues to match letters and digits (in state in_id) until a delimiter is found. By contrast the old diagram allowed the DFA to accept at any point while reading an identifier string. 3. How to arrive at the start state in the first place (comine all the tokens into one DFA) a) Each of these tokens egins with a different character Consider the tokens given y the strings :,, and Each of these is a fixed string, and DFAs for them can e written as right : return ASSIGN return LE return EQ Uniting all of their start states into a single start state to get the DFA return ASSIGN : return EQ return LE ) Several tokens eginning with the same character They cannot e simply written as the right diagram, since it is not a DFA return LE > 3 return NE return LT

The diagram can e rearranged into a DFA return LE > return NE [other] return LT 4. Expand the Definition of a Finite Automaton One solution for the prolem is to expand the definition of a finite automaton More than one transition from a state may exist for a particular character (NFA: non-deterministic finite automaton,) Developing an algorithm for systematically turning these NFA into DFAs 5. ε-transition A transition that may occur without consulting the input string (and without consuming any characters) It may e viewed as a "match" of the empty string. ( This should not e confused with a match of the characterεin the input) 6. ε-transitions Used in Two Ways. First: to express a choice of alternatives in a way without comining states Advantage: keeping the original automata intact and only adding a new start state to connect them : Second: to explicitly descrie a match of the empty string. 4

7. Definition of NFA An NFA (non-deterministic finite automaton) M consists of an alphaet, a set of states S, a transition function T: S x ( U{ε}) (S), a start state s0 from S, and a set of accepting states A from S The language accepted y M, written L(M), is defined to e the set of strings of characters c1c. cn with each ci from U{ε}such that there exist states s1 in T(s0,c1), s in (s1, c),..., sn in T(sn-1, cn) with sn an element of A. 8. Some Notes An NFA does not represent an algorithm. However, it can e simulated y an algorithm that acktracks through every non-deterministic choice. 9. Examples of NFAs Example.10 The string a can e accepted y either of the following sequences of transitions: a ε 1 4 4 a ε ε ε 1 3 4 4 4 This NFA accepts the languages as follows: regular expression: (a ε)* a+ a* * a a 1 3 4 A DFA accepts the same language. a 5

.3.3 Implementation of Finite Automata in Code 1. Ways to Translate a DFA or NFA into Code a) The code for the DFA accepting identifiers: letter { starting in state 1 } if the next character is a letter then advance the input; { now in state } while the next character is a letter or a digit do advance the input; { stay in state } end while; { go to state 3 without advancing the input} accept; else { error or other cases } end if; 1 letter [other] digit 3 Two drawacks: It is ad hoc that is, each DFA has to e treated slightly differently, and it is difficult to state an algorithm that will translate every DFA to code in this way. The complexity of the code increases dramatically as the numer of states rises or, more specifically, as the numer of different states along aritrary paths rises. ) A etter method: Using a variale to maintain the current state and writing the transitions as a douly nested case statement inside a loop, where the first case statement tests the current state and the nested second level tests the input character. The code of the DFA for identifier: state : 1; { start } while state 1 or do case state of 1: case input character of letter: advance the input : state : ; else state :.{ error or other }; end case; : case input character of letter, digit: advance the input; state : ; { actually unnecessary } else state:3; end case; end case; end while; 6

if state3 then accept else error; c) Generic code: Express the DFA as a data structure and then write "generic" code; A transition tale, or two-dimensional array, indexed y state and input character that expresses the values of the transition function T Ways to Translate a Characters DFA in the or alphaet NFA c into Code States s States representing transitions T (s, c) The transition tale of the DFA for identifier: The transition tale of the DFA for identifier: state Input char letter digit other Accepting 1 No [3] no 3 yes Brackets indicate noninputconsuming transitions This column indicates accepting states Assume :the first state listed is the start state Features of Tale-Driven Method Tale driven: use tales to direct the progress of the algorithm. The advantage: The size of the code is reduced, the same code will work for many different prolems, and the code is easier to change (maintain). The disadvantage: The tales can ecome very large, causing a significant increase in the space used y the program. Indeed, much of the space in the arrays we have just descried is wasted. Tale-driven methods often rely on tale-compression methods such as sparse-array representations, although there is usually a time penalty to e paid for such compression, since tale lookup ecomes slower. Since scanners must e efficient, these methods are rarely used for them. NFAs can e implemented in similar ways to DFAs, except NFAs are nondeterministic, there are potentially many different sequences of transitions that must e tried. A program that simulates an NFA must store up transitions that have not yet een tried and acktrack to them on failure. 7

.4 From Regular Expression To DFAs Main Purpose Translating a regular expression into a DFA via NFA. Regular Expression NFA DFA Program 8