G52LAC Languages and Computation Lecture 6

Similar documents
CSE 105 THEORY OF COMPUTATION

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

CS415 Compilers. Lexical Analysis

Implementation of Lexical Analysis

CSE 105 THEORY OF COMPUTATION

Lexical Analysis. Lecture 2-4

Implementation of Lexical Analysis

Lexical Analysis. Introduction

Lexical Analysis. Chapter 2

Implementation of Lexical Analysis

Introduction to Lexical Analysis

Lexical Analysis. Lecture 3-4

Interpreter. Scanner. Parser. Tree Walker. read. request token. send token. send AST I/O. Console

Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday

Converting a DFA to a Regular Expression JP

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

UNIT -2 LEXICAL ANALYSIS

Lexical Analysis. Lecture 3. January 10, 2018

Figure 2.1: Role of Lexical Analyzer

Administrivia. Lexical Analysis. Lecture 2-4. Outline. The Structure of a Compiler. Informal sketch of lexical analysis. Issues in lexical analysis

Compiler course. Chapter 3 Lexical Analysis

David Griol Barres Computer Science Department Carlos III University of Madrid Leganés (Spain)

Compiler phases. Non-tokens

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Chapter 3 Lexical Analysis

CMSC 350: COMPILER DESIGN

CSE450. Translation of Programming Languages. Lecture 20: Automata and Regular Expressions

CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2]

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

CSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions

Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions

Formal Languages and Compilers Lecture VI: Lexical Analysis

Compiler Construction

Implementation of Lexical Analysis. Lecture 4

Finite automata. We have looked at using Lex to build a scanner on the basis of regular expressions.

CSE302: Compiler Design

CS 432 Fall Mike Lam, Professor. Finite Automata Conversions and Lexing

Lexical Analysis. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language?

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2016

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }

Lexical Analysis. Implementation: Finite Automata

Front End: Lexical Analysis. The Structure of a Compiler

Implementation of Lexical Analysis

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

CS308 Compiler Principles Lexical Analyzer Li Jiang

Structure of Programming Languages Lecture 3

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

Implementation of Lexical Analysis

Decidable Problems. We examine the problems for which there is an algorithm.

Dr. D.M. Akbar Hussain

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

Zhizheng Zhang. Southeast University

Wednesday, September 3, 14. Scanners

CS 301. Lecture 05 Applications of Regular Languages. Stephen Checkoway. January 31, 2018

Lecture 5: Regular Expression and Finite Automata

Finite Automata and Scanners

Outline. 1 Scanning Tokens. 2 Regular Expresssions. 3 Finite State Automata

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

Lexical Analysis 1 / 52

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR

Compiler Construction

Regular Languages. MACM 300 Formal Languages and Automata. Formal Languages: Recap. Regular Languages

CS321 Languages and Compiler Design I. Winter 2012 Lecture 4

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

Monday, August 26, 13. Scanners

Scanners. Xiaokang Qiu Purdue University. August 24, ECE 468 Adapted from Kulkarni 2012

Implementation of Lexical Analysis

Alternation. Kleene Closure. Definition of Regular Expressions

Lecture 2 Finite Automata

Decision, Computation and Language

Lecture 3: Lexical Analysis

CSE302: Compiler Design

Lexical Analysis - An Introduction. Lecture 4 Spring 2005 Department of Computer Science University of Alabama Joel Jones

CSE450. Translation of Programming Languages. Automata, Simple Language Design Principles

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

CS2 Language Processing note 3

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

(Refer Slide Time: 0:19)

Lexical Analysis. Finite Automata. (Part 2 of 2)

CSc 453 Lexical Analysis (Scanning)

1. (10 points) Draw the state diagram of the DFA that recognizes the language over Σ = {0, 1}

Definition: A context-free grammar (CFG) is a 4- tuple. variables = nonterminals, terminals, rules = productions,,

Languages and Compilers

2068 (I) Attempt all questions.

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata

Lexical Analysis. Chapter 1, Section Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual

CS5371 Theory of Computation. Lecture 8: Automata Theory VI (PDA, PDA = CFG)

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 3


Chapter 2 :: Programming Language Syntax

Lexical Analysis - 2

Part 5 Program Analysis Principles and Techniques

Finite Automata. Dr. Nadeem Akhtar. Assistant Professor Department of Computer Science & IT The Islamia University of Bahawalpur

Compiler Construction LECTURE # 3

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

Transcription:

G52LAC Languages and Computation Lecture 6 Equivalence of Regular Expression and Finite Automata Henrik Nilsson University of Nottingham G52LACLanguages and ComputationLecture 6 p.1/28

This Lecture (1) G52LACLanguages and ComputationLecture 6 p.2/28

This Lecture (1) We have now seen three ways of formally describing potentially infinite languages: - Deterministic Finite Automata (DFA) - Nondeterministic Finite Automata (NFA) - Regular Expressions (RE) G52LACLanguages and ComputationLecture 6 p.2/28

This Lecture (1) We have now seen three ways of formally describing potentially infinite languages: - Deterministic Finite Automata (DFA) - Nondeterministic Finite Automata (NFA) - Regular Expressions (RE) Because - a DFA is a special case of an NFA - any NFA can be converted into an equivalent DFA DFAs and NFAs describe the same class of languages: the Regular languages. G52LACLanguages and ComputationLecture 6 p.2/28

This Lecture (2) So, what class of languages do the REs describe? Smaller? Larger? Completely different? G52LACLanguages and ComputationLecture 6 p.3/28

This Lecture (2) So, what class of languages do the REs describe? Smaller? Larger? Completely different? In fact: Regular Expressions describe the Regular Languages G52LACLanguages and ComputationLecture 6 p.3/28

This Lecture (2) So, what class of languages do the REs describe? Smaller? Larger? Completely different? In fact: Regular Expressions describe the Regular Languages Proof: translation between RE and FA G52LACLanguages and ComputationLecture 6 p.3/28

This Lecture (2) So, what class of languages do the REs describe? Smaller? Larger? Completely different? In fact: Regular Expressions describe the Regular Languages Proof: translation between RE and FA This lecture: translation of RE into NFA G52LACLanguages and ComputationLecture 6 p.3/28

This Lecture (2) So, what class of languages do the REs describe? Smaller? Larger? Completely different? In fact: Regular Expressions describe the Regular Languages Proof: translation between RE and FA This lecture: translation of RE into NFA Will start by a motivating example. G52LACLanguages and ComputationLecture 6 p.3/28

This Lecture (2) So, what class of languages do the REs describe? Smaller? Larger? Completely different? In fact: Regular Expressions describe the Regular Languages Proof: translation between RE and FA This lecture: translation of RE into NFA Will start by a motivating example. Time permitting, brief look at another application: scanners. Study details in your own time if of interest. G52LACLanguages and ComputationLecture 6 p.3/28

Applications (1) RE to NFA conversion has important practical applications. The following is a very nice, practically oriented article you should be able to fully appreciate based on what you have learned in G52MAL thus far: Russ Cox. Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby,...), January 2007. http://swtch.com/~rsc/regexp/regexp1.html G52LACLanguages and ComputationLecture 6 p.4/28

Applications (2) Underlying message: if you re ignorant about CS theory, your code can perform really poorly. Example from the paper: Time to match (a+ǫ) n a n against a n G52LACLanguages and ComputationLecture 6 p.5/28

Applications (2) Underlying message: if you re ignorant about CS theory, your code can perform really poorly. Example from the paper: Time to match (a+ǫ) n a n against a n Note difference of time scale: 60 s vs. 60µs! http://en.wikipedia.org/wiki/thompson s_construction G52LACLanguages and ComputationLecture 6 p.5/28

Applications (3) To quantify: G52LACLanguages and ComputationLecture 6 p.6/28

Applications (3) To quantify: Thompson NFA implementation a million times faster than Perl (5.8.7) when running on a 29-character string. G52LACLanguages and ComputationLecture 6 p.6/28

Applications (3) To quantify: Thompson NFA implementation a million times faster than Perl (5.8.7) when running on a 29-character string. Thompson NFA handles a 100-character string in under 200 microseconds; Perl would require over 10 15 years. G52LACLanguages and ComputationLecture 6 p.6/28

Applications (3) To quantify: Thompson NFA implementation a million times faster than Perl (5.8.7) when running on a 29-character string. Thompson NFA handles a 100-character string in under 200 microseconds; Perl would require over 10 15 years. How old is the universe? G52LACLanguages and ComputationLecture 6 p.6/28

Applications (3) To quantify: Thompson NFA implementation a million times faster than Perl (5.8.7) when running on a 29-character string. Thompson NFA handles a 100-character string in under 200 microseconds; Perl would require over 10 15 years. How old is the universe? Current best estimate: 13.8 billion years... G52LACLanguages and ComputationLecture 6 p.6/28

Applications (3) To quantify: Thompson NFA implementation a million times faster than Perl (5.8.7) when running on a 29-character string. Thompson NFA handles a 100-character string in under 200 microseconds; Perl would require over 10 15 years. How old is the universe? Current best estimate: 13.8 billion years... or about 10 10 years. 10 15 years is a looong time... G52LACLanguages and ComputationLecture 6 p.6/28

Recap: Syntax of Regular Expressions 1. is an RE 2. ǫ is an RE 3. For all x Σ, x is an RE (Handwriting convention: x is an RE) 4. If E and F are REs, so is E +F 5. If E and F are REs, so is EF 6. If E is an REs, so is E 7. If E is an REs, so is (E) These are all regular expressions. G52LACLanguages and ComputationLecture 6 p.7/28

Recap: Semantics of Regular Expr. 1. L( ) = 2. L(ǫ) = {ǫ} 3. For all x Σ, L(x) = {x} 4. L(E +F) = L(E) L(F) 5. L(EF) = L(E)L(F) 6. L(E ) = L(E) 7. L( (E) ) = L(E) G52LACLanguages and ComputationLecture 6 p.8/28

Translating RE to NFA (1) We are going to detail a Graphical Construction for converting an RE to an NFA that is suitable for carrying out by hand. It can be further refined into a fully formal algorithm: see the lecture notes for details. G52LACLanguages and ComputationLecture 6 p.9/28

Translating RE to NFA (1) We are going to detail a Graphical Construction for converting an RE to an NFA that is suitable for carrying out by hand. It can be further refined into a fully formal algorithm: see the lecture notes for details. (Our Graphical Construction is a variation of Thompson s Construction. The latter translates into NFA ǫ : a variation of NFA with a special ǫ-move that does not consume any input. We don t cover NFA ǫ in this module.) G52LACLanguages and ComputationLecture 6 p.9/28

Translating RE to NFA (2) Specification: Let N(E) denote the NFA that results by applying the graphical construction to an RE E. Then the following equation must hold: L(E) = L(N(E)) (Note that L is overloaded: the language of an RE to the left, the language of an NFA to the right.) We proceed case by case according to the structure of the syntax of REs. G52LACLanguages and ComputationLecture 6 p.10/28

RE to NFA, Case Recall: L( ) = N( ): Note: L(N( )) = = L( ); specification satisfied in this case. Note: States are given without names for simplicity. Suffice as construction is graphical; states to be named at the end. G52LACLanguages and ComputationLecture 6 p.11/28

RE to NFA, Caseǫ Recall: L(ǫ) = {ǫ} N(ǫ): Note: L(N(ǫ)) = {ǫ} = L(ǫ); specification satisfied in this case. G52LACLanguages and ComputationLecture 6 p.12/28

RE to NFA, Casexforx Σ Recall: For each x Σ,L(x) = {x} N(x): x Note: L(N(x)) = {x} = L(x); specification satisfied in this case. G52LACLanguages and ComputationLecture 6 p.13/28

RE to NFA, CaseE +F (1) Recall: L(E +F) = L(E) L(F) N(E +F): N ( E ) N ( F ) The NFAs N(E) and N(F) in parallel. The initial states of N(E +F) are the union of the initial states of N(E) and N(F). G52LACLanguages and ComputationLecture 6 p.14/28

RE to NFA, CaseE +F (2) Note: Assuming specification holds for E and F, L(N(E +F)) = L(N(E)) L(N(F)) = L(E) L(F) = L(E +F) Thus, specification holds in this case. (This is an inductive case.) G52LACLanguages and ComputationLecture 6 p.15/28

RE to NFA, CaseEF (1) Sub-case 1: No initial state of N(E) is accepting; i.e. ǫ / L(N(E)) (Recall: L(EF) = L(E)L(F)) N ( E ) a N ( F ) b c G52LACLanguages and ComputationLecture 6 p.16/28

RE to NFA, CaseEF (2) N ( E ) a N ( E F ) a a N ( F ) b b c b c c G52LACLanguages and ComputationLecture 6 p.17/28

RE to NFA, CaseEF (3) Sub-case 2: Some initial states of N(E) are accepting; i.e. ǫ L(N(E)) N ( E ) a N ( F ) b c G52LACLanguages and ComputationLecture 6 p.18/28

RE to NFA, CaseEF (4) N ( E ) a N ( E F ) a a N ( F ) b b c b c c G52LACLanguages and ComputationLecture 6 p.19/28

RE to NFA, CaseEF (5) Note: Assuming specification holds for E and F, L(N(EF)) = L(N(E))L(N(F)) = L(E)L(F) = L(EF) Thus, specification holds in this case. (This is an inductive case.) G52LACLanguages and ComputationLecture 6 p.20/28

RE to NFA, CaseE (1) (Recall: L(E ) = L(E) ) N ( E ) a * b c G52LACLanguages and ComputationLecture 6 p.21/28

RE to NFA, CaseE (2) N ( E *) a b N ( E ) a b a c b c c Note the additional initial and accepting state that ensures the empty word is accepted. G52LACLanguages and ComputationLecture 6 p.22/28

RE to NFA, CaseE (3) Note: Assuming specification holds for E, L(N(E )) = L(N(E)) = L(E) = L(E ) Thus, specification holds in this case. (This is an inductive case.) G52LACLanguages and ComputationLecture 6 p.23/28

RE to NFA, Case(E) (Recall: L( (E) ) = L(E)) N( (E) ) = N(E) Note: Assuming specification holds for E, L(N( (E) )) = L(N(E)) = L(E) = L( (E) ) Thus, specification holds in this case. (This is an inductive case.) G52LACLanguages and ComputationLecture 6 p.24/28

Example Systematically construct an NFA for the regular expression: (a+b) c ( zero or more as or bs, followed by a single c ) Use the graphical construction. On the white board. G52LACLanguages and ComputationLecture 6 p.25/28

Scanning (1) The first stage of many real-world language processing tasks, such as a compiler, is to group individual characters into languagespecific symbols called Lexemes or Tokens: - Keywords (likeif,then,while) - Literals (like42,3.14, A,"abc") - Special symbols and separators (like:=, (,;) -... G52LACLanguages and ComputationLecture 6 p.26/28

Scanning (1) The first stage of many real-world language processing tasks, such as a compiler, is to group individual characters into languagespecific symbols called Lexemes or Tokens: - Keywords (likeif,then,while) - Literals (like42,3.14, A,"abc") - Special symbols and separators (like:=, (,;) -... This process is called Lexical Analysis or Scanning, and is performed by a Scanner. G52LACLanguages and ComputationLecture 6 p.26/28

Scanning (2) Commonly, white space and comments are understood as token separators. G52LACLanguages and ComputationLecture 6 p.27/28

Scanning (2) Commonly, white space and comments are understood as token separators. An additional task of the scanner is often to discard white space and comments as they usually serve no purpose after the scanning. G52LACLanguages and ComputationLecture 6 p.27/28

Scanning (2) Commonly, white space and comments are understood as token separators. An additional task of the scanner is often to discard white space and comments as they usually serve no purpose after the scanning. Regular expressions is the most commonly used formalism for describing the Lexical Syntax of a language; i.e. the syntax of the tokes, white space, and comments. G52LACLanguages and ComputationLecture 6 p.27/28

Scanning (2) Commonly, white space and comments are understood as token separators. An additional task of the scanner is often to discard white space and comments as they usually serve no purpose after the scanning. Regular expressions is the most commonly used formalism for describing the Lexical Syntax of a language; i.e. the syntax of the tokes, white space, and comments. In essence, a scanner is thus a finite automaton. G52LACLanguages and ComputationLecture 6 p.27/28

Scanning (3) There are many famous so called scanner generators; e.g. Lex, Flex: given regular expressions describing the lexical syntax, they produce a scanner for the language. G52LACLanguages and ComputationLecture 6 p.28/28

Scanning (3) There are many famous so called scanner generators; e.g. Lex, Flex: given regular expressions describing the lexical syntax, they produce a scanner for the language. Internally, they use Thompson s construction (or similar). G52LACLanguages and ComputationLecture 6 p.28/28