CS 432 Fall 2017, Mike Lam, Professor. Finite Automata Conversions and Lexing


Finite Automata. Key result: all of the following have the same expressive power (i.e., they all describe regular languages): regular expressions (REs), non-deterministic finite automata (NFAs), and deterministic finite automata (DFAs). Proof by construction: an algorithm exists to convert any RE to an NFA, an algorithm exists to convert any NFA to a DFA, and an algorithm exists to convert any DFA to an RE. For every regular language there also exists a minimal DFA: one with the fewest states of any DFA recognizing that language.

Finite Automata. (Diagram: conversions between representations. Regex to NFA via Thompson's construction; NFA to DFA via the subset construction; DFA to minimized DFA via Hopcroft's algorithm; DFA to Regex via Kleene's construction; Regex directly to a minimal DFA via Brzozowski's algorithm; DFA to Lexer via lexer generators. Dashed lines indicate transitions to a minimized DFA.)

Finite Automata Conversions. RE to NFA: Thompson's construction. Core insight: inductively build up the NFA using templates. Core concept: use null transitions to combine fragments quickly. NFA to DFA: subset construction. Core insight: DFA nodes represent subsets of NFA nodes. Core concept: use the null closure to calculate subsets. DFA minimization: Hopcroft's algorithm. Core insight: create partitions, then keep splitting. DFA to RE: Kleene's construction. Core insight: repeatedly eliminate states by combining regexes.

Thompson's Construction. Basic idea: create the NFA inductively, bottom-up. Base case: start with individual alphabet symbols (see below). Inductive case: combine NFAs by adding new states and null/epsilon transitions, using templates for the three basic operations. Invariant: the NFA always has exactly one start state and one accepting state. (Diagram: base-case NFA for a single symbol 'a'.)

Thompson's: Concatenation. (Diagrams: the NFAs for A and B are joined by a null transition from A's accepting state to B's start state, yielding an NFA for AB.)

Thompson's: Union. (Diagrams: a new start state has null transitions into the NFAs for both A and B, and their accepting states have null transitions to a new accepting state, yielding an NFA for A|B.)

Thompson's: Closure. (Diagrams: a new start state and a new accepting state wrap the NFA for A, with null transitions that allow skipping A entirely or repeating it, yielding an NFA for A*.)
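The three templates can be sketched as code. This is a minimal illustration (the `NFA` class and helper names are assumptions, not from the lecture); note how each constructor preserves the invariant of exactly one start state and one accepting state:

```python
# Sketch of Thompson's construction. EPS marks a null (epsilon) transition.
EPS = None

class NFA:
    def __init__(self, start, accept, edges):
        self.start = start      # the single start state
        self.accept = accept    # the single accepting state
        self.edges = edges      # list of (src, label, dst) transitions

_counter = 0
def new_state():
    global _counter
    _counter += 1
    return _counter

def symbol(a):
    """Base case: NFA accepting the single symbol a."""
    s, f = new_state(), new_state()
    return NFA(s, f, [(s, a, f)])

def concat(n1, n2):
    """AB: link n1's accepting state to n2's start with an epsilon edge."""
    return NFA(n1.start, n2.accept,
               n1.edges + n2.edges + [(n1.accept, EPS, n2.start)])

def union(n1, n2):
    """A|B: new start/accept states with epsilon edges into and out of both."""
    s, f = new_state(), new_state()
    return NFA(s, f, n1.edges + n2.edges + [
        (s, EPS, n1.start), (s, EPS, n2.start),
        (n1.accept, EPS, f), (n2.accept, EPS, f)])

def closure(n):
    """A*: epsilon edges allow skipping A entirely or repeating it."""
    s, f = new_state(), new_state()
    return NFA(s, f, n.edges + [
        (s, EPS, n.start), (n.accept, EPS, f),
        (n.accept, EPS, n.start), (s, EPS, f)])

# Build an NFA for (a|b)*a:
nfa = concat(closure(union(symbol('a'), symbol('b'))), symbol('a'))
```

Each inductive case adds at most two states and four transitions, which is exactly the property used in the complexity analysis later.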

Subset construction. Basic idea: create the DFA incrementally. Each DFA state represents a subset of NFA states. Use the null closure operation to collapse null/epsilon transitions. Null closure: all states reachable via epsilon transitions (i.e., where can we go for free?). This simulates running all possible paths through the NFA. In the example: null closure of A = { A }, null closure of B = { B, D }, null closure of C = { C, D }, null closure of D = { D }.

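The null closure can be computed as a small worklist search. In this sketch the transition map mirrors the example above (epsilon edges B to D and C to D); the function and variable names are illustrative:

```python
# Null (epsilon) closure: all states reachable via epsilon transitions alone.
def null_closure(states, eps_edges):
    result = set(states)
    work = list(states)
    while work:
        s = work.pop()
        for t in eps_edges.get(s, ()):   # follow each "free" edge
            if t not in result:
                result.add(t)
                work.append(t)
    return result

eps_edges = {'B': ['D'], 'C': ['D']}
null_closure({'A'}, eps_edges)   # {'A'}
null_closure({'B'}, eps_edges)   # {'B', 'D'}
```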

Subset Example. (Diagrams: applying the subset construction to the example NFA yields a DFA with start state {A}, which transitions to {B,D} on 'a' and to {C,D} on 'b'.)
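The whole NFA-to-DFA step can be sketched as follows. Each DFA state is a frozenset of NFA states; `nfa_delta` maps (state, symbol) pairs to successor sets and `eps` maps states to epsilon successors. All names are illustrative, and the example data mirrors the slides:

```python
# Sketch of the subset construction.
def null_closure(states, eps):
    """All states reachable from `states` using epsilon transitions alone."""
    result, work = set(states), list(states)
    while work:
        s = work.pop()
        for t in eps.get(s, ()):
            if t not in result:
                result.add(t)
                work.append(t)
    return frozenset(result)

def subset_construction(start, nfa_delta, eps, alphabet):
    """Return the DFA start state, transition table, and set of DFA states."""
    d0 = null_closure({start}, eps)
    seen, dfa_delta, work = {d0}, {}, [d0]
    while work:
        S = work.pop()
        for a in alphabet:
            # move on symbol a from every NFA state in S...
            moved = set()
            for s in S:
                moved |= set(nfa_delta.get((s, a), ()))
            # ...then collapse epsilon transitions with the null closure
            T = null_closure(moved, eps)
            if not T:
                continue
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                work.append(T)
    return d0, dfa_delta, seen

# Example NFA: A -a-> B, A -b-> C, epsilon edges B -> D and C -> D
nfa_delta = {('A', 'a'): {'B'}, ('A', 'b'): {'C'}}
eps = {'B': {'D'}, 'C': {'D'}}
start, dfa_delta, dfa_states = subset_construction('A', nfa_delta, eps, 'ab')
```

Running this reproduces the example DFA: {A} goes to {B,D} on 'a' and to {C,D} on 'b'.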

Hopcroft's DFA Minimization. Split the states into two partitions (final and non-final). Keep splitting a partition while it contains states with differing behaviors: two states transition to differing partitions on the same symbol, or one state transitions on a symbol and another doesn't. When done, each partition becomes a single state. (Diagram: {B,D} and {C,D} have the same behavior, so they collapse; the minimal DFA has {A} transitioning to the merged state {B,C,D} on both 'a' and 'b'.)

Hopcroft's DFA Minimization. (Diagram: a contrasting example in which {B,D} and {C,D} show differing behavior on 'a', so the partition is split and the two states remain separate.)
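The splitting loop above can be sketched directly. This is the straightforward partition-refinement variant of the idea, not the full Hopcroft worklist optimization; `delta` maps (state, symbol) to a state, a missing entry means "no transition", and all names are illustrative:

```python
# Partition-splitting sketch of DFA minimization.
def minimize_partitions(states, finals, delta, alphabet):
    # start with two partitions: final and non-final states
    parts = [p for p in (set(finals), set(states) - set(finals)) if p]
    changed = True
    while changed:
        changed = False
        # map each state to the index of its current partition
        index = {s: i for i, p in enumerate(parts) for s in p}
        new_parts = []
        for p in parts:
            # group states by behavior: which partition each symbol leads to
            groups = {}
            for s in p:
                sig = tuple(index.get(delta.get((s, a))) for a in alphabet)
                groups.setdefault(sig, set()).add(s)
            new_parts.extend(groups.values())
            if len(groups) > 1:
                changed = True   # a partition split, so refine again
        parts = new_parts
    return parts

# Example DFA from the slides: {A} -a-> {B,D}, {A} -b-> {C,D}, both final
delta = {('A', 'a'): 'BD', ('A', 'b'): 'CD'}
minimize_partitions({'A', 'BD', 'CD'}, {'BD', 'CD'}, delta, 'ab')
```

On the example, 'BD' and 'CD' end up in the same partition (same behavior on every symbol), matching the collapse shown in the slide.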

Kleene's Construction. Replace edge labels with REs ("a" becomes the RE "a"; "a,b" becomes "a|b"). Eliminate states by combining REs: apply the pattern below pairwise around each state to be eliminated, and repeat until only one or two states remain. Then build the final RE: a single state with an "A" self-loop yields "A*"; two states follow the pattern below. Eliminating states: a path with incoming RE A, a self-loop B on the middle state, and outgoing RE C collapses to the single RE AB*C. Combining the final two states: with self-loops A and C, forward edge B, and back edge D, the result is A*B(C|DA*B)*.
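The elimination pattern can be shown with plain strings as REs. This toy helper is an assumption for illustration (with naive parenthesization); it performs one elimination step for a middle state with incoming RE A, self-loop B, and outgoing RE C:

```python
# One state-elimination step: incoming, self-loop, outgoing -> combined RE.
def eliminate(incoming, self_loop, outgoing):
    # a self-loop B on the eliminated state contributes "(B)*" in the middle
    loop = "({})*".format(self_loop) if self_loop else ""
    return "{}{}{}".format(incoming, loop, outgoing)

eliminate("a", "b", "c")    # 'a(b)*c'
eliminate("a", None, "c")   # 'ac' (no self-loop on the eliminated state)
```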

NFA/DFA complexity What are the time and space requirements to... Build an NFA? Run an NFA? Build a DFA? Run a DFA?

NFA/DFA complexity. Thompson's construction: at most two new states and four transitions per regex character, thus a linear space increase with respect to the number of regex characters; a constant number of operations per increase means linear time as well. NFA execution: proportional to both NFA size and input string size, since we must track multiple simultaneous current states. Subset construction: potential exponential state-space explosion; an n-state NFA could require up to 2^n DFA states, although this rarely happens in practice. DFA execution: proportional to input string size only (we track only a single current state).
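NFA simulation can be sketched to make the "multiple simultaneous current states" point concrete: we track the set of all possible states, taking the null closure after every symbol, which is where the O(m*n) run time comes from. Names are illustrative:

```python
# NFA simulation sketch: run all possible paths at once.
def run_nfa(start, finals, delta, eps, text):
    def close(states):
        """Null closure: follow epsilon edges until no new states appear."""
        result, work = set(states), list(states)
        while work:
            s = work.pop()
            for t in eps.get(s, ()):
                if t not in result:
                    result.add(t)
                    work.append(t)
        return result

    current = close({start})            # everywhere we can go "for free"
    for ch in text:
        moved = set()
        for s in current:               # step every current state on ch
            moved |= set(delta.get((s, ch), ()))
        current = close(moved)          # then follow epsilon edges again
    return bool(current & set(finals))  # accept if any current state is final

# Example NFA: A -a-> B, A -b-> C, epsilon edges B -> D and C -> D, D final
delta = {('A', 'a'): {'B'}, ('A', 'b'): {'C'}}
eps = {'B': {'D'}, 'C': {'D'}}
run_nfa('A', {'D'}, delta, eps, 'a')   # True
```

A DFA driver, by contrast, would replace the inner loop over `current` with a single table lookup per input character.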

NFA/DFA complexity. NFAs build quicker (linear) but run slower; better if you will only run the FA a few times, or if you need features that are difficult to implement with DFAs. DFAs build slower but run faster (linear); better if you will run the FA many times.

             NFA       DFA
  Build time O(m)      O(2^m)
  Run time   O(m*n)    O(n)

(m = length of regular expression, n = length of input string)

Lexers. Auto-generated: either table-driven (a generic scanner driving auto-generated tables) or direct-coded (transitions hard-coded using jumps); common tools are lex/flex and similar. Hand-coded: better I/O performance (e.g., buffering) and more efficient interfacing with other phases.
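The table-driven style separates a generic driver from the transition table that a lexer generator would emit. This tiny sketch (the table contents and token names are assumptions for illustration) recognizes unsigned integers using maximal munch:

```python
# Table-driven scanner sketch: generic driver + transition table.
TABLE = {(0, 'digit'): 1, (1, 'digit'): 1}   # the DFA, as a lookup table
ACCEPT = {1: 'NUMBER'}                        # accepting states -> token types

def kind(ch):
    """Character classifier: collapses the input alphabet for the table."""
    return 'digit' if ch.isdigit() else 'other'

def scan_token(text, pos):
    """Run the DFA from pos, remembering the last accepting position
    (maximal munch). Returns (token_type, lexeme, end_pos) or None."""
    state, last, i = 0, None, pos
    while i < len(text) and (state, kind(text[i])) in TABLE:
        state = TABLE[(state, kind(text[i]))]
        i += 1
        if state in ACCEPT:
            last = (ACCEPT[state], text[pos:i], i)
    return last

scan_token("123+x", 0)   # ('NUMBER', '123', 3)
```

Swapping in different tables changes the language recognized without touching the driver, which is why generated scanners take this shape.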

Handling Keywords. Issue: keywords are also valid identifiers. Option 1: embed them into the NFA/DFA with a separate regex for each keyword; easier/faster for generated scanners. Option 2: use a lookup table; scan the lexeme as an identifier, then check whether it is a keyword; easier for hand-coded scanners. (Thus, Option 2 is probably easier for P2.)
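Option 2 takes only a few lines; the keyword set and token names here are illustrative, not from the lecture:

```python
# Keyword lookup table: scan as an identifier, then reclassify.
KEYWORDS = {"if", "else", "while", "return", "int"}

def classify(lexeme):
    """Decide the token type after the identifier DFA has matched `lexeme`."""
    return ("KEYWORD" if lexeme in KEYWORDS else "IDENT", lexeme)

classify("while")    # ('KEYWORD', 'while')
classify("whilst")   # ('IDENT', 'whilst')
```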