
Lexical Analysis
Finite Automata (Part 2 of 2)

Kinder, Gentler Nation
- In our post drop-deadline world things get easier.
- While we're here: reading quiz.

Summary
- Regular expressions provide a concise notation for string patterns
- Use in lexical analysis requires small extensions
    - To resolve ambiguities
    - To handle errors
- Good algorithms known (next)
    - Require only single pass over the input
    - Few operations per character (table lookup)

Finite Automata
- Regular expressions = specification
- Finite automata = implementation
- A finite automaton consists of
    - An input alphabet Σ
    - A set of states S
    - A start state n
    - A set of accepting states F ⊆ S
    - A set of transitions: state --input--> state

Finite Automata
- Transition: s1 --a--> s2
- Is read: in state s1 on input a go to state s2
- If end of input (or no transition possible)
    - If in accepting state, accept
    - Otherwise, reject

Finite Automata State Graphs
- A state
- The start state
- An accepting state
- A transition (labeled with its input, e.g. a)

A Simple Example
- A finite automaton that accepts only "1"
- A finite automaton accepts a string if we can follow transitions labeled with the characters in the string from the start to some accepting state

Another Simple Example
- A finite automaton accepting any number of 1's followed by a single 0
- Alphabet Σ = {0, 1}
- Check that, for example, "1110" is accepted but "1100" is not

And Another Example
- Alphabet Σ = {0, 1}
- What language does this recognize?
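The accept/reject rule and the second example above can be sketched in OCaml. This is only an illustration under assumptions: the record type `dfa`, the function `accepts`, and the state numbering (0 for the start state, 1 for the accepting state) are hypothetical names chosen here, not taken from the slides.

```ocaml
(* A DFA as a start state, a list of accepting states, and a partial
   transition function; None means "no transition possible".
   All names here are illustrative assumptions. *)
type dfa = {
  start : int;
  accepting : int list;
  step : int -> char -> int option;
}

(* Follow transitions character by character. Reject as soon as no
   transition applies; at end of input, accept iff the current state
   is an accepting state. *)
let accepts (m : dfa) (input : string) : bool =
  let rec go state i =
    if i = String.length input then List.mem state m.accepting
    else
      match m.step state input.[i] with
      | None -> false
      | Some next -> go next (i + 1)
  in
  go m.start 0

(* Any number of 1's followed by a single 0: state 0 loops on '1' and
   moves to accepting state 1 on '0'; state 1 has no outgoing edges. *)
let ones_then_zero : dfa = {
  start = 0;
  accepting = [ 1 ];
  step =
    (fun state c ->
      match (state, c) with
      | 0, '1' -> Some 0
      | 0, '0' -> Some 1
      | _ -> None);
}
```

With these definitions, `accepts ones_then_zero "1110"` is true, while `"1100"` is rejected because the accepting state has no transition for the second 0.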

And Another Example
- Alphabet still Σ = {0, 1}
- The operation of the automaton is not completely defined by the input
    - On input "11" the automaton could be in either state

Epsilon Moves
- Another kind of transition: ε-moves
- The machine can move from state A to state B without reading input: A --ε--> B

Deterministic and Nondeterministic Automata
- Deterministic Finite Automata (DFA)
    - One transition per input per state
    - No ε-moves
- Nondeterministic Finite Automata (NFA)
    - Can have multiple transitions for one input in a given state
    - Can have ε-moves
- Finite automata have finite memory
    - Need only to encode the current state

Execution of Finite Automata
- A DFA can take only one path through the state graph
    - Completely determined by input
- NFAs can choose
    - Whether to make ε-moves
    - Which of multiple transitions for a single input to take

Acceptance of NFAs
- An NFA can get into multiple states
- Rule: NFA accepts if it can get in a final state

NFA vs. DFA (1)
- NFAs and DFAs recognize the same set of languages (regular languages)
    - They have the same expressive power
- DFAs are easier to implement
    - There are no choices to consider
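One way to make the NFA acceptance rule concrete is to track every state the NFA could be in after each character. The sketch below assumes a hypothetical edge-list representation in which a label of `None` stands for an ε-move; the example machine `ends_in_one` is invented for illustration and is not one of the automata drawn on the slides.

```ocaml
module States = Set.Make (Int)

(* An edge labelled None is an epsilon-move; Some c consumes character c.
   This representation is an assumption made for the sketch. *)
type nfa = {
  start : int;
  accepting : States.t;
  edges : (int * char option * int) list;
}

(* All states reachable from [set] by zero or more epsilon-moves:
   keep adding targets of epsilon edges until nothing changes. *)
let rec eps_closure (m : nfa) (set : States.t) : States.t =
  let grown =
    List.fold_left
      (fun acc (src, label, dst) ->
        if label = None && States.mem src acc then States.add dst acc else acc)
      set m.edges
  in
  if States.equal grown set then set else eps_closure m grown

(* States reachable from [set] by reading [c], then taking epsilon-moves. *)
let move (m : nfa) (set : States.t) (c : char) : States.t =
  let targets =
    List.fold_left
      (fun acc (src, label, dst) ->
        if label = Some c && States.mem src set then States.add dst acc else acc)
      States.empty m.edges
  in
  eps_closure m targets

(* Accept if, after the whole input, some possible state is accepting. *)
let accepts (m : nfa) (input : string) : bool =
  let current = ref (eps_closure m (States.singleton m.start)) in
  String.iter (fun c -> current := move m !current c) input;
  not (States.is_empty (States.inter !current m.accepting))

(* Hypothetical NFA for "strings over {0,1} ending in 1": state 0 loops on
   both symbols and may also guess that a '1' is the last character. *)
let ends_in_one : nfa = {
  start = 0;
  accepting = States.singleton 2;
  edges = [ (0, Some '0', 0); (0, Some '1', 0); (0, Some '1', 1); (1, None, 2) ];
}
```

The simulation never backtracks: the nondeterministic choices are all carried along at once as a set of states, which is the idea the NFA-to-DFA conversion builds on.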

NFA vs. DFA (2)
- For a given language the NFA can be simpler than the DFA
- (Figure: an NFA and a DFA for the same language)
- DFA can be exponentially larger than NFA

Regular Expressions to Finite Automata
- High-level sketch:
  Lexical Specification -> Regular expressions -> NFA -> DFA -> Table-driven Implementation of DFA

Regular Expressions to NFA (1)
- For each kind of rexp, define an NFA
    - Notation: NFA for rexp A (drawn as a box labeled A)
- For ε: start state --ε--> accepting state
- For input a: start state --a--> accepting state

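Regular Expressions to NFA (2)
- For AB: (figure: the NFA for A connected by an ε-move to the NFA for B)
- For A | B: (figure: ε-moves from a new start state into the NFAs for A and B, and ε-moves from each of them to a new accepting state)

Regular Expressions to NFA (3)
- For A*: (figure: the NFA for A with ε-moves that let it be skipped or repeated)

Example of RegExp -> NFA conversion
- Consider the regular expression (1 | 0)*1
- The NFA is: (figure with states A, B, C, D, E, F, G, H, I, J)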
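The per-construct rules can be sketched as a recursive function over a regular-expression type. This follows the textbook Thompson-style construction, which may differ in small details (such as exactly where the ε-edges for A* are placed) from the figures on the slides. The types `regex` and `frag` and the helpers `fresh`, `build`, `closure` and `matches` are hypothetical names introduced for this sketch.

```ocaml
type regex =
  | Eps                   (* the empty string *)
  | Chr of char           (* a single input character *)
  | Seq of regex * regex  (* AB *)
  | Alt of regex * regex  (* A | B *)
  | Star of regex         (* A* *)

(* An NFA fragment with one start and one final state; None = epsilon. *)
type frag = { start : int; final : int; edges : (int * char option * int) list }

let counter = ref 0
let fresh () = incr counter; !counter

let rec build (r : regex) : frag =
  match r with
  | Eps ->
    let s = fresh () in
    let f = fresh () in
    { start = s; final = f; edges = [ (s, None, f) ] }
  | Chr c ->
    let s = fresh () in
    let f = fresh () in
    { start = s; final = f; edges = [ (s, Some c, f) ] }
  | Seq (a, b) ->
    let fa = build a in
    let fb = build b in
    { start = fa.start; final = fb.final;
      edges = [ (fa.final, None, fb.start) ] @ fa.edges @ fb.edges }
  | Alt (a, b) ->
    let fa = build a in
    let fb = build b in
    let s = fresh () in
    let f = fresh () in
    { start = s; final = f;
      edges = [ (s, None, fa.start); (s, None, fb.start);
                (fa.final, None, f); (fb.final, None, f) ]
              @ fa.edges @ fb.edges }
  | Star a ->
    let fa = build a in
    let s = fresh () in
    let f = fresh () in
    { start = s; final = f;
      edges = [ (s, None, fa.start); (s, None, f);
                (fa.final, None, fa.start); (fa.final, None, f) ]
              @ fa.edges }

(* Epsilon-closure over sorted, duplicate-free lists of states. *)
let rec closure edges states =
  let reached =
    List.filter_map
      (fun (src, label, dst) ->
        if label = None && List.mem src states then Some dst else None)
      edges
  in
  let next = List.sort_uniq compare (states @ reached) in
  if next = List.sort_uniq compare states then next else closure edges next

(* Build the NFA for [r] and simulate it on [input] by tracking state sets. *)
let matches (r : regex) (input : string) : bool =
  let nfa = build r in
  let step states c =
    closure nfa.edges
      (List.filter_map
         (fun (src, label, dst) ->
           if label = Some c && List.mem src states then Some dst else None)
         nfa.edges)
  in
  let current = ref (closure nfa.edges [ nfa.start ]) in
  String.iter (fun c -> current := step !current c) input;
  List.mem nfa.final !current
```

For the example expression, `Seq (Star (Alt (Chr '1', Chr '0')), Chr '1')` encodes (1 | 0)*1, so `matches` accepts exactly the strings over {0, 1} that end in 1.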

Next
- Lexical Specification -> Regular expressions -> NFA -> DFA -> Table-driven Implementation of DFA

NFA to DFA: The Trick
- Simulate the NFA
- Each state of DFA = a non-empty subset of states of the NFA
- Start state = the set of NFA states reachable through ε-moves from NFA start state
- Add a transition S --a--> S' to DFA iff
    - S' is the set of NFA states reachable from the states in S after seeing the input a, considering ε-moves as well

NFA -> DFA Example
- NFA states: A, B, C, D, E, F, G, H, I, J
- DFA states: ABCDHI, FGABCDHI, EJGABCDHI
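The trick above can be sketched as a worklist over sets of NFA states. The NFA diagram itself did not survive in this transcript, so the edge list below is an assumption: it is one ten-state NFA for (1 | 0)*1 whose subset construction yields the three DFA states named in the example. The names `edges`, `closure`, `move`, `subset_construction` and `name` are hypothetical.

```ocaml
module CS = Set.Make (Char)

(* ASSUMED edges for an NFA of (1 | 0)*1 with states A..J (start A,
   accepting J); None marks an epsilon-move. The slide's figure is lost,
   so this is a reconstruction, not the original diagram. *)
let edges : (char * char option * char) list = [
  ('A', None, 'B'); ('A', None, 'H'); ('B', None, 'C'); ('B', None, 'D');
  ('C', Some '1', 'E'); ('D', Some '0', 'F'); ('E', None, 'G');
  ('F', None, 'G'); ('G', None, 'A'); ('H', None, 'I'); ('I', Some '1', 'J');
]

(* NFA states reachable from [set] through epsilon-moves only. *)
let rec closure (set : CS.t) : CS.t =
  let grown =
    List.fold_left
      (fun acc (s, l, d) ->
        if l = None && CS.mem s acc then CS.add d acc else acc)
      set edges
  in
  if CS.equal grown set then set else closure grown

(* NFA states reachable from [set] after seeing [c], epsilon-moves included. *)
let move (set : CS.t) (c : char) : CS.t =
  closure
    (List.fold_left
       (fun acc (s, l, d) ->
         if l = Some c && CS.mem s set then CS.add d acc else acc)
       CS.empty edges)

(* Explore DFA states (non-empty sets of NFA states) from the start state,
   returning the DFA states and transitions in discovery order. *)
let subset_construction (alphabet : char list) =
  let start = closure (CS.singleton 'A') in
  let rec explore seen transitions = function
    | [] -> (List.rev seen, List.rev transitions)
    | s :: rest ->
      if List.exists (CS.equal s) seen then explore seen transitions rest
      else
        let out =
          List.filter_map
            (fun c ->
              let t = move s c in
              if CS.is_empty t then None else Some (s, c, t))
            alphabet
        in
        explore (s :: seen)
          (List.rev_append out transitions)
          (rest @ List.map (fun (_, _, t) -> t) out)
  in
  explore [] [] [ start ]

(* Print a DFA state as its NFA state letters in alphabetical order. *)
let name (s : CS.t) : string = String.of_seq (CS.to_seq s)
```

Calling `subset_construction ['0'; '1']` on this assumed NFA discovers three DFA states. `name` prints the letters in sorted order, so FGABCDHI appears as ABCDFGHI and EJGABCDHI as ABCDEGHIJ.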

NFA -> DFA: Remark
- An NFA may be in many states at any time
- How many different states?
- If there are N states, the NFA must be in some subset of those N states
- How many non-empty subsets are there?
    - 2^N - 1 = finitely many

Implementation
- A DFA can be implemented by a 2D table T
    - One dimension is states
    - Other dimension is input symbols
    - For every transition S_i --a--> S_k define T[i,a] = k
- DFA execution
    - If in state S_i and input a, read T[i,a] = k and skip to state S_k
    - Very efficient

Table Implementation of a DFA

         0    1
    S    T    U
    T    T    U
    U    T    U
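The table lookup can be sketched with a two-dimensional array. The numbering S = 0, T = 1, U = 2 and the helpers `column`, `run` and `accepts` are assumptions made for this sketch. The choice of U as the only accepting state is also an assumption, since the state diagram is not preserved here; it fits a DFA for (1 | 0)*1, where U is the state reached after reading a 1.

```ocaml
(* Rows are states S = 0, T = 1, U = 2; columns are the inputs '0' and '1'.
   The numbering is an illustrative assumption. *)
let table : int array array = [|
  [| 1; 2 |];  (* S: on 0 go to T, on 1 go to U *)
  [| 1; 2 |];  (* T: on 0 go to T, on 1 go to U *)
  [| 1; 2 |];  (* U: on 0 go to T, on 1 go to U *)
|]

(* Map an input symbol to its column index in the table. *)
let column (c : char) : int =
  match c with
  | '0' -> 0
  | '1' -> 1
  | _ -> invalid_arg "column: input symbol outside the alphabet"

(* One array lookup per input character, starting in state S. *)
let run (input : string) : int =
  let state = ref 0 in
  String.iter (fun c -> state := table.(!state).(column c)) input;
  !state

(* ASSUMPTION: U (state 2) is the only accepting state. *)
let accepts (input : string) : bool = run input = 2
```

Each character costs one array lookup, which is the "few operations per character" property mentioned in the summary.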

Implementation (Cont.)
- NFA -> DFA conversion is at the heart of tools such as flex or ocamllex
- But, DFAs can be huge
- In practice, flex-like tools trade off speed for space in the choice of NFA and DFA representations

PA: Lexical Analysis
- Correctness is job #1.
    - And job #2 and #3!
- Tips on building large systems:
    - Keep it simple
    - Design systems that can be tested
    - Don't optimize prematurely
    - It is easier to modify a working system than to get a system working

Lexical Analyzer Generator
- Tools like lex and flex and ocamllex will build lexers for you!
- You will use this for PA
- List of Regexps with code snippets -> Lexical Analyzer Generator -> Lexer Source Code
- I'll explain ocamllex; others are similar
    - See PA documentation

Ocamllex lexer.mll file

    {
      (* raw preamble code:
         type declarations, utility functions, etc. *)
    }
    let re_name_i = re_i
    rule normal_tokens = parse
      | re_1 { token_1 }
      | re_2 { token_2 }
    and special_tokens = parse
      | re_n { token_n }

Example lexer.mll

    {
      type token =
        | Tok_Integer of int  (* 23 *)
        | Tok_Divide          (* / *)
    }
    let digit = ['0'-'9']
    rule initial = parse
      | '/'          { Tok_Divide }
      | digit digit* { let token_string = Lexing.lexeme lexbuf in
                       let token_val = int_of_string token_string in
                       Tok_Integer(token_val) }
      | _            { Printf.printf "Error!\n"; exit 1 }

Adding Winged Comments

    {
      type token =
        | Tok_Integer of int  (* 23 *)
        | Tok_Divide          (* / *)
    }
    let digit = ['0'-'9']
    rule initial = parse
      | "//"         { eol_comment lexbuf }  (* skip to end of line, then resume *)
      | '/'          { Tok_Divide }
      | digit digit* { let token_string = Lexing.lexeme lexbuf in
                       let token_val = int_of_string token_string in
                       Tok_Integer(token_val) }
      | _            { Printf.printf "Error!\n"; exit 1 }
    and eol_comment = parse
      | '\n'         { initial lexbuf }
      | _            { eol_comment lexbuf }

Using Lexical Analyzer Generators

    $ ocamllex lexer.mll
    45 states, 83 transitions, table size 462 bytes

    (* your main.ml file *)
    open Printf
    open Lexer  (* brings Tok_Divide and Tok_Integer into scope *)

    let () =
      let file_input = open_in "file.cl" in
      let lexbuf = Lexing.from_channel file_input in
      let token = Lexer.initial lexbuf in
      match token with
      | Tok_Divide -> printf "Divide Token!\n"
      | Tok_Integer(x) -> printf "Integer Token = %d\n" x

How Big Is PA?
- The reference lexer.mll file is 88 lines
    - Perhaps another 2 lines to keep track of input line numbers
    - Perhaps another 2 lines to open the file and get a list of tokens
    - Then 65 lines to serialize the output
    - I'm sure it's possible to be smaller!
- Conclusion: This isn't a code slog, it's about careful forethought and precision.

Homework
- Friday: PA due
- Next Tuesday: Chapters 2.3 2.3.2
    - Optional Wikipedia article