Lexical Analysis. Finite Automata. (Part 2 of 2)



#2 PA0, PA1
Although we have included the tricky "file ends without a newline" testcases in previous years, students made good cases against them (e.g., they test I/O and not the algorithm), so we are dropping them from PA1. You can submit new rosetta.yada files for PA1, so you can fix errors from PA0.

#3 Reading Quiz!
Are practical parsers and scanners based on deterministic or non-deterministic automata? How can regular expressions be used to specify nested constructs? How is a two-dimensional transition table used in table-driven scanning?

#4 Cunning Plan
- Regular expressions provide a concise notation for string patterns
- Use in lexical analysis requires small extensions
  - To resolve ambiguities
  - To handle errors
- Good algorithms known (next)
  - Require only a single pass over the input
  - Few operations per character (table lookup)

#5 One-Slide Summary Finite automata are formal models of computation that can accept regular languages corresponding to regular expressions. Nondeterministic finite automata (NFA) feature epsilon transitions and multiple outgoing edges for the same input symbol. Regular expressions can be converted to NFAs. Tools will generate DFA-based lexer code for you from regular expressions.

#6 Finite Automata
- Regular expressions = specification
- Finite automata = implementation
- A finite automaton consists of:
  - An input alphabet Σ
  - A set of states S
  - A start state n
  - A set of accepting states F ⊆ S
  - A set of transitions: state →(input) state

#7 Finite Automata
The transition s1 →(a) s2 is read: in state s1, on input a, go to state s2.
At end of input (or when no transition is possible):
- If in an accepting state ⇒ accept
- Otherwise ⇒ reject
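The accept/reject rule above can be sketched directly in code. This is a hypothetical OCaml encoding (the names and representation are illustrative, not from the course): states are integers, and a partial transition function returns `None` when no transition is possible.

```ocaml
(* Hypothetical DFA encoding: states are ints; step is a partial
   transition function where None means "no transition possible". *)
type dfa = {
  start : int;
  accepting : int list;
  step : int -> char -> int option;
}

(* Follow transitions over the whole string; accept iff all input is
   consumed and we end in an accepting state. *)
let accepts (m : dfa) (s : string) : bool =
  let rec go state i =
    if i = String.length s then List.mem state m.accepting
    else
      match m.step state s.[i] with
      | Some next -> go next (i + 1)
      | None -> false
  in
  go m.start 0

(* The "any number of 1s followed by a single 0" automaton from a
   later slide, as an example instance. *)
let ones_then_zero = {
  start = 0;
  accepting = [ 1 ];
  step = (fun state c ->
    match state, c with
    | 0, '1' -> Some 0   (* loop on 1 *)
    | 0, '0' -> Some 1   (* the single trailing 0 *)
    | _ -> None);
}
```

Here `accepts ones_then_zero "110"` holds, while `"101"` is rejected because state 1 has no outgoing transitions.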

#8 Finite Automata State Graphs
Graphical notation (diagrams omitted): a circle is a state, an arrow into a circle marks the start state, a double circle is an accepting state, and a labeled arrow a is a transition.

#9 A Simple Example
A finite automaton that accepts only 1. A finite automaton accepts a string if we can follow transitions labeled with the characters in the string from the start to some accepting state.

#10 Another Simple Example
A finite automaton accepting any number of 1s followed by a single 0. Alphabet Σ = {0,1}. (Diagram omitted.) Check that 110 is accepted but 101 is not.

#11 And Another Example
Alphabet Σ = {0,1}. What language does this recognize? (Transition diagram omitted.)

#12 And Another Example
Alphabet still Σ = {0,1}. The operation of the automaton is not completely defined by the input: on input 1 the automaton could be in either state.

#13 Epsilon Moves
Another kind of transition: ε-moves. A machine can move from state A to state B without reading input.

#14 Deterministic and Nondeterministic Automata
- Deterministic Finite Automata (DFA)
  - One transition per input per state
  - No ε-moves
- Nondeterministic Finite Automata (NFA)
  - Can have multiple transitions for one input in a given state
  - Can have ε-moves
- Finite automata have finite memory
  - Need only to encode the current state

#15 Execution of Finite Automata
- A DFA can take only one path through the state graph
  - Completely determined by input
- NFAs can choose
  - Whether to make ε-moves
  - Which of multiple transitions for a single input to take

#16 Acceptance of NFAs
An NFA can get into multiple states. (Diagram and example input omitted.) Rule: an NFA accepts if it can get into a final state.

#17 NFA vs. DFA (1)
- NFAs and DFAs recognize the same set of languages (the regular languages)
  - They have the same expressive power
- DFAs are easier to implement
  - There are no choices to consider

#18 NFA vs. DFA (2)
For a given language the NFA can be simpler than the DFA. (Example NFA and DFA diagrams omitted.) The DFA can be exponentially larger than the NFA.

#19 Regular Expressions to Finite Automata
High-level sketch: Lexical Specification → Regular expressions → NFA → DFA → Table-driven Implementation of DFA

#20 Regular Expressions to NFA (1)
For each kind of rexp, define an NFA. Notation: NFA for rexp A. For ε and for input a, the NFA is two states joined by an ε-edge or an a-edge, respectively. (Diagrams omitted.)

#21 Regular Expressions to NFA (2)
For AB: connect the NFA for A to the NFA for B in sequence. For A | B: branch to the NFAs for A and B, then join their ends. (Diagrams omitted.)

#22 Regular Expressions to NFA (3)
For A*: loop the NFA for A back on itself, with an ε-path that skips it entirely. (Diagram omitted.)
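Taken together, the three construction slides are Thompson's construction. A minimal OCaml sketch (the `rexp` and `nfa` types are our own illustrative encoding, not from the slides): each case builds an NFA with one start and one final state, and an edge label of `None` is an ε-move.

```ocaml
(* Illustrative types: regular expressions, and NFAs with a single
   start state and a single final state. *)
type rexp =
  | Eps
  | Chr of char
  | Cat of rexp * rexp   (* AB  *)
  | Alt of rexp * rexp   (* A|B *)
  | Star of rexp         (* A*  *)

type nfa = {
  start : int;
  final : int;
  edges : (int * char option * int) list;  (* None label = epsilon *)
  size : int;                              (* number of states *)
}

let fresh = ref 0
let new_state () = let s = !fresh in incr fresh; s

(* Thompson's construction, one case per slide rule. *)
let rec to_nfa (r : rexp) : nfa =
  match r with
  | Eps ->
      let a = new_state () and b = new_state () in
      { start = a; final = b; edges = [ (a, None, b) ]; size = 2 }
  | Chr c ->
      let a = new_state () and b = new_state () in
      { start = a; final = b; edges = [ (a, Some c, b) ]; size = 2 }
  | Cat (r1, r2) ->
      let m1 = to_nfa r1 in
      let m2 = to_nfa r2 in
      { start = m1.start; final = m2.final;
        edges = (m1.final, None, m2.start) :: m1.edges @ m2.edges;
        size = m1.size + m2.size }
  | Alt (r1, r2) ->
      let m1 = to_nfa r1 in
      let m2 = to_nfa r2 in
      let a = new_state () and b = new_state () in
      { start = a; final = b;
        edges = (a, None, m1.start) :: (a, None, m2.start)
                :: (m1.final, None, b) :: (m2.final, None, b)
                :: m1.edges @ m2.edges;
        size = m1.size + m2.size + 2 }
  | Star r1 ->
      let m = to_nfa r1 in
      let a = new_state () and b = new_state () in
      { start = a; final = b;
        edges = (a, None, m.start) :: (a, None, b)
                :: (m.final, None, m.start) :: (m.final, None, b)
                :: m.edges;
        size = m.size + 2 }
```

Applied to (1|0)*1 this yields a 10-state NFA, matching the ten states A through J on the example slide.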

#23 Example of RegExp -> NFA Conversion
Consider the regular expression (1|0)*1. The NFA has states A through J. (Diagram omitted.)

#24 Overarching Plan
Lexical Specification → Regular expressions → NFA → DFA → Table-driven Implementation of DFA
(Thomas Cole, Evening in Arcady, 1843)

#25 NFA to DFA: The Trick
- Simulate the NFA
- Each state of the DFA = a non-empty subset of states of the NFA
- Start state = the set of NFA states reachable through ε-moves from the NFA start state
- Add a transition S →(a) S' to the DFA iff S' is the set of NFA states reachable from the states in S after seeing the input a, considering ε-moves as well
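The same trick simulates an NFA directly, without materializing the DFA: track the set of states the NFA could be in, taking ε-closures after every step. A self-contained OCaml sketch, with a hypothetical edge-list encoding of an NFA for (1|0)*1 (ε-edges carry the label `None`; the state numbering is ours):

```ocaml
module IntSet = Set.Make (Int)

(* Hypothetical NFA for (1|0)*1: start 0, accepting state 3.
   State 0 branches by epsilon to 1 and 2, which loop back on
   '1' and '0'; a final '1' from state 0 reaches state 3. *)
let edges =
  [ (0, None, 1); (0, None, 2);
    (1, Some '1', 0); (2, Some '0', 0);
    (0, Some '1', 3) ]

(* Epsilon-closure: add targets of epsilon-edges until a fixpoint. *)
let rec closure edges set =
  let bigger =
    List.fold_left
      (fun acc (s, lbl, t) ->
        if lbl = None && IntSet.mem s acc then IntSet.add t acc else acc)
      set edges
  in
  if IntSet.equal bigger set then set else closure edges bigger

(* One step: follow every edge labeled c from any current state. *)
let step edges set c =
  closure edges
    (List.fold_left
       (fun acc (s, lbl, t) ->
         if lbl = Some c && IntSet.mem s set then IntSet.add t acc else acc)
       IntSet.empty edges)

(* Accept iff some final state is in the set after all input. *)
let nfa_accepts edges start finals input =
  let states = ref (closure edges (IntSet.singleton start)) in
  String.iter (fun c -> states := step edges !states c) input;
  List.exists (fun f -> IntSet.mem f !states) finals
```

Each distinct state set the simulation visits corresponds to one state of the subset-construction DFA.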

#26 NFA → DFA Example
The NFA has states A through J; the resulting DFA has states ABCDHI, FGABCDHI, and EJGABCDHI. (Diagrams omitted.)

#27 NFA → DFA: Remark
- An NFA may be in many states at any time
- How many different states? If there are N states, the NFA must be in some subset of those N states
- How many non-empty subsets are there? 2^N − 1, i.e., finitely many

#28 Implementation
- A DFA can be implemented by a 2D table T
  - One dimension is states
  - The other dimension is input symbols
  - For every transition Si →(a) Sk define T[i,a] = k
- DFA execution
  - If in state Si on input a, read T[i,a] = k and jump to state Sk
  - Very efficient

#29 Table Implementation of a DFA
(State diagram and transition table for a DFA with states S, T, U over inputs 0 and 1 omitted.)
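As a concrete illustration of the table-driven idea, here is a minimal OCaml sketch. The table encodes the earlier 1*0 automaton rather than the slide's S/T/U example; −1 marks a missing transition. All names are ours.

```ocaml
(* Table-driven DFA: rows are states, columns are input symbols. *)
let table = [|
  (* '0'  '1' *)
  [|  1;   0 |];   (* state 0: '1' loops, '0' goes to state 1 *)
  [| -1;  -1 |];   (* state 1 (accepting): no transitions out  *)
|]

let accepting = [| false; true |]

(* Run the DFA; reject on a missing transition or a bad character. *)
let run (input : string) : bool =
  let state = ref 0 in
  (try
     String.iter
       (fun c ->
          let col = Char.code c - Char.code '0' in
          if col < 0 || col > 1 then raise Exit;
          let next = table.(!state).(col) in
          if next < 0 then raise Exit;
          state := next)
       input
   with Exit -> state := -1);
  !state >= 0 && accepting.(!state)
```

The inner loop is exactly the slide's "read T[i,a] and jump" step: one array lookup per input character.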

#30 Implementation (Cont.)
- NFA → DFA conversion is at the heart of tools such as flex or ocamllex
- But DFAs can be huge
- In practice, flex-like tools trade off speed for space in the choice of NFA and DFA representations

#31 PA2: Lexical Analysis
- Correctness is job #1. And job #2 and #3!
- Tips on building large systems:
  - Keep it simple
  - Design systems that can be tested
  - Don't optimize prematurely
  - It is easier to modify a working system than to get a system working

#32 Lexical Analyzer Generator
Tools like lex and flex and ocamllex will build lexers for you! You will use this for PA2. List of regexps with code snippets → Lexical Analyzer Generator → Lexer Source Code. I'll explain ocamllex; others are similar. See the PA2 documentation.

#33 Ocamllex
A lexer.mll file looks like:

```ocaml
{ (* raw preamble code:
     type declarations, utility functions, etc. *) }

let re_name1 = re1

rule normal_tokens = parse
    re1 { token1 }
  | re2 { token2 }

and special_tokens = parse
    ren { tokenn }
```

#34 Example lexer.mll

```ocaml
{
  type token =
    | Tok_Integer of int   (* 123 *)
    | Tok_Divide           (* /   *)
}

let digit = ['0'-'9']

rule initial = parse
    '/'          { Tok_Divide }
  | digit digit* { let token_string = Lexing.lexeme lexbuf in
                   let token_val = int_of_string token_string in
                   Tok_Integer(token_val) }
  | _            { Printf.printf "Error!\n"; exit 1 }
```

#35 Adding Winged Comments

```ocaml
{
  type token =
    | Tok_Integer of int   (* 123 *)
    | Tok_Divide           (* /   *)
}

let digit = ['0'-'9']

rule initial = parse
    "//"         { eol_comment lexbuf }
  | '/'          { Tok_Divide }
  | digit digit* { let token_string = Lexing.lexeme lexbuf in
                   let token_val = int_of_string token_string in
                   Tok_Integer(token_val) }
  | _            { Printf.printf "Error!\n"; exit 1 }

and eol_comment = parse
    '\n' { initial lexbuf }
  | _    { eol_comment lexbuf }
```

#36 Using Lexical Analyzer Generators

```
$ ocamllex lexer.mll
45 states, 1083 transitions, table size 4602 bytes
```

```ocaml
(* your main.ml file *)
let () =
  let file_input = open_in "file.cl" in
  let lexbuf = Lexing.from_channel file_input in
  let token = Lexer.initial lexbuf in
  match token with
  | Tok_Divide -> Printf.printf "Divide Token!\n"
  | Tok_Integer x -> Printf.printf "Integer Token = %d\n" x
```

#37 How Big Is PA2?
- The reference lexer.mll file is 88 lines
  - Perhaps another 20 lines to keep track of input line numbers
  - Perhaps another 20 lines to open the file and get a list of tokens
  - Then 65 lines to serialize the output
  - I'm sure it's possible to be smaller!
- Conclusion: This isn't a code slog; it's about careful forethought and precision.

#38 Homework
- Wednesday: PA1 due
- Thursday: Chapters 2.3–2.3.2; optional Wikipedia article