Implementation of Lexical Analysis

Transcription:

Implementation of Lexical Analysis
Lecture 4
Prof. Aiken, CS 43, Lecture 4

Written Assignments
  WA assigned today
  Due in one week, by 5pm
  Turn in:
    In class
    In box outside 4 Gates
    Electronically

Tips on Building Large Systems
  KISS (Keep It Simple, Stupid!)
  Don't optimize prematurely
  Design systems that can be tested
  It is easier to modify a working system than to get a system working

Outline
  Specifying lexical structure using regular expressions
  Finite automata
    Deterministic Finite Automata (DFAs)
    Non-deterministic Finite Automata (NFAs)
  Implementation of regular expressions
    RegExp => NFA => DFA => Tables

Notation
  There is variation in regular expression notation
    Union: A | B is also written A + B
    Option: A + ε is also written A?
    Range: 'a' + 'b' + ... + 'z' is also written [a-z]
    Excluded range: the complement of [a-z] is written [^a-z]

Regular Expressions in Lexical Specification
  Last lecture: a specification for the predicate s ∈ L(R)
  But a yes/no answer is not enough!
  Instead: partition the input into tokens
  We adapt regular expressions to this goal
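The notation variants on the Notation slide can be tried out with Python's `re` module, which writes union as `|` rather than `+`. This is a small sketch, not part of the lecture; the helper name `in_language` and the sample patterns are assumptions chosen for illustration. It also shows the yes/no predicate s ∈ L(R) that the last slide says is not enough on its own.

```python
import re

# Each pattern below is one of the notation forms from the slide.
# Python's `re` writes union as `|`; the lecture writes it as `+`.
union = re.compile(r"a|b")            # A + B
option = re.compile(r"(?:ab)?")       # A + epsilon, also written A?
long_range = re.compile(r"a|b|c|d")   # 'a' + 'b' + 'c' + 'd'
short_range = re.compile(r"[a-d]")    # the same language, written [a-d]
excluded = re.compile(r"[^a-z]")      # complement of [a-z]


def in_language(pattern, s):
    """Return True when the whole string s is in L(pattern)."""
    return pattern.fullmatch(s) is not None
```

`in_language` only answers membership for a complete string; it says nothing about where one token ends and the next begins, which is the gap the lexical specification steps address.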

Regular Expressions => Lexical Spec. (1)
  1. Write a rexp for the lexemes of each token
       Number = digit^+
       Keyword = 'if' + 'else' + ...
       Identifier = letter (letter + digit)*
       OpenPar = '('
       ...

Regular Expressions => Lexical Spec. (2)
  2. Construct R, matching all lexemes for all tokens
       R = Keyword + Identifier + Number + ...
         = R1 + R2 + ...

Regular Expressions => Lexical Spec. (3)
  3. Let the input be x1 ... xn
       For 1 ≤ i ≤ n, check whether x1 ... xi ∈ L(R)
  4. If success, then we know that x1 ... xi ∈ L(Rj) for some j
  5. Remove x1 ... xi from the input and go to (3)

Ambiguities (1)
  There are ambiguities in the algorithm
  How much input is used? What if
    x1 ... xi ∈ L(R) and also x1 ... xK ∈ L(R)
  Rule: pick the longest possible string in L(R)
    The "maximal munch"

Ambiguities (2)
  Which token is used? What if
    x1 ... xi ∈ L(Rj) and also x1 ... xi ∈ L(Rk)
  Rule: use the rule listed first (j if j < k)
    Treats "if" as a keyword, not an identifier

Error Handling
  What if no rule matches a prefix of the input?
  Problem: can't just get stuck
  Solution:
    Write a rule matching all "bad" strings
    Put it last (lowest priority)
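The five steps and the two disambiguation rules can be sketched as a small tokenizer. This is a minimal illustration under stated assumptions, not the lecture's implementation: the `RULES` list, the `Whitespace` rule, and the one-character `Error` rule are hypothetical choices, and each pattern is written so that its greedy match is also its longest match.

```python
import re

# Rules in priority order: on a tie in length, the earlier rule wins.
# Token names follow the slides; Whitespace and Error are added for the sketch.
RULES = [
    ("Keyword", re.compile(r"if|else")),
    ("Identifier", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("Number", re.compile(r"[0-9]+")),
    ("OpenPar", re.compile(r"\(")),
    ("Whitespace", re.compile(r"[ \t\n]+")),
    ("Error", re.compile(r".", re.DOTALL)),  # listed last: any one bad character
]


def tokenize(text):
    """Partition text into (token, lexeme) pairs using maximal munch."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # A strictly longer match replaces the current best (maximal munch);
            # an equal-length match does not, so the rule listed first is kept.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        lexeme = text[pos:pos + best_len]
        if best_name != "Whitespace":
            tokens.append((best_name, lexeme))
        pos += best_len  # the Error rule guarantees best_len >= 1
    return tokens
```

With these rules, `tokenize("if")` yields a `Keyword` because the tie goes to the earlier rule, while `tokenize("iffy")` yields a single `Identifier` because the longer match wins.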

Summary
  Regular expressions provide a concise notation for string patterns
  Use in lexical analysis requires small extensions
    To resolve ambiguities
    To handle errors
  Good algorithms known
    Require only a single pass over the input
    Few operations per character (table lookup)

Finite Automata
  Regular expressions = specification
  Finite automata = implementation
  A finite automaton consists of
    An input alphabet Σ
    A set of states S
    A start state n
    A set of accepting states F ⊆ S
    A set of transitions state --input--> state

Finite Automata
  Transition
    s1 --a--> s2
  Is read
    In state s1 on input a go to state s2
  If end of input and in accepting state => accept
  Otherwise => reject

Finite Automata State Graphs
  A state
  The start state
  An accepting state
  A transition (labeled a)

A Simple Example
  A finite automaton that accepts only "1"

Another Simple Example
  A finite automaton accepting any number of 1's followed by a single 0
  Alphabet: {0,1}
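The five components of a finite automaton and the accept/reject rule can be written down directly. The sketch below is an illustration only: the class name `FiniteAutomaton`, the state names `A` and `B`, and the choice to reject when no transition exists are assumptions. The example machine is meant to accept any number of 1's followed by a single 0.

```python
from dataclasses import dataclass


@dataclass
class FiniteAutomaton:
    alphabet: set        # input alphabet
    states: set          # set of states S
    start: str           # start state n
    accepting: set       # accepting states F, a subset of S
    transitions: dict    # (state, input) -> state


def accepts(fa, text):
    """Run the automaton; accept only if the input ends in an accepting state."""
    state = fa.start
    for symbol in text:
        key = (state, symbol)
        if key not in fa.transitions:
            return False  # no transition possible: reject (assumed behavior)
        state = fa.transitions[key]
    return state in fa.accepting


# Hypothetical encoding of "any number of 1's followed by a single 0".
ONES_THEN_ZERO = FiniteAutomaton(
    alphabet={"0", "1"},
    states={"A", "B"},
    start="A",
    accepting={"B"},
    transitions={("A", "1"): "A", ("A", "0"): "B"},
)
```

Each input character costs one dictionary lookup, which matches the "few operations per character" point in the summary.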

And Another Example
  Alphabet: {0,1}
  What language does this recognize?
  [state graph not transcribed]

Epsilon Moves
  Another kind of transition: ε-moves
  The machine can move from one state to another without reading input

Deterministic and Nondeterministic Automata
  Deterministic Finite Automata (DFA)
    One transition per input per state
    No ε-moves
  Nondeterministic Finite Automata (NFA)
    Can have multiple transitions for one input in a given state
    Can have ε-moves

Execution of Finite Automata
  A DFA can take only one path through the state graph
    Completely determined by input
  NFAs can choose
    Whether to make ε-moves
    Which of multiple transitions for a single input to take

Acceptance of NFAs
  An NFA can get into multiple states
  [state graph and example input not transcribed]
  Rule: an NFA accepts if it can get to a final state

NFA vs. DFA (1)
  NFAs and DFAs recognize the same set of languages (regular languages)
  DFAs are faster to execute
    There are no choices to consider
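One way to make "an NFA can get into multiple states" concrete is to track the set of states the machine could be in after each input symbol. The sketch below assumes a dictionary encoding of transitions and uses `None` as the label for ε-moves; the three-state example NFA (strings over {0,1} that end in 1) is hypothetical and is not the machine drawn on the slide.

```python
EPSILON = None  # label used here for epsilon-moves (an arbitrary choice)


def epsilon_closure(states, transitions):
    """All states reachable from `states` using only epsilon-moves."""
    stack = list(states)
    closure = set(states)
    while stack:
        state = stack.pop()
        for nxt in transitions.get((state, EPSILON), ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return frozenset(closure)


def nfa_accepts(start, accepting, transitions, text):
    """Accept if some choice of moves ends in an accepting state."""
    current = epsilon_closure({start}, transitions)
    for symbol in text:
        moved = set()
        for state in current:
            moved.update(transitions.get((state, symbol), ()))
        current = epsilon_closure(moved, transitions)
    return any(state in accepting for state in current)


# Hypothetical NFA: an epsilon-move from S to A, then a nondeterministic
# choice on input 1 between staying in A and moving to the final state B.
TRANSITIONS = {
    ("S", EPSILON): {"A"},
    ("A", "0"): {"A"},
    ("A", "1"): {"A", "B"},
}
```

Following every choice at once is what removes the need to guess, and it is the same idea the NFA to DFA conversion relies on.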

NFA vs. DFA (2)
  For a given language the NFA can be simpler than the DFA
  [NFA and DFA state graphs not transcribed]
  The DFA can be exponentially larger than the NFA

Regular Expressions to Finite Automata
  High-level sketch
    Lexical Specification => Regular expressions => NFA => DFA => Table-driven Implementation of DFA

Regular Expressions to NFA (1)
  For each kind of rexp, define an NFA
    Notation: NFA for rexp M
  For ε
  For input a
  [construction diagrams not transcribed]

Regular Expressions to NFA (2)
  For AB
  For A + B
  [construction diagrams not transcribed]

Regular Expressions to NFA (3)
  For A*
  [construction diagram not transcribed]

Example of RegExp -> NFA conversion
  Consider the regular expression (1+0)*1
  The NFA is
  [state graph with states A, B, C, D, E, F, G, H, I, J not transcribed]
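Since the construction diagrams did not survive transcription, here is a hedged sketch of one standard way to build an NFA per kind of regular expression (a Thompson-style construction). The exact ε-edges on the original slides may differ; the `Fragment` class, the function names, and the numeric state ids are all assumptions made for this example.

```python
import itertools

EPSILON = None               # label used here for epsilon-moves
_ids = itertools.count()     # source of fresh state numbers


class Fragment:
    """An NFA piece with one start state and one accepting state."""
    def __init__(self, start, accept, edges):
        self.start, self.accept = start, accept
        self.edges = edges   # list of (source, label, destination)


def symbol(a):               # NFA for input a
    s, f = next(_ids), next(_ids)
    return Fragment(s, f, [(s, a, f)])


def concat(m1, m2):          # NFA for AB: link A's accept to B's start
    return Fragment(m1.start, m2.accept,
                    m1.edges + m2.edges + [(m1.accept, EPSILON, m2.start)])


def union(m1, m2):           # NFA for A + B: new start and accept states
    s, f = next(_ids), next(_ids)
    return Fragment(s, f, m1.edges + m2.edges + [
        (s, EPSILON, m1.start), (s, EPSILON, m2.start),
        (m1.accept, EPSILON, f), (m2.accept, EPSILON, f)])


def star(m):                 # NFA for A*: allow zero or more passes through A
    s, f = next(_ids), next(_ids)
    return Fragment(s, f, m.edges + [
        (s, EPSILON, m.start), (s, EPSILON, f),
        (m.accept, EPSILON, m.start), (m.accept, EPSILON, f)])


def accepts(m, text):
    """Simulate the fragment by tracking the set of reachable states."""
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for src, label, dst in m.edges:
                if src == q and label is EPSILON and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen
    current = closure({m.start})
    for ch in text:
        current = closure({dst for src, label, dst in m.edges
                           if src in current and label == ch})
    return m.accept in current


# The slide's example expression (1+0)*1, built bottom-up.
EXAMPLE = concat(star(union(symbol("1"), symbol("0"))), symbol("1"))
```

Built this way, the example has ten states, the same count as the lettered states in the slide's example, though the correspondence between the numeric ids here and those letters is not claimed.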

NFA to DFA: The Trick
  Simulate the NFA
  Each state of the DFA = a non-empty subset of states of the NFA
  Start state = the set of NFA states reachable through ε-moves from the NFA start state
  Add a transition S --a--> S' to the DFA iff
    S' is the set of NFA states reachable from any state in S after seeing the input a, considering ε-moves as well

NFA to DFA. Remark
  An NFA may be in many states at any time
  How many different states?
  If there are N states, the NFA must be in some subset of those N states
  How many subsets are there?
    2^N - 1 = finitely many

NFA -> DFA Example
  [state graph not transcribed]
  NFA states: A through J
  DFA states: ABCDHI, FGHIABCD, EJGHIABCD

Implementation
  A DFA can be implemented by a 2D table T
    One dimension is states
    Other dimension is input symbol
    For every transition Si --a--> Sk define T[i,a] = k
  DFA execution
    If in state Si and input a, read T[i,a] = k and skip to state Sk
    Very efficient

Table Implementation of a DFA
  [state graph with states S, T, U not transcribed]

         0    1
    S    T    U
    T    T    U
    U    T    U

Implementation (Cont.)
  NFA -> DFA conversion is at the heart of tools such as flex
  But, DFAs can be huge
  In practice, flex-like tools trade off speed for space in the choice of NFA and DFA representations
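The subset construction and the table T can be sketched together. The code below is a minimal illustration under assumptions: the NFA is given as a dictionary, `None` labels ε-moves, DFA states are numbered in the order they are discovered, and the small example NFA (strings over {0,1} ending in 1) is hypothetical rather than the ten-state machine from the slides.

```python
EPSILON = None  # label used here for epsilon-moves


def closure(states, nfa):
    """NFA states reachable from `states` through epsilon-moves."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for nxt in nfa.get((q, EPSILON), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return frozenset(seen)


def subset_construction(nfa, start, accepting, alphabet):
    """Return (table, accepting_rows) where table[i][a] = k."""
    start_set = closure({start}, nfa)
    index = {start_set: 0}   # DFA state (set of NFA states) -> row number
    table = []
    worklist = [start_set]
    while worklist:
        current = worklist.pop(0)   # FIFO keeps rows in index order
        row = {}
        for a in alphabet:
            moved = set()
            for q in current:
                moved.update(nfa.get((q, a), ()))
            target = closure(moved, nfa)
            if not target:
                continue            # only non-empty subsets are DFA states
            if target not in index:
                index[target] = len(index)
                worklist.append(target)
            row[a] = index[target]
        table.append(row)
    accepting_rows = {i for s, i in index.items() if s & accepting}
    return table, accepting_rows


def run(table, accepting_rows, text):
    """Table-driven DFA execution: one lookup per input character."""
    state = 0
    for a in text:
        if a not in table[state]:
            return False
        state = table[state][a]
    return state in accepting_rows


# Hypothetical NFA with states S, A, B; B is the only final state.
NFA = {("S", EPSILON): {"A"}, ("A", "0"): {"A"}, ("A", "1"): {"A", "B"}}
TABLE, ACCEPTING = subset_construction(NFA, "S", {"B"}, "01")
```

Only subsets that are actually reachable get a row, so the table is often far smaller than the 2^N - 1 bound; here the three-state NFA yields three DFA states rather than seven.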