Implementation of Lexical Analysis

Prof. Aiken, CS 143, Lecture 4

Written Assignments
- WA1 assigned today
- Due in one week, 11:59pm
- Electronic hand-in

Tips on Building Large Systems
- KISS (Keep It Simple, Stupid!)
- Don't optimize prematurely
- Design systems that can be tested
- It is easier to modify a working system than to get a system working

Outline
- Specifying lexical structure using regular expressions
- Finite automata
  - Deterministic Finite Automata (DFAs)
  - Non-deterministic Finite Automata (NFAs)
- Implementation of regular expressions: RegExp => NFA => DFA => Tables

Notation
- There is variation in regular expression notation
- Union: A + B ≡ A | B
- Option: A + ε ≡ A?
- Range: 'a' + 'b' + … + 'z' ≡ [a-z]
- Excluded range: complement of [a-z] ≡ [^a-z]

Regular Expressions in Lexical Specification
- Last lecture: a specification for the predicate $s \in L(R)$
- But a yes/no answer is not enough!
- Instead: partition the input into tokens
- We adapt regular expressions to this goal

Regular Expressions => Lexical Spec. (1)
1. Write a rexp for the lexemes of each token
   - Number = digit^+ (one or more digits)
   - Keyword = 'if' + 'else' + …
   - Identifier = letter (letter + digit)*
   - OpenPar = '('
   - …

Regular Expressions => Lexical Spec. (2)
2. Construct R, matching all lexemes for all tokens
   $R = \text{Keyword} + \text{Identifier} + \text{Number} + \dots = R_1 + R_2 + \dots$
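As a rough illustration of steps 1 and 2, the token rules can be written with Python's `re` module and combined into a single expression R. This is a minimal sketch: the rule names come from the lecture, while the ASCII-only character classes for letter and digit and the helper name `in_language` are assumptions made here.

```python
import re

# One (token name, pattern) pair per rule, in priority order.
# Python writes union as "|" where the lecture writes "+".
RULES = [
    ("Keyword",    r"if|else"),
    ("Identifier", r"[A-Za-z][A-Za-z0-9]*"),   # letter (letter + digit)*
    ("Number",     r"[0-9]+"),                 # one or more digits
    ("OpenPar",    r"\("),
]

# R = R_1 + R_2 + ... : the union of all token rules.
R = re.compile("|".join(f"(?:{pattern})" for _, pattern in RULES))


def in_language(s):
    """Yes/no predicate: is the whole string s in L(R)?"""
    return R.fullmatch(s) is not None
```

This only answers the yes/no membership question for a single string; splitting a longer input into tokens needs the extra rules the lecture introduces next.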

Regular Expressions => Lexical Spec. (3)
3. Let input be $x_1 \ldots x_n$. For $1 \le i \le n$ check $x_1 \ldots x_i \in L(R)$
4. If success, then we know that $x_1 \ldots x_i \in L(R_j)$ for some $j$
5. Remove $x_1 \ldots x_i$ from input and go to (3)

Ambiguities (1)
- There are ambiguities in the algorithm
- How much input is used? What if $x_1 \ldots x_i \in L(R)$ and also $x_1 \ldots x_K \in L(R)$?
- Rule: Pick longest possible string in L(R)
  - The "maximal munch"

Ambiguities (2)
- Which token is used? What if $x_1 \ldots x_i \in L(R_j)$ and also $x_1 \ldots x_i \in L(R_k)$?
- Rule: use rule listed first ($j$ if $j < k$)
  - Treats "if" as a keyword, not an identifier

Error Handling
- What if no rule matches a prefix of input?
- Problem: Can't just get stuck
- Solution:
  - Write a rule matching all "bad" strings
  - Put it last (lowest priority)
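Steps 3 to 5, together with the two disambiguation rules and the catch-all error rule, can be sketched in Python as follows. The `Whitespace` rule, the single-character `Error` rule, and the function name `tokenize` are assumptions of this sketch, not part of the lecture. It also assumes each rule's pattern is written so that Python's match at a position is the longest one for that rule, which holds for the patterns used here.

```python
import re

# Rules in priority order; the catch-all Error rule is listed last.
RULES = [
    ("Keyword",    re.compile(r"if|else")),
    ("Identifier", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("Number",     re.compile(r"[0-9]+")),
    ("OpenPar",    re.compile(r"\(")),
    ("Whitespace", re.compile(r"\s+")),            # assumed extra rule
    ("Error",      re.compile(r".", re.DOTALL)),   # matches any one "bad" character
]


def tokenize(text):
    """Split text into (token name, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Maximal munch: only a strictly longer match replaces the current best,
            # so on a tie the rule listed first keeps priority.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        # The Error rule always matches one character, so best_len >= 1 here.
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens
```

With these rules, `if` is reported as a Keyword because of rule order, while `iffy` becomes a single Identifier because the longer match wins.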

Summary
- Regular expressions provide a concise notation for string patterns
- Use in lexical analysis requires small extensions
  - To resolve ambiguities
  - To handle errors
- Good algorithms known
  - Require only single pass over the input
  - Few operations per character (table lookup)

Finite Automata
- Regular expressions = specification
- Finite automata = implementation
- A finite automaton consists of
  - An input alphabet Σ
  - A set of states S
  - A start state n
  - A set of accepting states $F \subseteq S$
  - A set of transitions $\text{state} \xrightarrow{\text{input}} \text{state}$

Finite Automata
- Transition: $s_1 \xrightarrow{a} s_2$
- Is read: in state $s_1$ on input "a" go to state $s_2$
- If end of input and in accepting state => accept
- Otherwise => reject

Finite Automata State Graphs
[Figure: legend of state-graph symbols: a state, the start state, an accepting state, a transition labeled a]

Simple Example
- A finite automaton that accepts only "1"
[Figure: state graph]

Another Simple Example
- A finite automaton accepting any number of 1's followed by a single 0
- Alphabet: {0,1}
[Figure: state graph]

And Another Example
- Alphabet {0,1}
- What language does this recognize?
[Figure: state graph]

Epsilon Moves
- Another kind of transition: ε-moves
- Machine can move from state A to state B without reading input
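The first two example automata can be written down as transition maps and run directly. The sketch below is one possible encoding in Python; the state names and the `accepts` helper are hypothetical, and a missing transition is treated as a reject.

```python
def accepts(automaton, text):
    """Run a deterministic automaton given as (transitions, start, accepting)."""
    transitions, start, accepting = automaton
    state = start
    for ch in text:
        if (state, ch) not in transitions:
            return False              # stuck: no transition for this input
        state = transitions[(state, ch)]
    return state in accepting         # end of input: accept iff in an accepting state


# Accepts only the string "1".
ONLY_ONE = ({("start", "1"): "done"}, "start", {"done"})

# Accepts any number of 1's followed by a single 0.
ONES_THEN_ZERO = (
    {("loop", "1"): "loop", ("loop", "0"): "done"},
    "loop",
    {"done"},
)
```

For instance, `accepts(ONES_THEN_ZERO, "1110")` holds, while `"100"` is rejected because the accepting state has no outgoing transition.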

Deterministic and Nondeterministic Automata
- Deterministic Finite Automata (DFA)
  - One transition per input per state
  - No ε-moves
- Nondeterministic Finite Automata (NFA)
  - Can have multiple transitions for one input in a given state
  - Can have ε-moves

Execution of Finite Automata
- A DFA can take only one path through the state graph
  - Completely determined by input
- NFAs can choose
  - Whether to make ε-moves
  - Which of multiple transitions for a single input to take

Acceptance of NFAs
- An NFA can get into multiple states
[Figure: NFA state graph and sample input]
- Rule: NFA accepts if it can get to a final state

NFA vs. DFA (1)
- NFAs and DFAs recognize the same set of languages (regular languages)
- DFAs are faster to execute
  - There are no choices to consider
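One way to make the NFA acceptance rule concrete is to track the whole set of states the machine could be in. The Python sketch below does this for a small NFA invented for illustration (it is not the one from the lecture's figure); the empty string is used as the label for ε-moves, which is a convention of this sketch.

```python
EPS = ""  # label for epsilon-moves (a convention of this sketch)


def eps_closure(states, delta):
    """All states reachable from `states` using only epsilon-moves."""
    closure = set(states)
    todo = list(states)
    while todo:
        s = todo.pop()
        for t in delta.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                todo.append(t)
    return closure


def nfa_accepts(delta, start, accepting, text):
    """Accept iff some sequence of choices ends in an accepting state."""
    current = eps_closure({start}, delta)
    for ch in text:
        moved = set()
        for s in current:
            moved |= set(delta.get((s, ch), ()))
        current = eps_closure(moved, delta)
    return bool(current & accepting)


# Hypothetical NFA over {0,1} accepting strings that end in 1:
# q0 has two transitions on "1", and q1 reaches q2 by an epsilon-move.
DELTA = {
    ("q0", "0"): {"q0"},
    ("q0", "1"): {"q0", "q1"},
    ("q1", EPS): {"q2"},
}
ACCEPTING = {"q2"}
```

Because every choice is followed at once, the simulation never has to guess or backtrack; the cost is that each step works on a set of states rather than a single one.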

NFA vs. DFA (2)
- For a given language NFA can be simpler than DFA
[Figure: an NFA and an equivalent DFA]
- DFA can be exponentially larger than NFA

Regular Expressions to Finite Automata
- High-level sketch:
  Lexical Specification -> Regular expressions -> NFA -> DFA -> Table-driven Implementation of DFA

Regular Expressions to NFA (1)
- For each kind of rexp, define an NFA
  - Notation: NFA for rexp M
- For ε
- For input a
[Figure: NFA fragments for M, ε, and a]

Regular Expressions to NFA (2)
- For AB
- For A + B
[Figure: NFA fragments for AB and A + B]
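The four cases listed so far (ε, a single input symbol, AB, and A + B) can be sketched in Python in the style of the usual Thompson construction. The exact shape of the lecture's diagrams is not recoverable from the text, so the ε-edges chosen here are an assumption; all function names are hypothetical. Each fragment is a triple (start state, final state, transitions).

```python
from itertools import count

EPS = ""            # label for epsilon-moves (a convention of this sketch)
_fresh = count()    # source of fresh state numbers


def _add(delta, src, label, dst):
    delta.setdefault((src, label), set()).add(dst)


def _merge(d1, d2):
    merged = {k: set(v) for k, v in d1.items()}
    for k, v in d2.items():
        merged.setdefault(k, set()).update(v)
    return merged


def epsilon():
    s, f, delta = next(_fresh), next(_fresh), {}
    _add(delta, s, EPS, f)
    return s, f, delta


def symbol(a):
    s, f, delta = next(_fresh), next(_fresh), {}
    _add(delta, s, a, f)
    return s, f, delta


def concat(m1, m2):
    (s1, f1, d1), (s2, f2, d2) = m1, m2
    delta = _merge(d1, d2)
    _add(delta, f1, EPS, s2)          # final of A feeds the start of B
    return s1, f2, delta


def union(m1, m2):
    (s1, f1, d1), (s2, f2, d2) = m1, m2
    s, f = next(_fresh), next(_fresh)
    delta = _merge(d1, d2)
    for target in (s1, s2):
        _add(delta, s, EPS, target)   # choose A or B without reading input
    for source in (f1, f2):
        _add(delta, source, EPS, f)
    return s, f, delta


def accepts(m, text):
    """Simulate the fragment on text by tracking the set of reachable states."""
    start, final, delta = m

    def closure(states):
        seen, todo = set(states), list(states)
        while todo:
            for t in delta.get((todo.pop(), EPS), ()):
                if t not in seen:
                    seen.add(t)
                    todo.append(t)
        return seen

    current = closure({start})
    for ch in text:
        current = closure({t for s in current for t in delta.get((s, ch), ())})
    return final in current
```

For example, `concat(union(symbol("1"), symbol("0")), symbol("1"))` builds a fragment for (1+0)1, which accepts exactly "11" and "01".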

Regular Expressions to NFA (3)
- For A*
[Figure: NFA fragment for A*]

Example of RegExp -> NFA conversion
- Consider the regular expression (1+0)*1
- The NFA is
[Figure: NFA with states A, B, C, D, E, F, G, H, I, J]

NFA to DFA: The Trick
- Simulate the NFA
- Each state of DFA = a non-empty subset of states of the NFA
- Start state = the set of NFA states reachable through ε-moves from NFA start state
- Add a transition $S \xrightarrow{a} S'$ to DFA iff $S'$ is the set of NFA states reachable from any state in $S$ after seeing the input a, considering ε-moves as well

NFA to DFA. Remark
- An NFA may be in many states at any time
- How many different states?
  - If there are N states, the NFA must be in some subset of those N states
  - How many subsets are there? $2^N - 1$ = finitely many
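The subset construction described above can be sketched in Python as follows. The NFA used here is a reconstruction, not a transcription: the lecture's figure is missing from the text, so the edges below are an assumption, chosen for the regular expression (1+0)*1 with state names A through J. The empty string labels ε-moves, and all function names are hypothetical.

```python
EPS = ""  # label for epsilon-moves (a convention of this sketch)

# Assumed NFA for (1+0)*1 with states A..J (edges reconstructed, see lead-in).
NFA = {
    ("A", EPS): {"B", "H"},
    ("B", EPS): {"C", "D"},
    ("C", "1"): {"E"},
    ("D", "0"): {"F"},
    ("E", EPS): {"G"},
    ("F", EPS): {"G"},
    ("G", EPS): {"A", "H"},
    ("H", EPS): {"I"},
    ("I", "1"): {"J"},
}
NFA_START = "A"
NFA_ACCEPTING = {"J"}


def eps_closure(states):
    """NFA states reachable from `states` through epsilon-moves only."""
    seen, todo = set(states), list(states)
    while todo:
        s = todo.pop()
        for t in NFA.get((s, EPS), ()):
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return frozenset(seen)


def subset_construction(alphabet=("0", "1")):
    """Build the DFA whose states are non-empty sets of NFA states."""
    start = eps_closure({NFA_START})
    transitions = {}
    seen, todo = {start}, [start]
    while todo:
        S = todo.pop()
        for a in alphabet:
            moved = {t for s in S for t in NFA.get((s, a), ())}
            if not moved:
                continue                  # only non-empty subsets become DFA states
            T = eps_closure(moved)
            transitions[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    accepting = {S for S in seen if S & NFA_ACCEPTING}
    return start, transitions, seen, accepting
```

Only subsets that are actually reachable are generated, which is why the result is usually far smaller than the $2^N - 1$ upper bound.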

NFA -> DFA Example
[Figure: the NFA with states A through J and the resulting DFA with states ABCDHI, FGHIABCD, and EJGHIABCD]

Implementation
- A DFA can be implemented by a 2D table T
  - One dimension is "states"
  - Other dimension is "input symbol"
  - For every transition $S_i \xrightarrow{a} S_k$ define T[i,a] = k
- DFA "execution"
  - If in state $S_i$ and input a, read T[i,a] = k and skip to state $S_k$
  - Very efficient

Table Implementation of a DFA
[Figure: DFA with states S, T, U]

     0   1
S    T   U
T    T   U
U    T   U

Implementation (Cont.)
- NFA -> DFA conversion is at the heart of tools such as flex
- But, DFAs can be huge
- In practice, flex-like tools trade off speed for space in the choice of NFA and DFA representations
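The table-driven execution described above can be sketched in a few lines of Python. The table entries follow the S, T, U table from the lecture; the column labels 0 and 1, the choice of S as start state, and U as the only accepting state are assumptions of this sketch, since the figure is not part of the text.

```python
# T[state][input symbol] = next state
TABLE = {
    "S": {"0": "T", "1": "U"},
    "T": {"0": "T", "1": "U"},
    "U": {"0": "T", "1": "U"},
}
START = "S"          # assumed start state
ACCEPTING = {"U"}    # assumed accepting state


def run_dfa(text):
    """One table lookup per input character, then check the final state."""
    state = START
    for ch in text:
        state = TABLE[state][ch]   # raises KeyError for symbols outside {0,1}
    return state in ACCEPTING
```

Under these assumptions the machine accepts exactly the strings over {0,1} that end in 1. A nested dictionary keeps the sketch readable; a generated scanner would more likely use a dense array indexed by state number and character code.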