Implementation of Lexical Analysis

Implementation of Lexical Analysis Lecture 4 (Modified by Professor Vijay Ganesh)

Tips on Building Large Systems: KISS (Keep It Simple, Stupid!). Don't optimize prematurely. Design systems that can be tested. It is easier to modify a working system than to get a system working.

Outline: Specifying lexical structure using regular expressions. Finite automata: Deterministic Finite Automata (DFAs), Non-deterministic Finite Automata (NFAs). Implementation of regular expressions: RegExp => NFA => DFA => Tables.

Notation: There is variation in regular expression notation. Union: A | B vs. A + B. Option: A + ε vs. A?. Range: 'a' + 'b' + ... + 'z' vs. [a-z]. Excluded range: the complement of [a-z], written [^a-z].
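
As a quick illustration of the bracketed forms, here is a small check using Python's re module; these toy patterns are my own examples, not the lecture's:

    import re

    # Union: "if" or "else" (written if + else on these slides, if|else in most tools)
    assert re.fullmatch(r"if|else", "else")
    # Option: A? matches A or the empty string
    assert re.fullmatch(r"colou?r", "color")
    # Range: [a-z] abbreviates 'a' + 'b' + ... + 'z'
    assert re.fullmatch(r"[a-z]+", "lexer")
    # Excluded range: [^a-z] matches any character outside a-z
    assert re.fullmatch(r"[^a-z]+", "42")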

Regular Expressions in Lexical Specification: Last lecture: a specification for the predicate s ∈ L(R). But a yes/no answer is not enough! Instead: partition the input into tokens. We adapt regular expressions to this goal.

Regular Expressions => Lexical Spec. (1): 1. Write a rexp for the lexemes of each token: Number = digit+, Keyword = 'if' + 'else' + ..., Identifier = letter (letter + digit)*, OpenPar = '(', ...

Regular Expressions => Lexical Spec. (2): 2. Construct R, matching all lexemes for all tokens: R = Keyword + Identifier + Number + ... = R1 + R2 + ...

Regular Expressions => Lexical Spec. (3): 3. Let the input be x1...xn. For 1 ≤ i ≤ n, check x1...xi ∈ L(R). 4. If success, then we know that x1...xi ∈ L(Rj) for some j. 5. Remove x1...xi from the input and go to (3).
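
A minimal sketch of steps 3-5 in Python, with re.fullmatch standing in for the membership test x1...xi ∈ L(R); the combined pattern R below is an illustrative assumption, not the lecture's:

    import re

    # Stand-in for R = Keyword + Identifier + Number + OpenPar
    R = re.compile(r"if|else|[a-zA-Z][a-zA-Z0-9]*|[0-9]+|\(")

    def lex_prefixes(text):
        """Steps 3-5: find a prefix x1..xi in L(R), emit it, remove it, repeat."""
        tokens = []
        while text:
            for i in range(1, len(text) + 1):     # step 3: check x1..xi in L(R)
                if R.fullmatch(text[:i]):
                    tokens.append(text[:i])       # step 4: some Rj matched this prefix
                    text = text[i:]               # step 5: remove x1..xi and go to (3)
                    break
            else:
                raise ValueError("no prefix matches")   # error handling: see a later slide
        return tokens

    # Taking the *first* i that works gives ['i', 'f', '(', 'x', '1'] for "if(x1":
    # exactly the "how much input is used?" ambiguity the next slides resolve.
    print(lex_prefixes("if(x1"))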

Ambiguities (1): There are ambiguities in the algorithm. How much input is used? What if x1...xi ∈ L(R) and also x1...xK ∈ L(R)? Rule: pick the longest possible string in L(R), the "maximal munch".

Ambiguities (2): Which token is used? What if x1...xi ∈ L(Rj) and also x1...xi ∈ L(Rk)? Rule: use the rule listed first (j if j < k). This treats 'if' as a keyword, not an identifier.

Error Handling: What if no rule matches a prefix of the input? Problem: we can't just get stuck. Solution: write a rule matching all "bad" strings and put it last (lowest priority).
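
Putting the two disambiguation rules and the error catch-all together, a minimal sketch (Python's re standing in for the Rj; the token names and patterns are illustrative assumptions):

    import re

    # Rules in priority order: earlier rules win ties, so "if" is a keyword, not an identifier.
    # The last rule matches any single character, so the lexer can never get stuck.
    RULES = [
        ("KEYWORD",    re.compile(r"if|else")),
        ("IDENTIFIER", re.compile(r"[a-zA-Z][a-zA-Z0-9]*")),
        ("NUMBER",     re.compile(r"[0-9]+")),
        ("OPENPAR",    re.compile(r"\(")),
        ("ERROR",      re.compile(r".", re.DOTALL)),   # lowest priority: all "bad" strings
    ]

    def lex(text):
        tokens = []
        while text:
            best_len, best_tok = 0, None
            for name, pattern in RULES:                 # listed-first rule wins ties
                m = pattern.match(text)
                if m and len(m.group()) > best_len:     # maximal munch: strictly longer wins
                    best_len, best_tok = len(m.group()), (name, m.group())
            tokens.append(best_tok)
            text = text[best_len:]
        return tokens

    # "if" lexes as a KEYWORD (rule order); "iffy42" as an IDENTIFIER (longest match).
    print(lex("if(iffy42"))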

Summary: Regular expressions provide a concise notation for string patterns. Their use in lexical analysis requires small extensions: to resolve ambiguities and to handle errors. Good algorithms are known that require only a single pass over the input and few operations per character (table lookup).

Finite Automata: Regular expressions = specification; finite automata = implementation. A finite automaton consists of: an input alphabet Σ, a set of states S, a start state n, a set of accepting states F ⊆ S, and a set of transitions of the form state --input--> state.

Finite Automata Transition: s1 --a--> s2 is read: in state s1, on input 'a', go to state s2. If at the end of input and in an accepting state => accept; otherwise => reject.
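
A minimal sketch of this accept/reject behaviour in Python, encoding an automaton as the tuple from the definition above (states, start state, accepting states, transitions); the concrete machine below accepts only the string "1" and is my own illustration:

    # States 0 (start) and 1 (accepting); alphabet {'0', '1'}.
    # Only the transition 0 --1--> 1 is defined; an undefined transition rejects.
    START = 0
    ACCEPTING = {1}
    TRANSITIONS = {(0, "1"): 1}

    def run_dfa(text):
        state = START
        for ch in text:
            if (state, ch) not in TRANSITIONS:   # no transition on this input: reject
                return False
            state = TRANSITIONS[(state, ch)]     # "in state s1 on input a go to state s2"
        return state in ACCEPTING                # end of input: accept iff in accepting state

    assert run_dfa("1") is True
    assert run_dfa("11") is False
    assert run_dfa("") is False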

Finite Automata State Graphs: the graphical notation for a state, the start state, an accepting state, and a transition on input 'a'. (Diagrams omitted in transcription.)

A Simple Example: A finite automaton that accepts only '1'. (Diagram omitted in transcription.)

Another Simple Example: A finite automaton accepting any number of 1's followed by a single 0. Alphabet: {0,1}. (Diagram omitted in transcription.)

And Another Example: Alphabet {0,1}. What language does this recognize? (State diagram omitted in transcription.)

Epsilon Moves: Another kind of transition: ε-moves. The machine can move from state A to state B without reading input. (Diagram omitted in transcription.)

Deterministic and Nondeterministic Automata: Deterministic Finite Automata (DFA): one transition per input per state; no ε-moves. Nondeterministic Finite Automata (NFA): can have multiple transitions for one input in a given state; can have ε-moves.

Execution of Finite Automata: A DFA can take only one path through the state graph, completely determined by the input. NFAs can choose: whether to make ε-moves, and which of multiple transitions for a single input to take.

Acceptance of NFAs: An NFA can get into multiple states. (Example NFA over {0,1} and its run on an input omitted in transcription.) Rule: the NFA accepts if it can get to a final state.
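
This "can be in several states at once" view translates directly into code: simulate the NFA with a set of current states, expanding through ε-moves after each step. A minimal sketch; the example machine (an NFA over {0,1} with one ε-move, accepting strings that end in 1) is my own illustration, not the slide's diagram:

    # (state, symbol) -> set of next states; the symbol None stands for an epsilon-move.
    TRANSITIONS = {
        ("A", "1"): {"A", "B"},      # on 1, stay in A or guess a move to B
        ("A", "0"): {"A"},
        ("B", None): {"C"},          # epsilon-move from B to C
    }
    START, ACCEPTING = "A", {"C"}

    def eps_closure(states):
        """All states reachable from `states` using only epsilon-moves."""
        stack, closure = list(states), set(states)
        while stack:
            s = stack.pop()
            for t in TRANSITIONS.get((s, None), set()) - closure:
                closure.add(t)
                stack.append(t)
        return closure

    def nfa_accepts(text):
        current = eps_closure({START})
        for ch in text:
            moved = set()
            for s in current:
                moved |= TRANSITIONS.get((s, ch), set())
            current = eps_closure(moved)         # the NFA is in *all* of these states
        return bool(current & ACCEPTING)         # accepts if it *can* reach a final state

    assert nfa_accepts("101") is True            # ends in {A, B, C}; C is accepting
    assert nfa_accepts("10") is False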

NFA vs. DFA (1): NFAs and DFAs recognize the same set of languages (the regular languages). DFAs are faster to execute: there are no choices to consider.

NFA vs. DFA (2): For a given language, the NFA can be simpler than the DFA. The DFA can be exponentially larger than the NFA. (Example NFA and DFA diagrams omitted in transcription.)

Regular Expressions to Finite Automata: High-level sketch: Lexical Specification -> Regular expressions -> NFA -> DFA -> Table-driven implementation of DFA.

Regular Expressions to NFA (1): For each kind of rexp, define an NFA. Notation: the NFA for rexp M is drawn as a box labeled M. Base cases: the NFA for ε and the NFA for a single input symbol 'a'. (Diagrams omitted in transcription.)

Regular Expressions to NFA (2): For AB: the machine for A followed by the machine for B. For A + B: a choice between the machine for A and the machine for B. (Diagrams omitted in transcription.)

Regular Expressions to NFA (3): For A*: the machine for A with ε-moves that allow repeating A or skipping it entirely. (Diagram omitted in transcription.)
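
A minimal sketch of this construction (Thompson's construction) in Python, building one small NFA fragment per case; the Fragment encoding and names are my own illustration, not code from the lecture:

    class Fragment:
        """An NFA piece with one start state, one accepting state, and a list of edges."""
        counter = 0
        def __init__(self):
            self.start, self.accept = Fragment.counter, Fragment.counter + 1
            Fragment.counter += 2
            self.edges = []                      # (state, symbol_or_None, state); None = epsilon

    def lit(a):                                  # NFA for a single input symbol a
        f = Fragment()
        f.edges.append((f.start, a, f.accept))
        return f

    def epsilon():                               # NFA for the empty string
        f = Fragment()
        f.edges.append((f.start, None, f.accept))
        return f

    def concat(A, B):                            # NFA for AB: run A, then B
        f = Fragment()
        f.edges = A.edges + B.edges + [(f.start, None, A.start),
                                       (A.accept, None, B.start),
                                       (B.accept, None, f.accept)]
        return f

    def union(A, B):                             # NFA for A + B: choose A or B via epsilon-moves
        f = Fragment()
        f.edges = A.edges + B.edges + [(f.start, None, A.start), (f.start, None, B.start),
                                       (A.accept, None, f.accept), (B.accept, None, f.accept)]
        return f

    def star(A):                                 # NFA for A*: repeat A, or skip it entirely
        f = Fragment()
        f.edges = A.edges + [(f.start, None, A.start), (A.accept, None, A.start),
                             (f.start, None, f.accept), (A.accept, None, f.accept)]
        return f

    # The (1+0)*1 example that follows:
    nfa = concat(star(union(lit("1"), lit("0"))), lit("1"))
    print(len(nfa.edges))                        # 14 edges in the resulting NFA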

Example of RegExp -> NFA conversion: Consider the regular expression (1+0)*1. The NFA has states A through J. (Diagram omitted in transcription.)

NFA to DFA: The Trick: Simulate the NFA. Each state of the DFA = a non-empty subset of states of the NFA. The start state = the set of NFA states reachable through ε-moves from the NFA start state. Add a transition S --a--> S' to the DFA iff S' is the set of NFA states reachable from any state in S after seeing the input a, considering ε-moves as well.

NFA to DFA. Remark: An NFA may be in many states at any time. How many different states? If there are N states, the NFA must be in some subset of those N states. How many non-empty subsets are there? 2^N - 1, i.e. finitely many.
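
A minimal sketch of the subset construction just described, using the same (state, symbol) -> set-of-states encoding as the NFA simulation earlier (an illustration under those assumptions, not the lecture's code):

    def eps_closure(transitions, states):
        stack, closure = list(states), set(states)
        while stack:
            s = stack.pop()
            for t in transitions.get((s, None), set()) - closure:
                closure.add(t)
                stack.append(t)
        return closure

    def nfa_to_dfa(transitions, start, accepting, alphabet):
        """Each DFA state is a frozenset of NFA states; build them on demand."""
        dfa_start = frozenset(eps_closure(transitions, {start}))
        dfa_trans, worklist, seen = {}, [dfa_start], {dfa_start}
        while worklist:
            S = worklist.pop()
            for a in alphabet:
                moved = set()
                for s in S:                                  # NFA states reachable from S on a
                    moved |= transitions.get((s, a), set())
                S2 = frozenset(eps_closure(transitions, moved))
                if not S2:
                    continue                                 # no transition out of S on a
                dfa_trans[(S, a)] = S2
                if S2 not in seen:
                    seen.add(S2)
                    worklist.append(S2)
        dfa_accepting = {S for S in seen if S & accepting}   # contains an accepting NFA state
        return dfa_start, dfa_accepting, dfa_trans

    # The small NFA used earlier (strings over {0,1} ending in 1):
    T = {("A", "1"): {"A", "B"}, ("A", "0"): {"A"}, ("B", None): {"C"}}
    start, accepting, trans = nfa_to_dfa(T, "A", {"C"}, {"0", "1"})
    print(len(trans))   # 4: two DFA states ({A} and {A,B,C}), two input symbols each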

NFA -> DFA Example: applying the construction to the NFA for (1+0)*1 (states A through J) yields a DFA whose states are the subsets {A,B,C,D,H,I}, {F,G,H,I,A,B,C,D}, and {E,J,G,H,I,A,B,C,D}. (Diagram omitted in transcription.)

Implementation: A DFA can be implemented by a 2D table T. One dimension is "states", the other dimension is "input symbol". For every transition Si --a--> Sk, define T[i,a] = k. DFA "execution": if in state Si on input a, read T[i,a] = k and skip to state Sk. Very efficient.

Table Implementation of a DFA (for the example DFA, with states S, T, U and inputs 0 and 1; state diagram omitted in transcription):

         0   1
    S    T   U
    T    T   U
    U    T   U
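
A minimal sketch of this table-driven execution using the S/T/U table above; treating U as the accepting state is my assumption (the state reached on input 1):

    # TABLE[state][input] = next state, exactly as in the table above.
    TABLE = {
        "S": {"0": "T", "1": "U"},
        "T": {"0": "T", "1": "U"},
        "U": {"0": "T", "1": "U"},
    }
    START, ACCEPTING = "S", {"U"}

    def run(text):
        state = START
        for ch in text:               # one table lookup per input character
            state = TABLE[state][ch]
        return state in ACCEPTING

    assert run("1") and run("0101")   # strings ending in 1 are accepted
    assert not run("10")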

Implementation (Cont.): NFA -> DFA conversion is at the heart of tools such as flex. But DFAs can be huge. In practice, flex-like tools trade off speed for space in the choice of NFA and DFA representations.