Lexical Analysis. Prof. James L. Frankel Harvard University

Similar documents
COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

CSE302: Compiler Design

Finite automata. We have looked at using Lex to build a scanner on the basis of regular expressions.

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata

2. Lexical Analysis! Prof. O. Nierstrasz!

CS308 Compiler Principles Lexical Analyzer Li Jiang

[Lexical Analysis] Bikash Balami

Lexical Analysis - 2

Front End: Lexical Analysis. The Structure of a Compiler

Zhizheng Zhang. Southeast University

UNIT II LEXICAL ANALYSIS

COMPILER DESIGN UNIT I LEXICAL ANALYSIS. Translator: It is a program that translates one language to another Language.

David Griol Barres Computer Science Department Carlos III University of Madrid Leganés (Spain)

Lexical Analyzer Scanner

We use L i to stand for LL L (i times). It is logical to define L 0 to be { }. The union of languages L and M is given by

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

Lexical Analyzer Scanner

Lexical Analysis. Chapter 2

Lexical Analysis. Lecture 3-4

Lexical Analysis. Implementation: Finite Automata

Lexical Analysis. Lecture 2-4

Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions

LANGUAGE TRANSLATORS

Chapter Seven: Regular Expressions

Decision, Computation and Language

Chapter Seven: Regular Expressions. Formal Language, chapter 7, slide 1

PRINCIPLES OF COMPILER DESIGN UNIT II LEXICAL ANALYSIS 2.1 Lexical Analysis - The Role of the Lexical Analyzer

1. INTRODUCTION TO LANGUAGE PROCESSING The Language Processing System can be represented as shown figure below.

Lexical Analysis (ASU Ch 3, Fig 3.1)

Lecture 4: Syntax Specification

Languages and Compilers

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

6 NFA and Regular Expressions

Module 6 Lexical Phase - RE to DFA

Lexical Analysis/Scanning

Lexical Analysis. Sukree Sinthupinyo July Chulalongkorn University

ECS 120 Lesson 7 Regular Expressions, Pt. 1

Figure 2.1: Role of Lexical Analyzer

Outline. 1 Scanning Tokens. 2 Regular Expresssions. 3 Finite State Automata

Lexical Analysis. Lecture 3. January 10, 2018

Converting a DFA to a Regular Expression JP

Implementation of Lexical Analysis

Regular Languages and Regular Expressions

Lexical Analysis 1 / 52

Regular Languages. MACM 300 Formal Languages and Automata. Formal Languages: Recap. Regular Languages

CSE450. Translation of Programming Languages. Lecture 20: Automata and Regular Expressions

CS 432 Fall Mike Lam, Professor. Finite Automata Conversions and Lexing

CSE302: Compiler Design

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Lexical Analysis. Introduction

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

UNIT I- LEXICAL ANALYSIS. 1.Interpreter: It is one of the translators that translate high level language to low level language.

Introduction to Lexical Analysis

Formal Languages and Automata

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications


Theory of Computations Spring 2016 Practice Final Exam Solutions

CSE450. Translation of Programming Languages. Automata, Simple Language Design Principles

Lexical Analysis. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

2010: Compilers REVIEW: REGULAR EXPRESSIONS HOW TO USE REGULAR EXPRESSIONS

Formal Languages and Compilers Lecture VI: Lexical Analysis

J. Xue. Tutorials. Tutorials to start in week 3 (i.e., next week) Tutorial questions are already available on-line

Dr. D.M. Akbar Hussain

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

Midterm I review. Reading: Chapters 1-4

Finite Automata. Dr. Nadeem Akhtar. Assistant Professor Department of Computer Science & IT The Islamia University of Bahawalpur

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

ON THE MECHANIZATION OF KLEENE ALGEBRA IN FORMAL LANGUAGE ANALYZER. Dmitry Cheremisinov

CS402 - Theory of Automata Glossary By

CSE Lecture 4: Scanning and parsing 28 Jan Nate Nystrom University of Texas at Arlington

SEM / YEAR : VI / III CS2352 PRINCIPLES OF COMPLIERS DESIGN UNIT I - LEXICAL ANALYSIS PART - A

CS415 Compilers. Lexical Analysis

CSE 431S Scanning. Washington University Spring 2013

CSc 453 Lexical Analysis (Scanning)

Midterm Exam II CIS 341: Foundations of Computer Science II Spring 2006, day section Prof. Marvin K. Nakayama

Regular Languages. Regular Language. Regular Expression. Finite State Machine. Accepts

Implementation of Lexical Analysis. Lecture 4

1. (10 points) Draw the state diagram of the DFA that recognizes the language over Σ = {0, 1}

Conversion of Fuzzy Regular Expressions to Fuzzy Automata using the Follow Automata

Compiler Design. 2. Regular Expressions & Finite State Automata (FSA) Kanat Bolazar January 21, 2010

Compiler Construction

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

1. Lexical Analysis Phase

Non-deterministic Finite Automata (NFA)

Lexical Analysis. Chapter 1, Section Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual

CSE 105 THEORY OF COMPUTATION

CSE 105 THEORY OF COMPUTATION

1 Finite Representations of Languages

Formal Languages. Formal Languages

Lexical Analysis. Finite Automata

UNIT -2 LEXICAL ANALYSIS

Torben./Egidius Mogensen. Introduction. to Compiler Design. ^ Springer

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

Formal languages and computation models

Compiler course. Chapter 3 Lexical Analysis

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

Implementation of Lexical Analysis

Implementation of Lexical Analysis

Transcription:

Lexical Analysis Prof. James L. Frankel Harvard University Version of 5:37 PM 30-Jan-2018 Copyright 2018, 2016, 2015 James L. Frankel. All rights reserved.

Regular Expression Notation We will develop a notation called regular expressions to describe languages that can be built from an alphabet (letters, numbers, and other symbols) with certain operations applied potentially recursively applied Operators are Union designated by infix (vertical bar) Concatenation designated by sequential subexpressions Closure designated by a postfix superscript of either * or +

Alphabet & Strings An alphabet is any finite set of symbols A string over an alphabet is a finite sequence of symbols drawn from that alphabet s is the length of string s, i.e., the number of symbols in string s ε is the string of length zero, i.e., the empty string

Language, Union, and Concatenation Assume L is a language L is any countable set of strings over some fixed alphabet Includes the empty set, syntactically well-formed C programs, all grammatically correct English sentences This definition of language does not ascribe any meaning to strings in the language Union of languages Identical to the set operation of union A language formed from the union of two languages is composed of all strings in either the first language or the second language Concatenation of languages A language formed from the concatenation of two languages is composed of all strings formed by taking a string from the first language and a string from the second language, in all possible ways, and concatenating them

Closure Kleene Closure or, simply, Closure The set of strings that can be created by concatenating L zero or more times Denoted by L * L 0 (i.e., concatenation of L zero times) is defined to be { ε } L i, for i > 0, is defined to be L i-1 L Positive Closure Same as the Kleene Closure, but without L 0 The set of strings that can be created by concatenating L one or more times Denoted by L + ε is not in L + unless ε is in L itself

Regular Expression Σ is the alphabet L(r), where r is a regular expression, is the language denoted by r ε is a regular expression L(ε) is {ε}, that is, the language whose sole member is the empty string If a is a symbol in Σ, then a is a regular expression and L(a) = {a} For regular expressions r and s, (r) is a regular expression denoting L(r) (r) (s) is a regular expression denoting L(r) U L(s) (r)(s) is a regular expression denoting L(r)L(s) (r)* is a regular expression denoting (L(r))*

Precedence of Regular Expression Operators Highest precedence: Closure (or unary *) Next highest precedence: Concatenation Lowest precedence: Union or Alternation (or ) All operators are left associative

Algebraic Laws for Regular Expressions r s = s r is commutative r (s t) = (r s) t is associative r(st) = (rs)t concatenation is associative r(s t) = rs rt concatenation distributes over (s t)r = sr tr concatenation distributes over εr = rε = r ε is the identity for concatenation r* = (r ε)* ε is guaranteed in a closure r** = r* * is idempotent

Transition Diagrams Letter or digit * start letter other return( ) start Designated start state Accepting (final) state Edges labeled with symbol or set of symbols * Retract one position (symbol)

Nondeterministic Finite Automata (NFA) A finite set of states S. A set of input symbols Σ, the input alphabet. We assume that ε, which stands for the empty string, is never a member of Σ. A transition function that gives, for each state, and for each symbol in Σ U {ε} a set of next states. A state s 0 from S that is distinguished as the start state (or initial state). A set of states F, a subset of S, that is distinguished as the accepting states (or final states).

Nondeterministic Finite Automata (NFA) compared to Deterministic Finite Automata (DFA) NFA No restrictions on the labels on edges The same symbol can label several edges out of the same state ε, the empty string, is also a possible label DFA For each state, and for each symbol of its input alphabet, there can be exactly one edge with that symbol leaving that state No edge can be labeled with ε, the empty string Either an NFA or a DFA can be represented by a transition graph

Construction of an NFA from a Regular Expression Apply the McNaughton-Yamada-Thompson algorithm Apply the algorithm on constituent subexpressions

Construction of an NFA: expression ε For regular expression r = ε start i ε f

Construction of an NFA: expression a in Σ For regular expression r = a (in Σ) start i a f

Construction of an NFA: expression with union (that is, ) For regular expression r = s t N(s) and N(t) are NFAs for regular expressions s and t, respectively N(s) ε ε start i f ε ε N(t)

Construction of an NFA: expression with concatenation For regular expression r = st N(s) and N(t) are NFAs for regular expressions s and t, respectively start i N(s) N(t) f

Construction of an NFA: expression with closure (that is, *) For regular expression r = s* N(s) is an NFA for regular expression s ε start i ε N(s) ε f ε

Subset Construction of a DFA from an NFA We refer to the NFA as N and to the DFA as D D s states will be called Dstates D s transitions will be encoded in a transition table Dtran Each state of D is a set of NFA states

Subset Construction of a DFA from an NFA s refers to a single state of N T refers to a set of states of N ε-closure(s) is the set of NFA states reachable from NFA state s on ε- transitions alone ε-closure(t) is the set of NFA states reachable from some NFA state s in set T on ε-transitions alone move(t, a) is the set of NFA states to which there is a transition on input symbol a from some state in T

Computing ε-closure(t) push all states of T onto stack; initialize ε-closure(t) to T; while ( stack is not empty ) { pop t, the top element, off stack; for ( each state u with an edge from t to u labeled ε ) if ( u is not in ε-closure(t) ) { add u to ε-closure(t); push u onto stack; } }

Subset construction initially, ε-closure(s 0 ) is the only state in Dstates, and it is unmarked; while ( there is an unmarked state T in Dstates ) { mark T; for ( each input symbol a ) { U = ε-closure(move(t, a)); if ( U is not in Dstates) add U as an unmarked state in Dstates; Dtran[T, a] = U; } }

Example Construct an NFA from a regular expression Construct a DFA from an NFA