Definition of Regular Expression

After the definitions of strings and languages, we are ready to describe regular expressions, the notation we shall use to define the class of languages known as regular sets. Recall that a token is either a single string (such as a punctuation symbol) or one of a collection of strings of a certain type (such as an identifier). If we view the set of strings in each token class as a language, we can use the regular-expression notation to describe tokens.

In regular-expression notation we could write the definition for identifier as

Identifier = letter (letter | digit)*

The vertical bar means "or", that is, union; the parentheses are used to group subexpressions; and the star is the closure operator meaning "zero or more instances".

What we call the regular expressions over alphabet Σ are exactly those expressions that can be constructed from the following rules. Each regular expression denotes a language, and we give the rules for constructing the denoted languages along with the regular-expression construction rules.

1- ε is a regular expression denoting {ε}, that is, the language consisting only of the empty string.
2- For each a in Σ, a is a regular expression denoting {a}, the language with only one string, that string consisting of the single symbol a.
3- If R and S are regular expressions denoting languages L_R and L_S, respectively, then:
   i) (R) | (S) is a regular expression denoting L_R ∪ L_S
   ii) (R).(S) is a regular expression denoting L_R.L_S
   iii) (R)* is a regular expression denoting L_R*

We have shown regular expressions formed with parentheses wherever possible. In fact, we eliminate them when we can, using the precedence rules that * has highest precedence, then comes ., and | has lowest precedence.

Let us assume that our alphabet Σ is {a, b}. The regular expression a denotes {a}, which is different from just the string a.

1- The regular expression a* denotes the closure of the language {a}, that is, $a^* = \bigcup_{i=0}^{\infty} \{a^i\}$, the set of all strings of zero or more a's. The regular expression aa*, which by our precedence rules is parsed a(a)*, denotes the strings of one or more a's. We may use a+ for aa*.
2- What does the regular expression (a|b)* denote? We see that a|b denotes {a, b}, the language with the two strings a and b. Thus (a|b)* denotes $\bigcup_{i=0}^{\infty} \{a, b\}^i$, which is just the set of all strings of a's and b's, including the empty string. The regular expression (a*b*)* denotes the same set.
3- The expression a|ba* is grouped (a|(b(a)*)), and denotes the set of strings consisting of either a single a, or a b followed by zero or more a's.
4- The expression aa|ab|ba|bb denotes all strings of length two, so (aa|ab|ba|bb)* denotes all strings of even length. Note that ε is a string of length zero.
5- ε|a|b denotes the strings of length zero or one.

Example: The tokens discussed in fig (5) can be described by regular expressions as follows:

Keyword = BEGIN | END | IF | THEN | ELSE
Identifier = letter (letter | digit)*
Constant = digit+
Relops = < | <= | = | <> | > | >=

where letter stands for A | B | ... | Z, and digit stands for 0 | 1 | ... | 9.

If two regular expressions R and S denote the same language, we write R = S, and say that R and S are equivalent. For example, we previously observed that (a|b)* = (a*b*)*. For any regular expressions R, S and T, the following axioms hold:

1- R|S = S|R (| is commutative)
2- R|(S|T) = (R|S)|T (| is associative)
3- R(ST) = (RS)T (. is associative)
4- R(S|T) = RS|RT and (S|T)R = SR|TR (. distributes over |)
5- εR = Rε = R (ε is the identity for concatenation)

Finite Automata

A recognizer for a language L is a program that takes as input a string x and answers "yes" if x is a sentence of L and "no" otherwise. Clearly, the part of a lexical analyzer that identifies the presence of a token on the input is a recognizer for the language defining that token.

Suppose we have specified a language by a regular expression R, and we are given some string x. We want to know whether x is in the language L denoted by R. One way to attempt this test is to check that x can be decomposed into a sequence of substrings denoted by the primitive subexpressions in R. Suppose R is (a|b)*abb, the set of all strings ending in abb, and x is the string aabb. We see that R = R1 R2, where R1 = (a|b)* and R2 = abb. We can verify that a is an element of the language denoted by R1 and that abb similarly matches R2. In this way, we show that aabb is in the language denoted by R.

Non Deterministic Automata

A better way to convert a regular expression to a recognizer is to construct a generalized transition diagram from the expression. This diagram is called a nondeterministic finite automaton (NFA). An NFA recognizing the language (a|b)*abb is shown in fig (7).

[Fig (7): a nondeterministic finite automaton with states 0, 1, 2 and 3; start state 0.]

The NFA is a labeled directed graph. The nodes are called states, and the labeled edges are called transitions. The NFA looks almost like a transition diagram, but edges can be labeled by ε as well as by characters, and the same character can label two or more transitions out of one state. One state (0 in fig (7)) is distinguished as the start state, and one or more states may be distinguished as accepting states (or final states). State 3 in fig (7) is the final state.

The transitions of an NFA can be conveniently represented in tabular form by means of a transition table. The transition table for the NFA of fig (7) is shown in fig (8). In the transition table there is a row for each state and a column for each input symbol. The entry for row i and symbol a is the set of possible next states for state i on input a.

State    Input symbol
         a        b
0        {0,1}    {0}
1        -        {2}
2        -        {3}

Fig (8) Transition table

The NFA accepts an input string x if and only if there is a path from the start state to some accepting state, such that the labels along that path spell out x. If the input string is aabb, then we can show this sequence of moves:

State    Remaining input
0        aabb
0        abb
1        bb
2        b
3        ε

Fig (9)
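The acceptance condition can be tried out with a minimal Python sketch, assuming the transition table of fig (8) is stored as a dictionary. The names `NFA`, `START`, `ACCEPTING` and `nfa_accepts` are illustrative and do not come from the notes. Instead of guessing one path, the sketch tracks every state the NFA could be in after each symbol.

```python
# Transition table of fig (8): state -> {symbol -> set of next states}.
# A missing entry means "no move"; state 3 has no outgoing edges.
NFA = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
    3: {},
}
START = 0
ACCEPTING = {3}


def nfa_accepts(text):
    """Return True if some path from START that spells `text` ends in an accepting state."""
    current = {START}  # all states reachable on the input read so far
    for symbol in text:
        current = {
            nxt
            for state in current
            for nxt in NFA[state].get(symbol, set())
        }
    return bool(current & ACCEPTING)


print(nfa_accepts("aabb"))  # True
print(nfa_accepts("abab"))  # False
```

On aabb the tracked sets are {0}, {0,1}, {0,1}, {0,2} and {0,3}, which contain the path 0, 0, 1, 2, 3 listed in the moves above.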

In Fig (9) below we can see an NFA that recognizes aa* | bb*. The string aaa is accepted by going through states 0, 1, 2, 2 and 2. The labels of these edges are ε, a, a and a, whose concatenation is aaa.

[Fig (9): NFA accepting aa* | bb*, with start state 0 and states 1 to 4; two ε-edges leave state 0.]

Deterministic Automata

The NFA of fig (7), tabulated in fig (8), has more than one transition from state 0 on input a; that is, it may go to state 0 or 1. Similarly, the NFA of fig (9) has two transitions on ε from state 0. These situations are the reason why it is hard to simulate an NFA with a computer program. A deterministic finite automaton has at most one path from the start state labeled by any string. A finite automaton is deterministic if:

1- It has no transitions on input ε.
2- For each state s and input symbol a, there is at most one edge labeled a leaving s.
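The two conditions can be checked mechanically. The following Python sketch assumes a dictionary encoding of the transition tables and uses the empty string as a stand-in for ε. The names are illustrative. In the aa* | bb* table, the a-branch follows the states named in the text, while the b-branch (states 3 and 4) is an assumption made by symmetry, because the edge labels of that figure did not survive.

```python
EPSILON = ""  # assumed marker for an epsilon edge

# NFA for (a|b)*abb, taken from the transition table of fig (8).
NFA_ABB = {0: {"a": {0, 1}, "b": {0}}, 1: {"b": {2}}, 2: {"b": {3}}, 3: {}}

# NFA for aa*|bb*. States 0, 1, 2 follow the text; states 3 and 4 are assumed.
NFA_AA_BB = {
    0: {EPSILON: {1, 3}},
    1: {"a": {2}},
    2: {"a": {2}},
    3: {"b": {4}},
    4: {"b": {4}},
}

# The a-branch on its own, which happens to satisfy both conditions.
A_BRANCH = {1: {"a": {2}}, 2: {"a": {2}}}


def is_deterministic(table):
    """Check the two conditions for a deterministic finite automaton."""
    for moves in table.values():
        if EPSILON in moves:  # condition 1: no transitions on epsilon
            return False
        if any(len(targets) > 1 for targets in moves.values()):
            return False  # condition 2: at most one edge per symbol
    return True


print(is_deterministic(NFA_ABB))    # False: state 0 has two edges on a
print(is_deterministic(NFA_AA_BB))  # False: state 0 has epsilon edges
print(is_deterministic(A_BRANCH))   # True
```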

Example: in fig (10) below we see a deterministic finite automaton (DFA) accepting the language (a|b)*abb, which is the same language as that accepted by the NFA of fig (7).

[Fig (10): DFA accepting (a|b)*abb; start state 0.]

Since there is at most one transition out of any state on any symbol, a DFA is easier to simulate by a program than an NFA.

How to Build a Lexical Analyzer

Step1 Convert the grammar into a transition diagram.
Step2 Convert the regular expression into a nondeterministic finite state automaton.
Step3 Convert the NFA into a DFA.
Step4 Minimize the finite state automaton.
Step5 Write an efficient program for the minimized finite state automaton, called the minimized finite state automaton recognizer.
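Steps 3 and 5 can be sketched together in Python. The notes do not say how the conversion is done. This sketch assumes the usual subset construction, restricted to NFAs without ε-edges, applied to the transition table of fig (8), and then runs the resulting DFA with a simple table-driven loop. All function and variable names are illustrative, and minimization (Step4) is not attempted.

```python
from collections import deque

# NFA for (a|b)*abb from the transition table of fig (8).
NFA = {0: {"a": {0, 1}, "b": {0}}, 1: {"b": {2}}, 2: {"b": {3}}, 3: {}}
NFA_START, NFA_ACCEPTING, ALPHABET = 0, {3}, "ab"


def subset_construction(nfa, start, accepting, alphabet):
    """Step3 sketch: each DFA state is a set of NFA states (no epsilon edges)."""
    start_set = frozenset({start})
    table = {}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        if current in table:
            continue  # already expanded
        table[current] = {}
        for symbol in alphabet:
            target = frozenset(
                nxt for state in current for nxt in nfa[state].get(symbol, ())
            )
            if target:  # an empty set means there is no edge
                table[current][symbol] = target
                queue.append(target)
    finals = {state for state in table if state & accepting}
    return table, start_set, finals


def dfa_accepts(table, start, finals, text):
    """Step5 sketch: follow exactly one edge per input symbol."""
    state = start
    for symbol in text:
        if symbol not in table[state]:
            return False  # no edge on this symbol
        state = table[state][symbol]
    return state in finals


DFA, DFA_START, DFA_FINALS = subset_construction(NFA, NFA_START, NFA_ACCEPTING, ALPHABET)
print(len(DFA))                                          # 4
print(dfa_accepts(DFA, DFA_START, DFA_FINALS, "aabb"))   # True
print(dfa_accepts(DFA, DFA_START, DFA_FINALS, "abba"))   # False
```

The construction reaches four DFA states, namely the sets {0}, {0,1}, {0,2} and {0,3}, and only the last of these is accepting because it contains NFA state 3.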