Regular Expressions. Regular Expressions. Regular Languages. Specifying Languages. Regular Expressions. Kleene Star Operation

Similar documents
ECS 120 Lesson 7 Regular Expressions, Pt. 1

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Regular Languages. Regular Language. Regular Expression. Finite State Machine. Accepts

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

DVA337 HT17 - LECTURE 4. Languages and regular expressions

Dr. D.M. Akbar Hussain

Decision, Computation and Language

Lexical Analysis. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

COP 3402 Systems Software Syntax Analysis (Parser)

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

SEM / YEAR : VI / III CS2352 PRINCIPLES OF COMPLIERS DESIGN UNIT I - LEXICAL ANALYSIS PART - A

We use L i to stand for LL L (i times). It is logical to define L 0 to be { }. The union of languages L and M is given by

CSE 105 THEORY OF COMPUTATION

CS321 Languages and Compiler Design I. Winter 2012 Lecture 4

CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 2: Lexical Analysis 23 Jan 08

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Chapter Seven: Regular Expressions. Formal Language, chapter 7, slide 1

Regular Expressions. Lecture 10 Sections Robb T. Koether. Hampden-Sydney College. Wed, Sep 14, 2016

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

The Language for Specifying Lexical Analyzer

Compilers CS S-01 Compiler Basics & Lexical Analysis

CSE 105 THEORY OF COMPUTATION

2010: Compilers REVIEW: REGULAR EXPRESSIONS HOW TO USE REGULAR EXPRESSIONS

CSCE 314 Programming Languages

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

Lexical Analysis (ASU Ch 3, Fig 3.1)

Structure of Programming Languages Lecture 3

Lexical Analysis - 1. A. Overview A.a) Role of Lexical Analyzer

Syntactic Analysis. CS345H: Programming Languages. Lecture 3: Lexical Analysis. Outline. Lexical Analysis. What is a Token? Tokens

Lexical Analysis. Introduction

The Front End. The purpose of the front end is to deal with the input language. Perform a membership test: code source language?

Compilers CS S-01 Compiler Basics & Lexical Analysis

Introduction to Lexical Analysis

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

CS308 Compiler Principles Lexical Analyzer Li Jiang

CS S-01 Compiler Basics & Lexical Analysis 1

1. Lexical Analysis Phase

8 ε. Figure 1: An NFA-ǫ

Scanners. Xiaokang Qiu Purdue University. August 24, ECE 468 Adapted from Kulkarni 2012

CSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions

COL728 Minor1 Exam Compiler Design Sem II, Answer all 5 questions Max. Marks: 20

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

COMPILER DESIGN UNIT I LEXICAL ANALYSIS. Translator: It is a program that translates one language to another Language.

Formal Languages and Compilers Lecture VI: Lexical Analysis

UVa ID: NAME (print): CS 4501 LDI Midterm 1

Chapter Seven: Regular Expressions

UNIT II LEXICAL ANALYSIS

Lexical Analysis. Lecture 3. January 10, 2018

Introduction to Lexical Analysis

Syntax and Parsing COMS W4115. Prof. Stephen A. Edwards Fall 2004 Columbia University Department of Computer Science

Part 5 Program Analysis Principles and Techniques

Lexical Analysis. Prof. James L. Frankel Harvard University

Complexity Theory. Compiled By : Hari Prasad Pokhrel Page 1 of 20. ioenotes.edu.np

Introduction to Lexing and Parsing

Today. Assignments. Lecture Notes CPSC 326 (Spring 2019) Quiz 2. Lexer design. Syntax Analysis: Context-Free Grammars. HW2 (out, due Tues)

CS402 Theory of Automata Solved Subjective From Midterm Papers. MIDTERM SPRING 2012 CS402 Theory of Automata

Syntax and Parsing COMS W4115. Prof. Stephen A. Edwards Fall 2003 Columbia University Department of Computer Science

Building Compilers with Phoenix

Programming Languages and Compilers (CS 421)

Group A Assignment 3(2)

n (0 1)*1 n a*b(a*) n ((01) (10))* n You tell me n Regular expressions (equivalently, regular 10/20/ /20/16 4

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

Languages and Strings. Chapter 2

2068 (I) Attempt all questions.

Regular Expressions. Chapter 6

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2016

Lexical Analysis - An Introduction. Lecture 4 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Implementation of Lexical Analysis

Compiler Construction LECTURE # 3

SYED AMMAL ENGINEERING COLLEGE (An ISO 9001:2008 Certified Institution) Dr. E.M. Abdullah Campus, Ramanathapuram

Finite Automata. Dr. Nadeem Akhtar. Assistant Professor Department of Computer Science & IT The Islamia University of Bahawalpur

B The SLLGEN Parsing System

Compiler phases. Non-tokens

CS164: Midterm I. Fall 2003

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

CMPSCI 250: Introduction to Computation. Lecture #28: Regular Expressions and Languages David Mix Barrington 2 April 2014

Monday, August 26, 13. Scanners

Wednesday, September 3, 14. Scanners

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Chapter 2 - Programming Language Syntax. September 20, 2017

Finite automata. We have looked at using Lex to build a scanner on the basis of regular expressions.

Languages and Compilers

David Griol Barres Computer Science Department Carlos III University of Madrid Leganés (Spain)

Introduction to Parsing. Lecture 5

Torben./Egidius Mogensen. Introduction. to Compiler Design. ^ Springer

2. Lexical Analysis! Prof. O. Nierstrasz!

Lexical Analysis. Chapter 2

Outline CS4120/4121. Compilation in a Nutshell 1. Administration. Introduction to Compilers Andrew Myers. HW1 out later today due next Monday.

Languages, Automata, Regular Expressions & Scanners. Winter /8/ Hal Perkins & UW CSE B-1

Lexical Analysis. Sukree Sinthupinyo July Chulalongkorn University

Regular Expression Module-2

CS 403: Scanning and Parsing

CS402 - Theory of Automata FAQs By

CS 432 Fall Mike Lam, Professor. Finite Automata Conversions and Lexing

CPSC 434 Lecture 3, Page 1

Multiple Choice Questions

Transcription:

Another means to describe languages accepted by Finite Automata. In some books, regular languages, by definition, are described using regular. Specifying Languages Recall: how do we specify languages? If language is finite, you can list all of its strings. L = {a, aa, aba, aca} Descriptive: L = {x n a (x) = n b (x)} Using basic Language operations L= {aa, ab} * {b}{bb} * Regular languages are described using this last method Regular Languages A regular expression describes a language using only the set operations of: Union Concatenation Kleene Star Kleene Star Operation The set of strings that can be obtained by concatenating any number of elements of a language L is called the Kleene Star, L * * L = U i= 0 i 0 1 2 3 4 L = L L L L L... Note that since, L * contains L 0, λ is an element of L * Regular are the mechanism by which regular languages are described: Take the set operation definition of the language and: Replace with + Replace {} with () And you have a regular expression 1

Regular {λ} λ {011} 011 {0,1} 0 + 1 {0, 01} 0 + 01 {110} * {0,1} (110) * (0+1) {10, 11, 01} * (10 + 11 + 01) * {0, 11} * ({11} * {101, λ}) (0 + 11) * ((11) * + 101 + λ) Regular Expression Recursive definition of regular languages / expression over Σ : 1. is a regular language and its regular expression is 2. {λ} is a regular language and λ is its regular expression 3. For each a Σ, {a} is a regular language and its regular expression is a Regular Expression 4. If L 1 and L 2 are regular languages with regular r 1 and r 2 then -- L 1 L 2 is a regular language with regular expression (r 1 + r 2 ) -- L 1 L 2 is a regular language with regular expression (r 1 r 2 ) -- book uses (r 1 r 2 ) -- L 1 * is a regular language with regular expression (r 1* ) -- Regular can be parenthesized to indicate operator precidence I.e. (r 1 ) Some shorthand If we apply precedents to the operators, we can relax the full parenthesized definition: Kleene star has highest precedent Concatenation had mid precedent + has lowest precedent Thus a + b * c is the same as (a + ((b * )c)) (a + b) * is not the same as a + b * Only languages obtainable by using rules 1-4 are regular languages. More shorthand Equating regular. Two regular are considered equal if they describe the same language 1 * 1 * = 1 * (a + b) * a + b * Even more shorthand Sometimes you might see in the book: r n where n indicates the number of concatenations of r (e.g. r 6 ) r + to indicate one or more concatenations of r. Note that this is only shorthand! r 6 and r + are not regular. 2

Important thing to remember A regular expression is not a language A regular expression is used to describe a language. Questions? It is incorrect to say that for a language L, L = (a + b + c) * But it s okay to say that L is described by (a + b + c) * All finite languages can be described by regular Can anyone tell me why? All finite languages can be described using regular A finite language L can be expressed as the union of languages each with one string corresponding to a string in L Example: L = {a, aa, aba, aca} L = {a} {aa} {aba} {aca} Regular expression: (a + aa + aba + aca) L = {x {0,1} * x is even} Any string of even length can be obtained by concatenating strings length 2. Any concatenation of strings of length 2 will be even L = {00, 01, 10, 11} * Regular describing L: (00 + 01 + 10 + 11) * ((0 + 1)(0 + 1)) * L = {x {0,1} * x does not end in 01 } If x does not end in 01, then either x < 2 or x ends in 00, 10, or 11 A regular expression that describes L is: λ + 0 + 1 + (0 + 1) * (00 + 10 + 11) 3

L = {x {0,1} * x contains an odd number of 0s } Express x = yz y is a string of the form y=1 i 01 j In z, there must be an even number of additional 0s or z = (01 k 01 m ) * x can be described by (1 * 01 * )(01 * 01 * ) * Questions? Useful properties of regular Commutative L + M = M + L Associative (L + M) + N = L + (M + N) (LM)N = L(MN) Identities + L = L + = L λl = L λ = L L = L = Useful properties of regular Distributed L (M + N) = LM + LN (M + N)L = ML + NL Idempotent L + L = L Useful properties of regular Closures (L * ) * = L * * = λ λ * = λ L + = LL * L * = L + + λ Questions? Practical uses for regular grep Global (search for) and Print Finds patterns of characters in a text file. grep man foo.txt grep [ab]*c[de]? foo.txt Source file Practical uses for regular How a compiler works lexer Stream of parser Parse tokens codegen Tree Object code 4

Practical uses for regular How a compiler works The Lexical Analyzer (lexer) reads source code and generates a stream of tokens What is a token? Identifier Keyword Number Operator Punctuation Practical uses for regular How a compiler works Tokens can be described using regular! L = set of valid C keywords This is a finite set L can be described by if + then + else + while + do + goto + break + switch + L = set of valid C identifiers A valid C identifier begins with a letter or _ A valid C identifier contains letters, numbers, and _ If we let: l = {a, b,, z, A, B,, Z} d = {1, 2,, 9, 0} Then a regular expression for L: (l + _)(l + d + _) * Practical uses for regular lex Program that will create a lexical analyzer. Input: set of valid tokens Tokens are given by regular. Questions? Summary Regular languages can be expressed using only the set operations of union, concatenation, Kleene Star. Regular languages Means of describing: Regular Expression Machine for accepting: Finite Automata Practical uses Text search (grep) Compilers / Lexical Analysis (lex) Questions? 5

For next time Chicken or the egg? Which came first, the regular expression or the finite automata? McCulloch/Pitts -- used finite automata to model neural networks (1943) Kleene (mid 1950s) -- Applied to regular sets Ken Thompson/ Bell Labs folk (1970s) -- QED / ed / grep / lex / awk / Recall: Princeton dudes (1937) The bottom line Regular and finite automata are equivalent in their ability to describe languages. Every regular expression has a FA that accepts the language it describes The language accepted by an FA can be described by some regular expression. The Kleene Theorem! (1956) But that s next time. One last note: Apps Using regular 6