DVA337 HT17 - LECTURE 4. Languages and regular expressions


SO FAR

TODAY
Formal definition of languages in terms of strings
Operations on strings and languages
Definition of regular expressions
Meaning of regular expressions in terms of languages
Outlook: practical use of regular expressions

LANGUAGES
Alphabets, strings, and languages

LANGUAGE
How can we define what a (formal) language is?

LANGUAGE
We define a language to be a set of strings over an alphabet Σ.
An alphabet is a set of symbols, e.g., { a, b, c, ..., z }.
A string over an alphabet Σ is a sequence of symbols from the alphabet.
What is the alphabet for the language L = { apple, pear, 1911 }? For L = { x : x is a binary string }?
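For a finite language, one answer to the question above can be computed. The sketch collects every symbol that occurs in some string of the language. It assumes that each symbol is a single character. The result is the smallest alphabet that works, and any superset of it would serve as an alphabet too. The helper name `alphabet_of` is hypothetical.

```python
def alphabet_of(language: set[str]) -> set[str]:
    """Smallest alphabet over which every string of a finite language is written
    (assumes each symbol is a single character)."""
    return {symbol for string in language for symbol in string}


L = {"apple", "pear", "1911"}
print(sorted(alphabet_of(L)))  # ['1', '9', 'a', 'e', 'l', 'p', 'r']
```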

EXERCISE, LANGUAGE
How can we define the alphabet and the language for 1) the programming language C? 2) written English?

STRINGS

STRINGS
For $a_1, a_2, \ldots, a_n \in \Sigma$, the sequence $a_1 a_2 \ldots a_n$ is a string over Σ.
The empty string is written λ.
What are the strings over Σ = { a, b }?
Let u, v, w denote strings.

CONCATENATION
For $u = a_1 a_2 \ldots a_n$ and $v = b_1 b_2 \ldots b_m$, what is the concatenation of u and v, written uv?

PREFIX AND SUFFIX
For a string w = uv, u is a prefix of w and v is a suffix of w.
What are all the prefixes and suffixes of abbab?
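Each way of splitting w = uv at some position i, with 0 ≤ i ≤ |w|, gives one prefix and one suffix. The sketch below enumerates the splits with Python slicing. Python's empty string "" stands in for λ. The helpers `prefixes` and `suffixes` are hypothetical names.

```python
def prefixes(w: str) -> list[str]:
    """All u with w = u v for some v, shortest first (includes λ and w)."""
    return [w[:i] for i in range(len(w) + 1)]


def suffixes(w: str) -> list[str]:
    """All v with w = u v for some u, longest first (includes w and λ)."""
    return [w[i:] for i in range(len(w) + 1)]


print(prefixes("abbab"))  # ['', 'a', 'ab', 'abb', 'abba', 'abbab']
print(suffixes("abbab"))  # ['abbab', 'bbab', 'bab', 'ab', 'b', '']
```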

SUBSTRING
For a string $w = u_1 v u_2$, v is a substring of w.
Prefixes and suffixes are special cases of substrings.
What are the substrings of abbab?
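Choosing $u_1$ and $u_2$ amounts to choosing a start index i and an end index j with i ≤ j. The sketch collects the substrings in a set, so a substring that occurs twice (ab occurs twice in abbab) is counted once. The helper name `substrings` is hypothetical, and "" stands in for λ.

```python
def substrings(w: str) -> set[str]:
    """All v such that w = u1 v u2 for some strings u1, u2."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


subs = substrings("abbab")
print(len(subs))  # 12
print(sorted(subs, key=lambda s: (len(s), s)))
# ['', 'a', 'b', 'ab', 'ba', 'bb', 'abb', 'bab', 'bba', 'abba', 'bbab', 'abbab']
```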

LENGTH
For a string $w = a_1 a_2 \ldots a_n$, the length of w is n, written $|w| = n$.
$|\lambda| = 0$, $|abbab| = 5$
How can we define length recursively?
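One possible recursive definition, assumed here, removes the last symbol: $|\lambda| = 0$ and $|wa| = |w| + 1$ for $a \in \Sigma$. Removing the first symbol instead works equally well. Below is a direct transcription in which Python's empty string "" stands in for λ. The function name `length` is hypothetical.

```python
def length(w: str) -> int:
    """|λ| = 0 and |w a| = |w| + 1 for a symbol a."""
    if w == "":
        return 0                  # base case: the empty string λ
    return length(w[:-1]) + 1     # w = w' a: strip the last symbol a


print(length(""), length("abbab"))  # 0 5
```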

PROOF BY INDUCTION
Induction over the natural numbers: to show that a property P holds for all natural numbers, $\forall n \in \mathbb{N}.\ P(n)$, show
a base case, e.g., $P(0)$, and
an inductive step, $\forall n \in \mathbb{N}.\ P(n) \rightarrow P(n+1)$.
Why can we conclude $\forall n \in \mathbb{N}.\ P(n)$ from this?

EXERCISE, LENGTH OF CONCATENATION
What is $|uv|$? Can we prove it?

LENGTH OF CONCATENATION
Theorem: $|uv| = |u| + |v|$
Proof: By induction on the length of v.
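Here is one way to carry out the proof, as a sketch. It assumes the recursive definition of length $|\lambda| = 0$ and $|wa| = |w| + 1$ for $a \in \Sigma$, and that concatenation is associative. The induction is on $n = |v|$. The base case handles $v = \lambda$. The step writes a string of length $n + 1$ as $v = wa$ with $|w| = n$, so the induction hypothesis applies to $w$.

```latex
\begin{align*}
\text{Base case } (v = \lambda):\quad
  |u\lambda| &= |u| = |u| + 0 = |u| + |\lambda| \\
\text{Step } (v = wa,\ a \in \Sigma):\quad
  |u(wa)| &= |(uw)a|       && \text{associativity of concatenation} \\
          &= |uw| + 1      && \text{definition of length} \\
          &= |u| + |w| + 1 && \text{induction hypothesis, since } |w| = n \\
          &= |u| + |wa|    && \text{definition of length}
\end{align*}
```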

REVERSE
For a string $w = a_1 a_2 \ldots a_n$, what is the reverse $w^R$ of w?
What is a palindrome?
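The sketch below uses one possible recursive definition of reversal: $\lambda^R = \lambda$ and $(av)^R = v^R a$. A palindrome is then tested as a string that equals its own reverse. The helper names are hypothetical, and "" stands in for λ. In idiomatic Python one would simply write `w[::-1]`.

```python
def reverse(w: str) -> str:
    """λ^R = λ and (a v)^R = v^R a: move the first symbol to the end."""
    if w == "":
        return ""
    return reverse(w[1:]) + w[0]


def is_palindrome(w: str) -> bool:
    """A palindrome is a string equal to its reverse, w = w^R."""
    return w == reverse(w)


print(reverse("abbab"))                               # babba
print(is_palindrome("abba"), is_palindrome("abbab"))  # True False
```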

REPETITION
Let $w^n$ be w repeated n times, $w w \ldots w$.
Can you write a recursive definition of $w^n$?
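A natural recursive definition, assumed here, is $w^0 = \lambda$ and $w^{n+1} = w\,w^n$. The sketch transcribes it directly. Python's built-in `w * n` computes the same string. The function name `power` is hypothetical.

```python
def power(w: str, n: int) -> str:
    """w^0 = λ and w^(n+1) = w w^n."""
    if n == 0:
        return ""               # w^0 is the empty string λ
    return w + power(w, n - 1)


print(power("ab", 3))        # ababab
print(power("ab", 0) == "")  # True
```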

$\Sigma^n$, STRINGS OF LENGTH n
Let $\Sigma^n$ be the set of strings of length n over Σ.
For Σ = { a, b }: $\Sigma^0 = \{\lambda\}$, $\Sigma^1 = \{a, b\}$, $\Sigma^2 = \{aa, ab, ba, bb\}$
How can we define $\Sigma^n$?
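One recursive answer is $\Sigma^0 = \{\lambda\}$ and $\Sigma^{n+1} = \{ xy : x \in \Sigma,\ y \in \Sigma^n \}$. It prepends one symbol to every string of length n. The sketch below transcribes it and assumes single-character symbols. The function name `strings_of_length` is hypothetical.

```python
def strings_of_length(sigma: set[str], n: int) -> set[str]:
    """Σ^0 = {λ} and Σ^(n+1) = { x y : x ∈ Σ, y ∈ Σ^n }."""
    if n == 0:
        return {""}
    shorter = strings_of_length(sigma, n - 1)
    return {x + y for x in sigma for y in shorter}


print(sorted(strings_of_length({"a", "b"}, 2)))  # ['aa', 'ab', 'ba', 'bb']
```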

$\Sigma^*$, KLEENE CLOSURE
$\Sigma^*$ is the set of all strings over Σ.
$\{a, b\}^* = \{\lambda, a, b, aa, bb, ab, ba, aaa, bbb, \ldots\}$
How can we define $\Sigma^*$?

$\Sigma^*$, KLEENE CLOSURE
We have that $\Sigma^* = \Sigma^0 \cup \Sigma^1 \cup \ldots$, where
$\Sigma^0 = \{\lambda\}$
$\Sigma^{n+1} = \{ xy : x \in \Sigma,\ y \in \Sigma^n \}$
Can we use this to define $\Sigma^*$? As a fixpoint, $F(S) = S$, for some F?
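The slide leaves F open. One natural candidate, assumed here, is $F(S) = \{\lambda\} \cup \{ xy : x \in \Sigma,\ y \in S \}$. $\Sigma^*$ is then the least set S with $F(S) = S$. Because $\Sigma^*$ is infinite, the sketch only applies F a bounded number of times, starting from the empty set. After k steps it has produced exactly $\Sigma^0 \cup \ldots \cup \Sigma^{k-1}$. The names `step` and `kleene_star_approx` are hypothetical.

```python
def step(sigma: set[str], S: set[str]) -> set[str]:
    """One application of F(S) = {λ} ∪ { x y : x ∈ Σ, y ∈ S }."""
    return {""} | {x + y for x in sigma for y in S}


def kleene_star_approx(sigma: set[str], k: int) -> set[str]:
    """F applied k times to the empty set: all strings over Σ of length < k."""
    S: set[str] = set()
    for _ in range(k):
        S = step(sigma, S)
    return S


approx = kleene_star_approx({"a", "b"}, 3)
print(sorted(approx, key=lambda s: (len(s), s)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

Each iterate contains the previous one, and every string of $\Sigma^*$ appears after finitely many steps. This is why the union of all the iterates is the least fixpoint.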

$\Sigma^+$, POSITIVE CLOSURE
Let $\Sigma^+ = \Sigma^1 \cup \Sigma^2 \cup \ldots$
How can we define the positive closure?

EXERCISE
For Σ = { a, b }, what is the cardinality of $\Sigma^3$? In general, what is the cardinality of $\Sigma^n$?
For each Σ below, give $\Sigma^*$ and $\Sigma^+$: Σ = { 0, 1 }, Σ = { a }, Σ = { }

EXERCISE
Prove that $|\Sigma^n| = |\Sigma|^n$.
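The sketch below is a numeric sanity check, not a proof. It builds $\Sigma^n$ as the n-fold Cartesian product $\Sigma \times \ldots \times \Sigma$ with `itertools.product`, then compares the count with $|\Sigma|^n$ for a few small alphabets. The helper `sigma_n` is hypothetical. It assumes single-character symbols, so that joining a tuple into a string never merges two different tuples. That one-to-one correspondence between tuples and strings is the key step of an actual proof.

```python
from itertools import product


def sigma_n(sigma: set[str], n: int) -> set[str]:
    """Σ^n as the n-fold Cartesian product of Σ, each tuple joined into a string."""
    return {"".join(t) for t in product(sorted(sigma), repeat=n)}


# Compare |Σ^n| with |Σ|^n, including the empty alphabet (where Σ^0 = {λ}).
for sigma in ({"a", "b"}, {"0", "1", "2"}, {"a"}, set()):
    for n in range(5):
        assert len(sigma_n(sigma, n)) == len(sigma) ** n
print("ok")
```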

LANGUAGES

LANGUAGE
A language L is a set of strings over an alphabet Σ.
A language L is a subset of $\Sigma^*$, $L \subseteq \Sigma^*$.
For Σ = { a, b }, $\Sigma^* = \{\lambda, a, b, aa, ab, ba, bb, aaa, aab, \ldots\}$
Examples of languages over Σ?

EXERCISE
What is $\mathcal{P}(\Sigma^*)$?

SET OPERATIONS ON LANGUAGES
Since languages are sets, the standard set operations apply.
For $L_1 = \{a, b, aaa\}$ and $L_2 = \{bb, ab\}$, what is
$L_1 \cup L_2$
$L_1 \cap L_2$
$L_1 - L_2$
What is the complement of a language, $\overline{L}$?
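Finite languages map directly onto Python sets, so union, intersection, and difference are the built-in operators `|`, `&`, and `-`. The complement is usually infinite, so the sketch represents it as a membership test instead of a set. It assumes that the complement is taken relative to $\Sigma^*$ with Σ = { a, b }. The helper `in_complement` is hypothetical.

```python
L1 = {"a", "b", "aaa"}
L2 = {"bb", "ab"}

print(sorted(L1 | L2))  # union:        ['a', 'aaa', 'ab', 'b', 'bb']
print(sorted(L1 & L2))  # intersection: []
print(sorted(L1 - L2))  # difference:   ['a', 'aaa', 'b']


def in_complement(w: str, L: set[str], sigma: set[str]) -> bool:
    """w is in the complement of L (relative to Σ*): w is a string over Σ but not in L."""
    return set(w) <= sigma and w not in L


print(in_complement("ba", L1, {"a", "b"}))   # True
print(in_complement("aaa", L1, {"a", "b"}))  # False
```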

REVERSAL AND CONCATENATION
Reversal and concatenation carry over from strings in the natural way.
Reversal: $L^R = \{ w^R : w \in L \}$, e.g., $\{ab, aab, baba\}^R$ and $\{a^n b^n : n \geq 0\}^R$
Concatenation: $L_1 L_2 = \{ uv : u \in L_1,\ v \in L_2 \}$, e.g., $\{ab, aab, baba\}\{b, aa\}$

REPETITION
With concatenation of languages defined, we can define repetition:
$L^0 = \{\lambda\}$
$L^{n+1} = \{ uv : u \in L,\ v \in L^n \}$
For $L = \{a^n b^n : n \geq 0\}$, what is $L^2$? What is $L^0$?
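The sketch below transcribes the recursive definition of $L^n$. The language $\{a^n b^n : n \geq 0\}$ is infinite, so the example uses only the finite slice with $n \leq 2$. That cut-off is an assumption made for illustration. Because λ is in this L, every string of L also appears in $L^2$. The function name `lang_power` is hypothetical.

```python
def lang_power(L: set[str], n: int) -> set[str]:
    """L^0 = {λ} and L^(n+1) = { u v : u ∈ L, v ∈ L^n }."""
    if n == 0:
        return {""}
    return {u + v for u in L for v in lang_power(L, n - 1)}


# Finite slice of { a^n b^n : n >= 0 }, cut off at n = 2 for illustration.
L = {"a" * k + "b" * k for k in range(3)}   # {'', 'ab', 'aabb'}
print(lang_power(L, 0))  # {''}
print(sorted(lang_power(L, 2), key=lambda s: (len(s), s)))
# ['', 'ab', 'aabb', 'abab', 'aabbab', 'abaabb', 'aabbaabb']
```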

CLOSURES
With repetition we can define the Kleene closure and the positive closure for languages:
$L^* = L^0 \cup L^1 \cup \ldots$
$L^+ = L^1 \cup L^2 \cup \ldots$
What is $L^*$ in words?
If $L^* = L$, we say that L is Kleene closed. Is C (the language of C programs) Kleene closed?

SUMMARY
An alphabet, Σ, is a set of symbols.
A string is a sequence of symbols.
    concatenation, reverse, length, substring, prefix, suffix, repetition
    Kleene closure $\Sigma^*$ and positive closure $\Sigma^+$
A language over Σ is a set of strings, i.e., a subset of $\Sigma^*$.
    union, intersection, difference, complement
    reverse, concatenation, repetition
    Kleene closure $L^*$ and positive closure $L^+$ (cf. $\Sigma^*$ and $\Sigma^+$)

WHY IS THIS USEFUL?
The definition is broad: any set of strings over an alphabet is a language.
It is the basis for
methods of defining languages (grammars), and
methods of deciding membership in languages, i.e., answering whether a given string is in a given language.
Can membership always be decided?

REGULAR EXPRESSIONS

REGULAR EXPRESSIONS
∅, λ, and any α ∈ Σ are primitive regular expressions.
If $r_1$ and $r_2$ are regular expressions, then so are $r_1 + r_2$, $r_1 r_2$, $r_1^*$, and $(r_1)$.

EXERCISE
Is (a + bc)*(c + λ) a regular expression?
Is (a + b +) a regular expression?

INTUITIVE MEANING
Each regular expression over Σ defines a language over Σ; think in terms of matching.
∅, λ, and any α ∈ Σ are primitive regular expressions.
If $r_1$ and $r_2$ are regular expressions, then so are $r_1 + r_2$, $r_1 r_2$, $r_1^*$, and $(r_1)$.

EXAMPLE
What is the language defined by a + b?
What is the language defined by (ab)*?
Exercise: what is the language defined by (a + bc)*(c + λ)?

LANGUAGE DEFINED BY REGULAR EXPRESSIONS
How can we define the language of a regular expression more formally?
Can we build a recursive function L(r) that defines the language of a regular expression r?
Remember: a language is a set of strings, and we have defined operations on languages: union, concatenation, and Kleene star.
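A minimal sketch of such a function follows, with one case per way of building a regular expression: $L(\emptyset) = \emptyset$, $L(\lambda) = \{\lambda\}$, $L(\alpha) = \{\alpha\}$, $L(r_1 + r_2) = L(r_1) \cup L(r_2)$, $L(r_1 r_2) = L(r_1)L(r_2)$, and $L(r^*) = L(r)^*$. The Python classes `Empty`, `Lam`, `Sym`, `Plus`, `Cat`, `Star` are a hypothetical encoding of the syntax. The star makes languages infinite, so the function also takes a bound k and returns only the strings of length at most k. That cut-off is an assumption added to keep the result computable.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Empty:            # ∅
    pass


@dataclass(frozen=True)
class Lam:              # λ
    pass


@dataclass(frozen=True)
class Sym:              # a single symbol α ∈ Σ
    a: str


@dataclass(frozen=True)
class Plus:             # r1 + r2
    r1: "Regex"
    r2: "Regex"


@dataclass(frozen=True)
class Cat:              # r1 r2
    r1: "Regex"
    r2: "Regex"


@dataclass(frozen=True)
class Star:             # r*
    r: "Regex"


Regex = Union[Empty, Lam, Sym, Plus, Cat, Star]


def lang(r: Regex, k: int) -> set[str]:
    """L(r), restricted to strings of length <= k so that the result is finite."""
    if isinstance(r, Empty):
        return set()                                  # L(∅) = ∅
    if isinstance(r, Lam):
        return {""}                                   # L(λ) = {λ}
    if isinstance(r, Sym):
        return {r.a} if len(r.a) <= k else set()      # L(α) = {α}
    if isinstance(r, Plus):                           # union
        return lang(r.r1, k) | lang(r.r2, k)
    if isinstance(r, Cat):                            # concatenation
        return {u + v for u in lang(r.r1, k) for v in lang(r.r2, k)
                if len(u) + len(v) <= k}
    if isinstance(r, Star):                           # L(r)^0 ∪ L(r)^1 ∪ ...
        base = lang(r.r, k)
        result, frontier = {""}, {""}
        while frontier:                               # extend only the new strings
            frontier = {u + v for u in frontier for v in base
                        if len(u) + len(v) <= k} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {r!r}")


# (a + b)a*, listing only strings of length <= 3
r = Cat(Plus(Sym("a"), Sym("b")), Star(Sym("a")))
print(sorted(lang(r, 3), key=lambda s: (len(s), s)))
# ['a', 'b', 'aa', 'ba', 'aaa', 'baa']
```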

EXAMPLE
What is L((a + b)a*)?

EXERCISE
What is the language defined by (a + b)*(a + bb)?

ON PRECEDENCE
What is the language defined by (a + b)a?
What is the language defined by a + (ba)?
Which one is a + ba?
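One way to explore the question is to try the three forms in Python's `re` module, where `|` plays the role of `+`. The sketch assumes the usual convention that star binds tightest, then concatenation, then union. Python's `re` follows that convention as well. The candidate strings and the helper `matches` are hypothetical choices for illustration.

```python
import re

CANDIDATES = ["a", "aa", "ab", "b", "ba"]


def matches(pattern: str) -> list[str]:
    """The candidate strings that the whole pattern matches."""
    return [w for w in CANDIDATES if re.fullmatch(pattern, w)]


for pattern in ("(a|b)a", "a|(ba)", "a|ba"):
    print(pattern, matches(pattern))
# (a|b)a ['aa', 'ba']
# a|(ba) ['a', 'ba']
# a|ba ['a', 'ba']
```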

EXERCISE
What is the language defined by (aa)*(bb)*b?

EXAMPLE
Create a regular expression over Σ = { 0, 1 } that defines the language L where all strings have at least two consecutive 0s, e.g., 001 ∈ L, 010 ∉ L.
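One candidate answer is (0 + 1)*00(0 + 1)*: any string, then 00, then any string. The sketch below writes that candidate in Python's `re` syntax, where `|` stands for `+`. It then checks the pattern against the direct definition ("00 occurs somewhere in w") on every binary string up to length 8. This is a finite check, not a proof. The names `AT_LEAST_TWO_ZEROS` and `in_L` are hypothetical.

```python
import re
from itertools import product

# (0 + 1)*00(0 + 1)* in Python's syntax, where | plays the role of +.
AT_LEAST_TWO_ZEROS = re.compile(r"(0|1)*00(0|1)*")


def in_L(w: str) -> bool:
    return AT_LEAST_TWO_ZEROS.fullmatch(w) is not None


assert in_L("001") and not in_L("010")

# Cross-check against the direct definition on all binary strings up to length 8.
for n in range(9):
    for t in product("01", repeat=n):
        w = "".join(t)
        assert in_L(w) == ("00" in w)
print("ok")
```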

EXERCISE
Construct a regular expression over { 0, 1 } for the language L where no string has two consecutive 0s, e.g., 010 ∈ L, 001 ∉ L.

EQUIVALENCE OF REGULAR EXPRESSIONS
Two regular expressions are equivalent if they define the same language.
L = { all strings over {0, 1} without two consecutive 0s }
$r_1$ = (1 + 01)*(0 + λ)
$r_2$ = (1 + 011*)*(0 + λ) + 1*(0 + λ)
Since $L = L(r_1) = L(r_2)$, we have that $r_1$ and $r_2$ are equivalent.
Can we prove that $L(r_1) = L(r_2)$ in some way?
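Testing cannot prove equivalence, but it can catch a mistake quickly. The sketch below translates $r_1$ and $r_2$ into Python's `re` syntax. It assumes λ is written as an empty alternative, so (0 + λ) becomes `(0|)`. It then compares both expressions with the definition of L on every binary string up to a chosen length. The helper `agree_up_to` is hypothetical.

```python
import re
from itertools import product

# λ has no direct spelling in Python's re; the empty alternative in (0|) plays its role.
r1 = re.compile(r"(1|01)*(0|)")
r2 = re.compile(r"(1|011*)*(0|)|1*(0|)")


def agree_up_to(max_len: int) -> bool:
    """Do L(r1), L(r2) and 'no two consecutive 0s' agree on all binary strings up to max_len?"""
    for n in range(max_len + 1):
        for t in product("01", repeat=n):
            w = "".join(t)
            in_r1 = r1.fullmatch(w) is not None
            in_r2 = r2.fullmatch(w) is not None
            if not (in_r1 == in_r2 == ("00" not in w)):
                return False
    return True


print(agree_up_to(12))  # True
```

A real proof needs more than this finite check, for example set-theoretic reasoning about the two expressions or the automaton constructions that follow in later lectures.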

REGULAR EXPRESSIONS IN REALITY
Practical regular expressions use a slightly richer alphabet and notation than what we saw here, e.g.,
quantifiers: *, +, ?, {m}, {m,}, {m,n}
atoms: char, [chars], ., ^, $, \char
Example uses:
lexical analysis: tokenization preceding parsing
text search: grep/egrep (Unix)
Search for gr(a|e)y
^[-+]?[0-9]*\.?[0-9]+$
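Both example patterns also work in Python's `re` module. Its syntax is close to, but not identical with, the one grep/egrep use. The number pattern requires at least one digit at the end, so "5." is rejected while ".5" is accepted. For `gr(a|e)y`, the sketch collects whole matches with `finditer`. `re.findall` would return only the captured group. The helper `is_number` and the sample inputs are hypothetical.

```python
import re

NUMBER = re.compile(r"^[-+]?[0-9]*\.?[0-9]+$")


def is_number(s: str) -> bool:
    return NUMBER.match(s) is not None


for s in ["42", "-3.14", "+.5", "5.", "1.2.3", "abc"]:
    print(s, is_number(s))
# 42 True, -3.14 True, +.5 True, 5. False, 1.2.3 False, abc False

# gr(a|e)y: the alternation | lets one pattern match both spellings.
text = "gray grey groy"
print([m.group(0) for m in re.finditer(r"gr(a|e)y", text)])  # ['gray', 'grey']
```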

REGULAR EXPRESSIONS IN COMPILERS
The programmer writes a program.
The lexer splits the program text into a stream of tokens and removes white space:
literals: 1, 1.32, "Hello World!", ...
keywords: if, while, ...
variables: c, y, counter, ...
The token stream is passed to the parser, which creates a parse tree that is used by the next step of the compiler.
This simplifies parsing, since the parser can work on tokens rather than on characters.
[Figure: compiler pipeline with the stages Text, Lexer, Tokens, Parser, and Binary]

PARTS OF EXAMPLE PASCAL LEXER
    white_space       [ \t]*
    digit             [0-9]
    alpha             [A-Za-z_]
    alpha_num         ({alpha}|{digit})
    hex_digit         [0-9A-F]
    identifier        {alpha}{alpha_num}*
    unsigned_integer  {digit}+
    hex_integer       ${hex_digit}{hex_digit}*
    exponent          e[+-]?{digit}+
    i                 {unsigned_integer}
    real              ({i}\.{i}?|{i}?\.{i}){exponent}?
    string            \'([^'\n]|\'\')+\'

    and               return(and);
    array             return(array);
    begin             return(_begin);
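To see how these definitions compose, the sketch below spells a few of them as Python `re` patterns by plain string substitution. The Python variable names mirror the lex names. The helper `is_real` is hypothetical. Each expansion is wrapped in a non-capturing group `(?:...)`, so that a following `?` or `*` applies to the whole definition. As written, `real` requires a decimal point, so `1e5` is not matched.

```python
import re

# Hypothetical Python spellings of some of the lex definitions above.
digit = r"[0-9]"
alpha = r"[A-Za-z_]"
alpha_num = rf"(?:{alpha}|{digit})"
identifier = rf"{alpha}{alpha_num}*"
unsigned_integer = rf"{digit}+"
i = rf"(?:{unsigned_integer})"
exponent = rf"e[+-]?{digit}+"
real = rf"(?:{i}\.{i}?|{i}?\.{i})(?:{exponent})?"


def is_real(s: str) -> bool:
    return re.fullmatch(real, s) is not None


for s in ["3.14", "3.", ".5", "1.5e-3", "1e5", "."]:
    print(s, is_real(s))
# 3.14 True, 3. True, .5 True, 1.5e-3 True, 1e5 False, . False
```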

EXAMPLE TOKENIZATION
Consider the following Pascal program:
    Program Lesson1_Program1;
    Begin
      Write('Hello World');
      Readln;
    End.
It would produce the following token stream:
    PROGRAM IDENTIFIER ; BEGIN IDENTIFIER ( STRING ) ; IDENTIFIER ; END .
Note that the tokens are represented by integers, and tokens like IDENTIFIER and STRING carry the actual string representing the token.
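Below is a minimal sketch of a lexer for this example using Python's `re` module. Each token class is a named group in one combined pattern. The groups are tried in order, and white space is skipped. Keywords are matched case-insensitively because Pascal is case-insensitive. The token names, the keyword subset, and the `tokenize` helper are assumptions for illustration. Unlike the slide, this sketch represents tokens by names rather than integers, and every token carries its lexeme. White space uses `+` rather than `*`, since an empty match would never advance the position. The sketch emits a `;` token after the program name, because the source text `Program Lesson1_Program1;` contains one.

```python
import re

KEYWORDS = {"program", "begin", "end"}  # hypothetical subset, enough for this example
TOKEN_SPEC = [
    ("WHITE_SPACE", r"[ \t\n]+"),
    ("STRING", r"'(?:[^'\n]|'')+'"),      # '' inside a string is an escaped quote
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("PUNCT", r"[;().]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text: str):
    """Yield (token, lexeme) pairs; keywords become their own upper-case token."""
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "WHITE_SPACE":
            continue                      # the lexer drops white space
        if kind == "IDENTIFIER" and lexeme.lower() in KEYWORDS:
            kind = lexeme.upper()         # e.g. Begin -> BEGIN
        elif kind == "PUNCT":
            kind = lexeme                 # punctuation stands for itself
        yield kind, lexeme


program = "Program Lesson1_Program1; Begin Write('Hello World'); Readln; End."
print(" ".join(kind for kind, _ in tokenize(program)))
# PROGRAM IDENTIFIER ; BEGIN IDENTIFIER ( STRING ) ; IDENTIFIER ; END .
```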

REGULAR LANGUAGES
Topic for the next few lectures
Ways of defining regular languages: regular expressions (RE), regular grammars
Ways of deciding membership in regular languages: DFA and NFA
Equivalence of the approaches: DFA, NFA, RE

REGULAR LANGUAGES
[Figure: diagram relating Regular Expression, DFA, NFA, and Regular Grammar to Regular Language]

DO THE EXERCISES!
Exercise material is on the homepage; the exercises are similar to what will be on the exam.
If you get stuck, ask a friend or ask me.
If several of you have issues with the same exercise, we'll add it to a lecture.