Regular Expressions. A Language for Specifying Languages. Thursday, September 20, 2007 Reading: Stoughton 3.1

Similar documents
Equivalence and Simplification of Regular Expressions

1 Finite Representations of Languages

Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions

Regular Expressions. Chapter 6

Dr. D.M. Akbar Hussain

Slides for Faculty Oxford University Press All rights reserved.

Chapter Seven: Regular Expressions

Regular Languages and Regular Expressions

Specifying Syntax. An English Grammar. Components of a Grammar. Language Specification. Types of Grammars. 1. Terminal symbols or terminals, Σ

Formal Languages and Automata Theory, SS Project (due Week 14)

Complexity Theory. Compiled By : Hari Prasad Pokhrel Page 1 of 20. ioenotes.edu.np

Languages and Compilers

Chapter Seven: Regular Expressions. Formal Language, chapter 7, slide 1

CSE 105 THEORY OF COMPUTATION

Bottom-Up Parsing. Lecture 11-12

Lecture 6,

1. Draw the state graphs for the finite automata which accept sets of strings composed of zeros and ones which:

Lexical Analysis. Prof. James L. Frankel Harvard University

2.8. Connectedness A topological space X is said to be disconnected if X is the disjoint union of two non-empty open subsets. The space X is said to

6 NFA and Regular Expressions

Zhizheng Zhang. Southeast University

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

CMSC 132: Object-Oriented Programming II

ECS 120 Lesson 7 Regular Expressions, Pt. 1

3.15: Applications of Finite Automata and Regular Expressions. Representing Character Sets and Files

Decision Properties for Context-free Languages

Principles of Programming Languages COMP251: Syntax and Grammars

Problem Set 6 (Optional) Due: 5pm on Thursday, December 20

Regular Expression Module-2

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

SC/MATH Boolean Formulae. Ref: G. Tourlakis, Mathematical Logic, John Wiley & Sons, York University

Abstract Syntax Trees L3 24

Regular Expressions & Automata

Lecture 1. Types, Expressions, & Variables

This book is licensed under a Creative Commons Attribution 3.0 License

Compilers CS S-01 Compiler Basics & Lexical Analysis

SEM / YEAR : VI / III CS2352 PRINCIPLES OF COMPLIERS DESIGN UNIT I - LEXICAL ANALYSIS PART - A

Lexical Analysis. Lecture 3-4

Languages and Strings. Chapter 2

Caveat lector: This is the first edition of this lecture note. Please send bug reports and suggestions to

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

THREE LECTURES ON BASIC TOPOLOGY. 1. Basic notions.

COMPILER DESIGN UNIT I LEXICAL ANALYSIS. Translator: It is a program that translates one language to another Language.

MA513: Formal Languages and Automata Theory Topic: Context-free Grammars (CFG) Lecture Number 18 Date: September 12, 2011

Compiler phases. Non-tokens

2 Regular Languages. 2.1 Languages

Problem Set 2 Solutions

Chapter 3: Propositional Languages

Decision, Computation and Language

Introduction to Lexical Analysis

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)

Compilers CS S-01 Compiler Basics & Lexical Analysis

Proof Techniques Alphabets, Strings, and Languages. Foundations of Computer Science Theory

Introduction to Automata Theory. BİL405 - Automata Theory and Formal Languages 1

DVA337 HT17 - LECTURE 4. Languages and regular expressions

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

CS3102 Theory of Computation Problem Set 2, Spring 2011 Department of Computer Science, University of Virginia

Introduction to Lexical Analysis

CS S-01 Compiler Basics & Lexical Analysis 1

if s has property P and c char, then c s has property P.

Compiler Construction

CS 6110 S14 Lecture 1 Introduction 24 January 2014

8 ε. Figure 1: An NFA-ǫ

Lecture 7: Deterministic Bottom-Up Parsing

CS4215 Programming Language Implementation. Martin Henz

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Lexical Analysis. Lecture 2-4

(Pre-)Algebras for Linguistics

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

University of Nevada, Las Vegas Computer Science 456/656 Fall 2016

CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

SLR parsers. LR(0) items

Lexical Analysis. Chapter 2

Ambiguous Grammars and Compactification

CSCE 314 Programming Languages

Lecture 5: Formation Tree and Parsing Algorithm

Automata Theory CS S-FR Final Review

CHAPTER TWO LANGUAGES. Dr Zalmiyah Zakaria

Structure of Programming Languages Lecture 3

CSE 20 DISCRETE MATH. Winter

CSE 311 Lecture 21: Context-Free Grammars. Emina Torlak and Kevin Zatloukal

Context-free grammars

2. Functions, sets, countability and uncountability. Let A, B be sets (often, in this module, subsets of R).

Let A(x) be x is an element of A, and B(x) be x is an element of B.

A B. bijection. injection. Section 2.4: Countability. a b c d e g

CS314: FORMAL LANGUAGES AND AUTOMATA THEORY L. NADA ALZABEN. Lecture 1: Introduction

1. [5 points each] True or False. If the question is currently open, write O or Open.

Announcements. Written Assignment 1 out, due Friday, July 6th at 5PM.

CSE 105 THEORY OF COMPUTATION

Non-deterministic Finite Automata (NFA)

Compiler Construction: Parsing

CS Programming Languages Fall Homework #1

Bottom-Up Parsing II (Different types of Shift-Reduce Conflicts) Lecture 10. Prof. Aiken (Modified by Professor Vijay Ganesh.

Formal Languages and Automata

York University CSE 2001 Unit 4.0 Context Free Grammars and Parsers and Context Sensitive Grammars Instructor: Jeff Edmonds

Reflection in the Chomsky Hierarchy

Review: Shift-Reduce Parsing. Bottom-up parsing uses two actions: Bottom-Up Parsing II. Shift ABC xyz ABCx yz. Lecture 8. Reduce Cbxy ijk CbA ijk

Compiler Construction

HW 1 CMSC 452. Morally DUE Feb 7 NOTE- THIS HW IS THREE PAGES LONG!!! SOLUTIONS THROUGOUT THIS HW YOU CAN ASSUME:

Transcription:

Regular Expressions A Language for Specifying Languages Thursday, September 20, 2007 Reading: Stoughton 3.1 CS235 Languages and Automata Department of Computer Science Wellesley College Motivation for Regular Expressions There are many languages simple to describe in English that we would like to specify via a concrete formal notation. E.g., consider the following {a,b,c}-languages: All strings containing only a s All strings containing exactly one b All strings containing at least one b All strings containing the substring cab All strings that have even length. All strings with length 3. All strings that start and end with the same symbol All strings containing b and c in which the first c appears after the first b All strings in which every pair of consecutive b s delimits a substring containing exactly one c Regular expressions are a notation for specifying languages like these. Regular Expressions 8-2 1

Regular Expression Example A regular expression is a tree with binary @ and + nodes, unary * nodes, and leaves that are either %, $, or symbols from Sym. Example: @(*(+(a,b)),+(%,c)) @ * + + % c a b Regular Expressions 8-3 Formal Syntax of Regular Expressions Let RegLab = Sym {%, $, *, @, +} The set Reg of regular expressions is the least subset of Tree RegLab inductively defined by the following rules: (empty string) % Reg (empty set) $ Reg (symbol) for all a in Sym, a Reg (closure) for all α Reg, *(α) Reg (concatenation) for all α, β Reg, @(α,β) Reg (union) for all α, β Reg, +(α,β) Reg Regular Expressions 8-4 2

Formal Semantics of Regular Expressions The meaning of a regular expression is defined by the following function L : Reg Lan: L(%) = {%} L($) = L(a) = {a} L(*(α)) = (L(α))* L(@(α,β)) = L(α) @ L(β) L(+(α,β)) = L(α) L(β) A language is a regular language if it is the meaning of some regular expression. RegLan is the set of all regular languages. Regular Expressions 8-5 Semantics Example L( @(*(+(a,b)), +(%,c)) ) = L( *(+(a,b)) ) @ L( +(%,c) ) = (L( +(a,b) )) * @ (L( % ) L( c )) = (L( a ) L( b )) * @ ({%} {c}) = ({a} {b}) * @ {%, c} = {a, b} * @ {%, c} Regular Expressions 8-6 3

Abbreviations for Regular Expressions Abbreviate *(α) as α * Abbreviate @(α,β) as α@β or αβ right associative: a@β@γ means @(a,@(β,γ)) Abbreviate +(α,β) as α+β right associative: α+β+γ means +(α,+(β,γ)) Precedence order: * is highest, @ is middle, + is lowest override precedence with explicit parentheses. Examples: a+b*c+d means +(a,+(@(*(b),c),d)) a+b*(c+d) means +(a, @(*(b), +(c, d))) (a+b)*(c+d) means @(*(+(a,b)),+(c,d)) (a+b)*c+d means +(@(*(+(a,b)),c),d) Regular Expressions 8-7 Returning to Our Initial {a,b,c} Examples English Specification 1. All strings containing only a s Regular Expression 2. All strings containing exactly one b 3. All strings containing at least one b 4. All strings containing the substring cab 5. All strings that have even length 6. All strings with length 3 7. All strings that start and end with the same symbol 8. All strings containing b and c in which the first c appears after the first b 9. All strings in which every consecutive pair of b s delimits a substring containing exactly one c Regular Expressions 8-8 4

Languages and Countability Recall that Sym is a fixed universe of symbols and Str is the set of all strings over Sym. Str is countably infinite. Why? Suppose L is a Sym-language (i.e., a subset of Str). L must be countable. Why? Let Lan be the set of all Sym-languages = P(Str). Lan is uncountable. Why? The set Reg of regular expressions is countably infinite. Why? The set RegLan of regular languages is countably infinite. Why? Regular Expressions 8-9 5