Equivalence and Simplification of Regular Expressions


Wednesday, September 26, 2007
Reading: Stoughton 3.2
CS235 Languages and Automata
Department of Computer Science, Wellesley College

Goals for Today

    Equivalence of regular expressions
    Simplifying regular expressions
    Forlan tools for symbols, strings, and regular expressions

Equivalence of Regular Expressions

English: Regular expressions α and β are equivalent iff they denote the same language.
Symbols: α ≈ β iff L(α) = L(β)

Example: Show the following for any string x: % + x(% + x)* ≈ x*

Approach 1: Reason by definitions of languages.
    L1 = L(% + x(% + x)*) = {%} ∪ {x}@{%,x}*
    L2 = L(x*) = {x}*
    Show L1 = L2.
Approach 2: Develop algebraic laws and use these for reasoning.

Some Equivalence Laws for Regular Expressions

Stoughton's Laws (Section 3.2):
     1. (α + β) + γ ≈ α + (β + γ)
     2. α + β ≈ β + α
     3. $ + α ≈ α
     4. α + α ≈ α
     5. (αβ)γ ≈ α(βγ)
     6. %α ≈ α ≈ α%
     7. $α ≈ $ ≈ α$
     8. α(β + γ) ≈ αβ + αγ
     9. (α + β)γ ≈ αγ + βγ
    10. $* ≈ %
    11. %* ≈ %
    12. α*α ≈ αα*
    13. α*α* ≈ α*
    14. (α*)* ≈ α*
    15. L(α) ⊆ L(β) ⇒ α + β ≈ β

Some Additional Laws (Kozen, Chapter 9):
    16. % + αα* ≈ α*
    17. % + α*α ≈ α*
    18. (% + α)* ≈ α*
    19. (αβ)*α ≈ α(βα)*
    20. (α*β)*α* ≈ (α + β)*
    21. α*(βα*)* ≈ (α + β)*
    22. β + αγ + γ ≈ γ ⇒ α*β + γ ≈ γ
    23. β + γα + γ ≈ γ ⇒ βα* + γ ≈ γ

Kozen says that laws 1-9 together with 15, 16, 17, 22, and 23 can generate all true equations between regular expressions.
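As an illustration of Approach 2, here is one possible derivation of the example equation. It uses only laws 18 and 16 from the lists above, and it assumes (as is standard for ≈) that an equivalence may be applied to a subexpression.

```latex
\begin{align*}
\% + x(\% + x)^{*} &\approx \% + x\,x^{*} && \text{law 18 with } \alpha = x \\
                   &\approx x^{*}         && \text{law 16 with } \alpha = x
\end{align*}
```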

Let's Get Some Practice!

    Show % + x(% + x)* ≈ x*
    Show 0* + 0*1(% + 0*01)*0*00 ≈ % + (0+10)*0 (adapted from Kozen, Ch. 9)

Simplification

Algebra is tricky: how do we know which laws to apply when (and in which direction)?

What we want:
    A simplification algorithm for applying rules in a particular direction.
    Each rule should make the expression simpler in some way, so that the algorithm terminates.
    The order in which the rules are applied shouldn't affect the final result (confluence).
    The simplification process should find the simplest form according to all possible laws.
    The simplification process should be efficient.

Sadly, our hopes are crushed by Stones Theorem (Jagger): You can't always get what you want.
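The two practice equations can be sanity-checked (not proved) by enumerating both languages up to a length bound and comparing the resulting finite sets. The sketch below is plain Python rather than Forlan; the tuple encoding of regular expressions and the helper names (`sym`, `star`, `alt`, `cat`, `lang`, `equal_up_to`) are assumptions made for this illustration only.

```python
from functools import reduce

# Regular expressions as nested tuples; % is the empty string, $ the empty set.
EMP, NIL = ('%',), ('$',)

def sym(c): return ('sym', c)
def star(r): return ('*', r)
def alt(*rs): return reduce(lambda a, b: ('+', a, b), rs)   # union
def cat(*rs): return reduce(lambda a, b: ('@', a, b), rs)   # concatenation

def lang(r, n):
    """All strings of length <= n in the language denoted by r."""
    tag = r[0]
    if tag == '%':
        return {''}
    if tag == '$':
        return set()
    if tag == 'sym':
        return {r[1]} if len(r[1]) <= n else set()
    if tag == '+':
        return lang(r[1], n) | lang(r[2], n)
    if tag == '@':
        return {u + v for u in lang(r[1], n) for v in lang(r[2], n)
                if len(u + v) <= n}
    # tag == '*': keep appending non-empty words until nothing new fits
    base = lang(r[1], n) - {''}
    result, frontier = {''}, {''}
    while frontier:
        frontier = {u + v for u in frontier for v in base
                    if len(u + v) <= n} - result
        result |= frontier
    return result

def equal_up_to(r1, r2, n):
    """Bounded check: the languages agree on all strings of length <= n."""
    return lang(r1, n) == lang(r2, n)

x, zero, one = sym('x'), sym('0'), sym('1')
lhs1 = alt(EMP, cat(x, star(alt(EMP, x))))            # % + x(% + x)*
rhs1 = star(x)                                        # x*
lhs2 = alt(star(zero),                                # 0* + 0*1(% + 0*01)*0*00
           cat(star(zero), one,
               star(alt(EMP, cat(star(zero), zero, one))),
               star(zero), zero, zero))
rhs2 = alt(EMP, cat(star(alt(zero, cat(one, zero))), zero))   # % + (0+10)*0

print(equal_up_to(lhs1, rhs1, 6), equal_up_to(lhs2, rhs2, 8))
```

Agreement up to a bound is only evidence; an actual proof still needs either the language definitions or the algebraic laws.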

Stoughton's Simplification Algorithms

Stoughton presents two simplification algorithms (3.2):

weaksimplify:
    uses only a small set of rules, so isn't very powerful
    is easy to understand
    is efficient
    resulting expressions have nice properties

simplify:
    uses a much larger set of rules, so is more powerful
    rules are ad hoc heuristics that aren't confluent, so results are hard to predict
    is inefficient
    is abstracted over a predicate conservatively approximating the subset relation on regular expressions (weaksub by default)

Weakly Simplified Regular Expressions

Definition: A regular expression is weakly simplified if it does not contain any subexpressions of the following form:
    %*
    $*
    (α*)*
    %α or α%
    $α or α$
    (αβ)γ
    α*α or α*(αβ)
    $ + α or α + $
    (α + β) + γ
    (*) α + α, α + (α + β)
    (**) β + α or β + (α + γ), where α < β via Reg ordering

(*) and (**) say that sums are without duplicates and sorted.

We'll see that subexpressions of the above form are ones that are easy to remove in a step-by-step simplification process.

What is the Reg Ordering?

Recall RegLab = {%, $, *, @, +} ∪ Sym

RegLab is ordered as follows: % < $ < symbols in order < * < @ < +

Reg is the set of regular expressions. Elements of Reg are ordered by viewing them as trees and then ordering them first by their root labels and then recursively by their children (from left to right). E.g.:

    % < *(%) < *(@($,*($))) < *(@(a,%)) < @(a,b) < +(%,$)

Weak Simplification Rules

    %* → %
    $* → %
    (α*)* → α*
    %α → α
    α% → α
    $α → $
    α$ → $
    (αβ)γ → α(βγ)
    α*α → αα*, α*(αβ) → α(α*β)
    $ + α → α
    α + $ → α
    α + α → α, α + (α + β) → α + β
    β + α → α + β, β + (α + γ) → α + (β + γ) (if α < β via Reg ordering)
    (α + β) + γ → α + (β + γ)

Each rule preserves equivalence.
Each rule makes the expression closer to weakly simplified form.
Applying the rules in any order will eventually terminate with the same weakly simplified expression.
Stoughton describes a weaksimplify algorithm that efficiently applies rules in a particular order.
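The rules above can be sketched as a small bottom-up rewriter. This is a rough Python illustration, not Stoughton's weaksimplify algorithm and not Forlan code: the tuple encoding, the names `key`, `step`, and `weak_simplify`, and the choice to order symbols by length and then alphabetically are all assumptions made here. It simply rewrites children first and then retries the root until no rule applies.

```python
EMP, NIL = ('%',), ('$',)
RANK = {'%': 0, '$': 1, 'sym': 2, '*': 3, '@': 4, '+': 5}

def key(r):
    """Sort key for the Reg ordering: root label first, then children."""
    if r[0] == 'sym':
        return (RANK['sym'], len(r[1]), r[1])
    return (RANK[r[0]],) + tuple(key(c) for c in r[1:])

def step(r):
    """Apply one rule at the root; return None if no rule applies."""
    tag = r[0]
    if tag == '*':
        a = r[1]
        if a in (EMP, NIL):
            return EMP                                    # %* -> %, $* -> %
        if a[0] == '*':
            return a                                      # (a*)* -> a*
    elif tag == '@':
        a, b = r[1], r[2]
        if a == EMP:
            return b                                      # %a -> a
        if b == EMP:
            return a                                      # a% -> a
        if NIL in (a, b):
            return NIL                                    # $a -> $, a$ -> $
        if a[0] == '@':
            return ('@', a[1], ('@', a[2], b))            # (ab)c -> a(bc)
        if a[0] == '*' and a[1] == b:
            return ('@', b, a)                            # a*a -> aa*
        if a[0] == '*' and b[0] == '@' and a[1] == b[1]:
            return ('@', b[1], ('@', a, b[2]))            # a*(ab) -> a(a*b)
    elif tag == '+':
        a, b = r[1], r[2]
        if a[0] == '+':
            return ('+', a[1], ('+', a[2], b))            # (a+b)+c -> a+(b+c)
        if a == NIL:
            return b                                      # $ + a -> a
        if b == NIL:
            return a                                      # a + $ -> a
        if a == b:
            return a                                      # a + a -> a
        if b[0] == '+':
            if a == b[1]:
                return b                                  # a+(a+b) -> a+b
            if key(b[1]) < key(a):
                return ('+', b[1], ('+', a, b[2]))        # sort into the tail
        elif key(b) < key(a):
            return ('+', b, a)                            # b + a -> a + b
    return None

def weak_simplify(r):
    """Rewrite children, then the root, until no rule applies anywhere."""
    if r[0] in ('*', '@', '+'):
        r = (r[0],) + tuple(weak_simplify(c) for c in r[1:])
    s = step(r)
    return r if s is None else weak_simplify(s)

a, b = ('sym', 'a'), ('sym', 'b')
# b*(b + b$) + ($ + $*) + (a + a)*a
example = ('+', ('@', ('*', b), ('+', b, ('@', b, NIL))),
                ('+', ('+', NIL, ('*', NIL)), ('@', ('*', ('+', a, a)), a)))
print(weak_simplify(example))
```

On this input the result is the tree for % + aa* + bb*. The sketch favours clarity over efficiency, since it re-simplifies subtrees that are already in weakly simplified form.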

Weak Simplification Examples

Perform weak simplification on the following expressions:
    b*(b + b$) + ($ + $*) + (a + a)*a
    % + x(% + x)*
    0* + 0*1(% + 0*01)*0*00

Weak Simplification in Forlan

In Forlan, we can define a testweaksimplify function that performs the weaksimplify algorithm on regular expressions represented as strings.

    - testweaksimplify; (* We'll see how to define this later *)
    val testweaksimplify = fn : string -> string
    - testweaksimplify "b*(b + b$) + ($ + $*) + (a + a)*a";
    val it = "% + aa* + bb*" : string
    - testweaksimplify "% + x(% + x)*";
    val it = "% + x(% + x)*" : string
    - testweaksimplify "0* + 0*1(% + 0*01)0*00";
    val it = "0* + 0*1(% + 00*1)000*" : string
    - testweaksimplify "(1+%)(2+$)(3+%*)(4+$*)";
    val it = "(% + 1)2(% + 3)(% + 4)" : string

Nice Properties of Weakly Simplified Form

Suppose α is weakly simplified. Then:
    L(α) = ∅ iff α = $
    L(α) = {%} iff α = %
    L(α) = {a} iff α = a
    L(α) is infinite iff α contains *

The last condition gives rise to an easy algorithm to determine whether a regular expression denotes an infinite language.

Stronger Simplification

Stoughton presents simplify, a more powerful simplifier.

It is parameterized over a predicate sub(α,β) for conservatively approximating whether L(α) ⊆ L(β):
    if sub(α,β) returns true, then L(α) ⊆ L(β), but
    if sub(α,β) returns false, it is unknown whether L(α) ⊆ L(β)

The trivial function trivialsubset(α,β) = false is an option, but not very useful.

Stoughton defines a weaksubset function that is a simple nontrivial implementation of sub, but there are more precise ones.

Stoughton's simplify definition also uses a predicate hasemp(α) that determines if % ∈ L(α).
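The hasemp predicate and the infinite-language test both fall out of a simple recursion over the expression tree. The following Python sketch uses an assumed tuple encoding of regular expressions; the names `has_emp`, `contains_star`, and `is_infinite` are illustrative, and `is_infinite` is only meaningful when its argument is already weakly simplified.

```python
EMP, NIL = ('%',), ('$',)

def has_emp(r):
    """True iff the empty string % is in L(r)."""
    tag = r[0]
    if tag in ('%', '*'):
        return True                      # % and any closure contain %
    if tag in ('$', 'sym'):
        return False
    if tag == '@':
        return has_emp(r[1]) and has_emp(r[2])
    return has_emp(r[1]) or has_emp(r[2])    # tag == '+'

def contains_star(r):
    """True iff a closure operator occurs anywhere in r."""
    if r[0] == '*':
        return True
    return any(contains_star(c) for c in r[1:] if isinstance(c, tuple))

def is_infinite(r):
    """For weakly simplified r only: L(r) is infinite iff r contains *."""
    return contains_star(r)

a, b, c = ('sym', 'a'), ('sym', 'b'), ('sym', 'c')
r1 = ('@', ('*', ('+', a, b)), ('+', EMP, c))    # (a + b)*(% + c)
print(has_emp(r1), is_infinite(r1))
```

The weak-simplification precondition matters: %* contains a star but denotes the finite language {%}, which is why the property is stated only for weakly simplified expressions.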

Some Stronger Simplification Rules

    (1) (α*β)*α* → (α + β)*
    (2) α*(βα*)* → (α + β)*
    (3) If hasemp(α) and sub(α,β*) then αβ* → β*
    (4) If hasemp(β) and sub(β,α*) then α*β → α*
    (5) If sub(α,β*) then (α + β)* → β*
    (6) (α + β*)* → (α + β)*
    (7) If hasemp(α) and hasemp(β), then (αβ)* → (α + β)*
    (9) If hasemp(α) and sub(α,β*) then (αβ)* → β*
    (10) If hasemp(β) and sub(β,α*) then (αβ)* → α*
    (13) If sub(α,β) then α + β → β
    (14) αβ + αγ → α(β + γ)
    (15) αγ + βγ → (α + β)γ
    (20) If hasemp(β) then α*α + β → α* + β

For the complete set of rules, see Stoughton 3.2.

These rules are just heuristics; they are not confluent and there is no canonical form that they yield.

Stronger Simplification in Forlan

In Forlan, we can define a testsimplify function that performs the simplify algorithm on regular expressions represented as strings, using weaksubset as sub.

    - testsimplify; (* We'll see how to define this later *)
    val testsimplify = fn : string -> string
    - testsimplify "b*(b + b$) + ($ + $*) + (a + a)*a";
    val it = "a* + b*" : string
    - testsimplify "% + x(% + x)*";
    val it = "x*" : string
    - testsimplify "0* + 0*1(% + 0*01)0*00";
    val it = "0* + 0*1(% + 00*1)000*" : string
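To make the role of sub concrete, here is a small conservative subset test together with rules (5) and (13). It is a Python sketch under stated assumptions: `syntactic_sub` is a hypothetical predicate written for this illustration and is not Stoughton's weaksubset; it only answers true in cases where inclusion follows directly from the syntax, so a false answer means "unknown".

```python
EMP, NIL = ('%',), ('$',)

def syntactic_sub(a, b):
    """Conservative test: True implies L(a) is a subset of L(b)."""
    if a == b or a == NIL:
        return True
    if a[0] == '+':                       # a union is included iff both parts are
        return syntactic_sub(a[1], b) and syntactic_sub(a[2], b)
    if b[0] == '+':                       # inclusion in either summand suffices
        return syntactic_sub(a, b[1]) or syntactic_sub(a, b[2])
    if b[0] == '*':
        if a == EMP:
            return True                   # % is in every closure
        if a[0] == '*':
            return syntactic_sub(a[1], b) # a1 in g* implies a1* in g*
        return syntactic_sub(a, b[1])     # a in g implies a in g*
    return False                          # unknown, so answer False

def rule5(r, sub=syntactic_sub):
    """(a + b)* -> b* when sub(a, b*); otherwise return r unchanged."""
    if r[0] == '*' and r[1][0] == '+':
        a, b = r[1][1], r[1][2]
        if sub(a, ('*', b)):
            return ('*', b)
    return r

def rule13(r, sub=syntactic_sub):
    """a + b -> b when sub(a, b); otherwise return r unchanged."""
    if r[0] == '+' and sub(r[1], r[2]):
        return r[2]
    return r

x = ('sym', 'x')
print(rule5(('*', ('+', EMP, x))))        # (% + x)* becomes x*
print(rule13(('+', EMP, ('*', x))))       # % + x* becomes x*
```

Passing the predicate as a parameter mirrors the way simplify is abstracted over sub: a more precise predicate would let more rules fire without changing the rules themselves.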

Tracing Simplification in Forlan

In Forlan, we can define a testtracesimplify function that shows how simplify arrives at its result.

    - testtracesimplify "% + x(% + x)*";
    % + x(% + x)* weakly simplifies to % + x(% + x)*
    is transformed by closure rules to % + x(% + x)*
    is transformed by simplification rule 5 to % + xx*
    weakly simplifies to % + xx*
    is transformed by closure rules to xx* + %
    is transformed by simplification rule 20 to x* + %
    weakly simplifies to % + x*
    is transformed by closure rules to % + x*
    is transformed by simplification rule 13 to x*
    weakly simplifies to x*
    is simplified
    val it = "x*" : string

The Forlan Sym Module

    - open Sym;
    opening Sym
    (* I'm leaving out a bunch of details here *)
    type sym = sym (* sym is an abstract type whose representation is hidden *)
    val fromString : string -> sym
    val toString : sym -> string
    val compare : sym * sym -> order
    val equal : sym * sym -> bool
    val size : sym -> int

    - val syms = map Sym.fromString ["a", "b", "<foo>", "<a<b>>"];
    val syms = [-,-,-,-] : sym list
    - map Sym.toString syms;
    val it = ["a","b","<foo>","<a<b>>"] : string list
    - Sym.compare(Sym.fromString "c", Sym.fromString "<ab>");
    val it = LESS : order
    - Sym.compare(Sym.fromString "<c>", Sym.fromString "<ab>");
    val it = LESS : order
    - Sym.compare(Sym.fromString "<cd>", Sym.fromString "<ab>");
    val it = GREATER : order
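The three Sym.compare results above are consistent with ordering symbols first by length and then alphabetically. The Python sketch below encodes that reading; it is an assumption inferred from these examples rather than a statement of Forlan's implementation, and `sym_compare` is a name invented for this illustration.

```python
def sym_compare(a, b):
    """Compare two symbols (given as strings) by length, then alphabetically."""
    ka, kb = (len(a), a), (len(b), b)
    if ka < kb:
        return "LESS"
    if ka > kb:
        return "GREATER"
    return "EQUAL"

for pair in [("c", "<ab>"), ("<c>", "<ab>"), ("<cd>", "<ab>")]:
    print(pair, sym_compare(*pair))
```

Note the second pair: "<c>" sorts before "<ab>" even though "c" comes after "a", which is what suggests that length is compared first.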

The Forlan Str Module

    - open Str;
    opening Str
    type str = sym list
    val fromString : string -> str
    val toString : str -> string
    val isEmpty : str -> bool
    val isNonEmpty : str -> bool
    val compare : str * str -> order
    val equal : str * str -> bool
    val alphabet : str -> sym set
    val prefix : str * str -> bool
    val suffix : str * str -> bool
    val substr : str * str -> bool
    val power : str * int -> str
    (* I'm leaving out some details *)

Str Examples

    - val s1 = Str.fromString "ab<c>d<efg>";
    val s1 = [-,-,-,-,-] : str
    - length s1;
    val it = 5 : int
    - map Sym.toString s1;
    val it = ["a","b","<c>","d","<efg>"] : string list
    - val s2 = Str.fromString "ab%c(def)%g";
    val s2 = [-,-,-,-,-,-,-] : str
    - Str.toString s2;
    val it = "abcdefg" : string
    - val s3 = Str.fromString "%";
    val s3 = [] : str
    - val s4 = Str.fromString "";
    line 1 : end-of-file unexpected
    uncaught exception Error
    - Str.toString(Str.fromString "%");
    val it = "%" : string
    - Str.toString(s3);
    val it = "%" : string
    - val s5 = s1 @ s2;
    val s5 = [-,-,-,-,-,-,-,-,-,-,-,-] : sym list
    - Str.toString (s5);
    val it = "ab<c>d<efg>abcdefg" : string
    - Str.prefix (s1, s5);
    val it = true : bool
    - Str.prefix (s2, s5);
    val it = false : bool
    - Str.substr(Str.fromString("def"),s5);
    val it = true : bool
    - Str.substr(Str.fromString("edf"),s5);
    val it = false : bool
    - Str.toString(Str.power(s1,3));
    val it = "ab<c>d<efg>ab<c>d<efg>ab<c>d<efg>" : string
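Since a str is just a list of symbols, the operations in the examples can be mimicked with ordinary Python lists. The sketch below is an illustration only: `tokenize` is a simplified stand-in for Str.fromString that treats balanced <...> groups as single symbols and simply skips %, parentheses, and spaces, which is an assumption that happens to reproduce the examples above but is not Forlan's parser.

```python
def tokenize(s):
    """Split a string into symbols: single characters or balanced <...> groups."""
    syms, i = [], 0
    while i < len(s):
        c = s[i]
        if c == '<':
            j, depth = i, 0
            while j < len(s):                 # scan to the matching '>'
                depth += (s[j] == '<') - (s[j] == '>')
                j += 1
                if depth == 0:
                    break
            if depth != 0:
                raise ValueError("unbalanced '<'")
            syms.append(s[i:j])
            i = j
        elif c in '%() ':
            i += 1                            # simplification: skip these
        else:
            syms.append(c)
            i += 1
    return syms

def prefix(x, y):
    """True iff symbol list x is a prefix of y."""
    return y[:len(x)] == x

def substr(x, y):
    """True iff x occurs as a contiguous block of symbols in y."""
    return any(y[i:i + len(x)] == x for i in range(len(y) - len(x) + 1))

def power(x, n):
    """x concatenated with itself n times."""
    return x * n

s1 = tokenize("ab<c>d<efg>")
s2 = tokenize("ab%c(def)%g")
s5 = s1 + s2
print(s1, len(s5), prefix(s1, s5), substr(tokenize("def"), s5))
```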

The Forlan SymSet Module

    - open SymSet;
    opening SymSet
    val fromList : sym list -> sym set
    val compare : sym set * sym set -> order
    val memb : sym * sym set -> bool
    val subset : sym set * sym set -> bool
    val equal : sym set * sym set -> bool
    val map : ('a -> sym) -> 'a set -> sym set
    val union : sym set * sym set -> sym set
    val inter : sym set * sym set -> sym set
    val minus : sym set * sym set -> sym set
    val powSet : sym set -> sym set set
    val fromString : string -> sym set
    val input : string -> sym set
    val toString : sym set -> string
    (* I'm leaving out some details *)

    - val alph = Str.alphabet(s5);
    val alph = - : sym set
    - SymSet.toString alph;
    val it = "a, b, c, d, e, f, g, <c>, <efg>" : string
    - SymSet.toString (SymSet.inter(alph,(SymSet.fromString "b,<efg>,e")));
    val it = "b, e, <efg>" : string

The Forlan SymRel Module

    - open SymRel;
    opening SymRel
    type sym_rel = (sym,sym) rel
    val compare : sym_rel * sym_rel -> order
    val memb : (sym * sym) * sym_rel -> bool
    val subset : sym_rel * sym_rel -> bool
    val equal : sym_rel * sym_rel -> bool
    val union : sym_rel * sym_rel -> sym_rel
    val inter : sym_rel * sym_rel -> sym_rel
    val minus : sym_rel * sym_rel -> sym_rel
    val powSet : sym_rel -> sym_rel set
    val reflexive : sym_rel * sym set -> bool
    val symmetric : sym_rel -> bool
    val transitive : sym_rel -> bool
    val reflexiveClosure : sym_rel * sym set -> sym_rel
    val transitiveClosure : sym_rel -> sym_rel
    val symmetricClosure : sym_rel -> sym_rel
    val reflexiveTransitiveClosure : sym_rel * sym set -> sym_rel
    val reflexiveSymmetricClosure : sym_rel * sym set -> sym_rel
    val transitiveSymmetricClosure : sym_rel -> sym_rel
    val reflexiveTransitiveSymmetricClosure : sym_rel * sym set -> sym_rel
    (* I'm leaving out a bunch of details here *)

SymRel Examples

    - val rel = SymRel.fromString "(a,b), (b,d), (d,h)";
    val rel = - : sym_rel
    - SymRel.symmetric rel;
    val it = false : bool
    - SymRel.transitive rel;
    val it = false : bool
    - val rel2 = SymRel.transitiveClosure rel;
    val rel2 = - : sym_rel
    - SymRel.toString rel2;
    val it = "(a, b), (a, d), (a, h), (b, d), (b, h), (d, h)" : string
    - SymRel.transitive rel2;
    val it = true : bool
    - SymRel.symmetric rel2;
    val it = false : bool

The Forlan StrSet Module

    - open StrSet;
    opening StrSet
    val fromList : str list -> str set
    val compare : str set * str set -> order
    val memb : str * str set -> bool
    val subset : str set * str set -> bool
    val equal : str set * str set -> bool
    val map : ('a -> str) -> 'a set -> str set
    val union : str set * str set -> str set
    val inter : str set * str set -> str set
    val minus : str set * str set -> str set
    val powSet : str set -> str set set
    val fromString : string -> str set
    val toString : str set -> string
    val concat : str set * str set -> str set
    val power : str set * int -> str set
    val rev : str set -> str set
    val alphabet : str set -> sym set
    (* I'm leaving out a bunch of details here *)
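What the SymRel example computes can be reproduced with Python sets of pairs. This is a minimal sketch of the underlying relational operations, not of Forlan's implementation; the function names are chosen here for illustration.

```python
def is_symmetric(rel):
    """True iff (b, a) is in rel whenever (a, b) is."""
    return all((b, a) in rel for (a, b) in rel)

def is_transitive(rel):
    """True iff (a, c) is in rel whenever (a, b) and (b, c) are."""
    return all((a, d) in rel
               for (a, b) in rel for (c, d) in rel if b == c)

def transitive_closure(rel):
    """Smallest transitive relation containing rel (naive fixed point)."""
    closure = set(rel)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure
               if b == c} - closure
        if not new:
            return closure
        closure |= new

rel = {('a', 'b'), ('b', 'd'), ('d', 'h')}
rel2 = transitive_closure(rel)
print(sorted(rel2), is_transitive(rel2), is_symmetric(rel2))
```

The loop adds every pair obtainable by chaining two existing pairs and stops when a pass adds nothing, which is enough for small relations like this one.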

StrSet Examples

    - val ss1 = StrSet.fromString "in,out";
    val ss1 = - : str set
    - val ss2 = StrSet.fromString "doors,put,side";
    val ss2 = - : str set
    - StrSet.toString(StrSet.concat(ss1,ss2));
    val it = "input, inside, output, indoors, outside, outdoors" : string
    - StrSet.toString(StrSet.union(ss1,ss2));
    val it = "in, out, put, side, doors" : string
    - StrSet.toString(StrSet.power(ss1,3));
    val it = "ininin, ininout, inoutin, outinin, inoutout, outinout, outoutin, outoutout" : string

The Forlan Reg Module

    - open Reg;
    opening Reg
    type reg
    val fromString : string -> reg
    val toString : reg -> string
    val height : reg -> int
    val size : reg -> int
    val compare : reg * reg -> order
    val equal : reg * reg -> bool
    val alphabet : reg -> sym set
    val emptyStr : reg
    val emptySet : reg
    val fromSym : sym -> reg
    val closure : reg -> reg
    val concat : reg * reg -> reg
    val union : reg * reg -> reg
    val fromStr : str -> reg
    val power : reg * int -> reg
    val weakSimplify : reg -> reg
    val weakSubset : reg * reg -> bool
    val traceSimplify : (reg * reg -> bool) -> reg -> reg
    val simplify : (reg * reg -> bool) -> reg -> reg
    (* I'm leaving out a bunch of details here *)
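The StrSet examples can be imitated with Python sets of ordinary strings, where every symbol is a single character. This is a sketch under that simplifying assumption; `concat`, `power`, and `show` are names chosen here, and `show` sorts by length and then alphabetically, which matches the order in which the Forlan output above lists its elements.

```python
def concat(xs, ys):
    """All strings xy with x drawn from xs and y drawn from ys."""
    return {x + y for x in xs for y in ys}

def power(xs, n):
    """Concatenate xs with itself n times; the 0th power is the set holding
    only the empty string."""
    result = {''}
    for _ in range(n):
        result = concat(result, xs)
    return result

def show(xs):
    """Render a set sorted by length, then alphabetically."""
    return ", ".join(sorted(xs, key=lambda s: (len(s), s)))

ss1 = {"in", "out"}
ss2 = {"doors", "put", "side"}
print(show(concat(ss1, ss2)))
print(show(ss1 | ss2))
print(show(power(ss1, 3)))
```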

Reg Examples

    - val r1 = Reg.fromString "(a+b)*(%+c)";
    val r1 = - : reg
    - Reg.size r1;
    val it = 8 : int
    - Reg.alphabet r1;
    val it = - : sym set
    - SymSet.toString(Reg.alphabet r1);
    val it = "a, b, c" : string
    - val r1' = Reg.concat(Reg.closure(Reg.union(Reg.fromSym(Sym.fromString("a")),
    =                                             Reg.fromSym(Sym.fromString("b")))),
    =                      Reg.union(Reg.emptyStr,Reg.fromSym(Sym.fromString("c"))));
    val r1' = - : reg
    - Reg.toString(r1');
    val it = "(a + b)*(% + c)" : string
    - Reg.toString(Reg.power(r1,3));
    val it = "((a + b)*(% + c))((a + b)*(% + c))(a + b)*(% + c)" : string
    - Reg.toString(Reg.power(r1,2));
    val it = "((a + b)*(% + c))(a + b)*(% + c)" : string

Our Testing Functions from Earlier

    - fun testweaksimplify s = Reg.toString(Reg.weakSimplify(Reg.fromString s));
    val testweaksimplify = fn : string -> string
    - fun testsimplify s =
    =     Reg.toString(Reg.simplify Reg.weakSubset (Reg.fromString s));
    val testsimplify = fn : string -> string
    - fun testtracesimplify s =
    =     Reg.toString(Reg.traceSimplify Reg.weakSubset (Reg.fromString s));
    val testtracesimplify = fn : string -> string
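The size and power results in the Reg examples can be reproduced on an explicit tree representation. The Python sketch below is an illustration with assumed names (`size`, `power`, `to_string`) and an assumed tuple encoding; the printing conventions are a guess that matches the outputs shown above, and treating the 0th power as % is an assumption as well.

```python
EMP, NIL = ('%',), ('$',)
PREC = {'+': 1, '@': 2, '*': 3, '%': 4, '$': 4, 'sym': 4}

def size(r):
    """Number of nodes in the expression tree."""
    return 1 + sum(size(c) for c in r[1:] if isinstance(c, tuple))

def power(r, n):
    """r concatenated with itself n times, nested to the right."""
    if n == 0:
        return EMP                       # assumption: 0th power is %
    return r if n == 1 else ('@', r, power(r, n - 1))

def to_string(r):
    """Print with parentheses only where precedence requires them."""
    def wrap(c, min_prec):
        s = to_string(c)
        return s if PREC[c[0]] >= min_prec else '(' + s + ')'
    tag = r[0]
    if tag in ('%', '$'):
        return tag
    if tag == 'sym':
        return r[1]
    if tag == '*':
        return wrap(r[1], 4) + '*'
    if tag == '@':
        return wrap(r[1], 3) + wrap(r[2], 2)
    return wrap(r[1], 2) + ' + ' + wrap(r[2], 1)

a, b, c = ('sym', 'a'), ('sym', 'b'), ('sym', 'c')
r1 = ('@', ('*', ('+', a, b)), ('+', EMP, c))    # (a + b)*(% + c)
print(size(r1))
print(to_string(power(r1, 3)))
```

The size of 8 counts one concatenation, one closure, two unions, and four leaves, which is how the tree view of (a + b)*(% + c) breaks down.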