Decision, Computation and Language Regular Expressions Dr. Muhammad S Khan (mskhan@liv.ac.uk) Ashton Building, Room G22 http://www.csc.liv.ac.uk/~khan/comp218
Regular expressions M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 2
Regular expressions M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 3
Union M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 4
Concatenation M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 5
Closure (also called Kleene closure ) If we take a regular expression and add the superscript *, we get a new r.e. that represents the set of all words you can make by taking any sequence of words from the original r.e. and concatenating them together. M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 6
Example ({cat})* denotes the words ε, cat, catcat, catcatcat, catcatcatcat,... Notice that using closure, you can define an infinite language! M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 7
Example M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 8
Example M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 9
Example M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 10
Examples M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 11
Common extensions to the notation M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 12
Lexical tokens M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 13
Regular Languages......are those languages that can be described using regular expressions. We will see that these are the same languages as those that can be accepted by finite automata (Kleene's theorem). In terms of description length some languages are better expressed as DFAs and some as reg exprs. r.e's usually give a easy-to-understand description, on the other hand it's obvious how to run DFAs on input strings. M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 14
A language that is easier to describe using a r.e. M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 15
Regular Expressions vs. Finite Automata Offers a declarative way to express the pattern of any string we want to accept E.g., 01*+ 10* Automata => more machine-like < input: string, output: [accept/reject] > Regular expressions => more program syntax-like Unix environments heavily use regular expressions E.g., bash shell, grep, vi & other editors, sed Perl scripting good for string processing Lexical analyzers such as Lex or Flex M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 16
Regular Expressions Syntactical expressions Regular expressions = Finite Automata (DFA, NFA, -NFA) Automata/machines Regular Languages Formal language classes M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 17
Example L = { w w is a binary string which does not contain two consecutive 0s or two consecutive 1s anywhere) E.g., w = 01010101 is in L, while w = 10010 is not in L Goal: Build a regular expression for L Four cases for w: Case A: w starts with 0 and w is even Case B: w starts with 1 and w is even Case C: w starts with 0 and w is odd Case D: w starts with 1 and w is odd Regular expression for the four cases: Case A: (01)* Case B: (10)* Case C: 0(10)* Case D: 1(01)* Since L is the union of all 4 cases: Reg Exp for L = (01)* + (10)* + 0(10)* + 1(01)* M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 18
Precedence of Operators Highest to lowest * operator (star). (concatenation) + operator Example: 01* + 1 = ( 0. ((1)*) ) + 1 M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 19
Finite Automata & Regular Expressions Proofs in the book To show that they are interchangeable, consider the following theorems: Theorem 1: For every DFA A there exists a regular expression R such that L(R)=L(A) Theorem 2: For every regular expression R there exists an -NFA E such that L(E)=L(R) Theorem 2 -NFA Reg Ex Theorem 1 NFA DFA Kleene Theorem M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 20
DFA to RE construction DFA Theorem 1 Reg Ex Informally, trace all distinct paths (traversing cycles only once) from the start state to each of the final states and enumerate all the expressions along the way Example: 1 0 0,1 q 0 0 q 1 1 q 2 (1*) 0 (0*) 1 (0 + 1)* 1* 00* 1 (0+1)* Q) What is the language? M S Khan (Univ. of Liverpool) 1*00*1(0+1)* COMP218 Decision, Computation and Language 21
RE to -NFA construction Reg Ex Theorem 2 -NFA Example: (0+1)*01(0+1)* (0+1)* 01 (0+1)* 0 0 1 0 1 1 M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 22
Algebraic Laws of Regular Expressions Commutative: E+F = F+E Associative: (E+F)+G = E+(F+G) (EF)G = E(FG) Identity: E+Φ = E E = E = E Annihilator: ΦE = EΦ = Φ M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 23
Algebraic Laws Distributive: E(F+G) = EF + EG (F+G)E = FE+GE Idempotent: E + E = E Involving Kleene closures: (E*)* = E* Φ* = * = E + =EE* M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 24
Observations, exercise M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 25
Equivalences amongst regular expressions M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 26
Equivalences amongst regular expressions M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 27
True or False? Let R and S be two regular expressions. Then: 1. ((R*)*)* = R*? 2. (R+S)* = R* + S*? 3. (RS + R)* RS = (RR*S)*? M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 28
Using regular expressions for lexical analysis Use regular expressions to specify lexical tokens A parser generator can convert r.e's to DFAs. Given a sequence of characters that may or may not be a token, it is easy to check whether it is accepted by the DFA. (Notice that it is not straightforward to check whether a string of symbols belongs to the language given by a regular expression. A compiler, given a program, will look for a prefix that corresponds to a token, when that prefix is found, add it to list of tokens, delete it from program, and repeat, on remainder of program. (A lexical error is generated if no prefix matches any token expression.) M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 29
Taking stock M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 30
Taking stock M S Khan (Univ. of Liverpool) COMP218 Decision, Computation and Language 31