CMSC 132: Object-Oriented Programming II

Similar documents
Regular Expressions & Automata

Chapter Seven: Regular Expressions

Dr. D.M. Akbar Hussain

Regular Languages and Regular Expressions

Chapter Seven: Regular Expressions. Formal Language, chapter 7, slide 1

Structure of Programming Languages Lecture 3

Languages and Strings. Chapter 2

Compilers CS S-01 Compiler Basics & Lexical Analysis

CS 314 Principles of Programming Languages

Compilers CS S-01 Compiler Basics & Lexical Analysis

CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2]

Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions

Regular Expressions. Chapter 6

Applications of. Regular Expressions BY NIKHIL KUMAR KATTE

CMPSCI 250: Introduction to Computation. Lecture #28: Regular Expressions and Languages David Mix Barrington 2 April 2014

Languages and Finite Automata

Understanding Regular Expressions, Special Characters, and Patterns

CSE 105 THEORY OF COMPUTATION

Compiler Design. 2. Regular Expressions & Finite State Automata (FSA) Kanat Bolazar January 21, 2010

ECS 120 Lesson 7 Regular Expressions, Pt. 1

CS S-01 Compiler Basics & Lexical Analysis 1

Regular Expressions. Steve Renals (based on original notes by Ewan Klein) ICL 12 October Outline Overview of REs REs in Python

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

Welcome! CSC445 Models Of Computation Dr. Lutz Hamel Tyler 251

Non-deterministic Finite Automata (NFA)

CS 314 Principles of Programming Languages. Lecture 3

Theory of Programming Languages COMP360

Week 2: Syntax Specification, Grammars

Regular Languages. Regular Language. Regular Expression. Finite State Machine. Accepts

Lexical Analysis. Sukree Sinthupinyo July Chulalongkorn University

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

CHAPTER TWO LANGUAGES. Dr Zalmiyah Zakaria

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

Lec-5-HW-1, TM basics

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Optimizing Finite Automata

QUESTION BANK. Formal Languages and Automata Theory(10CS56)

Chapter 4: Regular Expressions

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

Formal languages and computation models

Lexical Analysis (ASU Ch 3, Fig 3.1)

Ambiguous Grammars and Compactification

Closure Properties of CFLs; Introducing TMs. CS154 Chris Pollett Apr 9, 2007.

Object-Oriented Software Engineering CS288

CSE 431S Scanning. Washington University Spring 2013

R10 SET a) Construct a DFA that accepts an identifier of a C programming language. b) Differentiate between NFA and DFA?

1. Draw the state graphs for the finite automata which accept sets of strings composed of zeros and ones which:

CS402 - Theory of Automata FAQs By

Perl Regular Expressions. Perl Patterns. Character Class Shortcuts. Examples of Perl Patterns

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Rice s Theorem and Enumeration

Complexity Theory. Compiled By : Hari Prasad Pokhrel Page 1 of 20. ioenotes.edu.np

Regular Expressions. A Language for Specifying Languages. Thursday, September 20, 2007 Reading: Stoughton 3.1

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Languages and Compilers

DVA337 HT17 - LECTURE 4. Languages and regular expressions

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Dr. Sarah Abraham University of Texas at Austin Computer Science Department. Regular Expressions. Elements of Graphics CS324e Spring 2017

Automata & languages. A primer on the Theory of Computation. The imitation game (2014) Benedict Cumberbatch Alan Turing ( ) Laurent Vanbever

CSE 105 THEORY OF COMPUTATION

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

Formal Languages and Compilers Lecture VI: Lexical Analysis

Formal Languages. Formal Languages

Languages and Models of Computation

Finite Automata Part Three

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

Regular Expressions. Todd Kelley CST8207 Todd Kelley 1


Regular Expressions. Lecture 10 Sections Robb T. Koether. Hampden-Sydney College. Wed, Sep 14, 2016

CMSC 330: Organization of Programming Languages

1.0 Languages, Expressions, Automata

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

Automata Theory TEST 1 Answers Max points: 156 Grade basis: 150 Median grade: 81%

Regular Expressions. Regular Expressions. Regular Languages. Specifying Languages. Regular Expressions. Kleene Star Operation

CS321 Languages and Compiler Design I. Winter 2012 Lecture 4

COMP Logic for Computer Scientists. Lecture 25

Multiple Choice Questions

CSE 311 Lecture 21: Context-Free Grammars. Emina Torlak and Kevin Zatloukal

UNION-FREE DECOMPOSITION OF REGULAR LANGUAGES

CS 2112 Lab: Regular Expressions

2. λ is a regular expression and denotes the set {λ} 4. If r and s are regular expressions denoting the languages R and S, respectively

Formal Grammars and Abstract Machines. Sahar Al Seesi

CSE528 Natural Language Processing Venue:ADB-405 Topic: Regular Expressions & Automata. www. l ea rn ersd esk.weeb l y. com

Outline. 1 Scanning Tokens. 2 Regular Expresssions. 3 Finite State Automata

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Turing Machine Languages

Formal Languages and Automata

2010: Compilers REVIEW: REGULAR EXPRESSIONS HOW TO USE REGULAR EXPRESSIONS

Proof Techniques Alphabets, Strings, and Languages. Foundations of Computer Science Theory

MA/CSSE 474. Today's Agenda

Lexical Analysis 1 / 52

Compiler phases. Non-tokens

Objects and Types. COMS W1007 Introduction to Computer Science. Christopher Conway 29 May 2003

A Java program contains at least one class definition.

CMSC 330: Organization of Programming Languages

Notes for Comp 454 Week 2

CS308 Compiler Principles Lexical Analyzer Li Jiang

Universal Turing Machine Chomsky Hierarchy Decidability Reducibility Uncomputable Functions Rice s Theorem Decidability Continued

Transcription:

CMSC 132: Object-Oriented Programming II Regular Expressions & Automata Department of Computer Science University of Maryland, College Park 1

Regular expressions Notation Patterns Java support Automata Languages Finite State Machines Turing Machines Computability Overview 2

Regular Expression (RE) Notation for describing simple string patterns Very useful for text processing Finding / extracting pattern in text Manipulating strings Automatically generating web pages 3

Regular Expression Regular expression is composed of Symbols Operators Concatenation AB Union A B Closure A* 4

Definitions Alphabet Set of symbols S Examples {a, b}, {A, B, C}, {a-z,a-z,0-9} Strings Sequences of 0 or more symbols from alphabet Examples, a, bb, cat, caterpillar Languages Sets of strings empty string Examples, { }, { a }, { bb, cat } 5

More Formally Regular expression describes a language over an alphabet L(E) is language for regular expression E Set of strings generated from regular expression String in language if it matches pattern specified by regular expression 6

Regular Expression Construction Every symbol is a regular expression Example a REs can be constructed from other REs using Concatenation Union Closure * 7

Regular Expression Construction Concatenation A followed by B L(AB) = { ab a L(A) AND b L(B) } Example a { a } ab { ab } 8

Regular Expression Construction Union A or B L(A B) = { a a L(A) OR a L(B) } Example a b { a, b } 9

Regular Expression Construction Closure Zero or more A L(A*) = { a a = OR a L(A)L(A*) } Example a* {, a, aa, aaa, aaaa } (ab)*c { c, abc, ababc, abababc } 10

Regular Expressions in Java Java supports regular expressions In java.util.regex.* Applies to String class in Java 1.4 Introduces additional specification methods Simplifies specification Does not increase power of regular expressions Can simulate with concatenation, union, closure 11

Regular Expressions in Java Concatenation ab ab (ab)c abc Union ( bar or square brackets [ ] for chars) a b a, b [abc] a, b, c Closure (star *) (ab)* [ab]*, ab, abab, ababab, a, b, aa, ab, ba, bb 12

Regular Expressions in Java One or more (plus +) a+ One or more a s Range (dash ) [a z] Any lowercase letters [0 9] Any digit Complement (caret ^ at beginning of RE) [^a] [^a z] Any symbol except a Any symbol except lowercase letters 13

Regular Expressions in Java Precedence Higher precedence operators take effect first Precedence order Parentheses ( ) Highest Closure a* b+ Concatenation ab Union a b Range [ ] Lowest 14

Regular Expressions in Java Examples ab+ (ab)+ ab cd a(b c)d [abc]d ab, abb, abbb, abbbb ab, abab, ababab, ab, cd abd, acd ad, bd, cd When in doubt, use parentheses 15

Regular Expressions in Java Predefined character classes [.] Any character except end of line [\d] Digit: [0-9] [\D] Non-digit: [^0-9] [\s] Whitespace character: [ \t\n\x0b\f\r] [\S] Non-whitespace character: [^\s] [\w] Word character: [a-za-z_0-9] [\W] Non-word character: [^\w] 16

Regular Expressions in Java Literals using backslash \ Need two backslash Java compiler will interpret 1 st backslash for String Examples \\] ] \\.. \\\\ \ 4 backslashes interpreted as \\ by Java compiler 17

Using Regular Expressions in Java Compile pattern import java.util.regex.*; Pattern p = Pattern.compile("[a-z]+"); Create matcher for specific piece of text Matcher m = p.matcher("now is the time"); Search text boolean found = m.find(); Returns true if pattern is found anywhere in text boolean exact = m.matches() returns true if pattern matches entire text 18

Using Regular Expressions in Java If pattern is found in text m.group() string found m.start() index of the first character matched m.end() index after last character matched m.group() is same as s.substring(m.start(), m.end()) Calling m.find() again Starts search after end of current pattern match 19

Code Complete Java Example import java.util.regex.*; public class RegexTest { public static void main(string args[]) { Pattern p = Pattern.compile( [a-z]+ ); Matcher m = p.matcher( Now is the time ); while (m.find()) { System.out.print(m.group() + ); } } } Output ow is the time 20

Language Recognition Accept string if and only if in language Abstract representation of computation Performing language recognition can be Simple Strings with even number of 1 s Not Simple Hard Strings with any number of a s, followed by the same number of b s Strings representing legal Java programs Not possible for all cases Strings representing nonterminating Java programs 21

Automata Simple abstract computers Can be used to recognize languages Finite state machine States + transitions Turing machine States + transitions + tape 22

Finite State Machine States Starting Accepting Finite number allowed 0 1 1 Transitions State to state Labeled by symbol a q 1 q 2 Start State 0 Accept State L(M) = { w w ends in a 1} 23

Operations Finite State Machine Move along transitions based on symbol Accept string if ends up in accept state Reject string if ends up in non-accepting state 011 Accept 0 1 1 10 Reject q 1 q 2 0 24

Properties Finite State Machine Powerful enough to recognize regular expressions In fact, finite state machine regular expression Languages recognized by finite state machines 1-to-1 mapping Languages recognized by regular expressions 25

Turing Machine Defined by Alan Turing in 1936 Finite state machine + tape Tape Infinite storage Read / write one symbol at tape head Move tape head one space left / right 0 1 1 Tape Head q 1 q 2 0 26

Allowable actions Turing Machine Read symbol from current square Write symbol to current square Move tape head left Move tape head right Go to next state 27

Turing Machine Tape Head * 1 0 0 1 0 * Current State Current Content Value to Write Direction to Move New state to enter START * * Left MOVING MOVING 1 0 Left MOVING MOVING 0 1 Left MOVING MOVING * * No move HALT 28

Turing Machine Operations Read symbol on current square Select action based on symbol & current state Accept string if in accept state Reject string if halts in non-accepting state Reject string if computation does not terminate Halting problem It is undecidable in general whether long-running computations will eventually accept 29

Computability Computability A language is computable if it can be recognized by some algorithm with finite number of steps Church-Turing thesis Turing machine can recognize any language computable on any machine Intuition Turing machine captures essence of computing Both in a formal sense, and in an informal practical sense 30

Computability Turing Test Turing Test Test machine s capability to demonstrate intelligence Proposed by Alan Turing in 1950 Practical test for Can machines think? Test procedure 1. Judge converses via text (e.g., instant messaging) 2. Pass if can t reliably tell whether machine or human? 31

More Turing Tests On the Internet, nobody knows you re a dog. 32

Computability Reverse Turing Test Reverse Turing test Computer determines whether human or machine CAPTCHA Completely Automated Public Turing test to tell Computers and Humans Apart Challenge-response asking for word identification Useful for foiling scripts 33

More Reverse Turing Tests 34