CSE : Python Programming

1 CSE : Python Programming Lecture 11: Regular expressions April 2,

2 Announcements
- About those meetings from last week: if I said I was going to look into something or send you some information, you should send me a reminder
- We'll have similar meetings near the end of classes
- Code and documentation for projects are due on the last day of classes (April 20)

3 Regular Expressions

4 Survey time (again)
- Who here knows about the following?
  - Deterministic finite-state automata (DFAs)
  - Non-deterministic finite-state automata (NFAs)
  - Regular languages
- Recall (if you've taken CSE 260?) that the proof showing the equivalence of the above three gives an algorithm for translating a regular language into an automaton

5 Regular expressions (overview)
- A regular expression is a compact way of specifying a (potentially large) set of strings
- Example: compilers and identifiers
  - Source code: Object, LinkedList, my_int, pagelist
  - Regular expression: [a-zA-Z][a-zA-Z0-9]*
- They are useful for finding a particular kind of string within some larger string, but they're not great for everything
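As a rough illustration (not part of the original slides), the identifier pattern above can be tried against the sample names with Python's re module. The helper name is_identifier and the extra test string "9lives" are made up for this sketch; fullmatch is used so that the whole string has to fit the pattern.

```python
import re

# The slide's pattern: one letter, then any number of letters or digits.
IDENTIFIER = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")

def is_identifier(text):
    """Hypothetical helper: True when the entire string fits the pattern."""
    return IDENTIFIER.fullmatch(text) is not None

names = ["Object", "LinkedList", "my_int", "9lives"]
results = {name: is_identifier(name) for name in names}
print(results)
```

Note that my_int is rejected by the pattern as transcribed, because neither character class contains an underscore; a pattern meant for real identifiers would presumably add _ to both classes.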

6 Warning: Code ≠ Mathematics
- Mathematics has an idea of what a regular expression is
- Programming languages also have ideas for this
- Shells have yet more ideas for this
- They are not exactly the same!
  - Somewhat different notations and meanings
  - Some languages provide features which have no correspondence to mathematics

7 Regular expressions in Python
- The following characters have special meanings inside a regular expression:
  . ^ $ * + ? { } [ ] \ | ( )
- To refer to one of these characters literally, prefix it with a backslash
  - For example: \.

8 Backslash mayhem and raw strings
- But you have to give regular expressions to Python as strings, and backslashes have another meaning there
  - For example: '\n' is a one-character string
- Suppose we want to match the backslash character:
  - Regular expression we have to use: \\
  - As a Python string: '\\\\'
- Raw strings, e.g., r'foo' and r"foo", don't interpret backslash characters in any special way, so the same pattern can be written r'\\'
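The two levels of escaping can be checked directly. This is a minimal sketch written for Python 3; the sample string path is an assumption chosen only for illustration.

```python
import re

# Both spellings denote the same two-character regular expression: \\
escaped = '\\\\'
raw = r'\\'
same_pattern = (escaped == raw)

# '\n' is one character (a newline); r'\n' is two (a backslash and an n).
lengths = (len('\n'), len(r'\n'))

# The regular expression \\ matches one literal backslash.
path = 'C:\\temp'              # the 7-character string C:\temp
found = re.search(raw, path)
position = found.start()
print(same_pattern, lengths, position)
```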

9 String Pattern Matching
- Unix command: ls *.txt
- Python regular expressions are different
- *: repeating 0 or more times
  - ab*c matches ac, abc, abbc, ...
  - a(bdc)*c matches ac, abdcc, ...
- +: repeating 1 or more times
  - ab+c matches abc, abbc, ...
  - a(bdc)+c matches abdcc, ...
- ?: 0 or 1 times
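The three repetition operators *, + and ? can be compared side by side. The following is only a sketch: the helper whole_match and the sample strings are assumptions for illustration, and fullmatch is used so that each string must be matched in its entirety.

```python
import re

def whole_match(pattern, text):
    """Hypothetical helper: does the pattern match the entire string?"""
    return re.fullmatch(pattern, text) is not None

samples = ("ac", "abc", "abbc")
star = [whole_match(r"ab*c", s) for s in samples]      # * : 0 or more b's
plus = [whole_match(r"ab+c", s) for s in samples]      # + : 1 or more b's
optional = [whole_match(r"ab?c", s) for s in samples]  # ? : 0 or 1 b
print(star, plus, optional)
```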

10 Character Class
- | means "or": (aa|bb) matches aa or bb
- [abc] matches a, b, or c
  - or simply [a-c]
  - equivalent to (a|b|c)
- [abc]+ matches a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, ...
- [^5] matches any char except 5
- [^0-9] matches any char except a digit
- a[bcd]* matches many more than a(bcd)*
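A short sketch of how character classes differ from groups, written for Python 3. The helper whole_match and the test strings are assumptions made for this example only.

```python
import re

def whole_match(pattern, text):
    """Hypothetical helper: does the pattern match the entire string?"""
    return re.fullmatch(pattern, text) is not None

in_class = whole_match(r"[abc]+", "cab")           # any mix of a, b, c
negated = (whole_match(r"[^0-9]", "x"),            # a non-digit is accepted
           whole_match(r"[^0-9]", "7"))            # a digit is rejected
class_star = whole_match(r"a[bcd]*", "adcb")       # class: letters in any order
group_star = whole_match(r"a(bcd)*", "adcb")       # group: only whole "bcd" units
group_repeat = whole_match(r"a(bcd)*", "abcdbcd")
print(in_class, negated, class_star, group_star, group_repeat)
```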

11 Matching is Greedy
- By "matching" we mean matching the beginning portion of a string
- a[bc]+ matches the underlined part in abcbcd (that is, abcbc)
- Greedy search with backtracking
- a[bcd]*b matches abcdb in abcdb, abcdb in abcdbcd, and ab in abcd
- Try matching the pattern a[bcd]*b with the string abcbd:

Step  Matched  Explanation
1     a        The a in the RE matches.
2     abcbd    The engine matches [bcd]*, going as far as it can, which is to the end of the string.
3     Failure  The engine tries to match b, but the current position is at the end of the string, so it fails.
4     abcb     Back up, so that [bcd]* matches one less character.
5     Failure  Try b again, but the current position is at the last character, which is a d.
6     abc      Back up again, so that [bcd]* is only matching bc.
7     abcb     Try b again. This time the character at the current position is b, so it succeeds.
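The end result of the backtracking walk-through can be confirmed in Python 3. This sketch only shows the final match; the intermediate steps happen inside the regex engine and are not observable from the re API.

```python
import re

# Greedy [bcd]* first grabs "bcbd", then backs off until the final b can match.
m = re.match(r"a[bcd]*b", "abcbd")
matched_text = m.group()
matched_span = m.span()
print(matched_text, matched_span)
```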

12 Escape Characters
- Characters with special meaning: . ^ $ * + ? { } [ ] \ | ( )
- \(ab\) matches (ab)
- . matches any single character (by default, anything except a newline)
- .* matches any string
- \\ matches \
- ^ matches the beginning of a line or string
  - not the ^ inside char-classes [^...]
- $ matches the end of a line or string
  - a[bcd]*b$ does not match string abcbd
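A few of these rules checked in Python 3, as a minimal sketch. re.escape is a standard-library function that adds the backslashes for you; the variable names are made up for this example.

```python
import re

literal = re.fullmatch(r"\(ab\)", "(ab)") is not None   # escaped parentheses are literal
escaped = re.escape("(ab)")                             # builds the escaped pattern
dot_newline = re.match(r".", "\n")                      # . skips newlines by default
anchored = re.search(r"a[bcd]*b$", "abcbd")             # $ forces the match to end at the end
unanchored = re.search(r"a[bcd]*b", "abcbd").group()
print(literal, escaped, dot_newline, anchored, unanchored)
```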

13 Special character classes
\d  Matches any decimal digit; this is equivalent to the class [0-9].
\D  Matches any non-digit character; this is equivalent to the class [^0-9].
\s  Matches any whitespace character; this is equivalent to the class [ \t\n\r\f\v].
\S  Matches any non-whitespace character; this is equivalent to the class [^ \t\n\r\f\v].
\w  Matches any alphanumeric character; this is equivalent to the class [a-zA-Z0-9_].
\W  Matches any non-alphanumeric character; this is equivalent to the class [^a-zA-Z0-9_].
In Python 3, str patterns also match non-ASCII digits, letters, and whitespace unless the re.ASCII flag is given; the equivalences above describe ASCII matching.
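A quick sketch of the shorthand classes in Python 3. The sample text is an assumption; the last two lines show the Unicode behaviour of \d using the Arabic-Indic digit three (U+0663).

```python
import re

text = "Room 42,\tfloor 3\n"
digits = re.findall(r"\d+", text)
words = re.findall(r"\w+", text)
spaces = re.findall(r"\s", text)

# For str patterns, \d is Unicode-aware unless re.ASCII is requested.
unicode_digit = re.findall(r"\d", "\u0663")
ascii_only = re.findall(r"\d", "\u0663", re.ASCII)
print(digits, words, spaces, unicode_digit, ascii_only)
```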

14 Overview of functions
Method/Attribute  Purpose
match()           Determine if the RE matches at the beginning of the string.
search()          Scan through a string, looking for any location where this RE matches.
split()           Split the string into a list, splitting it wherever the RE matches.
sub()             Find all substrings where the RE matches, and replace them with a different string.
subn()            Does the same thing as sub(), but returns the new string and the number of replacements.
Both sub() and subn() accept an optional count argument that limits the number of replacements.
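The methods in the table can be exercised on one small string. This is a sketch for Python 3; the pattern and the sample text are assumptions chosen to keep the results easy to read.

```python
import re

p = re.compile(r"\d+")
text = "a1b22c333"

at_start = p.match(text)                 # None: the string starts with a letter
first = p.search(text).group()           # first run of digits anywhere
pieces = p.split(text)                   # split wherever digits occur
replaced = p.sub("#", text)              # replace every run of digits
replaced_once = p.sub("#", text, count=1)
replaced_counted = p.subn("#", text)     # (new string, number of replacements)
print(at_start, first, pieces, replaced, replaced_once, replaced_counted)
```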

15 Performing Matches
- match() returns None if failed or a match object
>>> import re
>>> re.match('[a-z]+', "")
- Compiled version is faster for repeated use
>>> p = re.compile('[a-z]+')
>>> p
<_sre.SRE_Pattern object at 80c3c28>
>>> p.match("")
>>> print(p.match(""))
None
>>> m = p.match('tempo')
>>> print(m)
<_sre.SRE_Match object at 80c4f68>

16 Methods on match objects
Method/Attribute  Purpose
group()           Return the string matched by the RE
start()           Return the starting position of the match
end()             Return the ending position of the match
span()            Return a tuple containing the (start, end) of the match
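A minimal sketch of the four methods in Python 3; the sample string is an assumption. Slicing the original string with start() and end() gives back the same text as group().

```python
import re

text = "order 1234 shipped"
m = re.search(r"\d+", text)

matched = m.group()
start, end = m.start(), m.end()
span = m.span()
sliced = text[start:end]     # same text as m.group()
print(matched, start, end, span)
```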

17 Match vs. Search
- match() determines if pattern matches at the beginning of a string
- search() scans through the string to see if any substring matches
>>> print(p.match('::: message'))
None
>>> m = p.search('::: message')
>>> print(m)
<re.MatchObject instance at 80c9650>
>>> m.group()
'message'
>>> m.span()
(4, 11)

18 Matching a "word boundary"
>>> p = re.compile(r'\bclass\b')
>>> print(p.search('no class at all'))
<re.MatchObject instance at 80c8f28>
>>> print(p.search('the declassified algorithm'))
None
>>> print(p.search('one subclass is'))
None
- A word is defined as a sequence of alphanumeric characters
- Whitespace and punctuation effectively denote the beginning and end of a word
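The raw-string prefix matters for \b in particular: in an ordinary Python string literal, '\b' is the backspace character, so the pattern no longer contains a word-boundary assertion. A small Python 3 sketch of the difference (variable names are made up for this example):

```python
import re

raw = re.search(r"\bclass\b", "no class at all")        # \b is a word boundary
not_raw = re.search("\bclass\b", "no class at all")     # "\b" is a backspace character
inside_word = re.search(r"\bclass\b", "the declassified algorithm")
print(raw is not None, not_raw, inside_word)
```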

19 findall() vs. finditer()
- findall() returns a list of all substrings that match
- finditer() returns an iterator of match objects
>>> p = re.compile('\d+')
>>> s = '12 drummers, 11 pipers, 10 lords'
>>> p.findall(s)
['12', '11', '10']
>>> for match in p.finditer(s):
...     print(match.span())
...
(0, 2)
(13, 15)
(24, 26)
Notice the "greedy" matching here.

21 Groups
- Parentheses define groups. Group n starts at the nth open parenthesis.
- m.group() and m.group(0) return the whole match
- m.group(1), m.group(2), ... return the text matched by an individual group
- m.group(2, 1) returns a tuple with the text of several groups, in the order requested
- m.groups() returns a tuple with the text of all groups
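The group-access calls can be tried on a small example. The interpreter session on the original slide did not survive transcription, so the pattern and the sample string below are assumptions made purely for illustration (Python 3).

```python
import re

# Two groups: a word, then a number (hypothetical example data).
p = re.compile(r"(\w+) (\d+)")
m = p.search("course cse 101 today")

whole = m.group()          # same as m.group(0)
first = m.group(1)
second = m.group(2)
swapped = m.group(2, 1)    # several groups at once, in the requested order
everything = m.groups()
print(whole, first, second, swapped, everything)
```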

23 More on groups
- Referring to previous groups by \1, \2, ...
>>> p = re.compile(r'(\b\w+)\s+\1')
>>> s = 'this is the  the course'
>>> p.findall(s)
['the']
>>> p.search(s).group()
'the  the'
These refer to the text that was matched, not the pattern.
>>> p = re.compile('(a(b)c)d')
>>> m = p.match('abcd')
>>> m.group(0)
'abcd'
>>> m.group(1)
'abc'
>>> m.group(2)
'b'
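Backreferences are also available in the replacement string of sub(), which turns the doubled-word detector into a doubled-word fixer. This is a sketch under assumptions: the pattern follows the doubled-word idea above with a trailing \b added, and the sample sentences are made up. The second pattern drops the leading \b to show why the boundary matters.

```python
import re

doubled = re.compile(r"(\b\w+)\s+\1\b")
fixed = doubled.sub(r"\1", "Paris in the the spring")

# With the leading \b, "this is" is left alone ...
untouched = doubled.sub(r"\1", "this is fine")

# ... without it, the "is" inside "this" pairs up with the following "is".
sloppy = re.sub(r"(\w+)\s+\1\b", r"\1", "this is fine")
print(fixed, untouched, sloppy, sep=" | ")
```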

25 Non-capturing group
- (?:regex)
- Exactly like a normal group (regex), except that it doesn't count for purposes of counting or returning groups from matches
>>> p = re.compile(r'.*[.](.*)([12])')
>>> p.match('test.backup1').groups()
('backup', '1')
>>> p = re.compile(r'.*[.](?:.*)([12])')
>>> p.match('test.backup1').groups()
('1',)
>>> p = re.compile(r'.*[.](.*)(?:[12])')
>>> p.match('test.backup1').groups()
('backup',)

26 Named groups
- (?P<name>regex)
- Lets you refer to this group by name in addition to by number
- Analog of '\number' is '(?P=name)'.
>>> p = re.compile(r'(?P<word>\b\w+\b)')
>>> m = p.search('(((( Lots of punctuation )))')
>>> m.group('word')
'Lots'
>>> m.group(1)
'Lots'
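A short Python 3 sketch of named groups, including the (?P=name) backreference. The patterns, group names, and sample strings are assumptions for illustration; groupdict() is the match-object method that returns all named groups at once.

```python
import re

pair = re.compile(r"(?P<key>\w+)=(?P<value>\w+)")
m = pair.search("set mode=fast now")

key = m.group("key")
value = m.group("value")
by_number = m.group(1)            # named groups are still numbered
as_dict = m.groupdict()

# (?P=w) refers back to whatever text the group named w matched.
repeat = re.search(r"(?P<w>\b\w+)\s+(?P=w)", "it was the the end").group()
print(key, value, as_dict, repeat)
```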

27 Other qualifiers
- {n,m} says to match between n and m copies
  - {0,} is the same as *
  - {1,} is the same as +
  - {0,1} is the same as ?
  - Missing lower limit treated as 0
  - Missing upper limit treated as "infinity"
- Append '?' to a qualifier (*, +, ?) to make it non-greedy
  - This means: go for the shortest match, not the longest
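The counted form and its non-greedy variant can be sketched as follows (Python 3; the sample strings are assumptions):

```python
import re

# {2,3}: between two and three copies, checked against whole strings.
exact = [re.fullmatch(r"a{2,3}", s) is not None for s in ("a", "aa", "aaa", "aaaa")]

greedy = re.match(r"a{2,3}", "aaaa").group()    # takes as many as allowed
lazy = re.match(r"a{2,3}?", "aaaa").group()     # takes as few as allowed

# {0,} behaves like *.
same_as_star = re.findall(r"ab{0,}", "a ab abb") == re.findall(r"ab*", "a ab abb")
print(exact, greedy, lazy, same_as_star)
```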

28 Non-greedy Quantifier
- Default matching is greedy
- Use non-greedy quantifiers (append ?)
>>> s = '<html><head><title>Title</title>'
>>> print(re.match('<.*>', s).group())
<html><head><title>Title</title>
>>> print(re.match('<.*?>', s).group())
<html>
>>> p = re.compile('<a href=(.*)>')
>>> p.match("<a href=\"index.html\">back</a>").group(1)
'"index.html">back</a'
>>> p = re.compile('<a href=(.*?)>')
>>> p.match("<a href=\"index.html\">back</a>").group(1)
'"index.html"'

29 Look-ahead assertions
- (?=regex)
  - Looks for regex at the current spot.
  - Does not consume characters, so the rest of the pattern starts at the same spot regex did.
- (?!regex)
  - Like the above, except checks to see that regex does not match at the current spot.

30 Look-ahead assertions: Example
>>> p = re.compile(r'.*[.](?!bat$|exe$)(.*)$')
>>> p.match('sendmail.cf').groups()
('cf',)
>>> p.match('sendmail.cf').group()
'sendmail.cf'
>>> p.match('sendmail.cf.bak').groups()
('bak',)
>>> p.match('sendmail.cf.bak').group()
'sendmail.cf.bak'
>>> p.match('sendmail.exe.cf').group()
'sendmail.exe.cf'
>>> print(p.match('sendmail.exe'))
None
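The positive form (?=regex) works the same way as the negative form in the example above. The sketch below (Python 3) uses a made-up string to show that the asserted text is checked but never becomes part of the match.

```python
import re

text = "host: a port: b"

# Words that are followed by a colon; the colon itself is not consumed.
labels = re.findall(r"\w+(?=:)", text)

# Whole words that are NOT followed by a colon.
values = re.findall(r"\b\w+\b(?!:)", text)
print(labels, values)
```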

31 re.VERBOSE
pat = re.compile(r"""
 \s*                 # Skip leading whitespace
 (?P<header>[^:]+)   # Header name
 \s* :               # Whitespace, and a colon
 (?P<value>.*?)      # The header's value -- *? used to
                     # lose the following trailing whitespace
 \s*$                # Trailing whitespace to end-of-line
""", re.VERBOSE)
- Whitespace outside a character class is ignored
- Can embed Python-style comments
- Makes long expressions much more readable
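Another small pattern written in the verbose style, as a sketch for Python 3. The date format, group names, and sample sentence are assumptions for illustration and do not come from the lecture.

```python
import re

date = re.compile(r"""
    (?P<year>\d{4})  -    # four-digit year, then a dash
    (?P<month>\d{2}) -    # two-digit month, then a dash
    (?P<day>\d{2})        # two-digit day
""", re.VERBOSE)

m = date.search("released on 2001-02-03, patched later")
parts = m.groupdict()
print(m.group(), parts)
```

The spaces around the dashes and the trailing comments are ignored by the regex engine, so this is the same pattern as the one-line form (?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2}).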

32 Resources
- Used as the basis for this lecture
- The documentation for the re module
- The documentation for the shlex module
  - Not exactly related to regular expressions
  - Splits strings based on shell-like syntax

33 Two more lectures to go
- Networking
- Review of Python
- Something mind-breaking?

Regular Expressions. Steve Renals (based on original notes by Ewan Klein) ICL 12 October Outline Overview of REs REs in Python

Regular Expressions. Steve Renals (based on original notes by Ewan Klein) ICL 12 October Outline Overview of REs REs in Python Regular Expressions Steve Renals s.renals@ed.ac.uk (based on original notes by Ewan Klein) ICL 12 October 2005 Introduction Formal Background to REs Extensions of Basic REs Overview Goals: a basic idea

More information

LING115 Lecture Note Session #7: Regular Expressions

LING115 Lecture Note Session #7: Regular Expressions LING115 Lecture Note Session #7: Regular Expressions 1. Introduction We need to refer to a set of strings for various reasons: to ignore case-distinction, to refer to a set of files that share a common

More information

Regular Expressions. Regular Expression Syntax in Python. Achtung!

Regular Expressions. Regular Expression Syntax in Python. Achtung! 1 Regular Expressions Lab Objective: Cleaning and formatting data are fundamental problems in data science. Regular expressions are an important tool for working with text carefully and eciently, and are

More information

Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions

Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions Last lecture CMSC330 Finite Automata Languages Sets of strings Operations on languages Regular expressions Constants Operators Precedence 1 2 Finite automata States Transitions Examples Types This lecture

More information

Regular Expression HOWTO

Regular Expression HOWTO Regular Expression HOWTO Release 2.6.4 Guido van Rossum Fred L. Drake, Jr., editor January 04, 2010 Python Software Foundation Email: docs@python.org Contents 1 Introduction ii 2 Simple Patterns ii 2.1

More information

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens Concepts Introduced in Chapter 3 Lexical Analysis Regular Expressions (REs) Nondeterministic Finite Automata (NFA) Converting an RE to an NFA Deterministic Finite Automatic (DFA) Lexical Analysis Why separate

More information

a b c d a b c d e 5 e 7

a b c d a b c d e 5 e 7 COMPSCI 230 Homework 9 Due on April 5, 2016 Work on this assignment either alone or in pairs. You may work with different partners on different assignments, but you can only have up to one partner for

More information

https://lambda.mines.edu You should have researched one of these topics on the LGA: Reference Couting Smart Pointers Valgrind Explain to your group! Regular expression languages describe a search pattern

More information

Regular Expressions Explained

Regular Expressions Explained Found at: http://publish.ez.no/article/articleprint/11/ Regular Expressions Explained Author: Jan Borsodi Publishing date: 30.10.2000 18:02 This article will give you an introduction to the world of regular

More information

Regular Expressions 1 / 12

Regular Expressions 1 / 12 Regular Expressions 1 / 12 https://xkcd.com/208/ 2 / 12 Regular Expressions In computer science, a language is a set of strings. Like any set, a language can be specified by enumeration (listing all the

More information

RegExpr:Review & Wrapup; Lecture 13b Larry Ruzzo

RegExpr:Review & Wrapup; Lecture 13b Larry Ruzzo RegExpr:Review & Wrapup; Lecture 13b Larry Ruzzo Outline More regular expressions & pattern matching: groups substitute greed RegExpr Syntax They re strings Most punctuation is special; needs to be escaped

More information

Formal Languages and Compilers Lecture VI: Lexical Analysis

Formal Languages and Compilers Lecture VI: Lexical Analysis Formal Languages and Compilers Lecture VI: Lexical Analysis Free University of Bozen-Bolzano Faculty of Computer Science POS Building, Room: 2.03 artale@inf.unibz.it http://www.inf.unibz.it/ artale/ Formal

More information

UNIT -2 LEXICAL ANALYSIS

UNIT -2 LEXICAL ANALYSIS OVER VIEW OF LEXICAL ANALYSIS UNIT -2 LEXICAL ANALYSIS o To identify the tokens we need some method of describing the possible tokens that can appear in the input stream. For this purpose we introduce

More information

Lecture 2 Finite Automata

Lecture 2 Finite Automata Lecture 2 Finite Automata August 31, 2007 This lecture is intended as a kind of road map to Chapter 1 of the text just the informal examples that I ll present to motivate the ideas. 1 Expressions without

More information

Lecture 18 Regular Expressions

Lecture 18 Regular Expressions Lecture 18 Regular Expressions In this lecture Background Text processing languages Pattern searches with grep Formal Languages and regular expressions Finite State Machines Regular Expression Grammer

More information

LECTURE 8. The Standard Library Part 2: re, copy, and itertools

LECTURE 8. The Standard Library Part 2: re, copy, and itertools LECTURE 8 The Standard Library Part 2: re, copy, and itertools THE STANDARD LIBRARY: RE The Python standard library contains extensive support for regular expressions. Regular expressions, often abbreviated

More information

Lexical Analysis. Lecture 3-4

Lexical Analysis. Lecture 3-4 Lexical Analysis Lecture 3-4 Notes by G. Necula, with additions by P. Hilfinger Prof. Hilfinger CS 164 Lecture 3-4 1 Administrivia I suggest you start looking at Python (see link on class home page). Please

More information

CS 301. Lecture 05 Applications of Regular Languages. Stephen Checkoway. January 31, 2018

CS 301. Lecture 05 Applications of Regular Languages. Stephen Checkoway. January 31, 2018 CS 301 Lecture 05 Applications of Regular Languages Stephen Checkoway January 31, 2018 1 / 17 Characterizing regular languages The following four statements about the language A are equivalent The language

More information

Figure 2.1: Role of Lexical Analyzer

Figure 2.1: Role of Lexical Analyzer Chapter 2 Lexical Analysis Lexical analysis or scanning is the process which reads the stream of characters making up the source program from left-to-right and groups them into tokens. The lexical analyzer

More information

Implementation of Lexical Analysis. Lecture 4

Implementation of Lexical Analysis. Lecture 4 Implementation of Lexical Analysis Lecture 4 1 Tips on Building Large Systems KISS (Keep It Simple, Stupid!) Don t optimize prematurely Design systems that can be tested It is easier to modify a working

More information

Pieter van den Hombergh. April 13, 2018

Pieter van den Hombergh. April 13, 2018 Intro ergh Fontys Hogeschool voor Techniek en Logistiek April 13, 2018 ergh/fhtenl April 13, 2018 1/11 Regex? are a very power, but also complex tool. There is the saying that: Intro If you start with

More information

Regular Expressions. Computer Science and Engineering College of Engineering The Ohio State University. Lecture 9

Regular Expressions. Computer Science and Engineering College of Engineering The Ohio State University. Lecture 9 Regular Expressions Computer Science and Engineering College of Engineering The Ohio State University Lecture 9 Language Definition: a set of strings Examples Activity: For each above, find (the cardinality

More information

Lexical Analysis. Lecture 2-4

Lexical Analysis. Lecture 2-4 Lexical Analysis Lecture 2-4 Notes by G. Necula, with additions by P. Hilfinger Prof. Hilfinger CS 164 Lecture 2 1 Administrivia Moving to 60 Evans on Wednesday HW1 available Pyth manual available on line.

More information

Structure of Programming Languages Lecture 3

Structure of Programming Languages Lecture 3 Structure of Programming Languages Lecture 3 CSCI 6636 4536 Spring 2017 CSCI 6636 4536 Lecture 3... 1/25 Spring 2017 1 / 25 Outline 1 Finite Languages Deterministic Finite State Machines Lexical Analysis

More information

LECTURE 6 Scanning Part 2

LECTURE 6 Scanning Part 2 LECTURE 6 Scanning Part 2 FROM DFA TO SCANNER In the previous lectures, we discussed how one might specify valid tokens in a language using regular expressions. We then discussed how we can create a recognizer

More information

Regexs with DFA and Parse Trees. CS230 Tutorial 11

Regexs with DFA and Parse Trees. CS230 Tutorial 11 Regexs with DFA and Parse Trees CS230 Tutorial 11 Regular Expressions (Regex) This way of representing regular languages using metacharacters. Here are some of the most important ones to know: -- OR example:

More information

CSE450. Translation of Programming Languages. Lecture 20: Automata and Regular Expressions

CSE450. Translation of Programming Languages. Lecture 20: Automata and Regular Expressions CSE45 Translation of Programming Languages Lecture 2: Automata and Regular Expressions Finite Automata Regular Expression = Specification Finite Automata = Implementation A finite automaton consists of:

More information

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1 CSE P 501 Compilers LR Parsing Hal Perkins Spring 2018 UW CSE P 501 Spring 2018 D-1 Agenda LR Parsing Table-driven Parsers Parser States Shift-Reduce and Reduce-Reduce conflicts UW CSE P 501 Spring 2018

More information

Introduction to regular expressions

Introduction to regular expressions Introduction to regular expressions Table of Contents Introduction to regular expressions Here's how we do it Iteration 1: skill level > Wollowitz Iteration 2: skill level > Rakesh Introduction to regular

More information

Algorithmic Approaches for Biological Data, Lecture #8

Algorithmic Approaches for Biological Data, Lecture #8 Algorithmic Approaches for Biological Data, Lecture #8 Katherine St. John City University of New York American Museum of Natural History 17 February 2016 Outline More on Pattern Finding: Regular Expressions

More information

Here's an example of how the method works on the string "My text" with a start value of 3 and a length value of 2:

Here's an example of how the method works on the string My text with a start value of 3 and a length value of 2: CS 1251 Page 1 Friday Friday, October 31, 2014 10:36 AM Finding patterns in text A smaller string inside of a larger one is called a substring. You have already learned how to make substrings in the spreadsheet

More information

N-grams in Python. L445/L515 Autumn 2010

N-grams in Python. L445/L515 Autumn 2010 N-grams in Python L445/L515 Autumn 2010 Calculating n-grams We want to take a practical task, i.e., using n-grams for natural language processing, and see how we can start implementing it in Python. Some

More information

CSE 105 THEORY OF COMPUTATION

CSE 105 THEORY OF COMPUTATION CSE 105 THEORY OF COMPUTATION Spring 2017 http://cseweb.ucsd.edu/classes/sp17/cse105-ab/ Today's learning goals Sipser Ch 1.2, 1.3 Decide whether or not a string is described by a given regular expression

More information

Python I. Some material adapted from Upenn cmpe391 slides and other sources

Python I. Some material adapted from Upenn cmpe391 slides and other sources Python I Some material adapted from Upenn cmpe391 slides and other sources Overview Names & Assignment Data types Sequences types: Lists, Tuples, and Strings Mutability Understanding Reference Semantics

More information

Introduction; Parsing LL Grammars

Introduction; Parsing LL Grammars Introduction; Parsing LL Grammars CS 440: Programming Languages and Translators Due Fri Feb 2, 11:59 pm 1/29 pp.1, 2; 2/7 all updates incorporated, solved Instructions You can work together in groups of

More information

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 5

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 5 CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 5 CS 536 Spring 2015 1 Multi Character Lookahead We may allow finite automata to look beyond the next input character.

More information

Regular Expression HOWTO Release 3.6.0

Regular Expression HOWTO Release 3.6.0 Regular Expression HOWTO Release 3.6.0 Guido van Rossum and the Python development team March 05, 2017 Python Software Foundation Email: docs@python.org Contents 1 Introduction 2 2 Simple Patterns 2 2.1

More information

Programming with C++ as a Second Language

Programming with C++ as a Second Language Programming with C++ as a Second Language Week 2 Overview of C++ CSE/ICS 45C Patricia Lee, PhD Chapter 1 C++ Basics Copyright 2016 Pearson, Inc. All rights reserved. Learning Objectives Introduction to

More information

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications Agenda for Today Regular Expressions CSE 413, Autumn 2005 Programming Languages Basic concepts of formal grammars Regular expressions Lexical specification of programming languages Using finite automata

More information

(Refer Slide Time: 0:19)

(Refer Slide Time: 0:19) Theory of Computation. Professor somenath Biswas. Department of Computer Science & Engineering. Indian Institute of Technology, Kanpur. Lecture-15. Decision Problems for Regular Languages. (Refer Slide

More information

Regular Expressions. Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein

Regular Expressions. Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein Regular Expressions Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein A quick review: The super Date class class Date: def init (self, day, month): self.day = day self.month

More information

Lexical Analysis. Chapter 2

Lexical Analysis. Chapter 2 Lexical Analysis Chapter 2 1 Outline Informal sketch of lexical analysis Identifies tokens in input string Issues in lexical analysis Lookahead Ambiguities Specifying lexers Regular expressions Examples

More information

Lexical Analysis. Lecture 3. January 10, 2018

Lexical Analysis. Lecture 3. January 10, 2018 Lexical Analysis Lecture 3 January 10, 2018 Announcements PA1c due tonight at 11:50pm! Don t forget about PA1, the Cool implementation! Use Monday s lecture, the video guides and Cool examples if you re

More information

CS/ECE 374 Fall Homework 1. Due Tuesday, September 6, 2016 at 8pm

CS/ECE 374 Fall Homework 1. Due Tuesday, September 6, 2016 at 8pm CSECE 374 Fall 2016 Homework 1 Due Tuesday, September 6, 2016 at 8pm Starting with this homework, groups of up to three people can submit joint solutions. Each problem should be submitted by exactly one

More information

CSc 453 Compilers and Systems Software

CSc 453 Compilers and Systems Software CSc 453 Compilers and Systems Software 3 : Lexical Analysis I Christian Collberg Department of Computer Science University of Arizona collberg@gmail.com Copyright c 2009 Christian Collberg August 23, 2009

More information

CSE 401 Compilers. LR Parsing Hal Perkins Autumn /10/ Hal Perkins & UW CSE D-1

CSE 401 Compilers. LR Parsing Hal Perkins Autumn /10/ Hal Perkins & UW CSE D-1 CSE 401 Compilers LR Parsing Hal Perkins Autumn 2011 10/10/2011 2002-11 Hal Perkins & UW CSE D-1 Agenda LR Parsing Table-driven Parsers Parser States Shift-Reduce and Reduce-Reduce conflicts 10/10/2011

More information

Regular Expressions. Todd Kelley CST8207 Todd Kelley 1

Regular Expressions. Todd Kelley CST8207 Todd Kelley 1 Regular Expressions Todd Kelley kelleyt@algonquincollege.com CST8207 Todd Kelley 1 POSIX character classes Some Regular Expression gotchas Regular Expression Resources Assignment 3 on Regular Expressions

More information

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis CS 1622 Lecture 2 Lexical Analysis CS 1622 Lecture 2 1 Lecture 2 Review of last lecture and finish up overview The first compiler phase: lexical analysis Reading: Chapter 2 in text (by 1/18) CS 1622 Lecture

More information

CSE 105 THEORY OF COMPUTATION

CSE 105 THEORY OF COMPUTATION CSE 105 THEORY OF COMPUTATION Spring 2017 http://cseweb.ucsd.edu/classes/sp17/cse105-ab/ Today's learning goals Sipser Ch 1.2, 1.3 Design NFA recognizing a given language Convert an NFA (with or without

More information

Principles of Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

Principles of Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore (Refer Slide Time: 00:20) Principles of Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore Lecture - 4 Lexical Analysis-Part-3 Welcome

More information

MP 3 A Lexer for MiniJava

MP 3 A Lexer for MiniJava MP 3 A Lexer for MiniJava CS 421 Spring 2012 Revision 1.0 Assigned Wednesday, February 1, 2012 Due Tuesday, February 7, at 09:30 Extension 48 hours (penalty 20% of total points possible) Total points 43

More information

Regular Expressions. Perl PCRE POSIX.NET Python Java

Regular Expressions. Perl PCRE POSIX.NET Python Java ModSecurity rules rely heavily on regular expressions to allow you to specify when a rule should or shouldn't match. This appendix teaches you the basics of regular expressions so that you can better make

More information

Lexical Analysis 1 / 52

Lexical Analysis 1 / 52 Lexical Analysis 1 / 52 Outline 1 Scanning Tokens 2 Regular Expresssions 3 Finite State Automata 4 Non-deterministic (NFA) Versus Deterministic Finite State Automata (DFA) 5 Regular Expresssions to NFA

More information

CS 432 Fall Mike Lam, Professor. Finite Automata Conversions and Lexing

CS 432 Fall Mike Lam, Professor. Finite Automata Conversions and Lexing CS 432 Fall 2017 Mike Lam, Professor Finite Automata Conversions and Lexing Finite Automata Key result: all of the following have the same expressive power (i.e., they all describe regular languages):

More information

Scanners. Xiaokang Qiu Purdue University. August 24, ECE 468 Adapted from Kulkarni 2012

Scanners. Xiaokang Qiu Purdue University. August 24, ECE 468 Adapted from Kulkarni 2012 Scanners Xiaokang Qiu Purdue University ECE 468 Adapted from Kulkarni 2012 August 24, 2016 Scanners Sometimes called lexers Recall: scanners break input stream up into a set of tokens Identifiers, reserved

More information

The Three Rules. Program. What is a Computer Program? 5/30/2018. Interpreted. Your First Program QuickStart 1. Chapter 1

The Three Rules. Program. What is a Computer Program? 5/30/2018. Interpreted. Your First Program QuickStart 1. Chapter 1 The Three Rules Chapter 1 Beginnings Rule 1: Think before you program Rule 2: A program is a human-readable essay on problem solving that also executes on a computer Rule 3: The best way to improve your

More information

Zhizheng Zhang. Southeast University
