CSE : Python Programming
- Bennett Ray
1 CSE : Python Programming Lecture 11: Regular expressions April 2,
2 Announcements
- About those meetings from last week: if I said I was going to look into something or email you some information, you should send me email and remind me
- We'll have similar meetings near the end of classes
- Code and documentation for projects are due on the last day of classes (April 20)
3 Regular Expressions
4 Survey time (again)
Who here knows about the following?
- Deterministic finite-state automata (DFAs)
- Non-deterministic finite-state automata (NFAs)
- Regular languages
Recall (if you've taken CSE 260?) that the proof showing the equivalence of the above three gives an algorithm for translating a regular language into an automaton
5 Regular expressions (overview)
- A regular expression is a compact way of specifying a (potentially large) set of strings
- Example: compilers and identifiers
  - Source code: Object, LinkedList, my_int, pagelist
  - Regular expression: [a-zA-Z_][a-zA-Z0-9_]*
- They are useful for finding a particular kind of string within some larger string, but they're not great for everything
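As a rough illustration of the identifier example, the sketch below checks a few names against an identifier pattern using Python's re module. It assumes underscores are allowed in identifiers (as my_int suggests); the name 9lives is a made-up counterexample, not something from the lecture.

```python
import re

# Assumed pattern: a letter or underscore, then letters, digits, or underscores.
IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")

# "9lives" is a hypothetical non-identifier added for contrast.
names = ["Object", "LinkedList", "my_int", "pagelist", "9lives"]

# fullmatch() succeeds only when the whole string fits the pattern.
valid = [name for name in names if IDENTIFIER.fullmatch(name)]
print(valid)
```

Using fullmatch() rather than match() matters here: match() would accept any string that merely starts with an identifier.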
6 Warning: Code ≠ Mathematics
- Mathematics has an idea of what a regular expression is
- Programming languages also have ideas for this
- Shells have yet more ideas for this
- They are not exactly the same!
  - Somewhat different notations and meanings
  - Some languages provide features which have no correspondence to mathematics
7 Regular expressions in Python
- The following characters have special meanings inside a regular expression: . ^ $ * + ? { } [ ] \ | ( )
- If you want to refer literally to these characters, prefix them with a backslash
- For example: \.
8 Backslash mayhem and raw strings
- But you have to give regular expressions to Python as strings, and backslashes have another meaning there
  - For example: '\n' is a one-character string
- Suppose we want to match the backslash character:
  - Regular expression we have to use: \\
  - As a Python string: '\\\\'
- Raw strings, e.g., r'foo' and r"foo", don't interpret backslash characters in any special way
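The backslash doubling described above can be checked directly. This is a minimal sketch; the sample path C:\temp is an invented input used only to have a backslash to search for.

```python
import re

plain = "\\\\"   # ordinary literal: four typed backslashes, two characters
raw = r"\\"      # raw literal: two typed backslashes, two characters

print(plain == raw, len(raw))

# Both spell the regular expression \\ , which matches one literal backslash.
m = re.search(raw, r"C:\temp")   # hypothetical sample text
print(m.group(), m.start())
```

The raw form is usually easier to read, since what you type is what the regular expression engine receives.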
9 String Pattern Matching
- Unix command: …
- Python Regular Expression is …
- * : repeating 0 or more times
  - ab*c matches ac, abc, abbc, ...
  - a(bdc)*c matches abdcc, ...
- + : repeating 1 or more times
  - ab+c matches abc, abbc, ...
  - a(bdc)+c matches abdcc, ...
- ? : 0 or 1 times
10 Character Class
- | means "or": (aa|bb) matches aa or bb
- [abc] matches a, b, or c
  - or simply [a-c]
  - equivalent to (a|b|c)
- [abc]+ matches a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, ...
- [^x] matches any char except x
- [^0-9] matches any char except a digit
- a[bcd]* matches many more than a(bcd)*
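The difference between a character class and a group is easy to test. The following sketch uses a small helper named whole (a name invented here) that reports whether a pattern matches an entire string; the sample strings are illustrative only.

```python
import re

def whole(pattern, text):
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

print(whole(r"[abc]", "b"))        # class: exactly one of a, b, c
print(whole(r"[a-c]+", "cab"))     # range with repetition
print(whole(r"[^0-9]", "7"))       # negated class: a digit is rejected
print(whole(r"a[bcd]*", "abdc"))   # class: any mix of b, c, d after the a
print(whole(r"a(bcd)*", "abdc"))   # group: only whole copies of "bcd"
```

The last two lines show why a class followed by a repetition accepts many more strings than the same letters in a group.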
11 Matching is Greedy
- By “matching”, we mean matching the beginning portion of a string
- a(bc)+ matches the abcbc part of abcbcd
- Greedy search with backtracking
- a(bcd)*b matches abcdb in abcdb, abcdb in abcdbcd, and ab in abcd
- Try matching the pattern a[bcd]*b with the string abcbd

Step  Matched  Explanation
1     a        The a in the RE matches.
2     abcbd    The engine matches [bcd]*, going as far as it can, which is to the end of the string.
3     Failure  The engine tries to match b, but the current position is at the end of the string, so it fails.
4     abcb     Back up, so that [bcd]* matches one less character.
5     Failure  Try b again, but the current position is at the last character, which is a d.
6     abc      Back up again, so that [bcd]* is only matching bc.
7     abcb     Try b again. This time the character at the current position is b, so it succeeds.
12 Escape Characters
- Characters with special meaning: . ^ $ * + ? { } [ ] \ | ( )
- \(ab\) matches (ab)
- . matches any single character
- .* matches any string
- \\ matches \
- ^ matches the beginning of a line or string
  - not the ^ inside char-classes [^...]
- $ matches the end of a line or string
- a[bcd]*b$ does not match string abcbd
13 Special character classes
\d  Matches any decimal digit; this is equivalent to the class [0-9].
\D  Matches any non-digit character; this is equivalent to the class [^0-9].
\s  Matches any whitespace character; this is equivalent to the class [ \t\n\r\f\v].
\S  Matches any non-whitespace character; this is equivalent to the class [^ \t\n\r\f\v].
\w  Matches any alphanumeric character; this is equivalent to the class [a-zA-Z0-9_].
\W  Matches any non-alphanumeric character; this is equivalent to the class [^a-zA-Z0-9_].
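A short sketch of the shorthand classes in use follows. The input line is made up for illustration. Note that in Python 3 these classes also accept non-ASCII digits and letters when matching str values, so the bracketed equivalents above hold exactly only for ASCII text or when the re.ASCII flag is given.

```python
import re

line = "room 42\tfloor_3"   # hypothetical sample text

digits = re.findall(r"\d+", line)   # runs of decimal digits
words = re.findall(r"\w+", line)    # runs of letters, digits, underscores
fields = re.split(r"\s+", line)     # split on any run of whitespace

print(digits)
print(words)
print(fields)
```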
14 Overview of functions
Method/Attribute  Purpose
match             Determine if the RE matches at the beginning of the string.
search            Scan through a string, looking for any location where this RE matches.
split             Split the string into a list, splitting it wherever the RE matches.
sub               Find all substrings where the RE matches, and replace them with a different string.
subn              Does the same thing as sub(), but returns the new string and the number of replacements.
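To make the overview concrete, here is a minimal sketch that calls each of the five methods on one compiled pattern. The pattern and the sample text are chosen for illustration and are not from the lecture.

```python
import re

p = re.compile(r"\d+")
text = "a1b22c333"   # hypothetical sample text

at_start = p.match(text)               # None: the text does not begin with a digit
first = p.search(text).group()         # first run of digits anywhere in the text
pieces = p.split(text)                 # text between the digit runs
replaced = p.sub("#", text)            # every digit run replaced
once = p.sub("#", text, count=1)       # count limits the number of replacements
pair = p.subn("#", text)               # (new string, number of replacements)

print(at_start, first, pieces, replaced, once, pair)
```

Both sub() and subn() accept the count argument; the difference is only in what they return.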
15 Performing Matches
- match() returns None if failed, or a matched object
>>> import re
>>> print(re.match('[a-z]+', ""))
None
- Compiled version is faster for repeated use
>>> p = re.compile('[a-z]+')
>>> p
<_sre.SRE_Pattern object at 80c3c28>
>>> p.match("")
>>> print(p.match(""))
None
>>> m = p.match('tempo')
>>> print(m)
<_sre.SRE_Match object at 80c4f68>
16 Methods on match objects
Method/Attribute  Purpose
group()           Return the string matched by the RE
start()           Return the starting position of the match
end()             Return the ending position of the match
span()            Return a tuple containing the (start, end) of the match
17 Match vs. Search
- match() determines if pattern matches at the beginning of a string
- search() scans through the string to see if any substring matches
>>> print(p.match('::: message'))
None
>>> m = p.search('::: message')
>>> print(m)
<re.MatchObject instance at 80c9650>
>>> m.group()
'message'
>>> m.span()
(4, 11)
18 Matching a "word boundary"
>>> p = re.compile(r'\bclass\b')
>>> print(p.search('no class at all'))
<re.MatchObject instance at 80c8f28>
>>> print(p.search('the declassified algorithm'))
None
>>> print(p.search('one subclass is'))
None
- A word is defined as a sequence of alphanumeric characters
- Whitespace and punctuation effectively denote the beginning and end of a word
19-20 findall() and finditer()
- findall() returns a list of all substrings that match
- finditer() returns an iterator of match objects
>>> p = re.compile(r'\d+')
>>> s = '12 drummers, 11 pipers, 10 lords'
>>> p.findall(s)
['12', '11', '10']
>>> for match in p.finditer(s):
...     print(match.span())
...
(0, 2)
(13, 15)
(24, 26)
Notice the "greedy" matching here.
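The remark about greedy matching can be seen by comparing \d+ with a bare \d on the same sentence. This is a small sketch assuming the sample string reads '12 drummers, 11 pipers, 10 lords'.

```python
import re

s = "12 drummers, 11 pipers, 10 lords"   # assumed sample sentence

runs = re.findall(r"\d+", s)    # + is greedy: each match takes a whole run of digits
singles = re.findall(r"\d", s)  # without +, every digit is a separate match
spans = [m.span() for m in re.finditer(r"\d+", s)]

print(runs)
print(singles)
print(spans)
```

Because + grabs as many digits as it can, "12" comes back as one match rather than as "1" followed by "2".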
21-22 Groups
- Parentheses define groups. Group n starts at the nth open parenthesis.
>>> for m in p.finditer(s):
...     print(m.group(), m.groups())
...
cse 399 ('cse', '399')
cis ?00 ('cis', '?00')
>>> m = p.search(s)
>>> m.group()
'cse 399'
>>> m.group(0)
'cse 399'
>>> m.group(1)
'cse'
>>> m.group(2)
'399'
>>> m.group(2, 1)
('399', 'cse')
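Since the pattern and input string for this slide did not survive, here is a self-contained sketch of the same calls. The pattern and the string below are assumptions made up for illustration, not the lecture's originals.

```python
import re

# Assumed pattern: group 1 is a run of letters, group 2 a run of digits.
p = re.compile(r"([a-z]+) ([0-9]+)")
s = "abc 123, xyz 456"   # hypothetical sample text

m = p.search(s)
print(m.group())       # whole match, same as m.group(0)
print(m.group(1))      # first open parenthesis
print(m.group(2))      # second open parenthesis
print(m.group(2, 1))   # several groups at once, as a tuple
print(m.groups())      # all groups, in order

pairs = [found.groups() for found in p.finditer(s)]
print(pairs)
```

Note that group() accepts several group numbers, while groups() always returns every group.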
23-24 More on groups
- Referring to previous groups by \1, \2, ...
>>> p = re.compile(r'…')
>>> s = "this is the  the course"
>>> p.findall(s)
['the']
>>> p.search(s).group()
'the  the'
- These refer to the text that was matched, not the pattern.
- Typo detector …
>>> p = re.compile('(a(b)c)d')
>>> m = p.match('abcd')
>>> m.group(0)
'abcd'
>>> m.group(1)
'abc'
>>> m.group(2)
'b'
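A doubled-word detector of the kind shown here can be sketched with a backreference. The pattern below is an assumption (a common idiom for this task), since the slide's own pattern was lost. The sample sentence is the one that can be read off the slide.

```python
import re

# Assumed pattern: a word, some whitespace, then the same text again (\1).
doubled = re.compile(r"\b(\w+)\s+\1\b")
s = "this is the  the course"

words = doubled.findall(s)          # findall returns the contents of group 1
phrase = doubled.search(s).group()  # the whole doubled phrase

print(words)
print(phrase)
```

The \b anchors keep "this is" from counting as a repeated "is", because \1 must repeat the matched text as a whole word.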
25 Non-capturing group
- (?:regex)
- Exactly like a normal group (regex), except that it doesn't count for purposes of counting or returning groups from matches
>>> p = re.compile(r'.*[.](.*)([12])')
>>> p.match('test.backup1').groups()
('backup', '1')
>>> p = re.compile(r'.*[.](?:.*)([12])')
>>> p.match('test.backup1').groups()
('1',)
>>> p = re.compile(r'.*[.](.*)(?:[12])')
>>> p.match('test.backup1').groups()
('backup',)
26 Named groups
- (?P<name>regex)
- Lets you refer to this group by name in addition to by number
- Analog of '\number' is '(?P=name)'.
>>> p = re.compile(r'(?P<word>\b\w+\b)')
>>> m = p.search('(((( Lots of punctuation )))')
>>> m.group('word')
'Lots'
>>> m.group(1)
'Lots'
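The named backreference (?P=name) can be sketched as follows. The pattern and the sample sentence are illustrative assumptions rather than material from the slide.

```python
import re

# (?P=word) must repeat exactly the text captured by the group named "word".
p = re.compile(r"(?P<word>\b\w+)\s+(?P=word)\b")
m = p.search("Paris in the the spring")   # hypothetical sample text

print(m.group("word"))   # access by name
print(m.group(1))        # the same group, by number
print(m.group())         # the whole match
print(m.groupdict())     # all named groups as a dictionary
```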
27 Other qualifiers
- {n,m} says to match between n and m copies
  - {0,} is the same as *
  - {1,} is the same as +
  - {0,1} is the same as ?
  - Missing lower limit treated as 0
  - Missing upper limit treated as "infinity"
- Append '?' to a qualifier (*, +, ?) to make it non-greedy
  - This means: go for the shortest match, not the longest
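A brief sketch of counted repetition and its non-greedy form follows. The patterns and test strings are examples chosen here, not taken from the slide.

```python
import re

# Between one and three slashes are allowed between a and b.
slashes = re.compile(r"a/{1,3}b")
candidates = ["ab", "a/b", "a///b", "a////b"]
results = {text: slashes.fullmatch(text) is not None for text in candidates}
print(results)

# Greedy takes as many digits as allowed; adding ? takes as few as allowed.
greedy = re.match(r"\d{1,3}", "12345").group()
lazy = re.match(r"\d{1,3}?", "12345").group()
print(greedy, lazy)
```

The trailing ? works on {n,m} in the same way it works on *, + and ?.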
28 Non-greedy Quantifier
- Default matching is greedy
- Use non-greedy quantifiers: ?
>>> s = '<html><head><title>Title</title>'
>>> print(re.match('<.*>', s).group())
<html><head><title>Title</title>
>>> print(re.match('<.*?>', s).group())
<html>
>>> p = re.compile('<a href=(.*)>')
>>> p.match("<a href=\"index.html\">Back</a>").group(1)
'"index.html">Back</a'
>>> p = re.compile('<a href=(.*?)>')
>>> p.match("<a href=\"index.html\">Back</a>").group(1)
'"index.html"'
29 Look-ahead assertions
- (?=regex)
  - Looks for regex at the current spot.
  - Does not consume characters, so the rest of the pattern starts at the same spot regex did.
- (?!regex)
  - Like the above, except checks to see that regex does not match at the current spot.
30 Look-ahead assertions: Example
>>> p = re.compile(r'.*[.](?!bat$|exe$)(.*)$')
>>> p.match('sendmail.cf').groups()
('cf',)
>>> p.match('sendmail.cf').group()
'sendmail.cf'
>>> p.match('sendmail.cf.bak').groups()
('bak',)
>>> p.match('sendmail.cf.bak').group()
'sendmail.cf.bak'
>>> p.match('sendmail.exe.cf').group()
'sendmail.exe.cf'
>>> print(p.match('sendmail.exe'))
None
31 re.VERBOSE
pat = re.compile(r"""
  \s*                  # Skip leading whitespace
  (?P<header>[^:]+)    # Header name
  \s* :                # Whitespace, and a colon
  (?P<value>.*?)       # The header's value -- *? used to
                       # lose the following trailing whitespace
  \s*$                 # Trailing whitespace to end-of-line
""", re.VERBOSE)
- Whitespace outside a character class is ignored
- Can embed Python-style comments
- Makes long expressions much more readable
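To show the verbose pattern at work, the sketch below applies it to one header line. The input line is invented for illustration, and the final strip() is an extra step added here because the value group still keeps any spaces that follow the colon.

```python
import re

pat = re.compile(r"""
    \s*                  # skip leading whitespace
    (?P<header>[^:]+)    # header name
    \s* :                # whitespace, and a colon
    (?P<value>.*?)       # value, non-greedy so trailing blanks are left out
    \s*$                 # trailing whitespace to end-of-line
""", re.VERBOSE)

m = pat.match("  Subject: hello world   ")   # hypothetical header line
header = m.group("header")
value = m.group("value").strip()   # drop the space left after the colon

print(header)
print(value)
```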
32 Resources
- Used as the basis for this lecture
- The documentation for the re module
- The documentation for the shlex module
  - Not exactly related to regular expressions
  - Splits strings based on shell-like syntax
33 Two more lectures to go
- Networking
- Review of Python
- Something mind-breaking?
More informationMonday, August 26, 13. Scanners
Scanners Scanners Sometimes called lexers Recall: scanners break input stream up into a set of tokens Identifiers, reserved words, literals, etc. What do we need to know? How do we define tokens? How can
More informationWeek - 04 Lecture - 01 Merge Sort. (Refer Slide Time: 00:02)
Programming, Data Structures and Algorithms in Python Prof. Madhavan Mukund Department of Computer Science and Engineering Indian Institute of Technology, Madras Week - 04 Lecture - 01 Merge Sort (Refer
More informationProf. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan
Compilers Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan Lexical Analyzer (Scanner) 1. Uses Regular Expressions to define tokens 2. Uses Finite Automata to recognize tokens
More informationWednesday, September 3, 14. Scanners
Scanners Scanners Sometimes called lexers Recall: scanners break input stream up into a set of tokens Identifiers, reserved words, literals, etc. What do we need to know? How do we define tokens? How can
More informationOptimizing Finite Automata
Optimizing Finite Automata We can improve the DFA created by MakeDeterministic. Sometimes a DFA will have more states than necessary. For every DFA there is a unique smallest equivalent DFA (fewest states
More informationBehaviour Diagrams UML
Behaviour Diagrams UML Behaviour Diagrams Structure Diagrams are used to describe the static composition of components (i.e., constraints on what intstances may exist at run-time). Interaction Diagrams
More informationOutline CS4120/4121. Compilation in a Nutshell 1. Administration. Introduction to Compilers Andrew Myers. HW1 out later today due next Monday.
CS4120/4121 Introduction to Compilers Andrew Myers Lecture 2: Lexical Analysis 31 August 2009 Outline Administration Compilation in a nutshell (or two) What is lexical analysis? Writing a lexer Specifying
More informationLecture 2. Regular Expression Parsing Awk
Lecture 2 Regular Expression Parsing Awk Shell Quoting Shell Globing: file* and file? ls file\* (the backslash key escapes wildcards) Shell Special Characters ~ Home directory ` backtick (command substitution)
More informationCIS192 Python Programming
CIS192 Python Programming Regular Expressions and maybe OS Robert Rand University of Pennsylvania October 1, 2015 Robert Rand (University of Pennsylvania) CIS 192 October 1, 2015 1 / 16 Outline 1 Regular
More informationAutomating Construction of Lexers
Automating Construction of Lexers Regular Expression to Programs Not all regular expressions are simple. How can we write a lexer for (a*b aaa)? Tokenizing aaaab Vs aaaaaa Regular Expression Finite state
More informationA language is a subset of the set of all strings over some alphabet. string: a sequence of symbols alphabet: a set of symbols
The current topic:! Introduction! Object-oriented programming: Python! Functional programming: Scheme! Python GUI programming (Tkinter)! Types and values! Logic programming: Prolog! Introduction! Rules,
More informationRegexp. Lecture 26: Regular Expressions
Regexp Lecture 26: Regular Expressions Regular expressions are a small programming language over strings Regex or regexp are not unique to Python They let us to succinctly and compactly represent classes
More informationRegular Expressions. Michael Wrzaczek Dept of Biosciences, Plant Biology Viikki Plant Science Centre (ViPS) University of Helsinki, Finland
Regular Expressions Michael Wrzaczek Dept of Biosciences, Plant Biology Viikki Plant Science Centre (ViPS) University of Helsinki, Finland November 11 th, 2015 Regular expressions provide a flexible way
More informationImplementation of Lexical Analysis
Implementation of Lexical Analysis Outline Specifying lexical structure using regular expressions Finite automata Deterministic Finite Automata (DFAs) Non-deterministic Finite Automata (NFAs) Implementation
More informationCS5371 Theory of Computation. Lecture 8: Automata Theory VI (PDA, PDA = CFG)
CS5371 Theory of Computation Lecture 8: Automata Theory VI (PDA, PDA = CFG) Objectives Introduce Pushdown Automaton (PDA) Show that PDA = CFG In terms of descriptive power Pushdown Automaton (PDA) Roughly
More information