An Introduction to Regular Expressions in Python

Fabienne Braune
LMU Munich (CIS)
May 29, 2017

Outline
1. Introductory Exercise
2. Regular Expressions Basics
3. Advanced Operations on Regex and Match Objects
4. Exercise Session

Introductory Exercise (10 min.)
Consider the following variants of the German verb "sagen": sagen, sagt, sagte, gesagt, zugesagt.
Write a function matchsag() that takes a string as argument and returns:
    True if the string contains one of the variants
    False if the string does not contain a variant
Examples:
    'Er sagt mir immer guten Tag'                  -> True
    'Das hat er nie gesagt'                        -> True
    'Er hat wieder was interessantes berichtet'    -> False

Introductory Exercise
How many lines of code does your program have?
Did you use any special module to implement your code?

Regular Expressions Basics

Regular Expressions
Regular expressions (regex) describe patterns of text.
\d stands for a digit.
    Regex for a phone number written as four digits followed by three dash-separated groups of three digits:
    \d\d\d\d-\d\d\d-\d\d\d-\d\d\d
\w stands for a word character (letter, digit or underscore).
    Regexes for toy238:
    \w\w\w\w\w\w
    \w\w\w\d\d\d
    toy\d\d\d

Creating Regex Objects
Regex functions are in the re module.
    Put import re at the beginning of the script.
    The re module implements regex manipulation.
Creating a regex with re:
    Call the functions of the re module to create and manipulate regexes.
    re.compile() creates a pattern to match.
    Its argument is a string value describing the regex.
    It returns a pattern object corresponding to the regex.
    myphone = re.compile(r'\d\d\d\d-\d\d\d-\d\d\d-\d\d\d')
    The r prefix marks a raw string, so backslashes need no escaping.

Searching for Regex Objects
Searching for a regex with re:
    myregex.search() searches for myregex in a given string.
    Its argument is the string to be searched.
    myphone = re.compile(r'\d\d\d\d-\d\d\d-\d\d\d-\d\d\d')
    It returns a match object if the pattern is found in the string, and None otherwise.
    match.group() returns the matched text.
    mymatch = myphone.search('Meine Telefonnummer lautet ...')
    myphone.search('Meine Telefonnummer lautet 0162-WWSR-hh-2')    -> None

Recap: Search for Phone Numbers in Strings
    import re
    myphone = re.compile(r'\d\d\d\d-\d\d\d-\d\d\d-\d\d\d')
    match = myphone.search('Meine Telefonnummer lautet ...')
    print(match.group())
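
The recap can be turned into a small runnable sketch. The helper name find_phone and the phone number in the first call are made up for illustration; the second string is the non-matching example from the slides. Because search() returns None when nothing matches, the sketch checks the result before calling group().

```python
import re

# Four digits followed by three dash-separated groups of three digits.
myphone = re.compile(r'\d\d\d\d-\d\d\d-\d\d\d-\d\d\d')

def find_phone(text):
    """Return the first phone number found in text, or None."""
    match = myphone.search(text)
    # search() returns None when nothing matches, so test before group().
    return match.group() if match else None

# The number below is hypothetical, chosen only to fit the pattern.
found = find_phone('Meine Telefonnummer lautet 0162-787-334-512')
missing = find_phone('Meine Telefonnummer lautet 0162-WWSR-hh-2')
print(found)
print(missing)
```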

Interactive Session
Let's search for birth dates in strings!

Grouping
Using groups, subpatterns of a regex can be printed.
Create groups by inserting parentheses inside the regex.
    myphone = re.compile(r'(\d\d\d\d)-(\d\d\d)-(\d\d\d)-\d\d\d')
    match = myphone.search('Meine Telefonnummer lautet ...')
    print(match.group(1))    -> 0162
    print(match.group(2))    -> 787
    print(match.group(3))    -> 334
    print(match.group(0))    -> the whole matched number

Matching Multiple Groups
Sometimes we want to match one of several alternative groups.
Match alternatives with |
    mysagen = re.compile(r'sag\w\w|\w\wsag\w')
    mysagen.search('Er sagt ja, Sie sagen nein')    -> sagen
    mysagen.search('Sie haben nein gesagt')         -> gesagt
The match object contains the first occurrence that matches the regex.
    mysagen.search('Sie sagen nein, er hat ja gesagt')    -> sagen
findall() returns a list with all matches.

Matching Multiple Groups
Convenient to match groups with the same prefix.
Use | and parentheses.
    mysagen = re.compile(r'sag(te|en|t|ten)')
    mysagen.search('Er sagt ja, Sie sagen nein')    -> sagt
    mysagen.search('Sie sagten nein')               -> sagte
    mysagen.search('Sie haben nein gesagt')         -> sagt
Alternatives are tried from left to right, so te wins over ten; list ten before te to get sagten.
    mysagen.findall('Er sagte ja, sie sagt nein und die anderen sagten vielleicht')    -> te, t, te
When the regex contains a group, findall() returns the group contents rather than the whole matches.
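
A minimal sketch of how the order of alternatives and the presence of a capturing group affect the results. The variable names and the reordered pattern long_first, which uses a non-capturing group (?:...), are assumptions added for illustration and are not part of the slides.

```python
import re

text = 'Er sagte ja, sie sagt nein und die anderen sagten vielleicht'

# Pattern from the slide: alternatives are tried left to right.
short_first = re.compile(r'sag(te|en|t|ten)')
# Illustrative variant: longest suffix first, group made non-capturing.
long_first = re.compile(r'sag(?:ten|te|en|t)')

first_short = short_first.search('Sie sagten nein').group()  # stops at 'te'
first_long = long_first.search('Sie sagten nein').group()    # reaches 'ten'

# With a capturing group, findall() returns the group contents.
captured = short_first.findall(text)
# Without capturing groups, findall() returns the whole matches.
whole = long_first.findall(text)

print(first_short, first_long)
print(captured)
print(whole)
```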

Optional Matching
Convenient to match groups optionally.
    ? matches 0 or 1 occurrences
    * matches 0 to n occurrences
    + matches 1 to n occurrences
    mysagen = re.compile(r'(ge)?sag(te|en|t|ten)')
    mysagen.search('Er sagt ja, Sie sagen nein')    -> sagt
    mysagen.search('Sie haben nein gesagt')         -> gesagt
    mysagen = re.compile(r'(ge)*sag(te|en|t|ten)')
    mysagen.search('Sie haben nein gegegegesagt')   -> gegegegesagt

Optional Matching
Also works with single characters.
    ? matches 0 or 1 occurrences
    * matches 0 to n occurrences
    + matches 1 to n occurrences
    mysagen = re.compile(r'sag(te?|en|ten)')
    mysagen.search('Er sagt ja, Sie sagen nein')      -> sagt
    mysagen.search('Er sagte ja, Sie sagten nein')    -> sagte
    mysagen = re.compile(r'sag(te*|en|ten)')
    mysagen.search('Er sagteeeeee nein')              -> sagteeeeee

Matching Repetitions
Useful to define the number of repetitions.
{1,4} matches 1 to 4 repetitions.
    mysagen = re.compile(r'(ge){1,4}sag(te|en|t|ten)')
    mysagen.search('Sie haben nein gesagt')                -> gesagt
    mysagen.search('Sie haben nein gegegegesagt')          -> gegegegesagt
    mysagen.search('Sie haben nein gegegegegegegesagt')    -> gegegegesagt
search() may start anywhere in the string, so the last example still matches the final four repetitions of ge followed by sagt. The whole word gegegegegegegesagt is not matched.
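
The effect of the upper bound in {1,4} can be checked with a short sketch. It contrasts search(), which may begin the match anywhere, with fullmatch(), which requires the whole string to fit the pattern. The variable names are chosen for illustration.

```python
import re

mysagen = re.compile(r'(ge){1,4}sag(te|en|t|ten)')

seven = 'ge' * 7 + 'sagt'          # 'gegegegegegegesagt'

# search() skips the first three 'ge' and matches the last four.
found = mysagen.search(seven)
# fullmatch() must cover the entire string, so seven repetitions fail.
whole = mysagen.fullmatch(seven)
# Four repetitions are within the bound and match completely.
four = mysagen.fullmatch('ge' * 4 + 'sagt')

print(found.group(), found.start())
print(whole)
print(four.group())
```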

Introductory Exercise (5 min.)
Consider the following inflexions of the German verb "sagen": sagen, sagt, sagte, gesagt, zugesagt, gesagten, zugesagten, abgesagten.
Using the re module, write a function matchsag() that takes a string as argument and returns:
    True if the string contains one of the variants
    False if the string does not contain a variant
Examples:
    'Er sagt mir immer guten Tag'                  -> True
    'Das hat er nie gesagt'                        -> True
    'Er hat wieder was interessantes berichtet'    -> False
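
One possible solution sketch, not necessarily the one intended in the course. It assumes the variants should be matched as whole words (hence the \b word boundaries) and uses non-capturing groups; the pattern also accepts a few forms outside the list, such as zusagt.

```python
import re

# Optional prefix zu/ab, optional ge, the stem sag, then one of the suffixes.
# Longer suffixes come first so that \b can be reached after 'ten'.
SAG = re.compile(r'\b(?:zu|ab)?(?:ge)?sag(?:ten|te|en|t)\b')

def matchsag(text):
    """Return True if text contains one of the sagen variants."""
    return SAG.search(text) is not None

print(matchsag('Er sagt mir immer guten Tag'))
print(matchsag('Das hat er nie gesagt'))
print(matchsag('Er hat wieder was interessantes berichtet'))
```

Dropping the \b anchors would turn this into a pure substring test, which is closer to the literal wording of the exercise.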

Interactive Session (10 min.)
Let's try out your solutions!

Greedy and Non-Greedy Matching
Python regexes are greedy by default.
    mysagen = re.compile(r'(ge){1,4}sag(te|en|t|ten)')
    mysagen.search('Sie haben nein gesagt')          -> gesagt
    mysagen.search('Sie haben nein gegegegesagt')    -> gegegegesagt
Enable non-greedy mode with ? after the quantifier.
    mysagen = re.compile(r'(ge){1,4}?sag(te|en|t|ten)')
    mysagen.search('Sie haben nein gesagt')          -> gesagt
    mysagen.search('Sie haben nein gegegegesagt')    -> gegegegesagt
Here both modes give the same result: the match starts at the leftmost possible position, and (ge){1,4}? has to expand until sag follows.
The difference shows when nothing has to follow the repetition:
    re.search(r'(ge){1,4}', 'gegegege')     -> gegegege (could also match ge but takes the longest)
    re.search(r'(ge){1,4}?', 'gegegege')    -> ge (could also match gegegege but takes the shortest)
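
A small sketch comparing greedy and non-greedy quantifiers. The bare patterns without the sag part are added for illustration, since they make the difference visible; with sag required after the repetition, the non-greedy quantifier is forced to expand and the result is the same as in greedy mode.

```python
import re

# Nothing follows the repetition: the quantifier alone decides the length.
greedy = re.search(r'(ge){1,4}', 'gegegege').group()
lazy = re.search(r'(ge){1,4}?', 'gegegege').group()

# With 'sag' required afterwards, the lazy quantifier must keep expanding.
lazy_sag = re.compile(r'(ge){1,4}?sag(te|en|t|ten)')
forced = lazy_sag.search('Sie haben nein gegegegesagt').group()

print(greedy)
print(lazy)
print(forced)
```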

Matching Beginning of String (only)
Use match() instead of search().
    mysagen = re.compile(r'(ge){1,4}sag(te|en|t|ten)')
    mysagen.match('Sie haben nein gesagt')    -> no match
    mysagen.match('gesagt hat er nichts')     -> gesagt
    mysagen.search('gegegegesagten')          -> gegegegesagte (te is tried before ten)

Returning all Matched Strings
The match object returned by search() contains the first matched string.
What if all matched strings should be returned? Use findall().
    mysagen = re.compile(r'(ge){1,4}sag(te|en|t|ten)')
    mysagen.search('gesagt oder nicht gegesagt')     -> gesagt
    mysagen.findall('gesagt oder nicht gegesagt')    -> (ge, t), (ge, t)
When the regex contains multiple groups, findall() returns a list of tuples with the group contents.
To get the matched strings gesagt, gegesagt, use non-capturing groups: (?:ge){1,4}sag(?:te|en|t|ten)
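
The following sketch shows three ways of collecting all matches. The non-capturing variant and the use of finditer() are illustrative additions; finditer() yields match objects, so group() gives the whole matched string even when the pattern contains capturing groups.

```python
import re

text = 'gesagt oder nicht gegesagt'

with_groups = re.compile(r'(ge){1,4}sag(te|en|t|ten)')
no_groups = re.compile(r'(?:ge){1,4}sag(?:te|en|t|ten)')

# Two capturing groups: findall() returns one tuple per match.
tuples = with_groups.findall(text)
# No capturing groups: findall() returns the matched strings.
strings = no_groups.findall(text)
# finditer() yields match objects; group() is the full match.
via_iter = [m.group() for m in with_groups.finditer(text)]

print(tuples)
print(strings)
print(via_iter)
```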

Some Character Classes
    \d    digit from 0 to 9
    \D    anything that is not a digit
    \w    word character: letter, digit or underscore
    \W    anything that is not a letter, digit or underscore
    \s    whitespace: space, tab or newline
    \S    anything that is not whitespace
    mystery = re.compile(r'\d+\s\w+')
What does mystery match? Give examples.

Creating Character Classes
The pre-defined character classes may be too broad.
Define custom character classes with [ ]
    [aeiouAEIOU]    any vowel
    [a-zA-Z0-9]     any lowercase or uppercase letter or digit
    [0-5]           ?
Negative classes are defined with ^
    What does [^aeiouAEIOU] match?
    What does [^0-5] match?
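
The classes above can be explored with findall(), which lists every character a class matches. The sample strings and variable names are chosen for illustration.

```python
import re

# Every vowel, upper or lower case.
vowels = re.findall(r'[aeiouAEIOU]', 'Regex Objekt')
# Every character that is not a vowel.
non_vowels = re.findall(r'[^aeiouAEIOU]', 'Regex')
# Digits in the range 0 to 5.
low_digits = re.findall(r'[0-5]', '2017-05-29')
# Everything except the digits 0 to 5, including the dashes.
not_low = re.findall(r'[^0-5]', '2017-05-29')

print(vowels)
print(non_vowels)
print(low_digits)
print(not_low)
```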

More Special Symbols
^ : match beginning of string
    beg = re.compile(r'^sag(te|en|t|ten)*')
    beg.search('sag hallo')    -> sag
$ : match end of string
    beg = re.compile(r'sag(te|en|t|ten)*$')
    beg.search('hallo sagen')    -> sagen
. : match any character except newline
    beg = re.compile(r'sag(.)*$')
    beg.search('hallo sagen')                -> sagen
    beg.search('hallo sagen sagen sagen')    -> sagen sagen sagen (greedy)
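
A short sketch of the anchors in use, including the cases where they prevent a match. The pattern names are illustrative; $ is placed at the end of the pattern, since that is where it anchors.

```python
import re

starts = re.compile(r'^sag(te|en|t|ten)*')   # sag... at the start of the string
ends = re.compile(r'sag(te|en|t|ten)*$')     # sag... at the end of the string
rest = re.compile(r'sag.*$')                 # from the first sag to the end

at_start = starts.search('sag hallo').group()
not_at_start = starts.search('hallo sag')
at_end = ends.search('hallo sagen').group()
not_at_end = ends.search('sagen hallo')
# .* is greedy and the match begins at the leftmost sag.
to_end = rest.search('hallo sagen sagen sagen').group()

print(at_start, not_at_start)
print(at_end, not_at_end)
print(to_end)
```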

Regex symbols recap (10 min.)
What do the following symbols do?
    ?    *    +    {n}    {n,}    {,m}    {n,m}
    *?    +?    {n,m}?
    ^    $    .
    [abc]    [^abc]

Advanced Operations on Regex and Match Objects

String Modifications
Useful to modify strings that contain a match for a regex.
    beg = re.compile(r'^sag(te|en|t|ten)*')
    beg.search('sag hallo')           -> sag
    beg.sub('schrei', 'sag hallo')    -> schrei hallo
    re.sub(r'^sag(te|en|t|ten)*', 'schrei', 'sag hallo')    -> schrei hallo
re.escape(p) : escape the special characters in pattern p
re.purge() : empty the regex cache
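
A brief sketch of sub() and escape(). The strings 'a.b' and 'axb' are made-up inputs that show why escaping matters: an unescaped dot matches any character, while the escaped pattern matches only a literal dot.

```python
import re

beg = re.compile(r'^sag(te|en|t|ten)*')

# Method on a compiled pattern and module-level function give the same result.
replaced = beg.sub('schrei', 'sag hallo')
same = re.sub(r'^sag(te|en|t|ten)*', 'schrei', 'sag hallo')

# re.escape() puts a backslash in front of regex metacharacters.
literal = re.escape('a.b')
unescaped_hit = re.search('a.b', 'axb')      # '.' matches 'x'
escaped_hit = re.search(literal, 'axb')      # only a real dot would match

print(replaced, same)
print(literal)
print(unescaped_hit is not None, escaped_hit is not None)
```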

Advanced Grouping
Get all groups with groups()
    Returns a tuple containing all matched groups.
Non-capturing groups
    Match a group but don't save its content.
    Use ?: at the beginning of the group.
    mymatch = re.search('(?:Asimov)+', 'Isaac Asimov')
    mymatch.groups() is empty

Advanced Grouping
Named groups
    Match a group and save it under a name.
    Use ?P<abc> at the beginning of the group.
    mymatch = re.match('(?P<abc>abc)+', 'abc')
    mymatch.group('abc') returns abc
Positive and negative lookahead
    Use ?= at the beginning of the group.
    mymatch = re.match('Isaac (?=Asimov)', 'Isaac Asimov')
    mymatch.group() is 'Isaac ' (the lookahead itself is not part of the match)
    Only matches Isaac when it is followed by Asimov.
    Negative lookahead with ?! at the beginning of the group.
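
The group types can be compared side by side in a short sketch. The string 'Isaac Newton' is a made-up counterexample; everything else follows the slide examples, with the lookahead written without a quantifier.

```python
import re

# Named group: the content is available under its name.
named = re.match(r'(?P<abc>abc)+', 'abc')

# Positive lookahead: 'Asimov' must follow but is not consumed.
ahead = re.match(r'Isaac (?=Asimov)', 'Isaac Asimov')
ahead_fail = re.match(r'Isaac (?=Asimov)', 'Isaac Newton')

# Negative lookahead: 'Asimov' must NOT follow.
not_ahead = re.match(r'Isaac (?!Asimov)', 'Isaac Newton')
not_ahead_fail = re.match(r'Isaac (?!Asimov)', 'Isaac Asimov')

# Non-capturing group: matches, but groups() stays empty.
noncap = re.search(r'(?:Asimov)+', 'Isaac Asimov')

print(named.group('abc'))
print(repr(ahead.group()), ahead.groups())
print(ahead_fail, not_ahead_fail)
print(repr(not_ahead.group()))
print(noncap.group(), noncap.groups())
```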

Advanced Grouping
Get a dictionary containing all named groups with groupdict()
    mymatch = re.match('(?P<lemma>sag)(?P<suff>(te|ten))', 'sagten')
    mymatch.groupdict() returns {'lemma': 'sag', 'suff': 'te'}
    (te is tried before ten; write (ten|te) to capture ten)
Return the span of a matched group with span(group)
    mymatch = re.match('(?P<lemma>sag)(?P<suff>(te|ten))', 'sagten')
    mymatch.span(1) returns (0, 3)
    mymatch.start(1) returns 0
    mymatch.end(1) returns 3
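
A minimal sketch of groupdict(), span(), start() and end(). It assumes a slightly reordered suffix alternation (ten before te) so that the longer suffix is captured; groups can be addressed by number or by name.

```python
import re

# Assumption for illustration: 'ten' is listed before 'te'.
mymatch = re.match(r'(?P<lemma>sag)(?P<suff>ten|te)', 'sagten')

names = mymatch.groupdict()          # named groups only
lemma_span = mymatch.span('lemma')   # (start, end), end is exclusive
suff_span = mymatch.span('suff')
first_start = mymatch.start(1)       # group 1 is 'lemma'
first_end = mymatch.end(1)

print(names)
print(lemma_span, suff_span)
print(first_start, first_end)
```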

Compilation Flags
re.compile(regex, flags=0)
Possibility to pass compilation flags:
    Flag             Short    Description
    re.IGNORECASE    re.I     case-insensitive matching
    re.MULTILINE     re.M     ^ and $ also match at the beginning and end of each line
    re.DOTALL        re.S     . matches any character (also newline)
    re.VERBOSE       re.X     allows comments and whitespace inside the regex
    re.UNICODE       re.U     \w, \d, \s etc. follow Unicode rules
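
The flags in the table can be tried out with a few one-liners. The sample strings are made up for illustration, and the verbose pattern is a sketch of how comments can document a date regex.

```python
import re

# IGNORECASE: upper and lower case are treated alike.
ignore = re.search(r'sagen', 'SAGEN', re.IGNORECASE)

# MULTILINE: ^ also matches right after each newline.
one_line = re.findall(r'^sag\w*', 'sagen\nsagt')
multi = re.findall(r'^sag\w*', 'sagen\nsagt', re.MULTILINE)

# DOTALL: . also matches a newline.
no_dotall = re.search(r'a.b', 'a\nb')
dotall = re.search(r'a.b', 'a\nb', re.DOTALL)

# VERBOSE: whitespace and # comments inside the pattern are ignored.
date = re.compile(r'''
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
''', re.VERBOSE)
parts = date.search('2017-05-29').groups()

print(ignore.group(), one_line, multi)
print(no_dotall, dotall is not None)
print(parts)
```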

Exercise Session

Digit Manipulation
Create a function that:
1. Recognizes a date in YYYY-MM-DD format
2. Modifies YYYY-MM-DD into DD MM YYYY

Digit Manipulation: Solution
    import re

    def rewritedate(s):
        # Capture year, month and day as groups 1, 2 and 3.
        res = re.search(r'(\d{4})-(\d{2})-(\d{2})', s)
        if res:
            # Print the parts in the order day, month, year.
            print(res.group(3) + ' ' + res.group(2) + ' ' + res.group(1))
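
An alternative sketch of the same task using sub() with backreferences, which rewrites every date inside a longer string and returns the result instead of printing it. The function name rewrite_dates and the sample sentence are made up for illustration.

```python
import re

DATE = re.compile(r'(\d{4})-(\d{2})-(\d{2})')

def rewrite_dates(text):
    """Replace every YYYY-MM-DD date in text with DD MM YYYY."""
    # \3, \2 and \1 refer back to the day, month and year groups.
    return DATE.sub(r'\3 \2 \1', text)

print(rewrite_dates('Vortrag am 2017-05-29 in Muenchen'))
```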

Digit and Word Manipulation
Given this list:
    l = ['Neu, Allison', 'Burns, C. Montgomery', 'Putz, Lionel', 'Simpson, Homer Jay']
Write a function that transforms the list into:
    Allison Neu
    C. Montgomery Burns
    Lionel Putz
    Homer Jay Simpson

Digit and Word Manipulation: Solution
    import re

    def rewritelist(l):
        for i in l:
            # Group 1: optional leading digits and dashes, group 2: last name, group 3: first names.
            res = re.search(r'([0-9-]*)\s*([A-Za-z]+),\s+(.*)', i)
            if res:
                # strip() removes the trailing space left when group 1 is empty.
                print((res.group(3) + ' ' + res.group(2) + ' ' + res.group(1)).strip())

Word Manipulation
Given a list of words, write regexes to:
1. Find all words that include four consecutive vowels
2. Find every word with 5 repeat letters
3. Find all words beginning with sag
4. Find all words containing the segment sag
5. Remove repeat letters
Make a function that creates an acronym from a phrase.
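
A possible starting point for some of these tasks, offered as a sketch rather than a reference solution. The word list, the helper names squeeze and acronym, and the reading of "repeat letters" as the same letter occurring several times in a row are all assumptions. Task 2 is left open because its wording allows more than one interpretation.

```python
import re

# Hypothetical word list for trying out the patterns.
words = ['sagen', 'gesagt', 'Absage', 'queueing', 'Schifffahrt']

# Task 1: four vowels in a row, ignoring case.
four_vowels = [w for w in words if re.search(r'[aeiou]{4}', w, re.IGNORECASE)]
# Task 3: match() anchors at the beginning of the word.
starts_sag = [w for w in words if re.match(r'sag', w)]
# Task 4: search() finds sag anywhere in the word.
contains_sag = [w for w in words if re.search(r'sag', w)]

def squeeze(word):
    """Task 5: collapse runs of the same character into one."""
    # (\w) captures a character, \1+ matches its immediate repetitions.
    return re.sub(r'(\w)\1+', r'\1', word)

def acronym(phrase):
    """Join the first character of every word, in upper case."""
    return ''.join(re.findall(r'\b\w', phrase)).upper()

print(four_vowels, starts_sag, contains_sag)
print(squeeze('Schifffahrt'))
print(acronym('regular expressions in Python'))
```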

Interactive Session
Let's try out your solutions!

Algorithmic Approaches for Biological Data, Lecture #8

Algorithmic Approaches for Biological Data, Lecture #8 Algorithmic Approaches for Biological Data, Lecture #8 Katherine St. John City University of New York American Museum of Natural History 17 February 2016 Outline More on Pattern Finding: Regular Expressions

More information

Regular Expressions in programming. CSE 307 Principles of Programming Languages Stony Brook University

Regular Expressions in programming. CSE 307 Principles of Programming Languages Stony Brook University Regular Expressions in programming CSE 307 Principles of Programming Languages Stony Brook University http://www.cs.stonybrook.edu/~cse307 1 What are Regular Expressions? Formal language representing a

More information

Regular Expressions. Regular Expression Syntax in Python. Achtung!

Regular Expressions. Regular Expression Syntax in Python. Achtung! 1 Regular Expressions Lab Objective: Cleaning and formatting data are fundamental problems in data science. Regular expressions are an important tool for working with text carefully and eciently, and are

More information

Regular Expressions. Pattern and Match objects. Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein

Regular Expressions. Pattern and Match objects. Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein Regular Expressions Pattern and Match objects Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein A quick review Strings: abc vs. abc vs. abc vs. r abc String manipulation

More information

RegExpr:Review & Wrapup; Lecture 13b Larry Ruzzo

RegExpr:Review & Wrapup; Lecture 13b Larry Ruzzo RegExpr:Review & Wrapup; Lecture 13b Larry Ruzzo Outline More regular expressions & pattern matching: groups substitute greed RegExpr Syntax They re strings Most punctuation is special; needs to be escaped

More information

Regular Expressions. Pattern and Match objects. Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein

Regular Expressions. Pattern and Match objects. Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein Regular Expressions Pattern and Match objects Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein A quick review Strings: abc vs. abc vs. abc vs. r abc String manipulation

More information

LING115 Lecture Note Session #7: Regular Expressions

LING115 Lecture Note Session #7: Regular Expressions LING115 Lecture Note Session #7: Regular Expressions 1. Introduction We need to refer to a set of strings for various reasons: to ignore case-distinction, to refer to a set of files that share a common

More information

Computer Systems and Architecture

Computer Systems and Architecture Computer Systems and Architecture Regular Expressions Bart Meyers University of Antwerp August 29, 2012 Outline What? Tools Anchors, character sets and modifiers Advanced Regular expressions Exercises

More information

Lab 20: Regular Expressions in Python. Ling 1330/2330: Computational Linguistics Na-Rae Han

Lab 20: Regular Expressions in Python. Ling 1330/2330: Computational Linguistics Na-Rae Han Lab 20: Regular Expressions in Python Ling 1330/2330: Computational Linguistics Na-Rae Han Exercise 10: regexing Jobs [x X] [xx] (x X) Within [... ], all characters are already considered forming a set,

More information

Computer Systems and Architecture

Computer Systems and Architecture Computer Systems and Architecture Stephen Pauwels Regular Expressions Academic Year 2018-2019 Outline What is a Regular Expression? Tools Anchors, Character sets and Modifiers Advanced Regular Expressions

More information

Regular Expressions. Upsorn Praphamontripong. CS 1111 Introduction to Programming Spring [Ref: https://docs.python.org/3/library/re.

Regular Expressions. Upsorn Praphamontripong. CS 1111 Introduction to Programming Spring [Ref: https://docs.python.org/3/library/re. Regular Expressions Upsorn Praphamontripong CS 1111 Introduction to Programming Spring 2018 [Ref: https://docs.python.org/3/library/re.html] Overview: Regular Expressions What are regular expressions?

More information

CSE : Python Programming

CSE : Python Programming CSE 399-004: Python Programming Lecture 11: Regular expressions April 2, 2007 http://www.seas.upenn.edu/~cse39904/ Announcements About those meeting from last week If I said I was going to look into something

More information

https://lambda.mines.edu You should have researched one of these topics on the LGA: Reference Couting Smart Pointers Valgrind Explain to your group! Regular expression languages describe a search pattern

More information

"Hello" " This " + "is String " + "concatenation"

Hello  This  + is String  + concatenation Strings About Strings Strings are objects, but there is a special syntax for writing String literals: "Hello" Strings, unlike most other objects, have a defined operation (as opposed to a method): " This

More information

Regular Expressions. Regular expressions are a powerful search-and-replace technique that is widely used in other environments (such as Unix and Perl)

Regular Expressions. Regular expressions are a powerful search-and-replace technique that is widely used in other environments (such as Unix and Perl) Regular Expressions Regular expressions are a powerful search-and-replace technique that is widely used in other environments (such as Unix and Perl) JavaScript started supporting regular expressions in

More information

Python allows variables to hold string values, just like any other type (Boolean, int, float). So, the following assignment statements are valid:

Python allows variables to hold string values, just like any other type (Boolean, int, float). So, the following assignment statements are valid: 1 STRINGS Objectives: How text data is internally represented as a string Accessing individual characters by a positive or negative index String slices Operations on strings: concatenation, comparison,

More information

Strings are actually 'objects' Strings

Strings are actually 'objects' Strings Strings are actually 'objects' Strings What is an object?! An object is a concept that we can encapsulate data along with the functions that might need to access or manipulate that data. What is an object?!

More information

Lecture 11: Regular Expressions. LING 1330/2330: Introduction to Computational Linguistics Na-Rae Han

Lecture 11: Regular Expressions. LING 1330/2330: Introduction to Computational Linguistics Na-Rae Han Lecture 11: Regular Expressions LING 1330/2330: Introduction to Computational Linguistics Na-Rae Han Outline Language and Computers, Ch.4 Searching 4.4 Searching semi-structured data with regular expressions

More information

STATS Data analysis using Python. Lecture 0: Introduction and Administrivia

STATS Data analysis using Python. Lecture 0: Introduction and Administrivia STATS 700-002 Data analysis using Python Lecture 0: Introduction and Administrivia Data science has completely changed our world Course goals Survey popular tools in academia/industry for data analysis

More information

CSC401 Tutorial 3 - Regular Expressions

CSC401 Tutorial 3 - Regular Expressions CSC401 Tutorial 3 - Regular Expressions Zhewei Sun This is a quick introduction to regular expressions to get you up to speed on Assignment 1. For preprocessing and feature extraction, you should try to

More information

Essentials for Scientific Computing: Stream editing with sed and awk

Essentials for Scientific Computing: Stream editing with sed and awk Essentials for Scientific Computing: Stream editing with sed and awk Ershaad Ahamed TUE-CMS, JNCASR May 2012 1 Stream Editing sed and awk are stream processing commands. What this means is that they are

More information

ML 4 A Lexer for OCaml s Type System

ML 4 A Lexer for OCaml s Type System ML 4 A Lexer for OCaml s Type System CS 421 Fall 2017 Revision 1.0 Assigned October 26, 2017 Due November 2, 2017 Extension November 4, 2017 1 Change Log 1.0 Initial Release. 2 Overview To complete this

More information

DATA STRUCTURE AND ALGORITHM USING PYTHON

DATA STRUCTURE AND ALGORITHM USING PYTHON DATA STRUCTURE AND ALGORITHM USING PYTHON Sorting, Searching Algorithm and Regular Expression Peter Lo Sorting Algorithms Put Elements of List in Certain Order 2 Bubble Sort The bubble sort makes multiple

More information

LECTURE 8. The Standard Library Part 2: re, copy, and itertools

LECTURE 8. The Standard Library Part 2: re, copy, and itertools LECTURE 8 The Standard Library Part 2: re, copy, and itertools THE STANDARD LIBRARY: RE The Python standard library contains extensive support for regular expressions. Regular expressions, often abbreviated

More information

Pieter van den Hombergh. April 13, 2018

Pieter van den Hombergh. April 13, 2018 Intro ergh Fontys Hogeschool voor Techniek en Logistiek April 13, 2018 ergh/fhtenl April 13, 2018 1/11 Regex? are a very power, but also complex tool. There is the saying that: Intro If you start with

More information

Introduction to: Computers & Programming: Strings and Other Sequences

Introduction to: Computers & Programming: Strings and Other Sequences Introduction to: Computers & Programming: Strings and Other Sequences in Python Part I Adam Meyers New York University Outline What is a Data Structure? What is a Sequence? Sequences in Python All About

More information

CS 2112 Lab: Regular Expressions

CS 2112 Lab: Regular Expressions October 10, 2012 Regex Overview Regular Expressions, also known as regex or regexps are a common scheme for pattern matching regex supports matching individual characters as well as categories and ranges

More information

Fundamentals of Programming Session 4

Fundamentals of Programming Session 4 Fundamentals of Programming Session 4 Instructor: Reza Entezari-Maleki Email: entezari@ce.sharif.edu 1 Fall 2011 These slides are created using Deitel s slides, ( 1992-2010 by Pearson Education, Inc).

More information

Regular expressions. LING78100: Methods in Computational Linguistics I

Regular expressions. LING78100: Methods in Computational Linguistics I Regular expressions LING78100: Methods in Computational Linguistics I String methods Python strings have methods that allow us to determine whether a string: Contains another string; e.g., assert "and"

More information

Understanding Regular Expressions, Special Characters, and Patterns

Understanding Regular Expressions, Special Characters, and Patterns APPENDIXA Understanding Regular Expressions, Special Characters, and Patterns This appendix describes the regular expressions, special or wildcard characters, and patterns that can be used with filters

More information

More Details about Regular Expressions

More Details about Regular Expressions More Details about Regular Expressions Basic Regular Expression Notation Summary of basic notations to match single characters and sequences of characters 1. /[abc]/ = /a b c/ Character class; disjunction

More information

15-388/688 - Practical Data Science: Data collection and scraping. J. Zico Kolter Carnegie Mellon University Spring 2017

15-388/688 - Practical Data Science: Data collection and scraping. J. Zico Kolter Carnegie Mellon University Spring 2017 15-388/688 - Practical Data Science: Data collection and scraping J. Zico Kolter Carnegie Mellon University Spring 2017 1 Outline The data collection process Common data formats and handling Regular expressions

More information

Regex, Sed, Awk. Arindam Fadikar. December 12, 2017

Regex, Sed, Awk. Arindam Fadikar. December 12, 2017 Regex, Sed, Awk Arindam Fadikar December 12, 2017 Why Regex Lots of text data. twitter data (social network data) government records web scrapping many more... Regex Regular Expressions or regex or regexp

More information

Appendix. As a quick reference, here you will find all the metacharacters and their descriptions. Table A-1. Characters

Appendix. As a quick reference, here you will find all the metacharacters and their descriptions. Table A-1. Characters Appendix As a quick reference, here you will find all the metacharacters and their descriptions. Table A-1. Characters. Any character [] One out of an inventory of characters [ˆ] One not in the inventory

More information

File I/O and Regular Expressions. Sandy Brownlee

File I/O and Regular Expressions. Sandy Brownlee File I/O and Regular Expressions Sandy Brownlee sbr@cs.stir.ac.uk Outline Basic reading / writing of text files in Python Use a library for more complex formats! E.g. openpyxl, python-docx, pypdf2 Regex

More information

Regular Expressions!!

Regular Expressions!! Regular Expressions!! In your mat219_class project 1. Copy code from D2L to download regex-prac9ce.r, and run in the Console. 2. Open a blank R script and name it regex-notes. library(tidyverse) regular

More information

CSE 154 LECTURE 11: REGULAR EXPRESSIONS

CSE 154 LECTURE 11: REGULAR EXPRESSIONS CSE 154 LECTURE 11: REGULAR EXPRESSIONS What is form validation? validation: ensuring that form's values are correct some types of validation: preventing blank values (email address) ensuring the type

More information

CIS192 Python Programming. Robert Rand. August 27, 2015

CIS192 Python Programming. Robert Rand. August 27, 2015 CIS192 Python Programming Introduction Robert Rand University of Pennsylvania August 27, 2015 Robert Rand (University of Pennsylvania) CIS 192 August 27, 2015 1 / 30 Outline 1 Logistics Grading Office

More information

LING/C SC/PSYC 438/538. Lecture 10 Sandiway Fong

LING/C SC/PSYC 438/538. Lecture 10 Sandiway Fong LING/C SC/PSYC 438/538 Lecture 10 Sandiway Fong Administrivia Homework 4 Perl regex Python re import re slightly complicated string handling: use raw https://docs.python.or g/3/library/re.html Regular

More information

Regular Expression HOWTO

Regular Expression HOWTO Regular Expression HOWTO Release 2.6.4 Guido van Rossum Fred L. Drake, Jr., editor January 04, 2010 Python Software Foundation Email: docs@python.org Contents 1 Introduction ii 2 Simple Patterns ii 2.1

More information

CSE 303 Lecture 7. Regular expressions, egrep, and sed. read Linux Pocket Guide pp , 73-74, 81

CSE 303 Lecture 7. Regular expressions, egrep, and sed. read Linux Pocket Guide pp , 73-74, 81 CSE 303 Lecture 7 Regular expressions, egrep, and sed read Linux Pocket Guide pp. 66-67, 73-74, 81 slides created by Marty Stepp http://www.cs.washington.edu/303/ 1 discuss reading #2 Lecture summary regular

More information

Here's an example of how the method works on the string "My text" with a start value of 3 and a length value of 2:

Here's an example of how the method works on the string My text with a start value of 3 and a length value of 2: CS 1251 Page 1 Friday Friday, October 31, 2014 10:36 AM Finding patterns in text A smaller string inside of a larger one is called a substring. You have already learned how to make substrings in the spreadsheet

More information

Finishing Regular Expressions & XML / Web Scraping

Finishing Regular Expressions & XML / Web Scraping Finishing Regular Expressions & XML / Web Scraping April 7 th 2016 CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 1 Today Iterators Do ACT 3-2 Finish Regular Expressions XML Parsing

More information

N-grams in Python. L445/L515 Autumn 2010

N-grams in Python. L445/L515 Autumn 2010 N-grams in Python L445/L515 Autumn 2010 Calculating n-grams We want to take a practical task, i.e., using n-grams for natural language processing, and see how we can start implementing it in Python. Some

More information

GIS 4653/5653: Spatial Programming and GIS. More Python: Statements, Types, Functions, Modules, Classes

GIS 4653/5653: Spatial Programming and GIS. More Python: Statements, Types, Functions, Modules, Classes GIS 4653/5653: Spatial Programming and GIS More Python: Statements, Types, Functions, Modules, Classes Statement Syntax The if-elif-else statement Indentation and and colons are important Parentheses and

More information

CSE 390a Lecture 7. Regular expressions, egrep, and sed

CSE 390a Lecture 7. Regular expressions, egrep, and sed CSE 390a Lecture 7 Regular expressions, egrep, and sed slides created by Marty Stepp, modified by Jessica Miller and Ruth Anderson http://www.cs.washington.edu/390a/ 1 2 Lecture summary regular expression

More information

Basic Python Revision Notes With help from Nitish Mittal

Basic Python Revision Notes With help from Nitish Mittal Basic Python Revision Notes With help from Nitish Mittal HELP from Documentation dir(module) help() Important Characters and Sets of Characters tab \t new line \n backslash \\ string " " or ' ' docstring

More information

COMP519 Web Programming Autumn A Brief Intro to Python

COMP519 Web Programming Autumn A Brief Intro to Python COMP519 Web Programming Autumn 2015 A Brief Intro to Python Python Python was created in the late 1980s and its implementation was started in December 1989 by Guido van Rossum at CWI in the Netherlands.

More information

successes without magic London,

successes without magic London, (\d)(?:\u0020 \u0209 \u202f \u200a){0,1}((m mm cm km V mv µv l ml C Nm A ma bar s kv Hz khz M Hz t kg g mg W kw MW Ah mah N kn obr min µm µs Pa MPa kpa hpa mbar µf db)\b) ^\t*'.+?' => ' (\d+)(,)(\d+)k

More information
