Describing Languages with Regular Expressions

1 University of Oslo: Department of Informatics
Describing Languages with Regular Expressions
Jonathon Read
25 September 2012
INF4820: Algorithms for AI and NLP

3 Outlook
How can we write programs that handle sentences?
  Describing languages with regular expressions
  Representing and implementing regular expressions using finite state automata
  Estimating the probability of unobserved strings of words with language models
  Sequence-labelling parts of speech using Hidden Markov models

10 Productivity of languages
Even simple formal languages are infinite:
  x = 1 + 2
  x = ...
  x = ...
With natural languages there are so many more choices:
  The fox.
  The hungry fox.
  The hungry fox ate.
  The hungry fox ate the chicken.
  The hungry fox quickly ate the chicken.
  The hungry brown fox quickly ate the delicious roast chicken and washed it down with a pint of beer.

12 Characterising language
Simplifying assumption: a language is a set of utterances
  utterances inside this set are well-formed
  utterances not in this set are ill-formed
How do we represent sets of utterances, if the set is infinite?

13 Regular expressions
Regular expressions (RE, RegEx, RegExp):
  An algebraic notation for characterising sets of strings
  They consist of constants and operators
Example: /[A-Z][a-z]* \d+[A-Z]?, \d{4} [A-Z][a-z]*/
Note: implementations are supplied in many programming languages and text editors; for instance, try C-M-s in Emacs, or grep on the command line.

14 Matching
Sequences of character constants specify how to match strings. Further expressiveness is added by metacharacters, including:
  .   any single character (except newlines)
  ^   the start of a line
  $   the end of a line
Example: /^Chapter .$/ matches { Chapter 1, Chapter 2, ..., Chapter & }
Note: When the literal form of an operator or metacharacter, i.e. one of {}[]()^$.|*+?\, should be matched, it must be escaped with a backslash, e.g. match a full stop with /\./
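The anchors, the wildcard and escaping can be tried out with a minimal sketch in Python's `re` module. Python is an assumption here (the slides mention only Emacs and grep), and the sample strings are illustrative.

```python
import re

# '^' and '$' anchor the match; '.' stands for any single character except a newline.
chapter = re.compile(r"^Chapter .$")

samples = ["Chapter 1", "Chapter &", "Chapter 10", "See Chapter 1"]
matches = [s for s in samples if chapter.search(s)]
print(matches)  # ['Chapter 1', 'Chapter &']

# An unescaped '.' matches any character; '\.' matches only a literal full stop.
print(bool(re.search(r"^3.14$", "3x14")))   # True
print(bool(re.search(r"^3\.14$", "3x14")))  # False

# re.escape backslash-escapes the metacharacters in a literal string.
print(re.escape("a.b"))  # a\.b
```

"Chapter 10" is rejected because the single `.` consumes only one character before `$` requires the end of the line.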

15 Disjunction
The | operator expresses a logical or.
Example: /^a (fox|wolf)$/ matches { a fox, a wolf }
Note: The | operator has low precedence; the brackets ensure that the expression does not specify the set { a fox, wolf }
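The precedence point can be checked with a small Python sketch (Python's `re` module is an assumption; the slides do not prescribe a language). The anchors are replaced by `fullmatch`, which requires the whole string to match, so each pattern describes exactly one set.

```python
import re

grouped = re.compile(r"a (fox|wolf)")   # 'a ' followed by 'fox' or 'wolf'
ungrouped = re.compile(r"a fox|wolf")   # parsed as ('a fox') or ('wolf')

candidates = ["a fox", "a wolf", "wolf"]
grouped_set = [s for s in candidates if grouped.fullmatch(s)]
ungrouped_set = [s for s in candidates if ungrouped.fullmatch(s)]

print(grouped_set)    # ['a fox', 'a wolf']
print(ungrouped_set)  # ['a fox', 'wolf']
```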

16 Character classes
Character classes can also be used to specify disjunction. They are expressed using square brackets, [ and ]:
Examples:
  /^[Ff]ox$/           { Fox, fox }
  /^f[aio]x$/          { fax, fix, fox }
  /^[a-z]$/            { a, b, c, ..., z }
  /^Chapter [1-9]$/    { Chapter 1, Chapter 2, ..., Chapter 9 }

17 Character classes
Used as the first character inside a character class, ^ negates the class:
Examples:
  /[^A-Za-z]/   matches any non-alphabetic character
  /[^ ]/        matches anything that is not a space
Many implementations provide named character classes:
Examples:
  /\d/   /[[:digit:]]/   /[0-9]/
  /\w/   /[[:alnum:]]/   /[a-zA-Z0-9_]/   (strictly, [[:alnum:]] does not include the underscore)
  /\D/   /[^0-9]/
  /[[:punct:]]/   matches punctuation characters
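A short Python sketch of character classes, negation and the shorthand classes follows. Python is an assumption, and its `re` module does not support POSIX bracket names such as [[:digit:]], so only the bracket and backslash forms are shown. The sample strings are illustrative.

```python
import re

# A bracketed class matches exactly one character from the listed set.
fax_like = [w for w in ["fax", "fix", "fox", "fux"] if re.fullmatch(r"f[aio]x", w)]
print(fax_like)  # ['fax', 'fix', 'fox']

# '^' as the first character inside the brackets negates the class.
non_alpha = re.findall(r"[^A-Za-z]", "R2-D2!")
print(non_alpha)  # ['2', '-', '2', '!']

# \d is shorthand for a digit and \D is its complement.
digits = re.findall(r"\d", "INF4820")
non_digits = re.findall(r"\D", "INF4820")
print(digits)      # ['4', '8', '2', '0']
print(non_digits)  # ['I', 'N', 'F']
```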

18 Quantification
Quantification can be specified in a number of ways:
  ?       zero or one of the preceding element
  *       zero or more of the preceding element
  +       one or more of the preceding element
  {n}     exactly n of the preceding element
  {n,m}   from n to m of the preceding element
  {n,}    n or more of the preceding element
  {,m}    at most m of the preceding element
Example: /^Chapter [1-9]\d*$/ matches { Chapter 1, Chapter 2, ..., Chapter 99999, ... }
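Each quantifier can be checked against a handful of strings, as in this minimal Python sketch. Python is an assumption, the helper `accepts` is a hypothetical name introduced for illustration, and the `{,m}` form is not available in every regular expression implementation.

```python
import re

def accepts(pattern, strings):
    """Return the strings that the pattern matches in full."""
    return [s for s in strings if re.fullmatch(pattern, s)]

words = ["", "a", "aa", "aaa", "aaaa"]
print(accepts(r"a?", words))      # ['', 'a']
print(accepts(r"a*", words))      # ['', 'a', 'aa', 'aaa', 'aaaa']
print(accepts(r"a+", words))      # ['a', 'aa', 'aaa', 'aaaa']
print(accepts(r"a{2}", words))    # ['aa']
print(accepts(r"a{2,3}", words))  # ['aa', 'aaa']
print(accepts(r"a{3,}", words))   # ['aaa', 'aaaa']
print(accepts(r"a{,2}", words))   # ['', 'a', 'aa']

chapters = ["Chapter 1", "Chapter 42", "Chapter 99999", "Chapter 0", "Chapter 07"]
print(accepts(r"Chapter [1-9]\d*", chapters))
# ['Chapter 1', 'Chapter 42', 'Chapter 99999']
```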

19 Lazy quantification
How to match quoted items?
  "Yes," he said, "but why?"
Normal quantification operators are greedy: they will match the largest possible sequence in the input:
  /".+"/     { "Yes," he said, "but why?" }
This can be overridden with ?, which becomes the lazy operator when placed directly after a quantification operator:
  /".+?"/    { "Yes,", "but why?" }
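The difference between greedy and lazy matching shows up directly in a Python sketch. Python is an assumption, as is the reading that the sentence and both patterns carry double quotation marks, which the question about quoted items implies.

```python
import re

text = '"Yes," he said, "but why?"'

# Greedy: '.+' runs to the last closing quote on the line.
greedy = re.findall(r'".+"', text)
# Lazy: '.+?' stops at the first closing quote it can reach.
lazy = re.findall(r'".+?"', text)

print(greedy)  # ['"Yes," he said, "but why?"']
print(lazy)    # ['"Yes,"', '"but why?"']
```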

20 Capturing groups
Brackets are used to specify matching groups, which (a) enforce precedence and (b) mark groups for later reference using an escaped number (\1 to \9).
Example:
  <[bi]>.+?</[bi]>      { <b>...</b>, <i>...</i>, <b>...</i>, <i>...</b> }
  <([bi])>.+?</\1>      { <b>...</b>, <i>...</i> }
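A small Python sketch shows how the backreference forces the closing tag to agree with the opening one. Python is an assumption and the sample markup is illustrative.

```python
import re

html = "<b>bold</b> <i>italic</i> <b>mixed</i>"

loose = re.compile(r"<[bi]>.+?</[bi]>")    # opening and closing tag may differ
strict = re.compile(r"<([bi])>.+?</\1>")   # \1 repeats whatever group 1 captured

loose_hits = loose.findall(html)
strict_hits = [m.group(0) for m in strict.finditer(html)]

print(loose_hits)   # ['<b>bold</b>', '<i>italic</i>', '<b>mixed</i>']
print(strict_hits)  # ['<b>bold</b>', '<i>italic</i>']
```

`finditer` is used for the second pattern because `findall` would return only the captured group (the tag letter) rather than the whole match.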

27 Putting it all together
What does this match?
  /[A-Z][a-z]* \d+[A-Z]?, \d{4} [A-Z][a-z]*/
  /[A-Z][a-z]* /    a word with an initial capital, followed by a space
  /\d+[A-Z]?/       one or more digits, optionally followed by a capital letter
  /, /              a comma and a space
  /\d{4} /          four digits, followed by a space
  /[A-Z][a-z]*/     a word with an initial capital
Example: Gaustadalléen 23B, 0373 Oslo
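The full pattern can be exercised with a minimal Python sketch. Python is an assumption, and so is the reading of the house-number letter as [A-Z], which follows the "capital letter" description. The street name is written in plain ASCII here because, in Python's `re` module, [a-z] does not cover accented letters.

```python
import re

address = re.compile(r"[A-Z][a-z]* \d+[A-Z]?, \d{4} [A-Z][a-z]*")

tests = [
    "Gaustadalleen 23B, 0373 Oslo",  # house number with a letter
    "Gaustadalleen 23, 0373 Oslo",   # the letter is optional
    "gaustadalleen 23B, 0373 Oslo",  # no initial capital
    "Gaustadalleen 23B, 373 Oslo",   # only three digits in the postcode
]
results = [bool(address.fullmatch(t)) for t in tests]
print(results)  # [True, True, False, False]
```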

28 Some exercises
Write regular expressions for the following:
  1. all alphabetic strings;
  2. all lower case alphabetic strings ending in a b;
  3. all strings of two repeated words;
  4. all strings from the alphabet a,b such that each a is immediately preceded by and immediately followed by a b;
  5. capturing the first word of an English sentence (making sure to deal with punctuation).

33 Some exercises
  1. all alphabetic strings;
     /[a-zA-Z]+/
  2. all lower case alphabetic strings ending in a b;
     /[a-z]*b/
  3. all strings of two repeated words, separated by a space;
     /([a-zA-Z]+) \1/
  4. all strings from the alphabet a,b such that each a is immediately preceded by and immediately followed by a b;
     /b+(ab+)+/
  5. capturing the first word of an English sentence (making sure to deal with punctuation)
     /^[^a-zA-Z]*([a-zA-Z]+)/
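The suggested answers can be spot-checked with a short Python sketch. Python is an assumption; the dictionary keys, the helper `ok` and the test strings are hypothetical names and data chosen for illustration. The first four answers are tested against whole strings with `fullmatch`, which the exercise wording implies but the slides do not state.

```python
import re

answers = {
    "alphabetic": r"[a-zA-Z]+",
    "ends_in_b": r"[a-z]*b",
    "repeated_word": r"([a-zA-Z]+) \1",
    "a_between_bs": r"b+(ab+)+",
}

def ok(name, s):
    """True if the named answer matches the whole string."""
    return re.fullmatch(answers[name], s) is not None

print(ok("alphabetic", "Regex"), ok("alphabetic", "regex1"))            # True False
print(ok("ends_in_b", "crab"), ok("ends_in_b", "crabs"))                # True False
print(ok("repeated_word", "the the"), ok("repeated_word", "the then"))  # True False
print(ok("a_between_bs", "bab"), ok("a_between_bs", "baab"))            # True False

# Exercise 5: skip leading non-letters, then capture the first run of letters.
first_word = re.compile(r"^[^a-zA-Z]*([a-zA-Z]+)")
captured = first_word.search('"Yes," he said.').group(1)
print(captured)  # Yes
```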

35 Applications in AI and NLP
Weizenbaum 1966
  User:  Men are all alike.
  Eliza: In what way?
  User:  They're always bugging us about something or other.
  Eliza: Can you think of a specific example?
  User:  Well, my boyfriend made me come here.
  Eliza: Your boyfriend made you come here?
  User:  He says I am depressed much of the time.
  Eliza: I am sorry to hear you are depressed.
Can be reproduced with a cascade of regular expression substitutions, e.g. using sed:
  s/.* all.*/in what way/
  s/.* always.*/can you think of a specific example/
  s/.* I am (depressed|sad).*/i am sorry to hear you are \1/
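The cascade can be sketched in Python with `re.sub` standing in for sed. This is a swapped-in technique rather than the slide's own tooling; `RULES` and `respond` are hypothetical names, and the alternation (depressed|sad) is assumed to be the intended reading of the third rule.

```python
import re

# Substitutions applied in order, like consecutive sed 's' commands.
RULES = [
    (r".* all.*", "in what way"),
    (r".* always.*", "can you think of a specific example"),
    (r".* I am (depressed|sad).*", r"i am sorry to hear you are \1"),
]

def respond(utterance):
    """Run the utterance through every substitution in turn."""
    for pattern, replacement in RULES:
        utterance = re.sub(pattern, replacement, utterance)
    return utterance

print(respond("Men are all alike."))
# in what way
print(respond("They're always bugging us about something or other."))
# can you think of a specific example
print(respond("He says I am depressed much of the time."))
# i am sorry to hear you are depressed
```

An utterance that none of the three rules covers, such as the boyfriend line in the dialogue, passes through unchanged; reproducing that reply would need a further rule that the slide does not list.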

37 Applications in AI and NLP
Lexical morphology:
  s/mouse/mice/
  s/(bush|fox|house)/\1es/
  s/(.)/\1s/
Concise expressions of genetic sequences:
  finding codons, e.g. s/CG.|AG[AG]/arginine/
  specifying patterns, e.g. /CG.|AG[AG].{,100}GG./
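The morphology rules can be sketched in Python as an ordered list in which the first matching rule wins. This adapts the slide's substitutions rather than reproducing them: Python, the names `PLURAL_RULES` and `pluralise`, the added anchors and the first-match-wins control flow are all assumptions. The stem "house" is left out of the -es rule because appending "es" to it would yield "housees".

```python
import re

# Ordered rules: irregular form first, then -es stems, then the default -s.
PLURAL_RULES = [
    (r"^mouse$", "mice"),
    (r"^(bush|fox)$", r"\1es"),
    (r"^(.+)$", r"\1s"),
]

def pluralise(noun):
    """Apply the first rule whose pattern matches the noun."""
    for pattern, replacement in PLURAL_RULES:
        if re.search(pattern, noun):
            return re.sub(pattern, replacement, noun)
    return noun

plurals = [pluralise(n) for n in ["mouse", "fox", "bush", "cat"]]
print(plurals)  # ['mice', 'foxes', 'bushes', 'cats']
```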

38 Summary
Regular expressions:
  A finite way of specifying infinite sets
  Character constants, metacharacters and operators
The fundamental operations are:
  Matching characters, wildcards (.) and anchors (^ and $)
  Disjunction (| and [ ])
  Quantification (?, *, + and {n,m})
  Precedence can be enforced with brackets (( and ))
More complex operations include capturing groups
Next week:
  Finite state automata
  Searching state spaces


More information

1 Finite Representations of Languages

1 Finite Representations of Languages 1 Finite Representations of Languages Languages may be infinite sets of strings. We need a finite notation for them. There are at least four ways to do this: 1. Language generators. The language can be

More information

Midterm I - Solution CS164, Spring 2014

Midterm I - Solution CS164, Spring 2014 164sp14 Midterm 1 - Solution Midterm I - Solution CS164, Spring 2014 March 3, 2014 Please read all instructions (including these) carefully. This is a closed-book exam. You are allowed a one-page handwritten

More information

CSE 390a Lecture 7. Regular expressions, egrep, and sed

CSE 390a Lecture 7. Regular expressions, egrep, and sed CSE 390a Lecture 7 Regular expressions, egrep, and sed slides created by Marty Stepp, modified by Jessica Miller and Ruth Anderson http://www.cs.washington.edu/390a/ 1 2 Lecture summary regular expression

More information

Compiler Construction LECTURE # 3

Compiler Construction LECTURE # 3 Compiler Construction LECTURE # 3 The Course Course Code: CS-4141 Course Title: Compiler Construction Instructor: JAWAD AHMAD Email Address: jawadahmad@uoslahore.edu.pk Web Address: http://csandituoslahore.weebly.com/cc.html

More information

Introduction to Regular Expressions Version 1.3. Tom Sgouros

Introduction to Regular Expressions Version 1.3. Tom Sgouros Introduction to Regular Expressions Version 1.3 Tom Sgouros June 29, 2001 2 Contents 1 Beginning Regular Expresions 5 1.1 The Simple Version........................ 6 1.2 Difficult Characters........................

More information

Informatics 1 - Computation & Logic: Tutorial 3

Informatics 1 - Computation & Logic: Tutorial 3 Informatics - Computation & Logic: Tutorial Counting Week 5: 6- October 7 Please attempt the entire worksheet in advance of the tutorial, and bring all work with you. Tutorials cannot function properly

More information

Searching Guide. September 16, Version 9.3

Searching Guide. September 16, Version 9.3 Searching Guide September 16, 2016 - Version 9.3 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

More information

R E G U L A R E X P R E S S I O N S

R E G U L A R E X P R E S S I O N S R E G U L A R E X P R E S S I O N S F O R D ATA C L E A N U P I N S I E R R A Lloyd Chittenden Union Catalog Coordinator Marmot Library Network WHAT ARE REGULAR EXPRESSIONS? Combine literal characters

More information

1 de 6 07/03/ :28 p.m.

1 de 6 07/03/ :28 p.m. 1 de 6 07/03/2007 11:28 p.m. Published on Perl.com http://www.perl.com/pub/a/2000/11/begperl3.html See this if you're having trouble printing code examples Beginner's Introduction to Perl - Part 3 By Doug

More information

Fundamentals: Expressions and Assignment

Fundamentals: Expressions and Assignment Fundamentals: Expressions and Assignment A typical Python program is made up of one or more statements, which are executed, or run, by a Python console (also known as a shell) for their side effects e.g,

More information

Set and Set Operations

Set and Set Operations Set and Set Operations Introduction A set is a collection of objects. The objects in a set are called elements of the set. A well defined set is a set in which we know for sure if an element belongs to

More information

CS Unix Tools & Scripting Lecture 7 Working with Stream

CS Unix Tools & Scripting Lecture 7 Working with Stream CS2043 - Unix Tools & Scripting Lecture 7 Working with Streams Spring 2015 1 February 4, 2015 1 based on slides by Hussam Abu-Libdeh, Bruno Abrahao and David Slater over the years Announcements Course

More information

Tips and Tricks for Making the Most of Create Lists

Tips and Tricks for Making the Most of Create Lists Tips and Tricks for Making the Most of Create Lists Matching and More Mike Monaco Coordinator, Cataloging Services The University of Akron mmonaco@uakron.edu OH-IUG October 12, 2018 The University of Akron

More information

Tuesday, September 30, 14.

Tuesday, September 30, 14. http://xkcd.com/208/ 1 Lecture 9 Regexes, Finite State Automata Intro to NLP, CS585, Fall 2014 http://people.cs.umass.edu/~brenocon/inlp2014/ Brendan O Connor (http://brenocon.com) 2 Exercise 5 out - due

More information

BNF, EBNF Regular Expressions. Programming Languages,

BNF, EBNF Regular Expressions. Programming Languages, BNF, EBNF Regular Expressions Programming Languages, 234319 1 Reminder - (E)BNF A notation for describing the grammar of a language The notation consists of: Terminals: the actual legal strings, written

More information

Regular Expressions Overview Suppose you needed to find a specific IPv4 address in a bunch of files? This is easy to do; you just specify the IP

Regular Expressions Overview Suppose you needed to find a specific IPv4 address in a bunch of files? This is easy to do; you just specify the IP Regular Expressions Overview Suppose you needed to find a specific IPv4 address in a bunch of files? This is easy to do; you just specify the IP address as a string and do a search. But, what if you didn

More information

Introduction to Lexing and Parsing

Introduction to Lexing and Parsing Introduction to Lexing and Parsing ECE 351: Compilers Jon Eyolfson University of Waterloo June 18, 2012 1 Riddle Me This, Riddle Me That What is a compiler? 1 Riddle Me This, Riddle Me That What is a compiler?

More information

Functional Programming in Haskell Prof. Madhavan Mukund and S. P. Suresh Chennai Mathematical Institute

Functional Programming in Haskell Prof. Madhavan Mukund and S. P. Suresh Chennai Mathematical Institute Functional Programming in Haskell Prof. Madhavan Mukund and S. P. Suresh Chennai Mathematical Institute Module # 02 Lecture - 03 Characters and Strings So, let us turn our attention to a data type we have

More information

Unix Introduction. Part 2

Unix Introduction. Part 2 Unix Introduction Part 2 More Unix Commands wc touch cp - copy review Let's copy a directory tree with recursion. Remember that in order to use the -r option when copying directory hierarchies. Make sure

More information

Introduction to Automata Theory. BİL405 - Automata Theory and Formal Languages 1

Introduction to Automata Theory. BİL405 - Automata Theory and Formal Languages 1 Introduction to Automata Theory BİL405 - Automata Theory and Formal Languages 1 Automata, Computability and Complexity Automata, Computability and Complexity are linked by the question: What are the fundamental

More information

Chapter 17. Fundamental Concepts Expressed in JavaScript

Chapter 17. Fundamental Concepts Expressed in JavaScript Chapter 17 Fundamental Concepts Expressed in JavaScript Learning Objectives Tell the difference between name, value, and variable List three basic data types and the rules for specifying them in a program

More information

Regular Expressions for Linguists: A Life Skill

Regular Expressions for Linguists: A Life Skill .. Regular Expressions for Linguists: A Life Skill Michael Yoshitaka Erlewine mitcho@mitcho.com Hackl Lab Turkshop March 2013 Regular Expressions What are regular expressions? Regular Expressions (aka

More information

Systems Programming/ C and UNIX

Systems Programming/ C and UNIX Systems Programming/ C and UNIX December 7-10, 2017 1/17 December 7-10, 2017 1 / 17 Outline 1 2 Using find 2/17 December 7-10, 2017 2 / 17 String Pattern Matching Tools Regular Expressions Simple Examples

More information

User Commands sed ( 1 )

User Commands sed ( 1 ) NAME sed stream editor SYNOPSIS /usr/bin/sed [-n] script [file...] /usr/bin/sed [-n] [-e script]... [-f script_file]... [file...] /usr/xpg4/bin/sed [-n] script [file...] /usr/xpg4/bin/sed [-n] [-e script]...

More information

Basics Wildcard and multipliers Special characters Negation Other functions Programming. Regular Expressions. Web Programming

Basics Wildcard and multipliers Special characters Negation Other functions Programming. Regular Expressions. Web Programming Regular Expressions Web Programming Uta Priss ZELL, Ostfalia University 2013 Web Programming Regular Expressions Slide 1/17 Outline Basics Wildcard and multipliers Special characters Negation Other functions

More information

Perl Regular Expressions Perl is renowned for its excellence in text processing. Regular expressions area big factor behind this fame.

Perl Regular Expressions Perl is renowned for its excellence in text processing. Regular expressions area big factor behind this fame. Perl Regular Expressions Perl Regular Expressions Perl is renowned for its excellence in text processing. Regular expressions area big factor behind this fame. UVic SEng 265 Daniel M. German Department

More information

1 CS580W-01 Quiz 1 Solution

1 CS580W-01 Quiz 1 Solution 1 CS580W-01 Quiz 1 Solution Date: Wed Sep 26 2018 Max Points: 15 Important Reminder As per the course Academic Honesty Statement, cheating of any kind will minimally result in receiving an F letter grade

More information

Certification. String Processing with Regular Expressions

Certification. String Processing with Regular Expressions Certification String Processing with Regular Expressions UNIT 4 String Processing with Regular Expressions UNIT 4: Objectives? Learn how the regular expression pattern matching system works? Explore the

More information

Behaviour Diagrams UML

Behaviour Diagrams UML Behaviour Diagrams UML Behaviour Diagrams Structure Diagrams are used to describe the static composition of components (i.e., constraints on what intstances may exist at run-time). Interaction Diagrams

More information

Part III. Shell Config. Tobias Neckel: Scripting with Bash and Python Compact Max-Planck, February 16-26,

Part III. Shell Config. Tobias Neckel: Scripting with Bash and Python Compact Max-Planck, February 16-26, Part III Shell Config Compact Course @ Max-Planck, February 16-26, 2015 33 Special Directories. current directory.. parent directory ~ own home directory ~user home directory of user ~- previous directory

More information

Project 2: Eliza Due: 7:00 PM, Nov 3, 2017

Project 2: Eliza Due: 7:00 PM, Nov 3, 2017 CS17 Integrated Introduction to Computer Science Hughes Contents Project 2: Eliza Due: 7:00 PM, Nov 3, 2017 1 Introduction 1 2 Regular Expressions (Regexp) 1 2.1 Any * Details........................................

More information

Object-Oriented Software Engineering CS288

Object-Oriented Software Engineering CS288 Object-Oriented Software Engineering CS288 1 Regular Expressions Contents Material for this lecture is based on the Java tutorial from Sun Microsystems: http://java.sun.com/docs/books/tutorial/essential/regex/index.html

More information

CS 2112 Lab: Regular Expressions

CS 2112 Lab: Regular Expressions October 10, 2012 Regex Overview Regular Expressions, also known as regex or regexps are a common scheme for pattern matching regex supports matching individual characters as well as categories and ranges

More information

A program that performs lexical analysis may be termed a lexer, tokenizer, or scanner, though scanner is also a term for the first stage of a lexer.

A program that performs lexical analysis may be termed a lexer, tokenizer, or scanner, though scanner is also a term for the first stage of a lexer. Compiler Design A compiler is computer software that transforms computer code written in one programming language (the source language) into another programming language (the target language). The name

More information