Perl Regular Expressions. Perl Patterns. Character Class Shortcuts. Examples of Perl Patterns

Perl Regular Expressions

Unlike most programming languages, Perl has built-in support for matching strings using regular expressions, called patterns, which are similar to the regular expressions used in Unix utilities such as grep. A pattern can be used in a conditional expression and returns a true value if there is a match. The forms for using regular expressions are presented later.

Example:

    if (/hello/)    # checks whether hello appears anywhere in $_

Perl Patterns

A Perl pattern is a combination of:

    literal characters   matched directly
    '.'                  matches any single character except a newline
    '*'                  matches the preceding item zero or more times
    '+'                  matches the preceding item one or more times
    '?'                  matches the preceding item zero or one times
    '(' and ')'          grouping
    '|'                  matches the item on the left or the item on the right
    [...]                matches one character inside the brackets

Examples of Perl Patterns

    /abc/              # abc
    /a.c/              # a, any char but newline, c
    /ab?c/             # ac or abc
    /ab*c/             # a, zero or more b's, c
    /ab|cd/            # ab or cd
    /a(b|c)d/          # abd or acd
    /a(b|c)+d/         # a, one or more b's or c's, d
    /a[bcd]e/          # abe or ace or ade
    /a[a-zA-Z0-9]b/    # a, letter or digit, b
    /a[^a-zA-Z]b/      # a, any character but a letter, b

Character Class Shortcuts

Perl provides shortcuts for commonly used character classes.

    digit char:        \d == [0-9]
    word char:         \w == [A-Za-z0-9_]
    whitespace char:   \s == [\f\t\n\r ]
    nondigit:          \D == [^\d]
    nonword:           \W == [^\w]
    non-whitespace:    \S == [^\s]
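The character class shortcuts above can be tried out with a small sketch. The helper name classify_char is hypothetical and not part of the original slides, and the test characters are assumed to be plain ASCII. Note that the underscore counts as a word character.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a one-character string using the
# \d, \w and \s shortcuts. \d is tested first because every digit
# is also a word character.
sub classify_char {
    my ($ch) = @_;
    return 'digit'      if $ch =~ /^\d$/;
    return 'word'       if $ch =~ /^\w$/;
    return 'whitespace' if $ch =~ /^\s$/;
    return 'other';
}

for my $ch ('7', 'x', '_', ' ', '-') {
    printf "[%s] => %s\n", $ch, classify_char($ch);
}
```

The ordering of the checks matters here: swapping the first two lines would report digits as word characters.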

General Quantifiers

You can use {min,max} to represent the number of repetitions for an item in a regular expression.

    a{1,3}    # a, aa, or aaa
    a{5,5}    # aaaaa
    a{5}      # aaaaa
    a{2,}     # two or more a's
    a{0,}     # a*
    a{1,}     # a+
    a{0,1}    # a?

Anchors

Perl anchors provide the context in which a pattern is matched.

    /^a/        # matches a if at the beginning of a line
    /a$/        # matches a if at the end of a line
    /^a$/       # matches a if it is a complete line
    /\ba/       # matches a if at the start of a word
    /a\b/       # matches a if at the end of a word
    /\ba\b/     # matches a if a complete word

Remembering Substring Matches

(...) is used not only for grouping, but also for remembering substrings in a pattern match. Note that there are similar features in the sed Unix utility. There are two ways to refer to these substrings. Backreferences can be used inside the pattern to refer to the memory saved earlier in the current pattern. Memory variables can be used outside of the pattern to refer to the memory saved in the last pattern.

Backreferences

A backreference has the form \number. It indicates the string matching the memory reference in the current pattern identified by that number. To number backreferences, just count the left parentheses.

    /(a|b)\1/         # match aa or bb
    /((a|b)c)\1/      # match acac or bcbc
    /((a|b)c)\2/      # match aca or bcb
    /(.)\1/           # match any character but newline that
                      # appears twice in a row
    /(\w+)\s+\1/      # match any word that appears twice in a
                      # row and is separated by one or more
                      # whitespace chars
    /(['"]).*\1/      # match a string enclosed by '...' or "..."
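As a sketch of the repeated-word backreference above, the following wraps it in a small subroutine. The name doubled_word is hypothetical, and the \b anchors are an addition not present in the slide's pattern: without them, "this is" would count as a repeat because the "is" at the end of "this" is followed by " is".

```perl
use strict;
use warnings;

# Hypothetical helper: return the first word that is immediately
# repeated (separated only by whitespace), or undef if there is none.
sub doubled_word {
    my ($text) = @_;
    # \1 must match exactly what the first group (\w+) matched.
    if ($text =~ /\b(\w+)\s+\1\b/) {
        return $1;
    }
    return;
}

my $word = doubled_word("this is is a test");
print "repeated word: $word\n" if defined $word;
```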

Memory Variables

A memory variable has the form $number. It indicates the string in the last pattern matching the memory reference identified by that number.

    # Checks if $_ has a word surrounded by whitespace
    # and prints that word.
    if ( /\s+(\w+)\s+/ ) {
        print $1, "\n";
    }

    # If $_ has a '$' followed by 1 to 3 digits and
    # optionally followed by groups of a comma with
    # 3 digits, then print the price.
    if ( /(\$\d{1,3}(,\d{3})*)/ ) {
        print "The price is $1.\n";
    }

Binding Operator

So far we have only seen checks for patterns in $_. We can check for patterns in arbitrary strings using the =~ and !~ match operators.

General form:

    # check if there is a <pattern> match for <string>
    <string> =~ /<pattern>/

    # check if there is not a <pattern> match for <string>
    <string> !~ /<pattern>/

Example of Using Binding Operators

    # If the user did not specify to exit,
    # then print the line.
    if ($line !~ /\bexit\b/) {
        print $line;
    }

    # If a blank line, then proceed to the
    # next iteration.
    if ($line =~ /^$/) {
        next;
    }

Automatic Match Variables

A pattern only has to match a portion of a string to return a true value. There are some automatic match variables that do not require parentheses to be specified within the pattern.

    $`    # contains the portion of the string before the match
    $&    # contains the portion of the string that matched
    $'    # contains the portion of the string after the match
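One way to see how the three automatic match variables relate is that, after a successful match, concatenating them gives back the original string. The sketch below assumes a hypothetical helper named split_on_match and passes the pattern in as a qr// object. Neither appears in the original slides.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text before, inside and after the
# first match of $pattern in $string, or an empty list on no match.
sub split_on_match {
    my ($string, $pattern) = @_;
    if ($string =~ $pattern) {
        # Copy the match variables right away; a later match
        # would overwrite them.
        my ($pre, $match, $post) = ($`, $&, $');
        return ($pre, $match, $post);
    }
    return;
}

my ($before, $matched, $after) = split_on_match("x = 42;", qr/=/);
print "before:  ", $before,  "\n";
print "matched: ", $matched, "\n";
print "after:   ", $after,   "\n";
```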

Automatic Match Variable Examples

    # establish relationship
    if ( $line =~ / is the parent of / ) {
        print "$' is the child of $`\n";
    }

    # change the assignment operator
    if ( $line =~ /=/ ) {
        print "$`:=$'";
    }

    # find the first word in the line
    if ( $line =~ /\b\w+\b/ ) {
        print "$& is the first word in the line.\n";
    }

Using Other Pattern Delimiters

You can use delimiters other than slashes for patterns, as we saw with the qw shortcut for quoted words in a list. If you do use a different delimiter, then you must precede the first delimiter with an m. The m is optional when using slashes. Note that some delimiters are paired and others are nonpaired.

    m/.../  m{...}  m[...]  m(...)  m!...!  m,...,  m^...^  m#...#

You should probably use slashes unless your pattern contains slashes, as your Perl code will be easier to read.

Example of Using Other Pattern Delimiters

A pattern that contains a '/' can be more readable when written with a delimiter other than '/'.

    # Search for the start of a URL.
    if ($s =~ /http:\/\//)

    # Search for the start of a URL.
    if ($s =~ m^http://^)

Option Modifiers

There is a set of letters that you can place after the last delimiter in a pattern to indicate how the pattern is to be interpreted.

    Modifier   Description
    i          case-insensitive matching
    s          '.' now matches newlines as well
    g          find all occurrences
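The delimiter choice and the option modifiers above can be combined, as in this minimal sketch. The subroutine name looks_like_url and the sample strings are hypothetical. The ^ anchor is an assumption added here so that only strings beginning with the URL scheme match.

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the string begins with "http://"
# in any mix of upper and lower case, 0 otherwise.
sub looks_like_url {
    my ($s) = @_;
    # Braces as delimiters avoid escaping the slashes;
    # the trailing i makes the match case-insensitive.
    return $s =~ m{^http://}i ? 1 : 0;
}

for my $s ('http://example.com', 'HTTP://EXAMPLE.COM', 'ftp://example.com') {
    printf "%-22s %s\n", $s, looks_like_url($s) ? 'yes' : 'no';
}
```

With slashes as delimiters the same pattern would have to be written as /^http:\/\//i, which is harder to read.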

Case-Insensitive Matching

You can make a case-insensitive pattern match by putting 'i' as an option modifier after the last delimiter.

    /\b[uU][nN][iI][xX]\b/    # matches the word
                              # regardless of case
    /\bunix\b/i               # same as above

Matching Any Character

The '.' character in a pattern matches any character but a newline. With the 's' option modifier, the '.' character also matches newlines.

    # Matching a quoted string that could contain
    # newlines.
    /"(.|\n)*"/

    # A more concise pattern.
    /".*"/s

Global Pattern Matching

You can use the 'g' option modifier to find each match of a pattern in a string. Perl remembers the match position where it left off the last time it matched the string and will find the next match. If the string is a variable and it is modified in any way, then the match position is reset to the beginning of the string.

    # print each acronym in a string on a
    # separate line
    while ($s =~ /[A-Z]{2,}/g) {
        print "$&\n";
    }

Interpolating Patterns

Patterns allow variable interpolation just as double-quoted strings do. Thus, patterns can be read in at run time and used to match strings.

    # match dynamic pattern if it occurs at the
    # beginning of a line
    if ($line =~ m/^$var/) {
        print $line;
    }

Note that the Perl program may fail if the regular expression comprising the pattern is invalid.
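One possible way to guard against the failure mentioned above is to compile the run-time pattern inside an eval block, or to quote it with \Q...\E when the text should be matched literally rather than as a pattern. The subroutine names below are hypothetical and the approach is only a sketch, not something the original slides prescribe.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a pattern supplied at run time.
# Returns a compiled regex, or undef if the pattern is invalid.
sub compile_pattern {
    my ($text) = @_;
    # qr// dies on an invalid pattern; eval traps that and yields undef.
    my $re = eval { qr/$text/ };
    return $re;
}

# Hypothetical helper: return 1 if $line starts with $text taken
# literally, so that characters such as '.' lose their special meaning.
sub starts_with_literal {
    my ($line, $text) = @_;
    return $line =~ /^\Q$text\E/ ? 1 : 0;
}

for my $text ('ab+c', '(oops') {
    my $re = compile_pattern($text);
    print "$text: ", (defined $re ? "valid" : "invalid"), "\n";
}
```

Which of the two approaches fits depends on whether the user is expected to type a pattern or plain text.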