Regular Expressions 1

Similar documents
Regular Expressions. Todd Kelley CST8207 Todd Kelley 1

More Scripting and Regular Expressions. Todd Kelley CST8207 Todd Kelley 1

Regular Expressions. Todd Kelley CST8207 Todd Kelley 1

Lecture 18 Regular Expressions

Regular Expressions. Regular expressions match input within a line Regular expressions are very different than shell meta-characters.

Regular Expressions. Regular Expression Syntax in Python. Achtung!

Regex, Sed, Awk. Arindam Fadikar. December 12, 2017

Pattern Matching. An Introduction to File Globs and Regular Expressions

Pattern Matching. An Introduction to File Globs and Regular Expressions. Adapted from Practical Unix and Programming Hunter College

Regular Expressions. with a brief intro to FSM Systems Skills in C and Unix

Perl Regular Expressions. Perl Patterns. Character Class Shortcuts. Examples of Perl Patterns

Regular Expressions Explained

Formulas and Functions

CS321 Languages and Compiler Design I. Winter 2012 Lecture 4

Essentials for Scientific Computing: Stream editing with sed and awk

9.2 Linux Essentials Exam Objectives

Motivation (Scenarios) Topic 4: Grep, Find & Sed. Displaying File Names. grep

Common File System Commands

CS Unix Tools & Scripting

System & Network Engineering. Regular Expressions ESA 2008/2009. Mark v/d Zwaag, Eelco Schatborn 22 september 2008

ITST Searching, Extracting & Archiving Data

Paolo Santinelli Sistemi e Reti. Regular expressions. Regular expressions aim to facilitate the solution of text manipulation problems

While Statement Examples. While Statement (35.15) Until Statement (35.15) Until Statement Example

Understanding Regular Expressions, Special Characters, and Patterns

Today s Lecture. The Unix Shell. Unix Architecture (simplified) Lecture 3: Unix Shell, Pattern Matching, Regular Expressions

Filtering Service

Systems Programming/ C and UNIX

Lecture 3 Tonight we dine in shell. Hands-On Unix System Administration DeCal

Lec-5-HW-1, TM basics

Digital Humanities. Tutorial Regular Expressions. March 10, 2014

Topic 4: Grep, Find & Sed

Certification. String Processing with Regular Expressions

Structure of Programming Languages Lecture 3

Basic Linux (Bash) Commands

UNIX files searching, and other interrogation techniques

Computer Systems and Architecture

Haskell: Lists. CS F331 Programming Languages CSCE A331 Programming Language Concepts Lecture Slides Friday, February 24, Glenn G.

Regexs with DFA and Parse Trees. CS230 Tutorial 11

Table of contents. Our goal. Notes. Notes. Notes. Summer June 29, Our goal is to see how we can use Unix as a tool for developing programs

CMSC 132: Object-Oriented Programming II

Computer Systems and Architecture

CS 301. Lecture 05 Applications of Regular Languages. Stephen Checkoway. January 31, 2018

Dr. D.M. Akbar Hussain

Lexical Analysis. Sukree Sinthupinyo July Chulalongkorn University

Describing Languages with Regular Expressions

More Scripting and Regular Expressions. Todd Kelley CST8207 Todd Kelley 1

CS214-AdvancedUNIX. Lecture 2 Basic commands and regular expressions. Ymir Vigfusson. CS214 p.1

Regular Expressions. Michael Wrzaczek Dept of Biosciences, Plant Biology Viikki Plant Science Centre (ViPS) University of Helsinki, Finland

Essential Linux Shell Commands

5/8/2012. Exploring Utilities Chapter 5

Configuring the RADIUS Listener LEG

What is a shell? The shell is interface for user to computer OS.

Statistics 202A - vi Tutorial

Module 8 Pipes, Redirection and REGEX

Regular Expressions & Automata

sottotitolo A.A. 2016/17 Federico Reghenzani, Alessandro Barenghi

ECS 120 Lesson 7 Regular Expressions, Pt. 1

Expect Basics Tutorial 3 ECE453 & CS447 Software Testing, Quality Assurance and Maintenance

Bash Script. CIRC Summer School 2015 Baowei Liu

CMSC 330: Organization of Programming Languages. Ruby Regular Expressions

UNIX / LINUX - REGULAR EXPRESSIONS WITH SED

CSE 303 Lecture 7. Regular expressions, egrep, and sed. read Linux Pocket Guide pp , 73-74, 81

Grep and Shell Programming

psed [-an] script [file...] psed [-an] [-e script] [-f script-file] [file...]

Unleashing the Shell Hands-On UNIX System Administration DeCal Week 6 28 February 2011

Scripting. More Shell Scripts Loops. Adapted from Practical Unix and Programming Hunter College

Getting to grips with Unix and the Linux family

Notes for Comp 454 Week 2

A lexical analyzer generator for Standard ML. Version 1.6.0, October 1994

Answers to AWK problems. Shell-Programming. Future: Using loops to automate tasks. Download and Install: Python (Windows only.) R

CHAPTER TWO LANGUAGES. Dr Zalmiyah Zakaria

Regular Expressions Overview Suppose you needed to find a specific IPv4 address in a bunch of files? This is easy to do; you just specify the IP

Regular expressions: Text editing and Advanced manipulation. HORT Lecture 4 Instructor: Kranthi Varala

Shell Scripting. Todd Kelley CST8207 Todd Kelley 1

CSE 390a Lecture 7. Regular expressions, egrep, and sed

Welcome! CSC445 Models Of Computation Dr. Lutz Hamel Tyler 251

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

Files (review) and Regular Expressions. Todd Kelley CST8207 Todd Kelley 1

Introduction to Unix

Getting Started Values, Expressions, and Statements CS GMU

Shells & Shell Programming (Part B)

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }

No variable declara<ons. sum = 0! i = 1! while i <= 10 do! sum += i*i! i = i + 1! end! print "Sum of squares is #{sum}\n"!

Server-side Web Development (I3302) Semester: 1 Academic Year: 2017/2018 Credits: 4 (50 hours) Dr Antoun Yaacoub

FILTERS USING REGULAR EXPRESSIONS grep and sed

Regular Expressions. Regular Expressions. Regular Languages. Specifying Languages. Regular Expressions. Kleene Star Operation

Languages and Compilers

Chapter Seven: Regular Expressions

Class 1 Supplement: Pattern matching, and dealing with files

BNF, EBNF Regular Expressions. Programming Languages,

Wildcards and Regular Expressions

Vi & Shell Scripting

d. 1 e. test: $a: integer expression expected

Review of Fundamentals

INTRODUCTION TO SHELL SCRIPTING ITPART 2

Introduction to Unix

Unix as a Platform Exercises + Solutions. Course Code: OS 01 UNXPLAT

Filenames, globbing, greping, and regexp

CSE 154 LECTURE 11: REGULAR EXPRESSIONS

Chapter 4: Regular Expressions

Transcription:

Regular Expressions 1

Basic Regular Expression Examples Extended Regular Expressions Extended Regular Expression Examples 2

phone number 3 digits, dash, 4 digits [[:digit:]][[:digit:]][[:digit:]]-[[:digit:]][[:digit:]][[:digit:]][[:digit:]] postal code A9A 9A9 [[:upper:]][[:digit]][[:upper:]] [[:digit:]][[:upper:]][[:digit:]] email address (simplified, lame) someone@somewhere.com domain name cannot begin with digit [[:alnum:]_-][[:alnum:]_-]*@[[:alpha:]][[:alnum:]-]*\.[[:alpha:]][[:alpha:]]* 3

any line containing only alphabetic characters (at least one), and no digits or anything else ^[[:alpha:]][[:alpha:]]*$ any line that begins with digits (at least one) In other words, lines that begin with a digit ^[[:digit:]] ^[[:digit:]].*$ would match the exact same lines in grep any line that contains at least one character of any kind. ^..*$ would match the exact same lines in grep 4

5

The generic syntax: [#1]operation[#2]target Examples: Ref: http://www.tutorialspoint.com/unix/unix-vi-editor.htm 6

To do search and replace in vi, can search for a regex, then make change, then repeat search, repeat command: in vi (and sed, awk, more, less) we delimit regular expressions with / capitalize sentences any lower case character following by a period and two spaces should be replaced by a capital search for /\. [[:lower:]]/ then type 4~ then type n. as many times as necessary n moves to the next occurrence, and. repeats the capitalization command 7

uncapitalize in middle of words any upper case character following a lower case character should be made lower case type /[[:lower:]][[:upper:]] notice the second / is optional and not present here then type l to move one to the right type ~ to change the capitalization type nl. as necessary the l is needed because vi will position the cursor on the first character of the match, which in this case is a character that doesn't change. 8

Now three kinds of matching 1. Filename globbing used on shell command line, and shell matches these patterns to filenames that exist used with the find command (quote from the shell) 2. Basic Regular Expressions, used with vi (use delimiter) more (use delimiter) sed (use delimiter) awk (use delimiter) grep (no delimiter, but we quote from the shell) 3. Extended Regular Expressions less (use delimiter) grep E (no delimiter, but quote from the shell) perl regular expressions (not in this course) 9

ls a*.txt # this is filename globbing The shell expands the glob before the ls command runs The shell matches existing filenames in current directory beginning with 'a', ending in '.txt' grep 'aa*' foo.txt # regular expression Grep matches strings in foo.txt beginning with 'a' followed by zero or more 'a's the single quotes protect the '*' from shell filename globbing Be careful with quoting: grep aa* foo.txt # no single quotes, bad idea shell will try to do filename globbing on aa*, changing it into existing filenames that begin with aa before grep runs: we don't want that. 10

All of what we've officially seen so far, except that one use of parenthesis many slides back, are the Basic features of regular expressions Now we unveil the Extended features of regular expressions In the old days, Basic Regex implementations didn't have these features Now, all the Basic Regex implementations we'll encounter have these features The difference between Basic and Extended Regular expressions is whether you use a backslash to make use of these Extended features 11

Basic Extended Repetition Meaning * * zero or more times \?? zero or one times \+ + one or more times \{n\} {n} n times, n is an integer \{n,\} {n,} n or more times, n is an integer \{n,m\} {n,m} at least n, at most m times, n and m are integers 12

can do this with Basic regex in grep with e example: grep e 'abc' e 'def' foo.txt matches lines with abc or def in foo.txt \ is an infix "or" operator a\ b means a or b but not both aa*\ bb* means one or more a's, or one or more b's for extended regex, leave out the \, as in a b 13

repetition is tightest (think exponentiation) xx* means x followed by x repeated, not xx repeated concatenation is next tightest (think multiplication) aa*\ bb* means aa* or bb* alternation is the loosest or lowest precedence (think addition) Precedence can be overridden with parenthesis to do grouping 14

\( and \) can be used to group regular expressions, and override the precedence rules For Extended Regular Expressions, leave out the \, as in ( and ) abb* means ab followed by zero or more b's a\(bb\)*c means a followed by zero or more pairs of b's followed by c abbb\ cd would mean abbb or cd a\(bbb\ c\)d would mean a, followed by bbb or c, followed by d 15

Operation Regex Algebra grouping () or \(\) parentheses brackets repetition * or? or + or {n} or {n,} or {n,m} * or \? or \+ or \{n\} or \{n,\} or \{n,m\} exponentiation concatenation ab multiplication alternation or \ addition 16

To remove the special meaning of a meta character, put a backslash in front of it \* matches a literal * \. matches a literal. \\ matches a literal \ \$ matches a literal $ \^ matches a literal ^ For the extended functionality, backslash turns it on for basic regex backslash turns it off for extended regex 17

Another extended regular expression feature When you use grouping, you can refer to the n'th group with \n \(..*\)\1 means any sequence of one or more characters twice in a row The \1 in this example means whatever the thing between the first set of \( \) matched Example (basic regex): \(aa*\)b\1 means any number of a's followed by b followed by exactly the same number of a's 18

phone number 3 digits, optional dash, 4 digits we couldn't do optional single dash in basic regex [[:digit:]]{3}-?[[:digit:]]{4} postal code A9A 9A9 Same as basic regex [[:upper:]][[:digit]][[:upper:]] [[:digit:]][[:upper:]][[:digit:]] email address (simplified, lame) someone@somewhere.com domain name cannot begin with digit or dash [[:alnum:]_-]+@([[:alpha:]][[:alnum:]-]+\.)+[[:alpha:]]+ 19

20

21