Wildcards and Regular Expressions

Similar documents
CSCI 2132 Software Development. Lecture 7: Wildcards and Regular Expressions

22-Sep CSCI 2132 Software Development Lecture 8: Shells, Processes, and Job Control. Faculty of Computer Science, Dalhousie University

CSCI 2132: Software Development. Norbert Zeh. Faculty of Computer Science Dalhousie University. Shell Scripting. Winter 2019

Table of contents. Our goal. Notes. Notes. Notes. Summer June 29, Our goal is to see how we can use Unix as a tool for developing programs

CSCI 2132 Software Development. Lecture 8: Introduction to C

CS 307: UNIX PROGRAMMING ENVIRONMENT FIND COMMAND

EECS2301. Lab 1 Winter 2016

UNIX files searching, and other interrogation techniques

Common File System Commands

Unix Shells and Other Basic Concepts

Lecture 18 Regular Expressions

ITST Searching, Extracting & Archiving Data

Essentials for Scientific Computing: Stream editing with sed and awk

Introduction to UNIX. Introduction. Processes. ps command. The File System. Directory Structure. UNIX is an operating system (OS).

Introduction to UNIX. CSE 2031 Fall November 5, 2012

5/20/2007. Touring Essential Programs

Pattern Matching. An Introduction to File Globs and Regular Expressions

Advanced Linux Commands & Shell Scripting

Systems Programming/ C and UNIX

Shell Control Structures

28-Nov CSCI 2132 Software Development Lecture 33: Shell Scripting. 26 Shell Scripting. Faculty of Computer Science, Dalhousie University

Essentials for Scientific Computing: Bash Shell Scripting Day 3

COMP 4/6262: Programming UNIX

Files and Directories

Shell Programming Overview

More Scripting and Regular Expressions. Todd Kelley CST8207 Todd Kelley 1

Structure of Programming Languages Lecture 3

Pattern Matching. An Introduction to File Globs and Regular Expressions. Adapted from Practical Unix and Programming Hunter College

While Statement Examples. While Statement (35.15) Until Statement (35.15) Until Statement Example

Regular Expressions. with a brief intro to FSM Systems Skills in C and Unix

Introduction p. 1 Who Should Read This Book? p. 1 What You Need to Know Before Reading This Book p. 2 How This Book Is Organized p.

CST Lab #5. Student Name: Student Number: Lab section:

CSE 374 Midterm Exam Sample Solution 2/6/12

Basic Linux (Bash) Commands

CS Unix Tools & Scripting

Grep and Shell Programming

Shell Control Structures

CMPS 12A Introduction to Programming Lab Assignment 7

Module 8 Pipes, Redirection and REGEX

CSCI 2132 Software Development. Lecture 4: Files and Directories

27-Sep CSCI 2132 Software Development Lab 4: Exploring bash and C Compilation. Faculty of Computer Science, Dalhousie University

Dalhousie University CSCI 2132 Software Development Winter 2018 Lab 2, January 25

Today s Lecture. The Unix Shell. Unix Architecture (simplified) Lecture 3: Unix Shell, Pattern Matching, Regular Expressions

Command Interpreters. command-line (e.g. Unix shell) On Unix/Linux, bash has become defacto standard shell.

Shell Script Example. Here is a hello world shell script: $ ls -l -rwxr-xr-x 1 horner 48 Feb 19 11:50 hello* $ cat hello #!/bin/sh

Basics. I think that the later is better.

Scripting Languages Course 1. Diana Trandabăț

Advanced training. Linux components Command shell. LiLux a.s.b.l.

Bashed One Too Many Times. Features of the Bash Shell St. Louis Unix Users Group Jeff Muse, Jan 14, 2009

Introduction to Unix Week 3

Introduction to UNIX. Introduction EECS l UNIX is an operating system (OS). l Our goals:

Linux shell programming for Raspberry Pi Users - 2

Lecture 3 Tonight we dine in shell. Hands-On Unix System Administration DeCal

Regular Expressions. Todd Kelley CST8207 Todd Kelley 1

CHAPTER 2 THE UNIX SHELLS

CSCI 2132 Software Development. Lecture 5: File Permissions

CSCI 2132: Software Development. Norbert Zeh. Faculty of Computer Science Dalhousie University. Introduction to C. Winter 2019

UNIX Essentials Featuring Solaris 10 Op System

find starting-directory -name filename -user username

Unleashing the Shell Hands-On UNIX System Administration DeCal Week 6 28 February 2011

CSCI 4061: Pipes and FIFOs

Awk & Regular Expressions

Regular Expressions. Todd Kelley CST8207 Todd Kelley 1

CSC UNIX System, Spring 2015

CSCI 211 UNIX Lab. Shell Programming. Dr. Jiang Li. Jiang Li, Ph.D. Department of Computer Science

Regular Expressions. Regular expressions match input within a line Regular expressions are very different than shell meta-characters.

Week Overview. Simple filter commands: head, tail, cut, sort, tr, wc grep utility stdin, stdout, stderr Redirection and piping /dev/null file

Chapter 4. Unix Tutorial. Unix Shell

Writing Shell Scripts part 1

Shells. A shell is a command line interpreter that is the interface between the user and the OS. The shell:

5/8/2012. Exploring Utilities Chapter 5

Review of Fundamentals

There are some string operators that can be used in the test statement to perform string comparison.

: the User (owner) for this file (your cruzid, when you do it) Position: directory flag. read Group.

The Shell. EOAS Software Carpentry Workshop. September 20th, 2016

Reading and manipulating files

Unix/Linux Primer. Taras V. Pogorelov and Mike Hallock School of Chemical Sciences, University of Illinois

sottotitolo A.A. 2016/17 Federico Reghenzani, Alessandro Barenghi

Bourne Shell Reference

Answers to AWK problems. Shell-Programming. Future: Using loops to automate tasks. Download and Install: Python (Windows only.) R

Lab 4: Shell Scripting

Cisco IOS Shell. Finding Feature Information. Prerequisites for Cisco IOS.sh. Last Updated: December 14, 2012

Bash Programming. Student Workbook

Lecture 2. Regular Expression Parsing Awk

Regular Expressions. Michael Wrzaczek Dept of Biosciences, Plant Biology Viikki Plant Science Centre (ViPS) University of Helsinki, Finland

CENG 334 Computer Networks. Laboratory I Linux Tutorial

Principles of Bioinformatics. BIO540/STA569/CSI660 Fall 2010

Linux Shell Script. J. K. Mandal

University of Windsor : System Programming Winter Midterm 01-1h20mn. Instructor: Dr. A. Habed

Course Project 2 Regular Expressions

QUESTION BANK ON UNIX & SHELL PROGRAMMING-502 (CORE PAPER-2)

A Brief Introduction to Unix

More on Unix Shell SEEM

Part III. Shell Config. Tobias Neckel: Scripting with Bash and Python Compact Max-Planck, February 16-26,

CSCI 2132 Software Development. Lecture 3: Unix Shells and Other Basic Concepts

Shells and Shell Programming

Introduction: What is Unix?

Lab Working with Linux Command Line

Assignment 1. CSCI 2132: Software Development. Due Febuary 4, 2019

A shell can be used in one of two ways:

Transcription:

CSCI 2132: Software Development Wildcards and Regular Expressions Norbert Zeh Faculty of Computer Science Dalhousie University Winter 2019

Searching Problem: Find all files whose names match a certain pattern Find all files that contain a certain text pattern... Tools: Wildcards (shell) Regular expressions (grep and other tools)

Filename Substitution (Wildcards) Also known as pathname substitution or pathname expansion Used to specify patterns that match multiple pathnames Makes use of wildcards (metacharacters expanded by the shell) Some wildcards:? matches any single character * matches any string [a-z_] matches any character in the range a.. z and _ [!a-z] or [^a-z] matches any character not in the range a.. z

File Substitution Examples [0-9] [a-za-z] [unix] any digit any English letter any of the characters u, n, i, x ls ~/csci2132/lab1 /*.java List Java files in csci2132/lab1 ls *.???? List all files with 4-character extension ls lab[1 9] List all files with names lab1,..., lab9 ls [!0-9]* List all files whose names don t start with a digit cp lab1.bk /*.java lab1/ Copy Java files from the lab1.bk directory to the lab1 directory

More Examples ls ~/csci2132/lab1/h????world.java ls H* ls [!A-Z]* ls */*/*.java ls *.java */*.java echo.* (echo prints out its command line arguments, useful in scripts) cat *.txt > allfiles

Regular Expressions Patterns used to match strings Used in fast and flexible text search tools Name comes from regular sets defined by Stephen Kleene Can be matched using deterministic finite automaton (DFA) Kleene s notation implemented in QED editor to match patterns (author Ken Thompson) Thompson later added this to the UNIX editor ed Led to the tool grep (Name comes from ed command g/re/p: global search for regular expression and print matching lines.)

Reading about Regular Expressions The Unix book Chapter 3, Filtering Files (page 84) Appendix, Regular Expressions (page 665)

Two Types of Regular Expressions Basic regular expressions follow exactly the definition of regular sets by Kleene and can be matched using a DFA. Extended regular expressions add extensions that Make regular expressions more powerful Cannot be matched using a DFA but...... can still be matched efficiently.

Basic Regular Expressions Made up of characters and metacharacters: Metacharacters:. ( ) [ ] *? ^ $ \ Anything that is not a metacharacter matches itself Metacharacters:. [...] (expr) expr* expr? \char ^ $ matches any character matches a character class analogously to wildcards (metacharacters are not special; negation using only ^, not!) matches the expr (grouping) matches any sequence of strings that match expr matches 0 or 1 string that matches expr matches char even if char is a metacharacter matches the beginning of the line matches the end of the line

Examples of Basic Regular Expressions One or more spaces: * Empty line: ^$ Formatted dollar amount: \$[0-9][0-9]*\.[0-9][0-9]

Filters A filter is a program that reads text from stdin, transforms it, and outputs the result to stdout. Often used as elements of pipelines.

grep grep is a filter that reads its input line by line and prints all lines that match a given pattern Input: stdin if no files given on command line Otherwise, the listed files General use: grep [options] <pattern> [files]

grep Options None: Pattern is interpreted as a basic regular expression -E: Pattern is interpreted as an extended regular expression -F: Pattern is interpreted as a fixed string -n: Precede each output line by its line number -i: Ignore case (lowercase/uppercase) when looking for matches -v: Output the lines that do not match -w: Restrict matches to whole words

grep Example Consider the following file prices: Chocolate $1.23 each Candy $.56 each Jacket $278.00 </pre> <pre>$44.00 $44 If we enter $ grep \$[0-9][0-9]*\.[0-9][0-9] prices what is the output? Chocolate $1.23 each Jacket $278.00 </pre> <pre>$44.00

Another grep Example The file /usr/share/dict/linux.words contains a dictionary of English words. What grep command can we use to find all 5-letter words that start with a or b and end with b? $ grep ^[ab]...b$ /usr/share/dict/linux.words What grep command can we use to find all words that start with a or b and end with b? $ grep ^[ab].*b$ /usr/share/dict/linux.words What command do I add to my pipeline to count how many such words there are? $ grep ^[ab].*b$ /usr/share/dict/linux.words wc -l

Extended Regular Expressions Every basic regular expression is an extended regular expression. Additional features: More repetition specifiers: expr?: Match expression expr 0 or 1 time expr+: Match expression expr at least once expr{m}: Match expression expr exactly m times expr{m,n}: Match expression expr between m and n times expr{,n}: Match expression expr up to n times expr{m,}: Match expression expr at least m times Back references: (subexpr1)...(subexpr2)...\2\1: \1 and \2 match copies of the strings that matched the subexpressions subexpr1 and subexpr2 enclosed in parentheses

Examples of Extended Regular Expressions A string that consists of one or two digits followed by at least one letter: [0-9]?[0-9][a-z]+ [0-9]{1,2}[a-z]+ At least one occurrence of Mon, Wed or Fri: (Mon Wed Fri)+ An IP address: [0-9]{1,3}(\.[0-9]{1,3}){3} A string that ends with the same two characters it starts with, in reverse order: (.)(.).*\2\1

Similarities and Differences Between Wildcards and Regular Expressions Most of the time, wildcards are good enough for file matching: All Java files: $ ls *.java $ ls grep *.java$ Some patterns cannot easily be matched using wildcards but can be matched using regular expressions: All files whose names contain exactly one dash: $ ls grep ^[^-]*-[^-]*$