CSCI 2132 Software Development. Lecture 7: Wildcards and Regular Expressions

Similar documents
Wildcards and Regular Expressions

22-Sep CSCI 2132 Software Development Lecture 8: Shells, Processes, and Job Control. Faculty of Computer Science, Dalhousie University

CSCI 2132 Software Development. Lecture 4: Files and Directories

18-Sep CSCI 2132 Software Development Lecture 6: Links and Inodes. Faculty of Computer Science, Dalhousie University. Lecture 6 p.

CSCI 2132 Software Development. Lecture 2: Introduction to UNIX and Unix-like Operating Systems

CSCI 2132 Software Development. Lecture 3: Unix Shells and Other Basic Concepts

CSCI 2132 Software Development. Lecture 5: File Permissions

CSCI 2132 Software Development. Lecture 8: Introduction to C

Table of contents. Our goal. Notes. Notes. Notes. Summer June 29, Our goal is to see how we can use Unix as a tool for developing programs

Midterm 1 1 /8 2 /9 3 /9 4 /12 5 /10. Faculty of Computer Science. Term: Fall 2018 (Sep4-Dec4) Student ID Information. Grade Table Question Score

27-Sep CSCI 2132 Software Development Lab 4: Exploring bash and C Compilation. Faculty of Computer Science, Dalhousie University

28-Nov CSCI 2132 Software Development Lecture 33: Shell Scripting. 26 Shell Scripting. Faculty of Computer Science, Dalhousie University

Module 8 Pipes, Redirection and REGEX

ITST Searching, Extracting & Archiving Data

EECS2301. Lab 1 Winter 2016

CSCI 4152/6509 Natural Language Processing Lecture 6: Regular Expressions; Text Processing in Perl

Dalhousie University CSCI 2132 Software Development Winter 2018 Lab 2, January 25

Lecture 18 Regular Expressions

Common File System Commands

CSCI 2132: Software Development. Norbert Zeh. Faculty of Computer Science Dalhousie University. Shell Scripting. Winter 2019

Today s Lecture. The Unix Shell. Unix Architecture (simplified) Lecture 3: Unix Shell, Pattern Matching, Regular Expressions

Systems Programming/ C and UNIX

CS 307: UNIX PROGRAMMING ENVIRONMENT FIND COMMAND

5/20/2007. Touring Essential Programs

Files and Directories

Pattern Matching. An Introduction to File Globs and Regular Expressions

Course Project 2 Regular Expressions

Lecture 3 Tonight we dine in shell. Hands-On Unix System Administration DeCal

find starting-directory -name filename -user username

CSCI 4061: Pipes and FIFOs

Welcome to Linux. Lecture 1.1

CS 301. Lecture 05 Applications of Regular Languages. Stephen Checkoway. January 31, 2018

Writing Shell Scripts part 1

COMP 4/6262: Programming UNIX

Awk & Regular Expressions

Linux shell programming for Raspberry Pi Users - 2

Lexical Analysis. Prof. James L. Frankel Harvard University

CST Lab #5. Student Name: Student Number: Lab section:

Structure of Programming Languages Lecture 3

UNIX files searching, and other interrogation techniques

System Administration

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata

CSE 303 Lecture 2. Introduction to bash shell. read Linux Pocket Guide pp , 58-59, 60, 65-70, 71-72, 77-80

27-Sep CSCI 2132 Software Development Lecture 10: Formatted Input and Output. Faculty of Computer Science, Dalhousie University. Lecture 10 p.

Appendix B WORKSHOP. SYS-ED/ Computer Education Techniques, Inc.

Essentials for Scientific Computing: Stream editing with sed and awk

Part III. Shell Config. Tobias Neckel: Scripting with Bash and Python Compact Max-Planck, February 16-26,

A Case Study on Grammatical-based Representation for Regular Expression Evolution

CSE 390a Lecture 2. Exploring Shell Commands, Streams, and Redirection

Advanced training. Linux components Command shell. LiLux a.s.b.l.

Essential Unix and Linux! Perl for Bioinformatics, ! F. Pineda

CHAPTER 2 THE UNIX SHELLS

QUESTION BANK ON UNIX & SHELL PROGRAMMING-502 (CORE PAPER-2)

CSC UNIX System, Spring 2015

Pattern Matching. An Introduction to File Globs and Regular Expressions. Adapted from Practical Unix and Programming Hunter College

Assignment 1. CSCI 2132: Software Development. Due Febuary 4, 2019

Essentials for Scientific Computing: Bash Shell Scripting Day 3

Unleashing the Shell Hands-On UNIX System Administration DeCal Week 6 28 February 2011

Unix Shells and Other Basic Concepts

Advanced Linux Commands & Shell Scripting

Getting to grips with Unix and the Linux family

Introduction to UNIX. Introduction. Processes. ps command. The File System. Directory Structure. UNIX is an operating system (OS).

Introduction to UNIX. CSE 2031 Fall November 5, 2012

Principles of Bioinformatics. BIO540/STA569/CSI660 Fall 2010

CSE 390a Lecture 2. Exploring Shell Commands, Streams, Redirection, and Processes

Regular Expressions. Regular expressions match input within a line Regular expressions are very different than shell meta-characters.

Lecture 4: Syntax Specification

Regular Expressions. Regular Expression Syntax in Python. Achtung!

COMP Logic for Computer Scientists. Lecture 25

CPSC 217 Midterm (Python 3 version)

Command-line interpreters

When talking about how to launch commands and other things that is to be typed into the terminal, the following syntax is used:

Non-deterministic Finite Automata (NFA)

CSCI2467: Systems Programming Concepts

Regular Expressions. with a brief intro to FSM Systems Skills in C and Unix

CS395T: Introduction to Scientific and Technical Computing

Week 5 Lesson 5 02/28/18

Compiler Construction I

CMPS 12A Introduction to Programming Lab Assignment 7

Introduction p. 1 Who Should Read This Book? p. 1 What You Need to Know Before Reading This Book p. 2 How This Book Is Organized p.

Unix as a Platform Exercises + Solutions. Course Code: OS 01 UNXPLAT

CS246 Spring14 Programming Paradigm Notes on Linux

University of Windsor : System Programming Winter Midterm 01-1h20mn. Instructor: Dr. A. Habed

Regular Expressions. Computer Science and Engineering College of Engineering The Ohio State University. Lecture 9

Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions

Shells. A shell is a command line interpreter that is the interface between the user and the OS. The shell:

Regex, Sed, Awk. Arindam Fadikar. December 12, 2017

More on Unix Shell SEEM

sottotitolo A.A. 2016/17 Federico Reghenzani, Alessandro Barenghi

Lecture 2. Regular Expression Parsing Awk

Introduction: What is Unix?

Shell scripting and system variables. HORT Lecture 5 Instructor: Kranthi Varala

ECS 120 Lesson 7 Regular Expressions, Pt. 1

Describing Languages with Regular Expressions

Chapter 4. Unix Tutorial. Unix Shell

Lab Working with Linux Command Line

Unix/Linux: History and Philosophy

Basics. I think that the later is better.

Answers to AWK problems. Shell-Programming. Future: Using loops to automate tasks. Download and Install: Python (Windows only.) R

We first learn one useful option of gcc. Copy the following C source file to your

Transcription:

CSCI 2132 Software Development Lecture 7: Wildcards and Regular Expressions Instructor: Vlado Keselj Faculty of Computer Science Dalhousie University 20-Sep-2017 (7) CSCI 2132 1

Previous Lecture Pipes Inodes Hard links Soft links 20-Sep-2017 (7) CSCI 2132 2

Wildcards and Regular Expressions Similar patterns for string matching... but there are differences Wildcards or Filename Substitution Used in command line, understood by shell Regular Expressions Used with tools such as grep 20-Sep-2017 (7) CSCI 2132 3

Filename Substitution (Wildcards) Used to specify multiple filenames Makes use of wildcards ; i.e., metacharacters expanded by the shell Some wildcard types:?: matches any single character *: matches any string, including empty string [...]: matches any single character in the set [!...]: any character except characters from the set we can use ranges with - in brackets 20-Sep-2017 (7) CSCI 2132 4

[0-9] 20-Sep-2017 (7) CSCI 2132 5

20-Sep-2017 (7) CSCI 2132 6

[a-za-z] 20-Sep-2017 (7) CSCI 2132 7

[a-za-z]: any English alphabet character 20-Sep-2017 (7) CSCI 2132 8

[a-za-z]: any English alphabet character [unix] 20-Sep-2017 (7) CSCI 2132 9

[a-za-z]: any English alphabet character [unix]: matches either u, n, i, or x 20-Sep-2017 (7) CSCI 2132 10

[a-za-z]: any English alphabet character [unix]: matches either u, n, i, or x ls /csci2132/lab1/*.java 20-Sep-2017 (7) CSCI 2132 11

[a-za-z]: any English alphabet character [unix]: matches either u, n, i, or x ls /csci2132/lab1/*.java list Java files 20-Sep-2017 (7) CSCI 2132 12

[a-za-z]: any English alphabet character [unix]: matches either u, n, i, or x ls /csci2132/lab1/*.java list Java files ls *.???? 20-Sep-2017 (7) CSCI 2132 13

[a-za-z]: any English alphabet character [unix]: matches either u, n, i, or x ls /csci2132/lab1/*.java list Java files ls *.???? list all files with 4-character extension 20-Sep-2017 (7) CSCI 2132 14

[a-za-z]: any English alphabet character [unix]: matches either u, n, i, or x ls /csci2132/lab1/*.java list Java files ls *.???? list all files with 4-character extension ls lab[1-9] 20-Sep-2017 (7) CSCI 2132 15

[a-za-z]: any English alphabet character [unix]: matches either u, n, i, or x ls /csci2132/lab1/*.java list Java files ls *.???? list all files with 4-character extension ls lab[1-9] list all files with the name consisting of word lab and a digit from 1 to 9 20-Sep-2017 (7) CSCI 2132 16

[a-za-z]: any English alphabet character [unix]: matches either u, n, i, or x ls /csci2132/lab1/*.java list Java files ls *.???? list all files with 4-character extension ls lab[1-9] list all files with the name consisting of word lab and a digit from 1 to 9 ls [!0-9]* 20-Sep-2017 (7) CSCI 2132 17

[a-za-z]: any English alphabet character [unix]: matches either u, n, i, or x ls /csci2132/lab1/*.java list Java files ls *.???? list all files with 4-character extension ls lab[1-9] list all files with the name consisting of word lab and a digit from 1 to 9 ls [!0-9]* list all files which name does not start with a digit 20-Sep-2017 (7) CSCI 2132 18

[a-za-z]: any English alphabet character [unix]: matches either u, n, i, or x ls /csci2132/lab1/*.java list Java files ls *.???? list all files with 4-character extension ls lab[1-9] list all files with the name consisting of word lab and a digit from 1 to 9 ls [!0-9]* list all files which name does not start with a digit cp lab1.bk/*.java lab1/ 20-Sep-2017 (7) CSCI 2132 19

[a-za-z]: any English alphabet character [unix]: matches either u, n, i, or x ls /csci2132/lab1/*.java list Java files ls *.???? list all files with 4-character extension ls lab[1-9] list all files with the name consisting of word lab and a digit from 1 to 9 ls [!0-9]* list all files which name does not start with a digit cp lab1.bk/*.java lab1/ copy Java files from one directory to another 20-Sep-2017 (7) CSCI 2132 20

More Examples ls /csci2132/lab1/*.java ls /csci2132/lab1/h????world.java ls H* ls [!A-Z]* ls */*/*.java ls *.java */*.java echo.* command echo prints out command line arguments cat *.txt > allfiles 20-Sep-2017 (7) CSCI 2132 21

Regular Expressions Regular Expressions are patterns used to match strings, and thus used in fast and flexible text search The name comes from Regular Sets defined by the mathematician Stephen Kleene Implemented as DFA (Deterministic Finite Automata) or NFA (Non-deterministic Finite Automata) Kleene s notation implemented by Ken Thompson into the editor QED to match patterns Thompson later added this to the Unix editor ed Eventually led to the command grep, coming from ed command g/re/p (Global search for Regular Expression and Print matching lines) 20-Sep-2017 (7) CSCI 2132 22

Reading about Regular Expressions The Unix book: Chapter 3, Filtering Files (p.84) Appendix: Regular Expressions (p.665) Regular expressions Patterns used for searching and replacing text Used in many contexts, but we will focus on the grep command There are two kinds of regular expressions: basic regular expressions and extended regular expressions 20-Sep-2017 (7) CSCI 2132 23

Basic Regular Expressions Using metacharacters:.: Matches any single character [...]: Matches any character between brackets, - used to specify range; most other metacharacters loose their meta-meaning between brackets [ˆ...]: Matches any character except one of the characters between brackets *: 0 or more occurrences of the preceding character ˆ: Matches the beginning of a line $: Matches the end of a line \: Inhibits the meaning of any metacharacter 20-Sep-2017 (7) CSCI 2132 24

BRE Examples BRE = Basic Regular Expressions One or more spaces: spacespace* (replace space by a space character): * Empty line: ˆ$ Formatted dollar amount: \$[0-9][0-9]*\.[0-9][0-9] 20-Sep-2017 (7) CSCI 2132 25

Filters, grep command Filter is a program that is mostly used to read stdin, process data, and write to stdout Often used as elements of pipelines One such program is grep grep reads a file or stdin and outputs lines matching a regular expression grep syntax grep [options] BRE [file(s)] 20-Sep-2017 (7) CSCI 2132 26

Chocolate $1.23 each Candy $.56 each Jacket $278.00</pre> <pre>$44.00 $44 Example If we enter the following command grep \$[0-9][0-9]*\.[0-9][0-9] price The output will be the following three lines: Chocolate $1.23 each Jacket $278.00 $44.00 20-Sep-2017 (7) CSCI 2132 27

One more grep example We will use the dictionary file: /usr/share/dict/linux.words Write a grep command to find 5-letter words that start with a or b and end with b Write a grep command to find all words starting with a or b and ending with b How many are there? 20-Sep-2017 (7) CSCI 2132 28

Similarity between Wildcards and Regular Expressions We can get similar results with wildcards and regular expressions; e.g.: ls *.java ls grep \.java$ List of all files in /bin, whose names contain exactly one minus sign (-): ls /bin grep ˆ[ˆ-]*-[ˆ-]*$ 20-Sep-2017 (7) CSCI 2132 29