CS Unix Tools & Scripting

Similar documents
CS Advanced Unix Tools & Scripting

CS Unix Tools. Fall 2010 Lecture 5. Hussam Abu-Libdeh based on slides by David Slater. September 17, 2010

CS Unix Tools & Scripting Lecture 7 Working with Stream

CS Unix Tools & Scripting

Regular Expressions. Michael Wrzaczek Dept of Biosciences, Plant Biology Viikki Plant Science Centre (ViPS) University of Helsinki, Finland

Regex, Sed, Awk. Arindam Fadikar. December 12, 2017

Today s Lecture. The Unix Shell. Unix Architecture (simplified) Lecture 3: Unix Shell, Pattern Matching, Regular Expressions

Regular Expressions. Todd Kelley CST8207 Todd Kelley 1

Regular Expressions Explained

Regular Expressions. Computer Science and Engineering College of Engineering The Ohio State University. Lecture 9

Paolo Santinelli Sistemi e Reti. Regular expressions. Regular expressions aim to facilitate the solution of text manipulation problems

CS160A EXERCISES-FILTERS2 Boyd

Regular Expressions. Regular Expression Syntax in Python. Achtung!

Pattern Matching. An Introduction to File Globs and Regular Expressions

Pattern Matching. An Introduction to File Globs and Regular Expressions. Adapted from Practical Unix and Programming Hunter College

More Scripting and Regular Expressions. Todd Kelley CST8207 Todd Kelley 1

Wildcards and Regular Expressions

CS Advanced Unix Tools & Scripting

CS Unix Tools & Scripting

Regular Expressions 1

CS Unix Tools & Scripting

CS321 Languages and Compiler Design I. Winter 2012 Lecture 4

Lecture 18 Regular Expressions

Regular Expressions. Regular expressions match input within a line Regular expressions are very different than shell meta-characters.

Perl Regular Expressions. Perl Patterns. Character Class Shortcuts. Examples of Perl Patterns

UNIX files searching, and other interrogation techniques

Lab 4: Bash Scripting

Part III. Shell Config. Tobias Neckel: Scripting with Bash and Python Compact Max-Planck, February 16-26,

Lecture 05 I/O statements Printf, Scanf Simple statements, Compound statements

Describing Languages with Regular Expressions

CS 2112 Lab: Regular Expressions

UNIX / LINUX - REGULAR EXPRESSIONS WITH SED

Table of contents. Our goal. Notes. Notes. Notes. Summer June 29, Our goal is to see how we can use Unix as a tool for developing programs

Scripting Languages Course 1. Diana Trandabăț

Lab 4: Shell Scripting

Digital Humanities. Tutorial Regular Expressions. March 10, 2014

CSE 390a Lecture 2. Exploring Shell Commands, Streams, and Redirection

Regular Expressions Overview Suppose you needed to find a specific IPv4 address in a bunch of files? This is easy to do; you just specify the IP

Structure of Programming Languages Lecture 3

CS 307: UNIX PROGRAMMING ENVIRONMENT FIND COMMAND

Shell Programming Overview

Regular Expressions. Steve Renals (based on original notes by Ewan Klein) ICL 12 October Outline Overview of REs REs in Python

ITST Searching, Extracting & Archiving Data

Regular Expressions. Regular expressions are a powerful search-and-replace technique that is widely used in other environments (such as Unix and Perl)

STREAM EDITOR - REGULAR EXPRESSIONS

Computer Systems and Architecture

Configuring the RADIUS Listener LEG

Shell Programming (bash)

PESIT Bangalore South Campus

FILTERS USING REGULAR EXPRESSIONS grep and sed

CS Unix Tools. Lecture 3 Making Bash Work For You Fall Hussam Abu-Libdeh based on slides by David Slater. September 13, 2010

Bashed One Too Many Times. Features of the Bash Shell St. Louis Unix Users Group Jeff Muse, Jan 14, 2009

Perl and Python ESA 2007/2008. Eelco Schatborn 27 September 2007

CS 301. Lecture 05 Applications of Regular Languages. Stephen Checkoway. January 31, 2018

Computer Systems and Architecture

Supplemental Materials: Grammars, Parsing, and Expressions. Topics. Grammars 10/11/2017. CS2: Data Structures and Algorithms Colorado State University

9.2 Linux Essentials Exam Objectives

Introduction to Regular Expressions Version 1.3. Tom Sgouros

CSCI 2132 Software Development. Lecture 7: Wildcards and Regular Expressions

Computing Unit 3: Data Types

CSCI 2132 Software Development. Lecture 8: Introduction to C

MTAT Agile Software Development in the Cloud. Lecture 3: Ruby (cont.)

Basics. I think that the later is better.

Shells and Shell Programming

MTAT Agile Software Development in the Cloud Lecture 4: More on Ruby

Module 8 Pipes, Redirection and REGEX

applied regex implementing REs using finite state automata using REs to find patterns Informatics 1 School of Informatics, University of Edinburgh 1

Server-side Web Development (I3302) Semester: 1 Academic Year: 2017/2018 Credits: 4 (50 hours) Dr Antoun Yaacoub

Shells and Shell Programming

Bioinformatics Programming. EE, NCKU Tien-Hao Chang (Darby Chang)

Chapter 4. Unix Tutorial. Unix Shell

CSE 374: Programming Concepts and Tools. Eric Mullen Spring 2017 Lecture 4: More Shell Scripts

Python allows variables to hold string values, just like any other type (Boolean, int, float). So, the following assignment statements are valid:

While Statement Examples. While Statement (35.15) Until Statement (35.15) Until Statement Example

CS 3410 Intro to Unix, shell commands, etc... (slides from Hussam Abu-Libdeh and David Slater)

Understanding Regular Expressions, Special Characters, and Patterns

Functional Programming in Haskell Prof. Madhavan Mukund and S. P. Suresh Chennai Mathematical Institute

CSE 390a Lecture 7. Regular expressions, egrep, and sed

Answers to AWK problems. Shell-Programming. Future: Using loops to automate tasks. Download and Install: Python (Windows only.) R

RegExpr:Review & Wrapup; Lecture 13b Larry Ruzzo

CSE 374 Programming Concepts & Tools. Laura Campbell (thanks to Hal Perkins) Winter 2014 Lecture 6 sed, command-line tools wrapup

CSE 303 Lecture 2. Introduction to bash shell. read Linux Pocket Guide pp , 58-59, 60, 65-70, 71-72, 77-80

CSE 374 Programming Concepts & Tools. Brandon Myers Winter 2015 Lecture 4 Shell Variables, More Shell Scripts (Thanks to Hal Perkins)

CSE 303 Lecture 7. Regular expressions, egrep, and sed. read Linux Pocket Guide pp , 73-74, 81

sottotitolo A.A. 2016/17 Federico Reghenzani, Alessandro Barenghi

Midterm 1 1 /8 2 /9 3 /9 4 /12 5 /10. Faculty of Computer Science. Term: Fall 2018 (Sep4-Dec4) Student ID Information. Grade Table Question Score

Linux Text Utilities 101 for S/390 Wizards SHARE Session 9220/5522


Shells & Shell Programming (Part B)

UNIX II:grep, awk, sed. October 30, 2017

A Brief Introduction to the Linux Shell for Data Science

Lecture 3 Tonight we dine in shell. Hands-On Unix System Administration DeCal

Regular Expressions. Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein

Windshield. Language Reference Manual. Columbia University COMS W4115 Programming Languages and Translators Spring Prof. Stephen A.

Regular Expressions. Perl PCRE POSIX.NET Python Java

Overview of the UNIX File System. Navigating and Viewing Directories

Program Development Tools. Lexical Analyzers. Lexical Analysis Terms. Attributes for Tokens

CSE 390a Lecture 2. Exploring Shell Commands, Streams, Redirection, and Processes

CS 115 Lecture 8. Selection: the if statement. Neil Moore

Goals for This Lecture:

Transcription:

Cornell University, Spring 2014 1 February 7, 2014 1 Slides evolved from previous versions by Hussam Abu-Libdeh and David Slater

Regular Expression A new level of mastery over your data. Pattern matching with regular expressions is more sophisticated and more powerful than shell expansion. A regular expression is a set of strings that match the expression.

Regular Expression Standard features in a wide range of languages (Perl, Python, Ruby, Java, VB.NET and C#, PHP, and MySQL) Many Unix tool takes in a regular expression as its input, e.g., grep

Regular Expressions are used all over the place. Where else can we use Regular Expressions? Example: less and vim When we are reading something using less/vim, if we hit / and type a regular expression, less will highlight everywhere it occurs press n to move to the next match press N to move to the previous match

Regular Expression Regular Expressions use different syntax than shell expansion We enclose them in single quotes to distinguish them from shell expansion.

Regular Expression Rules Some RegExp patterns perform the same tasks as our earlier wildcards Single Characters Wild card:? RegExp:. Matches any single character Wild card: [a-z] RegExp: [a-z] Matches one of the indicated characters Don t separate multiple characters with commas in RegExp form (e.x. [a,b,q-v] becomes [abq-v]). Example: grep t.a - prints lines with things like tea, taa, and steap

Warning! Like shell wildcards, RegExps are case-sensitive. What if you want to match any letter, regardless of case? What will [a-z] match? Character Sorting Different types of programs sort characters differently. In the C language, characters A-Z are assigned numbers from 65-90, while a-z are 97-122. Thus, the range [a-z] would equate to [65-122]. There are non-alphabet characters within [a-z]. To specify all letters safely we would use [a-za-z]. Note: not everything treats sorting like C. For example, a dictionary program might sort its characters aabbcc...

Fortunately We Can Get Around This Easily Fortunately there are shortcuts for many ranges of characters we typically run into: POSIX character classes [:alnum:] - alphanumeric characters Example: [:alpha:] [:digit:] [:punct:] [:lower:] [:upper:] [:space:] - alphabetic characters - digits ls grep [[:digit:]] - punctuation characters - lowercase letters - uppercase letters - whitespace characters list all files with numbers in the filename

Class Shorthands Support for the following shorthands is common: Class Shorthands \d - digit \D - non-digit \w - part of word character, i.e., [a-za-z0-9] \W - non-word character \s - whitespace character \S - non-whitespace character

The Not Operator We can also negate ranges of characters: Not [^abc] [^a-z] - matches any character that is not a b or c - matches any non lowercase letter

Matching Repetitions A RegExp followed by one of these repetition operators defines how many times that pattern should be matched: * - matches 0 or more occurences of the expression \? - matches 0 or 1 occurrences of the expression \+ - matches 1 or more occurrences of the expression Examples: grep t*a - matches things like aste, taste, ttaste, tttaste grep [[:alpha:]]\+a - matches the letter a only when it is preceeded by at least one letter. grep "\?Hello World"\? - matches Hellow World with or without quotes.

Beginning and End Another thing RegExp can do is match the beginning and end of a line Positional Operators ^ matches the beginning of a line $ matches the end of a line Examples: grep o$ matches lines ending with o grep ^[A-Z] matches lines beginning with a capital letter ls -l grep ^l prints all files that are links

Example Let s play with the file /var/log/system.log.

Matching A Range of Repetitions \{n\} - preceeding item is repeated exactly n times \{n,\} - preceeding item is repeated at least n times \{i,j\} matches between i and j occurrences of strings that match e. Question: How to print all social security numbers in a file (both 111-11-1111 and 111111111)?

Grouping Expressions Grouping Expressions \(expr\) : matches expr useful for grouping expressions together Examples: a\(boat\)* finds a, aboat, aboatboat, etc.

Regular Expression Rules And a few more: c1\ c2 matches the expression c1 or the expression c2. \< matches the beginning of a word \> matches the end of a word This illustrates word s boundaries. Question: What s the difference between [ab] and a\ b?

Regular Expression Rules And a few more examples: grep \(left\)\ \(right\) matches left or right. grep top\{3\} searches for toppp. grep [0-5]\{2\}\ [6-9]\{2\} searches for things like 12, 15, 68, 97, but not 19, 61.

A word about extended regular expressions With extended regular expressions you do not need to escape special characters such as?, +, () and {}. Some tools enable its use, but not all, e.g., grep -E or egrep ( limited availability). Extended regular expressions tend to be cleaner and easier to read: grep \(woo\+t\)\{2,3\} becomes egrep (woo+t){2,3}.

Why we quote regular expressions Suppose we have a directory with the following files in it: num, num2, test Now suppose we want to search the file test for the regular expression nu*. If we don t quote, gets expanded to grep nu* test grep num num2 test, which searches num2 and test for the string num.

Regular Expression Examples Let s play with file /usr/share/dict/words. Question: How would you match any word that begins with c and ends with d? Question: Fing 5 letter words beginning with c and ending with d? caged caked caned caped cared Great for crosswords!

Regular Expression Rules Single Characters RegExp:. Matches any single character How can we search for an url? e.g., www.cs.cornell.edu? Using. would result in anything like www2csicornell9edu Example: [[A-Z].] matches an upper case letter or a dot character. Example: \. is the escaped version that matches a dot character.