Regex, Sed, Awk. Arindam Fadikar. December 12, 2017


1 Regex, Sed, Awk Arindam Fadikar December 12, 2017

2 Why Regex
Lots of text data:
- Twitter data (social network data)
- government records
- web scraping
- many more...

3 Regex
A regular expression (regex or regexp) is a special text string that describes a pattern representing a common structure.
Many programs can process regular expressions, such as R, Python, Java, Perl, Ruby and grep, to name a few.
One can use regex in any of these tools to perform complex string matching and replacing in a few lines of code.
What is this complex pattern for?
    ^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$
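The pattern above reads like a check for e-mail addresses. A minimal sketch with grep -E follows; the helper name is_email and the sample addresses are made up for illustration, and the hyphen in the last bracket expression has been moved to the end so that every grep implementation treats it as a literal.

```shell
# Assumed pattern: the deck's regex with the hyphen moved to the end of the last set.
email_re='^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$'

# Hypothetical helper: succeeds if the whole argument matches the pattern.
is_email() {
    printf '%s\n' "$1" | grep -Eq "$email_re"
}

for candidate in 'jane.doe@example.com' 'not-an-address' 'a@b'; do
    if is_email "$candidate"; then
        echo "$candidate: match"
    else
        echo "$candidate: no match"
    fi
done
```

Only the first candidate should be reported as a match; the other two lack either the @ or the dot after the domain name.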

4 A basic example - Revisit
We have already seen in class how to search for a character or string inside the vi editor, or using grep.
With the file open in vi:
    /<string>
On the terminal:
    #!/bin/bash
    grep <string> <file>
We can replace <string> by any valid regex to search for more complex patterns. Let's try some of that.

7 Building block
The most basic regular expression is a single literal character. It finds the first occurrence of that character in the string.
The regex a matches only the a after the J in the string "Jack is a boy".
Similarly, the regex class matches the first occurrence of class in "ASC class in classroom 232".
To match a special character literally, we need the escape character \. To find the price in
    DUKE Basketball ticket price: $100
we need to use \$[0-9]+
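The escaping rule can be tried directly on the command line. This is a small sketch, assuming a grep that supports the -o option (GNU and BSD grep do; it is not part of POSIX) to print only the matched part.

```shell
line='DUKE Basketball ticket price: $100'

# \$ matches a literal dollar sign; [0-9]+ matches the digits that follow it.
price=$(printf '%s\n' "$line" | grep -oE '\$[0-9]+')
echo "$price"
```

In the shell a single backslash is enough because the pattern sits inside single quotes.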

8 Special Characters
    \ $ ^ . * | + ? ( ) [ ] { }
Additional characters in R that require escaping:
- ' : single quote; no need to escape it inside a double-quoted string.
- " : double quote; no need to escape it inside a single-quoted string.
- \n : newline.
- \r : carriage return.
- \t : tab character.

9 Example
\ is the escape character, i.e., it lets us use a special character in a pattern as a literal. To match $100 in R we have to write \\$[0-9]+.
    strng <- c("+$10", "-$100", "A^C", "+$101", "-$101", "abbc", "abc", "ac", "aC")
    ## match strings that contain a dollar sign
    grep("\\$[0-9]+", strng)
    ## [1] 1 2 4 5
    grep("\\$[0-9]+", strng, value = TRUE)
    ## [1] "+$10"  "-$100" "+$101"
    ## [4] "-$101"
Note the extra backslash: inside an R string the backslash itself must be escaped.

10 ...
$ represents the end of the string.
    grep("0$", strng)
    ## [1] 1 2
    grep("0$", strng, value = TRUE)
    ## [1] "+$10"  "-$100"

11 ...
^ has several meanings depending on its position in the regex.
Negation (as the first character inside [ ], "not the following"): does not end with 0.
    grep("[^0]$", strng, value = TRUE)
    ## [1] "A^C"   "+$101" "-$101"
    ## [4] "abbc"  "abc"   "ac"
    ## [7] "aC"
To indicate the start of the string.
    grep("^\\+", strng, value = TRUE)
    ## [1] "+$10"  "+$101"
As a literal character.
    grep("\\^", strng, value = TRUE)
    ## [1] "A^C"
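The same anchors behave identically in grep -E. The sketch below reuses a few of the deck's test strings; the variable names are only illustrative.

```shell
strings='+$10
-$100
A^C
+$101
abc'

# $ anchors the match at the end of the line.
ends_in_zero=$(printf '%s\n' "$strings" | grep -E '0$')
# [^0] is a negated set: the last character is anything but 0.
not_zero_end=$(printf '%s\n' "$strings" | grep -E '[^0]$')
# ^ anchors at the start; \+ is a literal plus sign.
starts_plus=$(printf '%s\n' "$strings" | grep -E '^\+')
# \^ is a literal caret.
has_caret=$(printf '%s\n' "$strings" | grep -E '\^')

printf 'ends in 0:\n%s\n' "$ends_in_zero"
printf 'does not end in 0:\n%s\n' "$not_zero_end"
printf 'starts with +:\n%s\n' "$starts_plus"
printf 'contains ^:\n%s\n' "$has_caret"
```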

12 ...
. matches any single character.
Any match, not exciting:
    grep(".", strng, value = TRUE)
    ## [1] "+$10"  "-$100" "A^C"
    ## [4] "+$101" "-$101" "abbc"
    ## [7] "abc"   "ac"    "aC"
Finding bb with one character on either side (positions 2-3 of "abbc"):
    grep(".bb.", strng, value = TRUE)
    ## [1] "abbc"

13 ...
| is used for alternation.
Finding an uppercase C or lowercase c.
    grep('c|C', strng, value = TRUE)
    ## [1] "A^C"  "abbc" "abc"
    ## [4] "ac"   "aC"

14 ...
* matches at least 0 times (quantifier).
Strings that start with a + followed later by a 0 (append $ to the pattern to require that the string ends with 0).
    strng
    ## [1] "+$10"  "-$100" "A^C"
    ## [4] "+$101" "-$101" "abbc"
    ## [7] "abc"   "ac"    "aC"
    regexpr("^\\+.*0", strng)
    ## [1]  1 -1 -1  1 -1 -1 -1 -1 -1
    ## attr(,"match.length")
    ## [1]  4 -1 -1  4 -1 -1 -1 -1 -1
    ## attr(,"useBytes")
    ## [1] TRUE

15 ...
+ matches at least 1 time (quantifier).
Matches that start with a, end with c and contain at least one b in between.
    strng
    ## [1] "+$10"  "-$100" "A^C"
    ## [4] "+$101" "-$101" "abbc"
    ## [7] "abc"   "ac"    "aC"
    grep("ab+c", strng, value = TRUE)
    ## [1] "abbc" "abc"

16 ...
? matches at most 1 time, i.e. the preceding item is optional (quantifier).
Matches that start with a, end with c and may contain one b in between.
    strng
    ## [1] "+$10"  "-$100" "A^C"
    ## [4] "+$101" "-$101" "abbc"
    ## [7] "abc"   "ac"    "aC"
    grep("ab?c", strng, value = TRUE)
    ## [1] "abc" "ac"

17 ...
( ) are used for grouping.
    strng
    ## [1] "+$10"  "-$100" "A^C"
    ## [4] "+$101" "-$101" "abbc"
    ## [7] "abc"   "ac"    "aC"
    grep("(ab)", strng, value = TRUE)
    ## [1] "abbc" "abc"
Backreferencing: \\1 in the replacement refers to the text matched by the first group.
    gsub("(ab)c?", "\\1\\1", c('abc', 'ab c', 'b abb c'))
    ## [1] "abab"      "abab c"
    ## [3] "b ababb c"
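The backreference example carries over to sed almost unchanged. This sketch assumes a sed that accepts -E for extended regular expressions (GNU and BSD sed both do); in the shell a single backslash is enough for \1.

```shell
# Double every "ab" and drop a directly following "c", as in the gsub() call.
doubled=$(printf '%s\n' 'abc' 'ab c' 'b abb c' | sed -E 's/(ab)c?/\1\1/g')
echo "$doubled"
```

The three output lines should be abab, abab c and b ababb c, the same strings that gsub() returns.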

18 ...
[ ] define a character set: any one of the enclosed characters matches, which works like alternation over single characters.
Revisit - finding an uppercase C or lowercase c.
    grep('[cC]', strng, value = TRUE)
    ## [1] "A^C"  "abbc" "abc"
    ## [4] "ac"   "aC"

19 ...
{ } are used to express quantifiers.
- {n} matches exactly n times.
- {n,} matches at least n times.
- {,n} matches at most n times.
- {n,m} matches between n and m times.

20 ...
    grep('.*b{2}.*', strng, value = TRUE)
    ## [1] "abbc"
    grep('.*b{1,}.*', strng, value = TRUE)
    ## [1] "abbc" "abc"
    grep('.*b{,2}.*', strng, value = TRUE)
    ## [1] "+$10"  "-$100" "A^C"
    ## [4] "+$101" "-$101" "abbc"
    ## [7] "abc"   "ac"    "aC"
    grep('.*b{1,2}.*', strng, value = TRUE)
    ## [1] "abbc" "abc"
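The quantifiers *, +, ? and { } can be compared side by side with grep -E. In this sketch the word list and the helper matching are made up for illustration; -x makes grep require that the whole line matches, so the patterns need no explicit anchors.

```shell
words='abbc
abc
ac
abbbc'

# Hypothetical helper: print the words matched entirely by the pattern, space-separated.
matching() {
    printf '%s\n' "$words" | grep -Ex "$1" | paste -sd ' ' -
}

echo "ab*c     : $(matching 'ab*c')"      # zero or more b
echo "ab+c     : $(matching 'ab+c')"      # one or more b
echo "ab?c     : $(matching 'ab?c')"      # at most one b
echo "ab{2}c   : $(matching 'ab{2}c')"    # exactly two b
echo "ab{2,}c  : $(matching 'ab{2,}c')"   # two or more b
echo "ab{1,2}c : $(matching 'ab{1,2}c')"  # one or two b
```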

21 Character Classes
A character class lets us specify an entire class of characters. The bracket equivalents below hold for ASCII text.
- [:digit:] or \d : digits 0 1 2 3 4 5 6 7 8 9, equivalent to [0-9]
- \D : non-digits, equivalent to [^0-9]
- [:lower:] : lower-case letters, equivalent to [a-z]
- [:upper:] : upper-case letters, equivalent to [A-Z]
- [:alpha:] : alphabetic characters, equivalent to [[:lower:][:upper:]] or [A-Za-z]
- [:alnum:] : alphanumeric characters, equivalent to [[:alpha:][:digit:]] or [A-Za-z0-9]
- \w : word characters, equivalent to [[:alnum:]_] or [A-Za-z0-9_]
- \W : not word, equivalent to [^A-Za-z0-9_]
- [:xdigit:] : hexadecimal digits (base 16), 0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f, equivalent to [0-9A-Fa-f]
- [:blank:] : blank characters, i.e. space and tab
- [:space:] : space characters: tab, newline, vertical tab, form feed, carriage return, space
- \s : space character (whitespace)

22 more character classes
- \S : not space
- [:punct:] : punctuation characters: ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
- [:graph:] : graphical (human readable) characters, equivalent to [[:alnum:][:punct:]]
- [:print:] : printable characters, equivalent to [[:graph:]] plus the space character
- [:cntrl:] : control characters, like \n or \r, [\x00-\x1f\x7f]
NOTES: The expressions enclosed in [: :] must be used inside square brackets, e.g. [[:digit:]]; the others must be preceded by a backslash \.
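The bracketed classes work the same way in grep and sed. A short sketch on a made-up sample string follows; grep -o is assumed to be available (GNU and BSD grep), and the results assume an ASCII or UTF-8 locale.

```shell
sample='Room 232, floor 4!'

# Runs of digits, one per output line, then joined with spaces.
digits=$(printf '%s\n' "$sample" | grep -oE '[[:digit:]]+' | paste -sd ' ' -)
# Delete every punctuation character.
no_punct=$(printf '%s\n' "$sample" | sed 's/[[:punct:]]//g')
# Upper-case letters only.
capitals=$(printf '%s\n' "$sample" | grep -oE '[[:upper:]]')

echo "digits:   $digits"
echo "no punct: $no_punct"
echo "capitals: $capitals"
```

Note the double brackets: the outer pair is the character set, the inner [: :] names the class.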

23 Some more Anchors
- \b : empty string at either edge of a word
- \B : not the edge of a word
What is this complex pattern for?
    ^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$

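24 Some useful Base R functions
    ## grep() returns the indices (or, with value = TRUE, the strings) that match a pattern; grepl() returns TRUE/FALSE
    grep() grepl()
    ## return the strings after substituting the first match (sub) or all matches (gsub) of a pattern
    sub() gsub()
    ## return the starting position of the first match (regexpr) or of all matches (gregexpr)
    regexpr() gregexpr()
    ## extract matches: regmatches() works on the output of regexpr(), gregexpr() or regexec()
    regmatches() regexec()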

25 Real data example
We will be working with the Baltimore Homicide data (for the homework).
    homicides <- readLines("../data/homicides.txt")
    head(homicides)
    ## [1] " , , iconhomicideshooting, 'p2', '<dl><dt>leon Nelson<
    ## [2] " , , iconhomicideshooting, 'p3', '<dl><dt>eddie Golf</
    ## [3] " , , iconhomicidebluntforce, 'p4', '<dl><dt>nelsene Bu
    ## [4] " , , iconhomicideasphyxiation, 'p5', '<dl><dt>thomas M
    ## [5] " , , iconhomicidebluntforce, 'p6', '<dl><dt>edward Can
    ## [6] " , , iconhomicideshooting, 'p7', '<dl><dt>michael Cunn
    homicides[1]
    ## [1] " , , iconhomicideshooting, 'p2', '<dl><dt>leon Nelson<

26 ...
Let's see what information is there and what we can extract.
    unlist(strsplit(homicides[1], split = ','))
    ## [1] " "
    ## [2] " "
    ## [3] " iconhomicideshooting"
    ## [4] " 'p2'"
    ## [5] " '<dl><dt>leon Nelson</dt><dd class=\"address\">3400 Clifton Ave.<br />B
    ## [6] " MD 21216</dd><dd>black male"
    ## [7] " 17 years old</dd><dd>found on January 1"
    ## [8] " 2007</dd><dd>Victim died at Shock Trauma</dd><dd>Cause: shooting</dd></
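As a rough idea of what such an extraction could look like, the sketch below pulls three fields out of one record with sed. The record is a shortened, made-up line that only imitates the fields visible above, and sed -E (extended regex) is assumed to be supported, as it is in GNU and BSD sed.

```shell
# Hypothetical, abbreviated record modelled on the fields shown above.
record='<dl><dt>Leon Nelson</dt><dd>black male, 17 years old</dd><dd>Cause: shooting</dd></dl>'

# -n plus the p flag prints only when the substitution succeeded.
name=$(printf '%s\n' "$record" | sed -nE 's/.*<dt>([^<]+)<\/dt>.*/\1/p')
age=$(printf '%s\n' "$record" | sed -nE 's/.*[^0-9]([0-9]+) years old.*/\1/p')
cause=$(printf '%s\n' "$record" | sed -nE 's/.*Cause: ([^<]+)<.*/\1/p')

echo "name=$name age=$age cause=$cause"
```

Each pattern captures one group and replaces the whole line with it; [^<]+ stops the capture at the next tag.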

27 Sed
The stream editor (sed) is a powerful text editor that performs editing operations on text coming from standard input or a file.
Sed writes to standard output by default, unless redirected to a file.
The basic usage is
    sed [option] commands [file-to-edit]
Note: several sed commands can also be put in a file and executed as a script.

28 List of sed commands
A good resource to find the list of sed commands: GNU Sed commands.
We will focus on the s (substitution) command in sed. The basic usage is:
    sed s/[old pattern]/[new pattern]/ file
    sed 's/[old pattern]/[new pattern]/' file
This replaces the first instance of [old pattern] in each line by [new pattern] and writes the result to standard output by default.
    $ cat numbers.txt
    one two three, one two three
    four three two one
    one hundred

30 Example
    $ sed 's/one/ONE/' numbers.txt
    ONE two three, one two three
    four three two ONE
    ONE hundred
There are four parts to this substitute command:
- s : substitute command
- /../../ : delimiter
- one : regular expression pattern (search pattern)
- ONE : replacement string
Notes on the delimiter: it is conventionally a slash, but it can be any character we want. So the following are equivalent:
    $ sed 's/one/ONE/' numbers.txt
    $ sed 's_one_ONE_' numbers.txt
    $ sed 's:one:ONE:' numbers.txt
    $ sed 's|one|ONE|' numbers.txt
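A different delimiter pays off when the pattern itself contains slashes, for example a file path. A small sketch with a made-up path:

```shell
path='/usr/local/bin'

# With / as delimiter every slash in the pattern must be escaped.
with_slash=$(printf '%s\n' "$path" | sed 's/\/usr\/local/\/opt/')
# With | as delimiter the same command needs no escaping.
with_pipe=$(printf '%s\n' "$path" | sed 's|/usr/local|/opt|')

echo "$with_slash"
echo "$with_pipe"
```

Both commands print /opt/bin; the second is simply easier to read.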

32 Redirecting to new file
Output from a sed command can be redirected to another file with >.
    ## does not print on terminal
    $ sed 's/one/ONE/' numbers.txt > numbers_new.txt
    ## prints on terminal (the w flag also writes the modified lines to numbers_new.txt)
    $ sed 's/one/ONE/w numbers_new.txt' < numbers.txt
Note: By default sed prints every line. The -n option suppresses that printing. When -n is used, the p flag causes the modified lines to be printed. With these two we can recreate the functionality of grep using sed.
    ## grep using sed
    $ sed -n 's/one/&/p' numbers.txt
Here & stands for the matched string.
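The roles of -n, p and & can be seen on the three-line version of numbers.txt, fed in through a shell variable here so that the sketch needs no file.

```shell
numbers='one two three, one two three
four three two one
one hundred'

# grep-like behaviour: print only the lines that match /hundred/.
only_hundred=$(printf '%s\n' "$numbers" | sed -n '/hundred/p')
# & in the replacement stands for whatever the pattern matched.
bracketed=$(printf '%s\n' "$numbers" | sed 's/one/[&]/')

echo "$only_hundred"
echo "$bracketed"
```

The address form /hundred/p is a shorter way to imitate grep than substituting a match by itself.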

33 Sed pattern flags
Additional flags can be added after the last delimiter. We saw the use of p for printing.
The most important flag to consider is g (global). By default, s (substitution) happens only on the first occurrence of the pattern in every line. With the g flag, the substitution applies to all occurrences of the pattern.
    $ sed 's/one/ONE/g' numbers.txt
    ONE two three, ONE two three
    four three two ONE
    ONE hundred

34 Specifying occurrence
With no flag, only the first match in each line is replaced; with the g flag, all matches are replaced. A specific occurrence can be selected with a number flag after the last delimiter: /1, /2, ...
    $ sed 's/one/ONE/2' numbers.txt
    one two three, ONE two three
    four three two one
    one hundred
    one ONE one one one
    three ten nine five
    six
One can combine a number flag with the g flag to replace that occurrence and all later ones (this is the behavior of GNU sed).
    $ sed 's/one/ONE/3g' numbers.txt
    one two three, one two three
    four three two one
    one hundred
    one one ONE ONE ONE
    three ten nine five
    six
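The effect of the flags is easiest to see on a single line with many matches. The sketch below only runs the portable forms; the combined 3g form is left in a comment because its behavior is specific to GNU sed.

```shell
line='one one one one one'

first=$(printf '%s\n' "$line" | sed 's/one/ONE/')    # no flag: first match only
second=$(printf '%s\n' "$line" | sed 's/one/ONE/2')  # number flag: second match only
all=$(printf '%s\n' "$line" | sed 's/one/ONE/g')     # g flag: every match
# GNU sed only: sed 's/one/ONE/3g' replaces the third match and all later ones.

echo "$first"
echo "$second"
echo "$all"
```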

35 Multiple sed commands
Two ways to combine multiple sed commands into one statement:
Using a pipe
    $ sed 's/one/ONE/' numbers.txt | sed 's/two/TWO/g'
    ONE TWO three, one TWO three
    four three TWO ONE
    ONE hundred
    ONE one one one one
    three ten nine five
    six
Using the -e (--expression) option
    $ sed -e 's/one/ONE/' -e 's/three/3/g' numbers.txt
    ONE two 3, one two 3
    four 3 two ONE
    ONE hundred
    ONE one one one one
    3 ten nine five
    six

36 As shell script
A large number of sed commands can be put into a file and executed as a script.
    $ cat sed_script.sh
    #!/usr/bin/
    ## to replace words by single digit numbers
    s/one/1/g
    s/two/2/g
    s/three/3/g
    s/four/4/g
    ...
    $ sed -f sed_script.sh numbers.txt
    1 2 3, hundred

37 Sed as a shell script with arguments
A shell script that calls sed can take arguments like any other bash script.
    $ cat sed_script_arg.sh
    ## to replace the first argument by the second one
    sed -e 's/'$1'/'$2'/g' numbers.txt
    $ ./sed_script_arg.sh one ONE
    ONE two three, ONE two three
    four three two ONE
    ONE hundred
    ONE ONE ONE ONE ONE
    three ten nine five
    six
Note: The use of '. The single quotes are closed before $1 and $2 and reopened after them, so that the shell expands the arguments.
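An alternative to closing and reopening single quotes is to put the whole sed command in double quotes, inside which the shell still expands $1 and $2. The function name replace_all below is made up for this sketch, and it reads from standard input rather than from numbers.txt.

```shell
# Hypothetical helper: replace every occurrence of $1 by $2 in the input.
# Assumption: neither argument contains /, & or other characters special to sed.
replace_all() {
    sed "s/$1/$2/g"
}

replaced=$(printf '%s\n' 'one two one' | replace_all one ONE)
echo "$replaced"
```

Double quotes also keep arguments that contain spaces in one piece, which the unquoted $1 in the slide's version does not.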

38 AWK
AWK is another very powerful Unix utility for processing (structured) text files. It has string manipulation abilities as well as the same arithmetic operators as C.
In its simplest use, awk is meant for processing column-oriented data, such as tables.
The basic usage is
    keyword { action }
Awk checks the keyword (a pattern or condition) against each line and, if it holds, performs the action. By default the keyword is null, so the action applies to every line.

39 Basic example
    $ awk '{ print $1 }' numbers.txt
    one
    four
    one
    one
    three
    six
Try defining the field separator
    $ awk -F, '{ print $1 }' numbers.txt
    one two three
    four three
    one hundred
    one
    three ten
    six

40 ...
Awk reads the input line by line and executes its commands on each line of a file or of standard input.
BEGIN and END are two special keywords: their actions run before the first line of input is read and after the last line has been read, respectively.
    $ awk -F, 'BEGIN { printf "AWK basic printing\n" } \
      { print $1 } END { printf "--DONE--\n" }' numbers.txt
    AWK basic printing
    one two three
    four three
    one hundred
    one
    three ten
    six
    --DONE--
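The order in which the three kinds of rules fire can be checked on two lines of comma-separated input. This is a minimal sketch with input supplied through printf instead of numbers.txt.

```shell
report=$(printf '%s\n' 'one two three, one two three' 'four three, two one' |
    awk -F, '
        BEGIN { print "first field:" }   # runs once, before any input is read
        { print $1 }                     # runs for every input line
        END { print "--DONE--" }         # runs once, after the last line
    ')
echo "$report"
```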

41 As shell script
AWK commands can also be put together in a file. The script can then be executed as follows:
    awk [options] -f script.awk ...
    $ cat script.awk
    ## BEGIN block
    BEGIN { printf "Col 1\tCol 2\n" }
    ## Rules
    { print $1 "\t" $2 }
    ## END block
    END { printf "--DONE--\n" }
    $ awk -f script.awk numbers.txt
    Col 1   Col 2
    one     two
    four    three,
    one     hundred,
    one,    one
    three   ten,
    six
    --DONE--

42 Arithmetic operations
Awk's arithmetic syntax is similar to that of C. An example to count the number of lines in a file:
    $ cat line_count.awk
    BEGIN { count = 0 }
    { count++ }
    END { printf "No of lines " count "\n" }
    $ awk -f line_count.awk numbers.txt
    No of lines 6
Nice little alternative:
    awk 'END { print NR }' numbers.txt
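The same accumulate-then-report idea extends from counting lines to summing a column. A small sketch on made-up two-column data:

```shell
summary=$(printf '%s\n' 'a 3' 'b 4' 'c 8' |
    awk '
        { sum += $2 }    # add the second field of every line
        END { printf "n=%d sum=%d mean=%.1f\n", NR, sum, sum / NR }
    ')
echo "$summary"
```

Variables such as sum start out as zero, so the BEGIN block used for count above is optional.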

43 AWK with regex
A regular expression can be used as a pattern by enclosing it in /.../. The operator ~ means "matches" and !~ means "does not match".
    $ awk '/hundred/ { print }' numbers.txt
    one hundred
    $ awk '$2 ~ /^t.*/ { print }' numbers.txt
    one two three, one two three
    four three, two one
    three ten, nine five
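A short sketch contrasting ~ and !~ on a single field, with input rows made up for the purpose:

```shell
rows='one two three
four three two
one hundred'

# Lines whose second field starts with t.
second_t=$(printf '%s\n' "$rows" | awk '$2 ~ /^t/')
# Lines whose second field does not start with t.
second_not_t=$(printf '%s\n' "$rows" | awk '$2 !~ /^t/')
# A bare /regex/ is tested against the whole line; NR is the line number.
numbered=$(printf '%s\n' "$rows" | awk '/hundred/ { print NR ": " $0 }')

echo "$second_t"
echo "$second_not_t"
echo "$numbered"
```

A rule without an action block prints the matching line, so '$2 ~ /^t/' alone behaves like '$2 ~ /^t/ { print }'.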


UNIX / LINUX - REGULAR EXPRESSIONS WITH SED

UNIX / LINUX - REGULAR EXPRESSIONS WITH SED UNIX / LINUX - REGULAR EXPRESSIONS WITH SED http://www.tutorialspoint.com/unix/unix-regular-expressions.htm Copyright tutorialspoint.com Advertisements In this chapter, we will discuss in detail about

More information

Paolo Santinelli Sistemi e Reti. Regular expressions. Regular expressions aim to facilitate the solution of text manipulation problems

Paolo Santinelli Sistemi e Reti. Regular expressions. Regular expressions aim to facilitate the solution of text manipulation problems aim to facilitate the solution of text manipulation problems are symbolic notations used to identify patterns in text; are supported by many command line tools; are supported by most programming languages;

More information

Regular Expressions. Todd Kelley CST8207 Todd Kelley 1

Regular Expressions. Todd Kelley CST8207 Todd Kelley 1 Regular Expressions Todd Kelley kelleyt@algonquincollege.com CST8207 Todd Kelley 1 POSIX character classes Some Regular Expression gotchas Regular Expression Resources Assignment 3 on Regular Expressions

More information

Regular Expressions 1

Regular Expressions 1 Regular Expressions 1 Basic Regular Expression Examples Extended Regular Expressions Extended Regular Expression Examples 2 phone number 3 digits, dash, 4 digits [[:digit:]][[:digit:]][[:digit:]]-[[:digit:]][[:digit:]][[:digit:]][[:digit:]]

More information

More Scripting and Regular Expressions. Todd Kelley CST8207 Todd Kelley 1

More Scripting and Regular Expressions. Todd Kelley CST8207 Todd Kelley 1 More Scripting and Regular Expressions Todd Kelley kelleyt@algonquincollege.com CST8207 Todd Kelley 1 Regular Expression Summary Regular Expression Examples Shell Scripting 2 Do not confuse filename globbing

More information

Part III. Shell Config. Tobias Neckel: Scripting with Bash and Python Compact Max-Planck, February 16-26,

Part III. Shell Config. Tobias Neckel: Scripting with Bash and Python Compact Max-Planck, February 16-26, Part III Shell Config Compact Course @ Max-Planck, February 16-26, 2015 33 Special Directories. current directory.. parent directory ~ own home directory ~user home directory of user ~- previous directory

More information

Pattern Matching. An Introduction to File Globs and Regular Expressions

Pattern Matching. An Introduction to File Globs and Regular Expressions Pattern Matching An Introduction to File Globs and Regular Expressions Copyright 2006 2009 Stewart Weiss The danger that lies ahead Much to your disadvantage, there are two different forms of patterns

More information

Pattern Matching. An Introduction to File Globs and Regular Expressions. Adapted from Practical Unix and Programming Hunter College

Pattern Matching. An Introduction to File Globs and Regular Expressions. Adapted from Practical Unix and Programming Hunter College Pattern Matching An Introduction to File Globs and Regular Expressions Adapted from Practical Unix and Programming Hunter College Copyright 2006 2009 Stewart Weiss The danger that lies ahead Much to your

More information

Regular Expressions. Michael Wrzaczek Dept of Biosciences, Plant Biology Viikki Plant Science Centre (ViPS) University of Helsinki, Finland

Regular Expressions. Michael Wrzaczek Dept of Biosciences, Plant Biology Viikki Plant Science Centre (ViPS) University of Helsinki, Finland Regular Expressions Michael Wrzaczek Dept of Biosciences, Plant Biology Viikki Plant Science Centre (ViPS) University of Helsinki, Finland November 11 th, 2015 Regular expressions provide a flexible way

More information

Computing Unit 3: Data Types

Computing Unit 3: Data Types Computing Unit 3: Data Types Kurt Hornik September 26, 2018 Character vectors String constants: enclosed in "... " (double quotes), alternatively single quotes. Slide 2 Character vectors String constants:

More information

CS Unix Tools. Fall 2010 Lecture 5. Hussam Abu-Libdeh based on slides by David Slater. September 17, 2010

CS Unix Tools. Fall 2010 Lecture 5. Hussam Abu-Libdeh based on slides by David Slater. September 17, 2010 Fall 2010 Lecture 5 Hussam Abu-Libdeh based on slides by David Slater September 17, 2010 Reasons to use Unix Reason #42 to use Unix: Wizardry Mastery of Unix makes you a wizard need proof? here is the

More information

psed [-an] script [file...] psed [-an] [-e script] [-f script-file] [file...]

psed [-an] script [file...] psed [-an] [-e script] [-f script-file] [file...] NAME SYNOPSIS DESCRIPTION OPTIONS psed - a stream editor psed [-an] script [file...] psed [-an] [-e script] [-f script-file] [file...] s2p [-an] [-e script] [-f script-file] A stream editor reads the input

More information

CS Unix Tools & Scripting

CS Unix Tools & Scripting Cornell University, Spring 2014 1 February 7, 2014 1 Slides evolved from previous versions by Hussam Abu-Libdeh and David Slater Regular Expression A new level of mastery over your data. Pattern matching

More information

UNIX II:grep, awk, sed. October 30, 2017

UNIX II:grep, awk, sed. October 30, 2017 UNIX II:grep, awk, sed October 30, 2017 File searching and manipulation In many cases, you might have a file in which you need to find specific entries (want to find each case of NaN in your datafile for

More information

Essentials for Scientific Computing: Stream editing with sed and awk

Essentials for Scientific Computing: Stream editing with sed and awk Essentials for Scientific Computing: Stream editing with sed and awk Ershaad Ahamed TUE-CMS, JNCASR May 2012 1 Stream Editing sed and awk are stream processing commands. What this means is that they are

More information

Computer Systems and Architecture

Computer Systems and Architecture Computer Systems and Architecture Regular Expressions Bart Meyers University of Antwerp August 29, 2012 Outline What? Tools Anchors, character sets and modifiers Advanced Regular expressions Exercises

More information

STREAM EDITOR - REGULAR EXPRESSIONS

STREAM EDITOR - REGULAR EXPRESSIONS STREAM EDITOR - REGULAR EXPRESSIONS http://www.tutorialspoint.com/sed/sed_regular_expressions.htm Copyright tutorialspoint.com It is the regular expressions that make SED powerful and efficient. A number

More information

Computer Systems and Architecture

Computer Systems and Architecture Computer Systems and Architecture Stephen Pauwels Regular Expressions Academic Year 2018-2019 Outline What is a Regular Expression? Tools Anchors, Character sets and Modifiers Advanced Regular Expressions

More information

CSE 374 Programming Concepts & Tools. Laura Campbell (thanks to Hal Perkins) Winter 2014 Lecture 6 sed, command-line tools wrapup

CSE 374 Programming Concepts & Tools. Laura Campbell (thanks to Hal Perkins) Winter 2014 Lecture 6 sed, command-line tools wrapup CSE 374 Programming Concepts & Tools Laura Campbell (thanks to Hal Perkins) Winter 2014 Lecture 6 sed, command-line tools wrapup Where we are Learned how to use the shell to run, combine, and write programs

More information

sed Stream Editor Checks for address match, one line at a time, and performs instruction if address matched

sed Stream Editor Checks for address match, one line at a time, and performs instruction if address matched Week11 sed & awk sed Stream Editor Checks for address match, one line at a time, and performs instruction if address matched Prints all lines to standard output by default (suppressed by -n option) Syntax:

More information

User Commands sed ( 1 )

User Commands sed ( 1 ) NAME sed stream editor SYNOPSIS /usr/bin/sed [-n] script [file...] /usr/bin/sed [-n] [-e script]... [-f script_file]... [file...] /usr/xpg4/bin/sed [-n] script [file...] /usr/xpg4/bin/sed [-n] [-e script]...

More information

Regular Expressions. Regular expressions match input within a line Regular expressions are very different than shell meta-characters.

Regular Expressions. Regular expressions match input within a line Regular expressions are very different than shell meta-characters. ULI101 Week 09 Week Overview Regular expressions basics Literal matching.wildcard Delimiters Character classes * repetition symbol Grouping Anchoring Search Search and replace in vi Regular Expressions

More information

Regular Expressions. Regular expressions are a powerful search-and-replace technique that is widely used in other environments (such as Unix and Perl)

Regular Expressions. Regular expressions are a powerful search-and-replace technique that is widely used in other environments (such as Unix and Perl) Regular Expressions Regular expressions are a powerful search-and-replace technique that is widely used in other environments (such as Unix and Perl) JavaScript started supporting regular expressions in

More information

Bashed One Too Many Times. Features of the Bash Shell St. Louis Unix Users Group Jeff Muse, Jan 14, 2009

Bashed One Too Many Times. Features of the Bash Shell St. Louis Unix Users Group Jeff Muse, Jan 14, 2009 Bashed One Too Many Times Features of the Bash Shell St. Louis Unix Users Group Jeff Muse, Jan 14, 2009 What is a Shell? The shell interprets commands and executes them It provides you with an environment

More information

Perl Regular Expressions. Perl Patterns. Character Class Shortcuts. Examples of Perl Patterns

Perl Regular Expressions. Perl Patterns. Character Class Shortcuts. Examples of Perl Patterns Perl Regular Expressions Unlike most programming languages, Perl has builtin support for matching strings using regular expressions called patterns, which are similar to the regular expressions used in

More information

Regular Expressions Explained

Regular Expressions Explained Found at: http://publish.ez.no/article/articleprint/11/ Regular Expressions Explained Author: Jan Borsodi Publishing date: 30.10.2000 18:02 This article will give you an introduction to the world of regular

More information

Getting to grips with Unix and the Linux family

Getting to grips with Unix and the Linux family Getting to grips with Unix and the Linux family David Chiappini, Giulio Pasqualetti, Tommaso Redaelli Torino, International Conference of Physics Students August 10, 2017 According to the booklet At this

More information

Regular Expressions. Regular Expression Syntax in Python. Achtung!

Regular Expressions. Regular Expression Syntax in Python. Achtung! 1 Regular Expressions Lab Objective: Cleaning and formatting data are fundamental problems in data science. Regular expressions are an important tool for working with text carefully and eciently, and are

More information

Grep and Shell Programming

Grep and Shell Programming Grep and Shell Programming Comp-206 : Introduction to Software Systems Lecture 7 Alexandre Denault Computer Science McGill University Fall 2006 Teacher's Assistants Michael Hawker Monday, 14h30 to 16h30

More information

CSE 303 Lecture 7. Regular expressions, egrep, and sed. read Linux Pocket Guide pp , 73-74, 81

CSE 303 Lecture 7. Regular expressions, egrep, and sed. read Linux Pocket Guide pp , 73-74, 81 CSE 303 Lecture 7 Regular expressions, egrep, and sed read Linux Pocket Guide pp. 66-67, 73-74, 81 slides created by Marty Stepp http://www.cs.washington.edu/303/ 1 discuss reading #2 Lecture summary regular

More information

CS 301. Lecture 05 Applications of Regular Languages. Stephen Checkoway. January 31, 2018

CS 301. Lecture 05 Applications of Regular Languages. Stephen Checkoway. January 31, 2018 CS 301 Lecture 05 Applications of Regular Languages Stephen Checkoway January 31, 2018 1 / 17 Characterizing regular languages The following four statements about the language A are equivalent The language

More information

Lecture 5. Essential skills for bioinformatics: Unix/Linux

Lecture 5. Essential skills for bioinformatics: Unix/Linux Lecture 5 Essential skills for bioinformatics: Unix/Linux UNIX DATA TOOLS Text processing with awk We have illustrated two ways awk can come in handy: Filtering data using rules that can combine regular

More information

Text & Patterns. stat 579 Heike Hofmann

Text & Patterns. stat 579 Heike Hofmann Text & Patterns stat 579 Heike Hofmann Outline Character Variables Control Codes Patterns & Matching Baby Names Data The social security agency keeps track of all baby names used in applications for social

More information

Lecture 18 Regular Expressions

Lecture 18 Regular Expressions Lecture 18 Regular Expressions In this lecture Background Text processing languages Pattern searches with grep Formal Languages and regular expressions Finite State Machines Regular Expression Grammer

More information

Structure of Programming Languages Lecture 3

Structure of Programming Languages Lecture 3 Structure of Programming Languages Lecture 3 CSCI 6636 4536 Spring 2017 CSCI 6636 4536 Lecture 3... 1/25 Spring 2017 1 / 25 Outline 1 Finite Languages Deterministic Finite State Machines Lexical Analysis

More information

CS Advanced Unix Tools & Scripting

CS Advanced Unix Tools & Scripting & Scripting Spring 2011 Hussam Abu-Libdeh Today s slides are from David Slater February 25, 2011 Hussam Abu-Libdeh Today s slides are from David Slater & Scripting Random Bash Tip of the Day The more you

More information

Table of contents. Our goal. Notes. Notes. Notes. Summer June 29, Our goal is to see how we can use Unix as a tool for developing programs

Table of contents. Our goal. Notes. Notes. Notes. Summer June 29, Our goal is to see how we can use Unix as a tool for developing programs Summer 2010 Department of Computer Science and Engineering York University Toronto June 29, 2010 1 / 36 Table of contents 1 2 3 4 2 / 36 Our goal Our goal is to see how we can use Unix as a tool for developing

More information

CST Lab #5. Student Name: Student Number: Lab section:

CST Lab #5. Student Name: Student Number: Lab section: CST8177 - Lab #5 Student Name: Student Number: Lab section: Working with Regular Expressions (aka regex or RE) In-Lab Demo - List all the non-user accounts in /etc/passwd that use /sbin as their home directory.

More information

CSE 390a Lecture 7. Regular expressions, egrep, and sed

CSE 390a Lecture 7. Regular expressions, egrep, and sed CSE 390a Lecture 7 Regular expressions, egrep, and sed slides created by Marty Stepp, modified by Jessica Miller and Ruth Anderson http://www.cs.washington.edu/390a/ 1 2 Lecture summary regular expression

More information

Lecture 3 Tonight we dine in shell. Hands-On Unix System Administration DeCal

Lecture 3 Tonight we dine in shell. Hands-On Unix System Administration DeCal Lecture 3 Tonight we dine in shell Hands-On Unix System Administration DeCal 2012-09-17 Review $1, $2,...; $@, $*, $#, $0, $? environment variables env, export $HOME, $PATH $PS1=n\[\e[0;31m\]\u\[\e[m\]@\[\e[1;34m\]\w

More information

- c list The list specifies character positions.
