More text file manipulation: sorting, cutting, pasting, joining, subsetting,

1 More text file manipulation: sorting, cutting, pasting, joining, subsetting,
Laboratory of Genomics & Bioinformatics in Parasitology, Department of Parasitology, ICB, USP

2 Inverse cat
Last week we learned about viewing text with the cat (for concatenate) command. cat sends the contents of one or more text files (or STDIN) to STDOUT. But what if you want to get the contents from each file starting from the end instead of the beginning? Then you invert cat and use the tac command.

cat file

    Program version:
    Date: Thu May 4
    Seed:./18s
    Seed type: DNA
    Database: db/dma97
    Database type: fastq
    Duplicate headers name: yes
    Expansion direction: both
    Number of threads: 4
    Assembler used: abyss-pe

tac file

    Assembler used: abyss-pe
    Number of threads: 4
    Expansion direction: both
    Duplicate headers name: yes
    Database type: fastq
    Database: db/dma97
    Seed type: DNA
    Seed:./18s
    Date: Thu May 4
    Program version:
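As a small self-contained sketch of the same idea, the commands below build a three-line sample file (the file name and its contents are made up for illustration) and print it with cat and then with tac. It assumes GNU coreutils, which provides tac.

```shell
# Build a small three-line sample file (hypothetical content) in a temp directory.
tmpdir=$(mktemp -d)
printf 'Seed type: DNA\nDatabase type: fastq\nNumber of threads: 4\n' > "$tmpdir/report.txt"

# cat prints the lines first to last.
forward=$(cat "$tmpdir/report.txt")
# tac prints the same lines last to first.
reversed=$(tac "$tmpdir/report.txt")

printf '%s\n' "--- cat ---" "$forward" "--- tac ---" "$reversed"
rm -rf "$tmpdir"
```

Only the order of the lines changes; the characters within each line are left untouched.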

3 Getting columns
Tabular files frequently contain much more data than we are interested in. It is thus sometimes useful to be able to get at just one or a few columns of the data. The cut command allows us to do exactly that. cut uses the TAB character as the default column delimiter. Of course, there is an option to change that (-d). Another option (-f) tells the program which column(s) to retrieve.

4 Getting columns
Let's try! In the remote server ( ), run:

    cut -f 1 /data/column_example

Notice that only the first column (tab-delimited!) of the file was sent to STDOUT. Now try:

    cut -f 1 /data/column_example2
    cut -f 1 -d, /data/column_example2

In the first instance, nothing got cut, since there was no line containing a tab character (fields are separated by commas in that file). In the second instance, we change the field delimiter to a comma (,).
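The behaviour described above can be reproduced without the course server. The sketch below uses two made-up sample files (tabs.tsv and commas.csv are hypothetical names) holding the same rows, once tab-separated and once comma-separated.

```shell
tmpdir=$(mktemp -d)
# Tab-separated sample (hypothetical data): a fruit name, then a count.
printf 'avocado\t12\nlime\t4\napple\t5\n' > "$tmpdir/tabs.tsv"
# Comma-separated version of the same rows.
printf 'avocado,12\nlime,4\napple,5\n' > "$tmpdir/commas.csv"

# Default delimiter is TAB: only the first column is printed.
first_tab=$(cut -f 1 "$tmpdir/tabs.tsv")
# No TAB anywhere in this file: every line is passed through whole.
uncut=$(cut -f 1 "$tmpdir/commas.csv")
# With -d, the comma becomes the delimiter; -f 2 selects the counts.
second_comma=$(cut -d , -f 2 "$tmpdir/commas.csv")

printf '%s\n' "$first_tab" "$uncut" "$second_comma"
rm -rf "$tmpdir"
```

Lines that contain no delimiter at all are printed unchanged, which is why the second command appears to do nothing.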

5 Quiz time! Go to the Moodle site and choose Quiz 24 (beware time limits!)

6 Sorting data
One of the most basic and useful things a computer is used for is sorting data; many complex data analysis algorithms depend on sorted data. The sort command is the command-line tool that we can use for that. sort sorts the contents of STDIN or of one or more files and sends the resulting data to STDOUT. Notice that it said one or more files! We can merge the contents of different files while sorting. By default, sorting is performed on the content of the whole line, but there are many options that allow us to change the sorting behavior. Besides sorting, sort can also be used to just check whether some data is already sorted.
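A minimal sketch of the two points above: sorting several files into one stream, and using sort only as a check. The file names and contents are invented for illustration, and LC_ALL=C is set as an assumption so that the ordering is the same on every machine.

```shell
export LC_ALL=C   # assumption: plain byte-order comparison, for reproducible results
tmpdir=$(mktemp -d)
printf 'lime\napple\n' > "$tmpdir/a.txt"
printf 'date\nbanana\n' > "$tmpdir/b.txt"

# Several input files are sorted together into one output stream.
merged=$(sort "$tmpdir/a.txt" "$tmpdir/b.txt")
printf '%s\n' "$merged" > "$tmpdir/merged.txt"

# sort -c only checks: exit status 0 means "already sorted", non-zero means "not sorted".
if sort -c "$tmpdir/a.txt" 2>/dev/null; then a_sorted=yes; else a_sorted=no; fi
if sort -c "$tmpdir/merged.txt" 2>/dev/null; then merged_sorted=yes; else merged_sorted=no; fi

printf '%s\n' "$merged"
echo "a.txt sorted: $a_sorted; merged.txt sorted: $merged_sorted"
rm -rf "$tmpdir"
```

Because sort -c reports its answer through the exit status, it fits naturally into an if statement in a script.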

7 Sorting data
There are different kinds of sorting available, e.g., numerical and alphabetical. By default, the sort command does case-sensitive lexical sorting. Sort fields are separated by white-space (spaces or TAB), but that can be changed using an option. Let's try! In the remote server, as usual. First, let's look at the contents of the file /data/column_example as they are. Then, sort the file's contents:

    sort /data/column_example

That is the simplest sort command: it uses the whole line, the default delimiter, and the default sort type.

8 Sorting data

    Before        After
    avocado 12    apple 5
    lime 4        avocado 12
    apple 5       banana 3
    banana 3      date 20
    orange 6      lime 4
    date 20       orange 6

But what if we want the results to be in descending order? Or ordered using different fields (columns) instead of the whole line? Or with a field delimiter other than white-space? Or using numerical comparison?

9 Sorting data
There are different kinds of sorting available, e.g., numerical and alphabetical. Let's try sorting numerically now, using the second column:

    sort -k 2 -n /data/column_example

Option -k determines the column(s) to use in sorting. As used above, we are telling the program to use column 2 and whatever comes after it on the line; but there are many other ways:

    -k 2,2 : use just column 2
    -k 3,7 : use columns 3, 4, 5, 6, and 7
    -k 4,4n -k 5,5d : sort by column 4 (numerically), then break any ties by sorting column 5 (dictionary order)
    -k 1,1n -k 2,2M : sort first by column 1 (numerically), then break any ties by sorting column 2 (by month name, like JAN, FEB etc.)
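The key options can be tried on the fruit table from the earlier slide. This is a sketch under two assumptions: the columns are separated by a single space, and LC_ALL=C is set so that text comparison is plain byte order. The -r (reverse) and -t (field delimiter) options are standard sort options.

```shell
export LC_ALL=C   # assumption: byte-order comparison, for reproducible results
tmpdir=$(mktemp -d)
printf 'avocado 12\nlime 4\napple 5\nbanana 3\norange 6\ndate 20\n' > "$tmpdir/fruit.txt"

# Numeric sort on column 2 only.
by_count=$(sort -k 2,2n "$tmpdir/fruit.txt")
# Same key in descending order: r reverses the comparison.
by_count_desc=$(sort -k 2,2nr "$tmpdir/fruit.txt")
# Without n, column 2 is compared as text, so "12" and "20" come before "3".
by_count_text=$(sort -k 2,2 "$tmpdir/fruit.txt")
# -t changes the field delimiter, here to a comma.
by_count_csv=$(printf 'avocado,12\nlime,4\n' | sort -t , -k 2,2n)

printf '%s\n' "--- numeric ---" "$by_count" "--- descending ---" "$by_count_desc"
printf '%s\n' "--- as text ---" "$by_count_text" "--- comma-delimited ---" "$by_count_csv"
rm -rf "$tmpdir"
```

Comparing the numeric and text results side by side shows why -n matters whenever a column holds numbers of different lengths.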

10 Quiz time! Go to the Moodle site and choose Quiz 25 (beware time limits!)

11 Now you do it! Go to the Moodle site, Practical Exercise 23 Follow the instructions to answer the questions in the exercise (and beware any time limits!) Remember: in the PE, you should do things in practice before answering the question!

12 Unsorting data
We have learned how to sort data to get it in order. After that is done, there is no way to undo it, of course; the original order is lost (unless you kept the original file, obviously). But we can do something else to get data out of sorted order: shuffle it. That is done by the shuf command:

    shuf /data/shuf_example

More generally, shuf works as a randomness generator. For example:

    shuf -i 0-9 : print the digits 0 to 9, one per line, in random order
    shuf -i 0-9 -r -n 50 : print 50 random digits (0 to 9), with repetition
    shuf -e heads tails -r -n 50 : simulate 50 coin tosses
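Since the output of shuf differs on every run, the sketch below checks properties of the result rather than exact values. It assumes GNU coreutils, which provides shuf.

```shell
# Shuffle the digits 0-9: every digit appears exactly once, in random order.
shuffled=$(shuf -i 0-9)
# Sorting the shuffled output restores 0..9, showing nothing was lost or repeated.
restored=$(printf '%s\n' "$shuffled" | sort -n | tr '\n' ' ')

# 50 coin tosses: -r allows repeats, -n caps the number of output lines.
tosses=$(shuf -e heads tails -r -n 50)
toss_count=$(printf '%s\n' "$tosses" | wc -l)

echo "shuffled digits: $(printf '%s\n' "$shuffled" | tr '\n' ' ')"
echo "sorted again:    $restored"
echo "tosses: $toss_count"
# Tally of heads and tails (varies from run to run).
printf '%s\n' "$tosses" | sort | uniq -c
```

Without -r, shuf produces a permutation of its input; with -r, it samples with replacement, which is what a coin-toss simulation needs.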

13 Deleting parts of the data
The Unix program colrm allows us to delete columns from the data (from STDIN). Here, a column is defined as a single character. For example, in the line "acbce 12345 example", the first a is column 1, the first c is column 2, b is column 3, the first space is column 6, and the final e is column 19.

14 Deleting parts of the data
The exception here is tab: each tab advances the column count to the next multiple of eight. colrm takes two numbers to specify the columns. The first number is the first column to remove; the second, the last column to remove. If only one number is given, everything from there to the end of the line is removed. For example:

    colrm 8 <<< "acbce 12345 example"

Will result in:

    acbce 1

15 Deleting parts of the data
Another example:

    colrm 3 10 <<< "acbce 12345 example"

Will result in:

    ac5 example

Finally:

    colrm 3 30 <<< "acbce 12345 example"

Will result in:

    ac

That is: if the line is shorter than the second number, colrm just deletes to the end of the line.
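The column arithmetic can be cross-checked with cut -c, which keeps character ranges instead of deleting them. This is only a sketch: it swaps in cut for colrm (which is not installed on every system), uses a 19-character sample line chosen for illustration, and assumes the input contains no tabs.

```shell
line='acbce 12345 example'   # sample line: 19 characters, no tabs

# Same effect as "colrm 8": keep columns 1-7, drop everything after.
keep_head=$(printf '%s\n' "$line" | cut -c 1-7)
# Same effect as "colrm 3 10": keep columns 1-2 and 11 onward.
drop_middle=$(printf '%s\n' "$line" | cut -c 1-2,11-)
# Same effect as "colrm 3 30": the line ends before column 31, so only columns 1-2 remain.
drop_tail=$(printf '%s\n' "$line" | cut -c 1-2,31-)

printf '%s\n' "$keep_head" "$drop_middle" "$drop_tail"
```

colrm states what to remove, while cut -c states what to keep; which one reads better depends on whether the unwanted or the wanted part is easier to describe.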

16 Putting things together
We have learned how to get columns out of a file with cut and colrm. Some other commands, such as paste, join different files into one. Typically, each one of the input files to paste will be present as a column in the new file, creating a table. For example:

    paste /data/p1 /data/p2 /data/p3

If you look at the original contents of those files, you will see that everything that was in the first file was put in column 1 of the output, everything that was in the second one was put in column 2, and so on. One interesting option to paste is -s (for serial), which basically transposes the data. Try it!
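A small sketch of both modes of paste, using three made-up two-line files (names.txt, counts.txt and colors.txt are hypothetical):

```shell
tmpdir=$(mktemp -d)
printf 'apple\nbanana\n' > "$tmpdir/names.txt"
printf '5\n3\n' > "$tmpdir/counts.txt"
printf 'red\nyellow\n' > "$tmpdir/colors.txt"

# Default mode: each input file becomes one TAB-separated column.
table=$(paste "$tmpdir/names.txt" "$tmpdir/counts.txt" "$tmpdir/colors.txt")
# Serial mode (-s): each input file becomes one row instead.
transposed=$(paste -s "$tmpdir/names.txt" "$tmpdir/counts.txt" "$tmpdir/colors.txt")

printf '%s\n' "--- paste ---" "$table" "--- paste -s ---" "$transposed"
rm -rf "$tmpdir"
```

The default output has two rows of three fields; with -s it has three rows of two fields, which is the transposition mentioned above.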

17 Putting things together
Another command to join files is join. join works as a very primitive database function. It joins lines from two files based on one common column containing an identifier string. The files must be sorted based on that common column! For example:

    join -j 1 /data/j4 /data/j5

If you look at the original contents of those two files, you will see that column 1 was the "join field": whenever that was the same in both files, the lines got joined in the output. When there is no match, by default the line is not included in the output.
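The matching rule can be seen on two tiny files that are already sorted on their first column. The gene identifiers and annotations below are invented for illustration. The -a 1 option, which also prints unmatched lines from the first file, is a standard join option not covered in the text above.

```shell
export LC_ALL=C   # assumption: same collation for sorting and joining
tmpdir=$(mktemp -d)
# Both files are sorted on column 1, the join field.
printf 'g1 kinase\ng2 protease\ng4 transporter\n' > "$tmpdir/function.txt"
printf 'g1 chr1\ng3 chr2\ng4 chr5\n' > "$tmpdir/location.txt"

# Only identifiers present in both files (g1 and g4) appear in the output.
joined=$(join -j 1 "$tmpdir/function.txt" "$tmpdir/location.txt")
# -a 1 additionally keeps the unmatched lines of the first file (g2).
with_unpaired=$(join -a 1 "$tmpdir/function.txt" "$tmpdir/location.txt")

printf '%s\n' "--- join ---" "$joined" "--- join -a 1 ---" "$with_unpaired"
rm -rf "$tmpdir"
```

If either input were not sorted on the join field, join could silently miss matches, so unsorted files are usually passed through sort first.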

18 Quiz time! Go to the Moodle site and choose Quiz 26 (beware time limits!)

19 Now you do it! Go to the Moodle site, Practical Exercise 24 Follow the instructions to answer the questions in the exercise (and beware any time limits!) Remember: in the PE, you should do things in practice before answering the question!

20 Translating characters
Another way to modify a text stream is by modifying some characters from a list. The tr (translate) program translates (i.e., substitutes), squeezes (i.e., eliminates repetitions), and deletes characters. The default for tr is to translate: characters from set 1 will be replaced with the corresponding characters from set 2 whenever found. For example:

    tr a-z A-Z < /data/file_y

If you look at the original contents of that file, you will see that all lower-case letters have been changed to upper-case.
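A short sketch of translating and squeezing on inline sample strings (the DNA strings are made up for illustration); -s is the squeeze option of tr:

```shell
export LC_ALL=C   # assumption: single-byte characters, so ranges behave predictably

# Translate: each lower-case letter is replaced by its upper-case counterpart.
upper=$(printf 'acgt acgt\n' | tr a-z A-Z)
# Sets are matched position by position: A->T, C->G, G->C, T->A.
complement=$(printf 'AACGT\n' | tr ACGT TGCA)
# Squeeze: -s collapses each run of repeated spaces into a single space.
squeezed=$(printf 'a   b     c\n' | tr -s ' ')

printf '%s\n' "$upper" "$complement" "$squeezed"
```

Because tr maps characters by position in the two sets, the sets should normally have the same length.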

21 Translating characters
That could be written as:

    tr '[:lower:]' '[:upper:]'

The [: :] part is what is called a character class. There are several such classes, recognized by tr and by other POSIX tools, and they are shortcut ways of referring to a set of characters by a name. The character classes listed in the man and info pages for tr are (check them out for explanations of each):

    alnum  alpha  blank  cntrl  digit  graph
    lower  print  punct  space  upper  xdigit

22 Translating characters
Some of the character classes, such as cntrl, can be very useful to remove certain non-printing (invisible) characters from data, which often lead to errors in data processing and are hard to spot and edit out manually. A useful option is -c (or -C or --complement), which will get the complement of the characters listed, i.e., whatever is not in the list. For example:

    tr -d -c '[:alnum:]' < /data/p3

That command will delete everything in file /data/p3 that is not a letter or number, including spaces and newlines.
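The deletion options can be checked on inline sample strings (all invented for illustration). Note how the newline is itself a control character and is not alphanumeric, so it is deleted too unless it is explicitly kept.

```shell
export LC_ALL=C   # assumption: ASCIImeaning of the character classes

# Delete control characters: the TAB, carriage return and newline all disappear.
no_control=$(printf 'ab\tcd\r\n' | tr -d '[:cntrl:]')
# Delete the complement of alnum: only letters and digits survive.
only_alnum=$(printf 'ID: ab-12, cd_34\n' | tr -d -c '[:alnum:]')
# Adding \n to the set keeps the line structure intact.
keep_lines=$(printf 'a-1\nb-2\n' | tr -d -c '[:alnum:]\n')

printf '%s\n' "$no_control" "$only_alnum" "$keep_lines"
```

Adding '\n' to the protected set, as in the last command, is a common way to clean each line without merging all lines into one.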

23 Splitting data
Sometimes a file is too large, and you want to split it into smaller files. Maybe you want to send different parts to different people, or you only need a sample, etc. The split command allows us to break files into smaller subsets. By default, it puts the first one thousand lines into a new file, then starts another file with up to one thousand lines, and so on. But it is possible to ask for other criteria, such as the total number of final files you want or the maximum number of bytes per file. For example:

    split /data/file_x

This is the simplest case, and will split the file into smaller files, each containing 1000 lines from the original, until the end. Try it!

24 Splitting data
By default, split creates new files called xaa, xab, xac etc. Those names are made up of two parts: the prefix (x) and a suffix (aa, ab, ac etc.). Both the prefix and the length of the suffix (in number of characters) can be changed. For example:

    split -n 10 file_1 P

Here, -n 10 asks for ten output files, and the prefix for the files created will be P instead of x. Options control the length of the suffix (-a) and what (if any) additional suffix you want to use (--additional-suffix).
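A sketch of the default behaviour and of a custom prefix, run in a temporary directory on a made-up 2500-line file. It uses -l (lines per file), a standard split option, instead of -n so that no line is cut in half.

```shell
tmpdir=$(mktemp -d)
# Sample input: the numbers 1 to 2500, one per line.
seq 1 2500 > "$tmpdir/numbers.txt"

# Default: 1000 lines per file, prefix x -> xaa, xab, xac.
split "$tmpdir/numbers.txt" "$tmpdir/x"
default_parts=$(ls "$tmpdir" | grep -c '^x[a-z][a-z]$')
last_lines=$(wc -l < "$tmpdir/xac")

# Concatenating the pieces in name order reproduces the original file.
if cat "$tmpdir"/x?? | cmp -s - "$tmpdir/numbers.txt"; then roundtrip=identical; else roundtrip=different; fi

# 500 lines per file with prefix P -> Paa ... Pae.
split -l 500 "$tmpdir/numbers.txt" "$tmpdir/P"
p_parts=$(ls "$tmpdir" | grep -c '^P[a-z][a-z]$')

echo "default pieces: $default_parts (last one has $last_lines lines), round trip: $roundtrip"
echo "pieces with prefix P: $p_parts"
rm -rf "$tmpdir"
```

The alphabetical suffixes make the pieces sort in their original order, which is why a simple cat with a wildcard is enough to reassemble them.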

25 Now you do it! Go to the Moodle site, Practical Exercise 25 Follow the instructions to answer the questions in the exercise (and beware any time limits!) Remember: in the PE, you should do things in practice before answering the question!

26 Recap
The number of Unix text tools keeps on growing.
tac is an easy way to invert the lines of a file.
One of the most basic computer tasks, being at the basis of many advanced data analysis algorithms, is sorting; sort is a very flexible program to do that.
shuf provides a randomness generator, which can be very useful.
Large data files are often tabular in nature, and Unix has a few programs designed to deal with that kind of file.
cut allows us to retrieve one or more columns from a file.

27 Recap
paste and join, on the other hand, enable the building of larger, multi-column files out of smaller files containing the individual columns.
tr (translate) is a quick way of replacing or deleting certain characters.
colrm provides a way to delete certain columns (for example, from characters 5 to 9) from each line of a file.
split lets us break a large file into smaller ones, based on the number of lines or bytes that we would like to have in each resulting file.


More information

Unit 3 Fill Series, Functions, Sorting

Unit 3 Fill Series, Functions, Sorting Unit 3 Fill Series, Functions, Sorting Fill enter repetitive values or formulas in an indicated direction Using the Fill command is much faster than using copy and paste you can do entire operation in

More information

Unit 3 Functions Review, Fill Series, Sorting, Merge & Center

Unit 3 Functions Review, Fill Series, Sorting, Merge & Center Unit 3 Functions Review, Fill Series, Sorting, Merge & Center Function built-in formula that performs simple or complex calculations automatically names a function instead of using operators (+, -, *,

More information

Microsoft Word. Part 2. Hanging Indent

Microsoft Word. Part 2. Hanging Indent Microsoft Word Part 2 Hanging Indent 1 The hanging indent feature indents each line except the first line by the amount specified in the By field in the Paragraph option under the format option, as shown

More information

Magento Extension User Guide PRODUCTS FINDER SOLUTION. for Magento 2

Magento Extension User Guide PRODUCTS FINDER SOLUTION. for Magento 2 Magento Extension User Guide PRODUCTS FINDER SOLUTION for Magento 2 Table of contents 1. Key Features 1.1. Display flter in one related category 1.2. Create limitless number of product fnders 1.3. Auto

More information

/ Cloud Computing. Recitation 3 Sep 13 & 15, 2016

/ Cloud Computing. Recitation 3 Sep 13 & 15, 2016 15-319 / 15-619 Cloud Computing Recitation 3 Sep 13 & 15, 2016 1 Overview Administrative Issues Last Week s Reflection Project 1.1, OLI Unit 1, Quiz 1 This Week s Schedule Project1.2, OLI Unit 2, Module

More information

Pattern Matching. An Introduction to File Globs and Regular Expressions

Pattern Matching. An Introduction to File Globs and Regular Expressions Pattern Matching An Introduction to File Globs and Regular Expressions Copyright 2006 2009 Stewart Weiss The danger that lies ahead Much to your disadvantage, there are two different forms of patterns

More information

Dalhousie University CSCI 2132 Software Development Winter 2018 Lab 2, January 25

Dalhousie University CSCI 2132 Software Development Winter 2018 Lab 2, January 25 Dalhousie University CSCI 2132 Software Development Winter 2018 Lab 2, January 25 In this lab, you will first learn autocompletion, a feature of the Bash shell. You will also learn more about the command

More information

Lecture 05 I/O statements Printf, Scanf Simple statements, Compound statements

Lecture 05 I/O statements Printf, Scanf Simple statements, Compound statements Programming, Data Structures and Algorithms Prof. Shankar Balachandran Department of Computer Science and Engineering Indian Institute of Technology, Madras Lecture 05 I/O statements Printf, Scanf Simple

More information

CS Unix Tools & Scripting

CS Unix Tools & Scripting Cornell University, Spring 2014 1 February 7, 2014 1 Slides evolved from previous versions by Hussam Abu-Libdeh and David Slater Regular Expression A new level of mastery over your data. Pattern matching

More information

IB047. Unix Text Tools. Pavel Rychlý Mar 3.

IB047. Unix Text Tools. Pavel Rychlý Mar 3. Unix Text Tools pary@fi.muni.cz 2014 Mar 3 Unix Text Tools Tradition Unix has tools for text processing from the very beginning (1970s) Small, simple tools, each tool doing only one operation Pipe (pipeline):

More information

Excel Basics Fall 2016

Excel Basics Fall 2016 If you have never worked with Excel, it can be a little confusing at first. When you open Excel, you are faced with various toolbars and menus and a big, empty grid. So what do you do with it? The great

More information

Pattern Matching. An Introduction to File Globs and Regular Expressions. Adapted from Practical Unix and Programming Hunter College

Pattern Matching. An Introduction to File Globs and Regular Expressions. Adapted from Practical Unix and Programming Hunter College Pattern Matching An Introduction to File Globs and Regular Expressions Adapted from Practical Unix and Programming Hunter College Copyright 2006 2009 Stewart Weiss The danger that lies ahead Much to your

More information

Essential Skills for Bioinformatics: Unix/Linux

Essential Skills for Bioinformatics: Unix/Linux Essential Skills for Bioinformatics: Unix/Linux WORKING WITH COMPRESSED DATA Overview Data compression, the process of condensing data so that it takes up less space (on disk drives, in memory, or across

More information

COMS 6100 Class Notes 3

COMS 6100 Class Notes 3 COMS 6100 Class Notes 3 Daniel Solus September 1, 2016 1 General Remarks The class was split into two main sections. We finished our introduction to Linux commands by reviewing Linux commands I and II

More information

Practical Linux examples: Exercises

Practical Linux examples: Exercises Practical Linux examples: Exercises 1. Login (ssh) to the machine that you are assigned for this workshop (assigned machines: https://cbsu.tc.cornell.edu/ww/machines.aspx?i=87 ). Prepare working directory,

More information

DEVELOPING DATABASE APPLICATIONS (INTERMEDIATE MICROSOFT ACCESS, X405.5)

DEVELOPING DATABASE APPLICATIONS (INTERMEDIATE MICROSOFT ACCESS, X405.5) Technology & Information Management Instructor: Michael Kremer, Ph.D. Database Program: Microsoft Access Series DEVELOPING DATABASE APPLICATIONS (INTERMEDIATE MICROSOFT ACCESS, X405.5) Section 8 AGENDA

More information

Merge Conflicts p. 92 More GitHub Workflows: Forking and Pull Requests p. 97 Using Git to Make Life Easier: Working with Past Commits p.

Merge Conflicts p. 92 More GitHub Workflows: Forking and Pull Requests p. 97 Using Git to Make Life Easier: Working with Past Commits p. Preface p. xiii Ideology: Data Skills for Robust and Reproducible Bioinformatics How to Learn Bioinformatics p. 1 Why Bioinformatics? Biology's Growing Data p. 1 Learning Data Skills to Learn Bioinformatics

More information

Table of contents. Excel in English. Important information LAYOUT. Format Painter. Format Painter. Fix columns. Fix columns.

Table of contents. Excel in English. Important information LAYOUT. Format Painter. Format Painter. Fix columns. Fix columns. Table of contents 1. Excel in English 2. Important information 3. LAYOUT 4. Format Painter 5. Format Painter 6. Fix columns 7. Fix columns 8. Create a grid 9. Create a grid 10. Create a numeric sequence

More information

Hadoop streaming is an alternative way to program Hadoop than the traditional approach of writing and compiling Java code.

Hadoop streaming is an alternative way to program Hadoop than the traditional approach of writing and compiling Java code. title: "Data Analytics with HPC: Hadoop Walkthrough" In this walkthrough you will learn to execute simple Hadoop Map/Reduce jobs on a Hadoop cluster. We will use Hadoop to count the occurrences of words

More information

Excel Training - Beginner March 14, 2018

Excel Training - Beginner March 14, 2018 Excel Training - Beginner March 14, 2018 Working File File was emailed to you this morning, please log in to your email, download and open the file. Once you have the file PLEASE CLOSE YOUR EMAIL. Open

More information

User Commands sed ( 1 )

User Commands sed ( 1 ) NAME sed stream editor SYNOPSIS /usr/bin/sed [-n] script [file...] /usr/bin/sed [-n] [-e script]... [-f script_file]... [file...] /usr/xpg4/bin/sed [-n] script [file...] /usr/xpg4/bin/sed [-n] [-e script]...

More information

Unix as a Platform Exercises + Solutions. Course Code: OS 01 UNXPLAT

Unix as a Platform Exercises + Solutions. Course Code: OS 01 UNXPLAT Unix as a Platform Exercises + Solutions Course Code: OS 01 UNXPLAT Working with Unix Most if not all of these will require some investigation in the man pages. That's the idea, to get them used to looking

More information

Unix basics exercise MBV-INFX410

Unix basics exercise MBV-INFX410 Unix basics exercise MBV-INFX410 In order to start this exercise, you need to be logged in on a UNIX computer with a terminal window open on your computer. It is best if you are logged in on freebee.abel.uio.no.

More information

genome[phd14]:/home/people/phd14/alignment >

genome[phd14]:/home/people/phd14/alignment > Unix Introduction to Unix Shell There is a special type of window called shell or terminalwindow. Terminal windows are the principal vehicle of interaction with a UNIX machine. Their function is to perform

More information

Excel Shortcuts Increasing YOUR Productivity

Excel Shortcuts Increasing YOUR Productivity Excel Shortcuts Increasing YOUR Productivity CompuHELP Division of Tommy Harrington Enterprises, Inc. tommy@tommyharrington.com https://www.facebook.com/tommyharringtonextremeexcel Excel Shortcuts Increasing

More information

Basic Linux (Bash) Commands

Basic Linux (Bash) Commands Basic Linux (Bash) Commands Hint: Run commands in the emacs shell (emacs -nw, then M-x shell) instead of the terminal. It eases searching for and revising commands and navigating and copying-and-pasting

More information

Data. Selecting Data. Sorting Data

Data. Selecting Data. Sorting Data 1 of 1 Data Selecting Data To select a large range of cells: Click on the first cell in the area you want to select Scroll down to the last cell and hold down the Shift key while you click on it. This

More information

Pathologically Eclectic Rubbish Lister

Pathologically Eclectic Rubbish Lister Pathologically Eclectic Rubbish Lister 1 Perl Design Philosophy Author: Reuben Francis Cornel perl is an acronym for Practical Extraction and Report Language. But I guess the title is a rough translation

More information

Using Microsoft Word. Text Tools. Spell Check

Using Microsoft Word. Text Tools. Spell Check Using Microsoft Word In addition to the editing tools covered in the previous section, Word has a number of other tools to assist in working with test documents. There are tools to help you find and correct

More information

The toolbars at the top are the standard toolbar and the formatting toolbar.

The toolbars at the top are the standard toolbar and the formatting toolbar. Lecture 8 EXCEL Excel is a spreadsheet (all originally developed for bookkeeping and accounting). It is very useful for any mathematical or tabular operations. It allows you to make quick changes in input

More information

Using Microsoft Excel

Using Microsoft Excel Using Microsoft Excel Formatting a spreadsheet means changing the way it looks to make it neater and more attractive. Formatting changes can include modifying number styles, text size and colours. Many

More information

CSE 390a Lecture 2. Exploring Shell Commands, Streams, Redirection, and Processes

CSE 390a Lecture 2. Exploring Shell Commands, Streams, Redirection, and Processes CSE 390a Lecture 2 Exploring Shell Commands, Streams, Redirection, and Processes slides created by Marty Stepp, modified by Jessica Miller & Ruth Anderson http://www.cs.washington.edu/390a/ 1 2 Lecture

More information

The inverse of a matrix

The inverse of a matrix The inverse of a matrix A matrix that has an inverse is called invertible. A matrix that does not have an inverse is called singular. Most matrices don't have an inverse. The only kind of matrix that has

More information

Basic Unix Command. It is used to see the manual of the various command. It helps in selecting the correct options

Basic Unix Command. It is used to see the manual of the various command. It helps in selecting the correct options Basic Unix Command The Unix command has the following common pattern command_name options argument(s) Here we are trying to give some of the basic unix command in Unix Information Related man It is used

More information