Introduction to Unix/Linux INX_S17, Day 8

Introduction to Unix/Linux INX_S17, Day 8, 2017-04-21
stdin, stdout, stderr, piping, iterative filtering, grep, cat, UUOC
Learning Outcome(s):
Redirect the standard output to the standard input stream of another program.
Distinguish between the standard output and standard error streams.
Utilize the tools grep and wc for basic analysis of bioinformatics data.
Matthew Peterson, OSU CGRB, matthew@cgrb.oregonstate.edu
Please do not redistribute outside of OSU; contains copyrighted materials.

Papilio zelicaon FASTA
Papilio zelicaon is the anise swallowtail, a common butterfly of western North America.
pz_cdnas.fasta, from the book, contains 471 de novo assembled transcript sequences.
"De novo" means the sequences were assembled without a reference genome. Transcripts are DNA sequences that are copied to RNA (mRNA), the first step of gene expression.
http://teaching.cgrb.oregonstate.edu/cgrb/oneil/primer/parti_8_the_standard_streams.html
Image attribution: User:Calibas, https://en.wikipedia.org/wiki/papilio_zelicaon

fasta_stats
A Python script from the book that generates statistics about the FASTA file it is given.
Run without arguments, it prints a usage message:
    ./fasta_stats
    Usage: fasta_dna_stats <fasta_file>
    This script is for informational purposes only, and requires that the input file be a DNA (As, Ts, Cs, and Gs) FASTA-formatted file.
Run on a sample file:
    ./fasta_stats pz_cdnas_sample.fasta    # Only 2 sequences in this FASTA file
http://teaching.cgrb.oregonstate.edu/cgrb/oneil/primer/parti_8_the_standard_streams.html

PZ7180000031590 example
Column:  1                2      3    4      5  6           7   8
Value:   PZ7180000031590  0.378  486  ACAAA  5  unit:attta  10  pentanucleotide
1: Sequence ID
2: GC content of 37.8% (vs. AT)
3: 486 base pairs (bp) long
4: Most common 5-bp sequence is ACAAA
5: This 5-bp sequence ACAAA appears 5 times
6: Longest perfect repeat sequence is ATTTA
7: and is 10 bp long
8: caused by the pentanucleotide ATTTA appearing twice (ATTTAATTTA)

stdout redirection?
    ./fasta_stats pz_cdnas_sample.fasta > pz_sample_stats.txt
    Processing sequence ID PZ7180000031590
    Processing sequence ID PZ7180000000004_TX
Why was some output sent to the file (>) while two lines appeared on the terminal?
Was everything from stdout redirected to the file?
Does the file contain everything except the two lines above?
    less -S pz_sample_stats.txt

stderr: Standard Error
The information printed to the terminal is coming not from stdout but from a second stream, stderr ("standard error").
By default, stderr is printed to the terminal.
stderr usually contains informational or error messages. For example, the following was not part of the results, just an FYI:
    Processing sequence ID PZ7180000031590
    Processing sequence ID PZ7180000000004_TX

stdout and stderr
Programs produce 2 streams:
stdout, which can be redirected via >
stderr, which by default is printed to the terminal

Capturing stderr to a file on tcsh
The 2> operator for capturing stderr to a file works in bash (and other Bourne-style shells) but not in tcsh, so tcsh needs a workaround:
    ( ./fasta_stats pz_cdnas_sample.fasta > pz_sample_stats.txt ) >& pz_sample_stats.err.txt
Two independent redirects are run:
( cmd > output ) redirects stdout first, inside the parentheses.
>& then redirects whatever is left, which is only stderr.
http://teaching.cgrb.oregonstate.edu/cgrb/oneil/primer/parti_8_the_standard_streams.html#tcshstderr
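For comparison, here is a minimal sketch of the bash-style 2> redirect mentioned on this slide. It is not tcsh syntax, and fake_stats is a hypothetical stand-in for fasta_stats (invented here so the example runs without the course files); it writes one result line to stdout and one progress note to stderr.

```shell
# Runs in bash or any POSIX sh (NOT in tcsh).
# Hypothetical stand-in for fasta_stats, not the real script:
# one result line goes to stdout, one progress note goes to stderr.
fake_stats() {
    echo "Processing sequence ID SEQ1" >&2
    echo "SEQ1 0.378 486 ACAAA 5 unit:attta 10 pentanucleotide"
}

workdir=$(mktemp -d)

# > redirects stdout and 2> redirects stderr, each to its own file.
fake_stats > "$workdir/stats.txt" 2> "$workdir/stats.err.txt"

results=$(cat "$workdir/stats.txt")
notes=$(cat "$workdir/stats.err.txt")
echo "stdout file: $results"
echo "stderr file: $notes"

rm -rf "$workdir"
```

The two files end up holding the same split that the tcsh workaround produces: results in one, progress notes in the other.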

Capturing stdout and stderr to a file
If you want to capture both stdout and stderr to the same file, you can use >&
    ./fasta_stats pz_cdnas_sample.fasta >& pz_sample_stats_all.txt
Nothing appears on the terminal.
Everything is captured to the file.
http://teaching.cgrb.oregonstate.edu/cgrb/oneil/primer/parti_8_the_standard_streams.html#tcshstderr
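A rough equivalent in bash or POSIX sh, offered only as a sketch: stdout is redirected to the file, and 2>&1 then points stderr at the same place. As before, fake_stats is a hypothetical stand-in for fasta_stats.

```shell
# Runs in bash or any POSIX sh (NOT in tcsh).
# Hypothetical stand-in for fasta_stats, not the real script.
fake_stats() {
    echo "Processing sequence ID SEQ1" >&2
    echo "SEQ1 0.378 486 ACAAA 5 unit:attta 10 pentanucleotide"
}

workdir=$(mktemp -d)

# Send stdout to the file, then send stderr (2) wherever stdout (1) now goes.
fake_stats > "$workdir/all.txt" 2>&1

# Both the result line and the progress note are in the file.
all_lines=$(wc -l < "$workdir/all.txt" | tr -d ' ')
echo "lines captured: $all_lines"

rm -rf "$workdir"
```

The order matters in this form: 2>&1 has to come after the > redirect, otherwise stderr is pointed at the terminal before stdout is moved to the file.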

grep: Extracting a pattern
    grep '<pattern>' file
    ./fasta_stats pz_cdnas.fasta > pz_stats.txt
    grep 'unit:' pz_stats.txt
This ignores the informational lines and shows only the output we care about (the lines that contain the pattern 'unit:' that we specified).
To show only the lines that do not match a pattern, use -v:
    grep -v 'unit:' pz_stats.txt
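A small sketch of grep with and without -v, run on an invented three-line file rather than the real pz_stats.txt (the file contents and the name sample_stats.txt are assumptions made for illustration). The -c option, which is standard grep, counts matching lines instead of printing them.

```shell
workdir=$(mktemp -d)
stats="$workdir/sample_stats.txt"

# Hypothetical sample: one informational line and two result lines.
printf '%s\n' \
    '# informational line invented for this sketch' \
    'SEQ1 0.378 486 ACAAA 5 unit:attta 10 pentanucleotide' \
    'SEQ2 0.411 512 TTTTA 4 unit:at 6 dinucleotide' > "$stats"

grep 'unit:' "$stats"       # prints the two result lines
grep -v 'unit:' "$stats"    # prints only the informational line

# -c counts lines instead of printing them.
matching=$(grep -c 'unit:' "$stats")
non_matching=$(grep -v -c 'unit:' "$stats")
echo "matching: $matching, non-matching: $non_matching"

rm -rf "$workdir"
```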

grep results to a file
    grep 'unit:' pz_stats.txt > pz_stats.table
    less -S pz_stats.table

wc: Word count (lines, words, chars)
    wc <file>
    wc pz_stats.table
To count only lines (not words or characters), use -l:
    wc -l pz_stats.table
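As a quick illustration, the sketch below runs wc on a two-line file invented for this example (not the real pz_stats.table). Reading the file through < makes wc print only the number, and tr strips the padding some wc versions add.

```shell
workdir=$(mktemp -d)
table="$workdir/sample.table"

# Hypothetical two-line table invented for this sketch.
printf 'SEQ1 unit:attta 10\nSEQ2 unit:at 6\n' > "$table"

wc "$table"    # lines, words, and bytes, followed by the file name

# Individual counts; tr removes leading spaces printed by some wc versions.
lines=$(wc -l < "$table" | tr -d ' ')
words=$(wc -w < "$table" | tr -d ' ')
bytes=$(wc -c < "$table" | tr -d ' ')
echo "lines=$lines words=$words bytes=$bytes"

rm -rf "$workdir"
```

Note that -c counts bytes, which equals the number of characters only for plain ASCII text like this.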

Reading from stdin (Standard input)
stdin is a source of input for programs (other than reading from files directly).
By default, stdin is read from the terminal (the keyboard).

stdin example using | ("pipe")
grep | wc
    rm pz_stats.table
    grep 'unit:' pz_stats.txt | wc

More piping examples
    ./fasta_stats pz_cdnas.fasta | grep 'unit:' | wc
Each pipe connects the stdout of the command on its left to the stdin of the command on its right.
Two pipes (|) chaining:
fasta_stats, which pipes (|) its stdout to:
stdin of grep, which pipes (|) its stdout to:
stdin of wc, which outputs to stdout (terminal)
Note: stderr is still printed to the terminal!
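The note about stderr can be checked with a small sketch in bash or POSIX sh. fake_stats is a hypothetical stand-in for fasta_stats, and stderr is sent to a file here only so that it can be counted; by default it would simply appear on the terminal.

```shell
# Runs in bash or any POSIX sh (NOT in tcsh, because of 2>).
# Hypothetical stand-in for fasta_stats, not the real script.
fake_stats() {
    echo "Processing sequence ID SEQ1" >&2
    echo "SEQ1 0.378 486 ACAAA 5 unit:attta 10 pentanucleotide"
    echo "Processing sequence ID SEQ2" >&2
    echo "SEQ2 0.411 512 TTTTA 4 unit:at 6 dinucleotide"
}

workdir=$(mktemp -d)

# stdout flows through both pipes; stderr goes to a file for inspection.
unit_lines=$(fake_stats 2> "$workdir/notes.txt" | grep 'unit:' | wc -l | tr -d ' ')

# The progress notes never entered the pipe, so grep cannot see them.
piped_notes=$(fake_stats 2> /dev/null | grep -c 'Processing')
stderr_notes=$(wc -l < "$workdir/notes.txt" | tr -d ' ')

echo "result lines through the pipe: $unit_lines"
echo "progress notes seen by grep:   $piped_notes"
echo "progress notes on stderr:      $stderr_notes"

rm -rf "$workdir"
```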

cat: Concatenate files
    cat <filename>
It can be used to send files to stdout.
    cat <file1> <file2> <file3>
It can also be used to concatenate files to stdout.
Q: Does the following work?
    cat <file1> | grep 'pattern'
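One way to explore the question is to compare the piped form with giving grep the file name directly. The sketch below uses a two-line file invented for this purpose; both forms print the same line, and the extra cat process in the first is what UUOC ("useless use of cat") refers to.

```shell
workdir=$(mktemp -d)
stats="$workdir/sample_stats.txt"

# Hypothetical two-line file invented for this sketch.
printf '%s\n' 'SEQ1 unit:attta 10' 'SEQ2 no repeat' > "$stats"

# Form 1: cat sends the file to stdout, grep reads it on stdin.
via_cat=$(cat "$stats" | grep 'unit:')

# Form 2: grep opens the file itself; no cat process is needed.
direct=$(grep 'unit:' "$stats")

echo "via cat: $via_cat"
echo "direct:  $direct"

rm -rf "$workdir"
```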

\ Commands over multiple lines
    ./fasta_stats pz_cdnas.fasta | \
      grep 'unit:' | \
      wc
In tcsh, the continuation lines are shown with a ? prompt:
    ./fasta_stats pz_cdnas.fasta | \
    ? grep 'unit:' | \
    ? wc
What happens if we press the up arrow after running it?

Pipelines!
    cmd1 | cmd2 | cmd3 | cmd4 | cmd5
Chaining several commands together with pipes (|) on a line is known as a pipeline!

Pipelines in general
The term "pipeline" is often used for any series of steps from input data to output data, e.g., the Muscle/HMMER pipeline we built.

Finding AT repeats
    ./fasta_stats pz_cdnas.fasta | \
      grep 'unit:at' | \
      less -S
This also matches longer units that start with AT, such as ATT and ATG, not just AT.
Not exactly what we wanted.

Filtering for AT dinucleotides
    ./fasta_stats pz_cdnas.fasta | \
      grep 'unit:at' | \
      grep 'dinucleotide' | less -S
Good, now we're getting just the ATs.

Counting AT dinucleotides via wc -l
    ./fasta_stats pz_cdnas.fasta | \
      grep 'unit:at' | \
      grep 'dinucleotide' | wc -l
The "Processing sequence ID" lines go to stderr, so they are not counted.
Final tally of AT dinucleotides: 22
Q: What did we just do?
A: Iterative development!
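The same iterative narrowing can be sketched on invented data. fake_stats is a hypothetical stand-in for fasta_stats, and its five result lines are made up so that the counts are easy to follow; they say nothing about the real pz_cdnas.fasta results.

```shell
# Runs in bash or any POSIX sh (NOT in tcsh, because of 2>).
# Hypothetical stand-in for fasta_stats with invented result lines.
fake_stats() {
    echo "Processing sequence ID SEQ1" >&2
    echo "SEQ1 0.40 300 ATATA 3 unit:at 6 dinucleotide"
    echo "SEQ2 0.35 280 ATTAT 2 unit:att 9 trinucleotide"
    echo "SEQ3 0.52 410 ATGAT 2 unit:atg 6 trinucleotide"
    echo "SEQ4 0.47 350 CACAC 4 unit:ca 8 dinucleotide"
    echo "SEQ5 0.38 500 TATAT 3 unit:at 10 dinucleotide"
}

# Step 1: 'unit:at' also matches unit:att and unit:atg.
step1=$(fake_stats 2> /dev/null | grep 'unit:at' | wc -l | tr -d ' ')

# Step 2: a second filter keeps only the dinucleotide lines.
step2=$(fake_stats 2> /dev/null | grep 'unit:at' | grep 'dinucleotide' | wc -l | tr -d ' ')

echo "after first filter:  $step1"
echo "after second filter: $step2"
```

Checking the count after each added filter is the point of the exercise: the first number shows the pattern is too broad, and the second shows the extra grep fixes it.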

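Script time! (reuse this code)
    nano count_ats.sh
$# is the number of parameters passed to the script.
Contents of count_ats.sh:
#!/bin/tcsh
# Require exactly one parameter: the FASTA file to analyze
if ( $# != 1 ) then
    echo "Wrong number of parameters"
    echo "Usage: $0 <fasta_file>"
    exit
endif
setenv file $1
# Count the sequences whose longest perfect repeat is an AT dinucleotide
fasta_stats $file | \
    grep 'unit:at' | \
    grep 'dinucleotide' | \
    wc -l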
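For readers working in bash rather than tcsh, the same parameter check and pipeline might look roughly like the sketch below. It is an illustration, not a port of the course script: count_ats is written as a function, and it reads an already-generated stats file instead of running fasta_stats, so that it can run without the course files. The sample data is invented.

```shell
# Runs in bash or any POSIX sh.
# Hypothetical function: counts AT dinucleotide lines in a stats file.
count_ats() {
    # "$#" is the number of parameters passed to the function.
    if [ "$#" -ne 1 ]; then
        echo "Wrong number of parameters" >&2
        echo "Usage: count_ats <stats_file>" >&2
        return 1
    fi
    grep 'unit:at' "$1" | grep 'dinucleotide' | wc -l | tr -d ' '
}

workdir=$(mktemp -d)

# Invented stats lines in the same column layout as the slides.
printf '%s\n' \
    'SEQ1 0.40 300 ATATA 3 unit:at 6 dinucleotide' \
    'SEQ2 0.35 280 ATTAT 2 unit:att 9 trinucleotide' \
    'SEQ3 0.38 500 TATAT 3 unit:at 10 dinucleotide' > "$workdir/stats.txt"

count=$(count_ats "$workdir/stats.txt")
echo "AT dinucleotides: $count"

# Calling it with no parameters prints the usage text and fails.
count_ats 2> /dev/null
status=$?
echo "exit status with no parameters: $status"

rm -rf "$workdir"
```

Unlike the tcsh script, this version sends the usage message to stderr and returns a non-zero status, which is a common convention rather than something the slides require.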

Command / Concept Review
> stdout
(cmd) >& stderr
| stdin
grep 'pattern'
wc
cat
\
Pipelines
Iterative development