BGGN 213 Working with UNIX Barry Grant

BGGN 213 Working with UNIX Barry Grant http://thegrantlab.org/bggn213

Recap From Last Time:
Motivation: Why we use UNIX for bioinformatics. Modularity, programmability, infrastructure, reliability and the Unix philosophy.
Shell advantages: The shell makes your work less error-prone, more reproducible and less boring, allowing you to automate repetitive tasks and concentrate on more exciting things.
Key commands: Introduced the 21 key unix commands that you will use during ~95% of your future unix work.
Jetstream: Many bioinformatic tasks require large amounts of computing power and can't realistically be run on your own machine. These tasks are best performed using remote computers or cloud computing, which can only be accessed through a shell.

Key commands by category:

Basics   File Control   Viewing & Editing Files   Misc. useful   Power commands   Process related
ls       mv             less                      chmod          grep             top
cd       cp             head                      echo           find             ps
pwd      mkdir          tail                      wc             sed              kill
man      rm             nano                      curl           sudo             Ctrl-C

Further commands: ssh, scp, | (pipe), > (write to file), < (read from file), touch, source, git, Ctrl-Z, cat, R, bg, tmux, python, fg

Working with Unix How do we actually use Unix?

Combining Utilities with Redirection (>, <) and Pipes (|)
The power of the shell lies in the ability to combine simple utilities (i.e. commands) into more complex algorithms very quickly. A key element of this is the ability to send the output from one command into a file or to pass it directly to another program. This is the job of >, < and |.

Side-Note: Standard Input and Standard Output streams
Two very important concepts underpin Unix workflows:
Standard Output (stdout) - default destination of a program's output. It is generally the terminal screen.
Standard Input (stdin) - default source of a program's input. It is generally the keyboard, i.e. what you type at the terminal.

Do it Yourself! Output redirection and piping
> ls ~/Desktop                      # ~/Desktop is the input argument; stdout to screen
> ls ~/Desktop > mylist.txt         # stdout redirected to file
> ls ~/Desktop | less               # stdout piped to less (no file created)
> ls -l ~/Desktop                   # extra optional input argument -l
> ls /nodirexists/ 2> binlist.txt   # stderr to file

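Output redirection summary
<      read stdin from a file
<<     read stdin from text typed inline (a "here-document")
-arg   optional argument that changes a command's behavior
>      write stdout to a file (overwrites the file)
>>     append stdout to a file
2>     write stderr to a file (overwrites the file)
2>>    append stderr to a file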
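The operators summarized above can be tried out safely in a scratch directory. The following is a minimal sketch; the directory and file names (workdir, log.txt, scratch.txt, errors.txt) are made up for illustration and are not part of the course material.

```shell
# Work in a throwaway directory so nothing in your home directory is touched.
workdir=$(mktemp -d)

# ">" creates the file, or overwrites it if it already exists.
echo "first line" > "$workdir/log.txt"

# ">>" appends to the end of the file instead of overwriting it.
echo "second line" >> "$workdir/log.txt"

# Using ">" twice leaves only the most recent output in the file.
echo "draft" > "$workdir/scratch.txt"
echo "final" > "$workdir/scratch.txt"

# "2>" captures the error message (stderr); nothing is printed to the screen.
# "|| true" just stops the expected failure from aborting a strict script.
ls "$workdir/no_such_dir" 2> "$workdir/errors.txt" || true

# "<" feeds a file to a command's stdin.
line_count=$(wc -l < "$workdir/log.txt" | tr -d ' ')
echo "log.txt has $line_count lines"
```

Compare log.txt (two lines, because of >>) with scratch.txt (one line, because the second > overwrote the first).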

ls -l

ls -l > list_of_files

ls -l | grep partial_name > list_of_files
We have piped (|) the stdout of one command into the stdin of another command!

Do it Yourself!
ls -l /usr/bin/ | grep tree > list_of_files
grep prints lines containing a string. It also searches for strings in text files.

Side-Note: grep power command
Do it Yourself!
grep prints lines containing a string pattern. It also searches for strings in text files, e.g.
> grep --color "GESGKS" sequences/data/seqdump.fasta
REVKLLLLGAGESGKSTIVKQMKIIHEAGYSEEECKQYK
grep is a power tool that is often used with pipes. It accepts regular expressions as input (e.g. "G..GK[ST]") and has lots of useful options; see the man page for details.
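To see how a regular expression such as G..GK[ST] behaves, here is a small sketch on a made-up two-sequence file. The file name motif_demo.fasta and the second sequence are invented for illustration; the first sequence is the one shown on the slide. The -o option prints only the part of each line that matched.

```shell
# Build a tiny demo FASTA file in a scratch directory (hypothetical data).
workdir=$(mktemp -d)
printf '>seqA\nREVKLLLLGAGESGKSTIVKQMKIIHEAGYSEEECKQYK\n>seqB\nMKTAYIAKQRQISFVKSHFSRQ\n' \
  > "$workdir/motif_demo.fasta"

# "." matches any single character; "[ST]" matches either S or T.
# -o prints only the matching text rather than the whole line.
motif_hits=$(grep -o 'G..GK[ST]' "$workdir/motif_demo.fasta")
echo "$motif_hits"
```

Only seqA contains the motif, so a single match is reported.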

grep example using regular expressions
Do it Yourself!
Suppose a program that you are working with complains that your input sequence file contains non-nucleotide characters. You can eyeball your file or run:
> grep -v "^>" seqdump.fasta | grep --color "[^ATGC]"
Exercises:
(1) Use man grep to find out what the -v option is doing!
(2) How could we also show the line number for each match along with the output? (Tip: you can grep the output of man grep for "line number".)

grep example using regular expressions
Do it Yourself!
Suppose a program that you are working with complains that your input sequence file contains non-nucleotide characters. You can eyeball your file or run:
> grep -v "^>" seqdump.fasta | grep --color -n "[^ATGC]"
First we remove (with the -v option) lines that start with a > character (these are sequence identifiers). Next we find characters that are not A, T, G or C. To do this we use the ^ symbol's second meaning: inside square brackets it means "match anything but the characters listed". We also print the line number (with the -n option) and color the output (with the --color option).
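The two-step filter described above can be tested on a small made-up file before running it on real data. In this sketch the file name demo.fasta and its contents are invented for illustration. Note that -n numbers the lines of the stream the second grep receives (header lines already removed), not the lines of the original file.

```shell
# A tiny FASTA file with one ambiguous base (N) in the second sequence.
workdir=$(mktemp -d)
printf '>seq1\nATGCATGC\n>seq2\nATGNATGC\n' > "$workdir/demo.fasta"

# Step 1: drop header lines that start with ">".
# Step 2: report, with line numbers, any line holding a character other than A, T, G or C.
bad_lines=$(grep -v '^>' "$workdir/demo.fasta" | grep -n '[^ATGC]')
echo "$bad_lines"
```

The offending sequence is reported as line 2 of the filtered stream, although it sits on line 4 of demo.fasta.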

Key Point: Pipes and redirects avoid unnecessary I/O
Disk I/O is often a bottleneck in data processing! Pipes prevent unnecessary disk I/O operations by connecting the stdout of one process to the stdin of another (these are frequently called "streams").
> program1 input.txt 2> program1.stderr | \
  program2 2> program2.stderr > results.txt
Pipes and redirects allow us to build solutions from modular parts that work with stdin and stdout streams.
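As a runnable stand-in for the program1 | program2 pattern, the sketch below uses sort and uniq in place of the two programs. That substitution, and all file names, are assumptions made for illustration. Each stage keeps its own error log, and no intermediate file is written between the stages.

```shell
# Scratch input: three lines, one of them repeated.
workdir=$(mktemp -d)
printf 'b\na\nb\n' > "$workdir/input.txt"

# sort plays the role of program1 and uniq -c the role of program2.
# Each stage sends its own stderr to a separate log; only the final
# stdout is written to disk.
sort "$workdir/input.txt" 2> "$workdir/sort.stderr" | \
  uniq -c 2> "$workdir/uniq.stderr" > "$workdir/results.txt"

cat "$workdir/results.txt"
```

If either stage fails, its message ends up in its own .stderr file, which makes it easier to tell which step went wrong.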

Unix Philosophy Revisited
Write programs that do one thing and do it well.
Write programs to work together and that encourage open standards.
Write programs to handle text streams, because that is a universal interface.
Doug McIlroy

Pipes provide speed, flexibility and sometimes simplicity
In 1986 Communications of the ACM magazine asked famous computer scientist Donald Knuth to write a simple program to count and print the k most common words in a file alongside their counts, in descending order.
Knuth wrote a literate programming solution that was 7 pages long, and also highly customized to this problem (e.g. Knuth implemented a custom data structure for counting English words).
Doug McIlroy replied with one line:
> cat input.txt | tr A-Z a-z | sort | uniq -c | sort -rn | sed 10q
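The one-liner above counts whole lines, so it gives word counts only when the input has one word per line. McIlroy's published pipeline began with an extra tr step that splits the text into one word per line. The sketch below adds that step and runs it on a made-up two-line input; the file name and the text are invented for illustration.

```shell
# Made-up input text in a scratch directory.
workdir=$(mktemp -d)
printf 'The cat and the dog.\nThe dog sat.\n' > "$workdir/input.txt"

# 1. tr -cs: turn every run of non-letters into a single newline (one word per line)
# 2. tr:     convert upper case to lower case
# 3. sort:   bring identical words together
# 4. uniq -c: collapse duplicates and prefix each word with its count
# 5. sort -rn: order by count, largest first
# 6. sed 2q:  print the first two lines, then quit
top_words=$(tr -cs 'A-Za-z' '\n' < "$workdir/input.txt" | tr 'A-Z' 'a-z' | \
  sort | uniq -c | sort -rn | sed 2q)
echo "$top_words"
```

Here "the" appears three times and "dog" twice, so those are the two lines printed. Changing the number in the sed step changes how many words are reported.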

Key Point: You can chain any number of programs together to achieve your goal! This allows you to build up fairly complex workflows within one command-line.

Do it Yourself! Shell scripting
#!/bin/bash
# This is a very simple hello world script.
echo "Hello, world!"
Exercise:
Create a "Hello world"-like script using command line tools and execute it.
Copy and alter your script to redirect output to a file using > along with a list of files in your home directory.
Alter your script to use >> instead of >. What effect does this have on its behavior?
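One possible way to create and run the script entirely from the command line is sketched below. The file name hello.sh and the scratch directory are assumptions; in class you would more likely type the script into nano. The chmod +x step marks the file as executable so it can be run by its path.

```shell
# Write the script into a scratch directory using a here-document.
workdir=$(mktemp -d)
cat > "$workdir/hello.sh" <<'EOF'
#!/bin/bash
# This is a very simple hello world script.
echo "Hello, world!"
EOF

# Give the file execute permission, then run it by its path.
chmod +x "$workdir/hello.sh"
hello_output=$("$workdir/hello.sh")
echo "$hello_output"
```

Without the chmod step the shell would refuse to run the file directly, although bash hello.sh would still work.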

Do it Yourself! Variables in shell scripts
#!/bin/bash
# Another simple hello world script
message='Hello World!'
echo "$message"
message is a variable to which the string 'Hello World!' is assigned.
echo prints the contents of the variable "$message" to the screen.
See #6 @ https://swcarpentry.github.io/shell-novice/
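A short sketch of why the variable is written as "$message" with double quotes follows. The example string with extra spaces is made up, and the behavior shown assumes bash (or another POSIX-style sh). Note also that there must be no spaces around the = sign in an assignment.

```shell
#!/bin/bash
# Assign a string containing three consecutive spaces (no spaces around "=").
message='Hello   World!'

# Unquoted: the shell splits the value into words, so echo receives
# "Hello" and "World!" and joins them with a single space.
unquoted=$(echo $message)

# Quoted: the value is passed to echo as one piece, spaces intact.
quoted=$(echo "$message")

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

Quoting matters most for file names and other values that might contain spaces.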

Side-Note: Environment Variables

$PATH special environment variable
What is the output of this command?
> echo $PATH
Note the structure: <path1>:<path2>:<path3>
PATH is an environment variable that lists the directories the shell searches for commands typed on the command line without a full path.
Exercise: Use the command env to discover more.
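To make the colon-separated structure easier to read, PATH can be split into one directory per line, and command -v shows which file the shell finds for a given command name. This is a small sketch; the variable names are made up and the directories printed will differ from machine to machine.

```shell
# Show the search path one directory per line (":" replaced by a newline).
echo "$PATH" | tr ':' '\n'

# Count how many directories the shell will search.
path_entries=$(echo "$PATH" | tr ':' '\n' | wc -l | tr -d ' ')
echo "PATH contains $path_entries entries"

# Ask the shell which file it would run for the command "ls".
ls_location=$(command -v ls)
echo "ls is found at: $ls_location"
```

The shell searches the directories in order from left to right and runs the first match it finds.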

Summary
Built-in unix shell commands allow for easy data manipulation (e.g. sort, grep, etc.).
Commands can be easily combined to generate flexible solutions to data manipulation tasks.
The unix shell allows users to automate repetitive tasks through the use of shell scripts that promote reproducibility and easy troubleshooting.
Introduced the 21 key unix commands that you will use during ~95% of your future unix work.

Key commands by category:

Basics   File Control   Viewing & Editing Files   Misc. useful   Power commands   Process related
ls       mv             less                      chmod          grep             top
cd       cp             head                      echo           find             ps
pwd      mkdir          tail                      wc             sed              kill
man      rm             nano                      curl           sudo             Ctrl-C

Further commands: ssh, | (pipe), > (write to file), < (read from file), touch, source, git, Ctrl-Z, cat, R, bg, python, fg

Do it Yourself! Hands-on time
Using Jetstream for Bioinformatics
Running command-line BLAST
Ortholog mapping (running large jobs)
Visualizing results with R/RStudio

New commands
sudo      Execute a command with root permissions
apt-get   Package handling utility for updating & installing software
curl      Download data
gunzip    File compression and decompression
blastp    Command line BLAST
shmlast   Mapping orthologs from RNA-seq data

How to Get Working Best practices for organizing your computational biology projects

Read: Noble, PLoS Comp Biol (2009) - "A Quick Guide to Organizing Computational Biology Projects"
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000424
All files and directories used in your project should live in a single project directory.
Use sub-directories to divide your project into sub-projects.
Do not use spaces in file and directory names!
Document your methods and workflows with plain text README files.
Also document the origin of all data in your project directory.
Also document the versions of the software that you ran and the options you used.
Consider using Markdown for your documentation.
Use version control and backup to multiple destinations!
Be reproducible: http://ropensci.github.io/reproducibility-guide/sections/introduction/
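The organizing advice above could be put into practice with a few commands. The sketch below is only one possible layout: the project name my_project, the sub-directory names and the README wording are all assumptions chosen for illustration, not a prescribed structure. It is created under a scratch directory so it can be run safely.

```shell
# One project directory (no spaces in names), created under a scratch location.
project_root="$(mktemp -d)/my_project"

# Sub-directories for the main parts of the project (hypothetical names).
mkdir -p "$project_root/data" "$project_root/results" \
         "$project_root/src" "$project_root/doc"

# A plain text README recording when the project was set up, where the data
# came from, and which software version was used.
{
  echo "Project: my_project (hypothetical example)"
  echo "Created: $(date +%Y-%m-%d)"
  echo "Data origin: (record the source URL and download date here)"
  echo "Software: $(grep --version | head -n 1)"
} > "$project_root/README.txt"

ls "$project_root"
```

From here the whole directory can be placed under version control with git and backed up as a single unit.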