Recap From Last Time: Setup Checklist BGGN 213. Todays Menu. Introduction to UNIX.

Similar documents
Introduction To. Barry Grant

Recap From Last Time:

BGGN 213 Working with UNIX Barry Grant

Introduction To. Barry Grant

Contents. Note: pay attention to where you are. Note: Plaintext version. Note: pay attention to where you are... 1 Note: Plaintext version...

Lab 2: Linux/Unix shell

Unix/Linux Basics. Cpt S 223, Fall 2007 Copyright: Washington State University

Perl and R Scripting for Biologists

Linux Command Line Interface. December 27, 2017

Introduction to Unix The Windows User perspective. Wes Frisby Kyle Horne Todd Johansen

Introduction: What is Unix?

Table of contents. Our goal. Notes. Notes. Notes. Summer June 29, Our goal is to see how we can use Unix as a tool for developing programs

Unix as a Platform Exercises. Course Code: OS-01-UNXPLAT

Introduction to Linux

When talking about how to launch commands and other things that is to be typed into the terminal, the following syntax is used:

Linux Bootcamp Fall 2015

Linux Command Line Primer. By: Scott Marshall

Chapter 4. Unix Tutorial. Unix Shell

Useful Unix Commands Cheat Sheet

Introduction p. 1 Who Should Read This Book? p. 1 What You Need to Know Before Reading This Book p. 2 How This Book Is Organized p.

Advanced Linux Commands & Shell Scripting

Short Read Sequencing Analysis Workshop

Introduction to Linux

A Brief Introduction to the Linux Shell for Data Science

Introduction to remote command line Linux. Research Computing Team University of Birmingham

Unix as a Platform Exercises + Solutions. Course Code: OS 01 UNXPLAT

Introduction To Linux. Rob Thomas - ACRC

CSE 303 Lecture 2. Introduction to bash shell. read Linux Pocket Guide pp , 58-59, 60, 65-70, 71-72, 77-80

Introduction to Linux for BlueBEAR. January

Computer Systems and Architecture

Using UNIX. -rwxr--r-- 1 root sys Sep 5 14:15 good_program

CSE Linux VM. For Microsoft Windows. Based on opensuse Leap 42.2

Mills HPC Tutorial Series. Linux Basics I

Introduction to Linux

Lec 1 add-on: Linux Intro

The Shell. EOAS Software Carpentry Workshop. September 20th, 2016

Linux at the Command Line Don Johnson of BU IS&T

DATA 301 Introduction to Data Analytics Command Line. Dr. Ramon Lawrence University of British Columbia Okanagan

Why learn the Command Line? The command line is the text interface to the computer. DATA 301 Introduction to Data Analytics Command Line

Introduction to Linux

Introduction to the shell Part II

unix intro Documentation

Introduction to UNIX. Logging in. Basic System Architecture 10/7/10. most systems have graphical login on Linux machines

The Unix Shell & Shell Scripts

The Unix Shell. Pipes and Filters

GNU/Linux 101. Casey McLaughlin. Research Computing Center Spring Workshop Series 2018

Shells and Shell Programming

Introduction to Supercomputing

Shell. SSE2034: System Software Experiment 3, Fall 2018, Jinkyu Jeong

Introduction to UNIX Command Line

Processes. Shell Commands. a Command Line Interface accepts typed (textual) inputs and provides textual outputs. Synonyms:

CSE 390a Lecture 2. Exploring Shell Commands, Streams, Redirection, and Processes

Connecting to ICS Server, Shell, Vim CS238P Operating Systems fall 18

Introduction to the Linux Command Line January Presentation Topics

CSCI 2132 Software Development. Lecture 4: Files and Directories

Review of Fundamentals

Basic UNIX commands. HORT Lab 2 Instructor: Kranthi Varala

1. What statistic did the wc -l command show? (do man wc to get the answer) A. The number of bytes B. The number of lines C. The number of words

Part 1: Basic Commands/U3li3es

Chapter-3. Introduction to Unix: Fundamental Commands

Working With Unix. Scott A. Handley* September 15, *Adapted from UNIX introduction material created by Dr. Julian Catchen

Introduction to the Linux Command Line

Bioinformatics? Reads, assembly, annotation, comparative genomics and a bit of phylogeny.

CS 3410 Intro to Unix, shell commands, etc... (slides from Hussam Abu-Libdeh and David Slater)

CSC209H Lecture 1. Dan Zingaro. January 7, 2015

Computer Systems and Architecture

Shells. A shell is a command line interpreter that is the interface between the user and the OS. The shell:

Introduction to UNIX. SURF Research Boot Camp April Jeroen Engelberts Consultant Supercomputing

Working with Basic Linux. Daniel Balagué

Introduction to UNIX command-line

CS CS Tutorial 2 2 Winter 2018

Shells and Shell Programming

Open up a terminal, make sure you are in your home directory, and run the command.

Introduction in Unix. Linus Torvalds Ken Thompson & Dennis Ritchie

CSE 390a Lecture 3. bash shell continued: processes; multi-user systems; remote login; editors

Merge Conflicts p. 92 More GitHub Workflows: Forking and Pull Requests p. 97 Using Git to Make Life Easier: Working with Past Commits p.

A Brief Introduction to Unix

Running Programs in UNIX 1 / 30

Introduction to the UNIX command line

Introduction to Linux Part 1. Anita Orendt and Wim Cardoen Center for High Performance Computing 24 May 2017

Linux Operating System Environment Computadors Grau en Ciència i Enginyeria de Dades Q2

Introduction to UNIX command-line II

Introduction of Linux

Unix/Linux Operating System. Introduction to Computational Statistics STAT 598G, Fall 2011

UNIX. The Very 10 Short Howto for beginners. Soon-Hyung Yook. March 27, Soon-Hyung Yook UNIX March 27, / 29

M2PGER FORTRAN programming. General introduction. Virginie DURAND and Jean VIRIEUX 10/13/2013 M2PGER - ALGORITHME SCIENTIFIQUE

Cloud Computing and Unix: An Introduction. Dr. Sophie Shaw University of Aberdeen, UK

Review of Fundamentals. Todd Kelley CST8207 Todd Kelley 1

Introduction to Linux Workshop 1

Unix Tutorial Haverford Astronomy 2014/2015

EECS 470 Lab 5. Linux Shell Scripting. Friday, 1 st February, 2018

UNIX Quick Reference

Cloud Computing and Unix: An Introduction. Dr. Sophie Shaw University of Aberdeen, UK

Introduction to Linux Organizing Files

Introduction Into Linux Lecture 1 Johannes Werner WS 2017

Essentials for Scientific Computing: Bash Shell Scripting Day 3

Introduction to UNIX Shell Exercises

Linux Essentials. Programming and Data Structures Lab M Tech CS First Year, First Semester

Scripting Languages Course 1. Diana Trandabăț

Unix tutorial. Thanks to Michael Wood-Vasey (UPitt) and Beth Willman (Haverford) for providing Unix tutorials on which this is based.

Transcription:

Recap From Last Time: BGGN 213 Introduction to UNIX Barry Grant http://thegrantlab.org/bggn213 Substitution matrices: Where our alignment match and mis-match scores typically come from Comparing methods: The trade-off between sensitivity, selectivity and performance Sequence motifs and patterns: Finding functional cues from conservation patterns Sequence profiles and position specific scoring matrices (PSSMs), Building and searching with profiles, Their advantages and limitations PSI-BLAST algorithm: Application of iterative PSSM searching to improve BLAST sensitivity Hidden Markov models (HMMs): More versatile probabilistic model for detection of remote similarities Todays Menu Time Topics I 9:00-9:15 AM Setup and Motivation II 9:15-10:00 AM Beginning Unix 10:00-10:40 AM Hands on session III 10:40-11:00 AM Working with Unix IV 11:00-11:20 PM How to Get Working 11:20-11:45 PM Hands on session https://bioboot.github.io/bggn213_f17/setup/ Setup Checklist https://bioboot.github.io/bggn213_f17/setup/ Mac: Terminal or PC: MoblXterm PC: MoblXterm CygUtils plugin (core UNIX tools) Downloaded our class specific jetstream keyfile (See email: This is required for connecting to jetstream virtual machines with your terminal.) Problems? Example data downloaded: bggn213_01_unix.zip

Mac Terminal Lets get started SideNote: Terminal vs Shell Shell: A command-line interface that allows a user to interact with the operating system by typing commands. Terminal [emulator]: A graphical interface to the shell (i.e. the window you get when you launch MobaXterm). PC MobaXterm Shell prompt Introduction To Motivation Why do we use Unix? Shell

Modularity Core programs are modular and work well with others Modularity Core programs are modular and work well with others Programmability Best software development environment Programmability Best software development environment Infrastructure Access to existing tools and cuttingedge methods Infrastructure Access to existing tools and cuttingedge methods Reliability Unparalleled uptime and stability Reliability Unparalleled uptime and stability Unix Philosophy Encourages open standards Unix Philosophy Encourages open standards Modularity The Unix shell was designed to allow users to easily build complex workflows by interfacing smaller modular programs together. wget awk grep sort uniq plot An alternative approach is to write a single complex program that takes raw data as input, and after hours of data processing, outputs publication figures and a final table of results. Which would you prefer and why? Modular vs Custom All-in-one custom Monster program

Advantages/Disadvantages Unix Philosophy The monster approach is customized to a particular project but results in massive, fragile and difficult to modify (therefore inflexible, untransferable, and error prone) code. With modular workflows, it s easier to: Spot errors and figure out where they re occurring by inspecting intermediate results. Experiment with alternative methods by swapping out components. Tackle novel problems by remixing existing modular tools. Write programs that do one thing and do it well. Write programs to work together and that encourage open standards. Write programs to handle text streams, because that is a universal interface. Doug McIlory Unix family tree [1969-2010] Basics File Control Viewing & Editing Files Misc. useful Power commands Process related ls mv less chmod grep top cd cp head echo find ps pwd mkdir tail wc sed kill LINUX Mac OS X Source: https://commons.wikimedia.org/wiki/file:unix_history-simple.svg man rm nano curl sudo Crl-c ssh scp (pipe) (write to file) < (read from file) touch source git Crl-z cat R bg tmux python fg

Basics File Control Viewing & Editing Files Misc. useful Power commands Process related ls mv less chmod grep top cd cp head echo find ps pwd mkdir tail wc sed kill man rm nano curl sudo Crl-c Mac Terminal Lets get started ssh scp (pipe) (write to file) touch source git Crl-z cat R bg PC MobaXterm < (read from file) tmux python fg File System Structure Beginning Unix Getting started with basic Unix commands Information in the file system is stored in files, which are stored in directories (folders). Directories can also store other directories, which forms a directory tree. Download the example data and mv to your Desktop bggn213_01_unix.zip The forward slash character / is used to represent the root directory of the whole file system, and is also used to separate directory names. E.g. /home/jono/work/bggn213_notes.txt

Basics: Using the filesystem Side Note: File Paths ls cd pwd mkdir cp mv rm List files and directories Change directory (i.e. move to a different folder ) Print working directory (which folder are you in) MaKe a new DIRectories CoPy a file or directory to somewhere else MoVe a file or directory (basically rename) ReMove a file or directory An absolute path specifies a location from the root of the file system. E.g. /home/jono/work/bggn213_notes.txt A relative path specifies a location starting from the current location. E.g.../bggn213_notes.txt. Single dot. (for current directory).. Double dot.. (for parent directory) ~ Tilda ~ (for your home directory) [Tab] Pressing the tab key can autocomplete names Finding the Right Hammer (man and apropos) You can access the manual (i.e. user documentation) on a command with man, e.g: man pwd The man page is only helpful if you know the name of the command you re looking for. apropos will search the man pages for keywords. apropos "working directory" Inspecting text files less - visualize a text file: use arrow keys page down/page up with space / b keys search by typing "/" quit by typing "q" Also see: head, tail, cat, more

Creating text files Creating files can be done in a few ways: With a text editor (such as nano, emacs, or vi) With the touch command ( touch a_file) From the command line with cat or echo and redirection (more on this later) Creating and editing text files with nano In the terminal type: nano yourfilename.txt nano is a simple text editor that is recommended for first-time users. Other text editors have more powerful features but also steep learning curves There are many other text file editors (e.g. vim, emacs and sublime text, etc.) Connecting to remote machines (with ssh) Most high-performance computing (HPC) resources can only be accessed by ssh (Secure SHell) ssh [user@host.address] For example: ssh barry@bio3d.ucsd.edu ssh tb170077@ip_address Connecting to jetstream (with ssh) First we have to login online and fire-up a new virtual machine Full instructions here: https://bioboot.github.io/bggn213_f17/jetstream/boot/ We will go through this process in the last section today!

Copying to and from remote machines (scp) Basics File Control Viewing & Editing Files Misc. useful Power commands Process related The scp (Secure CoPy) command can be used to copy files and directories from one computer to another. scp [file] [user@host]:[destination] scp localfile.txt bgrant@bigcomputer.net:/remotedir/. ls mv less chmod grep top cd cp head echo find ps pwd mkdir tail wc sed kill man rm nano curl sudo Crl-c ssh scp (pipe) (write to file) < (read from file) touch source git Crl-z cat R bg tmux python fg Process refers to a running instance of a program top ps kill Crl-c Crl-z bg fg & Provides a real-time view of all running processes Report a snapshot of the current processes Terminate a process (the force quit of the unix world) Stop a job Suspend a job Resume a suspended job in the background Resume a suspended job in the foreground Start a job in the background Hands-on time Sections 1 to 3 of software carpentry UNIX lesson https://swcarpentry.github.io/shell-novice/ ~20 mins

Basics File Control Viewing & Editing Files Misc. useful Power commands Process related Working with Unix How do we actually use Unix? ls mv less chmod grep top cd cp head echo find ps pwd mkdir tail wc sed kill man rm nano curl sudo Crl-c ssh (pipe) (write to file) < (read from file) touch source git Crl-z cat R bg python fg Combining Utilities with Redirection (, <) and Pipes () The power of the shell lies in the ability to combine simple utilities (i.e. commands) into more complex algorithms very quickly. A key element of this is the ability to send the output from one command into a file or to pass it directly to another program. This is the job of, < and Side-Note: Standard Input and Standard Output streams Two very important concepts that unpin Unix workflows: Standard Output (stdout) - default destination of a program's output. It is generally the terminal screen. Standard Input (stdin) - default source of a program's input. It is generally the command line.

Output redirection and piping Output redirection and piping ls /usr/bin # stdin is /usr/bin ; stdout to screen ls /usr/bin # stdin is /usr/bin ; stdout to screen ls /usr/bin binlist.txt # stdout redirected to file ls /usr/bin less # sdout piped to less (no file created) Output redirection and piping Output redirection and piping -arg ls /usr/bin # stdin is /usr/bin ; stdout to screen ls /usr/bin binlist.txt # stdout redirected to file ls /usr/bin less # sdout piped to less (no file created) ls -l /usr/bin # extra optional input argument -l ls /usr/bin # stdin is /usr/bin ; stdout to screen ls /usr/bin binlist.txt # stdout redirected to file ls /usr/bin less # sdout piped to less (no file created) ls /nodirexists/ # stderr to screen

Output redirection and piping Output redirection and piping 2 ls /usr/bin # stdin is /usr/bin ; stdout to screen ls /usr/bin # stdin is /usr/bin ; stdout to screen ls /usr/bin binlist.txt # stdout redirected to file ls /usr/bin binlist.txt # stdout redirected to file ls /usr/bin less # sdout piped to less (no file created) ls /nodirexists/ binlist.txt # stderr to screen ls /usr/bin less # sdout piped to less (no file created) ls /nodirexists/ 2 binlist.txt # stderr to file < << -arg Output redirection summary ls -l 2 2

ls -l list_of_files ls -l grep partial_name list_of_files We have piped ( ) the stdout of one command into the stdin of another command! ls -l /usr/bin/ grep tree list_of_files Basics File Control Viewing & Editing Files Misc. useful Power commands Process related ls mv less chmod grep top cd cp head echo find ps pwd mkdir tail wc sed kill man rm nano curl sudo Crl-c grep: prints lines containing a string. Also searches for strings in text files. ssh (pipe) (write to file) < (read from file) touch source git Crl-z cat R bg python fg

Side-Note: grep power command grep example using regular expressions grep - prints lines containing a string pattern. Also searches for strings in text files, e.g. grep --color "GESGKS" sequences/data/seqdump.fasta REVKLLLLGAGESGKSTIVKQMKIIHEAGYSEEECKQYK grep is a power tool that is often used with pipes as it accepts regular expressions as input (e.g. G..GK[ST] ) and has lots of useful options - see the man page for details. Suppose a program that you are working with complains that your input sequence file contains non-nucleotide characters. You can eye-ball your file or grep -v "^" seqdump.fasta grep --color "[^ATGC]" Exercises: (1). Use man grep to find out what the -v argument option is doing! (2). How could we also show line number for each match along with the output? (tip you can grep the output of man grep for line number ) grep example using regular expressions Suppose a program that you are working with complains that your input sequence file contains non-nucleotide characters. You can eye-ball your file or grep -v "^" seqdump.fasta grep --color -n "[^ATGC]" First we remove (with -v option) lines that start with a character (these are sequence identifiers). Next we find characters that are not A, T, C or G. To do this we use ^ symbols second meaning: match anything but the pattern in square brackets. We also print line number (with -n option) and color output (with --color option). Key Point: Pipes and redirects avoid unnecessary i/o Disc i/o is often a bottleneck in data processing! Pipes prevent unnecessary disc i/o operations by connecting the stdout of one process to the stdin of another (these are frequently called streams ) program1 input.txt 2 program1.stderr \ program2 2 program2.stderr results.txt Pipes and redirects allow us to build solutions from modular parts that work with stdin and stdout streams.

Unix Philosophy Revisited Pipes provide speed, flexibility and sometimes simplicity Write programs that do one thing and do it well. Write programs to work together and that encourage open standards. Write programs to handle text streams, because that is a universal interface. Doug McIlory In 1986 Communications of the ACM magazine asked famous computer scientist Donald Knuth to write a simple program to count and print the k most common words in a file alongside their counts, in descending order. Kunth wrote a literate programming solution that was 7 pages long, and also highly customized to this problem (e.g. Kunth implemented a custom data structure for counting English words). Doug McIlroy replied with one line: cat input.txt tr A-Z a-z sort uniq -c sort -rn sed 10q Key Point: You can chain any number of programs together to achieve your goal! Hands-on time This allows you to build up fairly complex workflows within one command-line. Section 4 of software carpentry UNIX lesson https://swcarpentry.github.io/shell-novice/ ~15 mins

Shell scripting Variables in shell scripts #!/bin/bash # This is a very simple hello world script. echo "Hello, world! Exercise: Create a "Hello world"-like script using command line tools and execute it. Copy and alter your script to redirect output to a file using along with a list of files in your home directory. Alter your script to use instead of. What effect does this have on its behavior? #!/bin/bash # Another simple hello world script message='hello World!' echo $message message - is a variable to which the string 'Hello World!' is assigned echo - prints to screen the contents of the variable "$message" Side-Note: Environment Variables $PATH special environment variable What is the output of this command? echo $PATH Note the structure: <path1:<path2:<path3 PATH is an environmental variable which Bash uses to search for commands typed on the command line without a full path. Exercise: Use the command env to discover more.

Summary Built-in unix shell commands allow for easy data manipulation (e.g. sort, grep, etc.) Commands can be easily combined to generate flexible solutions to data manipulation tasks. The unix shell allows users to automate repetitive tasks through the use of shell scripts that promote reproducibility and easy troubleshooting Introduced the 21 key unix commands that you will use during ~95% of your future unix work Basics File Control Viewing & Editing Files Misc. useful Power commands Process related ls mv less chmod grep top cd cp head echo find ps pwd mkdir tail wc sed kill man rm nano curl sudo Crl-c ssh (pipe) (write to file) < (read from file) touch source git Crl-z cat R bg python fg Read: Noble PLoS Comp Biol (2009) - A Quick Guide to Organizing Computational Biology Projects" http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000424 How to Get Working 1) Connecting to jetstream 2) Best practices for organizing your computational biology projects All files and directories used in your project should live in a single project directory. Use sub-directories to divide your project into sub-projects. Do not use spaces in file and directory names! Document your methods and workflows with plain text README files Also document the origin of all data in your project directory Also document the versions of the software that you ran and the options you used. Consider using Markdown for your documentation. Use version control and backup to multiple destinations! Be reproducible: http://ropensci.github.io/reproducibility-guide/sections/introduction/