Bioinformatics? Reads, assembly, annotation, comparative genomics and a bit of phylogeny.

stefano.gaiarsa@unimi.it

Linux and the command line, PART 1: survival kit for the bash environment

Purpose of the lesson
Get familiar with the command line interface (CLI). Why?
- Most bioinformatics software is CLI-based
- A lot of bioinformatics data comes as huge text files
- A lot of bioinformatics work is repetitive
- (Bio)informatics is all about optimization

First of all: Linux doesn't bite! Take a look around the OS!

What is a command line interface?

A text-based interface between user and computer, usually implemented with a shell.
Shell: a computer program that takes commands (text input) and converts them into the appropriate operating system functions (other programs).
One of the most widely used shells is the Bourne Again SHell, Bash. That's the one we will use!

Bash is a container with lots of different commands (tools)

Each command is very well suited to one simple task. We can combine commands to perform less trivial tasks.

What is better? It depends.
If I'm trying to perform a simple task once, and I'm not too worried about optimization, then I do something visual (GUI).
If I'm processing huge amounts of data/files, and/or performing a series of different tasks, and/or I need optimization and reproducibility, then I use the CLI.

How does it look? Open a terminal with Ctrl+Alt+T. The prompt looks like this:
Username@Machine:Working_directory$

If bash is a language, statements are our sentences. We have verbs (commands), objects (inputs), and adjectives, adverbs... (modifiers, variables, etc.). We must have an idea of how to compose a statement to perform our task (experience, Google).

Each part of a statement after the command is called an argument:
command argument1 argument2 argument3

How can a command understand its arguments?
1) By position
2) By prefixes
Examples:
A command y understands the first argument as the input and the second as the output:
command input output
A command z understands the argument after the prefix -i as the input and the argument after the prefix -o as the output:
command -i input -o output
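The two styles above can be seen side by side in a minimal sketch. Here command_y and command_z are hypothetical commands written as bash functions (they are not real tools), and the prefix parsing uses the POSIX getopts builtin, which the slides do not cover; real programs may parse their options differently.

```shell
#!/usr/bin/env bash
# command_y and command_z are hypothetical commands, written here as
# shell functions, that read the same information in the two ways.

# command_y: meaning comes from POSITION ($1 = input, $2 = output)
command_y() {
    echo "input=$1 output=$2"
}

# command_z: meaning comes from PREFIXES (-i input, -o output),
# parsed with the POSIX getopts builtin, so the order does not matter
command_z() {
    OPTIND=1                      # reset getopts state between calls
    input="" output=""
    while getopts "i:o:" opt; do
        case "$opt" in
            i) input="$OPTARG" ;;
            o) output="$OPTARG" ;;
            *) echo "usage: command_z -i input -o output" >&2; return 1 ;;
        esac
    done
    echo "input=$input output=$output"
}

command_y reads.fq result.txt          # input=reads.fq output=result.txt
command_z -i reads.fq -o result.txt    # input=reads.fq output=result.txt
command_z -o result.txt -i reads.fq    # same output: order does not matter
```

Swapping the two arguments of command_y swaps their meaning, while command_z gives the same result in either order: this is why prefixed arguments are common in bioinformatics tools with many options.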

How can we know how to use a command? Read its documentation. Depending on the command, try:
command -help
command -h
command --help
man command
Try with: man ls

Our first command is ls, which lists the files and folders in the working directory.
Exercise: list the files, ordering them by last modification date.
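One possible answer to the exercise, as a sketch: ls -t sorts by modification time (newest first) and -r reverses the order. The file names and dates are made up; the demo works in a throw-away directory created with mktemp -d, and touch -t is used only to give the files distinct timestamps.

```shell
#!/usr/bin/env bash
# Throw-away directory with three hypothetical files whose modification
# dates are set explicitly (touch -t takes [[CC]YY]MMDDhhmm)
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch -t 202001010000 old.txt
touch -t 202101010000 mid.txt
touch -t 202201010000 new.txt

ls          # alphabetical: mid.txt new.txt old.txt
ls -l       # long format: permissions, owner, size, date, name
ls -t       # sorted by modification time, newest first
ls -lt      # long format, newest first
ls -ltr     # long format, oldest first (-r reverses the order)
```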

Some tips and recommendations
1) Remember, we can't use the mouse to move the cursor
2) TAB autocompletes
3) Ctrl+C stops the running process
4) The up/down arrows scroll through past statements; you can edit them and execute them again

Some tips and recommendations: TAB
Press TAB once to autocomplete (if there is more than one possible command/file, TAB adds only the letters common to all possibilities).
Press TAB twice to list the possible completions.

Filesystem
Filesystem: how files are organized on our hard disk (HD). In Linux (the OS we are using), it can be seen as a tree (a kind of graph). Each file is a node of the tree: it has one parent node and may have one or more children nodes. Everything starts from one directory, the root directory (/).

The path
Each file (directories are files too!) is defined by its position in the filesystem, called its path. The path is the address of the file, needed whenever we want to reach it. Bash does not guess addresses, so we must write the path of a file exactly.

Absolute and relative path
Absolute path: the complete address of the file, from the start (root directory) to the file itself.
Relative path: the address of the file relative to where we are.
Think of phone numbers. I want to call someone within the University of Pavia whose internal extension is 9898:
I'm in the University: I dial 9898 - relative path
I'm outside the University: I dial 0382 98 9898 - relative path
I'm on Earth (root): I dial +39 0382 98 9898 - absolute path
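The phone-number analogy can be reproduced on a real filesystem with a small sketch. The directory names and file below are hypothetical and live in a throw-away directory created by mktemp -d, which prints an absolute path under the system temporary directory.

```shell
#!/usr/bin/env bash
# Hypothetical tree: <tmp>/university/pavia/office.txt
tmp=$(mktemp -d)              # absolute path of a new temporary directory
mkdir -p "$tmp/university/pavia"
echo "extension 9898" > "$tmp/university/pavia/office.txt"

cd "$tmp/university" || exit 1
cat pavia/office.txt                      # relative path: no leading /, starts from here
cat "$tmp/university/pavia/office.txt"    # absolute path: leading /, starts from root

cd / || exit 1
cat pavia/office.txt 2>/dev/null || echo "relative path fails from a different directory"
cat "$tmp/university/pavia/office.txt"    # absolute path works from anywhere
```

Like dialling 9898 from outside the University, the relative path stops working once the working directory changes, while the absolute path works from anywhere.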

[Figure: example directory tree with folders 1 to 6]
Paths are directory names separated by forward slashes (/), or by backslashes (\) in Windows.

Working directory
The working directory can be seen as where we are. Type pwd (print working directory) to get the absolute path of the working directory.

Working directory: [Figure: the same working directory viewed side by side in Bash and in the GUI]

Change working directory
It can be useful to change the working directory: cd (change directory).
Without arguments, cd sets the home directory as the working directory.
./ means the current working directory
../ means the parent directory of the current working directory
~/ means the home directory
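A short sketch of cd with the special names ./, ../ and ~, printing the working directory after each jump. The project/data directories are hypothetical and are created inside a throw-away directory from mktemp -d.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)                  # throw-away directory for the demo
mkdir -p "$tmp/project/data"

cd "$tmp/project/data" || exit 1
pwd          # <tmp>/project/data
cd ..        # ../ : go up to the parent directory
pwd          # <tmp>/project
cd ./data    # ./ : the current directory, so same as "cd data"
pwd          # <tmp>/project/data
cd ../..     # two jumps up in one line
pwd          # <tmp>
cd ~         # ~ : the home directory (same as a bare "cd")
pwd          # prints the value of $HOME
```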

When I use the GUI, I am combining the two commands we know:
cd - double-click a directory
ls - view the inside of a directory
BUT, if I know the path, I can get to any place in the filesystem with just one line, without clicking through every folder level. E.g.: cd Desktop/root (two jumps)

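Standard (glob) wildcards
Can be used with bash commands to work with multiple files:
? - any single character
* - any number of characters (even zero)
[1-9] - any single character in a range
{1,2,3} - or (strictly speaking this is brace expansion, not a glob: bash generates every alternative, whether or not a matching file exists)
[!5] - not (any single character except 5)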

Standard wildcards: examples
List all files in the working directory whose name starts with gene:
ls gene*
List all files in the working directory (wd) whose name starts with a number from 2 to 5 and ends with .tsv:
ls [2-5]*.tsv
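The wildcards above can be tried on a set of empty, hypothetical files in a throw-away directory. This is a sketch that assumes bash (the last line uses brace expansion, which plain sh may not support).

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
cd "$tmp" || exit 1
# Hypothetical empty example files
touch gene1.txt gene2.txt genome.fasta 1_a.tsv 3_b.tsv 5_c.tsv 7_d.tsv

ls gene*          # gene1.txt gene2.txt genome.fasta
ls gene?.txt      # gene1.txt gene2.txt (? matches exactly one character)
ls [2-5]*.tsv     # 3_b.tsv 5_c.tsv
ls [!5]*.tsv      # 1_a.tsv 3_b.tsv 7_d.tsv
ls gene{1,2}.txt  # brace expansion: same as "ls gene1.txt gene2.txt"
```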

Exercises
[Figure: example directory tree with folders 1 to 6]
a) Start from root, go to folder 5 and list the files it contains
b) List the files contained in folder 4 without leaving folder 5
c) Go to folder 4 and list all files ending with .fasta

Moving, renaming, copying
In bash there is a single command for both moving and renaming files:
mv source directory
mv source newname
Copying is similar:
cp source directory
cp source newname
If source is a directory, we also want to copy the files it contains, so we add -r:
cp -r source newname (or cp -r source directory)
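A sketch of mv and cp in a throw-away directory; the file and directory names are hypothetical.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir results
echo "ACGT" > seq.txt

mv seq.txt sequence.txt              # rename: seq.txt no longer exists
mv sequence.txt results/             # move into an existing directory
cp results/sequence.txt backup.txt   # copy with a new name
cp -r results results_copy           # copy a directory and everything inside it
```

The same mv performs a rename or a move depending on whether the last argument is an existing directory.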

Deleting files
WARNING: when you delete a file from the command line it is gone for good; you won't find it in the trash bin.
rm (remove): rm source
If source is a directory, we need to add -r: rm -r source
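A sketch of rm on hypothetical files inside a throw-away directory, so nothing real is deleted. It also shows that rm without -r refuses to remove a directory; rm -i is mentioned only in a comment because it waits for interactive confirmation.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir old_run
touch old_run/a.log old_run/b.log notes.txt

rm notes.txt                  # the file is gone: no trash bin involved
rm old_run 2>/dev/null || echo "rm refuses a directory without -r"
rm -r old_run                 # -r (recursive): the directory and everything in it
# "rm -i file" asks for confirmation before each removal, a safer habit
ls                            # prints nothing: the directory is now empty
```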

Output redirection: > and >>
By adding > filename after a command, we redirect its stdout to a new file named filename (if filename already exists, it is overwritten).
By adding >> filename after a command, we redirect its stdout, appending it to a file named filename.
Let's try with ls.
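A sketch of the difference between > and >> using ls on two hypothetical files in a throw-away directory.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch a.fasta b.fasta

ls *.fasta > list.txt     # creates list.txt (2 lines)
ls *.fasta > list.txt     # overwrites it: still 2 lines
ls *.fasta >> list.txt    # appends: now 4 lines
cat list.txt              # a.fasta b.fasta a.fasta b.fasta, one per line
```

The glob *.fasta is used on purpose: with a plain ls > list.txt, the listing also contains list.txt itself, because the shell creates the (empty) output file before ls runs.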

Piping commands
We can also redirect the output of a command as the input of another command:
command1 | command2 | command3
| is called the pipe sign. By piping we can combine multiple commands and create complex statements.
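A sketch of pipes on hypothetical files. It borrows head, tail and cut from the next slides, plus wc -l (count lines), which the slides do not cover.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch gene1.txt gene2.txt gene3.txt other.txt

ls | head -n 2                      # only the first two names: gene1.txt gene2.txt
ls gene* | wc -l                    # how many files start with "gene": 3
ls gene* | tail -n 1 | cut -c 1-5   # three commands chained: gene3
```

No temporary files are needed: each command reads what the previous one writes.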

Text file manipulation - visualizing
Sometimes I need to explore a file without opening it in a text editor (I don't need to see the whole file, or the file is too big). Strategies:
Reading it one screen at a time: less filename
Reading the first 10 lines: head filename
Reading the last 10 lines: tail filename
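A sketch of head and tail on a generated 100-line file (hypothetical content, built with seq in a loop). less is left as a comment because it is interactive.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
cd "$tmp" || exit 1
# Hypothetical 100-line file: "line 1" ... "line 100"
for i in $(seq 1 100); do echo "line $i"; done > big.txt

head big.txt          # line 1 ... line 10
tail big.txt          # line 91 ... line 100
head -n 3 big.txt     # line 1 ... line 3
tail -n 2 big.txt     # line 99, line 100
# less big.txt        # interactive: Space = next screen, q = quit
```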

Exercise: use a combination of head and tail to print the 27th line of the file toy.tsv (in folder 4).
Hint: with the optional argument -n number (e.g. -n 5), head and tail show that many lines instead of 10. Use a pipe to combine the two commands.
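One possible solution, as a sketch. The real toy.tsv is not available here, so a hypothetical stand-in with 50 tab-separated lines is generated first.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
cd "$tmp" || exit 1
# Hypothetical stand-in for toy.tsv: 50 lines "rowN<TAB>valueN"
for i in $(seq 1 50); do printf 'row%s\tvalue%s\n' "$i" "$i"; done > toy.tsv

# head keeps lines 1-27, then tail keeps the last of those: line 27
head -n 27 toy.tsv | tail -n 1
# General form for line N:  head -n N file | tail -n 1
```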

File formats, exploiting structure
Big files can be intimidating, but we can exploit the way they are organized (formatted) to quickly edit them or extract useful information. If the information is not organized, we are out of luck. If we don't know how the format works, we need to read its documentation.
Try to look inside the file toy.csv in folder 4 and see if you can recognize any pattern.

Text file manipulation - select columns: cut
Your turn!! Use the man page to understand how it works. Look for:
- delimiter
- fields (a.k.a. columns)
Try to extract columns 1, 3 and 5 from a comma-separated values (.csv) file (toy.csv in folder 4).
Hint: by default cut uses tabs as delimiters.
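A sketch of cut on a small comma-separated file. The content is a hypothetical stand-in for toy.csv, whose real columns are not shown in the slides.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
cd "$tmp" || exit 1
# Hypothetical stand-in for toy.csv (the real file is in folder 4)
printf 'id,name,length,gc,species\n1,geneA,1200,0.51,E.coli\n2,geneB,800,0.47,K.pneumoniae\n' > toy.csv

# -d sets the delimiter, -f lists the fields to keep
cut -d ',' -f 1,3,5 toy.csv
# id,length,species
# 1,1200,E.coli
# 2,800,K.pneumoniae
```

Without -d ',', cut looks for tabs; lines that contain no delimiter are printed whole (unless -s is given), so the CSV comes out unchanged, which is a common first surprise.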

Text file manipulation - join files (1): cat
cat = concatenate. cat can be used to pass a text file to stdout:
cat filename
Its main purpose is to join files:
cat file1 file2
What if we want to join n files? Hint: wildcards
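A sketch of cat with one file, two files and n files via a wildcard. The three FASTA files are hypothetical.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
cd "$tmp" || exit 1
# Three hypothetical FASTA files with one sequence each
printf '>seq1\nACGT\n' > part1.fasta
printf '>seq2\nGGCC\n' > part2.fasta
printf '>seq3\nTTAA\n' > part3.fasta

cat part1.fasta                 # print one file to stdout
cat part1.fasta part2.fasta     # join two files
cat part*.fasta > all.fasta     # join n files with a wildcard, into a new file
```

The output name all.fasta deliberately does not match part*.fasta. With cat *.fasta > all.fasta, a second run would also match all.fasta and try to read the file it is writing.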

Text file manipulation - exercise
The file is, as always, toy.csv in folder 4. Execute the following operations, write the final result to a new file (you choose the name, no spaces) and move it to folder 2 (or create it directly in folder 2):
- Get lines 3 and 55
- Get columns 1, 3 and 5
Hint: create temporary files for the partial results (or not!) and delete them when you have finished (not the file containing the final result!)

Scripts
We can write our own programs and scripts. Scripts are lists of commands (in a given language) that are read and executed by an interpreter. Some examples:
python script.py argument2 argument3
Rscript script.r argument2 argument3
perl script.pl argument2 argument3
sh script.sh argument2 argument3
In these cases, the command is the name of the interpreter, while the script is the first argument. You can write scripts in bash too!
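A minimal sketch of a shell script run through an interpreter. count_lines.sh is a hypothetical script written on the fly; inside it, the first argument after the script name (argument2 in the slide's notation) is available as $1. It uses wc -l, which the slides do not cover.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Write a tiny hypothetical script, count_lines.sh
cat > count_lines.sh << 'EOF'
# "$1" is the first argument given after the script name
n=$(wc -l < "$1" | tr -d ' ')
echo "$1 has $n lines"
EOF

printf 'a\nb\nc\n' > example.txt
sh count_lines.sh example.txt     # example.txt has 3 lines
```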

SEE YOU NEXT TIME!