Introduction to UNIX command-line II

Similar documents
Introduction to UNIX command-line

Useful Unix Commands Cheat Sheet

Computer Systems and Architecture

Using Linux as a Virtual Machine

Computer Systems and Architecture

Cloud Computing and Unix: An Introduction. Dr. Sophie Shaw University of Aberdeen, UK

Cloud Computing and Unix: An Introduction. Dr. Sophie Shaw University of Aberdeen, UK

Mills HPC Tutorial Series. Linux Basics I

Unix L555. Dept. of Linguistics, Indiana University Fall Unix. Unix. Directories. Files. Useful Commands. Permissions. tar.


Unix Essentials. BaRC Hot Topics Bioinformatics and Research Computing Whitehead Institute October 12 th

Week 2 Lecture 3. Unix

Chapter-3. Introduction to Unix: Fundamental Commands

Using UNIX. -rwxr--r-- 1 root sys Sep 5 14:15 good_program

Unix Guide. Meher Krishna Patel. Created on : Octorber, 2017 Last updated : December, More documents are freely available at PythonDSP

Introduction: What is Unix?

Linux Command Line Primer. By: Scott Marshall

Virtual Machine. Linux flavor : Debian. Everything (except slides) preinstalled for you.


Introduction to Linux

Introduction to Unix: Fundamental Commands

No Food or Drink in this room. Logon to Windows machine

Introduction To Linux. Rob Thomas - ACRC

CSE 303 Lecture 2. Introduction to bash shell. read Linux Pocket Guide pp , 58-59, 60, 65-70, 71-72, 77-80

Introduction. File System. Note. Achtung!

Unix/Linux Operating System. Introduction to Computational Statistics STAT 598G, Fall 2011

Unix/Linux Basics. Cpt S 223, Fall 2007 Copyright: Washington State University

Lecture 3. Essential skills for bioinformatics: Unix/Linux

Perl and R Scripting for Biologists

Introduction p. 1 Who Should Read This Book? p. 1 What You Need to Know Before Reading This Book p. 2 How This Book Is Organized p.

Introduction to UNIX Command Line

Introduction to the Linux Command Line

acmteam/unix.pdf How to manage your account (user ID, password, shell); How to compile C, C++, and Java programs;

Introduction to the UNIX command line

Tutorial-1b: The Linux CLI

Contents. Note: pay attention to where you are. Note: Plaintext version. Note: pay attention to where you are... 1 Note: Plaintext version...

CSCI 2132 Software Development. Lecture 4: Files and Directories

Files and Directories

Where can UNIX be used? Getting to the terminal. Where are you? most important/useful commands & examples. Real Unix computers

Chapter 1 - Introduction. September 8, 2016

Introduction to Linux Organizing Files

Working With Unix. Scott A. Handley* September 15, *Adapted from UNIX introduction material created by Dr. Julian Catchen

Basic Linux (Bash) Commands

Introduction to Linux

Introduction to Unix and Linux. Workshop 1: Directories and Files

CENG 334 Computer Networks. Laboratory I Linux Tutorial

Practical Linux examples: Exercises

Table of contents. Our goal. Notes. Notes. Notes. Summer June 29, Our goal is to see how we can use Unix as a tool for developing programs

Unix Tools / Command Line

Operating Systems. Copyleft 2005, Binnur Kurt

Operating Systems 3. Operating Systems. Content. What is an Operating System? What is an Operating System? Resource Abstraction and Sharing

Connecting to ICS Server, Shell, Vim CS238P Operating Systems fall 18

A Brief Introduction to the Linux Shell for Data Science

CS CS Tutorial 2 2 Winter 2018

The Linux Command Line & Shell Scripting

The Unix Shell. Pipes and Filters

CSE Linux VM. For Microsoft Windows. Based on opensuse Leap 42.2

Recap From Last Time:

This lab exercise is to be submitted at the end of the lab session! passwd [That is the command to change your current password to a new one]

BGGN 213 Working with UNIX Barry Grant

Linux Essentials. Programming and Data Structures Lab M Tech CS First Year, First Semester

Essential Skills for Bioinformatics: Unix/Linux

Unix basics exercise MBV-INFX410

History. Terminology. Opening a Terminal. Introduction to the Unix command line GNOME

Introduction to Linux Workshop 1

Introduction to the shell Part II


commandname flags arguments

Basic UNIX commands. HORT Lab 2 Instructor: Kranthi Varala

Short Read Sequencing Analysis Workshop

Analytical Processing of Data of statistical genetics research in UNIX like Systems

Bioinformatics? Reads, assembly, annotation, comparative genomics and a bit of phylogeny.

Basic Linux. Boyce Thompson Institute for Plant Research Tower Road Ithaca, New York U.S.A. by Aureliano Bombarely Gomez

CSCI 2132 Software Development. Lecture 5: File Permissions

Working with Basic Linux. Daniel Balagué


Tutorial-1b: The Linux CLI

Introduction to UNIX/Linux

Basic Linux Commands. Srihari Kalgi M.Tech, CSE (KReSIT), IIT Bombay. May 5, 2009

Embedded Linux Systems. Bin Li Assistant Professor Dept. of Electrical, Computer and Biomedical Engineering University of Rhode Island

CS 3410 Intro to Unix, shell commands, etc... (slides from Hussam Abu-Libdeh and David Slater)

Introduction to Linux. Roman Cheplyaka

Arkansas High Performance Computing Center at the University of Arkansas

BIOINFORMATICS POST-DIPLOMA PROGRAM SUBJECT OUTLINE Subject Title: OPERATING SYSTEMS AND PROJECT MANAGEMENT Subject Code: BIF713 Subject Description:

Reading and manipulating files

Linux Essentials Objectives Topics:

A Brief Introduction to Unix

Unix background. COMP9021, Session 2, Using the Terminal application, open an x-term window. You type your commands in an x-term window.

When talking about how to launch commands and other things that is to be typed into the terminal, the following syntax is used:

Introduction to remote command line Linux. Research Computing Team University of Birmingham

Quick Start Guide. by Burak Himmetoglu. Supercomputing Consultant. Enterprise Technology Services & Center for Scientific Computing

Some useful UNIX Commands written down by Razor for newbies to get a start in UNIX

Unix/Linux Primer. Taras V. Pogorelov and Mike Hallock School of Chemical Sciences, University of Illinois

Introduction to Linux

First of all, these notes will cover only a small subset of the available commands and utilities, and will cover most of those in a shallow fashion.

Command Line Interface The basics

Basic Unix Commands. CGS 3460, Lecture 6 Jan 23, 2006 Zhen Yang

Introduction to Linux Part 1. Anita Orendt and Wim Cardoen Center for High Performance Computing 24 May 2017

Introduction of Linux

CS 460 Linux Tutorial

Transcription:

Introduction to UNIX command-line II Boyce Thompson Institute 2017 Prashant Hosmani

Class Content Terminal file system navigation Wildcards, shortcuts and special characters File permissions Compression UNIX commands Networking UNIX commands Basic NGS file formats Text files manipulation commands Command-line pipelines Introduction to bash scripts

Text Handling Commands UNIX Command-Line Cheat Sheet BTI-SGN Bioinformatics Course 2014 Text Handling Commands File system Commands ls lists directories and files ls -a lists all files including hidden files ls -lh formatted list including more data ls -t lists sorted by date pwd returns path to working directory cd dir changes directory cd.. goes to parent directory cd / goes to root directory cd goes to home directory touch file_name creates en empty file cp file file_copy copy a file cp -r copy files contained in directories rm file deletes a file rm -r dir deletes a directory and its files mv file1 file2 moves or renames a file mkdir dir_name creates a directory rmdir dir_name deletes a directory locate file_name searches a file man command shows commands manual top shows process activity df -h shows disk space info Compression commands gzip/zip compress a file gunzip/unzip decompress a file tar -cvf groups files tar -xvf ungroups files tar -zcvf groups and gzip files tar -zxvf gunzip and ungroups files command > file command >> file cat file Text handling commands saves STDOUT in a file appends STDOUT in a file concatenate and print files cat file1 file2 > file3 merges files 1 and 2 into file3 cat *fasta > all.fasta head file head -n 5 file tail file tail -n 5 file less file less -N file less -S file grep pattern file concatenates all fasta files in the current directory prints first lines from a file prints first five lines from a file prints last lines from a file prints last five lines from a file view a file includes line numbers wraps long lines Prints lines matching a pattern grep -c pattern file counts lines matching a pattern cut -f 1,3 file sort file sort -u file uniq -c file wc file paste file1 file2 paste -d, sed wget URL ssh user@server scp apt-get install retrieves data from selected columns in a tab-delimited file sorts lines from a file sorts and return unique lines filters adjacent repeated lines counts lines, words and bytes concatenates the lines of input files concatenates the lines of input files by commas transforms text Networking Commands download a file from an URL connects to a server copy files between computers installs applications in linux

FASTA format A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol at the beginning. http://www.ncbi.nlm.nih.gov/ description line sequence data >sequence_id1 description ATGCGCGCGCGCGCGCGCGGGTAGCAGATGACGACACAGAGCGAGGATGCGCTGAGAGTA GTGTGACGACGATGACGGAAAATCAGATGGACCCGATGACAGCATGACGATGGGACGGGA AAGATTGGACCAGGACAGGACCAGGACCAGGACCAGGGATTAGA >sequence_id2 description ATGGGGGGGACGACGATGGACACAGAGACAGAGACGACGACAGCAGACAGATTTACCTTA GACGAGATAGGAGAGACGACAGATATATATATATAGCAGACAGACAGACATTTAGACGAG ACGACGATAGACGATaaaaataa

FASTQ format A FASTQ file normally uses four lines per sequence. Line 1: begins with a '@' character, followed by a sequence identifier and an optional description. Line 2: is the raw sequence letters. Line 3: begins with a '+' character, is optionally followed by the same sequence identifier. Line 4 encodes the quality values for the sequence in Line 2, and must contain the same number of symbols as letters in the sequence. wikipedia description line sequence data sequence quality @D3B4KKQ1:291:D17NUACXX:8:1101:3630:2109 1:N:0: GACTTGCAGGCATGCAAGCTTGGCACTGGCCGTCGTTTTACAACGTCGTGACTGGGAAAACACTGGCGT +?@<+ADDDDFDFFI<FGE=EHGIGFFGEFIIFFBGFIDEI>D?FFFFA4;C;DC=;=ABDD; @D3B4KKQ1:291:D17NUACXX:8:1101:3971:2092 1:N:0: ATTGCAGAAGCGGCCCCGCATCTGCGAAGGGTTAACCGCAGGTGCAGAAGCTGGCTTTAAGTGAGAAGT + =BAADBA?D?FGI<@FHDB6?ADFEGGIE8@FGGII3ABBBB(;;6@CC?C3;C<99?CCCCC;:::?

Tab-delimited text files Tab-delimited files are a very common format in scientific data. They consist in columns of text separated by tabs. Other file formats could have different delimiters. mismatch Query Subject id % length qstart sstart gaps qend send evalue score ATCG00500.1 PACid:23047568 64.88 299 64 2 220 477 112 410 5e-131 388 ATCG00500.1 PACid:23052247 58.88 321 69 3 220 477 381 701 3e-117 361 ATCG00890.1 PACid:16418828 90.60 117 11 0 18 134 1 117 1e-71 220 ATCG00890.1 PACid:16412855 90.48 147 14 2 41 387 27 173 1e-68 214 ATCG00280.1 PACid:24129717 95.99 474 19 0 1 474 1 474 0.0 847 ATCG00280.1 PACid:24095593 95.36 474 22 0 1 474 1 474 0.0 840 ATCG00280.1 PACid:20871697 94.94 474 24 0 1 474 1 474 0.0 837 Tabular blast output example Blast, SAM (mapping), BED, VCF (SNPs), GTF, GFF...

BIOINFORMATICIAN What is the best option to explore the content of a file of 2Gb? A: MS Word B: Less C: Internet Explorer D: Cat

BIOINFORMATICIAN What is the best option to explore the content of a file of 2Gb? A: MS Word B: Less C: Internet Explorer D: Cat

less to view large files scroll through the file < or g go to file beginning > or G go to file end /pattern n N search pattern find next find previous space bar page down b page up q quit less view file blast_sample.txt view file blast_sample.txt without wrapping long lines less blast_sample.txt less -S blast_sample.txt less -N blast_sample.txt view file blast_sample.txt showing line numbers

cat concatenates and prints files prints file sample1.fasta on the screen prints file sample1.fasta on the screen cat sample1.fasta cat /home/bioinfo/desktop/unix_data/sample1.fasta cat sample1.fasta sample2.fasta > new_file.fasta concatenates files sample1.fasta and sample2.fasta and saves them in the file new_file.fasta redirects output to a file

cat concatenates and prints files concatenates all FASTA files in the current directory and saves them in the file all_samples.fasta redirect output to a file cat *fasta > all_samples.fasta cat sample3.fasta >> new_file.fasta appends sample3.fasta file to new_file.fasta

head displays first lines of a file print first lines from blast_sample.txt file (10 by default) and save them in blast10.txt head blast_sample.txt > blast10.txt head -n 5 blast_sample.txt print first five lines from blast_sample.txt file

tail displays the last part of a file print last 10 lines from blast_sample.txt file tail blast_sample.txt tail -n 5 blast_sample.txt print last five lines from blast_sample.txt file

grep searches patterns in files prints lines starting with a >, i.e., prints description lines from FASTA files grep ^> sample1.fasta grep -c ^> sample1.fasta grep -c ^+$ *fastq counts lines starting with a >, i.e., it counts the number of sequences from a FASTA file search pattern at line start search pattern at line end counts lines formed only by +, i.e., it counts the number of sequences from all FASTQ files in the current directory

grep searches patterns in files prints lines containing Vvin and all their case combinations grep -i Vvin blast10.txt grep -v Vvin blast10.txt prints all lines but the ones containing Vvin

cut gets columns from a tab-delimited file prints columns 1 and 2 from blast10.txt cut -f 1,2 blast10.txt cut -c 1-4,17-21 blast_sample.txt > tmp.txt prints characters from 1 to 4 and from 17 to 21 for each line in blast_sample.txt and save them in tmp.txt

sort sorts lines from a file sort lines from file tmp.txt and save them in tmp2.txt sort lines from file tmp.txt and remove the repeated ones sort tmp.txt > tmp2.txt sort -u tmp.txt uniq -c tmp2.txt removes repeated lines from tmp.txt and counts how many times they were repeated. Lines have to be sorted since only adjacent lines are compared

wc counts lines, words and characters counts lines, words and characters in blast10.txt counts lines in blast10.txt wc blast10.txt wc -l blast10.txt wc -w blast10.txt wc -c blast10.txt counts bytes in blast_sample.txt (including the line return) counts words in blast10.txt

paste concatenates files as columns creates a file for the columns 1, 2 and 3 respectively from blast10.txt cut -f 1 blast10.txt > col1.txt cut -f 2 blast10.txt > col2.txt cut -f 3 blast10.txt > col3.txt paste col2.txt col3.txt col1.txt paste -d, col2.txt col3.txt col1.txt pastes columns with commas as delimiters concatenates files by their right end

sed replaces a pattern replaces Atha by SGN in col1.txt file replaces all A characters by a in col1.txt file sed s/atha/sgn/ col1.txt sed s/a/a/g col1.txt sed -r s/^([a-za-z]+)\ (.+)/gene \2 from \1/ col2.txt Saves species name in \1 Saves gene name in \2 get species and gene name from col2.txt and print each line in a different format

Pipelines Pipelines consists in concatenate several commands by using the output of the first command as the input of the next one. Two commands are connected placing the sign between them. ls wc -l counts files in current directory

Pipelines counts sequences in all fasta files from current directory prints sequence description line for all fasta files from current directory cat *fasta grep -c ^> cat *fasta grep ^> sed s/>// cut -f 1 blast_sample.txt sort -u wc -l cut -f 1 blast_sample.txt sort uniq -c counts different query ids in a blast tabular file counts the appearance of each query id in a blast tabular file

shell script (bash) example All commands and programs we run in the terminal could be included in a text file with extension.sh This file will execute the commands in the order they were written, from top to bottom. head of bash scripts comment line command or program line execution

Run a bash script on a server emacs: text editor save = ctrl-x ctrl-s exit = ctrl-x ctrl-c creates an empty file touch file.sh emacs file.sh open file.sh in emacs

reviewing the permissions owner user owner group permissions links # size date File name group user other d rwx r-x r-x d Directory - Regular file r readable w writable x executable or searchable - not rwx

Run a bash script on a server chmod 755./file.sh Chmod manual makes file.sh executable screen -L./file.sh ctrl+a+d screen -r process_id less screenlog.0 run file.sh script in screen mode detach screen return to process screen watch log from screen execution

Exercises 1. Merge all fasta files, in the order sample3.fasta, sample1.fasta and sample2.fasta, and save them in a new file called all_samples.fasta 2. Merge all fastq files (sample1.fastq, sample2.fastq and sample3.fastq) using wildcards, and save them in a new file called all_samples.fastq 3. Save in a file called blast100.txt the first 100 lines from blast_sample.txt 4. Save in a file called blast200.txt the last 200 lines from blast_sample.txt 5. How many sequences are in all_samples.fasta? 6. How many sequences are in all_sample.fastq? 7. Create a file with the subject ids and their scores for the 15 first lines from blast_sample.txt 8. How many different queries ids are in blast_sample.txt? 9. How many different subjects ids are in blast_sample.txt? 10. Change all in blast_sample.txt by _ and save the new file in Desktop as tmp.txt. 11. Count how many genes are in each Arabidopsis thaliana chromosome, chloroplast and mitochondria based on the next file: ftp://ftp.sgn.cornell.edu/bioinfo_class/btibic/2013/data/tair10_pep.fasta