Lecture 3. Essential skills for bioinformatics: Unix/Linux

Size: px
Start display at page:

Download "Lecture 3. Essential skills for bioinformatics: Unix/Linux"

Transcription

1 Lecture 3 Essential skills for bioinformatics: Unix/Linux

2 RETRIEVING DATA

3 Overview Whether downloading large sequencing datasets or accessing a web application hundreds of times to download specific files, retrieving data in bioinformatics can require special tools and skills. Downloading a large scale of sequencing data through your web browser is not feasible. Web browsers are not designed to download such large datasets. Additionally, you d need to download the sequencing data to your server, not your local workstation. To do this, you d need to SSH into your data-crunching server and download your data directly to this machine using command-line tools. We will take a look at some of these tools.

4 wget wget is useful for quickly downloading a file from the command line. For example, human gene annotation GTF file from Ensembl:

5 wget wget downloads this file to your current directory and provides a useful progress indicator. It can handle both http and ftp links. In general, FTP is preferable to HTTP for large files. For simple HTTP or FTP authentication with a user name and password, you can authenticate using --user= and -- ask-password options.

6 wget: useful options Option Values Use -A, --accept Either a suffix like.gtf or a pattern with *,?, or [ and ]. Only download files matching this criteria -R, --reject Same as with --accept Don t download files matching this -nd, --no-directory No value Don t place locally downloaded files in the same directory hierarchy as remote files. -r, --recursive No value Follow and download links on a page -np, --no-parent No value Don t move above the parent directory --limit-rate Number of bytes per second Maximum download bandwidth --user=user User name User name --ask-password No value Prompt for password -O Output file name Download file to filename specified

7 DATA INTEGRITY

8 Overview Data we download is the starting point of all future analyses. The risk of data corruption during transfers is a concern when transferring large datasets. In addition to verifying your transfer finished without error, it s also important to explicitly check the transferred data s integrity with checksums. Checksums are very compressed summaries of data, computed in a way that even if just one bit of the data is changed, the checksum will be different.

9 SHA and MD5 checksums The two most common checksum algorithms for ensuring data integrity are MD5 and SHA-1. MD5 is an older checksum algorithm, but one that is still commonly used. Both MD5 and SHA-1 behave similarly, but SHA-1 is newer and generally preferred. The long string of numbers and letters (hexadecimal format, 0-9 and a-f) is the checksum

10 SHA and MD5 checksums Checksums are completely deterministic, meaning that regardless of the time the checksum is calculated or the system used, checksum values will only differ if the input differs (even one bit of data). When downloading many files, it can get rather tedious to check each checksum individually. The program shasum can create and validate against a file containing the checksums of files.

11 SHA and MD5 checksums Then, we can use shasum s check option (-c) to validate that these files match the original versions:

12 SHA and MD5 checksums There are only possible different checksums for SHA-1 checksum. However, is a huge number so the probability of a checksum collision is very small. For the purpose of checking data integrity, the risk of a collision is negligible. Some servers use an antiquated checksum implementation such as sum or chsum. How we use these older commandline checksum programs is similar to using shasum and md5sum.

13 WORKING WITH COMPRESSED DATA

14 Overview Data compression, the process of condensing data so that it takes up less space (on disk drives, in memory, or across network transfer), is an indispensable technology in modern bioinformatics. For example, sequences from a recent Illumina HiSeq run example.fastq: 63,203,414,514 bytes (59 GB) example.fastq.gz: 21,408,674,240 bytes (20 GB) Compression ratio (uncompressed size/compressed size) of this data is 2.95, which translates to a significant space saving of about 66%.

15 Overview Data can remain compressed on the disk throughout processing and analyses. Most well-written bioinformatics tools can work natively with compressed data as input, without requiring us to decompress it to disk first. Using pipes and redirection, we can stream compressed data and write compressed files directly to the disk. Common Unix tools like cat, grep all have variants that work with compressed data. While working with large datasets in bioinformatics can be challenging, using the compression tools in Unix and software libraries make our lives much easier.

16 gzip The two most common compression systems used on Unix are gzip and bzip2. gzip faster than bzip2. bzip2 has a higher compression ratio (the previous fastq file is only about 16 GB when compressed with bzip2) Generally, gzip is used in bioinformatics to compress most sizable files, while bzip2 is more common for long-term data archiving.

17 gzip It can compress results from standard input. This is useful, as we can compress results directly from another bioinformatics program s standard output.

18 gzip It can compress files on disk in place. gzip will compress this file in place, replacing the original uncompressed version with the compressed file (appending the extension.gz to the original filename).

19 gunzip We can decompress files in place with the command gunzip. Note that this replaces tb1.fasta.gz file with the decompressed version.

20 gzip -c Both gzip and gunzip can also output their results to standard out. This can be enabled using the c option:

21 gzip with multiple files

22 gzip with multiple files

23 Working with gzipped files The greatest advantage of gzip (and bzip2) is that many Unix and bioinformatics tools can work directly with compressed files. For example, we can search compressed files using grep s analog for gzipped files, zgrep. Likewise, cat has zcat. If programs cannot handle compressed input, you can use zcat and pipe output directly to the standard input of another program.

24 Working with gzipped files

25 Creating a tar.gz archive

26 Extracting a tar.gz file

27 CASE STUDY: REPRODUCIBLY DOWNLOADING DATA

28 GRCm38 mouse reference genome We usually download genomic resources like sequence and annotation files from remote servers over the Internet, which may change in the future. Furthermore, new versions of sequence and annotation data may be released, so it is imperative that we document everything about how data was acquired for full reproducibility The human, mouse, zebrafish, and chicken genomes releases are coordinated through the Genome Reference Consortium (

29 GRCm38 mouse reference genome The GRC prefix in GRCm38 refers to the Genome Reference Consortium. We can download GRCm38 from Ensembl using wget.

30 Compare checksum values From ftp://ftp.ensembl.org/pub/release-87/fasta/mus_musculus/dna/checksums

31 Extract the FASTA headers

32 Document README Document how and when we downloaded this file in README Copy the SHA-1 checksum values into README

33 UNIX DATA TOOLS

34 Overview Understanding how to use Unix data tools in bioinformatics is not only about learning what each tool does, it is about mastering the practice of connecting tools together creating programs from Unix pipelines. By connecting data tools together with pipes, we can construct programs that parse, manipulate, and summarize data. Unix pipelines can be developed in shell scripts or as one-liners (tiny programs built by connecting Unix tools with pipes directly on the shell).

35 Overview Building more complex programs from small, modular tools capitalizes on the design and philosophy of Unix. The pipeline approach to building programs is a wellestablished tradition in Unix and bioinformatics because it is a fast way to solve problems, incredibly powerful, and adaptable to a variety of problems.

36 When to use the Unix pipeline approach The Unix one-linear approach is not appropriate for all problems. Many bioinformatics tasks are better accomplished through a custom, well-documented script. Knowing when to use a fast and simple engineering solution like a Unix pipeline and when to resort to writing a welldocumented Python and R script takes experience.

37 When to use the Unix pipeline approach Unix pipelines: Fast, low-level data manipulation toolkit to explore data, transform data between formats, and inspect data for potential problems. Useful when we want to get a quick answer and keep moving forward with our project. It is essential that everything that produces a result is documented. Storing pipelines in shell scripts is a good approach. Custom scripts using Python or R: Useful for larger, more complex tasks as these allow for the flexibility in checking input data, structuring programs, use of data structures, code documentation.

38 Inspecting and manipulating text data Many formats in bioinformatics are simple tabular plain-text files delimited by a character. The most common tabular plain-text file format used in bioinformatics is tab-delimited because most Unix tools treat tabs as delimiters by default. Tab-delimited file formats are also simple to parse with scripting language like Python and Perl, and easy to load into R.

39 Tabular plain-text data formats The basic format: Each row (known as a record) is kept on its own line Each column (known as a field) is separated by some delimiter Three formats: Tab-delimited Comma-separated Variable space-delimited

40 Tab-delimited The most commonly used in bioinformatics (e.g. BED, GTF/GFF, SA M, VCF). Columns of a tab-delimited file are separated by a single tab char acter (the escape code: \t). A common convention (not a standard) is to include metadata on the first few lines of a tab-delimited files. These metadata lines be gin with #. Tabs in data are not allowed.

41 Comma-separated values (CSV) CSV is similar to tab-delimited, except the delimiter is a comma character. While not a common occurrence in bioinformatics, it is possible that the data stored in CSV format contain commas. Some variants just do not allow this, while others use quotes around entries that could contain commas.

42 Variable space-delimited In general, tab-delimited formats and CSV are better choices than variable space-delimited formats because it is quite com mon to encounter data containing spaces.

43 How lines are separated In Linux and OS X: use a single linefeed character (the escape code: \n) to separate lines. In Windows: use a DOS-style line separator of a carriage return and a linefeed character (\r\n). To convert DOS to Unix text format, use dos2unix. To convert Unix to DOS text format, use unix2dos.

44 Inspecting data with head and tail Many files in bioinformatics are much too long to inspect with cat. Running cat on a file a million lines long would quickly fill your shell. A better option is to take a look at the top of a file with head.

45 Inspecting data with head and tail

46 Inspecting data with head and tail We can control how many lines we see.

47 Inspecting data with head and tail tail is designed to look at the end of a file. tail works just like head.

48 Inspecting data with head and tail We can also use tail to remove the header of a file. If n is given a number x preceded with a + sign (e.g. +x), tail will start from the x th line.

49 Inspecting data with head and tail head is useful for taking a peek at data resulting from a Unix pipeline. We will use grep s results as the standard input for the next program in our pipeline, but first we want to check grep s standard out to see if everything looks correct. When head exits, your shell catches this and stops the entire pipe. When building complex pipelines that process large amounts of data, this is important.

50 less less is a useful program for a inspecting files and the output of pipes. It is a terminal pager, a program that allows us to view large amounts of text in our terminals at a time. less has more features and is generally preferred than the older terminal pager called more.

51 less Shortcut Space bar b g G j k /<pattern>?<pattern> Action Next page Previous page First line Last line Down one line at a time Up one line at a time Search down for string <pattern> Search up for string <pattern>

52 less less is useful in debugging our command-line pipelines. Just pipe the output of the command you want to debug to less. When you run the pipe, less will capture the output of the last command and pause so you can inspect it. less is crucial when iteratively building up a pipeline.

53 less A useful behavior of pipes is that the execution of a program with output piped to less will be paused when less has a full screen of data. When you pipe a program s output to less and inspect it, less stops reading input from the pipe. The pipe will block and we can spend as much time as needed to inspect the output.

54 Counting the number of lines By default, wc outputs the number of lines, words, and characters of the supplied file. To return the number of line, we can use option -l

55 Reporting file sizes

56 Counting the number of fields We could manually count the number of columns of the first row with head n 1. But an easier way is to use awk. awk simply prints the number of fields of the first row of the file (due to exit).

57 Counting the number of fields Because the first line of this file is a comment, we need to first chop off the comments. This solution will not work on other files since tail n +6 is tailed to this specific file.

58 Counting the number of fields A less fragile and more robust solution is

59 Working with column data with cut When working with plain-text tabular data, we often need to extract specific columns from the original file. Suppose we want to extract the columns of chromosome, start, and end position from the GTF file. The simplest way to do this is with cut. The f argument is how we specify which columns to keep. cut f 2 input.txt cut f 3-8 input.txt cut f 3,5,8 input.txt cut f 8,3,5 input.txt (no reordering)

Essential Skills for Bioinformatics: Unix/Linux

Essential Skills for Bioinformatics: Unix/Linux Essential Skills for Bioinformatics: Unix/Linux WORKING WITH COMPRESSED DATA Overview Data compression, the process of condensing data so that it takes up less space (on disk drives, in memory, or across

More information

Lecture 5. Essential skills for bioinformatics: Unix/Linux

Lecture 5. Essential skills for bioinformatics: Unix/Linux Lecture 5 Essential skills for bioinformatics: Unix/Linux UNIX DATA TOOLS Text processing with awk We have illustrated two ways awk can come in handy: Filtering data using rules that can combine regular

More information

Introduction to UNIX command-line II

Introduction to UNIX command-line II Introduction to UNIX command-line II Boyce Thompson Institute 2017 Prashant Hosmani Class Content Terminal file system navigation Wildcards, shortcuts and special characters File permissions Compression

More information

Merge Conflicts p. 92 More GitHub Workflows: Forking and Pull Requests p. 97 Using Git to Make Life Easier: Working with Past Commits p.

Merge Conflicts p. 92 More GitHub Workflows: Forking and Pull Requests p. 97 Using Git to Make Life Easier: Working with Past Commits p. Preface p. xiii Ideology: Data Skills for Robust and Reproducible Bioinformatics How to Learn Bioinformatics p. 1 Why Bioinformatics? Biology's Growing Data p. 1 Learning Data Skills to Learn Bioinformatics

More information

Handling Ordinary Files

Handling Ordinary Files Handling Ordinary Files Unit 2 Sahaj Computer Solutions visit : projectsatsahaj.com 1 cat: Displaying and Creating Files cat is one of the most frequently used commands on Unix-like operating systems.

More information

Utilities. September 8, 2015

Utilities. September 8, 2015 Utilities September 8, 2015 Useful ideas Listing files and display text and binary files Copy, move, and remove files Search, sort, print, compare files Using pipes Compression and archiving Your fellow

More information

Using Linux as a Virtual Machine

Using Linux as a Virtual Machine Intro to UNIX Using Linux as a Virtual Machine We will use the VMware Player to run a Virtual Machine which is a way of having more than one Operating System (OS) running at once. Your Virtual OS (Linux)

More information

Practical Linux examples: Exercises

Practical Linux examples: Exercises Practical Linux examples: Exercises 1. Login (ssh) to the machine that you are assigned for this workshop (assigned machines: https://cbsu.tc.cornell.edu/ww/machines.aspx?i=87 ). Prepare working directory,

More information

Essential Skills for Bioinformatics: Unix/Linux

Essential Skills for Bioinformatics: Unix/Linux Essential Skills for Bioinformatics: Unix/Linux SHELL SCRIPTING Overview Bash, the shell we have used interactively in this course, is a full-fledged scripting language. Unlike Python, Bash is not a general-purpose

More information

http://xkcd.com/208/ 1. Review of pipes 2. Regular expressions 3. sed 4. awk 5. Editing Files 6. Shell loops 7. Shell scripts cat seqs.fa >0! TGCAGGTATATCTATTAGCAGGTTTAATTTTGCCTGCACTTGGTTGGGTACATTATTTTAAGTGTATTTGACAAG!

More information

Bioinformatics? Reads, assembly, annotation, comparative genomics and a bit of phylogeny.

Bioinformatics? Reads, assembly, annotation, comparative genomics and a bit of phylogeny. Bioinformatics? Reads, assembly, annotation, comparative genomics and a bit of phylogeny stefano.gaiarsa@unimi.it Linux and the command line PART 1 Survival kit for the bash environment Purpose of the

More information

A Brief Introduction to the Linux Shell for Data Science

A Brief Introduction to the Linux Shell for Data Science A Brief Introduction to the Linux Shell for Data Science Aris Anagnostopoulos 1 Introduction Here we will see a brief introduction of the Linux command line or shell as it is called. Linux is a Unix-like

More information

http://xkcd.com/208/ 1. Review of pipes 2. Regular expressions 3. sed 4. Editing Files 5. Shell loops 6. Shell scripts cat seqs.fa >0! TGCAGGTATATCTATTAGCAGGTTTAATTTTGCCTGCACTTGGTTGGGTACATTATTTTAAGTGTATTTGACAAG!

More information

Introduction to UNIX command-line

Introduction to UNIX command-line Introduction to UNIX command-line Boyce Thompson Institute March 17, 2015 Lukas Mueller & Noe Fernandez Class Content Terminal file system navigation Wildcards, shortcuts and special characters File permissions

More information

Unzip command in unix

Unzip command in unix Unzip command in unix Search 24-4-2015 Howto Extract Zip Files in a Linux and. You need to use the unzip command on a Linux or Unix like system. The nixcraft takes a lot of my time and. 16-4-2010 Howto:

More information

http://xkcd.com/208/ cat seqs.fa >0 TGCAGGTATATCTATTAGCAGGTTTAATTTTGCCTGCACTTGGTTGGGTACATTATTTTAAGTGTATTTGACAAG >1 TGCAGGTTGTTGTTACTCAGGTCCAGTTCTCTGAGACTGGAGGACTGGGAGCTGAGAACTGAGGACAGAGCTTCA >2 TGCAGGGCCGGTCCAAGGCTGCATGAGGCCTGGGGCAGAATCTGACCTAGGGGCCCCTCTTGCTGCTAAAACCAT

More information

Scripting Languages Course 1. Diana Trandabăț

Scripting Languages Course 1. Diana Trandabăț Scripting Languages Course 1 Diana Trandabăț Master in Computational Linguistics - 1 st year 2017-2018 Today s lecture Introduction to scripting languages What is a script? What is a scripting language

More information

CSE 390a Lecture 2. Exploring Shell Commands, Streams, and Redirection

CSE 390a Lecture 2. Exploring Shell Commands, Streams, and Redirection 1 CSE 390a Lecture 2 Exploring Shell Commands, Streams, and Redirection slides created by Marty Stepp, modified by Jessica Miller & Ruth Anderson http://www.cs.washington.edu/390a/ 2 Lecture summary Unix

More information

Linux command line basics III: piping commands for text processing. Yanbin Yin Fall 2015

Linux command line basics III: piping commands for text processing. Yanbin Yin Fall 2015 Linux command line basics III: piping commands for text processing Yanbin Yin Fall 2015 1 h.p://korflab.ucdavis.edu/unix_and_perl/unix_and_perl_v3.1.1.pdf 2 The beauty of Unix for bioinformagcs sort, cut,

More information

ITST Searching, Extracting & Archiving Data

ITST Searching, Extracting & Archiving Data ITST 1136 - Searching, Extracting & Archiving Data Name: Step 1 Sign into a Pi UN = pi PW = raspberry Step 2 - Grep - One of the most useful and versatile commands in a Linux terminal environment is the

More information

LCE Splunk Client 4.6 User Manual. Last Revised: March 27, 2018

LCE Splunk Client 4.6 User Manual. Last Revised: March 27, 2018 LCE Splunk Client 4.6 User Manual Last Revised: March 27, 2018 Table of Contents Getting Started with the LCE Splunk Client 3 Standards and Conventions 4 Install, Configure, and Remove 5 Download an LCE

More information

UNIX, GNU/Linux and simple tools for data manipulation

UNIX, GNU/Linux and simple tools for data manipulation UNIX, GNU/Linux and simple tools for data manipulation Dr Jean-Baka DOMELEVO ENTFELLNER BecA-ILRI Hub Basic Bioinformatics Training Workshop @ILRI Addis Ababa Wednesday December 13 th 2017 Dr Jean-Baka

More information

Shell Programming Overview

Shell Programming Overview Overview Shell programming is a way of taking several command line instructions that you would use in a Unix command prompt and incorporating them into one program. There are many versions of Unix. Some

More information

No Food or Drink in this room. Logon to Windows machine

No Food or Drink in this room. Logon to Windows machine While you are waiting No Food or Drink in this room Logon to Windows machine Username/password on right-hand monitor Not the username/password I gave you earlier We will walk through connecting to the

More information

http://xkcd.com/208/ 1. Computer Hardware 2. Review of pipes 3. Regular expressions 4. sed 5. awk 6. Editing Files 7. Shell loops 8. Shell scripts Hardware http://www.theverge.com/2011/11/23/2582677/thailand-flood-seagate-hard-drive-shortage

More information

Unix Essentials. BaRC Hot Topics Bioinformatics and Research Computing Whitehead Institute October 12 th

Unix Essentials. BaRC Hot Topics Bioinformatics and Research Computing Whitehead Institute October 12 th Unix Essentials BaRC Hot Topics Bioinformatics and Research Computing Whitehead Institute October 12 th 2016 http://barc.wi.mit.edu/hot_topics/ 1 Outline Unix overview Logging in to tak Directory structure

More information

7. Archiving and compressing 7.1 Introduction

7. Archiving and compressing 7.1 Introduction 7. Archiving and compressing 7.1 Introduction In this chapter, we discuss how to manage archive files at the command line. File archiving is used when one or more files need to be transmitted or stored

More information

Unix/Linux Primer. Taras V. Pogorelov and Mike Hallock School of Chemical Sciences, University of Illinois

Unix/Linux Primer. Taras V. Pogorelov and Mike Hallock School of Chemical Sciences, University of Illinois Unix/Linux Primer Taras V. Pogorelov and Mike Hallock School of Chemical Sciences, University of Illinois August 25, 2017 This primer is designed to introduce basic UNIX/Linux concepts and commands. No

More information

Principles of Bioinformatics. BIO540/STA569/CSI660 Fall 2010

Principles of Bioinformatics. BIO540/STA569/CSI660 Fall 2010 Principles of Bioinformatics BIO540/STA569/CSI660 Fall 2010 Lecture Five Practical Computing Skills Emphasis This time it s concrete, not abstract. Fall 2010 BIO540/STA569/CSI660 3 Administrivia Monday

More information

Table of contents. Our goal. Notes. Notes. Notes. Summer June 29, Our goal is to see how we can use Unix as a tool for developing programs

Table of contents. Our goal. Notes. Notes. Notes. Summer June 29, Our goal is to see how we can use Unix as a tool for developing programs Summer 2010 Department of Computer Science and Engineering York University Toronto June 29, 2010 1 / 36 Table of contents 1 2 3 4 2 / 36 Our goal Our goal is to see how we can use Unix as a tool for developing

More information

UNIX COMMANDS AND SHELLS. UNIX Programming 2015 Fall by Euiseong Seo

UNIX COMMANDS AND SHELLS. UNIX Programming 2015 Fall by Euiseong Seo UNIX COMMANDS AND SHELLS UNIX Programming 2015 Fall by Euiseong Seo What is a Shell? A system program that allows a user to execute Shell functions (internal commands) Other programs (external commands)

More information

ls /data/atrnaseq/ egrep "(fastq fasta fq fa)\.gz" ls /data/atrnaseq/ egrep "(cn ts)[1-3]ln[^3a-za-z]\."

ls /data/atrnaseq/ egrep (fastq fasta fq fa)\.gz ls /data/atrnaseq/ egrep (cn ts)[1-3]ln[^3a-za-z]\. Command line tools - bash, awk and sed We can only explore a small fraction of the capabilities of the bash shell and command-line utilities in Linux during this course. An entire course could be taught

More information

Cloud Computing and Unix: An Introduction. Dr. Sophie Shaw University of Aberdeen, UK

Cloud Computing and Unix: An Introduction. Dr. Sophie Shaw University of Aberdeen, UK Cloud Computing and Unix: An Introduction Dr. Sophie Shaw University of Aberdeen, UK s.shaw@abdn.ac.uk Aberdeen London Exeter What We re Going To Do Why Unix? Cloud Computing Connecting to AWS Introduction

More information

Lecture 8. Sequence alignments

Lecture 8. Sequence alignments Lecture 8 Sequence alignments DATA FORMATS bioawk bioawk is a program that extends awk s powerful processing of tabular data to processing tasks involving common bioinformatics formats like FASTA/FASTQ,

More information

Read the relevant material in Sobell! If you want to follow along with the examples that follow, and you do, open a Linux terminal.

Read the relevant material in Sobell! If you want to follow along with the examples that follow, and you do, open a Linux terminal. Warnings 1 First of all, these notes will cover only a small subset of the available commands and utilities, and will cover most of those in a shallow fashion. Read the relevant material in Sobell! If

More information

Cloud Computing and Unix: An Introduction. Dr. Sophie Shaw University of Aberdeen, UK

Cloud Computing and Unix: An Introduction. Dr. Sophie Shaw University of Aberdeen, UK Cloud Computing and Unix: An Introduction Dr. Sophie Shaw University of Aberdeen, UK s.shaw@abdn.ac.uk Aberdeen London Exeter What We re Going To Do Why Unix? Cloud Computing Connecting to AWS Introduction

More information

UNIX and Linux Essentials Student Guide

UNIX and Linux Essentials Student Guide UNIX and Linux Essentials Student Guide D76989GC10 Edition 1.0 June 2012 D77816 Authors Uma Sannasi Pardeep Sharma Technical Contributor and Reviewer Harald van Breederode Editors Anwesha Ray Raj Kumar

More information

Computer Architecture Lab 1 (Starting with Linux)

Computer Architecture Lab 1 (Starting with Linux) Computer Architecture Lab 1 (Starting with Linux) Linux is a computer operating system. An operating system consists of the software that manages your computer and lets you run applications on it. The

More information

5/8/2012. Exploring Utilities Chapter 5

5/8/2012. Exploring Utilities Chapter 5 Exploring Utilities Chapter 5 Examining the contents of files. Working with the cut and paste feature. Formatting output with the column utility. Searching for lines containing a target string with grep.

More information

Using UNIX. -rwxr--r-- 1 root sys Sep 5 14:15 good_program

Using UNIX. -rwxr--r-- 1 root sys Sep 5 14:15 good_program Using UNIX. UNIX is mainly a command line interface. This means that you write the commands you want executed. In the beginning that will seem inferior to windows point-and-click, but in the long run the

More information

BASH SHELL SCRIPT 1- Introduction to Shell

BASH SHELL SCRIPT 1- Introduction to Shell BASH SHELL SCRIPT 1- Introduction to Shell What is shell Installation of shell Shell features Bash Keywords Built-in Commands Linux Commands Specialized Navigation and History Commands Shell Aliases Bash

More information

Review of Fundamentals

Review of Fundamentals Review of Fundamentals 1 The shell vi General shell review 2 http://teaching.idallen.com/cst8207/14f/notes/120_shell_basics.html The shell is a program that is executed for us automatically when we log

More information

Linux for Biologists Part 2

Linux for Biologists Part 2 Linux for Biologists Part 2 Robert Bukowski Institute of Biotechnology Bioinformatics Facility (aka Computational Biology Service Unit - CBSU) http://cbsu.tc.cornell.edu/lab/doc/linux_workshop_part2.pdf

More information

Chapter-3. Introduction to Unix: Fundamental Commands

Chapter-3. Introduction to Unix: Fundamental Commands Chapter-3 Introduction to Unix: Fundamental Commands What You Will Learn The fundamental commands of the Unix operating system. Everything told for Unix here is applicable to the Linux operating system

More information

CSE 390a Lecture 2. Exploring Shell Commands, Streams, Redirection, and Processes

CSE 390a Lecture 2. Exploring Shell Commands, Streams, Redirection, and Processes CSE 390a Lecture 2 Exploring Shell Commands, Streams, Redirection, and Processes slides created by Marty Stepp, modified by Jessica Miller & Ruth Anderson http://www.cs.washington.edu/390a/ 1 2 Lecture

More information

Practical Linux Examples

Practical Linux Examples Practical Linux Examples Processing large text file Parallelization of independent tasks Qi Sun & Robert Bukowski Bioinformatics Facility Cornell University http://cbsu.tc.cornell.edu/lab/doc/linux_examples_slides.pdf

More information

Introduction to Unix: Fundamental Commands

Introduction to Unix: Fundamental Commands Introduction to Unix: Fundamental Commands Ricky Patterson UVA Library Based on slides from Turgut Yilmaz Istanbul Teknik University 1 What We Will Learn The fundamental commands of the Unix operating

More information

Introduction to Linux. Roman Cheplyaka

Introduction to Linux. Roman Cheplyaka Introduction to Linux Roman Cheplyaka Generic commands, files, directories What am I running? ngsuser@ubuntu:~$ cat /etc/lsb-release DISTRIB_ID=Ubuntu DISTRIB_RELEASE=16.04 DISTRIB_CODENAME=xenial DISTRIB_DESCRIPTION="Ubuntu

More information

Introduction To Linux. Rob Thomas - ACRC

Introduction To Linux. Rob Thomas - ACRC Introduction To Linux Rob Thomas - ACRC What Is Linux A free Operating System based on UNIX (TM) An operating system originating at Bell Labs. circa 1969 in the USA More of this later... Why Linux? Free

More information

GMI-Cmd.exe Reference Manual GMI Command Utility General Management Interface Foundation

GMI-Cmd.exe Reference Manual GMI Command Utility General Management Interface Foundation GMI-Cmd.exe Reference Manual GMI Command Utility General Management Interface Foundation http://www.gmi-foundation.org Program Description The "GMI-Cmd.exe" program is a standard part of the GMI program

More information

Review of Fundamentals. Todd Kelley CST8207 Todd Kelley 1

Review of Fundamentals. Todd Kelley CST8207 Todd Kelley 1 Review of Fundamentals Todd Kelley kelleyt@algonquincollege.com CST8207 Todd Kelley 1 GPL the shell SSH (secure shell) the Course Linux Server RTFM vi general shell review 2 These notes are available on

More information

First of all, these notes will cover only a small subset of the available commands and utilities, and will cover most of those in a shallow fashion.

First of all, these notes will cover only a small subset of the available commands and utilities, and will cover most of those in a shallow fashion. Warnings Linux Commands 1 First of all, these notes will cover only a small subset of the available commands and utilities, and will cover most of those in a shallow fashion. Read the relevant material

More information

Linux II and III. Douglas Scofield. Crea-ng directories and files 18/01/14. Evolu5onary Biology Centre, Uppsala University

Linux II and III. Douglas Scofield. Crea-ng directories and files 18/01/14. Evolu5onary Biology Centre, Uppsala University Linux II and III Douglas Scofield Evolu5onary Biology Centre, Uppsala University douglas.scofield@ebc.uu.se slides at Crea-ng directories and files mkdir 1 Crea-ng directories and files touch if file does

More information

Working With Unix. Scott A. Handley* September 15, *Adapted from UNIX introduction material created by Dr. Julian Catchen

Working With Unix. Scott A. Handley* September 15, *Adapted from UNIX introduction material created by Dr. Julian Catchen Working With Unix Scott A. Handley* September 15, 2014 *Adapted from UNIX introduction material created by Dr. Julian Catchen What is UNIX? An operating system (OS) Designed to be multiuser and multitasking

More information

IB047. Unix Text Tools. Pavel Rychlý Mar 3.

IB047. Unix Text Tools. Pavel Rychlý Mar 3. Unix Text Tools pary@fi.muni.cz 2014 Mar 3 Unix Text Tools Tradition Unix has tools for text processing from the very beginning (1970s) Small, simple tools, each tool doing only one operation Pipe (pipeline):

More information

Linux Fundamentals (L-120)

Linux Fundamentals (L-120) Linux Fundamentals (L-120) Modality: Virtual Classroom Duration: 5 Days SUBSCRIPTION: Master, Master Plus About this course: This is a challenging course that focuses on the fundamental tools and concepts

More information

BIOINFORMATICS POST-DIPLOMA PROGRAM SUBJECT OUTLINE Subject Title: OPERATING SYSTEMS AND PROJECT MANAGEMENT Subject Code: BIF713 Subject Description:

BIOINFORMATICS POST-DIPLOMA PROGRAM SUBJECT OUTLINE Subject Title: OPERATING SYSTEMS AND PROJECT MANAGEMENT Subject Code: BIF713 Subject Description: BIOINFORMATICS POST-DIPLOMA PROGRAM SUBJECT OUTLINE Subject Title: OPERATING SYSTEMS AND PROJECT MANAGEMENT Subject Code: BIF713 Subject Description: This course provides Bioinformatics students with the

More information

Introduction to UNIX. Logging in. Basic System Architecture 10/7/10. most systems have graphical login on Linux machines

Introduction to UNIX. Logging in. Basic System Architecture 10/7/10. most systems have graphical login on Linux machines Introduction to UNIX Logging in Basic system architecture Getting help Intro to shell (tcsh) Basic UNIX File Maintenance Intro to emacs I/O Redirection Shell scripts Logging in most systems have graphical

More information

File: PLT File Format Libraries

File: PLT File Format Libraries File: PLT File Format Libraries Version 4.0 June 11, 2008 1 Contents 1 gzip Compression and File Creation 3 2 gzip Decompression 4 3 zip File Creation 6 4 tar File Creation 7 5 MD5 Message Digest 8 6 GIF

More information

Genomic Files. University of Massachusetts Medical School. October, 2014

Genomic Files. University of Massachusetts Medical School. October, 2014 .. Genomic Files University of Massachusetts Medical School October, 2014 2 / 39. A Typical Deep-Sequencing Workflow Samples Fastq Files Fastq Files Sam / Bam Files Various files Deep Sequencing Further

More information

Unix/Linux Operating System. Introduction to Computational Statistics STAT 598G, Fall 2011

Unix/Linux Operating System. Introduction to Computational Statistics STAT 598G, Fall 2011 Unix/Linux Operating System Introduction to Computational Statistics STAT 598G, Fall 2011 Sergey Kirshner Department of Statistics, Purdue University September 7, 2011 Sergey Kirshner (Purdue University)

More information

Answers to AWK problems. Shell-Programming. Future: Using loops to automate tasks. Download and Install: Python (Windows only.) R

Answers to AWK problems. Shell-Programming. Future: Using loops to automate tasks. Download and Install: Python (Windows only.) R Today s Class Answers to AWK problems Shell-Programming Using loops to automate tasks Future: Download and Install: Python (Windows only.) R Awk basics From the command line: $ awk '$1>20' filename Command

More information

First of all, these notes will cover only a small subset of the available commands and utilities, and will cover most of those in a shallow fashion.

First of all, these notes will cover only a small subset of the available commands and utilities, and will cover most of those in a shallow fashion. Warnings 1 First of all, these notes will cover only a small subset of the available commands and utilities, and will cover most of those in a shallow fashion. Read the relevant material in Sobell! If

More information

Introduction to Linux

Introduction to Linux Introduction to Linux University of Bristol - Advance Computing Research Centre 1 / 47 Operating Systems Program running all the time Interfaces between other programs and hardware Provides abstractions

More information

CSE 303 Lecture 2. Introduction to bash shell. read Linux Pocket Guide pp , 58-59, 60, 65-70, 71-72, 77-80

CSE 303 Lecture 2. Introduction to bash shell. read Linux Pocket Guide pp , 58-59, 60, 65-70, 71-72, 77-80 CSE 303 Lecture 2 Introduction to bash shell read Linux Pocket Guide pp. 37-46, 58-59, 60, 65-70, 71-72, 77-80 slides created by Marty Stepp http://www.cs.washington.edu/303/ 1 Unix file system structure

More information

File: Racket File Format Libraries

File: Racket File Format Libraries File: Racket File Format Libraries Version 5.0.2 November 6, 2010 1 Contents 1 gzip Compression and File Creation 3 2 gzip Decompression 4 3 zip File Creation 6 4 tar File Creation 7 5 MD5 Message Digest

More information

Practical: Using LAST and MEGAN to get a quick view of a metagenome

Practical: Using LAST and MEGAN to get a quick view of a metagenome Practical: Using LAST and MEGAN to get a quick view of a metagenome Daniel Lundin Linneaeus University November 14, 2014 Daniel Lundin (LNU) LAST+MEGAN practical November 14, 2014 1 / 25 A GIT archive

More information

Recap From Last Time:

Recap From Last Time: Recap From Last Time: BGGN 213 Working with UNIX Barry Grant http://thegrantlab.org/bggn213 Motivation: Why we use UNIX for bioinformatics. Modularity, Programmability, Infrastructure, Reliability and

More information

BGGN 213 Working with UNIX Barry Grant

BGGN 213 Working with UNIX Barry Grant BGGN 213 Working with UNIX Barry Grant http://thegrantlab.org/bggn213 Recap From Last Time: Motivation: Why we use UNIX for bioinformatics. Modularity, Programmability, Infrastructure, Reliability and

More information

Linux command line basics II: downloading data and controlling files. Yanbin Yin

Linux command line basics II: downloading data and controlling files. Yanbin Yin Linux command line basics II: downloading data and controlling files Yanbin Yin 1 Things you should know about programming Learning programming has to go through the hands-on practice, a lot of practice

More information

Fall Lecture 5. Operating Systems: Configuration & Use CIS345. The Linux Utilities. Mostafa Z. Ali.

Fall Lecture 5. Operating Systems: Configuration & Use CIS345. The Linux Utilities. Mostafa Z. Ali. Fall 2009 Lecture 5 Operating Systems: Configuration & Use CIS345 The Linux Utilities Mostafa Z. Ali mzali@just.edu.jo 1 1 The Linux Utilities Linux did not have a GUI. It ran on character based terminals

More information

Running Programs in UNIX 1 / 30

Running Programs in UNIX 1 / 30 Running Programs in UNIX 1 / 30 Outline Cmdline Running Programs in UNIX Capturing Output Using Pipes in UNIX to pass Input/Output 2 / 30 cmdline options in BASH ^ means "Control key" cancel a running

More information

9/22/2017

9/22/2017 Learning Perl Through Examples Part 2 L1110@BUMC 9/22/2017 Tutorial Resource Before we start, please take a note - all the codes and supporting documents are accessible through: http://rcs.bu.edu/examples/perl/tutorials/

More information

Basics. I think that the later is better.

Basics.  I think that the later is better. Basics Before we take up shell scripting, let s review some of the basic features and syntax of the shell, specifically the major shells in the sh lineage. Command Editing If you like vi, put your shell

More information

Today. Review. Unix as an OS case study Intro to Shell Scripting. What is an Operating System? What are its goals? How do we evaluate it?

Today. Review. Unix as an OS case study Intro to Shell Scripting. What is an Operating System? What are its goals? How do we evaluate it? Today Unix as an OS case study Intro to Shell Scripting Make sure the computer is in Linux If not, restart, holding down ALT key Login! Posted slides contain material not explicitly covered in class 1

More information

Shells and Shell Programming

Shells and Shell Programming Shells and Shell Programming 1 Shells A shell is a command line interpreter that is the interface between the user and the OS. The shell: analyzes each command determines what actions are to be performed

More information

List of Linux Commands in an IPm

List of Linux Commands in an IPm List of Linux Commands in an IPm Directory structure for Executables bin: ash cpio false ln mount rm tar zcat busybox date getopt login mv rmdir touch cat dd grep ls perl sed true chgrp df gunzip mkdir

More information

INTRODUCTION TO BIOINFORMATICS

INTRODUCTION TO BIOINFORMATICS Introducing the LINUX Operating System BecA-ILRI INTRODUCTION TO BIOINFORMATICS Mark Wamalwa BecA- ILRI Hub, Nairobi, Kenya h"p://hub.africabiosciences.org/ h"p://www.ilri.org/ m.wamalwa@cgiar.org 1 What

More information

Practical Unix exercise MBV INFX410

Practical Unix exercise MBV INFX410 Practical Unix exercise MBV INFX410 We will in this exercise work with a practical task that, it turns out, can easily be solved by using basic Unix. Let us pretend that an engineer in your group has spent

More information

1 Abstract. 2 Introduction. 3 Requirements

1 Abstract. 2 Introduction. 3 Requirements 1 Abstract 2 Introduction This SOP describes the HMP Whole- Metagenome Annotation Pipeline run at CBCB. This pipeline generates a 'Pretty Good Assembly' - a reasonable attempt at reconstructing pieces

More information

First of all, these notes will cover only a small subset of the available commands and utilities, and will cover most of those in a shallow fashion.

First of all, these notes will cover only a small subset of the available commands and utilities, and will cover most of those in a shallow fashion. Warnings 1 First of all, these notes will cover only a small subset of the available commands and utilities, and will cover most of those in a shallow fashion. Read the relevant material in Sobell! If

More information

Unix unzip zip compress uncompress zip zip zip zip Extracting zip Unzip ZIP Unix Unix zip extracting ZIP zip zip unzip zip unzip zip Unix zipped

Unix unzip zip compress uncompress zip zip zip zip Extracting zip Unzip ZIP Unix Unix zip extracting ZIP zip zip unzip zip unzip zip Unix zipped Unix unzip zip Jan 28, 2011. Typically one uses tar to create an uncompressed archive and either gzip or bzip2 to compress that archive. The corresponding gunzip and bunzip2 commands can be used to uncompress

More information

commandname flags arguments

commandname flags arguments Unix Review, additional Unix commands CS101, Mock Introduction This handout/lecture reviews some basic UNIX commands that you should know how to use. A more detailed description of this and other commands

More information

Shells and Shell Programming

Shells and Shell Programming Shells and Shell Programming Shells A shell is a command line interpreter that is the interface between the user and the OS. The shell: analyzes each command determines what actions are to be performed

More information

Chapter 4. Unix Tutorial. Unix Shell

Chapter 4. Unix Tutorial. Unix Shell Chapter 4 Unix Tutorial Users and applications interact with hardware through an operating system (OS). Unix is a very basic operating system in that it has just the essentials. Many operating systems,

More information

Unix/Linux Basics. Cpt S 223, Fall 2007 Copyright: Washington State University

Unix/Linux Basics. Cpt S 223, Fall 2007 Copyright: Washington State University Unix/Linux Basics 1 Some basics to remember Everything is case sensitive Eg., you can have two different files of the same name but different case in the same folder Console-driven (same as terminal )

More information

Introduction. File System. Note. Achtung!

Introduction. File System. Note. Achtung! 3 Unix Shell 1: Introduction Lab Objective: Explore the basics of the Unix Shell. Understand how to navigate and manipulate file directories. Introduce the Vim text editor for easy writing and editing

More information

Linux Essentials. Smith, Roderick W. Table of Contents ISBN-13: Introduction xvii. Chapter 1 Selecting an Operating System 1

Linux Essentials. Smith, Roderick W. Table of Contents ISBN-13: Introduction xvii. Chapter 1 Selecting an Operating System 1 Linux Essentials Smith, Roderick W. ISBN-13: 9781118106792 Table of Contents Introduction xvii Chapter 1 Selecting an Operating System 1 What Is an OS? 1 What Is a Kernel? 1 What Else Identifies an OS?

More information

Genomic Files. University of Massachusetts Medical School. October, 2015

Genomic Files. University of Massachusetts Medical School. October, 2015 .. Genomic Files University of Massachusetts Medical School October, 2015 2 / 55. A Typical Deep-Sequencing Workflow Samples Fastq Files Fastq Files Sam / Bam Files Various files Deep Sequencing Further

More information

Extensible scriptlet-driven tool to manipulate, or do work based on, files and file metadata (fields)

Extensible scriptlet-driven tool to manipulate, or do work based on, files and file metadata (fields) 1. MCUtils This package contains a suite of scripts for acquiring and manipulating MC metadata, and for performing various actions. The available scripts are listed below. The scripts are written in Perl

More information

Hostopia WebMail Help

Hostopia WebMail Help Hostopia WebMail Help Table of Contents GETTING STARTED WITH WEBMAIL...5 Version History...6 Introduction to WebMail...6 Cookies and WebMail...6 Logging in to your account...6 Connection time limit...7

More information

Cisco IOS Shell. Finding Feature Information. Prerequisites for Cisco IOS.sh. Last Updated: December 14, 2012

Cisco IOS Shell. Finding Feature Information. Prerequisites for Cisco IOS.sh. Last Updated: December 14, 2012 Cisco IOS Shell Last Updated: December 14, 2012 The Cisco IOS Shell (IOS.sh) feature provides shell scripting capability to the Cisco IOS command-lineinterface (CLI) environment. Cisco IOS.sh enhances

More information

STATS Data Analysis using Python. Lecture 15: Advanced Command Line

STATS Data Analysis using Python. Lecture 15: Advanced Command Line STATS 700-002 Data Analysis using Python Lecture 15: Advanced Command Line Why UNIX/Linux? As a data scientist, you will spend most of your time dealing with data Data sets never arrive ready to analyze

More information

Regular expressions: Text editing and Advanced manipulation. HORT Lecture 4 Instructor: Kranthi Varala

Regular expressions: Text editing and Advanced manipulation. HORT Lecture 4 Instructor: Kranthi Varala Regular expressions: Text editing and Advanced manipulation HORT 59000 Lecture 4 Instructor: Kranthi Varala Simple manipulations Tabular data files can be manipulated at a columnlevel. cut: Divide file

More information

Sep. Guide. Edico Genome Corp North Torrey Pines Court, Plaza Level, La Jolla, CA 92037

Sep. Guide.  Edico Genome Corp North Torrey Pines Court, Plaza Level, La Jolla, CA 92037 Sep 2017 DRAGEN TM Quick Start Guide www.edicogenome.com info@edicogenome.com Edico Genome Corp. 3344 North Torrey Pines Court, Plaza Level, La Jolla, CA 92037 Notice Contents of this document and associated

More information

File Commands. Objectives

File Commands. Objectives File Commands Chapter 2 SYS-ED/Computer Education Techniques, Inc. 2: 1 Objectives You will learn: Purpose and function of file commands. Interrelated usage of commands. SYS-ED/Computer Education Techniques,

More information

Public Repositories Tutorial: Bulk Downloads

Public Repositories Tutorial: Bulk Downloads Public Repositories Tutorial: Bulk Downloads Almost all of the public databases, genome browsers, and other tools you have explored so far offer some form of access to rapidly download all or large chunks

More information

DATA STRUCTURE AND ALGORITHM USING PYTHON

DATA STRUCTURE AND ALGORITHM USING PYTHON DATA STRUCTURE AND ALGORITHM USING PYTHON Advanced Data Structure and File Manipulation Peter Lo Linear Structure Queue, Stack, Linked List and Tree 2 Queue A queue is a line of people or things waiting

More information

Review of Fundamentals. Todd Kelley CST8207 Todd Kelley 1

Review of Fundamentals. Todd Kelley CST8207 Todd Kelley 1 Review of Fundamentals Todd Kelley kelleyt@algonquincollege.com CST8207 Todd Kelley 1 The CST8207 course notes GPL the shell SSH (secure shell) the Course Linux Server RTFM vi general shell review 2 Linux

More information

Introduction p. 1 Who Should Read This Book? p. 1 What You Need to Know Before Reading This Book p. 2 How This Book Is Organized p.

Introduction p. 1 Who Should Read This Book? p. 1 What You Need to Know Before Reading This Book p. 2 How This Book Is Organized p. Introduction p. 1 Who Should Read This Book? p. 1 What You Need to Know Before Reading This Book p. 2 How This Book Is Organized p. 2 Conventions Used in This Book p. 2 Introduction to UNIX p. 5 An Overview

More information