Essential Skills for Bioinformatics: Unix/Linux
- Colin McKinney
- 5 years ago
Transcription
1 Essential Skills for Bioinformatics: Unix/Linux
2 WORKING WITH COMPRESSED DATA
3 Overview Data compression, the process of condensing data so that it takes up less space (on disk drives, in memory, or across network transfer), is an indispensable technology in modern bioinformatics. For example, consider the sequences from a recent Illumina HiSeq run: example.fastq is 63,203,414,514 bytes (59 GB) and example.fastq.gz is 21,408,674,240 bytes (20 GB). The compression ratio (uncompressed size/compressed size) of this data is 2.95, which translates to a space saving of about 66%.
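The ratio and the saving quoted above can be checked with a little arithmetic. The following is a minimal sketch that plugs the two file sizes from the slide into awk; the variable names are only illustrative.

```shell
# File sizes in bytes, taken from the slide
uncompressed=63203414514
compressed=21408674240

# ratio  = uncompressed / compressed
# saving = (1 - compressed / uncompressed) * 100
summary=$(awk -v u="$uncompressed" -v c="$compressed" \
  'BEGIN { printf "ratio: %.2f\nsaving: %.0f%%\n", u / c, (1 - c / u) * 100 }')
echo "$summary"
```

This prints a ratio of 2.95 and a saving of 66%, matching the figures on the slide.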
4 Overview Data can remain compressed on the disk throughout processing and analyses. Most well-written bioinformatics tools can work natively with compressed data as input, without requiring us to decompress it to disk first. Using pipes and redirection, we can stream compressed data and write compressed files directly to the disk. Common Unix tools like cat and grep have variants that work with compressed data. While working with large datasets in bioinformatics can be challenging, using the compression tools in Unix and software libraries makes our lives much easier.
5 gzip The two most common compression systems used on Unix are gzip and bzip2. gzip is faster than bzip2, while bzip2 has a higher compression ratio (the previous FASTQ file is only about 16 GB when compressed with bzip2). Generally, gzip is used in bioinformatics to compress most sizable files, while bzip2 is more common for long-term data archiving.
6 gzip gzip can compress results from standard input. This is useful, as we can compress results directly from another bioinformatics program's standard output.
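As a minimal sketch of compressing from standard input, the example below uses printf as a stand-in for a bioinformatics program that writes FASTQ records to standard output. The file name reads.fastq.gz and the single toy read are assumptions made for illustration.

```shell
# Work in a scratch directory so nothing else is touched
workdir=$(mktemp -d)
cd "$workdir"

# printf plays the role of an upstream program writing to standard output;
# gzip reads standard input and the shell redirects the result to a file
printf '@read1\nACGT\n+\nIIII\n' | gzip > reads.fastq.gz

# Decompress to standard output to confirm that the four lines survived
gzip -dc reads.fastq.gz
```

No uncompressed intermediate file is ever written to disk.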
7 gzip gzip can also compress files on disk in place, replacing the original uncompressed version with the compressed file (appending the extension .gz to the original filename).
8 gunzip We can decompress files in place with the command gunzip. Note that this replaces the tb1.fasta.gz file with the decompressed version.
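The in-place behaviour of gzip and gunzip can be seen in a short round trip. This sketch creates a tiny stand-in for tb1.fasta (the sequence is made up) in a scratch directory.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A toy FASTA file standing in for the real tb1.fasta
printf '>tb1\nACGTACGT\n' > tb1.fasta

gzip tb1.fasta        # tb1.fasta is replaced by tb1.fasta.gz
ls                    # lists only tb1.fasta.gz

gunzip tb1.fasta.gz   # tb1.fasta.gz is replaced by tb1.fasta
ls                    # lists only tb1.fasta
```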
9 gzip -c Both gzip and gunzip can also write their results to standard output. This is enabled with the -c option:
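A minimal sketch of the -c option follows, again with a toy tb1.fasta. The name tb1_copy.fasta is hypothetical and is only there to show that the round trip preserves the content.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf '>tb1\nACGTACGT\n' > tb1.fasta

# -c writes to standard output, so the original file is left untouched
gzip -c tb1.fasta > tb1.fasta.gz

# gunzip -c likewise leaves tb1.fasta.gz in place
gunzip -c tb1.fasta.gz > tb1_copy.fasta

# cmp is silent and returns success when the two files are identical
cmp tb1.fasta tb1_copy.fasta && echo "round trip OK"
```

Because both commands write to standard output, either one can sit in the middle of a pipeline.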
10 gzip with multiple files
11 gzip with multiple files
12 Working with gzipped files The greatest advantage of gzip (and bzip2) is that many Unix and bioinformatics tools can work directly with compressed files. For example, we can search compressed files using grep's analog for gzipped files, zgrep. Likewise, cat has zcat. If a program cannot handle compressed input, you can use zcat and pipe its output directly to the standard input of that program.
13 Working with gzipped files
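The slide's own example was not preserved in this transcription, so here is a small sketch of zgrep and zcat on a toy gzipped FASTA file. The file contents are invented for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Toy gzipped FASTA file with two records
printf '>tb1\nACGTAGTAGGA\n>tb2\nTTGACCAGTA\n' | gzip > tb1.fasta.gz

# Search the compressed file directly: print lines containing AGTAGG
zgrep 'AGTAGG' tb1.fasta.gz

# Count header lines without decompressing to disk
zgrep -c '^>' tb1.fasta.gz

# Stream the decompressed text into a tool that only reads plain text.
# On some systems zcat expects .Z files; gzip -dc is an equivalent spelling.
zcat tb1.fasta.gz | wc -l
```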
14 Creating a tar.gz archive
15 Extracting a tar.gz file
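The commands from the two tar slides were not preserved here. As a hedged sketch, a common way to create, list, and extract a gzip-compressed tar archive looks like this; the directory and file names are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A small directory to archive
mkdir project
printf 'notes about the data\n' > project/README
printf '>seq1\nACGT\n' > project/seq.fasta

# c = create, z = compress with gzip, f = archive file name
tar -czf project.tar.gz project

# t = list the contents without extracting
tar -tzf project.tar.gz

# x = extract; -C chooses the directory to extract into
mkdir extracted
tar -xzf project.tar.gz -C extracted
```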
16 CASE STUDY: REPRODUCIBLY DOWNLOADING DATA
17 GRCm38 mouse reference genome We usually download genomic resources like sequence and annotation files from remote servers over the Internet, and these resources may change in the future. Furthermore, new versions of sequence and annotation data may be released, so it is imperative that we document everything about how data was acquired for full reproducibility. The human, mouse, zebrafish, and chicken genome releases are coordinated through the Genome Reference Consortium.
18 GRCm38 mouse reference genome The GRC prefix in GRCm38 refers to the Genome Reference Consortium. We can download GRCm38 from Ensembl using wget.
19 Compare checksum values From ftp://ftp.ensembl.org/pub/release-87/fasta/mus_musculus/dna/checksums
20 Extract the FASTA headers
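FASTA header lines start with >, so one way to pull them out of a gzipped file is zgrep with an anchored pattern. This sketch uses a toy file; the name genome.fa.gz and its headers are placeholders, not the real Ensembl download.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Toy stand-in for a downloaded, gzipped reference genome
printf '>chr1 toy\nACGTACGT\n>chr2 toy\nTTGACCAG\n' | gzip > genome.fa.gz

# Keep only lines beginning with ">"; quote the pattern so the shell
# does not treat > as a redirection
zgrep '^>' genome.fa.gz > headers.txt
cat headers.txt
```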
21 Document README Document how and when we downloaded this file in README. Copy the SHA-1 checksum values into README.
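One possible way to record this information is sketched below. It assumes the GNU coreutils sha1sum command is available (macOS ships shasum instead), and the file name genome.fa.gz, the README layout, and the genome.fa.gz.sha1 file are all hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Toy stand-in for the downloaded file
printf '>chr1 toy\nACGTACGT\n' | gzip > genome.fa.gz

# Record the download date and the SHA-1 checksum in README
{
  echo "genome.fa.gz downloaded on $(date +%Y-%m-%d)"
  echo "SHA-1 checksum:"
  sha1sum genome.fa.gz
} > README
cat README

# Later, the recorded checksum can be checked against the file on disk
tail -n 1 README > genome.fa.gz.sha1
sha1sum -c genome.fa.gz.sha1
```

If the file has not changed, the final command reports that genome.fa.gz is OK.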
22 UNIX DATA TOOLS
23 Overview Understanding how to use Unix data tools in bioinformatics is not only about learning what each tool does; it is about mastering the practice of connecting tools together, creating programs from Unix pipelines. By connecting data tools together with pipes, we can construct programs that parse, manipulate, and summarize data. Unix pipelines can be developed in shell scripts or as one-liners (tiny programs built by connecting Unix tools with pipes directly on the shell).
24 Overview Building more complex programs from small, modular tools capitalizes on the design and philosophy of Unix. The pipeline approach to building programs is a well-established tradition in Unix and bioinformatics because it is a fast way to solve problems, incredibly powerful, and adaptable to a variety of problems.
25 When to use the Unix pipeline approach The Unix one-liner approach is not appropriate for all problems. Many bioinformatics tasks are better accomplished through a custom, well-documented script. Knowing when to use a fast and simple engineering solution like a Unix pipeline and when to resort to writing a well-documented Python or R script takes experience.
26 When to use the Unix pipeline approach Unix pipelines: a fast, low-level data manipulation toolkit to explore data, transform data between formats, and inspect data for potential problems. They are useful when we want to get a quick answer and keep moving forward with our project. It is essential that everything that produces a result is documented; storing pipelines in shell scripts is a good approach. Custom scripts using Python or R: useful for larger, more complex tasks, as these allow flexibility in checking input data, structuring programs, using data structures, and documenting code.
27 Inspecting and manipulating text data Many formats in bioinformatics are simple tabular plain-text files delimited by a character. The most common tabular plain-text file format used in bioinformatics is tab-delimited because most Unix tools treat tabs as delimiters by default. Tab-delimited file formats are also simple to parse with scripting languages like Python and Perl, and easy to load into R.
28 Tabular plain-text data formats The basic format: each row (known as a record) is kept on its own line, and each column (known as a field) is separated by some delimiter. Three formats: tab-delimited, comma-separated, and variable space-delimited.
29 Tab-delimited The most commonly used format in bioinformatics (e.g. BED, GTF/GFF, SAM, VCF). Columns of a tab-delimited file are separated by a single tab character (the escape code: \t). A common convention (not a standard) is to include metadata on the first few lines of a tab-delimited file. These metadata lines begin with #. Tabs in data are not allowed.
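To illustrate the layout, the sketch below builds a toy BED-like file with one # metadata line, drops that line, and selects columns by position. The file name and values are made up, and cut is used only because it splits on tabs by default.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Toy tab-delimited file: one metadata line, then two records
printf '# toy BED-like file\nchr1\t100\t200\nchr2\t150\t300\n' > regions.bed

# Drop metadata lines that begin with #, then keep columns 1 and 3
columns=$(grep -v '^#' regions.bed | cut -f 1,3)
echo "$columns"
```

The result contains the chromosome and end position of each record, still separated by a tab.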
30 Comma-separated values (CSV) CSV is similar to tab-delimited, except the delimiter is a comma character. While not a common occurrence in bioinformatics, it is possible that the data stored in CSV format contain commas. Some variants just do not allow this, while others use quotes around entries that could contain commas.
31 Variable space-delimited In general, tab-delimited formats and CSV are better choices than variable space-delimited formats because it is quite common to encounter data containing spaces.
32 How lines are separated Linux and OS X use a single linefeed character (the escape code: \n) to separate lines. Windows uses a DOS-style line separator of a carriage return and a linefeed character (\r\n). To convert DOS to Unix text format, use dos2unix. To convert Unix to DOS text format, use unix2dos.
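dos2unix is not installed on every system, so this sketch uses tr as a stand-in to show what the conversion amounts to for a plain \r\n file: removing the carriage returns. The file names are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Two lines with DOS-style \r\n line endings
printf 'chr1\t100\r\nchr2\t200\r\n' > dos.txt

# Count the lines that contain a carriage return
cr=$(printf '\r')
grep -c "$cr" dos.txt

# Delete every carriage return, leaving Unix-style \n endings
tr -d '\r' < dos.txt > unix.txt

# The converted file is two bytes shorter, one per removed \r
wc -c dos.txt unix.txt
```

Where dos2unix is available, running it on dos.txt would convert the file in place instead.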
33 Inspecting data with head and tail Many files in bioinformatics are much too long to inspect with cat. Running cat on a file a million lines long would quickly fill your shell. A better option is to take a look at the top of a file with head.
34 Inspecting data with head and tail
35 Inspecting data with head and tail We can control how many lines we see.
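A short sketch of head on a generated file follows. numbers.txt is a hypothetical 100-line file created with seq.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A 100-line file containing the numbers 1 to 100
seq 1 100 > numbers.txt

# Without options, head prints the first 10 lines
head numbers.txt | wc -l

# -n sets how many lines to print
first_three=$(head -n 3 numbers.txt)
echo "$first_three"
```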
36 Inspecting data with head and tail tail is designed to look at the end of a file. tail works just like head.
37 Inspecting data with head and tail We can also use tail to remove the header of a file. If -n is given a number x preceded by a + sign (e.g. +x), tail will start from the xth line.
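As a sketch of both uses of tail, the example below builds a toy tab-delimited file with a one-line header. The file name and values are invented.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Toy table: one header line followed by two records
printf 'chrom\tstart\tend\nchr1\t100\t200\nchr2\t150\t300\n' > regions.txt

# Last line of the file
tail -n 1 regions.txt

# Start printing at line 2, which drops the one-line header
body=$(tail -n +2 regions.txt)
echo "$body"
```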
38 Inspecting data with head and tail head is useful for taking a peek at data resulting from a Unix pipeline. We will use grep's results as the standard input for the next program in our pipeline, but first we want to check grep's standard output to see if everything looks correct. When head exits, your shell catches this and stops the entire pipe. When building complex pipelines that process large amounts of data, this is important.
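The following sketch peeks at the output of a pipeline stage. seq stands in for a large input, and the pattern 7 is arbitrary.

```shell
# seq produces a million lines and grep keeps those containing a 7,
# but head exits after three lines, which ends the rest of the pipeline
# early instead of letting it process the whole input
first_hits=$(seq 1 1000000 | grep '7' | head -n 3)
echo "$first_hits"
```

The first three matches are 7, 17, and 27. Once they look right, head can be swapped for the next real step of the pipeline.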
39 less less is a useful program for inspecting files and the output of pipes. It is a terminal pager, a program that allows us to view large amounts of text in our terminals one screen at a time. less has more features than the older terminal pager called more and is generally preferred over it.
40 less
Shortcut      Action
Space bar     Next page
b             Previous page
g             First line
G             Last line
j             Down one line at a time
k             Up one line at a time
/<pattern>    Search down for string <pattern>
?<pattern>    Search up for string <pattern>
41 less less is useful in debugging our command-line pipelines. Just pipe the output of the command you want to debug to less. When you run the pipe, less will capture the output of the last command and pause so you can inspect it. less is crucial when iteratively building up a pipeline.
42 less A useful behavior of pipes is that the execution of a program with output piped to less will be paused when less has a full screen of data. When you pipe a program's output to less and inspect it, less stops reading input from the pipe. The pipe will block and we can spend as much time as needed to inspect the output.
Lecture 6: Introduction to C (pronobis@kth.se) Overview Overview Lecture 6: Introduction to C Roots of C Getting started with C Closer look at Hello World Programming Environment Schedule Last time (and
More informationFREEENGINEER.ORG. 1 of 6 11/5/15 8:31 PM. Learn UNIX in 10 minutes. Version 1.3. Preface
FREEENGINEER.ORG Learn UNIX in 10 minutes. Version 1.3 Preface This is something that I had given out to students (CAD user training) in years past. The purpose was to have on one page the basics commands
More informationCS155: Computer Security Spring Project #1
CS155: Computer Security Spring 2018 Project #1 Due: Part 1: Thursday, April 12-11:59pm, Parts 2 and 3: Thursday, April 19-11:59pm. The goal of this assignment is to gain hands-on experience finding vulnerabilities
More informationToday. Review. Unix as an OS case study Intro to Shell Scripting. What is an Operating System? What are its goals? How do we evaluate it?
Today Unix as an OS case study Intro to Shell Scripting Make sure the computer is in Linux If not, restart, holding down ALT key Login! Posted slides contain material not explicitly covered in class 1
More informationPreparation of alignments for variant calling with GATK: exercise instructions for BioHPC Lab computers
Preparation of alignments for variant calling with GATK: exercise instructions for BioHPC Lab computers Data used in the exercise We will use D. melanogaster WGS paired-end Illumina data with NCBI accessions
More informationCisco IOS Shell. Finding Feature Information. Prerequisites for Cisco IOS.sh. Last Updated: December 14, 2012
Cisco IOS Shell Last Updated: December 14, 2012 The Cisco IOS Shell (IOS.sh) feature provides shell scripting capability to the Cisco IOS command-lineinterface (CLI) environment. Cisco IOS.sh enhances
More informationComputer Systems and Architecture
Computer Systems and Architecture Stephen Pauwels Computer Systems Academic Year 2018-2019 Overview of the Semester UNIX Introductie Regular Expressions Scripting Data Representation Integers, Fixed point,
More informationWelcome to BCB/EEOB546X! Computational Skills for Biological Data. Instructors: Matt Hufford Tracy Heath Dennis Lavrov
Welcome to BCB/EEOB546X! Computational Skills for Biological Data Instructors: Matt Hufford Tracy Heath Dennis Lavrov What motivated us to teach this class? What motivated you to take this class? Course
More informationExploring the Microsoft Access User Interface and Exploring Navicat and Sequel Pro, and refer to chapter 5 of The Data Journalist.
Chapter 5 Exporting Data from Access and MySQL Skills you will learn: How to export data in text format from Microsoft Access, and from MySQL using Navicat and Sequel Pro. If you are unsure of the basics
More informationSAS7BDAT Database Binary Format
SAS7BDAT Database Binary Format Matthew S. Shotwell Contents ˆ Introduction ˆ SAS7BDAT Header ˆ SAS7BDAT Pages ˆ SAS7BDAT Subheaders ˆ SAS7BDAT Packed Binary Data ˆ Platform Differences ˆ Compression Data
More informationC++ Programming. Final Project. Implementing the Smith-Waterman Algorithm Software Engineering, EIM-I Philipp Schubert Version 1.1.
C++ Programming Implementing the Smith-Waterman Algorithm Software Engineering, EIM-I Philipp Schubert Version 1.1 January 26, 2018 This project is mandatory in order to pass the course and to obtain the
More information