By Ludovic Duvaux (27 November 2013)


Array of jobs using SGE - an example using stampy, a mapping software.
Running java applications on the cluster - merge sam files using the Picard tools

The idea
==========
One may have many inputs (samples, files...) to process. Sometimes the different parts of a single file can also be processed independently and in parallel on several cores (the two possibilities are not mutually exclusive). In both cases (or a mix of both), creating an array of jobs using SGE can be of great help!

For instance, most recent mapping programs can process only a fraction of a fastq input file (e.g. only 1/10th of your file will be mapped on your reference genome), and one can have many samples to map on a reference genome! For the current course, we'll do that using some pea aphid data.

I) Preliminary steps
====================

I.1) Open an interactive session on the cluster
===============================================
Very little software and few libraries are installed on the Head-node. Therefore you may run into severe problems if you attempt to install anything 'under your account or otherwise' while working on the cluster Head-node, so open an interactive session first:

[MyLogin@cluster ~]$ qrsh
[MyLogin@node15 ~]$

I.2) create a new folder & fetch the training files
===================================================
To do so:

ln -s /data/mylogin ./data # create a symbolic link toward your data folder in the current folder.

If the above doesn't work, first go to your data folder and then come back, so that your data folder becomes available:

cd /data/mylogin
cd
ln -s /data/mylogin ./data

Then:

cd data

mkdir -p ArrayJobs_Stampy
cp /usr/local/extras/genomics/hpc_course/stampy_example/script_files/* ArrayJobs_Stampy # fetch all the useful files for this training
ll ArrayJobs_Stampy

I.3) Check if you can run stampy
=====================================
Try to call stampy's manual. From anywhere, type:

/usr/local/extras/genomics/applications/stampy/1.0.22/stampy.py

You should obtain something like:

[bo4cm17@testnode01 ~]$ /usr/local/extras/genomics/applications/stampy/1.0.22/stampy.py
stampy v (r1848), <gerton.lunter@well.ox.ac.uk>

Usage: /usr/local/extras/genomics/applications/stampy/1.0.22/stampy.py [options] [.fa files]

Option summary (--help for all):

Command options
  -G PREFIX file1.fa [...]           Build genome index PREFIX.stidx from fasta file(s) on command line
  -H PREFIX                          Build hash PREFIX.sthash
  -M FILE[,FILE]                     Map fastq/fasta/bam file(s)
  -A FILE                            Convert qualities; strip adapters

Mapping/output options
  -g PREFIX                          Use genome index file PREFIX.stidx
  -h PREFIX                          Use hash file PREFIX.sthash
  -o FILE                            Write mapping output to FILE [stdout]
  --readgroup=id:id,tag:value,...    Set read-group tags (ID,SM,LB,DS,PU,PI,CN,DT,PL) (SAM format)
  --solexa, --solexaold, --sanger    Solexa read qualities (@-based); pre-v1.3 Solexa; and Sanger (!-based, default)
  --substitutionrate=f               Set substitution rate for mapping and simulation [0.001]
  --gapopen=n                        Gap open penalty (phred score) [40]
  --gapextend=n                      Gap extension penalty (phred score) [3]
  --bwaoptions=opts                  Options and <prefix> for BWA pre-mapper (quote multiple options)
  --bwamaxmismatch=n                 Max number of mismatches for BWA maps; -1=auto [-1]
  --bwatmpdir=s                      Set directory for BWA temporary files
  --bwa=f                            Set BWA executable [default: bwa]
  --bwamark                          Include/mark BWA-mapped reads with XP:Z:BWA tag (produces more output lines)

General options
  --help                             Full help
  -v N                               Set verbosity level (0-3) [2]

If not, there is a problem with the installation. Note that stampy uses python. See the file "HowToInstallOnIceberg_Python2.x_stampy.txt" on the cluster to install it if needed.
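As a quick sanity check before going further, one can test from the shell whether python and the stampy executable are reachable. The sketch below is only an illustration: the helper name "check_tool" is made up for this example, and it only tests that the command can be found and executed, not that the installation is complete.

```shell
#!/bin/bash
# check_tool is a hypothetical helper (not part of stampy or SGE).
# It accepts either a command name looked up in PATH or a full path.
check_tool() {
    local tool="$1"
    if command -v "$tool" >/dev/null 2>&1; then
        echo "OK: $tool"
    else
        echo "MISSING: $tool"
        return 1
    fi
}

# stampy needs python, so check both the interpreter and the script itself.
check_tool python || true
check_tool /usr/local/extras/genomics/applications/stampy/1.0.22/stampy.py || true
```

A "MISSING" line for the stampy path would point to the installation problem mentioned above.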

For greater simplicity, create a symbolic link to this executable in your bin folder (then you won't have to type "/usr/local/extras/genomics/applications/stampy/1.0.22/" before "stampy.py" each time afterwards):

mkdir ~/bin # ~/bin is a special folder already included in your "PATH" even if it doesn't exist!
ln -s /usr/local/extras/genomics/applications/stampy/1.0.22/stampy.py ~/bin # note the difference in behaviour when creating a symbolic link toward a folder or toward a file (see above)
ll ~/bin

Try to run stampy again:

stampy.py

To get more extensive help on stampy, just type:

stampy.py --help

I.4) prepare inputs & folders
=============================
cd ~/data # go back to our data directory if needed
mkdir -p ArrayJobs_Stampy/RefGenom
mkdir -p ArrayJobs_Stampy/mapping_results
mkdir -p ArrayJobs_Stampy/log_files
cd ArrayJobs_Stampy
cp -v /usr/local/extras/genomics/applications/stampy/1.0.22/refgenom/acypi_assembly2_rehead-noblank.stidx ./RefGenom # copy stampy's genome index into your folder
cp -v /usr/local/extras/genomics/applications/stampy/1.0.22/refgenom/acypi_assembly2_rehead-noblank.sthash ./RefGenom # stampy's genome hash table

II) run complementary mapping jobs in parallel
==============================================

II.1) prepare the bash script to run the jobs using SGE
=======================================================
The first step is to prepare a bash script to run our jobs via the SGE scheduler. The script declares the amount of resources needed and the estimated run time of the job (important to get onto the priority queue...). For our training, this file already exists: see RunArrayOfstampyJobs.sh

II.2) prepare a text file with the command line options for stampy
==================================================================
One great advantage of command line tools is that many different specific command lines can be prepared in order to deal with different sample requirements. In our case, these differences are:
- sample names (fastq files)
- library names
- fraction of the file to process
- fastq format (VERY IMPORTANT to run stampy properly)

The best thing to do before running our analysis is thus to record all these differences in a file.
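To make the link between such a file and an array of jobs concrete, here is a minimal sketch in bash. It assumes a hypothetical layout with one line per task (sample, library, fraction, fastq format); the course's own option file may be laid out differently. The "#$" lines are standard SGE directives ("-t" for the task range, "-l h_rt" for the run time, "-cwd"), but the values are placeholders, and the helper name "options_for_task" and all sample values are made up for the illustration.

```shell
#!/bin/bash
# Array of 4 tasks: SGE runs the script once per task and sets SGE_TASK_ID to 1..4.
#$ -t 1-4
# Estimated run time per task (placeholder value).
#$ -l h_rt=01:00:00
# Run from the directory the job was submitted from.
#$ -cwd

# Print the option line that belongs to a given task number.
# Lines starting with '#' are treated as comments (assumed layout).
options_for_task() {
    local file="$1" task="$2"
    grep -v '^#' "$file" | sed -n "${task}p"
}

# Throwaway options file for the demonstration (all values are invented).
demo=$(mktemp)
cat > "$demo" <<'EOF'
# sample library fraction fastq_format
sampleA lib1 1/2 sanger
sampleA lib1 2/2 sanger
sampleB lib2 1/2 solexa
sampleB lib2 2/2 solexa
EOF

# Outside SGE, SGE_TASK_ID is not set, so fall back to task 1 for testing.
options_for_task "$demo" "${SGE_TASK_ID:-1}"
```

Each task then only has to turn its own line into a command line, so adding a sample means adding a line to the file and widening the "-t" range.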

For our training, this file already exists: "01_stampyCommandOptions.txt".

II.3) write a script that will prepare specific command lines
=============================================================
We then have to write a script that reads and interprets the command line option file in order to prepare the specific command line of each sample/job in the array. The choice of scripting language mainly depends on our own preferences (perl, python, R...), even though some are probably more efficient than others. For our training, I wrote such a script in 'R': "02_Runstampy.R".

II.4) run the bash script
=========================
To do so, just type in the shell:

qsub RunArrayOfStampyJobs.sh # run the bash script
qstat | grep bo4cm # check that the jobs are in the queue
qstat -u MyLogin

When the jobs start, each of them creates an SGE log file with a name similar to "RunArrayOfstampyJobs.sh.o<jobID>.<taskID>", where the first and second numbers are the job and task IDs respectively. In that log, you will find all the information that the bash script was written to report. With my current implementation, the time spent running the job is indicated at the end. This is pretty useful for scheduling your future jobs on the cluster. You'll also find the R and stampy log files in the folder "~/data/ArrayJobs_Stampy/log_files" and the result files in "~/data/ArrayJobs_Stampy/mapping_results".

III) Running java applications on the cluster - merge sam files using the Picard tools
==================================================================================
The problem now is that, by using different cores to map the reads on the reference genome, we have created several sam files for the same sample, so we want to merge them for all the subsequent analyses. To do so, we can use the jar function "MergeSamFiles.jar" of the Picard tools. The great advantage of java programs is that their code is fully compatible across operating systems (i.e. once the code has been written for one OS (e.g. Windows), you don't need to modify it for another OS (e.g. Linux)).

III.1) download the picard tools
================================
A very efficient way to transfer programs to iceberg is to download them directly from the net using the command "wget":

mkdir ~/Applications
cd ~/Applications
wget

Then you have to unzip them:

unzip picard-tools zip

The Picard tools are now ready to use.

III.2) running the "MergeSamFiles.jar" function
===============================================
cd $OLDPWD # go back to the last directory we were in before the current one (see the other environment variables by typing "set | less")

The corresponding command line is (WARNING: remember to change the names of the sam files, i.e. the date is not the same):

java -jar ~/Applications/picard-tools-1.103/MergeSamFiles.jar MAX_RECORDS_IN_RAM= CREATE_INDEX=true VALIDATION_STRINGENCY=LENIENT I=~/data/ArrayJobs_Stampy/mapping_results/Lathyrus_N152_RawMapping-1_ sam I=~/data/ArrayJobs_Stampy/mapping_results/Lathyrus_N152_RawMapping-2_ sam O=~/data/ArrayJobs_Stampy/mapping_results/Lathyrus_N152_RawMapping-all_ sam

However, when we try, we obtain something like this:

Error occurred during initialization of VM
Could not reserve enough space for object heap
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

Indeed, you have to load a special iceberg module to run java. First, you can obtain the list of special modules available on iceberg by typing:

module avail

In our case, we are interested in "apps/java/1.7", so just type:

module load apps/java/1.7

Then retry running the Picard function:

java -jar ~/Applications/picard-tools-1.103/MergeSamFiles.jar MAX_RECORDS_IN_RAM= CREATE_INDEX=true VALIDATION_STRINGENCY=LENIENT I=~/data/ArrayJobs_Stampy/mapping_results/Lathyrus_N152_RawMapping-1_ sam I=~/data/ArrayJobs_Stampy/mapping_results/Lathyrus_N152_RawMapping-2_ sam O=~/data/ArrayJobs_Stampy/mapping_results/Lathyrus_N152_RawMapping-all_ sam

Results:

[Wed Nov 27 11:40:54 GMT 2013] net.sf.picard.sam.MergeSamFiles INPUT=[/home/bo4cm17/data/ArrayJobs_Stampy/mapping_results/Lathyrus_N152_RawMapping-1_ sam, /home/bo4cm17/data/ArrayJobs_Stampy/mapping_results/Lathyrus_N152_RawMapping-2_ sam] OUTPUT=/home/bo4cm17/data/ArrayJobs_Stampy/mapping_results/Lathyrus_N152_RawMapping-all_ sam VALIDATION_STRINGENCY=LENIENT MAX_RECORDS_IN_RAM= CREATE_INDEX=true SORT_ORDER=coordinate ASSUME_SORTED=false MERGE_SEQUENCE_DICTIONARIES=false USE_THREADING=false VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 CREATE_MD5_FILE=false
[Wed Nov 27 11:40:54 GMT 2013] Executing as bo4cm17@amd-node02 on Linux el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.7.0_09-icedtea-root_2013_03_07_09_45-b00; Picard version: 1.103(1598)
INFO :40:55 MergeSamFiles Sorting input files using temp directory [/tmp/bo4cm17]
INFO :40:55 MergeSamFiles Finished reading inputs.
[Wed Nov 27 11:40:55 GMT 2013] net.sf.picard.sam.MergeSamFiles done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=

We can check that the number of reads in the resulting file is indeed the sum of the numbers of reads in the two previous files:

cat ~/data/ArrayJobs_Stampy/mapping_results/Lathyrus_N152_RawMapping-1_ sam | grep RG:Z:Lathyrus_N152 | wc -l # count the number of reads (each line corresponding to a read has the field "RG:Z:Lathyrus_N152")
cat ~/data/ArrayJobs_Stampy/mapping_results/Lathyrus_N152_RawMapping-2_ sam | grep RG:Z:Lathyrus_N152 | wc -l
cat ~/data/ArrayJobs_Stampy/mapping_results/Lathyrus_N152_RawMapping-all_ sam | grep RG:Z:Lathyrus_N152 | wc -l

IMPORTANT NOTE: you may sometimes need to change java's default parameters for the memory allocated to the java virtual machine in order to run it properly, e.g.:

java -Xms512m -Xmx7g -jar ~/Applications/picard-tools-1.103/MergeSamFiles.jar MAX_RECORDS_IN_RAM= CREATE_INDEX=true VALIDATION_STRINGENCY=LENIENT I=~/data/ArrayJobs_Stampy/mapping_results/Lathyrus_N152_RawMapping-1_ sam I=~/data/ArrayJobs_Stampy/mapping_results/Lathyrus_N152_RawMapping-2_ sam O=~/data/ArrayJobs_Stampy/mapping_results/Lathyrus_N152_RawMapping-all_ sam

- where -Xms512m specifies that the initial (minimum) memory allocated to the java virtual machine will be 512 MB
- where -Xmx7g specifies that the maximum memory allocated to the java virtual machine will be 7 GB

In this case, take care that Xmx does not exceed the memory you requested for the job in the SGE bash script. Actually, you even need to spare some RAM for other applications (well, it's a guess of mine here), i.e. if you ask for 6 GB for your job, don't let Xmx exceed 5 GB. Other Picard parameters that influence memory usage may need to be finely tuned for big data sets (see the Picard manual for further information).
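The two checks of this section (read counts add up after merging, and Xmx stays below the memory requested from SGE) can be wrapped in small bash functions. This is only a sketch: the function names are invented for the example, the 1 GB headroom simply mirrors the 6 GB / 5 GB rule of thumb given above, and the read-group tag and file names are passed as arguments.

```shell
#!/bin/bash
# Suggest a -Xmx value that leaves 1 GB of headroom below the memory
# (in whole GB) requested from SGE. The 1 GB margin is a rule of thumb.
heap_for_request() {
    local requested_gb="$1"
    if [ "$requested_gb" -lt 2 ]; then
        echo "request at least 2 GB to leave some headroom" >&2
        return 1
    fi
    echo "-Xmx$(( requested_gb - 1 ))g"
}

# Count the lines of a sam file that carry a given read-group tag.
count_reads() {
    local file="$1" tag="$2"
    grep -c "RG:Z:$tag" "$file"
}

# Succeed only if the merged file holds as many tagged reads as the parts together.
# Usage: check_merge TAG MERGED.sam PART1.sam PART2.sam ...
check_merge() {
    local tag="$1" merged="$2"
    shift 2
    local total=0 part
    for part in "$@"; do
        total=$(( total + $(count_reads "$part" "$tag") ))
    done
    [ "$total" -eq "$(count_reads "$merged" "$tag")" ]
}
```

For instance, "heap_for_request 6" prints "-Xmx5g", which can be pasted into the java command line, and "check_merge Lathyrus_N152 merged.sam part1.sam part2.sam" returns a non-zero status if reads were lost or duplicated during the merge.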


More information

MinHash Alignment Process (MHAP) Documentation

MinHash Alignment Process (MHAP) Documentation MinHash Alignment Process (MHAP) Documentation Release 2.1 Sergey Koren and Konstantin Berlin December 24, 2016 Contents 1 Overview 1 1.1 Installation................................................ 1

More information

Introduction To Linux. Rob Thomas - ACRC

Introduction To Linux. Rob Thomas - ACRC Introduction To Linux Rob Thomas - ACRC What Is Linux A free Operating System based on UNIX (TM) An operating system originating at Bell Labs. circa 1969 in the USA More of this later... Why Linux? Free

More information

Getting Started with Hadoop

Getting Started with Hadoop Getting Started with Hadoop May 28, 2018 Michael Völske, Shahbaz Syed Web Technology & Information Systems Bauhaus-Universität Weimar 1 webis 2018 What is Hadoop Started in 2004 by Yahoo Open-Source implementation

More information

ChIP-seq Analysis Practical

ChIP-seq Analysis Practical ChIP-seq Analysis Practical Vladimir Teif (vteif@essex.ac.uk) An updated version of this document will be available at http://generegulation.info/index.php/teaching In this practical we will learn how

More information

Introduction to Linux for BlueBEAR. January

Introduction to Linux for BlueBEAR. January Introduction to Linux for BlueBEAR January 2019 http://intranet.birmingham.ac.uk/bear Overview Understanding of the BlueBEAR workflow Logging in to BlueBEAR Introduction to basic Linux commands Basic file

More information

Introduction to UNIX command-line

Introduction to UNIX command-line Introduction to UNIX command-line Boyce Thompson Institute March 17, 2015 Lukas Mueller & Noe Fernandez Class Content Terminal file system navigation Wildcards, shortcuts and special characters File permissions

More information

CMSC 201 Fall 2016 Lab 09 Advanced Debugging

CMSC 201 Fall 2016 Lab 09 Advanced Debugging CMSC 201 Fall 2016 Lab 09 Advanced Debugging Assignment: Lab 09 Advanced Debugging Due Date: During discussion Value: 10 points Part 1: Introduction to Errors Throughout this semester, we have been working

More information

Snakemake overview. Thomas Cokelaer. Nov 9th 2017 Snakemake and Sequana overview. Institut Pasteur

Snakemake overview. Thomas Cokelaer. Nov 9th 2017 Snakemake and Sequana overview. Institut Pasteur Snakemake overview Thomas Cokelaer Institut Pasteur Nov 9th 2017 Snakemake and Sequana overview Many bioinformatic pipeline frameworks available A review of bioinformatic pipeline frameworks. Jeremy Leipzig

More information

Linux Command Line Interface. December 27, 2017

Linux Command Line Interface. December 27, 2017 Linux Command Line Interface December 27, 2017 Foreword It is supposed to be a refresher (?!) If you are familiar with UNIX/Linux/MacOS X CLI, this is going to be boring... I will not talk about editors

More information

Running LAMMPS on CC servers at IITM

Running LAMMPS on CC servers at IITM Running LAMMPS on CC servers at IITM Srihari Sundar September 9, 2016 This tutorial assumes prior knowledge about LAMMPS [2, 1] and deals with running LAMMPS scripts on the compute servers at the computer

More information

HOD User Guide. Table of contents

HOD User Guide. Table of contents Table of contents 1 Introduction...3 2 Getting Started Using HOD... 3 2.1 A typical HOD session... 3 2.2 Running hadoop scripts using HOD...5 3 HOD Features... 6 3.1 Provisioning and Managing Hadoop Clusters...6

More information

GUT. GUT Installation Guide

GUT. GUT Installation Guide Date : 17 Mar 2011 1/6 GUT Contents 1 Introduction...2 2 Installing GUT...2 2.1 Optional Extensions...2 2.2 Installation using the Binary package...2 2.2.1 Linux or Mac OS X...2 2.2.2 Windows...4 2.3 Installing

More information

ChIP-seq practical: peak detection and peak annotation. Mali Salmon-Divon Remco Loos Myrto Kostadima

ChIP-seq practical: peak detection and peak annotation. Mali Salmon-Divon Remco Loos Myrto Kostadima ChIP-seq practical: peak detection and peak annotation Mali Salmon-Divon Remco Loos Myrto Kostadima March 2012 Introduction The goal of this hands-on session is to perform some basic tasks in the analysis

More information

Working with Basic Linux. Daniel Balagué

Working with Basic Linux. Daniel Balagué Working with Basic Linux Daniel Balagué How Linux Works? Everything in Linux is either a file or a process. A process is an executing program identified with a PID number. It runs in short or long duration

More information

Introduction to Unix: Fundamental Commands

Introduction to Unix: Fundamental Commands Introduction to Unix: Fundamental Commands Ricky Patterson UVA Library Based on slides from Turgut Yilmaz Istanbul Teknik University 1 What We Will Learn The fundamental commands of the Unix operating

More information

A shell can be used in one of two ways:

A shell can be used in one of two ways: Shell Scripting 1 A shell can be used in one of two ways: A command interpreter, used interactively A programming language, to write shell scripts (your own custom commands) 2 If we have a set of commands

More information

Grid Engine Users Guide. 7.0 Edition

Grid Engine Users Guide. 7.0 Edition Grid Engine Users Guide 7.0 Edition Grid Engine Users Guide : 7.0 Edition Published Dec 01 2017 Copyright 2017 University of California and Scalable Systems This document is subject to the Rocks License

More information

Chapter-3. Introduction to Unix: Fundamental Commands

Chapter-3. Introduction to Unix: Fundamental Commands Chapter-3 Introduction to Unix: Fundamental Commands What You Will Learn The fundamental commands of the Unix operating system. Everything told for Unix here is applicable to the Linux operating system

More information

Video Performance Evaluation Resource. Quick Start Guide

Video Performance Evaluation Resource. Quick Start Guide Video Performance Evaluation Resource Quick Start Guide November 25, 2002 Table of Contents 1 Welcome to ViPER... 3 1.1 Welcome to the ViPER Documentation... 3 2 Setting Up ViPER... 3 2.1 Preparing for

More information

Using UNIX. -rwxr--r-- 1 root sys Sep 5 14:15 good_program

Using UNIX. -rwxr--r-- 1 root sys Sep 5 14:15 good_program Using UNIX. UNIX is mainly a command line interface. This means that you write the commands you want executed. In the beginning that will seem inferior to windows point-and-click, but in the long run the

More information

Running Java Programs

Running Java Programs Running Java Programs Written by: Keith Fenske, http://www.psc-consulting.ca/fenske/ First version: Thursday, 10 January 2008 Document revised: Saturday, 13 February 2010 Copyright 2008, 2010 by Keith

More information

Linux Operating System Environment Computadors Grau en Ciència i Enginyeria de Dades Q2

Linux Operating System Environment Computadors Grau en Ciència i Enginyeria de Dades Q2 Linux Operating System Environment Computadors Grau en Ciència i Enginyeria de Dades 2017-2018 Q2 Facultat d Informàtica de Barcelona This first lab session is focused on getting experience in working

More information

Chapter 4. Unix Tutorial. Unix Shell

Chapter 4. Unix Tutorial. Unix Shell Chapter 4 Unix Tutorial Users and applications interact with hardware through an operating system (OS). Unix is a very basic operating system in that it has just the essentials. Many operating systems,

More information

Linux Systems Administration Getting Started with Linux

Linux Systems Administration Getting Started with Linux Linux Systems Administration Getting Started with Linux Network Startup Resource Center www.nsrc.org These materials are licensed under the Creative Commons Attribution-NonCommercial 4.0 International

More information

SuperQ (Version 1.2) Manual

SuperQ (Version 1.2) Manual SuperQ (Version 1.2) Manual October 20, 2013 1 Description SuperQ is a program written in Java which computes a phylogenetic supernetwork from a collection of partial phylogenetic trees as described in

More information

Quick Guide for the Torque Cluster Manager

Quick Guide for the Torque Cluster Manager Quick Guide for the Torque Cluster Manager Introduction: One of the main purposes of the Aries Cluster is to accommodate especially long-running programs. Users who run long jobs (which take hours or days

More information

Server Monitoring. AppDynamics Pro Documentation. Version 4.1.x. Page 1

Server Monitoring. AppDynamics Pro Documentation. Version 4.1.x. Page 1 Server Monitoring AppDynamics Pro Documentation Version 4.1.x Page 1 Server Monitoring......................................................... 4 Standalone Machine Agent Requirements and Supported Environments............

More information

Introduction to the shell Part II

Introduction to the shell Part II Introduction to the shell Part II Graham Markall http://www.doc.ic.ac.uk/~grm08 grm08@doc.ic.ac.uk Civil Engineering Tech Talks 16 th November, 1pm Last week Covered applications and Windows compatibility

More information

Today. Review. Unix as an OS case study Intro to Shell Scripting. What is an Operating System? What are its goals? How do we evaluate it?

Today. Review. Unix as an OS case study Intro to Shell Scripting. What is an Operating System? What are its goals? How do we evaluate it? Today Unix as an OS case study Intro to Shell Scripting Make sure the computer is in Linux If not, restart, holding down ALT key Login! Posted slides contain material not explicitly covered in class 1

More information

SGE Roll: Users Guide. Version Edition

SGE Roll: Users Guide. Version Edition SGE Roll: Users Guide Version 4.2.1 Edition SGE Roll: Users Guide : Version 4.2.1 Edition Published Sep 2006 Copyright 2006 University of California and Scalable Systems This document is subject to the

More information

ls /data/atrnaseq/ egrep "(fastq fasta fq fa)\.gz" ls /data/atrnaseq/ egrep "(cn ts)[1-3]ln[^3a-za-z]\."

ls /data/atrnaseq/ egrep (fastq fasta fq fa)\.gz ls /data/atrnaseq/ egrep (cn ts)[1-3]ln[^3a-za-z]\. Command line tools - bash, awk and sed We can only explore a small fraction of the capabilities of the bash shell and command-line utilities in Linux during this course. An entire course could be taught

More information

CS 460 Linux Tutorial

CS 460 Linux Tutorial CS 460 Linux Tutorial http://ryanstutorials.net/linuxtutorial/cheatsheet.php # Change directory to your home directory. # Remember, ~ means your home directory cd ~ # Check to see your current working

More information

Genomic Files. University of Massachusetts Medical School. October, 2015

Genomic Files. University of Massachusetts Medical School. October, 2015 .. Genomic Files University of Massachusetts Medical School October, 2015 2 / 55. A Typical Deep-Sequencing Workflow Samples Fastq Files Fastq Files Sam / Bam Files Various files Deep Sequencing Further

More information

High Performance Computing (HPC) Club Training Session. Xinsheng (Shawn) Qin

High Performance Computing (HPC) Club Training Session. Xinsheng (Shawn) Qin High Performance Computing (HPC) Club Training Session Xinsheng (Shawn) Qin Outline HPC Club The Hyak Supercomputer Logging in to Hyak Basic Linux Commands Transferring Files Between Your PC and Hyak Submitting

More information

Automatic Dependency Management for Scientific Applications on Clusters. Ben Tovar*, Nicholas Hazekamp, Nathaniel Kremer-Herman, Douglas Thain

Automatic Dependency Management for Scientific Applications on Clusters. Ben Tovar*, Nicholas Hazekamp, Nathaniel Kremer-Herman, Douglas Thain Automatic Dependency Management for Scientific Applications on Clusters Ben Tovar*, Nicholas Hazekamp, Nathaniel Kremer-Herman, Douglas Thain Where users are Scientist says: "This demo task runs on my

More information

simplevisor Documentation

simplevisor Documentation simplevisor Documentation Release 1.2 Massimo Paladin June 27, 2016 Contents 1 Main Features 1 2 Installation 3 3 Configuration 5 4 simplevisor command 9 5 simplevisor-control command 13 6 Supervisor

More information

Working with Shell Scripting. Daniel Balagué

Working with Shell Scripting. Daniel Balagué Working with Shell Scripting Daniel Balagué Editing Text Files We offer many text editors in the HPC cluster. Command-Line Interface (CLI) editors: vi / vim nano (very intuitive and easy to use if you

More information

Workshop Practical on concatenation and model testing

Workshop Practical on concatenation and model testing Workshop Practical on concatenation and model testing Jacob L. Steenwyk & Antonis Rokas Programs that you will use: Bash, Python, Perl, Phyutility, PartitionFinder, awk To infer a putative species phylogeny

More information

Essential Skills for Bioinformatics: Unix/Linux

Essential Skills for Bioinformatics: Unix/Linux Essential Skills for Bioinformatics: Unix/Linux SHELL SCRIPTING Overview Bash, the shell we have used interactively in this course, is a full-fledged scripting language. Unlike Python, Bash is not a general-purpose

More information

(MCQZ-CS604 Operating Systems)

(MCQZ-CS604 Operating Systems) command to resume the execution of a suspended job in the foreground fg (Page 68) bg jobs kill commands in Linux is used to copy file is cp (Page 30) mv mkdir The process id returned to the child process

More information

X Grid Engine. Where X stands for Oracle Univa Open Son of more to come...?!?

X Grid Engine. Where X stands for Oracle Univa Open Son of more to come...?!? X Grid Engine Where X stands for Oracle Univa Open Son of more to come...?!? Carsten Preuss on behalf of Scientific Computing High Performance Computing Scheduler candidates LSF too expensive PBS / Torque

More information