By Ludovic Duvaux (27 November 2013)
Array of jobs using SGE - an example using stampy, a mapping software.
Running java applications on the cluster - merge sam files using the Picard tools

The idea
========
One may have many inputs (samples, files...) to process. Sometimes the
processing of different parts of a single file can also be done
independently, in parallel on several cores (and this possibility is not
incompatible with the previous one). In both cases (or a mix of both),
creating an array of jobs with SGE can be of great help! For instance,
most recent mapping programs allow you to process only a fraction of a
fastq input file (e.g. only 1/10th of your file will be mapped on your
reference genome), and one may have many samples to map on a reference
genome! For the current course, we'll do that using some pea aphid data.

I) Preliminary steps
====================

I.1) Open an interactive session on the cluster
===============================================
Very little software and few libraries are installed on the head node.
Therefore you may run into severe problems if you attempt to install or
run anything (under your account or otherwise) while working on the
cluster head node. Open an interactive session on a compute node first:

[MyLogin@cluster ~]$ qrsh
[MyLogin@node15 ~]$

I.2) Create a new folder & fetch the training files
===================================================
To do so:

ln -s /data/mylogin/ ./data   # create a symbolic link toward your data folder in the current folder

If the above doesn't work, go first to your data folder and then come
back, in order to trigger the mounting of your data folder:

cd /data/mylogin
cd
ln -s /data/mylogin/ ./data

Then:

cd data
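If you are unsure where a symbolic link points, you can inspect it before moving on. A minimal self-contained sketch (the /tmp paths here are illustrative, not part of the training material):

```shell
# Create a symbolic link to an illustrative target and inspect it;
# "readlink" prints the path the link points to.
mkdir -p /tmp/demo_target
ln -sf /tmp/demo_target /tmp/demo_link
readlink /tmp/demo_link     # prints /tmp/demo_target
ls -ld /tmp/demo_link       # the "->" arrow shows the link target
```

The same `readlink` / `ls -ld` check works on your `data` link.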
mkdir -p ArrayJobs_Stampy
cp /usr/local/extras/genomics/hpc_course/stampy_example/script_files/* ArrayJobs_Stampy   # fetch all useful files for this training
ll ArrayJobs_Stampy

I.3) Check that you can run stampy
==================================
Try to call stampy's manual. From anywhere, type:

/usr/local/extras/genomics/applications/stampy/1.0.22/stampy.py

You should obtain something like:

[bo4cm17@testnode01 ~]$ /usr/local/extras/genomics/applications/stampy/1.0.22/stampy.py
stampy v (r1848), <gerton.lunter@well.ox.ac.uk>

Usage: /usr/local/extras/genomics/applications/stampy/1.0.22/stampy.py [options] [.fa files]

Option summary (--help for all):

Command options:
 -G PREFIX file1.fa [...]         Build genome index PREFIX.stidx from fasta file(s) on command line
 -H PREFIX                        Build hash PREFIX.sthash
 -M FILE[,FILE]                   Map fastq/fasta/bam file(s)
 -A FILE                          Convert qualities; strip adapters

Mapping/output options:
 -g PREFIX                        Use genome index file PREFIX.stidx
 -h PREFIX                        Use hash file PREFIX.sthash
 -o FILE                          Write mapping output to FILE [stdout]
 --readgroup=id:id,tag:value,...  Set read-group tags (ID,SM,LB,DS,PU,PI,CN,DT,PL) (SAM format)
 --solexa, --solexaold, --sanger  Solexa read qualities (@-based); pre-v1.3 Solexa; and Sanger (!-based, default)
 --substitutionrate=f             Set substitution rate for mapping and simulation [0.001]
 --gapopen=n                      Gap open penalty (phred score) [40]
 --gapextend=n                    Gap extension penalty (phred score) [3]
 --bwaoptions=opts                Options and <prefix> for BWA pre-mapper (quote multiple options)
 --bwamaxmismatch=n               Max number of mismatches for BWA maps; -1=auto [-1]
 --bwatmpdir=s                    Set directory for BWA temporary files
 --bwa=f                          Set BWA executable [default: bwa]
 --bwamark                        Include/mark BWA-mapped reads with XP:Z:BWA tag (produces more output lines)

General options:
 --help                           Full help
 -v N                             Set verbosity level (0-3) [2]

If not, there is a problem with the installation. Note that stampy uses python.
See the file "HowToInstallOnIceberg_Python2.x_stampy.txt" on the cluster to install it if needed.
For greater simplicity, create a symbolic link to this executable in your
bin folder (then you won't have to type
"/usr/local/extras/genomics/applications/stampy/1.0.22/" before
"stampy.py" every time afterwards):

mkdir ~/bin   # ~/bin is a special folder already included in your "PATH" even if it doesn't exist yet!
ln -s /usr/local/extras/genomics/applications/stampy/1.0.22/stampy.py ~/bin   # note the difference of behaviour when creating a symbolic link toward a folder or a file (see above)
ll ~/bin

Try to run stampy again:

stampy.py

To get more extensive help on stampy, just type:

stampy.py --help

I.4) Prepare inputs & folders
=============================

cd ~/data   # go back into our data directory if needed
mkdir -p ArrayJobs_Stampy/RefGenom
mkdir -p ArrayJobs_Stampy/mapping_results
mkdir -p ArrayJobs_Stampy/log_files
cd ArrayJobs_Stampy
cp -v /usr/local/extras/genomics/applications/stampy/1.0.22/refgenom/acypi_assembly2_rehead-noblank.stidx ./RefGenom   # copy stampy's genome index into your folder
cp -v /usr/local/extras/genomics/applications/stampy/1.0.22/refgenom/acypi_assembly2_rehead-noblank.sthash ./RefGenom   # stampy's genome hash table

II) Run complementary mapping jobs in parallel
==============================================

II.1) Prepare the bash script that runs the jobs via SGE
========================================================
The first step is to prepare a bash script to run our jobs via the SGE
scheduler. The script declares the amount of resources needed and the
estimated run time of the job (important to get onto the priority
queue...). For our training, this file already exists: see
"RunArrayOfStampyJobs.sh".

II.2) Prepare a text file with the command-line options for stampy
==================================================================
One great advantage of command-line tools is that many different specific
command lines can be prepared in order to deal with different sample
requirements.
In our case, these differences are:
- sample names (fastq files)
- library names
- fraction of the file to process
- fastq format (VERY IMPORTANT to run stampy properly)

The best thing to do before running our analysis is thus to record all
these differences in a file.
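To make the mechanism concrete, here is a minimal self-contained sketch of how one task of an array job can turn its line of such a file into a stampy call. All names here are illustrative (the real training material uses "01_stampyCommandOptions.txt" and an R helper script), and the fraction option name is an assumption to check against `stampy.py --help`:

```shell
#!/bin/bash
# Sketch of the array-job mechanism; every name below is illustrative.
# A real submission script would also carry SGE directives, e.g.:
#   #$ -t 1-2            (two tasks; SGE sets SGE_TASK_ID to 1..2)
#   #$ -l h_rt=01:00:00  (estimated run time, helps the scheduler)

# Outside a real SGE job, pretend we are task 1.
SGE_TASK_ID=${SGE_TASK_ID:-1}

# A toy options file: one line per job (sample, library, fraction, format).
cat > /tmp/toy_options.txt <<'EOF'
Lathyrus_N152 lib1 1/2 --sanger
Lathyrus_N152 lib1 2/2 --sanger
EOF

# Each task reads its own line and assembles its command from it.
# (--processpart is assumed here; check "stampy.py --help" on your version.)
LINE=$(sed -n "${SGE_TASK_ID}p" /tmp/toy_options.txt)
set -- $LINE
SAMPLE=$1; LIB=$2; PART=$3; QUAL=$4
CMD="stampy.py -g RefGenom/acypi_assembly2_rehead-noblank \
 -h RefGenom/acypi_assembly2_rehead-noblank \
 $QUAL --processpart=$PART -M ${SAMPLE}.fastq \
 -o mapping_results/${SAMPLE}_RawMapping-${SGE_TASK_ID}.sam"
echo "task ${SGE_TASK_ID}: $CMD"
```

Submitting the real script once with qsub queues all tasks at the same time; each one then builds and runs its own command in parallel.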
For our training, this file already exists: "01_stampyCommandOptions.txt".

II.3) Write a script that will prepare specific command lines
=============================================================
We then have to write a script that will read and interpret the command
option file in order to prepare the specific command line for each
sample/job in the array. The choice of the scripting language mainly
depends on our own preferences (perl, python, R...), even though some are
probably more efficient than others. For our training, I wrote such a
script in R: "02_Runstampy.R".

II.4) Run the bash script
=========================
To do so, just type in the shell:

qsub RunArrayOfStampyJobs.sh   # run the bash script
qstat | grep bo4cm             # check that the jobs are well on the queue
qstat -u MyLogin

When the jobs start, they create an SGE logfile with a name similar to
"RunArrayOfStampyJobs.sh.o<jobID>.<taskID>", where the two numbers are
the job and task IDs respectively. In that log you will find all the
information that you may have scripted for in the bash script. With my
current implementation, the time spent running the job is indicated at
the end. This is pretty useful for scheduling your future jobs on the
cluster. You'll also find the R and stampy log files in the folder
"~/data/ArrayJobs_Stampy/log_files" and the result files in
"~/data/ArrayJobs_Stampy/mapping_results".

III) Running java applications on the cluster - merge sam files using the Picard tools
======================================================================================
The problem now is that, by using different cores to map the reads on the
reference genome, we have created several sam files for the same sample,
so we want to merge them for all the subsequent analyses. To do so, we
can use the jar function "MergeSamFiles.jar" of the Picard tools. The
great advantage of java programs is that their code is totally
inter-compatible among operating systems (i.e.
once the code has been written for one OS (e.g. Windows), you don't need
to modify it for another OS (e.g. Linux)).

III.1) Download the Picard tools
================================
A very efficient way to transfer programs to iceberg is to download them
directly from the net using the command "wget":

mkdir ~/Applications
cd ~/Applications
wget
Then you have to unzip the archive:

unzip picard-tools-1.103.zip

The Picard tools are now ready to use.

III.2) Running the "MergeSamFiles.jar" function
===============================================

cd $OLDPWD   # come back to the last directory we were in before the current one (see other environment variables by typing "set | less")

The corresponding command line is (WARNING: remember to change the names
of the sam files, i.e. the date will not be the same):

java -jar ~/Applications/picard-tools-1.103/MergeSamFiles.jar MAX_RECORDS_IN_RAM=<n> CREATE_INDEX=true VALIDATION_STRINGENCY=LENIENT I=~/data/ArrayJobs_Stampy/mapping_results/Lathyrus_N152_RawMapping-1_<date>.sam I=~/data/ArrayJobs_Stampy/mapping_results/Lathyrus_N152_RawMapping-2_<date>.sam O=~/data/ArrayJobs_Stampy/mapping_results/Lathyrus_N152_RawMapping-all_<date>.sam

However, when we try, we obtain something like this:

Error occurred during initialization of VM
Could not reserve enough space for object heap
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

Indeed, you have to load a special module of iceberg to run java.
First, you can obtain the list of special modules available on iceberg by
typing:

module avail

In our case, we are interested in "apps/java/1.7", so just type:

module load apps/java/1.7

Then retry the Picard function:

java -jar ~/Applications/picard-tools-1.103/MergeSamFiles.jar MAX_RECORDS_IN_RAM=<n> CREATE_INDEX=true VALIDATION_STRINGENCY=LENIENT I=~/data/ArrayJobs_Stampy/mapping_results/Lathyrus_N152_RawMapping-1_<date>.sam I=~/data/ArrayJobs_Stampy/mapping_results/Lathyrus_N152_RawMapping-2_<date>.sam O=~/data/ArrayJobs_Stampy/mapping_results/Lathyrus_N152_RawMapping-all_<date>.sam

Results:

[Wed Nov 27 11:40:54 GMT 2013] net.sf.picard.sam.MergeSamFiles INPUT=[/home/bo4cm17/data/ArrayJobs_Stampy/mapping_results/Lathyrus_N152_RawMapping-1_<date>.sam, /home/bo4cm17/data/ArrayJobs_Stampy/mapping_results/Lathyrus_N152_RawMapping-2_<date>.sam] OUTPUT=/home/bo4cm17/data/ArrayJobs_Stampy/mapping_results/Lathyrus_N152_RawMapping-all_<date>.sam VALIDATION_STRINGENCY=LENIENT MAX_RECORDS_IN_RAM=<n> CREATE_INDEX=true SORT_ORDER=coordinate ASSUME_SORTED=false MERGE_SEQUENCE_DICTIONARIES=false USE_THREADING=false VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 CREATE_MD5_FILE=false
[Wed Nov 27 11:40:54 GMT 2013] Executing as bo4cm17@amd-node02 on Linux el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.7.0_09-icedtea-root_2013_03_07_09_45-b00; Picard version: 1.103(1598)
INFO 11:40:55 MergeSamFiles Sorting input files using temp directory [/tmp/bo4cm17]
INFO 11:40:55 MergeSamFiles Finished reading inputs.
[Wed Nov 27 11:40:55 GMT 2013] net.sf.picard.sam.MergeSamFiles done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=

We can check that the number of reads in the resulting file is indeed the
sum of the two input files:
cat ~/data/ArrayJobs_Stampy/mapping_results/Lathyrus_N152_RawMapping-1_<date>.sam | grep RG:Z:Lathyrus_N152 | wc -l   # count the number of reads (each line corresponding to a read has the field "RG:Z:Lathyrus_N152")
cat ~/data/ArrayJobs_Stampy/mapping_results/Lathyrus_N152_RawMapping-2_<date>.sam | grep RG:Z:Lathyrus_N152 | wc -l
cat ~/data/ArrayJobs_Stampy/mapping_results/Lathyrus_N152_RawMapping-all_<date>.sam | grep RG:Z:Lathyrus_N152 | wc -l

IMPORTANT NOTE: you may sometimes need to change java's default
parameters concerning the memory allocated to the Java virtual machine in
order to run it properly, e.g.:

java -Xms512m -Xmx7g -jar ~/Applications/picard-tools-1.103/MergeSamFiles.jar MAX_RECORDS_IN_RAM=<n> CREATE_INDEX=true VALIDATION_STRINGENCY=LENIENT I=~/data/ArrayJobs_Stampy/mapping_results/Lathyrus_N152_RawMapping-1_<date>.sam I=~/data/ArrayJobs_Stampy/mapping_results/Lathyrus_N152_RawMapping-2_<date>.sam O=~/data/ArrayJobs_Stampy/mapping_results/Lathyrus_N152_RawMapping-all_<date>.sam

- Xms512m specifies that the minimum memory allocated to the Java virtual machine will be 512 MB
- Xmx7g specifies that the maximum memory allocated to the Java virtual machine will be 7 GB

In this case, take care that Xmx does not exceed the memory you asked for
the job in the SGE bash script; actually, you even need to spare some RAM
for other applications (well, it's a guess of mine here), i.e. if you ask
for 6 GB for your jobs, let Xmx not exceed 5 GB. Other Picard parameters
influencing memory usage may be important to fine-tune for big data sets
(see the Picard manual for further information).
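The count check above can also be wrapped into a small script that compares the numbers for you. A self-contained sketch with toy files standing in for the real RawMapping sam files (substitute your own paths):

```shell
# Sketch: check that a merged file contains exactly the reads of its two
# parts; the /tmp files below are toy stand-ins for the real sam files.
printf 'r1\tRG:Z:Lathyrus_N152\nr2\tRG:Z:Lathyrus_N152\n' > /tmp/part1.sam
printf 'r3\tRG:Z:Lathyrus_N152\n'                         > /tmp/part2.sam
cat /tmp/part1.sam /tmp/part2.sam                         > /tmp/merged.sam

# grep -c counts matching lines, i.e. reads carrying the read-group tag
n1=$(grep -c 'RG:Z:Lathyrus_N152' /tmp/part1.sam)
n2=$(grep -c 'RG:Z:Lathyrus_N152' /tmp/part2.sam)
nall=$(grep -c 'RG:Z:Lathyrus_N152' /tmp/merged.sam)

if [ $((n1 + n2)) -eq "$nall" ]; then
  echo "merge OK: $nall reads"
else
  echo "read counts differ: $n1 + $n2 != $nall"
fi
```

With the real files, point the three `grep -c` calls at the two part files and the merged output of MergeSamFiles.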
More informationGrid Engine Users Guide. 7.0 Edition
Grid Engine Users Guide 7.0 Edition Grid Engine Users Guide : 7.0 Edition Published Dec 01 2017 Copyright 2017 University of California and Scalable Systems This document is subject to the Rocks License
More informationChapter-3. Introduction to Unix: Fundamental Commands
Chapter-3 Introduction to Unix: Fundamental Commands What You Will Learn The fundamental commands of the Unix operating system. Everything told for Unix here is applicable to the Linux operating system
More informationVideo Performance Evaluation Resource. Quick Start Guide
Video Performance Evaluation Resource Quick Start Guide November 25, 2002 Table of Contents 1 Welcome to ViPER... 3 1.1 Welcome to the ViPER Documentation... 3 2 Setting Up ViPER... 3 2.1 Preparing for
More informationUsing UNIX. -rwxr--r-- 1 root sys Sep 5 14:15 good_program
Using UNIX. UNIX is mainly a command line interface. This means that you write the commands you want executed. In the beginning that will seem inferior to windows point-and-click, but in the long run the
More informationRunning Java Programs
Running Java Programs Written by: Keith Fenske, http://www.psc-consulting.ca/fenske/ First version: Thursday, 10 January 2008 Document revised: Saturday, 13 February 2010 Copyright 2008, 2010 by Keith
More informationLinux Operating System Environment Computadors Grau en Ciència i Enginyeria de Dades Q2
Linux Operating System Environment Computadors Grau en Ciència i Enginyeria de Dades 2017-2018 Q2 Facultat d Informàtica de Barcelona This first lab session is focused on getting experience in working
More informationChapter 4. Unix Tutorial. Unix Shell
Chapter 4 Unix Tutorial Users and applications interact with hardware through an operating system (OS). Unix is a very basic operating system in that it has just the essentials. Many operating systems,
More informationLinux Systems Administration Getting Started with Linux
Linux Systems Administration Getting Started with Linux Network Startup Resource Center www.nsrc.org These materials are licensed under the Creative Commons Attribution-NonCommercial 4.0 International
More informationSuperQ (Version 1.2) Manual
SuperQ (Version 1.2) Manual October 20, 2013 1 Description SuperQ is a program written in Java which computes a phylogenetic supernetwork from a collection of partial phylogenetic trees as described in
More informationQuick Guide for the Torque Cluster Manager
Quick Guide for the Torque Cluster Manager Introduction: One of the main purposes of the Aries Cluster is to accommodate especially long-running programs. Users who run long jobs (which take hours or days
More informationServer Monitoring. AppDynamics Pro Documentation. Version 4.1.x. Page 1
Server Monitoring AppDynamics Pro Documentation Version 4.1.x Page 1 Server Monitoring......................................................... 4 Standalone Machine Agent Requirements and Supported Environments............
More informationIntroduction to the shell Part II
Introduction to the shell Part II Graham Markall http://www.doc.ic.ac.uk/~grm08 grm08@doc.ic.ac.uk Civil Engineering Tech Talks 16 th November, 1pm Last week Covered applications and Windows compatibility
More informationToday. Review. Unix as an OS case study Intro to Shell Scripting. What is an Operating System? What are its goals? How do we evaluate it?
Today Unix as an OS case study Intro to Shell Scripting Make sure the computer is in Linux If not, restart, holding down ALT key Login! Posted slides contain material not explicitly covered in class 1
More informationSGE Roll: Users Guide. Version Edition
SGE Roll: Users Guide Version 4.2.1 Edition SGE Roll: Users Guide : Version 4.2.1 Edition Published Sep 2006 Copyright 2006 University of California and Scalable Systems This document is subject to the
More informationls /data/atrnaseq/ egrep "(fastq fasta fq fa)\.gz" ls /data/atrnaseq/ egrep "(cn ts)[1-3]ln[^3a-za-z]\."
Command line tools - bash, awk and sed We can only explore a small fraction of the capabilities of the bash shell and command-line utilities in Linux during this course. An entire course could be taught
More informationCS 460 Linux Tutorial
CS 460 Linux Tutorial http://ryanstutorials.net/linuxtutorial/cheatsheet.php # Change directory to your home directory. # Remember, ~ means your home directory cd ~ # Check to see your current working
More informationGenomic Files. University of Massachusetts Medical School. October, 2015
.. Genomic Files University of Massachusetts Medical School October, 2015 2 / 55. A Typical Deep-Sequencing Workflow Samples Fastq Files Fastq Files Sam / Bam Files Various files Deep Sequencing Further
More informationHigh Performance Computing (HPC) Club Training Session. Xinsheng (Shawn) Qin
High Performance Computing (HPC) Club Training Session Xinsheng (Shawn) Qin Outline HPC Club The Hyak Supercomputer Logging in to Hyak Basic Linux Commands Transferring Files Between Your PC and Hyak Submitting
More informationAutomatic Dependency Management for Scientific Applications on Clusters. Ben Tovar*, Nicholas Hazekamp, Nathaniel Kremer-Herman, Douglas Thain
Automatic Dependency Management for Scientific Applications on Clusters Ben Tovar*, Nicholas Hazekamp, Nathaniel Kremer-Herman, Douglas Thain Where users are Scientist says: "This demo task runs on my
More informationsimplevisor Documentation
simplevisor Documentation Release 1.2 Massimo Paladin June 27, 2016 Contents 1 Main Features 1 2 Installation 3 3 Configuration 5 4 simplevisor command 9 5 simplevisor-control command 13 6 Supervisor
More informationWorking with Shell Scripting. Daniel Balagué
Working with Shell Scripting Daniel Balagué Editing Text Files We offer many text editors in the HPC cluster. Command-Line Interface (CLI) editors: vi / vim nano (very intuitive and easy to use if you
More informationWorkshop Practical on concatenation and model testing
Workshop Practical on concatenation and model testing Jacob L. Steenwyk & Antonis Rokas Programs that you will use: Bash, Python, Perl, Phyutility, PartitionFinder, awk To infer a putative species phylogeny
More informationEssential Skills for Bioinformatics: Unix/Linux
Essential Skills for Bioinformatics: Unix/Linux SHELL SCRIPTING Overview Bash, the shell we have used interactively in this course, is a full-fledged scripting language. Unlike Python, Bash is not a general-purpose
More information(MCQZ-CS604 Operating Systems)
command to resume the execution of a suspended job in the foreground fg (Page 68) bg jobs kill commands in Linux is used to copy file is cp (Page 30) mv mkdir The process id returned to the child process
More informationX Grid Engine. Where X stands for Oracle Univa Open Son of more to come...?!?
X Grid Engine Where X stands for Oracle Univa Open Son of more to come...?!? Carsten Preuss on behalf of Scientific Computing High Performance Computing Scheduler candidates LSF too expensive PBS / Torque
More information