Genome Assembly, 2 Sept: Groups, Wiki, Job files, Read cleaning, Other cleaning, Genome Assembly

1 2 Sept
Groups
Group 5 was down to 3 people, so I merged it into the other groups. Group 1 now has 6 people; does anyone want to change?
The initial drafter is not the official leader; use any management structure you like.
Wiki
Use the wiki as your group notebook. Share your job files; I need to see your results.
Topics: Job files, Read cleaning, Other cleaning, Genome Assembly

2 RCAC Job files
Why? When you log into an RCAC server, you are using a special server shared by many users. This is called a frontend node (or sometimes a head node). There are (I think) three frontend nodes, and they are often very busy.
Frontend node: edit files, send mail, back up data, compile programs. No computing.
The other nodes are called compute nodes. They are allocated and managed by a batch system called PBS/Torque. The preferred way to use PBS is to submit a job file with the command qsub.
When you run a job with qsub, all of its normal output (STDOUT) and error output (STDERR) is sent to files called jobname.o<jobnumber> and jobname.e<jobnumber>, respectively. For example: check_clip_man.o check_clip_man.e
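A minimal sketch of the submit-and-check cycle described above. It assumes a PBS/Torque system where qsub prints a job ID such as 1234567.server. The ID format and the helper names pbs_output_files and submit_job are illustrative assumptions, not RCAC tools. pbs_output_files only works out the .o/.e file names from the job name and ID. submit_job must be run on a frontend node, where qsub and qstat are available.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical. Assumes PBS/Torque, where qsub
# prints a job ID like "1234567.server" and the job's STDOUT/STDERR go to
# <jobname>.o<number> and <jobname>.e<number>.

# pbs_output_files JOBNAME JOBID: print the expected .o and .e file names
pbs_output_files() {
    number=${2%%.*}    # numeric part of the job ID, before the first dot
    printf '%s.o%s %s.e%s\n' "$1" "$number" "$1" "$number"
}

# submit_job JOBFILE: submit the job file with qsub (run on a frontend node)
submit_job() {
    jobid=$(qsub "$1") || return 1
    echo "submitted $1 as $jobid"
    qstat -u "$USER"   # list your queued and running jobs
}
```

For example, `pbs_output_files seqyclean_monpu1 1234567.server` prints `seqyclean_monpu1.o1234567 seqyclean_monpu1.e1234567` (the job ID here is made up).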

3 RCAC Job files
Example: running seqyclean (a module)

    #!/bin/sh -l
    #PBS -N seqyclean_monpu1
    #PBS -q scholar
    #PBS -l nodes=1:ppn=16
    #PBS -l walltime=168:00:00

    module load seqyclean
    cd "$PBS_O_WORKDIR"
    pwd
    cat seqyclean.job
    date +"%d %B %Y %H:%M:%S"
    echo " "

    seqyclean -t 16 \
        -1 ../../data/Monpu1.genome.rawReads.r1.fq \
        -2 ../../data/Monpu1.genome.rawReads.r2.fq \
        -v adapter.fa \
        -qual \
        -minimum_read_length 30 \
        -o Monpu1.genome.rawReads.seqyclean.stats \
        > seqyclean.log

    echo " "
    date +"%d %B %Y %H:%M:%S"

- #!/bin/sh -l: the shebang tells unix this is a shell file. It could also name another interpreter, such as Perl.
- -N: the job name (seen in qstat).
- -q: the queue. Use scholar unless otherwise instructed.
- -l nodes=1:ppn=16: the number of nodes and CPUs per node (ppn) to reserve. Usually ppn will be 1 or 16 on scholar.
- -l walltime: the maximum wall-clock (elapsed) time the job may run. The scholar queue is limited to 168 hours.
- $PBS_O_WORKDIR is a predefined variable that holds the directory from which you submitted the job with qsub.
- pwd (optional) prints the directory after the cd. This is useful for debugging.
- cat <filename> copies the command file into the output.
- The date and echo commands write the date (and a blank line) into the output.
- The backslash, \, is the line-continuation character in unix. It makes very long command lines easier to write and understand.
- The greater-than symbol, >, redirects output in unix. Everything written to STDOUT is sent to the file seqyclean.log.

4 RCAC Job files

    #PBS -N seqyclean_monpu1
    #PBS -q scholar
    #PBS -l nodes=1:ppn=16
    #PBS -l walltime=168:00:00

PBS commands can also be entered on the command line when you run qsub:

    qsub -N seqyclean_monpu1 -q scholar -l nodes=1:ppn=16 -l walltime=168:00:00 seqyclean.job

I like keeping the PBS commands in the job file:
- I have a record of what I ran.
- It's easy to make a mistake on the command line.
- Copying from old jobs can save a lot of work.
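The #PBS lines in a job file are the same options you would otherwise type after qsub, so a short sketch can pull them back out. This is handy for reviewing what resources a directory of old job files requested before you copy one. The function name pbs_args is hypothetical, and it assumes each directive sits on its own line beginning with #PBS. If the same option appears both in the file and on the qsub command line, the command-line value takes precedence.

```shell
#!/bin/sh
# pbs_args JOBFILE (hypothetical helper): print the #PBS directives of a job
# file on one line, i.e. the equivalent qsub command-line options
pbs_args() {
    sed -n 's/^#PBS[[:space:]]*//p' "$1" | tr '\n' ' ' | sed 's/ $//'
}

# Example: list the resources requested by every job file in this directory
# for f in *.job; do printf '%s: %s\n' "$f" "$(pbs_args "$f")"; done
```

For the seqyclean job file, this prints `-N seqyclean_monpu1 -q scholar -l nodes=1:ppn=16 -l walltime=168:00:00`.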

5 RCAC Job files
Job files: btrim example

    # (header stuff removed for example)
    ~/src/btrim/btrim64 \
        -3 \
        -p adapter2.fa \
        -t ../monpu1.genome.rawreads.fastq \
        -o Monpu1.trimmed \
        -s Monpu1.btrim.summary \
        > btrim.log
    echo " "
    date +"%d %B %Y %H:%M:%S"

    # Btrim64: -q -p <pattern file> -t <fastq file> -o <trim file> [-u 5'-error -v 3'-error -l minlen -b <5'-cut> -e <3'-cut> \
    #          -w <window> -a <average> -f <5'-trim> -I]
    #
    # Required for pattern trimming:
    #   -p <pattern file>    each line contains one pair of 5'- and 3'-adaptors; ignored if -q in effect
    #   -t <sequence file>   fastq file to be trimmed
    #   -o <output file>     fastq file of trimmed sequences
    #
    # Required for quality trimming (-q in effect):
    #   -t <sequence file>   fastq file to be trimmed
    #   -o <output file>     fastq file of trimmed sequences
    #
    # Optional:
    #   -q                   toggle to quality trimming [default=adaptor trimming]
    #   -3                   3'-adaptor trimming only [default=off]
    #   -P                   pass if no adaptor is found [default=off]
    #   -Q                   do a quality trimming even if adaptor is found [default=off]
    #   -s <summary file>    detailed trimming info for each sequence
    #   -u <5'-error>        maximum number of errors in 5'-adaptor [default=3]
    #   -v <3'-error>        maximum number of errors in 3'-adaptor [default=4]
    #   -l <minimal length>  minimal insert size [default=25]
    #   -b <5'-range>        the length of sequence to look for 5'-adaptor at the beginning of the sequence [default=1.3 x adaptor length]
    #

- The btrim64 program is under /home/mgribsko/src. In unix, ~ is a symbol for your home directory. ~<username>, for instance ~mgribsko, is a symbol for the named user's home directory.
- I often copy the help for the command into the job file as a comment. Comments begin with #. This makes it much easier to change the command later.
- Notice that the PBS commands are comments as far as unix is concerned.
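You can script the habit of copying a command's help into the job file. The following is a minimal sketch that assumes the program prints its usage text when run with no arguments or with a help flag. Check this for your own tool. The function name help_as_comments is hypothetical. It runs the command, captures both STDOUT and STDERR, and prefixes every line with # so the text is inert inside the job file.

```shell
#!/bin/sh
# help_as_comments COMMAND [ARGS...] (hypothetical helper): run a command and
# print its output (STDOUT and STDERR) with "# " in front of every line
help_as_comments() {
    "$@" 2>&1 | sed 's/^/# /'
}

# Example: append btrim64's usage text to a job file as comments
# help_as_comments ~/src/btrim/btrim64 >> btrim.job
```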

6 RCAC Job files
Time for an example: job files, grep tricks

7 Adapter trimming
Over the summer I tried many methods: AdapterRemoval, AlienTrimmer, Btrim, Cutadapt, Fastx_clip, Fastqmcf, Flexbar, Reaper, Scythe, Seqprep, Seqyclean, Skewer, Trimmomatic

8 Adapter trimming
Quick and dirty test: use grep to search for the first 14 bases of the universal and index adapters, and for their reverse complements.
Why 14? It is long enough that you don't expect to see (many) matches by chance. In random sequence, a given 14-mer occurs about once every 4^14 ≈ 268 million positions.
Why quick and dirty?
- Only exact matches will be found.
- Quality is not considered.
- Matches may be cut off by the end of the read.
This test will UNDERESTIMATE the number of adapters.
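The following sketch implements the quick-and-dirty test in shell. It assumes standard 4-line FASTQ records and searches only the sequence line of each record, so quality strings cannot produce false hits. You pass the adapter sequence as an argument and substitute the actual universal and index adapter sequences for your library. The helper names are hypothetical. Because grep -F finds only exact matches, the counts have all the limitations listed above.

```shell
#!/bin/sh
# Quick-and-dirty adapter check (sketch; helper names are hypothetical).
# Counts reads whose sequence contains an exact copy of the first 14 bases of
# an adapter, or of the reverse complement of those 14 bases.
# Exact matching only, so the counts UNDERESTIMATE adapter content.

# revcomp SEQ: print the reverse complement of a DNA string
revcomp() {
    printf '%s\n' "$1" |
        awk '{ s = ""; for (i = length($0); i > 0; i--) s = s substr($0, i, 1); print s }' |
        tr 'ACGTacgt' 'TGCAtgca'
}

# seq_lines FASTQ: print line 2 (the sequence) of every 4-line FASTQ record,
# so quality strings are never searched
seq_lines() {
    awk 'NR % 4 == 2' "$1"
}

# adapter_counts FASTQ ADAPTER: print "<forward count><TAB><revcomp count>"
adapter_counts() {
    kmer=$(printf '%s\n' "$2" | cut -c1-14)    # first 14 bases of the adapter
    rc=$(revcomp "$kmer")
    fwd_n=$(seq_lines "$1" | grep -c -F -- "$kmer")
    rc_n=$(seq_lines "$1" | grep -c -F -- "$rc")
    printf '%s\t%s\n' "$fwd_n" "$rc_n"
}
```

For example, `adapter_counts Monpu1.genome.rawReads.r1.fq ADAPTER_SEQ` (where ADAPTER_SEQ is a placeholder for a real adapter) prints the forward and reverse-complement read counts, separated by a tab. Run it once per adapter and per read file to fill in a table like the one on the next slide.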

9 Adapter trimming
The grep test counts index forward, index reverse, universal forward, and universal reverse adapters, plus their total. The table also reports the percentage of reads remaining and of adapters remaining after each cleaning method. Adapters remaining:

    File / method                        Adapters remain
    Monpu1.genome.rawReads.r1.fq
    Monpu1.genome.rawReads.r2.fq
    Monpu1.genome.rawReads.both.fq
    Monpu1.genome.filteredReads.fastq    34.11%
    adapterremoval                        3.16%
    alientrimmer                          5.46%
    cutadapt                             65.96%
    fastqmcf                             17.39%
    flexbar                               1.72%
    reaper                                2.66%
    scythe                                4.21%
    seqprep                               3.26%
    skewer                                2.14%
    seqyclean (all)                       0.11%
    trimmomatic paired.r1.fq
    trimmomatic unpaired.r1.fq
    trimmomatic paired.r2.fq
    trimmomatic unpaired.r2.fq
    trimmomatic (all)                    20.45%
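To compare methods, you can express the adapter count after cleaning relative to the count in the raw reads. The sketch below assumes that "adapters remain" means total adapters after cleaning divided by total adapters in the raw reads. The slide does not state the denominator, so treat this as an assumption. The helper name pct_remaining is hypothetical.

```shell
#!/bin/sh
# pct_remaining RAW CLEANED (hypothetical helper): adapters left after
# cleaning as a percentage of the adapters counted in the raw reads.
# Assumes both arguments are total grep-test counts (all four adapter patterns).
pct_remaining() {
    awk -v raw="$1" -v after="$2" 'BEGIN {
        if (raw == 0) { print "NA"; exit }    # avoid dividing by zero
        printf "%.2f%%\n", 100 * after / raw
    }'
}
```

For example, `pct_remaining 1000 34` prints `3.40%`.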

10 Adapter trimming
Group 1: Trimmomatic

11 Adapter trimming

12 Adapter trimming

13 Adapter trimming

14 Adapter trimming

15 Adapter trimming

16 Adapter trimming

17 Other cleaning
Contaminant sequences: mitochondrial, Phi-X174
Match them to the reads using Bowtie2 (or any other mapper). Use --very-sensitive-local, which allows local matches with small gaps.
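The following sketch shows this step for paired reads. It assumes you build a Bowtie2 index for the contaminant sequence (mitochondrial genome or Phi-X174) first. --very-sensitive-local and --un-conc are standard Bowtie2 options. With --un-conc, Bowtie2 writes out the pairs that do not align concordantly to the contaminant, and a % in the path is replaced by 1 or 2 for the two mates. The function names, file names, and the 16 threads are placeholders.

```shell
#!/bin/sh
# Sketch: drop read pairs that map to a contaminant (mitochondrial genome or
# Phi-X174). Function names, file names and the thread count are placeholders.

# build_index FASTA PREFIX: build a Bowtie2 index for the contaminant
build_index() {
    bowtie2-build "$1" "$2"
}

# remove_contaminant INDEX R1 R2 OUTPREFIX: align with --very-sensitive-local
# and keep only the pairs that do NOT align concordantly to the contaminant.
# Bowtie2 replaces % with 1 or 2, giving OUTPREFIX.clean.r1.fq / .r2.fq;
# the alignment summary (written to STDERR) goes to OUTPREFIX.bowtie2.log.
remove_contaminant() {
    bowtie2 --very-sensitive-local -p 16 \
        -x "$1" -1 "$2" -2 "$3" \
        --un-conc "$4.clean.r%.fq" \
        -S /dev/null \
        2> "$4.bowtie2.log"
}
```

For example, run `build_index phiX174.fa phix174` once, then `remove_contaminant phix174 Monpu1.genome.rawReads.r1.fq Monpu1.genome.rawReads.r2.fq Monpu1.nophix` inside a job file. Note that --un-conc also keeps pairs where only one mate hit the contaminant. A stricter filter would drop those pairs too.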

18 De Bruijn Graphs (from Homolog.us Bioinformatics)

19 De Bruijn Graph

20 De Bruijn Graph Repeats

21 De Bruijn Graph reads

22 Velvet
One of the first de Bruijn graph assemblers. Graph cleaning steps:
- Pruning tips: a tip is a chain of nodes disconnected at one end, caused by sequencing errors OR coverage gaps.
  - Errors tend to be short (rule: trim tips shorter than 2k, where k is the k-mer length).
  - Errors tend to have low multiplicity at the junction.
- Bubbles: paths that leave and return to the same node, caused by sequence variation (SNPs).
  - Length/multiplicity rule: shorter, higher-multiplicity paths are preferred.
- Erroneous connections: caused by duplicated sequences + errors.
  - Errors will have low coverage, but so will genuinely low-coverage regions.
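To see where tips come from, here is a small de Bruijn graph sketch in awk. It assumes one read sequence per line and a k-mer length k. Each k-mer becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix, and its multiplicity is the number of times it was seen. Edges below a minimum multiplicity are dropped, a crude stand-in for Velvet's coverage-based cleaning. Unlike Velvet, this sketch does not merge reverse complements, compact unbranched paths, or measure tip length. The function name debruijn_edges is hypothetical.

```shell
#!/bin/sh
# debruijn_edges K MIN_MULT < reads (hypothetical helper; a sketch, not Velvet).
# Reads one sequence per line. Every k-mer is an edge from its (k-1)-base
# prefix node to its (k-1)-base suffix node; the edge multiplicity is the
# number of times the k-mer occurs. Edges seen fewer than MIN_MULT times are
# dropped, a crude version of coverage-based error removal.
debruijn_edges() {
    awk -v k="$1" -v min="$2" '
        {
            for (i = 1; i + k - 1 <= length($0); i++)
                count[substr($0, i, k)]++
        }
        END {
            for (kmer in count)
                if (count[kmer] >= min)
                    print substr(kmer, 1, k - 1), substr(kmer, 2, k - 1), count[kmer]
        }' | LC_ALL=C sort
}
```

For example, take k = 3 and the reads ACGTA, ACGTA, and ACGTT, where the last read has an error in its final base. These give the edges AC -> CG and CG -> GT three times each, GT -> TA twice, and GT -> TT once. With a minimum multiplicity of 2, `printf 'ACGTA\nACGTA\nACGTT\n' | debruijn_edges 3 2` drops the single-copy GT -> TT edge, which is a short, low-multiplicity tip.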

RCAC. Job files Example: Running seqyclean (a module)

RCAC. Job files Example: Running seqyclean (a module) RCAC Job files Why? When you log into an RCAC server you are using a special server designed for multiple users. This is called a frontend node ( or sometimes a head node). There are (I think) three front

More information

Genomics AGRY Michael Gribskov Hock 331

Genomics AGRY Michael Gribskov Hock 331 Genomics AGRY 60000 Michael Gribskov gribskov@purdue.edu Hock 331 Computing Essentials Resources In this course we will assemble and annotate both genomic and transcriptomic sequence assemblies We will

More information

Introduction to UNIX

Introduction to UNIX PURDUE UNIVERSITY Introduction to UNIX Manual Michael Gribskov 8/21/2016 1 Contents Connecting to servers... 4 PUTTY... 4 SSH... 5 File Transfer... 5 scp secure copy... 5 sftp

More information

Using ITaP clusters for large scale statistical analysis with R. Doug Crabill Purdue University

Using ITaP clusters for large scale statistical analysis with R. Doug Crabill Purdue University Using ITaP clusters for large scale statistical analysis with R Doug Crabill Purdue University Topics Running multiple R jobs on departmental Linux servers serially, and in parallel Cluster concepts and

More information

OpenPBS Users Manual

OpenPBS Users Manual How to Write a PBS Batch Script OpenPBS Users Manual PBS scripts are rather simple. An MPI example for user your-user-name: Example: MPI Code PBS -N a_name_for_my_parallel_job PBS -l nodes=7,walltime=1:00:00

More information

Parameter searches and the batch system

Parameter searches and the batch system Parameter searches and the batch system Scientific Computing Group css@rrzn.uni-hannover.de Parameter searches and the batch system Scientific Computing Group 1st of October 2012 1 Contents 1 Parameter

More information

Lab #2 Physics 91SI Spring 2013

Lab #2 Physics 91SI Spring 2013 Lab #2 Physics 91SI Spring 2013 Objective: Some more experience with advanced UNIX concepts, such as redirecting and piping. You will also explore the usefulness of Mercurial version control and how to

More information

User Guide of High Performance Computing Cluster in School of Physics

User Guide of High Performance Computing Cluster in School of Physics User Guide of High Performance Computing Cluster in School of Physics Prepared by Sue Yang (xue.yang@sydney.edu.au) This document aims at helping users to quickly log into the cluster, set up the software

More information

Quick Guide for the Torque Cluster Manager

Quick Guide for the Torque Cluster Manager Quick Guide for the Torque Cluster Manager Introduction: One of the main purposes of the Aries Cluster is to accommodate especially long-running programs. Users who run long jobs (which take hours or days

More information

Quick Start Guide. by Burak Himmetoglu. Supercomputing Consultant. Enterprise Technology Services & Center for Scientific Computing

Quick Start Guide. by Burak Himmetoglu. Supercomputing Consultant. Enterprise Technology Services & Center for Scientific Computing Quick Start Guide by Burak Himmetoglu Supercomputing Consultant Enterprise Technology Services & Center for Scientific Computing E-mail: bhimmetoglu@ucsb.edu Linux/Unix basic commands Basic command structure:

More information

Cloud Computing Research Cloud: NeCTAR Commercial Cloud: Amazon AWS, Microsoft Azure, etc. Seed money for exploration of new cloud technologies

Cloud Computing Research Cloud: NeCTAR Commercial Cloud: Amazon AWS, Microsoft Azure, etc. Seed money for exploration of new cloud technologies High Performance Computing (HPC) As a service: NCI Raijin Katana local HPC cluster Cloud Computing Research Cloud: NeCTAR Commercial Cloud: Amazon AWS, Microsoft Azure, etc. Seed money for exploration

More information

NBIC TechTrack PBS Tutorial

NBIC TechTrack PBS Tutorial NBIC TechTrack PBS Tutorial by Marcel Kempenaar, NBIC Bioinformatics Research Support group, University Medical Center Groningen Visit our webpage at: http://www.nbic.nl/support/brs 1 NBIC PBS Tutorial

More information

NBIC TechTrack PBS Tutorial. by Marcel Kempenaar, NBIC Bioinformatics Research Support group, University Medical Center Groningen

NBIC TechTrack PBS Tutorial. by Marcel Kempenaar, NBIC Bioinformatics Research Support group, University Medical Center Groningen NBIC TechTrack PBS Tutorial by Marcel Kempenaar, NBIC Bioinformatics Research Support group, University Medical Center Groningen 1 NBIC PBS Tutorial This part is an introduction to clusters and the PBS

More information

Batch Systems. Running calculations on HPC resources

Batch Systems. Running calculations on HPC resources Batch Systems Running calculations on HPC resources Outline What is a batch system? How do I interact with the batch system Job submission scripts Interactive jobs Common batch systems Converting between

More information

Quick Start Guide. by Burak Himmetoglu. Supercomputing Consultant. Enterprise Technology Services & Center for Scientific Computing

Quick Start Guide. by Burak Himmetoglu. Supercomputing Consultant. Enterprise Technology Services & Center for Scientific Computing Quick Start Guide by Burak Himmetoglu Supercomputing Consultant Enterprise Technology Services & Center for Scientific Computing E-mail: bhimmetoglu@ucsb.edu Contents User access, logging in Linux/Unix

More information

These will serve as a basic guideline for read prep. This assumes you have demultiplexed Illumina data.

These will serve as a basic guideline for read prep. This assumes you have demultiplexed Illumina data. These will serve as a basic guideline for read prep. This assumes you have demultiplexed Illumina data. We have a few different choices for running jobs on DT2 we will explore both here. We need to alter

More information

Programming introduction part I:

Programming introduction part I: Programming introduction part I: Perl, Unix/Linux and using the BlueHive cluster Bio472- Spring 2014 Amanda Larracuente Text editor Syntax coloring Recognize several languages Line numbers Free! Mac/Windows

More information

UF Research Computing: Overview and Running STATA

UF Research Computing: Overview and Running STATA UF : Overview and Running STATA www.rc.ufl.edu Mission Improve opportunities for research and scholarship Improve competitiveness in securing external funding Matt Gitzendanner magitz@ufl.edu Provide high-performance

More information

Shell Scripting. With Applications to HPC. Edmund Sumbar Copyright 2007 University of Alberta. All rights reserved

Shell Scripting. With Applications to HPC. Edmund Sumbar Copyright 2007 University of Alberta. All rights reserved AICT High Performance Computing Workshop With Applications to HPC Edmund Sumbar research.support@ualberta.ca Copyright 2007 University of Alberta. All rights reserved High performance computing environment

More information

An Introduction to Cluster Computing Using Newton

An Introduction to Cluster Computing Using Newton An Introduction to Cluster Computing Using Newton Jason Harris and Dylan Storey March 25th, 2014 Jason Harris and Dylan Storey Introduction to Cluster Computing March 25th, 2014 1 / 26 Workshop design.

More information

Introduction to Linux and Cluster Computing Environments for Bioinformatics

Introduction to Linux and Cluster Computing Environments for Bioinformatics Introduction to Linux and Cluster Computing Environments for Bioinformatics Doug Crabill Senior Academic IT Specialist Department of Statistics Purdue University dgc@purdue.edu What you will learn Linux

More information

Answers to Federal Reserve Questions. Training for University of Richmond

Answers to Federal Reserve Questions. Training for University of Richmond Answers to Federal Reserve Questions Training for University of Richmond 2 Agenda Cluster Overview Software Modules PBS/Torque Ganglia ACT Utils 3 Cluster overview Systems switch ipmi switch 1x head node

More information

Getting started with the CEES Grid

Getting started with the CEES Grid Getting started with the CEES Grid October, 2013 CEES HPC Manager: Dennis Michael, dennis@stanford.edu, 723-2014, Mitchell Building room 415. Please see our web site at http://cees.stanford.edu. Account

More information

GPU Cluster Usage Tutorial

GPU Cluster Usage Tutorial GPU Cluster Usage Tutorial How to make caffe and enjoy tensorflow on Torque 2016 11 12 Yunfeng Wang 1 PBS and Torque PBS: Portable Batch System, computer software that performs job scheduling versions

More information

Linux Command Line Interface. December 27, 2017

Linux Command Line Interface. December 27, 2017 Linux Command Line Interface December 27, 2017 Foreword It is supposed to be a refresher (?!) If you are familiar with UNIX/Linux/MacOS X CLI, this is going to be boring... I will not talk about editors

More information

The DTU HPC system. and how to use TopOpt in PETSc on a HPC system, visualize and 3D print results.

The DTU HPC system. and how to use TopOpt in PETSc on a HPC system, visualize and 3D print results. The DTU HPC system and how to use TopOpt in PETSc on a HPC system, visualize and 3D print results. Niels Aage Department of Mechanical Engineering Technical University of Denmark Email: naage@mek.dtu.dk

More information

Introduction to HPC Resources and Linux

Introduction to HPC Resources and Linux Introduction to HPC Resources and Linux Burak Himmetoglu Enterprise Technology Services & Center for Scientific Computing e-mail: bhimmetoglu@ucsb.edu Paul Weakliem California Nanosystems Institute & Center

More information

22-Sep CSCI 2132 Software Development Lecture 8: Shells, Processes, and Job Control. Faculty of Computer Science, Dalhousie University

22-Sep CSCI 2132 Software Development Lecture 8: Shells, Processes, and Job Control. Faculty of Computer Science, Dalhousie University Lecture 8 p.1 Faculty of Computer Science, Dalhousie University CSCI 2132 Software Development Lecture 8: Shells, Processes, and Job Control 22-Sep-2017 Location: Goldberg CS 127 Time: 14:35 15:25 Instructor:

More information

Quality Control of Illumina Data at the Command Line

Quality Control of Illumina Data at the Command Line Quality Control of Illumina Data at the Command Line Quick UNIX Introduction: UNIX is an operating system like OSX or Windows. The interface between you and the UNIX OS is called the shell. There are a

More information

Batch Systems. Running your jobs on an HPC machine

Batch Systems. Running your jobs on an HPC machine Batch Systems Running your jobs on an HPC machine Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us

More information

How to run applications on Aziz supercomputer. Mohammad Rafi System Administrator Fujitsu Technology Solutions

How to run applications on Aziz supercomputer. Mohammad Rafi System Administrator Fujitsu Technology Solutions How to run applications on Aziz supercomputer Mohammad Rafi System Administrator Fujitsu Technology Solutions Agenda Overview Compute Nodes Storage Infrastructure Servers Cluster Stack Environment Modules

More information

Data Preprocessing. Next Generation Sequencing analysis DTU Bioinformatics Next Generation Sequencing Analysis

Data Preprocessing. Next Generation Sequencing analysis DTU Bioinformatics Next Generation Sequencing Analysis Data Preprocessing Next Generation Sequencing analysis DTU Bioinformatics Generalized NGS analysis Data size Application Assembly: Compare Raw Pre- specific: Question Alignment / samples / Answer? reads

More information

Understanding and Pre-processing Raw Illumina Data

Understanding and Pre-processing Raw Illumina Data Understanding and Pre-processing Raw Illumina Data Matt Johnson October 4, 2013 1 Understanding FASTQ files After an Illumina sequencing run, the data is stored in very large text files in a standard format

More information

CS Unix Tools. Fall 2010 Lecture 5. Hussam Abu-Libdeh based on slides by David Slater. September 17, 2010

CS Unix Tools. Fall 2010 Lecture 5. Hussam Abu-Libdeh based on slides by David Slater. September 17, 2010 Fall 2010 Lecture 5 Hussam Abu-Libdeh based on slides by David Slater September 17, 2010 Reasons to use Unix Reason #42 to use Unix: Wizardry Mastery of Unix makes you a wizard need proof? here is the

More information

Running Jobs, Submission Scripts, Modules

Running Jobs, Submission Scripts, Modules 9/17/15 Running Jobs, Submission Scripts, Modules 16,384 cores total of about 21,000 cores today Infiniband interconnect >3PB fast, high-availability, storage GPGPUs Large memory nodes (512GB to 1TB of

More information

Logging in to the CRAY

Logging in to the CRAY Logging in to the CRAY 1. Open Terminal Cray Hostname: cray2.colostate.edu Cray IP address: 129.82.103.183 On a Mac 2. type ssh username@cray2.colostate.edu where username is your account name 3. enter

More information

A Hands-On Tutorial: RNA Sequencing Using High-Performance Computing

A Hands-On Tutorial: RNA Sequencing Using High-Performance Computing A Hands-On Tutorial: RNA Sequencing Using Computing February 11th and 12th, 2016 1st session (Thursday) Preliminaries: Linux, HPC, command line interface Using HPC: modules, queuing system Presented by:

More information

Variation among genomes

Variation among genomes Variation among genomes Comparing genomes The reference genome http://www.ncbi.nlm.nih.gov/nuccore/26556996 Arabidopsis thaliana, a model plant Col-0 variety is from Landsberg, Germany Ler is a mutant

More information

Sharpen Exercise: Using HPC resources and running parallel applications

Sharpen Exercise: Using HPC resources and running parallel applications Sharpen Exercise: Using HPC resources and running parallel applications Andrew Turner, Dominic Sloan-Murphy, David Henty, Adrian Jackson Contents 1 Aims 2 2 Introduction 2 3 Instructions 3 3.1 Log into

More information

The cluster system. Introduction 22th February Jan Saalbach Scientific Computing Group

The cluster system. Introduction 22th February Jan Saalbach Scientific Computing Group The cluster system Introduction 22th February 2018 Jan Saalbach Scientific Computing Group cluster-help@luis.uni-hannover.de Contents 1 General information about the compute cluster 2 Available computing

More information

New High Performance Computing Cluster For Large Scale Multi-omics Data Analysis. 28 February 2018 (Wed) 2:30pm 3:30pm Seminar Room 1A, G/F

New High Performance Computing Cluster For Large Scale Multi-omics Data Analysis. 28 February 2018 (Wed) 2:30pm 3:30pm Seminar Room 1A, G/F New High Performance Computing Cluster For Large Scale Multi-omics Data Analysis 28 February 2018 (Wed) 2:30pm 3:30pm Seminar Room 1A, G/F The Team (Bioinformatics & Information Technology) Eunice Kelvin

More information

Batch Systems & Parallel Application Launchers Running your jobs on an HPC machine

Batch Systems & Parallel Application Launchers Running your jobs on an HPC machine Batch Systems & Parallel Application Launchers Running your jobs on an HPC machine Partners Funding Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike

More information

Cloud Computing and Unix: An Introduction. Dr. Sophie Shaw University of Aberdeen, UK

Cloud Computing and Unix: An Introduction. Dr. Sophie Shaw University of Aberdeen, UK Cloud Computing and Unix: An Introduction Dr. Sophie Shaw University of Aberdeen, UK s.shaw@abdn.ac.uk Aberdeen London Exeter What We re Going To Do Why Unix? Cloud Computing Connecting to AWS Introduction

More information

Job Management on LONI and LSU HPC clusters

Job Management on LONI and LSU HPC clusters Job Management on LONI and LSU HPC clusters Le Yan HPC Consultant User Services @ LONI Outline Overview Batch queuing system Job queues on LONI clusters Basic commands The Cluster Environment Multiple

More information

Cloud Computing and Unix: An Introduction. Dr. Sophie Shaw University of Aberdeen, UK

Cloud Computing and Unix: An Introduction. Dr. Sophie Shaw University of Aberdeen, UK Cloud Computing and Unix: An Introduction Dr. Sophie Shaw University of Aberdeen, UK s.shaw@abdn.ac.uk Aberdeen London Exeter What We re Going To Do Why Unix? Cloud Computing Connecting to AWS Introduction

More information

A Brief Introduction to the Linux Shell for Data Science

A Brief Introduction to the Linux Shell for Data Science A Brief Introduction to the Linux Shell for Data Science Aris Anagnostopoulos 1 Introduction Here we will see a brief introduction of the Linux command line or shell as it is called. Linux is a Unix-like

More information

Installing and running COMSOL 4.3a on a Linux cluster COMSOL. All rights reserved.

Installing and running COMSOL 4.3a on a Linux cluster COMSOL. All rights reserved. Installing and running COMSOL 4.3a on a Linux cluster 2012 COMSOL. All rights reserved. Introduction This quick guide explains how to install and operate COMSOL Multiphysics 4.3a on a Linux cluster. It

More information

Simple examples how to run MPI program via PBS on Taurus HPC

Simple examples how to run MPI program via PBS on Taurus HPC Simple examples how to run MPI program via PBS on Taurus HPC MPI setup There's a number of MPI implementations install on the cluster. You can list them all issuing the following command: module avail/load/list/unload

More information

UoW HPC Quick Start. Information Technology Services University of Wollongong. ( Last updated on October 10, 2011)

UoW HPC Quick Start. Information Technology Services University of Wollongong. ( Last updated on October 10, 2011) UoW HPC Quick Start Information Technology Services University of Wollongong ( Last updated on October 10, 2011) 1 Contents 1 Logging into the HPC Cluster 3 1.1 From within the UoW campus.......................

More information

Week Overview. Simple filter commands: head, tail, cut, sort, tr, wc grep utility stdin, stdout, stderr Redirection and piping /dev/null file

Week Overview. Simple filter commands: head, tail, cut, sort, tr, wc grep utility stdin, stdout, stderr Redirection and piping /dev/null file ULI101 Week 05 Week Overview Simple filter commands: head, tail, cut, sort, tr, wc grep utility stdin, stdout, stderr Redirection and piping /dev/null file head and tail commands These commands display

More information

Our new HPC-Cluster An overview

Our new HPC-Cluster An overview Our new HPC-Cluster An overview Christian Hagen Universität Regensburg Regensburg, 15.05.2009 Outline 1 Layout 2 Hardware 3 Software 4 Getting an account 5 Compiling 6 Queueing system 7 Parallelization

More information

Read mapping with BWA and BOWTIE

Read mapping with BWA and BOWTIE Read mapping with BWA and BOWTIE Before We Start In order to save a lot of typing, and to allow us some flexibility in designing these courses, we will establish a UNIX shell variable BASE to point to

More information

Queue systems. and how to use Torque/Maui. Piero Calucci. Scuola Internazionale Superiore di Studi Avanzati Trieste

Queue systems. and how to use Torque/Maui. Piero Calucci. Scuola Internazionale Superiore di Studi Avanzati Trieste Queue systems and how to use Torque/Maui Piero Calucci Scuola Internazionale Superiore di Studi Avanzati Trieste March 9th 2007 Advanced School in High Performance Computing Tools for e-science Outline

More information

5/20/2007. Touring Essential Programs

5/20/2007. Touring Essential Programs Touring Essential Programs Employing fundamental utilities. Managing input and output. Using special characters in the command-line. Managing user environment. Surveying elements of a functioning system.

More information

Data Preprocessing : Next Generation Sequencing analysis CBS - DTU Next Generation Sequencing Analysis

Data Preprocessing : Next Generation Sequencing analysis CBS - DTU Next Generation Sequencing Analysis Data Preprocessing 27626: Next Generation Sequencing analysis CBS - DTU Generalized NGS analysis Data size Application Assembly: Compare Raw Pre- specific: Question Alignment / samples / Answer? reads

More information

Bioinformatics? Reads, assembly, annotation, comparative genomics and a bit of phylogeny.

Bioinformatics? Reads, assembly, annotation, comparative genomics and a bit of phylogeny. Bioinformatics? Reads, assembly, annotation, comparative genomics and a bit of phylogeny stefano.gaiarsa@unimi.it Linux and the command line PART 1 Survival kit for the bash environment Purpose of the

More information

Whole genome assembly comparison of duplication originally described in Bailey et al

Whole genome assembly comparison of duplication originally described in Bailey et al WGAC Whole genome assembly comparison of duplication originally described in Bailey et al. 2001. Inputs species name path to FASTA sequence(s) to be processed either a directory of chromosomal FASTA files

More information

Basic UNIX commands. HORT Lab 2 Instructor: Kranthi Varala

Basic UNIX commands. HORT Lab 2 Instructor: Kranthi Varala Basic UNIX commands HORT 59000 Lab 2 Instructor: Kranthi Varala Client/Server architecture User1 User2 User3 Server (UNIX/ Web/ Database etc..) User4 High Performance Compute (HPC) cluster User1 Compute

More information

see also:

see also: ESSENTIALS OF NEXT GENERATION SEQUENCING WORKSHOP 2014 UNIVERSITY OF KENTUCKY AGTC Class 3 Genome Assembly Newbler 2.9 Most assembly programs are run in a similar manner to one another. We will use the

More information

Computing with the Moore Cluster

Computing with the Moore Cluster Computing with the Moore Cluster Edward Walter An overview of data management and job processing in the Moore compute cluster. Overview Getting access to the cluster Data management Submitting jobs (MPI

More information

PBS Pro and Ansys Examples

PBS Pro and Ansys Examples PBS Pro and Ansys Examples Introduction This document contains a number of different types of examples of using Ansys on the HPC, listed below. 1. Single-node Ansys Job 2. Single-node CFX Job 3. Single-node

More information

Image Sharpening. Practical Introduction to HPC Exercise. Instructions for Cirrus Tier-2 System

Image Sharpening. Practical Introduction to HPC Exercise. Instructions for Cirrus Tier-2 System Image Sharpening Practical Introduction to HPC Exercise Instructions for Cirrus Tier-2 System 2 1. Aims The aim of this exercise is to get you used to logging into an HPC resource, using the command line

More information

Using Sapelo2 Cluster at the GACRC

Using Sapelo2 Cluster at the GACRC Using Sapelo2 Cluster at the GACRC New User Training Workshop Georgia Advanced Computing Resource Center (GACRC) EITS/University of Georgia Zhuofei Hou zhuofei@uga.edu 1 Outline GACRC Sapelo2 Cluster Diagram

More information

Batch system usage arm euthen F azo he Z J. B T

Batch system usage arm euthen F azo he Z J. B T Batch system usage 10.11.2010 General stuff Computing wikipage: http://dvinfo.ifh.de Central email address for questions & requests: uco-zn@desy.de Data storage: AFS ( /afs/ifh.de/group/amanda/scratch/

More information
