A Hands-On Tutorial: RNA Sequencing Using High-Performance Computing

Similar documents
Using ISMLL Cluster. Tutorial Lec 5. Mohsan Jameel, Information Systems and Machine Learning Lab, University of Hildesheim

Joint High Performance Computing Exchange (JHPCE) Cluster Orientation.

Unix/Linux Basics. Cpt S 223, Fall 2007 Copyright: Washington State University

An Introduction to Cluster Computing Using Newton

Using the computational resources at the GACRC

NBIC TechTrack PBS Tutorial

High Performance Computing (HPC) Using zcluster at GACRC

New User Tutorial. OSU High Performance Computing Center

Session 1: Accessing MUGrid and Command Line Basics

UoW HPC Quick Start. Information Technology Services University of Wollongong. ( Last updated on October 10, 2011)

Introduction: What is Unix?

CENG 334 Computer Networks. Laboratory I Linux Tutorial

Introduction to HPC Using zcluster at GACRC

Shark Cluster Overview

NBIC TechTrack PBS Tutorial. by Marcel Kempenaar, NBIC Bioinformatics Research Support group, University Medical Center Groningen

Computational Skills Primer. Lecture 2

Introduction to HPC Using zcluster at GACRC

Basic UNIX commands. HORT Lab 2 Instructor: Kranthi Varala

Introduction to HPC Using zcluster at GACRC

Name Department/Research Area Have you used the Linux command line?

Migrating from Zcluster to Sapelo

CISC 220 fall 2011, set 1: Linux basics

Quick Start Guide. by Burak Himmetoglu. Supercomputing Consultant. Enterprise Technology Services & Center for Scientific Computing

Shark Cluster Overview

CS CS Tutorial 2 2 Winter 2018

High Performance Computing (HPC) Club Training Session. Xinsheng (Shawn) Qin

Contents. Note: pay attention to where you are. Note: Plaintext version. Note: pay attention to where you are... 1 Note: Plaintext version...

Quick Start Guide. by Burak Himmetoglu. Supercomputing Consultant. Enterprise Technology Services & Center for Scientific Computing

Introduction of Linux

Grid Engine Users Guide. 5.5 Edition

Using the MaRC2 HPC Cluster

SGE Roll: Users Guide. Version Edition

Batch Systems. Running calculations on HPC resources

The Supercomputing Facility for Bioinformatics & Computational Biology, IIT Delhi

Exercise 1: Basic Tools

Supercomputing environment TMA4280 Introduction to Supercomputing

GNU/Linux 101. Casey McLaughlin. Research Computing Center Spring Workshop Series 2018

bwunicluster Tutorial Access, Data Transfer, Compiling, Modulefiles, Batch Jobs

Introduction to Computing V - Linux and High-Performance Computing

The cluster system. Introduction 22th February Jan Saalbach Scientific Computing Group

Introduction to the Linux Command Line

For Dr Landau s PHYS8602 course

Introduction to HPC Resources and Linux

Working with Shell Scripting. Daniel Balagué

Introduction to HPC Using zcluster at GACRC On-Class GENE 4220

SGE Roll: Users Guide. Version 5.3 Edition

Introduction to Linux for BlueBEAR. January

Unix/Linux Operating System. Introduction to Computational Statistics STAT 598G, Fall 2011

Computer Systems and Architecture

History. Terminology. Opening a Terminal. Introduction to the Unix command line GNOME

Lezione 8. Shell command language Introduction. Sommario. Bioinformatica. Mauro Ceccanti e Alberto Paoluzzi

Introduction in Unix. Linus Torvalds Ken Thompson & Dennis Ritchie

Introduction to Unix The Windows User perspective. Wes Frisby Kyle Horne Todd Johansen

Introduction to GALILEO

CS370 Operating Systems

Introduction to HPC Using zcluster at GACRC

Introduction to Linux

Introduction to Linux

Quick Guide for the Torque Cluster Manager

Using Sapelo2 Cluster at the GACRC

Introduction to Joker Cyber Infrastructure Architecture Team CIA.NMSU.EDU

Lezione 8. Shell command language Introduction. Sommario. Bioinformatica. Esercitazione Introduzione al linguaggio di shell

bwunicluster Tutorial Access, Data Transfer, Compiling, Modulefiles, Batch Jobs

Introduction to UNIX. Logging in. Basic System Architecture 10/7/10. most systems have graphical login on Linux machines

National Biochemical Computational Research Familiarize yourself with the account policy

Cluster User Training

Working with Basic Linux. Daniel Balagué

The Unix Shell & Shell Scripts

Chap2: Operating-System Structures

Introduction to Linux Workshop 1

Introduction To. Barry Grant

Kohinoor queuing document

Introduction to UNIX. SURF Research Boot Camp April Jeroen Engelberts Consultant Supercomputing

When talking about how to launch commands and other things that is to be typed into the terminal, the following syntax is used:

Systems Programming and Computer Architecture ( ) Exercise Session 01 Data Lab

Read mapping with BWA and BOWTIE

Computer Systems and Architecture

Bioinformatics? Reads, assembly, annotation, comparative genomics and a bit of phylogeny.

Introduction to Linux Part 1. Anita Orendt and Wim Cardoen Center for High Performance Computing 24 May 2017

Introduction to Linux. Woo-Yeong Jeong Computer Systems Laboratory Sungkyunkwan University

Grid Engine Users Guide. 7.0 Edition

Parallel Programming Pre-Assignment. Setting up the Software Environment

Getting Started with UNIX

CS 261 Recitation 1 Compiling C on UNIX

MERCED CLUSTER BASICS Multi-Environment Research Computer for Exploration and Discovery A Centerpiece for Computational Science at UC Merced

Logging in to the CRAY

Cluster Clonetroop: HowTo 2014

Linux Tutorial. Ken-ichi Nomura. 3 rd Magics Materials Software Workshop. Gaithersburg Marriott Washingtonian Center November 11-13, 2018

CSE Linux VM. For Microsoft Windows. Based on opensuse Leap 42.2

Sharpen Exercise: Using HPC resources and running parallel applications

EE516: Embedded Software Project 1. Setting Up Environment for Projects

Submit a Job. Want to run a batch script: #!/bin/sh echo Starting job date /usr/bin/time./hello date echo Ending job. qsub A HPC job.

Introduction to Linux Environment. Yun-Wen Chen

Introduction to Linux

ITCS 4145/5145 Assignment 2

AMS 200: Working on Linux/Unix Machines

ACEnet for CS6702 Ross Dickson, Computational Research Consultant 29 Sep 2009

Unix Workshop Aug 2014

High Performance Computing Cluster Basic course

HPC Introductory Course - Exercises

Transcription:

A Hands-On Tutorial: RNA Sequencing Using Computing February 11th and 12th, 2016 1st session (Thursday) Preliminaries: Linux, HPC, command line interface Using HPC: modules, queuing system Presented by: Ali Siavosh, Eric Peskin, Loren Koenig 2nd session (Friday) RNA sequencing: basic concepts and data preparation, using HPC to analyze the data Presented by: Adriana Heguy, Stuart Brown 1

HPC ( Computing)? generally refers to the practice of aggregating computing power in a way that delivers much higher performance than what one could get out of a typical desktop computer or workstation in order to solve large problems in science, engineering and business. HPC Linux Cluster Head Node Phoenix: Head node: 32 cores,128 GB RAM. 70 Compute worker nodes: Each Compute node: 32 cores,128 GB RAM. Compute Nodes jobs Researchers 3

login ssh demo@phoenix.med.nyu.edu ssh Keys logout 4

Linux Operating System Graphical User Interface (GUI) vs. Command Line Interface (CLI): echo -n Hello world! separated by spaces: options/switches other arguments pwd / /ifs/home/siavoa01 man ls name of the command Commands: ls your command bin/ ifs/ data/ home/ root directory, denoted by a single slash...... tutorial/... user01/...working directory absolute path: project1/ /ifs/home/user01 5

Making Directory: mkdir project1 mkdir project2 List of contents in a directory: ls ls -l Remove a directory: rmdir project2 Change to a directory: cd project1 cd.. Editor: nano script1 This is my first script Remove file/directory: ^o ^x rm script1 nano script2 6

cp script1 project1/script2 mv script1 project1/. cd project1 cp script1 script2 echo Hello World! date nano script2 #!/bin/bash echo Hello World date cat script2 chmod +x script2./script2 7

File transfer Command line (CLI): scp myfile userid@phoenix.med.nyu.edu: scp userid@phoenix.med.nyu.edu:project1/script2. Graphical User Interface (GUI): WinSCP 8

Environment Modules Some programs need extensive setup of environment variables. We have multiple versions of some programs installed. Solution: Environment Modules make sure that environment variables are set up for: Software and version you want to use. Any prerequisite software. No conflicting software. 10

Environment Modules: Commands module avail List available modules. List currently loaded modules. module list module load name Load named module. module unload name module help name See help on named module. module show name See what module will do to environment. man module Get help on module command itself. Unload named module. 11

Environment Modules: Versions and Conflicts bedtools --version module avail bedtools module load bedtools/2.15.0 bedtools --version module load bedtools/2.17.0 module unload bedtools/2.15.0 module load bedtools/2.17.0 bedtools --version echo $BEDTOOLS_ROOT ls $BEDTOOLS_ROOT 12

Environment Modules: Prerequisites module list module load macs/1.4.2 module list module unload macs module list module load macs/2.0.10 module list module unload macs module list 13

SGE: The Batch System HPC Linux Cluster Researchers Head Node ssh Compute Nodes jobs Cluster has many compute nodes and many users. Need to spread out work among the compute nodes. Rather than having users hunt for node that is not too busy... let the batch system do it! We use a batch system called Sun Grid Engine (SGE). 14

SGE: Cluster Usage Users interact with head node of the cluster. Editing, compiling, file manipulation, etc. From the head node, users submit jobs to batch system. Submission includes shell script with commands to run. Batch system runs jobs on compute nodes. Users collect output in files viewable on the head node. 15

Example Script Use nano to create a file with the following contents, and save it as simple.bash #!/bin/bash date sleep 20 date 16

Running Script Make script executable. chmod +x simple.bash Run script./simple.bash Normally we would not run directly on the head node. Don't try this at home! 17

SGE: Submit Job Submit job qsub -S /bin/bash -cwd simple.bash Note job name and job number. Monitor status (try the following a few times quickly): qstat Watch progression from qw (waiting) to r (running) to not listed (finished). 18

SGE: Directives Could put SGE directives in script itself #!/bin/bash #$ -S /bin/bash #$ -cwd date sleep 20 date 19

SGE: Output and Error Files When job is finished, look for output files. ls -ltr Standard output redirected to jobname.ojobnumber Standard error redirected to jobname.ejobnumber Check size of.e file. Read output (substitute your job number below): cat simple.bash.ojobnumber 20

SGE: Useful Directives -cwd Run script in Current Working Directory (where you submit). -S /bin/bash Use this shell to interpret script. -M first.last@nyumc.org Send status updates to this address. -m when (when = any combination of following, e.g. ae) a = abort Mail if disaster strikes. b = begin Mail when job starts. e = end Mail when job completes. 21

SGE: Useful Commands qsub Submit job. qstat Check status of running jobs or queues. qacct Get accounting information on finished jobs. qdel Delete (kill) running job. qlogin Start interactive shell on compute node. Use sparingly and remember to logout when finished. 22

Environment Modules in SGE Scripts Use needed modules in your SGE scripts #!/bin/bash #$ -S /bin/bash #$ -cwd module load samtools/0.1.19 module load bwa/0.7.3a samtools... bwa... 23

Multi-threaded Jobs Several applications can use multiple threads: a form of shared-memory parallelism uses multiple CPU cores within one node Use SGE threaded parallel environment. 24

Threaded Environment Examples qsub -pe threaded 4 script.sh waits until 4 cores available on one node starts on that node, claiming 4 slots qsub -pe threaded 6-13 script.sh waits until at least 6 cores on one node starts on node with most available (up to 13) claims as many as available on that node (up to 13) 25

How Many Did I Get? SGE defines NSLOTS variable to tell you: #!/bin/bash #$ -S /bin/bash #$ -cwd module load cufflinks/2.1.1 cufflinks -p $NSLOTS... 26

Threads: Best Practices Read application documentation carefully: Are multiple threads supported? Do they happen by default? What are the tradeoffs? What flag controls them? Don t hard-code number of threads. use $NSLOTS 27

Variables What was that $BEDTOOLS_ROOT (or $NSLOTS) anyway? A variable: A name associated with value Can be defined by: System Environment modules SGE You! 28

Variables: User defined Environment variable is a name associated with a value. To access value, put dollar sign in front of name. Try: NAME=Bob echo $NAME NAME=Alice echo $NAME 29

Variables in script #!/bin/bash #$ -S /bin/bash #$ -cwd dir=/some/really/long/path mkdir $dir cp file.txt $dir/ 30

Arguments in script #!/bin/bash #$ -S /bin/bash #$ -cwd dir=$1 mkdir $dir cp file.txt $dir/ qsub script.sh /some/really/long/path 31

Multiple Arguments in script #!/bin/bash #$ -S /bin/bash #$ -cwd dir=$1 file=$2 mkdir $dir cp $file $dir/ qsub script.sh /some/really/long/path file1.txt 32

Exercises Write and submit the following SGE scripts: date/sleep/date script, but with sleep time as argument script that prints value of $NSLOTS Submit it with threaded parallel environment Use range of slots min-max Watch it with qstat: hint: add a sleep command, so you can see it in qstat for a while Check output file 33

Further Reading and Help User guide: https://genome.med.nyu.edu/hpcf/wiki Our past (longer) tutorials at http://www.med.nyu.edu/chibi/education/tutorials Linux for Cluster Computing: Two tutorials Cornell tutorial: https://cvw.cac.cornell.edu/linux/ Kernighan and Pike. The Unix Programming Environment Ask us: hpc_admins@nyumc.org For problems, please include command you typed and the messages that resulted (copy paste session), full path to scripts and output files. 34