Introduction to High-Performance Computing (HPC)


Computer components
CPU: Central Processing Unit
Cores: individual processing units within a CPU
Storage: disk drives, either HDD (Hard Disk Drive) or SSD (Solid State Drive)
Memory: a small amount of volatile or temporary information storage

Computer components (my MacBook Pro)
Model Name: MacBook Pro
Number of Processors: 1
Total Number of Cores: 4
Memory: 16 GB
Data storage: 512 GB

High Performance Computing most generally refers to the practice of aggregating computing power in a way that delivers much higher performance than one could get out of a typical desktop computer or workstation in order to solve large problems in science, engineering, or business. http://insidehpc.com/hpc-basic-training/what-is-hpc/

Computer resources required for NGS data analysis
100s of cores for processing!
100s of gigabytes or even petabytes of storage!
100s of gigabytes of memory!

High-Performance Computing
Provides all the resources to run the desired omics analysis in one place.
Provides software that is unavailable or unusable on your own computer/local system.

HPC cluster structure

HPC cluster components
Nodes: individual computers in the cluster
Cores (threads): individual processing units available within each CPU of each node, e.g. a node with eight quad-core CPUs = 32 cores for that node
Shared disk: storage that can be shared (and accessed) by all nodes

Parallel Computing
Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently ("in parallel"). http://en.wikipedia.org/wiki/parallel_computing

High-Performance Computing: for 1 sample
[Diagram: a serial workflow sends the input through a single CPU core to produce the output, while a multithreaded workflow spreads the same input across several CPU cores.]
Faster and more efficient; NGS data analysis is very amenable to this strategy.

High-Performance Computing: for 3 samples
[Diagram comparing three strategies: serial; multithreaded & serial; multithreaded and parallel.]

HPC Cluster
Multi-user, shared resource
Lots of nodes = lots of processing capacity + lots of memory
A system like this requires constant maintenance and upkeep, and there is an associated cost
Wiki page: hmsrc.me/o2docs

Introduction to High Performance Computing and O2 for New Users
HMS Research Computing (slides courtesy of Kris Holton at HMS-RC)

Welcome to O2!
HMS Research Computing's second High-Performance Compute cluster, built to enhance the compute capacity available to HMS researchers
A homogeneous environment of newer, faster cores with a high memory allocation to facilitate multi-core and parallelized workflows
SLURM scheduler to efficiently dispatch jobs

O2 Tech Specs
5000 homogeneous cores
16 x 2 (32) cores per node
256 GB RAM (memory) per node (8 GB/core)
CentOS 7 Linux
SLURM job scheduler

Using O2!

1. Logging in to remote machines (securely)
When logging in we use the ssh command; ssh stands for Secure SHell.
ssh is a protocol for secure data transfer, i.e. the data is encrypted as it travels between your computer and the cluster (the remote computer).
Commonly used commands that use the ssh protocol for data transfer are scp and sftp.
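
For example, files can be copied to or from the cluster with scp over the same ssh protocol. This is a minimal sketch: the file names and paths are hypothetical, YourECommons stands for your own account name, and it assumes transfers through the login host shown on the next slide are permitted.
$ scp mydata.fq YourECommons@o2.hms.harvard.edu:/home/YourECommons/    # copy a local file to your home directory on the cluster
$ scp YourECommons@o2.hms.harvard.edu:/home/YourECommons/results.txt . # copy a result file back to the current local directory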

Logging Into Orchestra 2
Open a terminal:
$ ssh YourECommons@o2.hms.harvard.edu

Welcome to O2! Where are you in O2?
mfk8@login01:~$
You are logged into a shell on a login server (login01-05); these are not meant for heavy lifting!
mfk8@login01:~$ pwd
You are in your home directory. This is symbolized by the tilde (~), which is shorthand for /home/ecommons.

Interactive Sessions
The login servers are not designed to handle intensive processes, and CPU usage is throttled. Start by entering your first job! This will (usually) log you into a compute node:
mfk8@login01:~$ srun --pty -p interactive -t 0-12:00 --mem 8G bash
--pty is how interactive sessions are started
-p interactive is the partition
-t 0-12:00 is the time limit (12 hours)
--mem 8G is the memory requested
mfk8@compute-a:~$
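
As a variation (a sketch, not an O2-specific requirement), cores can also be requested for an interactive session with -n; the 4-core count and 3-hour limit here are just illustrations:
mfk8@login01:~$ srun --pty -p interactive -t 0-3:00 -n 4 --mem 8G bash   # 4 cores, 3 hours, 8 GB total memory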

2. Using & installing software

LMOD: Software Modules
Most software on Orchestra2 is installed as an environment module. The LMOD system adds the software's directory paths to the $PATH variable, so the program runs without any issues. This allows for clean, easy loading (including most dependencies) and switching of versions.

LMOD: Software Modules
Most software is compiled against gcc-6.2.0, so load this first:
$ module load gcc/6.2.0
$ module avail    # to see software now available
$ module spider   # verbose list of software currently available

Loading/Unloading Modules
Loading modules:
$ module load bowtie2/2.2.9
Which module version is loaded (if at all)?
$ which bowtie2
Need help with the module?
$ module help bowtie2/2.2.9
Unloading modules:
$ module unload bowtie2/2.2.9
Dump all modules:
$ module purge
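
Putting these together, a typical session might look like the following sketch; only the gcc/6.2.0 and bowtie2/2.2.9 modules shown above are assumed to exist.
$ module load gcc/6.2.0      # compiler toolchain most software is built against
$ module load bowtie2/2.2.9  # load the aligner on top of it
$ module list                # show everything currently loaded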

3. The Job Scheduler, SLURM

Submitting Jobs
In an interactive session, programs can be called directly:
mfk8@compute-a:~$ bowtie -n 4 hg19 file1_1.fq file1_2.fq
What if you wanted to run the program and come back later to check on it? A program is submitted to O2 via a job (sbatch):
mfk8@compute-a:~$ sbatch mybowtiejob.sh

Simple Linux Utility for Resource Management (SLURM)
Fairly allocates access to resources (compute nodes) to users for some duration of time so they can perform work
Provides a framework for starting, executing, and monitoring batch jobs
Manages a queue of pending jobs; ensures that no single user or core monopolizes the cluster

Choosing the proper resources for your job with the appropriate SBATCH options

The "sbatch"
$ sbatch -p short -t 0-1:00 --wrap="cp file.txt .."
Necessary to specify:
-p (partition)
-t 0-1:00 (time)
--wrap (write the command all in one line)
Recommended: write a job submission script instead
$ sbatch completeslurmjob.run
#!/bin/bash
#SBATCH -p short
#SBATCH -t 0-1:00
cp file.txt ..

sbatch options
#SBATCH -p              # partition
#SBATCH -t 0-01:00      # time, days-hr:min
#SBATCH -n X            # number of cores
#SBATCH -N 1            # confine cores to 1 node (default)
#SBATCH --mem=XG        # memory per job (all cores), GB
#SBATCH -J name_of_job  # (default = name of job script)
#SBATCH -o %j.out       # out file
#SBATCH -e %j.err       # error file
#SBATCH --mail-type=BEGIN/END/FAIL/ALL
#SBATCH --mail-user=mfk8@med.harvard.edu

Partitions: -p

Partition     Priority   Max Runtime   Max Cores   Limits
short         10         12 hours      20
medium        8          5 days        20
long          6          30 days       20
interactive   14         12 hours      20          2 job limit
priority      14         30 days       20          2 job limit
mpi           12         5 days        640         20 core min
transfer                 30 days       4
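
For instance (an illustrative sketch, not additional O2 policy), a job expected to run about a day does not fit short's 12-hour limit, so it would go to medium:
$ sbatch -p medium -t 1-06:00 myjobscript.run   # ~30 hours, within medium's 5-day limit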

Runtime: -t
-t days-hours:minutes
-t hours:minutes:seconds
You need to specify how long you estimate your job will run; aim for about 125% of that estimate.
Runtime is subject to the maximum per partition.
Excessive run limits (like the partition maximum) take longer to dispatch and affect your fairshare.

Cores: -n
-n X to designate cores (max 20)
-N X to constrain all cores to X nodes
CPU time = wall time (-t) * cores used (-n); e.g. a 4-core job with a 2-hour wall time consumes 8 CPU-hours.
You cannot use cores you did not request (no over-efficient jobs): a cgroups constraint.
Adding more cores does not mean jobs will scale linearly with time, and it causes longer pend times.

Memory: --mem
Only 1G is allocated by default
--mem=XG           # total memory over all cores
--mem-per-cpu=XG   # memory per CPU requested; use for MPI
A request with no unit (no G) defaults to megabytes: 8G ~= 8000
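
As an illustration (a sketch; the 4-core and 8 GB figures are arbitrary), these two requests give a job the same total memory:
#SBATCH -n 4
#SBATCH --mem=8G            # 8 GB total, shared across the 4 cores

#SBATCH -n 4
#SBATCH --mem-per-cpu=2G    # 2 GB per core, also 8 GB total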

Job submission scripts

Creating a job submission script
#!/bin/bash                # always at the top of the script
#SBATCH -p short
#SBATCH -n 4
#SBATCH --mem=8G
#SBATCH -o %j.out
#SBATCH -e %j.err
#SBATCH -J bowtie2_run1
#SBATCH --mail-type=ALL
#SBATCH --mail-user=mfk8@med.harvard.edu

module load seq/bowtie2/2.2.9
bowtie -n 4 hg19 file1_1.fq file1_2.fq

Save as myjobscript.run and run as:
$ sbatch myjobscript.run
**O2 will notify you when the job is done, or if there is an error
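
Submitting it might then look like this (the job ID below is hypothetical; the .out/.err file names come from the %j placeholders in the script):
$ sbatch myjobscript.run
Submitted batch job 1234567
$ ls
1234567.err  1234567.out  myjobscript.run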

Job Priority
Dynamically assigned
Contributing factors: Age, Fairshare, Partition, QOS, Nice
Fairshare: 0-1 scale

Managing jobs and getting information about submitted/running jobs

Job Monitoring
$ squeue -u ecommons -t RUNNING (or PENDING)
$ squeue -u ecommons -p partition
$ squeue -u ecommons --start
Detailed job info:
$ scontrol show jobid <jobid>
Completed job statistics:
$ sacct -j <jobid> --format=jobid,jobname,maxrss,elapsed

Cancelling/Pausing Jobs
$ scancel <jobid>
$ scancel -t PENDING
$ scancel --name JOBNAME
$ scancel <jobid>_[indices]   # array indices
$ scontrol hold <jobid>       # pause pending jobs
$ scontrol release <jobid>    # resume
$ scontrol requeue <jobid>    # cancel and rerun

4. Filesystems and storage


Filesystems and storage
Storage on HPC systems is organized differently than on your personal machine.
Physical disks are bundled together into a virtual volume; this volume may represent a single filesystem, or may be divided up (partitioned) into multiple filesystems.
Filesystems are accessed over the internal network.

O2 Primary Storage
O2 Cluster: 5000+ cores, SLURM batch system
/home
  /home/user_id
  Quota: 100 GB per user
  Backup: extra copy & snapshots: daily up to 14 days, weekly up to 60 days
/n/data1, /n/data2, /n/groups
  /n/data1/institution/dept/lab/your_dir
  Quota: expandable
  Backup: extra copy & snapshots: daily up to 14 days, weekly up to 60 days

Temporary Scratch storage: /n/scratch2
For data only needed temporarily during analyses.
Each account can use up to 10 TB and 1 million files/directories.
Lustre: a high-performance parallel file system running on DDN Storage.
More than 1 PB of total shared disk space.
No backups! Files are automatically deleted after not being accessed for 30 days, to save space.
More info at: http://hmsrc.me/o2docs

For more direction
http://hmsrc.me/o2docs
http://rc.hms.harvard.edu
RC Office Hours: Wed 1-3p, Gordon Hall 500
rchelp@hms.harvard.edu