Choosing Resources Wisely
Scott Yockel, PhD
Harvard - Research Computing

What is Research Computing?
Research Computing (RC) is the Faculty of Arts and Sciences (FAS) department that handles non-enterprise IT requests from researchers. (Contact HUIT for most desktop, laptop, networking, and printing issues.)

RC primary services:
- Odyssey supercomputing environment
- Lab storage
- Instrument computing support
- Hosted machines (virtual or physical)

RC staff: 20 staff with backgrounds ranging from systems administration to development operations to PhD research scientists, supporting 600 research groups and users across FAS, SEAS, HSPH, HBS, and GSE.

For bioinformatics researchers, the Harvard Informatics group is closely tied to RC and supports the specific problems of that domain.
FAS Research Computing Spring Training series (registration not required; limited seating; all sessions 11:00 AM - 12:00 PM in NWL 426):
- Intro to Odyssey - Thursday, February 2nd
- Intro to Unix - Thursday, February 16th
- Extended Unix - Thursday, March 2nd
- Modules and Software - Thursday, March 16th
- Choosing Resources Wisely - Thursday, March 30th
- Troubleshooting Jobs - Thursday, April 6th
- Parallel Job Workflows on Odyssey - Thursday, April 20th

The series begins February 2nd and ranges from the Intro to Odyssey training to more advanced job and software topics. In addition to training sessions, FASRC has a large offering of self-help documentation. We also hold office hours every Wednesday from 12:00 PM to 3:00 PM at 38 Oxford. For other questions or issues, please submit a ticket on the FASRC Portal, or, for shorter questions, chat with us on Odybot.
Objectives
- What computational resources are available?
- How much of each resource (cores, memory, storage, time) do I need?
- Provide guidance for scaling up your applications and performing computations more efficiently.
- More efficient use = more resources available to do research.

Odyssey Components
Compute:
- 60,000+ compute cores
- Cores/node: 8 to 64
- Memory/node: 12 GB to 512 GB (4 GB/core)
- 1,000,000+ NVIDIA GPU cores
Software:
- CentOS 6 operating system
- SLURM job manager
- 1,000+ scientific tools and programs
Interconnect: two underlying networks connecting three data centers:
- TCP/IP network
- Low-latency 56 Gb/s InfiniBand network for inter-node parallel computing and fast access to Lustre-mounted storage
Storage Grid

Home directories
- Mount point: /n/home#/$USER
- Size limit: 100 GB
- Availability: all cluster nodes + desktop/laptop
- Backup: hourly snapshot + daily offsite
- Retention policy: indefinite
- Performance: moderate; not suitable for high I/O
- Cost: free

Lab storage
- Mount point: /n/pi_lab
- Size limit: 4 TB+
- Availability: all cluster nodes + desktop/laptop
- Backup: daily offsite
- Retention policy: indefinite
- Performance: moderate; not suitable for high I/O
- Cost: 4 TB free + expansion at $45/TB/yr

Local scratch
- Mount point: /scratch
- Size limit: 270 GB/node
- Availability: local compute node only
- Backup: none
- Retention policy: job duration
- Performance: suited for small-file I/O-intensive jobs
- Cost: free

Global scratch
- Mount point: /n/regal
- Size limit: 1.2 PB total
- Availability: all cluster nodes
- Backup: none
- Retention policy: 90 days
- Performance: appropriate for large-file I/O-intensive jobs
- Cost: free

Persistent research data
- Mount point: /n/holylfs
- Size limit: 3 PB
- Availability: only InfiniBand-connected cluster nodes
- Backup: none (external repos)
- Retention policy: 3-9 months
- Performance: appropriate for large I/O-intensive jobs
- Cost: free

What resources are needed?
- How big is the input/output data for each run?
- How is the input data read by the code (e.g., hardcoded, keyboard, parameter/data file(s), external database/website)?
- How is the output data written by the code (standard output/screen, data file(s), etc.)?
- How often does the code write data to file?
- How many tasks/jobs/runs do I need to complete?
- Is my software code serial (single core) or parallel?
- What is my timeframe/deadline for the project (e.g., paper, conference, thesis)?
- How long does it take a run to complete?
What is SLURM?
SLURM (Simple Linux Utility for Resource Management) is the job manager on Odyssey. User tasks (jobs) on the cluster are containerized so that users cannot interfere with other jobs or exceed their resource request (cores, memory, time).

Basic SLURM commands (a typical usage cycle with these commands is sketched below):
- sbatch: submit a batch job script
- srun: submit an interactive test job
- squeue: contact slurmctld for currently running jobs
- sacct: contact slurmdbd for accounting statistics after a job ends
- scancel: cancel job(s)

SLURM Scheduler
Partitions and their time limits (node counts, cores per node, and memory per node vary by partition):
- general: 7 days
- serial_requeue: 7 days
- interact: 3 days
- bigmem: no limit
- unrestricted: no limit
- pi_lab: no limit

Batch jobs:
#SBATCH -p general   # Partition name

Interactive or test jobs:
srun -p interact OTHER_OPTIONS
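For illustration, a typical cycle with the basic commands listed above might look like the following sketch; the script name and job ID are placeholders, not from the slides:

sbatch myjob.sh       # submit a batch job script; SLURM prints the assigned job ID
squeue -u $USER       # show your pending and running jobs
squeue -p general     # show what is queued on a particular partition
scancel JOBID         # cancel a single job by its ID
scancel -u $USER      # cancel all of your jobs
sacct -j JOBID        # accounting statistics once the job has finished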
SLURM Scheduler: How long does my code take to run?

Batch jobs:
#SBATCH -p serial_requeue   # Partition
#SBATCH -t 0-02:00          # Runtime in D-HH:MM

Interactive jobs:
srun -t 0-02:00 -p interact OTHER_JOB_OPTIONS

SLURM Scheduler: Is my code serial or parallel?

Serial (single-core) jobs
Batch jobs:
#SBATCH -p serial_requeue   # Partition
#SBATCH -t 0-02:00          # Runtime in D-HH:MM
#SBATCH -n 1                # Number of cores/tasks

Interactive jobs:
srun -t 0-02:00 -n 1 -p interact OTHER_JOB_OPTIONS
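Putting these pieces together, a complete single-core batch script might look like the sketch below; the output file names and the executable are placeholders, not part of the slides:

#!/bin/bash
#SBATCH -p serial_requeue   # Partition
#SBATCH -t 0-02:00          # Runtime in D-HH:MM
#SBATCH -n 1                # Number of cores/tasks
#SBATCH -o myjob_%j.out     # Standard output (%j expands to the job ID)
#SBATCH -e myjob_%j.err     # Standard error

./my_serial_code input.dat  # placeholder executable and input file

Save it as, say, serial_job.sh, submit it with sbatch serial_job.sh, and monitor it with squeue -u $USER.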
SLURM Scheduler: Parallel shared-memory (single-node) jobs
Examples: OpenMP (Fortran, C/C++), MATLAB Parallel Computing Toolbox (PCT), Python (e.g., threading, multiprocessing), R (e.g., multicore). A complete script is sketched at the end of this slide's notes.

Batch jobs:
#SBATCH -p serial_requeue   # Partition
#SBATCH -t 0-02:00          # Runtime in D-HH:MM
#SBATCH -c 4                # Number of cores per task
#SBATCH -N 1                # Number of nodes

srun -c 4 code PROGRAM_OPTIONS

Interactive jobs:
srun -t 0-02:00 -c 4 -N 1 -p interact OTHER_JOB_OPTIONS

SLURM Scheduler: Parallel distributed-memory (multi-node) jobs
Examples: MPI (OpenMPI, Intel MPI, MVAPICH) with Fortran or C/C++ code, MATLAB Distributed Computing Server (DCS), Python (e.g., mpi4py), R (e.g., Rmpi, snow)

Batch jobs:
#SBATCH -p serial_requeue   # Partition
#SBATCH -t 0-02:00          # Runtime in D-HH:MM
#SBATCH -n 4                # Number of cores/tasks

Interactive jobs:
srun -t 0-02:00 -n 4 -p interact OTHER_JOB_OPTIONS
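As a sketch of a complete shared-memory job on 4 cores of one node (the executable name is hypothetical):

#!/bin/bash
#SBATCH -p serial_requeue   # Partition
#SBATCH -t 0-02:00          # Runtime in D-HH:MM
#SBATCH -N 1                # All cores on a single node
#SBATCH -c 4                # 4 cores for one multithreaded task

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK     # match OpenMP thread count to the requested cores
srun -c $SLURM_CPUS_PER_TASK ./my_openmp_code   # placeholder OpenMP executable

A distributed-memory (MPI) counterpart, including a memory request, is sketched after the memory examples that follow.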
SLURM Scheduler: Serial and parallel shared-memory (single-node) jobs
Batch jobs:
#SBATCH -p serial_requeue   # Partition
#SBATCH -t 0-02:00          # Runtime in D-HH:MM
#SBATCH -c 4                # Number of cores per task
#SBATCH -N 1                # Number of nodes
#SBATCH --mem=2000          # Memory per node in MB

srun -c 4 code PROGRAM_OPTIONS

Interactive jobs:
srun -t 0-02:00 -c 1 -N 1 --mem=2000 -p interact OTHER_JOB_OPTIONS

Parallel distributed-memory (multi-node) jobs (a complete script combining these requests is sketched below)
Batch jobs:
#SBATCH -p serial_requeue     # Partition
#SBATCH -t 0-02:00            # Runtime in D-HH:MM
#SBATCH -n 4                  # Number of cores/tasks
#SBATCH --mem-per-cpu=4000    # Memory per core in MB

Interactive jobs:
srun -t 0-02:00 -n 4 --mem-per-cpu=4000 -p interact OTHER_JOB_OPTIONS

Memory Requirements
How much memory does my code require?
- Understand your code and how its algorithms scale analytically (e.g., does memory grow as x^2 or x^3 with problem size x?)
- Run an interactive job and monitor memory usage (with the top Unix command)
- Run a test batch job and check memory usage after the job has completed (with the sacct SLURM command)
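A distributed-memory script that combines the task count and per-core memory request above might look like this sketch; the partition choice, module name, and executable are illustrative assumptions (check module avail for the actual MPI module), and you can verify the job's memory use afterwards with sacct as described above:

#!/bin/bash
#SBATCH -p general                    # Partition (illustrative choice)
#SBATCH -t 0-02:00                    # Runtime in D-HH:MM
#SBATCH -n 4                          # Number of MPI tasks
#SBATCH --mem-per-cpu=4000            # Memory per core in MB

module load openmpi                   # hypothetical module name
srun -n $SLURM_NTASKS ./my_mpi_code   # launch one MPI rank per requested task (assumes a SLURM-aware MPI)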
Know Your Code - Example: Memory Requirements
A real*8 (Fortran) or double (C/C++) matrix of dimension 100,000 x 100,000 requires ~80 GB of RAM.

Data type sizes (Fortran / C):
- integer*4 / int: 4 bytes
- integer*8 / long: 8 bytes
- real*4 / float: 4 bytes
- real*8 / double: 8 bytes
- complex*8 / float complex: 8 bytes
- complex*16 / double complex: 16 bytes

Memory Usage
Run an interactive job and monitor memory usage (with the top Unix command).
Example: check the memory usage of a matrix diagonalization code.
1. Request an interactive bash shell session:
   srun -p interact -n 1 -t 0-02:00 --pty --mem=4000 bash
2. Run the code, e.g., ./matrix_diag.x
3. Open a new shell terminal and ssh to the compute node where the interactive job was dispatched, e.g., ssh holy2a18307
4. In the new shell terminal, run top, e.g., top -u pkrastev
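The interactive monitoring workflow above can be condensed into the following sketch; the node name is the slide's example, and your job may land on a different node, which squeue will show:

# Terminal 1: request an interactive session and run the code in it
srun -p interact -n 1 -t 0-02:00 --pty --mem=4000 bash
./matrix_diag.x

# Terminal 2: find the node the job is running on, ssh to it, and watch memory usage
squeue -u $USER -o "%.10i %.12P %.20j %N"   # the last column is the node list
ssh holy2a18307                             # node name from the slide's example
top -u $USER                                # the RES column shows resident memory per process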
Memory Usage
Run 1: Matrix dimension = 3,000 x 3,000 (real*8)
Needs 3,000 x 3,000 x 8 bytes = ~72 MB of RAM

Run 2: Input size changed - doubling the matrix dimension quadruples the required memory.
Matrix dimension = 6,000 x 6,000 (real*8)
Needs 6,000 x 6,000 x 8 bytes = ~288 MB of RAM
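The slide's numbers can be reproduced with a quick shell calculation (bytes divided by 10^6 to get MB):

echo $(( 3000 * 3000 * 8 / 1000000 ))   # -> 72   (MB for a 3,000 x 3,000 real*8 matrix)
echo $(( 6000 * 6000 * 8 / 1000000 ))   # -> 288  (doubling the dimension quadruples the memory)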
Memory Example 2
Another example, where the algorithm changes the complexity. See https://en.wikipedia.org/wiki/Computational_complexity_of_mathematical_operations

sacct Overview
sacct queries the SLURM accounting database (slurmdbd). Every 30 seconds the node collects the CPU and memory usage of all the process IDs belonging to a given job; after the job ends, this data is sent to slurmdbd.

Common flags (an example invocation is sketched below):
- -j jobid or --name=jobname
- -S YYYY-MM-DD and -E YYYY-MM-DD (start and end of the reporting window)
- -o output_options, e.g., JobID, JobName, NCPUS, NNodes, Submit, Start, End, CPUTime, TotalCPU, ReqMem, MaxRSS, MaxVMSize, State, ExitCode, NodeList

SLURM Metrics on Demand is also available.
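A concrete sacct invocation using the flags above might look like the following; the job ID, job name, and date range are placeholders:

sacct -j JOBID \
      -o JobID,JobName,NCPUS,NNodes,Submit,Start,End,CPUTime,TotalCPU,ReqMem,MaxRSS,MaxVMSize,State,ExitCode,NodeList

sacct --name=myjob -S 2017-03-01 -E 2017-03-31 -o JobID,Elapsed,MaxRSS,State   # all jobs named "myjob" in March 2017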
Memory Usage
Run a test batch job and check memory usage after the job has completed (with the sacct SLURM command).
Example: sacct -o MaxRSS -j JOBID reports the maximum resident set size (MaxRSS) in KB; divide by 1,024 to convert to MB.

Storage Considerations
- Home directories (/n/home*) and lab storage are not appropriate for I/O-intensive jobs or large numbers of jobs. Typical utilization would be job scripts, in-house analysis codes, or self-installed software.
- For jobs that read and write a high volume of small files (roughly 10 MB or less per file), use local scratch. You need to copy your input data to /scratch and move output data to a different location after the job completes.
- For I/O-intensive jobs with large data files (> 100 MB) and/or a large number of data files (100s of MB), use the global scratch file system /n/regal.
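A hedged sketch of the copy-in/compute/copy-out pattern for local scratch described above; all paths, directory names, and the executable are hypothetical:

#!/bin/bash
#SBATCH -p serial_requeue   # Partition
#SBATCH -t 0-04:00          # Runtime in D-HH:MM
#SBATCH -n 1                # Single core

WORKDIR=/scratch/$USER/$SLURM_JOB_ID         # per-job directory on node-local scratch
mkdir -p "$WORKDIR"
cp $HOME/project/input/*.dat "$WORKDIR"/     # hypothetical input location in home/lab storage
cd "$WORKDIR"

./my_analysis_code input.dat > output.dat    # placeholder executable

mkdir -p /n/regal/my_lab/$USER/results       # hypothetical destination on global scratch
cp output.dat /n/regal/my_lab/$USER/results/
cd; rm -rf "$WORKDIR"                        # clean up local scratch when the job is done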
Topology Matters
Compute, storage, and login resources are located in three data centers: 60 Oxford St. (Cambridge), MGHPCC (Holyoke, MA), and 1 Summer St. (Boston, MA). Resources shown in the topology diagram include:
- PI nodes: ~200 nodes with 8-64 cores each, PI queues (present at two sites)
- rclogin## login nodes (VPN required)
- /n/labs lab storage: ~1 PB, no quota (present at two sites)
- holy2x#### nodes: 255 x 64 cores, 256 GB RAM, general partition
- holybigmem nodes: 8 x 64 cores, 512 GB RAM, bigmem partition
- holygpu nodes: 16 nodes x 16 cores with 4,992 GPU cores, 32 GB RAM, gpgpu partition
- /n/regal global scratch: 1.2 PB, no quota, retention policy applies
- /n/home## home directories: Isilon, 780 TB, 40 GB quota; rcnx01
Topology can affect network performance and therefore storage performance.

Storage Utilization
- du - estimate file space usage. Ex: du -sh data_dir
- quota - display disk usage and limits
- tree - list contents of directories in a tree-like format. Ex: tree -fisud -o /tmp/mytree.lst
Test First
Before diving right into submitting 100s or 1000s of research jobs, ALWAYS test a few first:
- ensure the job will run to completion without error
- ensure you understand the resource needs and how they scale with different data sizes and input options
A sketch of this test-then-scale cycle appears after the help resources below.

Request Help - Resources
- Documentation
- FASRC Portal
- rchelp@fas.harvard.edu
- Odybot
- Office hours: Wednesday noon-3pm at 38 Oxford; every other Thursday 12:30pm-1:30pm
- Training
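As referenced under Test First above, the test-then-scale cycle might look like this sketch; the script name and job ID are placeholders:

sbatch --test-only myjob.sh                   # validate the script and estimate a start time, without submitting
sbatch myjob.sh                               # submit one representative job first
squeue -u $USER                               # confirm it starts and runs
sacct -j JOBID -o JobID,Elapsed,MaxRSS,State  # after completion: check runtime and peak memory
# Adjust -t, -n/-c, and --mem in the script to match what you observed, then submit the full set of jobs.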
Questions?
Scott Yockel, PhD
Harvard - Research Computing
More information