Guillimin HPC Users Meeting
March 17, 2016
guillimin@calculquebec.ca
McGill University / Calcul Québec / Compute Canada
Montréal, QC Canada
Outline
- Compute Canada News
- System Status
- Software Updates
- Training News
- Special Topic: Best Practices for Job Submission
Compute Canada News
- Compute Canada Account Renewals
  - Mandatory annual renewal process for all Compute Canada account holders, including faculty, researchers, students, and Compute Canada staff
  - The renewal process includes gathering information about researchers and their research activities, as required by our funding agencies, including CFI
  - All Compute Canada users were contacted via email in early March to begin the renewal process
  - Deadline for renewals: March 31
  - Accounts not renewed will be deactivated in early April
  - Questions? renewals@computecanada.ca
System Status
- Network Instability
  - February 23, 22:00-23:30, and February 25, late evening
  - VLAG connection issue between two site routers; restored after one router restart
  - Stable since then, but we are investigating further
- Security Updates
  - Rolling updates to all nodes, including login nodes, for CVE-2015-7547 (glibc)
Software Updates
- New installations: PyQt, rpy2, JupyterHub, IPython for Python 3.5.0 (via Lmod only)
- About the Lmod/EasyBuild-based module structure:
  - Now backwards compatible; opt in by doing: touch ~/.lmod_legacy
  - Becomes the default on March 22; opt out via ~/.lmod_disabled
  - Old modulefiles keep working, including those in $HOME/modulefiles
  - Most new modulefiles are accessed via: module load iomkl/2015b (loads GCC 4.9.3 + Intel 15.0.3 + OpenMPI 1.8.8 + MKL)
  - See http://www.hpc.mcgill.ca/index.php/starthere/81-doc-pages/88-guillimin-modules
Training News
- See Training and Outreach at www.hpc.mcgill.ca for our calendar of training and workshops for 2016 and for links to registration pages
- Upcoming events (calculquebec.eventbrite.ca):
  - March 22 - Two Python workshops based on Software and Data Carpentry material (U. Laval)
  - March 24 - Introduction to R (McGill U.)
  - March 31 - Profiling and Optimization Tools (McGill U.)
  - April, May - Suggestions for training? Please let us know!
  - June 6-10 - Calcul Québec ARC spring school
- All materials from previous workshops are available online: https://wiki.calculquebec.ca/w/formations/en
- Recently completed:
  - February 17 - Introduction to Python (McGill U.)
User Feedback and Discussion
- Questions? Comments? We value your feedback.
- Contact us at: guillimin@calculquebec.ca
- Guillimin Operational News for Users - Status Pages:
  - http://www.hpc.mcgill.ca/index.php/guillimin-status
  - http://serveurscq.computecanada.ca (all CQ systems)
- Follow us on Twitter: http://twitter.com/mcgillhpc
Best Practices for Job Submission
March 17, 2016
McGill University / Calcul Québec / Compute Canada
Montréal, QC Canada
Example for serial job submission
- qsub script:

    #!/bin/bash
    #PBS -l nodes=1:ppn=1
    #PBS -l walltime=00:10:00
    #PBS -A xyz-123-aa
    #PBS -N JobTest
    #PBS -M email@example.ca
    #PBS -m abe
    cd $PBS_O_WORKDIR
    module load iomkl/2015b
    ./your_app arg1 arg2 arg3 ... > output.txt

- Note: No #PBS -V and no modules loaded in .bashrc gives a self-contained, more easily reproducible submission script.
Example for parallel job submission
- qsub script:

    #!/bin/bash
    #PBS -l nodes=3:ppn=12
    #PBS -l pmem=2700m
    #PBS -l walltime=00:10:00
    #PBS -A xyz-123-aa
    #PBS -N JobTest
    cd $PBS_O_WORKDIR
    module load iomkl/2015b
    mpiexec -n 36 ./your_app arg1 arg2 arg3 ... > output.txt

- Note: No #PBS -V and no modules loaded in .bashrc gives a self-contained, more easily reproducible submission script.
Submission styles
- Serial: default memory pmem=2700m (2.7 GB per core)
  - #PBS -l nodes=1:ppn=m, m ≤ 12
  - Recommended: m ≤ 6, or m = 12 (full node)
- Serial (Sandy Bridge):
  - #PBS -l nodes=1:ppn=m:sandybridge, m < 12, or #PBS -l nodes=1:ppn=16
  - Recommended: m ≤ 8, or m = 16 (full node)
- Parallel (Westmere): default pmem=1700m
  - #PBS -l nodes=n:ppn=12, n > 1
- Parallel (Sandy Bridge): default pmem=3700m
  - #PBS -l nodes=n:ppn=16, n > 1
- Parallel (any): default pmem=1700m
  - #PBS -l procs=m (m > 11; multiples of 48 are best)
Submission styles (accelerators, debug)
- GPUs:
  - #PBS -l nodes=2:ppn=16:gpus=2
  - #PBS -l pmem=123200m
  - Reserves two full nodes with 2 GPUs each
  - pmem is per node for GPU jobs!
- Xeon Phi:
  - #PBS -l nodes=1:ppn=8:mics=1,pmem=29600m
- Queues:
  - Default queue: metaq; generally no need to specify a queue name
  - Exception: debug queue (#PBS -q debug) for test jobs (default walltime 30 minutes, maximum 2 hours)
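Putting the GPU directives above into a full submission script might look like the following sketch. It follows the earlier serial/parallel examples; your_gpu_app, its argument, and the job name are hypothetical placeholders:

```
#!/bin/bash
#PBS -l nodes=2:ppn=16:gpus=2   # two full nodes, 2 GPUs each
#PBS -l pmem=123200m            # note: per node for GPU jobs, not per core
#PBS -l walltime=00:10:00
#PBS -A xyz-123-aa
#PBS -N GpuTest
cd $PBS_O_WORKDIR
module load iomkl/2015b         # plus whatever CUDA module your software needs
./your_gpu_app arg1 > output.txt
```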
MPI/OpenMP hybrid jobs
- Challenges:
  - Special mpiexec syntax
  - Processor affinity needs attention
- Example: one MPI process per node
  - Switch off affinity at the MPI level; otherwise MPI processes are often bound to only one core!
  - MVAPICH2 (for 2 nodes):

        export IPATH_NO_CPUAFFINITY=1
        mpiexec -n 2 -ppn 1 executable

  - OpenMPI < 1.8 (for 2 nodes):

        mpiexec -n 2 -npernode 1 executable

  - OpenMPI 1.8+ (for 2 nodes with 12 cores; a slot is a core on Guillimin, PE = processing element):

        mpiexec -n 2 -map-by slot:pe=12 executable
MPI/OpenMP hybrid jobs (2)
- Example: 4 MPI processes x 3 threads per node
  - Best to assign specific cores to each MPI process!
  - Example: MATLAB will otherwise spawn too many threads, leading to high load and inefficiency.
  - MVAPICH2 (for 2 nodes):

        mpiexec -n 8 -ppn 4 -bind-to core:3 -map-by core executable

  - OpenMPI < 1.8 (for 2 nodes):

        mpiexec -n 8 -npernode 4 -cpus-per-proc 3 executable

  - OpenMPI 1.8+ (for 2 nodes):

        mpiexec -n 8 -map-by slot:pe=3 executable

- Use the --report-bindings option for OpenMPI to see how MPI processes are bound to cores, or export MV2_SHOW_CPU_BINDING=1 for MVAPICH2.
Job Scheduling Tetris
- On Guillimin, many nodes accept either (but not both!):
  - Parallel jobs, e.g. nodes=n:ppn=12 or 16 (n > 0), or procs=p (p > 11), or
  - Serial jobs, nodes=1:ppn=m (m < 12)
- [Figure: cores-by-nodes-over-time diagram; each colour = one job, white = unused cores]
- Some (single-node serial only!) jobs can be split along the cores axis; parallel jobs can only be split on node boundaries.
Job Scheduling Tetris
- [Figure: the same cores/nodes/time diagram, continued]
Job Scheduling Tetris
- [Figure: diagram showing low-priority jobs scheduled around a high-priority (reservation) job; lower-priority jobs run later]
Job Scheduling Tetris
- Backfill: a small, low-priority job can run when higher-priority jobs can't
- [Figure: diagram showing a backfilled job filling an otherwise idle gap]
Backfilling tips
- Submit short (30 minutes - 36 hours) jobs
- Design tasks for maximum scheduler flexibility:
  - Low memory per core (-l pmem=1700m)
  - Pack tasks into full nodes
- With walltime < 36 hours:
  - ppn = 12 (hbplus): ~12000 cores available for short, low-memory jobs, shared with jobs of up to 30 days
  - ppn < 12 (serial-short): ~5000 cores available for short, low-memory ppn=1 jobs, with a much faster churn rate
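The tips above can be combined into a backfill-friendly submission header. This is a sketch only; the resource values mirror the bullet points, and prog is a hypothetical placeholder for your application:

```
#!/bin/bash
#PBS -l walltime=12:00:00   # short: well under the 36-hour backfill window
#PBS -l nodes=1:ppn=12      # full node, eligible for hbplus
#PBS -l pmem=1700m          # low memory per core
#PBS -A xyz-123-aa
cd $PBS_O_WORKDIR
./prog > output.txt
```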
Data Parallel Jobs
- Data parallel: parallelize by processing each chunk of data as a separate task
- Strategies:
  - Job arrays
  - Background processing
  - GNU Parallel
- Note that each process will compete for resources (filesystem access, memory, CPUs, etc.)
Job Arrays
- Job arrays are useful for submitting a large number of related tasks at one time
- Example for qsub (Torque):

    #!/bin/bash
    #PBS -l walltime=30:00:00
    #PBS -l nodes=1:ppn=12
    #PBS -t 0-31
    SRC=$HOME/program_dir
    LOWER_BOUND=$((12 * $PBS_ARRAYID))
    UPPER_BOUND=$(($LOWER_BOUND + 11))
    for i in $(seq $LOWER_BOUND $UPPER_BOUND)
    do
      cd $SCRATCH/dir$i ; $SRC/prog > output &
    done
    wait
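The index arithmetic in the script above maps each array ID to a block of 12 directories. A standalone sketch of just that arithmetic (PBS_ARRAYID is set by hand here; under Torque it comes from #PBS -t):

```shell
#!/bin/bash
# Stand-in for the array index Torque assigns via #PBS -t 0-31
PBS_ARRAYID=2

# Each array task handles 12 consecutive directory indices
LOWER_BOUND=$((12 * PBS_ARRAYID))
UPPER_BOUND=$((LOWER_BOUND + 11))

echo "task $PBS_ARRAYID handles dir$LOWER_BOUND to dir$UPPER_BOUND"
# → task 2 handles dir24 to dir35
```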
Background tasks
- The Linux operating system can run your process in the background so that your script continues without waiting for it to finish
- Use the ampersand symbol, &
- The wait command waits for all background processes to finish
- Explicit version:

    #!/bin/bash
    #PBS -l walltime=30:00:00
    #PBS -l nodes=1:ppn=12
    SRC=$HOME/program_dir
    cd $SCRATCH/dir1 ; $SRC/prog > output &
    cd $SCRATCH/dir2 ; $SRC/prog > output &
    cd $SCRATCH/dir3 ; $SRC/prog > output &
    ...
    cd $SCRATCH/dir12 ; $SRC/prog > output &
    wait

- Equivalent loop version:

    #!/bin/bash
    #PBS -l walltime=30:00:00
    #PBS -l nodes=1:ppn=12
    SRC=$HOME/program_dir
    for i in $(seq 12)
    do
      cd $SCRATCH/dir$i ; $SRC/prog > output &
    done
    wait
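A minimal, self-contained illustration of & and wait, with sleep standing in for $SRC/prog: the three tasks run concurrently, so the total wall time is about one second rather than three.

```shell
#!/bin/bash
start=$SECONDS
for i in 1 2 3
do
  sleep 1 &        # each task runs in the background immediately
done
wait               # resume only once all three have finished
elapsed=$((SECONDS - start))
echo "all 3 tasks finished in ~${elapsed}s"
```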
GNU Parallel
- GNU Parallel is an easy-to-use tool for launching processes in parallel
- Example: apply a command (file) to multiple files:

    $ find x*.gz -type f -print0 | parallel -q0 file
    xdc.gz: gzip compressed data, was "xdc", from Unix, last modified: Wed Apr 8 16:09:51 2015, max speed
    xda.gz: gzip compressed data, was "xda", from Unix, last modified: Wed Apr 8 16:09:51 2015, max speed
    xdb.gz: gzip compressed data, was "xdb", from Unix, last modified: Wed Apr 8 16:09:50 2015, max speed
    ...
GNU Parallel (2)
- Run different commands in parallel:

    $ parallel ::: hostname date "echo hello world"

- Input sources from a file:

    $ parallel -a input-file echo

- Input sources from the command line:

    $ parallel echo ::: A B C

- Input sources from STDIN:

    $ cat input-file | parallel echo

- Input from multiple sources (operates on each pair of inputs):

    $ parallel -a abc-file -a def-file echo
    $ cat abc-file | parallel -a - -a def-file echo
Conclusion
- Documentation:
  - http://www.hpc.mcgill.ca/index.php/starthere/81-docpages/322-simple-job-submission
  - http://www.hpc.mcgill.ca/index.php/starthere/81-docpages/91-guillimin-job-submit
  - https://wiki.calculquebec.ca/w/running_jobs#tab=tab4
- For any other questions: guillimin@calculquebec.ca