How to access Geyser and Caldera from Cheyenne. 19 December 2017 Consulting Services Group Brian Vanderwende



2 Geyser nodes useful for large-scale data analysis and post-processing tasks

  16 nodes with:
    - 40 cores (Westmere)
    - 1000 GB usable memory
    - 1 NVIDIA Quadro K5000 GPU

  - High-memory nodes are ideal for serial processing of big data
  - Slower CPUs are a poor choice for heavy computation
  - Older GPUs are good for 3D visualization

3 Caldera nodes useful for smaller analysis tasks and general-purpose GPU code

  30 nodes with:
    - 16 cores (Sandy Bridge)
    - 62 GB usable memory

  - 16 of the 30 caldera nodes have 2 NVIDIA Tesla K20Xm GPUs for GPGPU computations
  - Nodes without GPUs are called pronghorn

4 On Cheyenne, Geyser and Caldera are accessed using the Slurm scheduler

  - In Slurm, both machines are considered part of the dav system (data analysis and visualization)
  - You can access DAV resources using the dav partition (partition is the Slurm term for queue)

5 Basic commands for managing jobs when using Slurm

  1. List your current jobs: squeue -u $USER

    [13:57] $ squeue -u $USER
    JOBID PARTITION NAME USER     ST TIME NODES NODELIST(REASON)
    1113  dav       srun vanderwb R  0:07 1     caldera
          dav       srun vanderwb R  0:09 1     geyser13

6 Basic commands for managing jobs when using Slurm

  1. List your current jobs: squeue -u $USER
  2. Examine a job in detail: scontrol show job <ID>
  3. Kill a job: scancel <ID>

    [14:01] $ scontrol show job 1112
    JobId=1112 JobName=srun
    UserId=vanderwb(27236) Account=scsg0001
    JobState=RUNNING Reason=None
    Requeue=1 Restarts=0 ExitCode=0:0
    RunTime=00:04:00 TimeLimit=06:00:00
    SubmitTime= T13:57:18
    StartTime= T13:57:18 EndTime= T19:57:18
    Partition=dav
    NodeList=geyser13 BatchHost=geyser13
    NumNodes=1 NumTasks=1 CPUs/Task=1
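When several jobs are queued, a one-line summary per state can be easier to read than the full squeue table. The sketch below is a hypothetical helper, not part of the DAV tooling: it counts job states from text on standard input, so it can be tried without a scheduler. The squeue options in the comment (-h to drop the header, -o "%T" to print only the job state) are standard Slurm options.

```shell
#!/bin/bash
# Hypothetical helper: count jobs per state.
# Input: one job state per line on stdin. Output: "<STATE> <count>" lines.
count_jobs_by_state() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Typical use on a system with Slurm (not executed here):
#   squeue -u "$USER" -h -o "%T" | count_jobs_by_state
```

A job ID taken from the regular squeue listing can then be passed to scontrol show job or scancel as on the slide.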

7 Geyser/Caldera will continue to have their existing environment and Yellowstone software modules

  Your Cheyenne environment will NOT carry over into DAV jobs!

8 Scripts available to start interactive jobs

  By default, these scripts start a six-hour, single-core interactive session:

  1. execgy -a <project> - run on a geyser node
  2. execca -a <project> - run on a caldera node
  3. execdav -a <project> - run on the first available DAV resources, regardless of location

  You can also specify a default project to use by setting the DAV_PROJECT environment variable.

9 Specific resource requirements can be made when using execdav

  -n <number of cores>       Defaults to 1
  -t <time in HH:MM:SS>      Defaults to 06:00:00
  -m <memory needed, as nG>  Defaults to 1.8G (gigabytes) per core requested
  -g <GPU type>              Options are k20, k5000, any, or none; defaults to none
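To see how the options combine, the sketch below assembles an execdav command line and prints it instead of running it. The function name build_execdav_cmd is hypothetical, and the <project> placeholder is left unfilled on purpose. It assumes, as the previous slide states, that setting DAV_PROJECT makes the -a option unnecessary. Memory (-m) is omitted so the default of 1.8G per core applies.

```shell
#!/bin/bash
# Hypothetical helper: print (do not run) an execdav command line.
# Arguments: cores, walltime, GPU type. Omitted arguments fall back to the
# defaults listed on the slide: 1 core, 06:00:00, no GPU.
build_execdav_cmd() {
    local cores="${1:-1}"
    local walltime="${2:-06:00:00}"
    local gpu="${3:-none}"
    local cmd="execdav -n ${cores} -t ${walltime} -g ${gpu}"
    # Append -a only when no default project is set in the environment.
    if [ -z "${DAV_PROJECT:-}" ]; then
        cmd="${cmd} -a <project>"
    fi
    echo "${cmd}"
}
```

For example, build_execdav_cmd 4 02:00:00 k20 prints a request for four cores, two hours, and a K20 GPU.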

10 Batch jobs are submitted using sbatch

    [14:05] $ cat run_hw.slurm
    #!/bin/tcsh
    #SBATCH -J sample_dav
    #SBATCH -n 12
    #SBATCH --ntasks-per-node=4
    #SBATCH -t 05:00
    #SBATCH -A <project_code>
    #SBATCH -p dav
    #SBATCH -o hw.out

    ### Initialize the DAV Slurm environment
    source /glade/u/apps/opt/slurm_init/init.csh
    setenv TMPDIR /glade/scratch/${user}/temp

    module load openmpi-slurm/3.0.0

    ### Run Open MPI Program
    mpiexec ./hello_world

  - Queues are known as partitions in Slurm
  - All jobs should be submitted to the dav partition
  - Batch scripts should call the appropriate init script and set TMPDIR
  - MPI programs should be compiled on DAV using the Slurm-integrated Open MPI

    sbatch run_hw.slurm
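After submission, sbatch prints a confirmation of the form "Submitted batch job <ID>". A minimal sketch of capturing that ID for the job-management commands follows. The helper name parse_job_id is hypothetical, and the function reads text on standard input so it can be tried without a scheduler.

```shell
#!/bin/bash
# Hypothetical helper: extract the job ID from sbatch's confirmation line,
# which has the form "Submitted batch job <ID>".
parse_job_id() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Typical use on a system with Slurm (not executed here):
#   jobid="$(sbatch run_hw.slurm | parse_job_id)"
#   scontrol show job "$jobid"
```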

11

    [14:05] $ cat run_hw.slurm
    #!/bin/tcsh
    #SBATCH -J sample_dav
    #SBATCH -n 12
    #SBATCH --ntasks-per-node=4
    #SBATCH -t 05:00
    #SBATCH -A <project_code>
    #SBATCH -p dav
    #SBATCH -o hw.out

    ### Initialize the DAV Slurm environment
    source /glade/u/apps/opt/slurm_init/init.csh
    setenv TMPDIR /glade/scratch/${user}/temp

    module load openmpi-slurm/2.1.0

    ### Run Open MPI Program
    mpiexec ./hello_world

  Where will this job run?
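One part of the answer is the node count. When -n is given together with --ntasks-per-node, Slurm treats the per-node value as a maximum, so the job needs at least ceil(n / tasks-per-node) nodes. The sketch below works through that arithmetic; the function name min_nodes is hypothetical. Because the script sets no node constraint, those nodes can be of any DAV type.

```shell
#!/bin/bash
# Hypothetical helper: minimum node count for a request of <ntasks> tasks
# with at most <per_node> tasks on each node (integer ceiling division).
min_nodes() {
    local ntasks="$1" per_node="$2"
    echo $(( (ntasks + per_node - 1) / per_node ))
}

min_nodes 12 4   # the run_hw.slurm request: 12 tasks, 4 per node
```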

12 For multi-core jobs, it's often useful to constrain requested batch resources

    [14:12] $ cat geyser_job.slurm
    #!/bin/tcsh
    #SBATCH -J sample_geyser
    #SBATCH -n 6
    #SBATCH --ntasks-per-node=3
    #SBATCH -t 05:00
    #SBATCH -A <project_code>
    #SBATCH -p dav
    #SBATCH -o hw.out
    #SBATCH -C geyser
    #SBATCH --mem=100g
    #SBATCH --gres=gpu:k

  - This job can only run on a geyser node with 100 GB of free memory
  - If multiple resources are specified, they must be compatible
  - Otherwise, the job will be stuck in a pending state
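A quick sanity check before submitting can catch one kind of incompatible request. The sketch below compares a per-node memory request against the usable memory quoted earlier in this deck (1000 GB on geyser, 62 GB on caldera and pronghorn). The function names are hypothetical and the check is only illustrative; Slurm itself remains the authority on whether a job can start.

```shell
#!/bin/bash
# Usable memory per node in GB, from the node descriptions in this deck.
node_mem_gb() {
    case "$1" in
        geyser)            echo 1000 ;;
        caldera|pronghorn) echo 62 ;;
        *)                 return 1 ;;
    esac
}

# Hypothetical check: exit status 0 when a per-node memory request (in GB)
# fits the given node type, 1 when it does not, 2 for an unknown node type.
mem_request_fits() {
    local node_type="$1" request_gb="$2" limit
    limit="$(node_mem_gb "$node_type")" || return 2
    [ "$request_gb" -le "$limit" ]
}
```

With these values, mem_request_fits geyser 100 succeeds, while mem_request_fits caldera 100 fails, which corresponds to a job that would stay pending.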

13 Other examples of node constraints

    #SBATCH -C caldera
  This constraint will place you on one of the 16 caldera nodes

    #SBATCH -C caldera|pronghorn
  This constraint places you on a caldera OR pronghorn node. If you want the newer caldera processors but don't need GPUs, use this constraint to get a node more quickly!

    #SBATCH -C geyser|caldera
  This constraint places you on a geyser or caldera node; e.g., if you want GPUs but do not care whether they are K20 or K
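Slurm's OR operator for constraints is the vertical bar. Inside an #SBATCH line it needs no quoting, but on a command line the shell would read it as a pipe, so the constraint string must be quoted. The sketch below builds such a string; the helper name or_constraint and the file name job.slurm are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: join node-type names with "|" for use with -C.
or_constraint() {
    local IFS='|'
    echo "$*"
}

# On a command line, quote the result so "|" is not treated as a pipe
# (not executed here):
#   sbatch -C "$(or_constraint caldera pronghorn)" job.slurm
```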

14 For intensive visual programs (e.g., VAPOR), consider using TurboVNC

  - VNC can be used to run an optimized remote GNOME 2 desktop
  - Usage: vncserver_submit -a <project> (or set DAV_PROJECT)

15 Important considerations when using DAV

  - It's best to compile on the type of node you will run on. If you plan to use both, compile on Geyser rather than Caldera
  - Do not try to use software compiled for Cheyenne on Geyser/Caldera; it will likely produce an error
  - Startup files are shared between Cheyenne and DAV; conditionally execute system-specific settings
  - The DAV nodes are shared and subject to fair use. Please avoid monopolizing resources; we will kill jobs if they are disrupting usage
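One way to execute settings conditionally in a shared startup file is to branch on the hostname. The sketch below is for a bash startup file such as ~/.bashrc and rests on assumptions: that DAV node names begin with geyser, caldera, or pronghorn (geyser13 appears in the job listings in this deck) and that Cheyenne login node names begin with cheyenne. The function name system_for_host is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: classify a hostname so a shared startup file can
# branch on it. The name patterns are assumptions, not confirmed values.
system_for_host() {
    case "$1" in
        geyser*|caldera*|pronghorn*) echo dav ;;
        cheyenne*)                   echo cheyenne ;;
        *)                           echo unknown ;;
    esac
}

# Example branch for a startup file; ":" is a no-op placeholder.
case "$(system_for_host "$(uname -n)")" in
    dav)      : ;;  # DAV-only settings go here
    cheyenne) : ;;  # Cheyenne-only settings go here
esac
```

Users of tcsh would need the equivalent switch statement in their own startup file.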

16 CISL Help Desk / Consulting

  - Walk-in: ML 1B Suite 55
  - cislhelp@ucar.edu
  - Phone:
  - Specific questions from today and/or feedback: vanderwb@ucar.edu
