Cluster Computing in Frankfurt


1 Cluster Computing in Frankfurt Goethe University in Frankfurt/Main Center for Scientific Computing December 12, 2017

2 Center for Scientific Computing: What can we offer you?

CSC (Center for Scientific Computing): capability computing, capacity computing, access to licensed software, introductory courses.

HKHLR (Hessisches Kompetenzzentrum für Hochleistungsrechnen, the Hessian Competence Center for High Performance Computing): access to the Hessian clusters, HiPerCH workshops.

3 Center for Scientific Computing

Capability computing means using the maximum available computing power to solve a single large problem in the shortest amount of time. Capacity computing, in contrast, means using cost-effective computing power efficiently to solve a small number of fairly large problems or a large number of small problems.

Access to licensed software: commercial packages such as the TotalView debugger, the Vampir profiler, and the Intel compilers, tools and libraries.

Access to the Hessian clusters of the universities of Darmstadt, Frankfurt, Giessen, Kassel, and Marburg.

4 Center for Scientific Computing

Introductory courses: UNIX, shell scripting, software tools, cluster computing (for MPI/OpenMP & Matlab users), Python, C++, TotalView, Make (build-management tool).

HiPerCH workshops: twice a year, these workshops give users an insight into high-performance computing, covering a range of HPC topics.

5 Introduction to LOEWE-CSC & FUCHS Cluster Computing in Frankfurt Goethe University in Frankfurt/Main Center for Scientific Computing December 12, 2017

6 HPC Terminology

Cluster: a group of identical computers connected by a high-speed network, forming a supercomputer.
Node: currently, a typical compute node is equivalent to a high-end workstation and is part of a cluster; it has two sockets, each with a single CPU, volatile working memory (RAM), and a hard drive.
CPU: a Central Processing Unit is a processor which may have one or more cores to perform tasks at a given time.
Core: a core is the basic computation unit of the CPU, with its own computing pipeline, logical units, and memory controller.
Thread: each CPU core services a number of CPU threads, each having an independent instruction stream but sharing the core's memory controller and other logical units.
FLOPS: performance is measured in FLoating-point Operations Per Second (FLOPS).

7 Formula

The full sample formula, using dimensional analysis:

GFLOPS = #chassis × (nodes / chassis) × (sockets / node) × (cores / socket) × (GHz / core) × (FLOPs / cycle)

Note:
TFLOPS = TeraFLOPS = 10^12 FLOPS = 1000 GFLOPS
GFLOPS = GigaFLOPS = 10^9 FLOPS = 1000 MFLOPS
MFLOPS = MegaFLOPS = 10^6 FLOPS

Multiplying a processor's clock rate (GHz) by its FLOPs per cycle and its core count gives its theoretical peak performance in GFLOPS.
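As a purely illustrative worked example (the numbers below are hypothetical and do not describe the LOEWE-CSC hardware), plugging values into the formula gives:

\[ \text{GFLOPS} = \underbrace{1}_{\text{chassis}} \times \underbrace{4}_{\text{nodes/chassis}} \times \underbrace{2}_{\text{sockets/node}} \times \underbrace{12}_{\text{cores/socket}} \times \underbrace{2.5}_{\text{GHz}} \times \underbrace{8}_{\text{FLOPs/cycle}} = 1920 \]

i.e. roughly 1.9 TFLOPS of theoretical peak for this hypothetical system.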

8 HPC Terminology

The past, but times change...
1. A chassis contained a single node.
2. A single node contained a single processor.
3. A processor contained a single CPU core and fit into a single socket.

... with recent computer systems:
1. A single chassis contains multiple nodes.
2. Those nodes contain multiple sockets.
3. The processors in those sockets contain multiple CPU cores.

9 HPC Terminology

On current computer systems:
1. A chassis houses one or more compute nodes.
2. A node contains one or more sockets.
3. A socket holds one processor.
4. A processor contains one or more CPU cores.
5. The CPU cores perform the actual mathematical computations.
6. One class of these mathematical operations works exclusively on floating-point numbers; the rate at which they are executed is measured in FLOPS.
7. One or more racks of such computers build a computer system.

10 HPC Terminology

Diagram: a dual-core CPU (processor with two cores) and a quad-core CPU (processor with four cores), each attached to its own memory.

11 HPC Terminology

Diagram: a node with two multi-core processors sharing 24 GB of memory.

12 Setting of the LOEWE-CSC Cluster

Diagram: the cluster consists of Intel nodes, AMD nodes, and GPU nodes (each with CPU cores, RAM, local I/O and a hard drive; the AMD and GPU nodes additionally carry graphics cards), all connected through an interconnect fabric to the shared storage systems.

13 Access to the Clusters of Frankfurt

LOEWE-CSC: ssh <username>@loewe-csc.hhlr-gu.de
FUCHS: ssh <username>@hhlr.csc.uni-frankfurt.de

Go to CSC-Website/Access/LOEWE & CSC-Website/Access/FUCHS to get an account on the clusters. The project manager has to send a request to Prof. Lüdde to get CPU time for research projects. Please download the file & use a regular PDF viewer to open the forms.

14 Organization of a Cluster

Your PC connects via ssh to one of the 2-4 general login nodes (loewe-csc.hhlr-gu.de or hhlr.csc.uni-frankfurt.de). From there, batch jobs are dispatched to the 600+ compute nodes, which are connected by an InfiniBand network.

15 Idea behind Batch Processing

Whatever you would normally type at the command line goes into your batch script.
Output that would normally go to the screen goes into a log file.
The system runs your job when resources become available.
This is very efficient in terms of resource utilization.

16 Hardware Resources of the LOEWE-CSC Cluster

438 nodes: 2x AMD Opteron 6172 (Magny-Cours), 2.10 GHz, 24 cores per node, 64 GB RAM, plus 1x AMD Radeon HD 5870 (1 GB) per node
Intel nodes: 2x Intel Xeon E5-2670v2, 20 cores per node, 128 GB RAM
Intel nodes: 2x Intel Xeon E5-2640v2, 20 cores per node, 128 GB RAM
GPU nodes: 2x Intel Xeon E5-2630v2, 12 cores per node, 128 GB RAM, with 2x AMD FirePro S10000

17 Filesystem of the Clusters

Warning: use the /scratch directory instead of /home to write out standard output and standard error.

LOEWE-CSC:
mountpoint    /home            /scratch      /local         /data0[1|2]
size          10 GB per user   764 TB        1.4 TB         500 TB each
access time   slow             fast          fast           slow
system        NFS              FhGFS         ext3           NFS
network       Ethernet         InfiniBand    (node-local)   Ethernet

18 Environment Modules

Definition: Environment Modules provide software for specific purposes.

Syntax: module <command> <modulename>
avail                  display all available modules
list                   display all loaded modules
load | add <module>    load a module
load unstable          load a deprecated or unstable module
unload | rm <module>   unload a module

19 Environment Modules

Definition: Environment Modules provide software for specific purposes.

Syntax: module <command> <modulename>
switch | swap <old-module> <new-module>   first unloads the old module, then loads the new one
purge                                     unload all currently loaded modules
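A typical module session on a login node might look like the following sketch (the module names are taken from the MPI module table on the next slide; any other available module works the same way):

module avail                                           # what is installed?
module load mpi/mvapich2/gcc/2.0                       # load an MPI flavour (pulls in the gcc compiler)
module list                                            # what is currently loaded?
module switch mpi/mvapich2/gcc/2.0 openmpi/gcc/1.8.1   # swap MVAPICH2 for OpenMPI
module purge                                           # clean slate before the next experiment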

20 Environment Modules

Syntax: module load <modulename>

If you load or unload an MPI module, the corresponding compiler is automatically loaded or unloaded as well:

MPI modules (flavour of MPI / compiler with which it was compiled):
1  mpi/mvapich2/gcc/2.0         gcc
2  mpi/mvapich2/intel /2.0      intel
3  mpi/mvapich2/pgi-14.7/2.0    pgi
1  openmpi/gcc/1.8.1            gcc
2  openmpi/intel /1.8.1         intel

Compiler-only modules:
1  intel/compiler/64/    (Intel software, 64 bit)
2  pgi/14.7

Module names follow the pattern: generic term / flavour of MPI (version) / compiler (version) with which it was compiled.

21 Environment Modules: Use Custom Modules

1. Write a module file in Tcl to set environment variables.
2. module load use.own enables you to load your own modules.
3. module load ~/privatemodules/modulename
4. Use the facilities provided by module (see the sketch below).
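A minimal sketch of this workflow (the module name mytool/1.0 and the corresponding Tcl file under ~/privatemodules are hypothetical placeholders):

module load use.own                       # makes ~/privatemodules visible to the module command
module avail                              # your private modules now show up as well
module load mytool/1.0                    # load your own module by name ...
module load ~/privatemodules/mytool/1.0   # ... or directly by path, as on the slide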

22 Partitions of the Cluster

Cluster   Partition   Run time
LOEWE     parallel    30d
LOEWE     gpu         30d
LOEWE     test        1h
FUCHS     parallel    30d
FUCHS     test        12h

Each partition additionally has per-user limits (Max Nodes, Max Nodes per user, Max Jobs per user, Max Submit per user), and the LOEWE-CSC cluster enforces a maximum job-array size; the concrete values can be queried with sacctmgr list QOS.

23 Architecture of the LOEWE Partitions

Diagram: the parallel partition spans the AMD nodes (--constraint=dual), the Intel Ivy Bridge nodes (--constraint=intel20) and the Intel Broadwell nodes (--constraint=broadwell); the GPU nodes form the gpu partition (--partition=gpu). All nodes are connected through the interconnect fabric to the shared storage.

24 Architecture of the FUCHS Partitions

Processor type     #Nodes   Sockets   CPUs/cores   RAM [GB]
AMD Magny-Cours    72       dual
AMD Magny-Cours    36       quad
AMD Istanbul       250      dual      12           32/64

The architecture is selected with --constraint:
magnycours = 72 dual-socket AMD Magny-Cours nodes
dual       = 250 dual-socket AMD Istanbul nodes
quad       = 36 quad-socket AMD Magny-Cours nodes
--constraint="magnycours|dual" (either feature) to avoid the quad nodes

25 on LOEWE-CSC & FUCHS Cluster Computing in Frankfurt Goethe University in Frankfurt/Main Center for Scientific Computing December 12, 2017

26 Batch System Concepts

Cluster: a set of tightly connected identical computers presented as a single system; the nodes work together to solve computation-intensive problems.
Resource Manager: responsible for managing the resources of a cluster, such as tasks, nodes, CPUs, memory and network.
Scheduler: controls the users' jobs on a cluster.
Batch System: combines the features of a scheduler and a resource manager in an efficient way.
SLURM offers both functionalities: scheduling and resource management.

27 Batch System Concepts

Batch Processing: executes programs or jobs without user intervention.
Job: consists of a description of the required resources and of job steps; jobs are user-defined workflows executed by the batch system.
Job Steps: describe the tasks that must be done.

28 Batch System Concepts

Cluster:
a set of tightly connected identical computers
the computers are presented as a single system and work together to solve computation-intensive problems
the nodes are connected through a high-speed local network
the nodes have access to shared resources such as shared file systems

Resource Manager:
responsible for managing the resources of a cluster, such as tasks, nodes, CPUs, memory and network
manages the execution of jobs
makes sure that jobs do not overlap on the resources and also handles their I/O

29 Batch System Concepts

Scheduler:
receives jobs from the users
controls the users' jobs on a cluster
controls the resource manager to make sure that the jobs are completed successfully
handles job submission and puts jobs into queues
offers many features, such as:
  user commands for managing jobs (start, stop, hold)
  interfaces for defining workflows and job dependencies
  interfaces for job monitoring and profiling (accounting)
  partitions and queues to control jobs according to policies and limits
  scheduling mechanisms such as backfilling according to priorities

30 Batch System Concepts

Batch System:
the combination of a scheduler and a resource manager
combines the features of these two parts in an efficient way
SLURM offers both functionalities: scheduling and resource management

Batch Processing:
the composition of programs into so-called jobs is achieved by batch processing and realized by batch systems
execution of programs or jobs without user intervention

31 Batch System Concepts

Job:
the execution of a user-defined workflow by the batch system
a job consists of a description of the required resources and of job steps

Job Steps:
job steps describe the tasks that must be done
resource requests consist of a number of CPUs, the expected computing duration, and amounts of RAM or disk space
the script itself is a job step; further job steps are created with the srun command
when a job starts, the batch script runs as the first job step
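A minimal sketch of the job/job-step relation (prog_a and prog_b are hypothetical executables): the batch script itself is recorded as one step, and every srun call inside it creates a further step.

#!/bin/bash
#SBATCH --partition=parallel
#SBATCH --nodes=2
#SBATCH --time=00:10:00

srun -N 2 -n 4 ./prog_a    # job step 0
srun -N 1 -n 2 ./prog_b    # job step 1

After the job has finished, sacct -j <jobid> lists the batch step and the two srun steps separately.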

32 SLURM Resource Manager on the Cluster

SLURM stands for Simple Linux Utility for Resource Management.
The user sends a job to SLURM via sbatch.
SLURM calculates the priority of each job.
SLURM starts a job according to its priority and the availability of resources.
Nodes are assigned exclusively, one job per node.
SLURM allocates the resources of the jobs.
SLURM provides a framework for starting and monitoring jobs.

33 SLURM Commands

1. Job submission & execution:
salloc     requests interactive jobs/allocations
sbatch     submits a batch script
srun       runs jobs interactively (implicit resource allocation)

2. Managing a job:
scancel    cancels a pending or running job
sinfo      shows information about nodes & partitions
squeue     queries the list of pending & running jobs
scontrol   shows detailed information about compute nodes

3. Accounting information:
sacct      displays accounting data for all jobs & job steps
sacctmgr   shows SLURM account information
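A sketch of how these commands are typically combined in one working session (the job ID 123456 is a made-up example):

sinfo -p parallel                              # which nodes in the parallel partition are free?
sbatch myjob.slurm                             # submit the batch script; SLURM prints the job ID
squeue -u $USER                                # is the job pending or running?
scontrol show job 123456                       # detailed view of a single job
scancel 123456                                 # cancel it if something went wrong
sacct -j 123456 --format=JobID,Elapsed,State   # accounting data after the job has finished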

34 Backfilling Scheduling

The backfilling scheduling algorithm may schedule jobs with lower priorities that can fit into the gap created while resources are being freed for the next highest-priority job.

Diagram: consider a 2-node cluster. Job A is a 1-node job that started at time -1 and is running; it will take until time point 2.

35 Backfilling Scheduling

The backfilling scheduling algorithm may schedule jobs with lower priorities that can fit into the gap created while resources are being freed for the next highest-priority job.

Diagram: job B is submitted at time -1 and scheduled. It will start only after A has finished, as it needs both nodes.

36 Backfilling Scheduling

The backfilling scheduling algorithm may schedule jobs with lower priorities that can fit into the gap created while resources are being freed for the next highest-priority job.

Diagram: job C is submitted at time 0. It will start after B if the scheduler has to assume that C would run past time point 2.

37 Backfilling Scheduling

The backfilling scheduling algorithm may schedule jobs with lower priorities that can fit into the gap created while resources are being freed for the next highest-priority job.

Diagram: however, if C is promised to end before B is due to start, C starts immediately on the free node. This is backfilling, simplified; the actual process takes all resources into account.

38 Job-Submission

1. Log in to the cluster:
   LOEWE: ssh <username>@loewe-csc.hhlr-gu.de
   FUCHS: ssh <username>@hhlr.csc.uni-frankfurt.de
2. Create a job script, e.g. with the .slurm extension (example script name: workshop_batch_script.slurm).
3. Submit this script to the cluster with sbatch (indirect): sbatch workshop_batch_script.slurm, or allocate resources with salloc (interactive mode).
4. Use the allocated resources.

39 Job-Submission

Commands for job allocation:
sbatch is used to submit batch jobs to the queue: sbatch [options] jobscript [args...]
salloc is used to allocate resources for interactive jobs: salloc [options] [<command> [command args]]

Command for job execution:
with srun the user can spawn any kind of application, process or task inside a job allocation,
1. inside a job script submitted by sbatch (starts a job step), or
2. after calling salloc (execute programs interactively):
srun [options] executable [args...]

40 Indirect Job-Submission: sbatch

Job parameters and the user program call are encapsulated in a job script, which is handed over to the submit command.

Features:
prefabricated job scripts with the important parameters can be created
eliminates operator errors
additional functionality can easily be added
allows additional parameters to be passed to the submit command
one-time extra effort for drafting the job scripts

41 Direct Job-Submission: salloc

Job parameters and the user program are passed directly to the submit command.

Features:
allows simple, quick and flexible changes of job parameters
preferred for many similar jobs that differ only in a few parameters (e.g. benchmarks: the same program with different numbers of CPUs)
prone to operator errors
additional functionality only via encapsulation in self-written scripts (e.g. loading libraries)
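A sketch of a direct (interactive) submission with salloc and srun (./my_prog is a placeholder; the node counts are arbitrary):

salloc -p parallel -C dual -N 2 -t 00:30:00   # allocate two AMD nodes for 30 minutes
module load mpi/mvapich2/gcc/2.0              # prepare the environment inside the allocation
srun -n 48 ./my_prog                          # 48 tasks = 2 nodes x 24 cores
exit                                          # leaving the shell releases the allocation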

42 Job Execution: srun

srun is used to initiate job steps, mainly within a job, or to start an interactive job.
A job can contain multiple job steps executing sequentially or in parallel on independent nodes within the job's node allocation.

After the modulefiles are loaded and the resources have been allocated, an application can be started on the assigned nodes with a preceding launcher: srun (MVAPICH), mpirun (OpenMPI) or mpiexec (OpenMPI). In this shell window further applications can be started.

Running jobs interactively: implicit resource allocation with salloc.

43 Job-Submission

List of the submission/allocation options for sbatch and salloc:
-p, --partition        partition to be used for the job
-c, --cpus-per-task    logical CPUs (hardware threads) per task
-N, --nodes            compute nodes used by the job
-n, --ntasks           total number of processes (MPI processes)
--ntasks-per-node      tasks per compute node
-t, --time             maximum wall-clock time of the job
-J, --job-name         sets the name of the job
-o, --output           path of the job's standard output
-e, --error            path of the job's standard error

srun accepts almost all allocation options of sbatch and salloc.

Note: the option --partition has to be set.

44 Job Scripts: Toy Examples

Listing 1: A naive script

#!/bin/bash

#SBATCH --job-name=testjobserial
#SBATCH --nodes=1
#SBATCH --output=testjobserial-%j.out
#SBATCH --error=testjobserial-%j.err
#SBATCH --time=

hostname

45 Job Scripts: Toy Examples

Listing 2: Going parallel

#!/bin/bash

#SBATCH --job-name=testjobparallel
#SBATCH --nodes=1
#SBATCH --output=testjobparallel-%j.out
#SBATCH --error=testjobparallel-%j.err
#SBATCH --time=

srun --ntasks-per-node=2 hostname

46 Job Scripts: Toy Examples

Listing 3: Going parallel across nodes

#!/bin/bash

#SBATCH --job-name=testjobparallel
#SBATCH --nodes=4
#SBATCH --output=testjobparallel-%j.out
#SBATCH --error=testjobparallel-%j.err
#SBATCH --time=

srun --ntasks-per-node=2 hostname

47 Creating a Parallel Job in SLURM

There are several ways to create a parallel job, i.e. one whose tasks run simultaneously:
1. by running several independent instances of a program
2. by running a multi-process program (MPI)
3. by running a multi-threaded program (OpenMP or pthreads)

48 Creating a Parallel Job in SLURM

In the SLURM context, a task is to be understood as a process. A multi-threaded program consists of one task that uses several CPUs; a multi-process program is made up of several tasks.

The option --cpus-per-task is meant for multi-threaded programs. Multi-threaded jobs run on a single node but use more than one processor on that node. Since tasks cannot be split across compute nodes, requesting several CPUs with --cpus-per-task ensures that all CPUs are allocated on the same compute node.

The option --ntasks is meant for multi-process programs. By contrast, requesting the same number of CPUs with --ntasks may lead to CPUs being allocated on several distinct compute nodes.
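The difference can be sketched with two alternative batch-script headers requesting eight CPUs each (omp_prog and mpi_prog are placeholders):

# multi-threaded (OpenMP): one task, all CPUs guaranteed on the SAME node
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
./omp_prog

# multi-process (MPI): eight tasks, possibly spread over SEVERAL nodes
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=1
srun ./mpi_prog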

49 Trivial Parallelization: Simple Loop

Listing 4: scriptexp.sh

#!/bin/bash

echo "$(hostname) $1"
exit 0

Listing 5: Serial job, simpleloop.slurm

#!/bin/bash
#SBATCH --job-name=oneforloop
#SBATCH --output=expscript.out
#SBATCH --error=expscript.err
#SBATCH --partition=parallel
#SBATCH --constraint=dual
#SBATCH --ntasks=48
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=
#SBATCH --mail-type=fail

for N in $(seq 1 48); do
    srun -N 1 -n 1 ./scriptexp.sh $N &
done
wait
sleep 300

50 Trivial Parallelization: Nested Loop

Listing 6: scriptexp.sh

#!/bin/bash

echo "$(hostname) $1"
sleep 2
exit 0

Listing 7: Serial job, nestedloop.slurm

#!/bin/bash
#SBATCH --job-name=twoforloops
#SBATCH --output=expscript.out
#SBATCH --error=expscript.err
#SBATCH --partition=parallel
#SBATCH --constraint=dual
#SBATCH --ntasks=48
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=
#SBATCH --mail-type=fail

for i in $(seq 0 3); do
    for M in $(seq 1 48); do
        let N=$i*48+$M
        srun -N 1 -n 1 ./scriptexp.sh $N &
    done
    wait
done
wait

51 Job Scripts: Toy Examples

Listing 8: Naive OpenMP job (note: OMP_NUM_THREADS should not exceed the maximum number of cores per node)

#!/bin/bash

#SBATCH -J NaiveOMP
#SBATCH -N
#SBATCH -o TestOMP-%j.out
#SBATCH -e TestOMP-%j.err
#SBATCH --time=2:00:00
#SBATCH --constraint=dual   # AMD nodes

export OMP_NUM_THREADS=
/home/user/omp-prog


53 Job Scripts: Toy Examples

Listing 10: MPI job

#!/bin/bash

#SBATCH -J TestMPI
#SBATCH --nodes=4
#SBATCH --ntasks=96          # 4 x 24 = 96
#SBATCH -o TestMPI-%j.out
#SBATCH -e TestMPI-%j.err
#SBATCH --time=0:15:00
#SBATCH --partition=parallel

# implied --ntasks-per-node=24
srun ./mpi-prog

54 Job Scripts: Toy Examples

Listing 11: Multiple job steps

#!/bin/bash

#SBATCH -J TestJobSteps
#SBATCH -N 32
#SBATCH --partition=parallel
#SBATCH -o TestJobSteps-%j.out
#SBATCH -e TestJobSteps-%j.err
#SBATCH --time=6:00:00

srun -N 16 -n 32 -t 00:50:00 ./mpi-prog_1
srun -N 2 -n 4 -t 00:10:00 ./mpi-prog_2
srun -N 32 --ntasks-per-node=2 -t 05:00:00 ./mpi-prog_3


57 Job Scripts: Toy Examples

Listing 14: Job arrays

#!/bin/bash

#SBATCH -J TestJobArrays
#SBATCH --nodes=1
#SBATCH -o TestJobArrays-%A_%a.out
#SBATCH -e TestJobArrays-%A_%a.err
#SBATCH --time=2:00:00
#SBATCH --array=1-20    # will cause 20 array tasks (numbered 1, 2, ..., 20)

srun -N 1 --ntasks-per-node=1 ./prog input_${SLURM_ARRAY_TASK_ID}.txt

The array tasks are simply copies of this master script. SLURM supports job arrays with the option --array.
SLURM_ARRAY_JOB_ID  : %A : base array job ID
SLURM_ARRAY_TASK_ID : %a : array index

58 Job Array Support

Job arrays offer a mechanism for submitting and managing collections of similar jobs quickly and easily:
job arrays with many tasks can be submitted in milliseconds
all jobs have the same initial options (e.g. size, time limit)
users may limit how many of these jobs run simultaneously
job arrays are only supported for batch jobs
to address a single job of an array, SLURM provides a base array ID and an array index for each job; specify it as <base job id>_<array index>
SLURM exports corresponding environment variables
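A sketch of how an array job is handled after submission (the job ID 123456 and the index range are made-up examples):

sbatch --array=1-20 array_job.slurm   # submit 20 array tasks in one call
squeue -u $USER -r                    # -r shows one line per array element
scancel 123456_7                      # cancel only array task 7 ...
scancel 123456                        # ... or the whole array at once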

59 CPU Management with SLURM

1. Selection of nodes
2. Allocation of CPUs from the selected nodes
3. Distribution of tasks to the selected nodes
4. Optional distribution & binding of tasks to the allocated CPUs within a node

60 CPU Management

Allocation: assignment of a specific set of CPU resources (nodes, sockets, cores and/or threads) to a specific job or step.
Distribution: (1) assignment of a specific task to a specific node; (2) assignment of a specific task to a specific set of CPUs within a node (used for optional task-to-CPU binding).
Core Binding: confinement/locking of a specific set of tasks to a specific set of CPUs within a node.

61 CPU Management with SLURM

Selection and allocation of resources with sbatch:
#SBATCH --partition=parallel
#SBATCH --nodes=6
#SBATCH --constraint=intel20
#SBATCH --mem=512
#SBATCH --ntasks=12
#SBATCH --cpus-per-task=3

Distribution with srun:
srun --distribution=block:cyclic ./my_program

Core binding (process/task binding) with srun:
srun --cpu_bind=cores ./my_program
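Put together, a complete batch script using exactly these allocation, distribution and binding options might look like this sketch (./my_program is a placeholder):

#!/bin/bash
#SBATCH --partition=parallel
#SBATCH --nodes=6
#SBATCH --constraint=intel20
#SBATCH --mem=512
#SBATCH --ntasks=12
#SBATCH --cpus-per-task=3

# distribute tasks block-wise over nodes and cyclically over sockets,
# then pin each task to its allocated cores
srun --distribution=block:cyclic --cpu_bind=cores ./my_program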

62 HOW-TO

First you have to log in to one of the login nodes.
Prepare a batch script with your requirements.
Execute the batch script to run your application.
Monitor the batch script on the terminal.

63 Login to one of the Login Nodes

64 Executing the Batch Script to run your Application

65 Monitoring the Batch Script on the Terminal

squeue -u <user>
scancel <jobid>

66 Monitoring the Batch Script on the Terminal

scontrol show job <jobid>

67 Setting of the LOEWE-CSC Cluster

Diagram: Intel, AMD and GPU nodes (CPU cores, RAM, graphics cards, local I/O and HDD) connected by the interconnect fabric to the storage systems.

Important parameters for sbatch:
-p, --partition
-C, --constraint
-J, --job-name
-t, --time
-N, --nodes
--mem-per-cpu
-n, --ntasks
-c, --cpus-per-task

68 Job Scripts: Examples

Listing 15: Parallel MPI job

#!/bin/bash
#SBATCH --job-name=parallelmpi
#SBATCH --output=expscript-%j.out
#SBATCH --error=expscript-%j.err
#
#SBATCH --partition=parallel
#SBATCH --constraint=dual
#
#SBATCH --ntasks=4
#SBATCH --time=00:10:00
#SBATCH --mem-per-cpu=100

module load mpi/mvapich2/gcc/2.0
mpiexec helloworld.mpi

69 Job Scripts: Examples

Listing 16: OpenMP job

#!/bin/bash
#SBATCH --job-name=parallelopenmp
#SBATCH --output=expscript-%j.out
#SBATCH --error=expscript-%j.err
#
#SBATCH --partition=parallel
#SBATCH --constraint=dual
#
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=00:10:00
#SBATCH --mem-per-cpu=100

export OMP_NUM_THREADS=4
./helloworld.omp

70 Listing 17: OpenMP job

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=16

export OMP_NUM_THREADS=16
srun -n 1 --cpus-per-task $OMP_NUM_THREADS ./application

Diagram (physical view): Node 1 with Socket 1 (cores 0-7) and Socket 2 (cores 8-15).

71 Job Scripts: Examples

Listing 18: MPI job

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --tasks-per-node=16
#SBATCH --cpus-per-task=1

srun -n 32 ./application

Diagram: Node 1 and Node 2, each with Socket 1 (cores 0-7) and Socket 2 (cores 8-15).

72 Job Scripts: Examples

Listing 19: Hybrid MPI/OpenMP job

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --tasks-per-node=4
#SBATCH --cpus-per-task=4

export OMP_NUM_THREADS=4
srun -n 8 --cpus-per-task $OMP_NUM_THREADS ./application

Diagram: ranks 0-3 run on node 1 and ranks 4-7 on node 2; each rank is bound to four cores, two on each socket (cores 0-7 on socket 1, cores 8-15 on socket 2).

73 Submitting a Batch Script

Suppose you need 16 cores. You can control how the cores are allocated using the --cpus-per-task and --ntasks-per-node options. With those options there are several ways to get the same allocation.

Example: in terms of resource allocation, these two requests are equivalent:
--nodes=4 --ntasks=4 --cpus-per-task=4
--ntasks=16 --ntasks-per-node=4

Note: with the first request, srun launches 4 processes, whereas mpirun launches 16 processes.

74 Submitting a Batch Script

Suppose you need 16 cores. Examples for --cpus-per-task and --ntasks-per-node:

you use MPI and do not care where those cores are distributed: --ntasks=16
you launch 16 independent processes (no communication): --ntasks=16
you want those cores to spread across distinct nodes: --ntasks=16 --ntasks-per-node=1 or --ntasks=16 --nodes=16
you want those cores to spread across distinct nodes with no interference from other jobs: --ntasks=16 --nodes=16 --exclusive

75 Submitting a Batch Script

Suppose you need 16 cores. Examples for --cpus-per-task and --ntasks-per-node:

16 processes spread across 8 nodes, two processes per node: --ntasks=16 --ntasks-per-node=2
16 processes staying on the same node: --ntasks=16 --ntasks-per-node=16
one process that can use 16 cores for multithreading: --ntasks=1 --cpus-per-task=16
4 processes that can use 4 cores each for multithreading: --ntasks=4 --cpus-per-task=4
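For instance, the last request in the list (4 processes with 4 cores each) becomes the following batch-script sketch, a typical hybrid layout (./hybrid_prog is a placeholder):

#!/bin/bash
#SBATCH --partition=parallel
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=4
#SBATCH --time=00:30:00

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
srun ./hybrid_prog    # 4 MPI ranks with 4 OpenMP threads each = 16 cores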

76 Submitting a Batch Script

Example for --mem and --mem-per-cpu: if you request two cores (-n 2) and 4 GB with --mem, each core will receive 2 GB of RAM. If you specify 4 GB with --mem-per-cpu instead, each core will receive 4 GB, for a total of 8 GB.
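The two memory requests from this example look like this in a batch-script header (SLURM interprets the values as megabytes, so 4096 stands for 4 GB):

# variant 1: 4 GB for the whole job, i.e. 2 GB per core
#SBATCH --ntasks=2
#SBATCH --mem=4096

# variant 2: 4 GB for EACH core, i.e. 8 GB in total
#SBATCH --ntasks=2
#SBATCH --mem-per-cpu=4096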

77 Different Batch-Scripts Cluster Computing in Frankfurt Goethe University in Frankfurt/Main Center for Scientific Computing December 12, 2017

78 OpenMP Example

Listing 20: OpenMP

#!/bin/bash
#SBATCH --job-name=openmpexp
#SBATCH --output=expscript.out
#SBATCH --error=expscript.err
#SBATCH --partition=parallel
#SBATCH --constraint=dual
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=24
#SBATCH --mem-per-cpu=200
#SBATCH --time=48:00:00
#SBATCH --mail-type=all
#
export OMP_NUM_THREADS=24
./example_program

If your application needs 4800 MB and you want to run 24 threads, set --mem-per-cpu=200 (4800/24 = 200).

79 MPI Example

Listing 21: MPI

#!/bin/bash
#SBATCH --job-name=mpiexp
#SBATCH --output=expscript.out
#SBATCH --error=expscript.err
#SBATCH --partition=parallel
#SBATCH --constraint=dual
#SBATCH --ntasks=96           # SLURM_NTASKS = number of OpenMPI ranks
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1200    # 1200 MB of RAM are allocated for each rank
#SBATCH --time=48:00:00
#SBATCH --mail-type=all
#
module load openmpi/gcc/1.8.1
export OMP_NUM_THREADS=1
mpirun -np 96 ./example_program


81 Small MPI Example

Listing 23: small MPI example

#!/bin/bash
#SBATCH --job-name=smallmpiexp
#SBATCH --output=expscript.out
#SBATCH --error=expscript.err
#SBATCH --partition=parallel
#SBATCH --constraint=dual
#SBATCH --ntasks=24
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=
#SBATCH --time=48:00:00
#SBATCH --mail-type=fail
#
export OMP_NUM_THREADS=1
mpirun -np 12 ./program input01 >& 01.out &
sleep 3
mpirun -np 12 ./program input02 >& 02.out &
wait

If you have several 12-rank MPI jobs, you can start more than one computation within a single allocation.

82 Hybrid MPI+OpenMP Example

Listing 24: Hybrid MPI+OpenMP

#!/bin/bash
#SBATCH --job-name=hybridexp
#SBATCH --output=expscript.out
#SBATCH --error=expscript.err
#SBATCH --partition=parallel
#SBATCH --constraint=dual
#SBATCH --ntasks=24           # 24 ranks
#SBATCH --cpus-per-task=6     # 6 threads each
#SBATCH --mem-per-cpu=200     # 200 MB per thread
#SBATCH --time=48:00:00
#SBATCH --mail-type=all
#
export OMP_NUM_THREADS=6
export MV2_ENABLE_AFFINITY=0
srun -n 24 ./example_program

24 ranks x 6 threads = 144 cores: you will get six 24-core nodes.


86 Job Scripts: Toy Examples

Listing 28: Hybrid job with simultaneous multithreading (SMT)

#!/bin/bash

#SBATCH -J TestHybrid
#SBATCH --ntasks=6
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=24
#SBATCH -o TestMPI-%j.out
#SBATCH -e TestMPI-%j.err
#SBATCH --time=0:20:00
#SBATCH --partition=parallel

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

srun ./hybrid-prog


88 Closing Remarks Cluster Computing in Frankfurt Goethe University in Frankfurt/Main Center for Scientific Computing December 12, 2017

89 Checklist for Successful Cluster Usage

Does my account exist?
I know how to access the cluster.
I know the parallel behaviour of my software (and I know whether it is parallel at all).
I can estimate the runtime behaviour and memory usage of my software.
I know how to run my software on the operating system of the cluster.
I know where to find help when I have problems: the HKHLR members.

90 Summary: Resource Allocation Specifications

Syntax: sbatch mybatchscript.sh

Features of a cluster   -C, --constraint
Node count              -N, --nodes
Node restrictions       -w, --nodelist
Task count              -n, --ntasks
Task specifications     --ntasks-per-node, --ntasks-per-socket, --ntasks-per-core, --cpus-per-task
Memory per node         --mem
Memory per CPU          --mem-per-cpu

91 Cluster Quick Reference Guide (Version 1.0, February 14, 2017), Cluster quick reference Frankfurt, page 1

1 Cluster Usage

Access (Cluster Frankfurt):
LOEWE: ssh <username>@loewe-csc.hhlr-gu.de
FUCHS: ssh <username>@hhlr.csc.uni-frankfurt.de
Go to CSC-Website/Access to get an account on the clusters. For LOEWE, the project manager has to send a request to Prof. Lüdde to get CPU time for research projects.

Getting help (Cluster Frankfurt):
You will find further information about usable commands on the clusters with man <command>.

How to execute mybatchscript.sh:
1. First log in to one of the login nodes.
2. Prepare a batch script with your requirements.
3. Execute the batch script to run your application.

Modules (setting program environments), syntax: module <command> <modulename>:
avail                                     display all available modules
list                                      display all loaded modules
load | add <module>                       load a module
load unstable                             load a deprecated or unstable module
unload | rm <module>                      unload a module
switch | swap <old-module> <new-module>   first unloads the old module, then loads the new one
purge                                     unload all currently loaded modules

How to use custom modules:
1. Write a module file in Tcl (look for examples in /cm/shared/modulefiles) to set environment variables.
2. module load use.own enables you to load your own modules.
3. module load ~/privatemodules/modulename
4. Use the facilities provided by module.

LOEWE hardware (Cluster Frankfurt):
438 nodes: AMD Opteron 6172 Magny-Cours, 2.10 GHz, 2 sockets / 24 cores, 64 GB RAM, 1x ATI Radeon HD5870 (1 GB)
Intel Xeon E5-2670v2 (Ivy Bridge) nodes: 2 sockets / 20 cores, 128 GB RAM
139 Intel Xeon E5-2640v2 nodes: 2 sockets / 20 cores, 128 GB RAM
GPU nodes: Intel Xeon E5-2650v2 (Ivy Bridge), 2 sockets / 12 cores, 128 GB RAM, 2x AMD FirePro S10000
The architecture is selectable via the --constraint option: dual = dual-socket AMD Magny-Cours CPU/GPU nodes, intel20 = dual-socket Intel Ivy Bridge CPU nodes, broadwell = dual-socket Intel Broadwell CPU nodes.

FUCHS hardware (Cluster Frankfurt):
Magny-Cours: 72 nodes, dual socket
Magny-Cours: 36 nodes, quad socket
Istanbul: 250 nodes, dual socket, 12 CPUs, 32/64 GB RAM
The architecture is called with --constraint: magnycours = 72 dual-socket AMD Magny-Cours nodes, dual = 250 dual-socket Istanbul nodes, quad = 36 quad-socket AMD Magny-Cours nodes; --constraint="magnycours|dual" to avoid the quad nodes.

Contact (HPC Frankfurt):
If you have any HPC questions about SLURM and want help with debugging & optimizing your program, please write to hpc-support@csc.uni-frankfurt.de. You can contact the system administrators if you need software to be installed: support@csc.uni-.... Detailed documentation on using the cluster can be found on the CSC-Website.

Partitions (Cluster Frankfurt):
Cluster   Partition   Run time
LOEWE     parallel    30d
LOEWE     gpu         30d
LOEWE     test        1h
FUCHS     parallel    30d
FUCHS     test        12h
The maximum array size of the cluster is limited. To view this information on the cluster, use:
sacctmgr list QOS partition format=maxnodes,maxnodesperuser,maxjobsperuser,maxsubmitjobsperuser
scontrol show partition
sinfo -p partition
squeue -p partition

Partition descriptions (LOEWE):
parallel: a mix of AMD Magny-Cours nodes, Intel Xeon Ivy Bridge & Broadwell nodes.
gpu: dual-socket Intel Xeon Ivy Bridge E5-2650v2 CPU/GPU nodes, each with two AMD FirePro S10000 dual-GPU cards. --constraint=gpu has become obsolete, use --partition=gpu instead. Mixed node types such as gpu*3&intel20*2 are possible; ensure that the number of nodes you request matches the number of nodes in your constraints.

Per-user resource limits (Cluster Frankfurt):
MaxNodes       max number of nodes
MaxNodesPU     max number of nodes to use at the same time
MaxJobsPU      max number of jobs to run simultaneously
MaxSubmitPU    max number of jobs in running or pending state
MaxArraySize   max job array size

File systems (storage systems):
mountpoint    /home            /scratch      /local         /data0[1|2]
size          10 GB per user   764 TB        1.4 TB         500 TB each
access time   slow             fast          fast           slow
system        NFS              FhGFS         ext3           NFS
network       Ethernet         InfiniBand    (node-local)   Ethernet

Center for Scientific Computing / Hessisches Kompetenzzentrum für Hochleistungsrechnen

92 Cluster Quick Reference Guide (Version 1.0, February 14, 2017), Cluster quick reference Frankfurt, page 2

Resource Manager (Cluster Frankfurt):
On our systems, compute jobs are managed by SLURM. On the clusters, the node allocation is exclusive. You can find more examples on our CSC-Website/ClusterUsage. In SlurmCommands there is a detailed summary of the different options.

2 Job Submission & Execution
sbatch   batch mode
salloc   interactive mode, allocate resources
srun     run parallel jobs, interactive mode
sinfo    view info about nodes and partitions

Syntax: salloc [options] [<command> [command args]]
Syntax: sbatch mybatchscript.sh
-a, --array=<indexes>             submit a job array
-C, --constraint=<feature>        specify features of a cluster
-c, --cpus-per-task=<ncpus>       threads: how many threads run on the node? (with OpenMP)
-J, --job-name=<job-name>         specify a name for the allocation
-m, --distribution=<block|cyclic|arbitrary|plane>   mapping of processes
--mem=<MB>                        specify real memory required per node
--mem-per-cpu                     minimum memory required per allocated CPU
--mem_bind=<type>                 bind tasks to memory
-N, --nodes=<min[-max]>           nodes: how many nodes will be allocated to this job?
-n, --ntasks=<number>             tasks: how many processes are started? (important for MPI)
-p <partition>                    request a specific partition for the resources
-t <time>                         set a limit on the total run time of the job
-w, --nodelist=<node_name_list>   request a specific list of node names

Example (sbatch, batch mode), execute mybatchscript.sh:
#!/bin/bash
#SBATCH -p parallel        # partition (queue)
#SBATCH -C dual|intel20    # class of nodes
#SBATCH -N   -n   -c 1     # number of nodes / processes / cores
#SBATCH --mem 100          # memory pool for all cores
#SBATCH -t 0-2:00          # time (D-HH:MM)
srun helloworld.sh         # start program

After modulefiles are loaded and resources have been allocated, an application on the assigned node can be started with a preceding srun (run parallel jobs) or mpiexec (run MPI program). In this shell window more applications can be started.

Process binding (constrains each process to run on specific processors):
srun: --cpu_bind   process binding to cores & CPUs
mpirun: --bind-to core|socket|none
mpirun: --cpus-per-proc <#perproc>   bind each process to the specified number of CPUs
mpirun: --report-bindings            report any bindings for launched processes
mpirun: --slot-list <id>             list of processor IDs to be used for binding MPI processes

3 Accounting
sacct   display accounting data. Syntax: sacct [options]
-b, --brief        displays jobid, status, exitcode
-e, --helpformat   print a list of available fields
-o, --format       comma-separated list of fields
sacctmgr   view SLURM account information. Syntax: sacctmgr [options] [command]
list | show   display information about the specified entity

4 Job Management
scancel   cancel a job. Syntax: scancel <jobid>
-u <username>         cancel all the jobs of a user
-t PD -u <username>   cancel all the pending jobs of a user
sinfo   Syntax: sinfo [options]
-i <seconds>         print state on a periodic basis
-l, --long           print more detailed information
-n <nodes>           print info only about the specified node(s)
-p <partition>       print info about the specified partition
-R, --list-reasons   list reasons why nodes are in the down, drained, fail or failing state
-s, --summarize      list only a partition state summary with no node state details
squeue   view job info located in the scheduling queue. Syntax: squeue [options]
-i <seconds>       report the requested information periodically
-j <job_id_list>   print a list of job IDs
-r                 print one job array element per line
--start            report the expected start time & resources to be allocated for pending jobs
-t <state_list>    print jobs in the specified states
-u <user_list>     print jobs from the list of users
scontrol   view the state of a specified entity. Syntax: scontrol [options] [command] ENTITY_ID
Options:
-d, --details    print more details with the show command
-o, --oneliner   print information one line per record
Commands <ENTITY_ID>:
hold <jobid>      pause a particular job
resume <jobid>    resume a particular job
requeue <jobid>   requeue (cancel & rerun) a particular job
suspend <jobid>   suspend a running job
scontrol show ENTITY_ID:
job <job_id>       print job information
node <name>        print node information
partition <name>   print partition information
reservation        print the list of reservations

Center for Scientific Computing / Hessisches Kompetenzzentrum für Hochleistungsrechnen

93 ISC STEM Student Day & STEM Gala

Purpose? HPC skills can positively shape STEM students' future careers; introduction to the current & predicted HPC job landscape and to what the European HPC workforce will look like in 2020.
Audience? Undergraduate and graduate students pursuing STEM degrees.
When? Wednesday, June 27, 9:30 am to 9:30 pm.
What? Where? Day program & evening program in Frankfurt.
Fee? Free admission for STEM students.
Registration? Registration will open in spring 2018, limited to 70 attendees, first come first served.
Infos? See the announcement.

94 Day Program & Evening Program in Frankfurt

Tutorial on HPC Applications, Systems & Programming Languages (Dr.-Ing. Bernd Mohr)
Tutorial on Machine Learning & Data Analytics (Prof. Dr.-Ing. Morris Riedel)
Guided tour of the ISC exhibition & Student Cluster Competition
Keynote by Thomas Sterling at Room Konstant, Forum, Messe Frankfurt
Welcome by Kim McMahon, introduction by Addison Snell
Job fair & dinner at the Marriott Frankfurt

95 Why Attend the STEM Day?

Science, technology, engineering and mathematics rely on HPC, yet STEM degree programs are not including HPC courses in their curricula.
HPC is the technological foundation of machine learning, AI and the Internet of Things.
Well-paying HPC-related jobs are not being filled due to a shortage of HPC skills; depending on your skills, the salary is between $80K and $150K.
The day offers a free introduction to HPC & its role in STEM careers and introduces the organizations that offer training in HPC.

96 Feedback

Participants are asked to fill in a questionnaire about the Cluster Computing course: overall impression; evaluation of the content and aims of the workshop (actuality, comprehensibility, relevance of content, practical relevance, handout); the professional and methodical-didactic competence of the instructor, the presentation, and the structure of the learning content; the exercises, participant orientation, equipment and environment; length and depth of the course; which other courses they did or would join (UNIX, TOOLS, SHELL, CLUSTER, PYTHON, CPP, TOTALVIEW, MAKE, HPC, LIKWID, VAMPIR); missing topics, suggestions, and whether they would recommend the course or use the material later; interest in follow-up Python courses (TDD with Python, Python project development, other topics); how they were informed about the course; and some information about themselves (affiliation, university or institute, faculty).

97 General Contact Information

Thank you for your attention. Questions?

HKHLR / CSC contact: see the quick reference guide (HPC questions: hpc-support@csc.uni-frankfurt.de).
Website: csc.uni-frankfurt.de
Public CSC meeting: every first Wednesday of the month at 10:00 am in the physics building.

98 SLURM Glossary Cluster Computing in Frankfurt Goethe University in Frankfurt/Main Center for Scientific Computing December 12, 2017

99 sbatch: Submitting a Batch Script

Only exclusive nodes are available on the LOEWE-CSC cluster. The following options are therefore not relevant:
--exclusive    exclusive nodes
-s, --share    shared nodes

100 sbatch: Submitting a Batch Script

Syntax: sbatch mybatchscript.sh
-C, --constraint=<feature>   specify features of a cluster

Intel: --constraint=intel20     Intel Ivy Bridge
Intel: --constraint=broadwell   Intel Broadwell
AMD:   --constraint=dual        AMD Magny-Cours with AMD Radeon HD 5800 GPU
GPU:   --partition=gpu          Intel with AMD FirePro S10000

101 sbatch: Submitting a Batch Script

Syntax: sbatch mybatchscript.sh
-p, --partition=<partition>   specify the partition for the resources
-J, --job-name=<jobname>      specify a name for the allocation
-w, --nodelist=<list>         specify a list of node names
-A, --account=<account>       select a project
-t, --time=<time>             set a limit on the total run time of the job
-a, --array=<indexes>         submit a job array
-o, --output=<name-%j>.out    save the output to a file
-e, --error=<name-%j>.err     save the error log to a file


Using a Linux System 6

Using a Linux System 6 Canaan User Guide Connecting to the Cluster 1 SSH (Secure Shell) 1 Starting an ssh session from a Mac or Linux system 1 Starting an ssh session from a Windows PC 1 Once you're connected... 1 Ending an

More information

Introduction to UBELIX

Introduction to UBELIX Science IT Support (ScITS) Michael Rolli, Nico Färber Informatikdienste Universität Bern 06.06.2017, Introduction to UBELIX Agenda > Introduction to UBELIX (Overview only) Other topics spread in > Introducing

More information

Graham vs legacy systems

Graham vs legacy systems New User Seminar Graham vs legacy systems This webinar only covers topics pertaining to graham. For the introduction to our legacy systems (Orca etc.), please check the following recorded webinar: SHARCNet

More information

Guillimin HPC Users Meeting March 17, 2016

Guillimin HPC Users Meeting March 17, 2016 Guillimin HPC Users Meeting March 17, 2016 guillimin@calculquebec.ca McGill University / Calcul Québec / Compute Canada Montréal, QC Canada Outline Compute Canada News System Status Software Updates Training

More information

Submitting batch jobs

Submitting batch jobs Submitting batch jobs SLURM on ECGATE Xavi Abellan Xavier.Abellan@ecmwf.int ECMWF February 20, 2017 Outline Interactive mode versus Batch mode Overview of the Slurm batch system on ecgate Batch basic concepts

More information

Introduction to SLURM & SLURM batch scripts

Introduction to SLURM & SLURM batch scripts Introduction to SLURM & SLURM batch scripts Anita Orendt Assistant Director Research Consulting & Faculty Engagement anita.orendt@utah.edu 23 June 2016 Overview of Talk Basic SLURM commands SLURM batch

More information

Our new HPC-Cluster An overview

Our new HPC-Cluster An overview Our new HPC-Cluster An overview Christian Hagen Universität Regensburg Regensburg, 15.05.2009 Outline 1 Layout 2 Hardware 3 Software 4 Getting an account 5 Compiling 6 Queueing system 7 Parallelization

More information

Heterogeneous Job Support

Heterogeneous Job Support Heterogeneous Job Support Tim Wickberg SchedMD SC17 Submitting Jobs Multiple independent job specifications identified in command line using : separator The job specifications are sent to slurmctld daemon

More information

Introduction to the Cluster

Introduction to the Cluster Follow us on Twitter for important news and updates: @ACCREVandy Introduction to the Cluster Advanced Computing Center for Research and Education http://www.accre.vanderbilt.edu The Cluster We will be

More information

Beginner's Guide for UK IBM systems

Beginner's Guide for UK IBM systems Beginner's Guide for UK IBM systems This document is intended to provide some basic guidelines for those who already had certain programming knowledge with high level computer languages (e.g. Fortran,

More information

How to Use a Supercomputer - A Boot Camp

How to Use a Supercomputer - A Boot Camp How to Use a Supercomputer - A Boot Camp Shelley Knuth Peter Ruprecht shelley.knuth@colorado.edu peter.ruprecht@colorado.edu www.rc.colorado.edu Outline Today we will discuss: Who Research Computing is

More information

Introduction to GALILEO

Introduction to GALILEO Introduction to GALILEO Parallel & production environment Mirko Cestari m.cestari@cineca.it Alessandro Marani a.marani@cineca.it Domenico Guida d.guida@cineca.it Maurizio Cremonesi m.cremonesi@cineca.it

More information

Batch Systems & Parallel Application Launchers Running your jobs on an HPC machine

Batch Systems & Parallel Application Launchers Running your jobs on an HPC machine Batch Systems & Parallel Application Launchers Running your jobs on an HPC machine Partners Funding Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike

More information

Resource Management at LLNL SLURM Version 1.2

Resource Management at LLNL SLURM Version 1.2 UCRL PRES 230170 Resource Management at LLNL SLURM Version 1.2 April 2007 Morris Jette (jette1@llnl.gov) Danny Auble (auble1@llnl.gov) Chris Morrone (morrone2@llnl.gov) Lawrence Livermore National Laboratory

More information

Introduction to HPC2N

Introduction to HPC2N Introduction to HPC2N Birgitte Brydsø HPC2N, Umeå University 4 May 2017 1 / 24 Overview Kebnekaise and Abisko Using our systems The File System The Module System Overview Compiler Tool Chains Examples

More information

June Workshop Series June 27th: All About SLURM University of Nebraska Lincoln Holland Computing Center. Carrie Brown, Adam Caprez

June Workshop Series June 27th: All About SLURM University of Nebraska Lincoln Holland Computing Center. Carrie Brown, Adam Caprez June Workshop Series June 27th: All About SLURM University of Nebraska Lincoln Holland Computing Center Carrie Brown, Adam Caprez Setup Instructions Please complete these steps before the lessons start

More information

For Dr Landau s PHYS8602 course

For Dr Landau s PHYS8602 course For Dr Landau s PHYS8602 course Shan-Ho Tsai (shtsai@uga.edu) Georgia Advanced Computing Resource Center - GACRC January 7, 2019 You will be given a student account on the GACRC s Teaching cluster. Your

More information

Slurm Birds of a Feather

Slurm Birds of a Feather Slurm Birds of a Feather Tim Wickberg SchedMD SC17 Outline Welcome Roadmap Review of 17.02 release (Februrary 2017) Overview of upcoming 17.11 (November 2017) release Roadmap for 18.08 and beyond Time

More information

1 Bull, 2011 Bull Extreme Computing

1 Bull, 2011 Bull Extreme Computing 1 Bull, 2011 Bull Extreme Computing Table of Contents Overview. Principal concepts. Architecture. Scheduler Policies. 2 Bull, 2011 Bull Extreme Computing SLURM Overview Ares, Gerardo, HPC Team Introduction

More information

CRUK cluster practical sessions (SLURM) Part I processes & scripts

CRUK cluster practical sessions (SLURM) Part I processes & scripts CRUK cluster practical sessions (SLURM) Part I processes & scripts login Log in to the head node, clust1-headnode, using ssh and your usual user name & password. SSH Secure Shell 3.2.9 (Build 283) Copyright

More information

Using Compute Canada. Masao Fujinaga Information Services and Technology University of Alberta

Using Compute Canada. Masao Fujinaga Information Services and Technology University of Alberta Using Compute Canada Masao Fujinaga Information Services and Technology University of Alberta Introduction to cedar batch system jobs are queued priority depends on allocation and past usage Cedar Nodes

More information

SLURM: Resource Management and Job Scheduling Software. Advanced Computing Center for Research and Education

SLURM: Resource Management and Job Scheduling Software. Advanced Computing Center for Research and Education SLURM: Resource Management and Job Scheduling Software Advanced Computing Center for Research and Education www.accre.vanderbilt.edu Simple Linux Utility for Resource Management But it s also a job scheduler!

More information

Introduction to GACRC Teaching Cluster PHYS8602

Introduction to GACRC Teaching Cluster PHYS8602 Introduction to GACRC Teaching Cluster PHYS8602 Georgia Advanced Computing Resource Center (GACRC) EITS/University of Georgia Zhuofei Hou zhuofei@uga.edu 1 Outline GACRC Overview Computing Resources Three

More information

Genius Quick Start Guide

Genius Quick Start Guide Genius Quick Start Guide Overview of the system Genius consists of a total of 116 nodes with 2 Skylake Xeon Gold 6140 processors. Each with 18 cores, at least 192GB of memory and 800 GB of local SSD disk.

More information

CALMIP : HIGH PERFORMANCE COMPUTING

CALMIP : HIGH PERFORMANCE COMPUTING CALMIP : HIGH PERFORMANCE COMPUTING Nicolas.renon@univ-tlse3.fr Emmanuel.courcelle@inp-toulouse.fr CALMIP (UMS 3667) Espace Clément Ader www.calmip.univ-toulouse.fr CALMIP :Toulouse University Computing

More information

Scientific Computing in practice

Scientific Computing in practice Scientific Computing in practice Kickstart 2015 (cont.) Ivan Degtyarenko, Janne Blomqvist, Mikko Hakala, Simo Tuomisto School of Science, Aalto University June 1, 2015 slide 1 of 62 Triton practicalities

More information

Introduction to GALILEO

Introduction to GALILEO November 27, 2016 Introduction to GALILEO Parallel & production environment Mirko Cestari m.cestari@cineca.it Alessandro Marani a.marani@cineca.it SuperComputing Applications and Innovation Department

More information

Introduction to GACRC Teaching Cluster

Introduction to GACRC Teaching Cluster Introduction to GACRC Teaching Cluster Georgia Advanced Computing Resource Center (GACRC) EITS/University of Georgia Zhuofei Hou zhuofei@uga.edu 1 Outline GACRC Overview Computing Resources Three Folders

More information

KISTI TACHYON2 SYSTEM Quick User Guide

KISTI TACHYON2 SYSTEM Quick User Guide KISTI TACHYON2 SYSTEM Quick User Guide Ver. 2.4 2017. Feb. SupercomputingCenter 1. TACHYON 2 System Overview Section Specs Model SUN Blade 6275 CPU Intel Xeon X5570 2.93GHz(Nehalem) Nodes 3,200 total Cores

More information

Introduction to High Performance Computing at UEA. Chris Collins Head of Research and Specialist Computing ITCS

Introduction to High Performance Computing at UEA. Chris Collins Head of Research and Specialist Computing ITCS Introduction to High Performance Computing at UEA. Chris Collins Head of Research and Specialist Computing ITCS Introduction to High Performance Computing High Performance Computing at UEA http://rscs.uea.ac.uk/hpc/

More information

Using the SLURM Job Scheduler

Using the SLURM Job Scheduler Using the SLURM Job Scheduler [web] [email] portal.biohpc.swmed.edu biohpc-help@utsouthwestern.edu 1 Updated for 2015-05-13 Overview Today we re going to cover: Part I: What is SLURM? How to use a basic

More information

Applications Software Example

Applications Software Example Applications Software Example How to run an application on Cluster? Rooh Khurram Supercomputing Laboratory King Abdullah University of Science and Technology (KAUST), Saudi Arabia Cluster Training: Applications

More information

Introduction to NCAR HPC. 25 May 2017 Consulting Services Group Brian Vanderwende

Introduction to NCAR HPC. 25 May 2017 Consulting Services Group Brian Vanderwende Introduction to NCAR HPC 25 May 2017 Consulting Services Group Brian Vanderwende Topics we will cover Technical overview of our HPC systems The NCAR computing environment Accessing software on Cheyenne

More information

Introduction to GACRC Teaching Cluster

Introduction to GACRC Teaching Cluster Introduction to GACRC Teaching Cluster Georgia Advanced Computing Resource Center (GACRC) EITS/University of Georgia Zhuofei Hou zhuofei@uga.edu 1 Outline GACRC Overview Computing Resources Three Folders

More information

Compiling applications for the Cray XC

Compiling applications for the Cray XC Compiling applications for the Cray XC Compiler Driver Wrappers (1) All applications that will run in parallel on the Cray XC should be compiled with the standard language wrappers. The compiler drivers

More information

Our Workshop Environment

Our Workshop Environment Our Workshop Environment John Urbanic Parallel Computing Scientist Pittsburgh Supercomputing Center Copyright 2015 Our Environment Today Your laptops or workstations: only used for portal access Blue Waters

More information

Training day SLURM cluster. Context Infrastructure Environment Software usage Help section SLURM TP For further with SLURM Best practices Support TP

Training day SLURM cluster. Context Infrastructure Environment Software usage Help section SLURM TP For further with SLURM Best practices Support TP Training day SLURM cluster Context Infrastructure Environment Software usage Help section SLURM TP For further with SLURM Best practices Support TP Context PRE-REQUISITE : LINUX connect to «genologin»

More information

How to access Geyser and Caldera from Cheyenne. 19 December 2017 Consulting Services Group Brian Vanderwende

How to access Geyser and Caldera from Cheyenne. 19 December 2017 Consulting Services Group Brian Vanderwende How to access Geyser and Caldera from Cheyenne 19 December 2017 Consulting Services Group Brian Vanderwende Geyser nodes useful for large-scale data analysis and post-processing tasks 16 nodes with: 40

More information

HPC Middle East. KFUPM HPC Workshop April Mohamed Mekias HPC Solutions Consultant. Agenda

HPC Middle East. KFUPM HPC Workshop April Mohamed Mekias HPC Solutions Consultant. Agenda KFUPM HPC Workshop April 29-30 2015 Mohamed Mekias HPC Solutions Consultant Agenda 1 Agenda-Day 1 HPC Overview What is a cluster? Shared v.s. Distributed Parallel v.s. Massively Parallel Interconnects

More information

Training day SLURM cluster. Context. Context renewal strategy

Training day SLURM cluster. Context. Context renewal strategy Training day cluster Context Infrastructure Environment Software usage Help section For further with Best practices Support Context PRE-REQUISITE : LINUX connect to «genologin» server Basic command line

More information

Advanced Topics in High Performance Scientific Computing [MA5327] Exercise 1

Advanced Topics in High Performance Scientific Computing [MA5327] Exercise 1 Advanced Topics in High Performance Scientific Computing [MA5327] Exercise 1 Manfred Liebmann Technische Universität München Chair of Optimal Control Center for Mathematical Sciences, M17 manfred.liebmann@tum.de

More information

Using Cartesius and Lisa. Zheng Meyer-Zhao - Consultant Clustercomputing

Using Cartesius and Lisa. Zheng Meyer-Zhao - Consultant Clustercomputing Zheng Meyer-Zhao - zheng.meyer-zhao@surfsara.nl Consultant Clustercomputing Outline SURFsara About us What we do Cartesius and Lisa Architectures and Specifications File systems Funding Hands-on Logging

More information

Introduction Workshop 11th 12th November 2013

Introduction Workshop 11th 12th November 2013 Introduction Workshop 11th 12th November Lecture II: Access and Batchsystem Dr. Andreas Wolf Gruppenleiter Hochleistungsrechnen Hochschulrechenzentrum Overview Access and Requirements Software packages

More information

Introduction to HPC Using zcluster at GACRC

Introduction to HPC Using zcluster at GACRC Introduction to HPC Using zcluster at GACRC Georgia Advanced Computing Resource Center University of Georgia Zhuofei Hou, HPC Trainer zhuofei@uga.edu Outline What is GACRC? What is HPC Concept? What is

More information

Submitting batch jobs Slurm on ecgate Solutions to the practicals

Submitting batch jobs Slurm on ecgate Solutions to the practicals Submitting batch jobs Slurm on ecgate Solutions to the practicals Xavi Abellan xavier.abellan@ecmwf.int User Support Section Com Intro 2015 Submitting batch jobs ECMWF 2015 Slide 1 Practical 1: Basic job

More information

Knights Landing production environment on MARCONI

Knights Landing production environment on MARCONI Knights Landing production environment on MARCONI Alessandro Marani - a.marani@cineca.it March 20th, 2017 Agenda In this presentation, we will discuss - How we interact with KNL environment on MARCONI

More information

Introduction to Abel/Colossus and the queuing system

Introduction to Abel/Colossus and the queuing system Introduction to Abel/Colossus and the queuing system November 14, 2018 Sabry Razick Research Infrastructure Services Group, USIT Topics First 7 slides are about us and links The Research Computing Services

More information

Student HPC Hackathon 8/2018

Student HPC Hackathon 8/2018 Student HPC Hackathon 8/2018 J. Simon, C. Plessl 22. + 23. August 2018 J. Simon - Architecture of Parallel Computer Systems SoSe 2018 < 1 > Student HPC Hackathon 8/2018 Get the most performance out of

More information

HPC DOCUMENTATION. 3. Node Names and IP addresses:- Node details with respect to their individual IP addresses are given below:-

HPC DOCUMENTATION. 3. Node Names and IP addresses:- Node details with respect to their individual IP addresses are given below:- HPC DOCUMENTATION 1. Hardware Resource :- Our HPC consists of Blade chassis with 5 blade servers and one GPU rack server. a.total available cores for computing: - 96 cores. b.cores reserved and dedicated

More information

SuperMike-II Launch Workshop. System Overview and Allocations

SuperMike-II Launch Workshop. System Overview and Allocations : System Overview and Allocations Dr Jim Lupo CCT Computational Enablement jalupo@cct.lsu.edu SuperMike-II: Serious Heterogeneous Computing Power System Hardware SuperMike provides 442 nodes, 221TB of

More information

Introduction to GALILEO

Introduction to GALILEO Introduction to GALILEO Parallel & production environment Mirko Cestari m.cestari@cineca.it Alessandro Marani a.marani@cineca.it Alessandro Grottesi a.grottesi@cineca.it SuperComputing Applications and

More information

Introduction to HPC Using zcluster at GACRC

Introduction to HPC Using zcluster at GACRC Introduction to HPC Using zcluster at GACRC On-class PBIO/BINF8350 Georgia Advanced Computing Resource Center University of Georgia Zhuofei Hou, HPC Trainer zhuofei@uga.edu Outline What is GACRC? What

More information

To connect to the cluster, simply use a SSH or SFTP client to connect to:

To connect to the cluster, simply use a SSH or SFTP client to connect to: RIT Computer Engineering Cluster The RIT Computer Engineering cluster contains 12 computers for parallel programming using MPI. One computer, cluster-head.ce.rit.edu, serves as the master controller or

More information

COSC 6374 Parallel Computation. Debugging MPI applications. Edgar Gabriel. Spring 2008

COSC 6374 Parallel Computation. Debugging MPI applications. Edgar Gabriel. Spring 2008 COSC 6374 Parallel Computation Debugging MPI applications Spring 2008 How to use a cluster A cluster usually consists of a front-end node and compute nodes Name of the front-end node: shark.cs.uh.edu You

More information

TITANI CLUSTER USER MANUAL V.1.3

TITANI CLUSTER USER MANUAL V.1.3 2016 TITANI CLUSTER USER MANUAL V.1.3 This document is intended to give some basic notes in order to work with the TITANI High Performance Green Computing Cluster of the Civil Engineering School (ETSECCPB)

More information

Vienna Scientific Cluster: Problems and Solutions

Vienna Scientific Cluster: Problems and Solutions Vienna Scientific Cluster: Problems and Solutions Dieter Kvasnicka Neusiedl/See February 28 th, 2012 Part I Past VSC History Infrastructure Electric Power May 2011: 1 transformer 5kV Now: 4-5 transformer

More information

LAB. Preparing for Stampede: Programming Heterogeneous Many-Core Supercomputers

LAB. Preparing for Stampede: Programming Heterogeneous Many-Core Supercomputers LAB Preparing for Stampede: Programming Heterogeneous Many-Core Supercomputers Dan Stanzione, Lars Koesterke, Bill Barth, Kent Milfeld dan/lars/bbarth/milfeld@tacc.utexas.edu XSEDE 12 July 16, 2012 1 Discovery

More information

OpenMP threading on Mio and AuN. Timothy H. Kaiser, Ph.D. Feb 23, 2015

OpenMP threading on Mio and AuN. Timothy H. Kaiser, Ph.D. Feb 23, 2015 OpenMP threading on Mio and AuN. Timothy H. Kaiser, Ph.D. Feb 23, 2015 Abstract The nodes on Mio have between 8 and 24 cores each. AuN nodes have 16 cores. Mc2 nodes also have 16 cores each. Many people

More information

ACEnet for CS6702 Ross Dickson, Computational Research Consultant 29 Sep 2009

ACEnet for CS6702 Ross Dickson, Computational Research Consultant 29 Sep 2009 ACEnet for CS6702 Ross Dickson, Computational Research Consultant 29 Sep 2009 What is ACEnet? Shared resource......for research computing... physics, chemistry, oceanography, biology, math, engineering,

More information

Parallel Applications on Distributed Memory Systems. Le Yan HPC User LSU

Parallel Applications on Distributed Memory Systems. Le Yan HPC User LSU Parallel Applications on Distributed Memory Systems Le Yan HPC User Services @ LSU Outline Distributed memory systems Message Passing Interface (MPI) Parallel applications 6/3/2015 LONI Parallel Programming

More information

Cornell Theory Center 1

Cornell Theory Center 1 Cornell Theory Center Cornell Theory Center (CTC) is a high-performance computing and interdisciplinary research center at Cornell University. Scientific and engineering research projects supported by

More information

Department of Informatics V. Tsunami-Lab. Session 4: Optimization and OMP Michael Bader, Alex Breuer. Alex Breuer

Department of Informatics V. Tsunami-Lab. Session 4: Optimization and OMP Michael Bader, Alex Breuer. Alex Breuer Tsunami-Lab Session 4: Optimization and OMP Michael Bader, MAC-Cluster: Overview Intel Sandy Bridge (snb) AMD Bulldozer (bdz) Product Name (base frequency) Xeon E5-2670 (2.6 GHz) AMD Opteron 6274 (2.2

More information

Lab: Scientific Computing Tsunami-Simulation

Lab: Scientific Computing Tsunami-Simulation Lab: Scientific Computing Tsunami-Simulation Session 4: Optimization and OMP Sebastian Rettenberger, Michael Bader 23.11.15 Session 4: Optimization and OMP, 23.11.15 1 Department of Informatics V Linux-Cluster

More information

University at Buffalo Center for Computational Research

University at Buffalo Center for Computational Research University at Buffalo Center for Computational Research The following is a short and long description of CCR Facilities for use in proposals, reports, and presentations. If desired, a letter of support

More information

Cluster Clonetroop: HowTo 2014

Cluster Clonetroop: HowTo 2014 2014/02/25 16:53 1/13 Cluster Clonetroop: HowTo 2014 Cluster Clonetroop: HowTo 2014 This section contains information about how to access, compile and execute jobs on Clonetroop, Laboratori de Càlcul Numeric's

More information

Our Workshop Environment

Our Workshop Environment Our Workshop Environment John Urbanic Parallel Computing Scientist Pittsburgh Supercomputing Center Copyright 2018 Our Environment Today Your laptops or workstations: only used for portal access Bridges

More information

Habanero Operating Committee. January

Habanero Operating Committee. January Habanero Operating Committee January 25 2017 Habanero Overview 1. Execute Nodes 2. Head Nodes 3. Storage 4. Network Execute Nodes Type Quantity Standard 176 High Memory 32 GPU* 14 Total 222 Execute Nodes

More information

STARTING THE DDT DEBUGGER ON MIO, AUN, & MC2. (Mouse over to the left to see thumbnails of all of the slides)

STARTING THE DDT DEBUGGER ON MIO, AUN, & MC2. (Mouse over to the left to see thumbnails of all of the slides) STARTING THE DDT DEBUGGER ON MIO, AUN, & MC2 (Mouse over to the left to see thumbnails of all of the slides) ALLINEA DDT Allinea DDT is a powerful, easy-to-use graphical debugger capable of debugging a

More information