Introduction Workshop, 11th/12th November
Lecture II: Access and Batch System
Dr. Andreas Wolf, Gruppenleiter Hochleistungsrechnen, Hochschulrechenzentrum
Overview
- Access and Requirements
- Software packages
- Module system
- Batch system
- Queueing rules
- Commands
- Batch script examples for MPI and OpenMP
Access and Requirements
Access Requirements for the new Cluster
Because of the size of the system, each potential user must first be vetted against the export-control regulations (export control/embargo) of the Bundesamt für Wirtschaft und Ausfuhrkontrolle (BAFA).
- Email to HHLR@hrz.tu-darmstadt.de
- The new user rules document (Nutzungsordnung) requires:
  - Name, TU-ID, email address, institute and institutional affiliation, citizenship
  - Project title
  - Reports
  - No private data (email, pictures, etc.)
  - No commercial use
  - Limited data storage lifetime
http://www.hhlr.tu-darmstadt.de
Get Access and be Connected
Registration:
- Send the filled-in user rules document to us
- Wait for the automated answer granting access
- Follow the instructions to set up your password via ANDO
- Accept the additional email for registering to the HPC-Beta email list (here we will inform you about news and trouble)
Get connected:
- Windows: download an SSH client, e.g. PuTTY
- Linux: SSH is typically already installed
Access to the Lichtenberg Cluster (lcluster) via SSH
Login nodes:
- lcluster1.hrz.tu-darmstadt.de
- lcluster2.hrz.tu-darmstadt.de
- lcluster3.hrz.tu-darmstadt.de
- lcluster4.hrz.tu-darmstadt.de

<somewhere>:~> ssh <user name>@lcluster1.hrz.tu-darmstadt.de
<user name>@lcluster1.hrz.tu-darmstadt.de's password:
<user name>@hla0001:~>
Data Transfer between your Client and the Cluster
Via the SCP command (part of the SSH tools); transfers go over the login nodes:

<somewhere>:~> scp <file name> <user name>@lcluster1.hrz.tu-darmstadt.de:
Password:
<file name> 100% <Bytes> <1.0>KB/s <01:00>

<somewhere>:~> scp <user name>@lcluster1.hrz.tu-darmstadt.de:<file name> <dir>
Password:
<file name> 100% <Bytes> <1.0>KB/s <01:00>
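The two transfer directions above can be parameterised in a small shell sketch. The username `jsmith` and file name `results.tar.gz` are made-up examples; the commands are only echoed here (a dry run), so the sketch runs anywhere. On the cluster, drop the `echo` to perform the real transfer.

```shell
# Assumed example values -- replace with your own account and file
user=jsmith
host=lcluster1.hrz.tu-darmstadt.de
file=results.tar.gz

# Upload to the home directory (a trailing ':' or ':~/' targets $HOME)
upload="scp $file $user@$host:~/"
# Download into the current directory
download="scp $user@$host:~/$file ."

# Echoed as a dry run; remove 'echo' for real use
echo "$upload"
echo "$download"
```

For whole directories, scp's -r option copies recursively.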
HPC Newsletter
Email newsletter: HPC@lists.tu-darmstadt.de
Subscription: https://lists.tu-darmstadt.de/mailman/listinfo/hpc
Information about:
- Planned events, user meetings
- Planned lectures, workshops, etc.
- General system information and news
Software Packages / Module System
Software available / installable
Operating system: SLES (SP2), x86-64, 64-bit
System tools:
- GCC 4.4.6, 4.7.2, 4.8.1
- Intel 12.1.0, 13.0.1 (incl. Intel Cluster Studio XE, see the last workshop)
- PGI 13.1
- ACML, Intel MKL, ScaLAPACK, ...
- OpenMPI 1.6.5, Intel MPI, ...
- TotalView (Dr. S. Boldyrev), Vampir (C. Iwainsky)
Applications:
- ANSYS 14.0, 14.5, ...
- Abaqus 6.12-1
- MATLAB 2012a
- COMSOL 4.3
Module Load and Unload - 1
> module list
Shows all currently loaded software environments (the user's loaded packages).
> module load <module name>
Loads a specific software environment module. The software is only really usable once the module has been loaded successfully!

<user name>@hla0001:~> module load ansys
Modules: loading ansys/14.5
<user name>@hla0001:~>

> module unload <module name>
Unloads a software module.
Module Load and Unload - 2
> module avail
Shows all available software packages currently installed.

<user name>@hla0001:~> module avail
-------------------- /shared/modulefiles --------------------
abaqus/6.12-1        gcc/4.6.4            openmpi/gcc/1.6.5(def
ansys/13.0           gcc/4.7.3(default)   openmpi/intel/1.6.5
ansys/14.5(default)  gcc/4.8.1            openspeedshop/2.0.2(
<user name>@hla0001:~>
Batch System
Queueing / Scheduling
Different queues for different purposes:
- deflt: limited to a maximum of 24 hours; main part of all compute nodes
- long: limited to a maximum of 7 days; very small part (~8) of all compute nodes
- short: limited to a maximum of 30 minutes
Advantages:
- Depending on demand, reserved nodes for short jobs; on all other nodes, highest scheduling priority
- Most compute nodes can be taken down for maintenance within 24 hours, because the main focus is on 24-hour jobs
- Small test jobs (30 minutes) are scheduled promptly
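The walltime limits above suggest a simple rule of thumb for choosing a queue. A minimal sketch — the `pick_queue` helper is hypothetical, not part of LSF, and uses only the limits stated on this slide (short up to 30 minutes, deflt up to 24 hours, long beyond that):

```shell
# Hypothetical helper: choose a queue from the requested walltime in minutes
pick_queue() {
  minutes=$1
  if [ "$minutes" -le 30 ]; then
    echo short          # test jobs up to 30 minutes
  elif [ "$minutes" -le 1440 ]; then
    echo deflt          # up to 24 hours
  else
    echo long           # up to 7 days
  fi
}

pick_queue 20     # -> short
pick_queue 600    # -> deflt
pick_queue 10080  # -> long
```

The queue is then selected in the batch script, e.g. with a #BSUB -q <queue> line.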
Additional Queues
Special queues (may be changed in the future):
- multi: limited to a maximum of 24 hours; inter-island computing only; MPI section
- testmem: limited to a maximum of 24 hours; 1-TByte memory nodes; MEM section
- testnvd: limited to a maximum of 24 hours; Nvidia Tesla accelerators; ACC-G section
- testphi: limited to a maximum of 24 hours; Intel Xeon Phi accelerators; ACC-M section
Batch System - LSF
Why use LSF?
- Scalability for a large number of nodes
- Professional support (for fine tuning)
- WebGUI: a graphical front-end for creating, submitting, and monitoring batch jobs (at present unfortunately not ready for use; coming later)
- Usable also from a Windows client
Batch System Web GUI
Simply connect to the first login node 'lcluster1' via web browser (available later).
Batch System Commands - LSF
> bsub < <batch script>
Submits a new batch job to the queueing system.
> bqueues
Shows all queues with the numbers of pending and running batch jobs.
> bkill <batch-id>
Deletes one of your own batch jobs (with the given ID).
> bjobs <batch-id>
Shows specific configuration or runtime information for a batch job and its batch-id number.
LSF - bsub & bkill
<user name>@hla0001:~> bsub < sample-script.sh
Job <12345> is submitted to default queue <deflt>.
<user name>@hla0001:~> bkill 12345
Job <12345> is being terminated
<user name>@hla0001:~>

The redirection sign < is important when submitting a script.
LSF - bqueues
<user name>@hla0001:~> bqueues
QUEUE_NAME  PRIO  STATUS       MAX  JL/U  JL/P  JL/H  NJOBS   PEND    RUN  SUSP
short        100  Open:Active    -     -     -     -      0      0      0     0
multi         11  Open:Active    -     -     -     -      0      0      0     0
deflt         10  Open:Active    -     -     -     -  39731  30080   9651     0
testnvd       10  Open:Active    -     -     -     -      0      0      0     0
testphi       10  Open:Active    -     -     -     -      0      0      0     0
testmem       10  Open:Active    -     -     -     -    128     64     64     0
long           1  Open:Active    -     -     -     -   1541   1413    128     0
<user name>@hla0001:~>
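Output like this can be filtered with standard Unix tools, e.g. to list only the queues with pending jobs. A sketch with the slide's numbers inlined as sample data — on the cluster you would pipe `bqueues` directly instead of using the sample variable:

```shell
# Sample bqueues output (numbers taken from the slide above)
bq='QUEUE_NAME PRIO STATUS MAX JL/U JL/P JL/H NJOBS PEND RUN SUSP
short 100 Open:Active - - - - 0 0 0 0
deflt 10 Open:Active - - - - 39731 30080 9651 0
long 1 Open:Active - - - - 1541 1413 128 0'

# Skip the header row; PEND is the 9th column
pending=$(echo "$bq" | awk 'NR > 1 && $9 > 0 { print $1, $9 }')
echo "$pending"   # -> deflt 30080
                  #    long 1413
```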
LSF - bjobs
<user name>@hla0001:~> bjobs
JOBID  USER    STAT  QUEUE    FROM_HOST  EXEC_HOST  JOB_NAME  SUBMIT_TIME   APS
12345  <user>  RUN   testmem  hla0001    64*mem     <title>   Nov  4 14:50    -
12346  <user>  PEND  testmem  hla0001               <title>   Nov  4 14:50  9.1
<user name>@hla0001:~>

Absolute priority scheduling (APS) depends on the user's computing history, a queue factor, and the job size.
Batch Script Examples for MPI and OpenMP
Batch Script for running an MPI Program - 1
#Job name
#BSUB -J MPItest

#File / path where STDOUT will be written; %J is the job id
#(you may also omit the full path)
#BSUB -o /home/<tu-id>/mpitest.out%J

#File / path where STDERR will be written; %J is the job id
#BSUB -e /home/<tu-id>/mpitest.err%J

#Request the time you need for execution, in minutes
#The format for the parameter is: [hour:]minute,
#i.e. for 80 minutes you could also use: 1:20
#BSUB -W 10

#Request the virtual memory you need for your job, in MB
#(a hard limit, but not used for job scheduling)
#BSUB -M 18000
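The [hour:]minute format for -W can be computed from a plain minute count. A minimal sketch — the `to_walltime` helper is hypothetical, for illustration only:

```shell
# Hypothetical helper: convert a minute count to LSF's [hour:]minute format
to_walltime() {
  if [ "$1" -lt 60 ]; then
    echo "$1"                                     # under an hour: minutes only
  else
    printf '%d:%02d\n' $(( $1 / 60 )) $(( $1 % 60 ))
  fi
}

to_walltime 10   # -> 10
to_walltime 80   # -> 1:20
```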
Batch Script for running an MPI Program - 2
...
#Request the number of compute slots / MPI tasks you want to use
#BSUB -n 64

#Specify the MPI support
#BSUB -a openmpi

#Specify your mail address
#BSUB -u <email address>

#Send a mail when the job is done
#BSUB -N
Batch Script for running an MPI Program - 3
...
module load openmpi/gcc/1.6.5
#Check the loaded modules
module list
#(loading modules in the script is safer than relying on the submit environment)

cd ~/<working path>

#Common use, without any parameters - given by LSF
mpirun <program>

#Otherwise, generate an explicit hostfile
echo "$LSB_HOSTS" | sed -e "s/ /\n/g" > hostfile.$LSB_JOBID
#and specify the number of tasks and the hosts
mpirun -n 64 -hostfile hostfile.$LSB_JOBID <program>
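The hostfile step can be tried outside LSF by substituting a sample value for LSB_HOSTS (inside a job, LSF sets this variable to the space-separated list of allocated hosts, one entry per slot; the host names below are made up). `tr` is used here as a portable equivalent of the script's `sed` call:

```shell
# Assumed sample values -- inside a job, LSF sets these variables itself
LSB_HOSTS="hla0010 hla0010 hla0011 hla0011"
LSB_JOBID=12345

# One host name per line, as mpirun's -hostfile option expects
echo "$LSB_HOSTS" | tr ' ' '\n' > hostfile.$LSB_JOBID

cat hostfile.$LSB_JOBID   # four lines, one per allocated slot
```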
Batch Script for running an OpenMP Program
...
#Specify the OpenMP support
#BSUB -a openmp

#Set the number of OpenMP threads
export OMP_NUM_THREADS=16

<OpenMP program call>
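Note that the OMP_NUM_THREADS line must be an active export, not a comment, or the OpenMP runtime will never see it. A minimal check:

```shell
# An active export makes the value visible to the program's environment;
# a line starting with '#' (other than a #BSUB directive) is ignored
export OMP_NUM_THREADS=16
echo "OMP_NUM_THREADS is $OMP_NUM_THREADS"   # -> OMP_NUM_THREADS is 16
```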
Thank you for your attention. Questions?