bwfortreff - bwhpc user meeting
bwhpc Competence Center MLS&WISO: Universitätsrechenzentrum Heidelberg, Rechenzentrum der Universität Mannheim, Steinbuch Centre for Computing (SCC)
Funding: www.bwhpc-c5.de
What is bwfortreff?
Participants:
- Users of bwgrid/bwhpc systems
- Students and scientists interested in HPC
- Members of all bwhpc partner universities
Scope:
- System status bwgrid/bwhpc
- HPC related lectures and workshops
- Questions and discussions
- User contributions
bwfortreff 23.04.2014 - Agenda
Time   Topic
16:15  bwhpc and bwhpc-c5 (H. Kredel, MA)
16:30  bwunicluster (S. Richling, HD)
17:30  bwfilestorage (T. Kienzle, MA)
17:45  Q&A
18:00  End
bwhpc and bwhpc-c5
What is bwhpc/bwhpc-c5?
- bw = Baden-Württemberg
- bwhpc = strategy for high performance computing in BW for 2013 to 2018 (in particular for Tier 3)
- bwhpc-c5 = federated user and IT support activities for bwhpc
bwhpc - Tier Classification
bwhpc: Tier 3 (2013-2018)
- MA/HD: bwforcluster MLS&WISO (Mannheim/Heidelberg)
- KA: bwunicluster (Karlsruhe)
- TU: bwforcluster BinAC (Tübingen)
- UL: bwforcluster TheoChem (Ulm)
- FR: bwforcluster NME (Freiburg)
bwhpc - covered research areas
- Mannheim/Heidelberg: Economics and social sciences, Molecular life science
- Karlsruhe: General purpose, Teaching
- Tübingen: Bioinformatics, Astrophysics
- Ulm: Theoretical chemistry
- Freiburg: Neurosciences, Microsystems engineering, Elementary particle physics
Independent of their location, users from research area X use the science cluster dedicated to X.
Federated HPC@Tier3 (1)
bwunicluster: Uni = universal or university
- Financed by Baden-Württemberg's Ministry of Science, Research and the Arts and the shareholders: Freiburg, Tübingen, KIT, Heidelberg, Ulm, Hohenheim, Konstanz, Mannheim, Stuttgart
Usage:
- Free of charge
- General purpose, teaching & education
- Technical computing (sequential & weakly parallel) & parallel computing
Access / limitations:
- For all members of the shareholder universities
- Quota and computation share based on the university's share
Federated HPC@Tier3 (2)
bwforcluster: For = Forschung (research)
- Financed by the German Research Foundation (DFG) and Baden-Württemberg's Ministry of Science, Research and the Arts
Access:
- All university members in Baden-Württemberg
- For science communities according to the DFG proposal
Usage / limitations:
- Free of charge
- Access only to the bwforcluster matching the user's field of research
- Access requires approval of a compute proposal
What is bwhpc-c5?
C5 = Coordinated compute cluster competence centers: federated user support and IT service activities for bwhpc
For users:
- Send your support requests to the competence centers (CC)
- CCs are not local, but distributed over the whole of BW
- CCs are community specific
For BW:
- Bridging science & HPC
- Bridging HPC tiers
- Embedding bw services
[Diagram: bwhpc-c5 as the bridge between science and the bw HPC infrastructure and services]
bwhpc-c5: Location of project partners
Uni Heidelberg, Uni Mannheim, Uni Stuttgart, HFT Stuttgart, KIT, Uni Hohenheim, HS Esslingen, Uni Tübingen, Uni Freiburg, Uni Konstanz, Uni Ulm
bwhpc-c5: Federated science support
HPC competence centers:
- Formation of an HPC expert pool (related to field of research and knowledge in parallel software development)
- Coordination of tiger team activities (high-level support teams)
- Buildup of a best practice repository
- Coordination of teaching activities
- Evaluation of courses
- Creation of offline and online material (e-learning, MOOC)
Cluster innovations:
- New technology: accelerators etc.
- HPC virtualization, compute cloud
- Innovative cluster access, pre- and post-processing tools
bwhpc-c5: What kind of support?
- Information seminars, hands-on sessions, HPC-specific workshops
- Documentation + best practices repository: www.bwhpc-c5.de/wiki
- Providing/maintaining/developing:
  - simplified access to all bwhpc resources
  - software portfolio
  - cluster-independent & unified user environment
  - tools for data management
  - trouble ticket system
  - cluster information system
- Migration support:
  - code adaptation, e.g. MPI or OpenMP parallelisation
  - code porting (from desktop or old HPC clusters) to Tier 2 and 1
bwhpc-lna: Scientific Steering Committee
LNA = Landesnutzerausschuss (state user committee): scientific steering of bwhpc and bwdata
Website: http://www.bwhpc-c5.de/98.php
Tasks:
- Set bwhpc access formalities
- Assessment of bwhpc workload
- Regulation of bwhpc cluster expansion
- Assignment of science communities to science clusters
- Representation of HPC user interests concerning resource demands, HPC technologies and software licenses, adjustment of resource quotas
Status bwgrid/bwhpc
bwgrid:
- Cluster: Stuttgart, Karlsruhe and Ulm already offline
- Heidelberg/Mannheim: will close in 2014 (running without support)
- Freiburg, Tübingen, Esslingen: will close in 2015 or later
- Storage: closed in 12/2013
bwhpc (Tier 3):
- bwunicluster: available since Q1/2014
- bwforcluster TheoChem and MLS&WISO: in Q3/2014
- bwforcluster BinAC and NME: in 2015
- bwfilestorage: open since 12/2013
bwunicluster - First steps (HD/MA)
1. Intro
Documentation/Literature
- bwunicluster Wiki: http://www.bwhpc-c5.de/wiki
- bwfortreff slides: http://www.urz.uni-heidelberg.de/hpc/bwfortreff.html
- Introduction to Unix/Linux commands: http://freeengineer.org/learnunixin10minutes.html
- Bash scripting:
  - http://tldp.org/howto/bash-prog-intro-howto.html (intro)
  - http://tldp.org/ldp/abs/html (advanced)
- Environment modulefiles: http://modules.sourceforge.net
- MOAB queueing system: http://docs.adaptivecomputing.com/mwm/7-2-6/help.htm
Hardware
bwunicluster - Hardware
Compute nodes:
- 512 thin nodes: 16 cores (2 x 8-core Intel Xeon Sandy Bridge), 64 GB RAM, 2 TB local disk
- 8 fat nodes: 32 cores (4 x 8-core Intel Xeon Sandy Bridge), 1 TB RAM, 7 TB local disk
Interconnect: InfiniBand 4X FDR
Parallel file system: Lustre for $HOME (469 TB) and $WORK (938 TB)
2. Access
bwunicluster Registration (1)
Access to bwunicluster:
- Step A: Application for the bwunicluster entitlement
- Step B: Web registration for bwunicluster
Documentation: http://www.bwhpc-c5.de/wiki/index.php/bwunicluster_user_access
Step A is different for each university:
- University of Heidelberg: apply for the bwunicluster entitlement at https://www.urz.uni-heidelberg.de/landesdienste/bwunicluster
- University of Mannheim: apply for bwgrid Mannheim/Heidelberg at https://sp-grid-webregistration.uni-mannheim.de/
bwunicluster Registration (2)
Step B web registration: https://bwidm.scc.kit.edu/
- Choose your organization

bwunicluster Registration (3a)
- University of Heidelberg: login with Uni-ID

bwunicluster Registration (3b)
- University of Mannheim: login with RUM account

bwunicluster Registration (4)
- Select bwunicluster
- Service description

bwunicluster Registration (5)
- Read the terms of usage and accept

bwunicluster Registration (6)
- Read the registry information
- Look up your localuid
- Deregistration possible

bwunicluster Registration (7)
- Set a service password (required)
bwunicluster - Login
- Name of the login server: bwunicluster.scc.kit.edu
- Login with localuid + service password via SSH
Examples:
HD: $ ssh hd_ab123@bwunicluster.scc.kit.edu
MA: $ ssh ma_amuster@bwunicluster.scc.kit.edu
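Typing the full user and host name each time can be avoided with an OpenSSH host alias; a minimal sketch for ~/.ssh/config, using the Heidelberg example account from above (the alias name uc1 is arbitrary):

```
Host uc1
    HostName bwunicluster.scc.kit.edu
    User hd_ab123
```

Afterwards $ ssh uc1 is enough; the service password is still requested as usual.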
3. Usage
Software modules
Environment modules:
- dynamic modification of the session environment
- instructions stored in modulefiles
Why environment modules?
- multiple versions of the same software can be installed and used in a controlled manner, i.e. by loading and unloading modulefiles
How to use modulefiles in general?
$ module help
More information: http://www.bwhpc-c5.de/wiki/index.php/bwunicluster_environment_modules
Modulefiles (1)
- Display all modulefiles: $ module avail
- Display all modulefiles in category devel: $ module avail devel
- Show the help of a modulefile: $ module help <modulefile>
- List all instructions of a modulefile: $ module show <modulefile>
- Display all loaded modules: $ module list
- Modulefiles are sorted by category, software name and version: $ module load <category>/<software_name>/<version>
- Load the default version of a software: $ module load <category>/<software_name>, e.g. the Intel compiler: $ module load compiler/intel
- Remove a module: $ module unload <modulefile> or $ module remove <modulefile>
Modulefiles (2)
Conflicts:
- Loading a different version of a software in the same session, e.g. Intel:
  compiler/intel/12.1(376):error:150: Module 'compiler/intel/12.1' conflicts with the currently loaded module(s) 'compiler/intel/13.1'
- Loading a module with dependencies on other modules:
  $ module load mpi/openmpi/1.6.5-intel-13.1
  Loading module dependency 'compiler/intel/13.1'.
  compiler/intel/13.1(386):error:150: Module 'compiler/intel/13.1' conflicts with the currently loaded module(s) 'compiler/intel/12.1'
Be aware that you can create inconsistencies: e.g. you can remove compiler/intel/13.1 while mpi/openmpi/1.6.5-intel-13.1 is still loaded.
swap = remove + load, e.g.:
$ module swap compiler/intel/13.1 compiler/intel/12.1
File system characteristics of bwunicluster
$HOME, $WORK and workspaces are on the parallel file system Lustre.
$HOME and $WORK
$HOME:
- Quota: $ lfs quota -u $USER $HOME
$WORK:
- Change to it via: $ cd $WORK
- Quota: $ lfs quota -u $USER $WORK
- Files older than 28 days will be deleted
- Guaranteed lifetime for files is 7 days
Workspaces
Workspaces: allocated folders with a lifetime
Howto: http://www.bwhpc-c5.de/wiki/index.php/bwunicluster_file_system#workspaces
$ ws_allocate foo 10   Allocate a workspace named foo for 10 days
$ ws_list -a           List all your workspaces
$ ws_find foo          Get the absolute path of workspace foo
$ ws_extend foo 5      Extend the lifetime of workspace foo by 5 days from now
$ ws_release foo       Manually erase your workspace foo
Maximum lifetime: 60 days; number of extensions: 3
4. Batch System
Resource and workload manager
- Job submission via MOAB commands
- Use of MOAB commands is planned for all bwforclusters
- Example job submission: $ msub <resource_options> <job_script>
- Compute nodes are shared according to resource requests
- Fairshare-based queues: waiting time depends on your university's share, your job demands, and your demand history
msub options
http://www.bwhpc-c5.de/wiki/index.php/bwunicluster_batch_jobs#msub_command
- msub options can be given on the command line or in the job script
- a command line option overrides the corresponding script option
msub -l resources
http://www.bwhpc-c5.de/wiki/index.php/bwunicluster_batch_jobs#msub_-l_resource_list
- Resources can be combined, but must be separated by commas, e.g.:
  $ msub -l nodes=1:ppn=1,walltime=00:01:00,pmem=1gb <job_script>
- Request exclusive usage of nodes with the option: -l naccesspolicy=singlejob
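Longer resource lists are easier to reuse when kept in a shell variable; a sketch using the example values from above (msub itself exists only on the cluster, so the command is only echoed here):

```shell
# Resource list from the example above, kept in one variable
RES="nodes=1:ppn=1,walltime=00:01:00,pmem=1gb"

# On the cluster you would submit with: msub -l "$RES" job.sh
echo msub -l "$RES" job.sh
```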
msub -q queues
http://www.bwhpc-c5.de/wiki/index.php/bwunicluster_batch_jobs#msub_-q_queues
- If no queue is specified, the job is assigned to develop, singlenode or multinode based on the requested walltime, nodes and processes.
- No automatic assignment to: verylong, fat
Environment variables
MOAB adds a number of variables to the job's environment (e.g. $MOAB_JOBID).
MOAB variables can be used to generalize your job scripts, e.g.:
## add a suffix to the job output file
./program > program_${MOAB_JOBID}.log
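Outside a running job the MOAB variables are unset, so the naming pattern can be tried locally with a placeholder value (a sketch; on the cluster MOAB sets $MOAB_JOBID itself, and "program_" is just an example prefix):

```shell
# Use a placeholder job id when MOAB_JOBID is not set (i.e. outside a job);
# inside a real batch job MOAB provides the actual value
MOAB_JOBID=${MOAB_JOBID:-uc1.22108}

# Same pattern as above: suffix the log file name with the job id
logfile="program_${MOAB_JOBID}.log"
echo "output goes to $logfile"
```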
Check/change the status of your jobs
After submission msub returns the <job-id>:
$ msub job.sh
uc1.22108
Monitoring commands:
$ showq                  All your active, eligible, blocked, and/or recently completed jobs
$ showstart <job-id>     Get information about the start time of job <job-id>
$ showstart 16@12:00:00  Get information about the start time of 16 procs with a run time of 12 hours
$ checkjob <job-id>      Get detailed information on your job; explains why your job is pending
$ showq -c -u $(whoami)  Display your completed jobs
$ canceljob <job-id>     Cancel the job with <job-id>
5. Job scripts
Bash job script
- Define workload manager options via #MSUB
- The job starts in the submit directory
Minimal job script:
#!/bin/bash
#MSUB -l nodes=1:ppn=4
#MSUB -l walltime=00:10:00
# Load required module files
module load mpi/openmpi/1.6.5-gnu-4.4
# Job starts in submit directory, change if necessary
cd $HOME/example
# Start program
mpiexec simple > simple.out
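msub is only available on the cluster, but the shell syntax of a job script can be checked anywhere with bash -n, which parses without executing (a sketch; job.sh is an arbitrary file name, and the #MSUB lines are plain comments to bash):

```shell
# Write the minimal job script from above to a file (quoted heredoc,
# so $HOME etc. are not expanded while writing)
cat > job.sh <<'EOF'
#!/bin/bash
#MSUB -l nodes=1:ppn=4
#MSUB -l walltime=00:10:00
module load mpi/openmpi/1.6.5-gnu-4.4
cd $HOME/example
mpiexec simple > simple.out
EOF

# bash -n only parses: syntax errors are reported, nothing is run
bash -n job.sh && echo "syntax OK"
```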
bwfilestorage
bwfilestorage
- Replacement for the bwgrid central storage
- Location: KIT Karlsruhe
- Starting size: 600 TB
- For users of bwgrid and bwhpc
Requirements:
- Entitlement bwfilestorage (granted to users with bwgrid and/or bwunicluster entitlement)
- Web registration
bwfilestorage Limits
- 100 GB quota for new users
- 40 TB quota per organization
- Snapshots: 7 daily, 4 weekly, 2 monthly
- Backup: for disaster recovery only
- Temp: files unchanged for 7 days are removed
- 100,000 files (soft limit), 200,000 files (hard limit; the soft limit may be exceeded for up to 7 days)
- 100 GB (soft limit), 200 GB (hard limit; the soft limit may be exceeded for up to 7 days)
bwfilestorage Registration
Web registration: https://bwidm.scc.kit.edu/
- Choose your organisation
- University of Heidelberg: login with Uni-ID
- University of Mannheim: login with RUM account
- Read the service description
bwfilestorage Summary
Web registration: https://bwidm.scc.kit.edu/
Hosts:
- bwfilestorage.lsdf.kit.edu / http://bwfilestorage.lsdf.kit.edu
- bwfilestorage-login.lsdf.kit.edu (SSH)
Commands:
$ scp -c aes128-cbc testfile hd_jsmith@bwfilestorage.lsdf.kit.edu:
$ scp -c arcfour128 testfile hd_jsmith@bwfilestorage.lsdf.kit.edu:
Performance you should expect (32 GB file):
- uc1 -> bwfilestorage: 110-150 MB/s (aes128), 3-4 min
- frbw4 -> bwfilestorage: 65-75 MB/s (arcfour128), 8-12 min
For more information please follow the user manual (German):
http://www.scc.kit.edu/downloads/sdm/nutzerhandbuch-bwfilestorage.pdf