bwfortreff bwhpc user meeting

bwfortreff bwhpc user meeting
bwhpc Competence Center MLS&WISO
Universitätsrechenzentrum Heidelberg, Rechenzentrum der Universität Mannheim, Steinbuch Centre for Computing (SCC)
Funding: www.bwhpc-c5.de

What is bwfortreff?
Participants:
Users of bwgrid/bwhpc systems
Students and scientists interested in HPC
Members of all bwhpc partner universities
Scope:
System status of bwgrid/bwhpc
HPC-related lectures and workshops
Questions and discussions
User contributions

bwfortreff 23.04.2014 - Agenda
Time  Topic
16:15 bwhpc and bwhpc-c5 (H. Kredel, MA)
16:30 bwunicluster (S. Richling, HD)
17:30 bwfilestorage (T. Kienzle, MA)
17:45 Q&A
18:00 End

bwhpc and bwhpc-c5

What is bwhpc/bwhpc-c5?
bw = Baden-Württemberg
bwhpc = Strategy for high performance computing in BW for 2013 to 2018 (in particular for Tier 3)
bwhpc-c5 = Federated user and IT support activities for bwhpc

bwhpc - Tier Classification

bwhpc: Tier 3 (2013-2018)
MA/HD: bwforcluster MLS&WISO (Mannheim/Heidelberg)
KA: bwunicluster (Karlsruhe)
TU: bwforcluster BinAC (Tübingen)
UL: bwforcluster TheoChem (Ulm)
FR: bwforcluster NEMO (Freiburg)

bwhpc - covered research areas
Mannheim/Heidelberg: Economics and social sciences, Molecular life science
Karlsruhe: General purpose, Teaching
Tübingen: Astrophysics, Bioinformatics
Ulm: Theoretical chemistry
Freiburg: Neurosciences, Microsystems engineering, Elementary particle physics
Independent of their location, users of research area X use the science cluster for research area X.

Federated HPC@Tier3 (1)
bwunicluster: Uni = universal or University
Financed by Baden-Württemberg's Ministry of Science, Research and the Arts and the shareholders: Freiburg, Tübingen, KIT, Heidelberg, Ulm, Hohenheim, Konstanz, Mannheim, Stuttgart
Usage:
Free of charge
General purpose, teaching & education
Technical computing (sequential & weakly parallel) & parallel computing
Access / limitations:
For all members of the shareholders' universities
Quota and computation share based on the university's share

Federated HPC@Tier3 (2)
bwforcluster: For = Forschung (research)
Financed by the German Research Foundation (DFG) and Baden-Württemberg's Ministry of Science, Research and the Arts
Access:
All university members in Baden-Württemberg
For science communities according to the DFG proposal
Usage, limitations:
Free of charge
Access only to the bwforcluster matching the user's field of research
Access requires approval of a compute proposal

What is bwhpc-c5?
C5 = Coordinated compute cluster competence centers
Federated user support and IT service activities for bwhpc
For users:
Send your support requests to the competence centers (CC)
CCs are not local, but distributed over the whole of BW
CCs are community-specific
For BW:
Bridging science & HPC
Bridging HPC tiers
Embedding services
(Slide diagram: bwhpc-c5 as the bridge between science, bw services and the HPC infrastructure)

bwhpc-c5: Location of project partners
Uni Heidelberg, Uni Mannheim, Uni Stuttgart, HFT Stuttgart, KIT, Uni Hohenheim, HS Esslingen, Uni Tübingen, Uni Freiburg, Uni Konstanz, Uni Ulm

bwhpc-c5: Federated science support
HPC competence centers:
Formation of an HPC expert pool (related to field of research and knowledge in parallel software development)
Coordination of tiger team activities (high-level support teams)
Build-up of a best-practice repository
Coordination of teaching activities
Evaluation of courses
Generation of offline and online material (e-learning, MOOCs)
Cluster innovations:
New technology: accelerators etc.
HPC virtualization, compute cloud
Innovative cluster access, pre- and post-processing tools

bwhpc-c5: What kind of support?
Information seminars, hands-on sessions, HPC-specific workshops
Documentation + best-practices repository: www.bwhpc-c5.de/wiki
Providing/maintaining/developing:
simplified access to all bwhpc resources
software portfolio
cluster-independent & unified user environment
tools for data management
trouble ticket system
cluster information system
Migration support:
code adaptation, e.g. MPI or OpenMP parallelisation
code porting (from desktop or old HPC clusters) to tier 2 and 1

bwhpc-lna: Scientific Steering Committee
LNA = Landesnutzerausschuss: scientific steering of bwhpc and bwdata
Website: http://www.bwhpc-c5.de/98.php
Tasks:
Set bwhpc access formalities
Assessment of bwhpc workload
Regulation of bwhpc cluster expansion
Assignment of science communities to science clusters
Representation of HPC user interests concerning resource demands, HPC technologies and software licenses, and adjustment of resource quotas

Status bwgrid/bwhpc
bwgrid:
Clusters: Stuttgart, Karlsruhe and Ulm already offline
Heidelberg/Mannheim: will close 2014 (running without support)
Freiburg, Tübingen, Esslingen: will close 2015 or later
Storage: closed in 12/2013
bwhpc (Tier 3):
bwunicluster: available since Q1/2014
bwforcluster TheoChem and MLS&WISO: in Q3/2014
bwforcluster BinAC and NEMO: in 2015
bwfilestorage: open since 12/2013

bwunicluster First steps (HD/MA)

1. Intro

Documentation/Literature
bwunicluster Wiki: http://www.bwhpc-c5.de/wiki
bwfortreff slides: http://www.urz.uni-heidelberg.de/hpc/bwfortreff.html
Introduction to Unix/Linux commands: http://freeengineer.org/learnunixin10minutes.html
Bash scripting:
http://tldp.org/howto/bash-prog-intro-howto.html (intro)
http://tldp.org/ldp/abs/html (advanced)
Environment modulefiles: http://modules.sourceforge.net
MOAB queueing system: http://docs.adaptivecomputing.com/mwm/7-2-6/help.htm

2. Hardware

bwunicluster - Hardware
Compute nodes:
512 thin nodes: 16 cores (2 x 8-core Intel Xeon Sandy Bridge), 64 GB RAM, 2 TB local disk
8 fat nodes: 32 cores (4 x 8-core Intel Xeon Sandy Bridge), 1 TB RAM, 7 TB local disk
Interconnect: InfiniBand 4X FDR
Parallel file system: Lustre for $HOME (469 TB) and $WORK (938 TB)

2. Access

bwunicluster Registration (1)
Access to bwunicluster:
Step A: Application for the bwunicluster entitlement
Step B: Web registration for bwunicluster
Documentation: http://www.bwhpc-c5.de/wiki/index.php/bwunicluster_user_access
Step A is different for each university:
University of Heidelberg: apply for the bwunicluster entitlement at https://www.urz.uni-heidelberg.de/landesdienste/bwunicluster
University of Mannheim: apply for bwgrid Mannheim/Heidelberg at https://sp-grid-webregistration.uni-mannheim.de/

bwunicluster Registration (2)
Step B, web registration at https://bwidm.scc.kit.edu/
Choose your organization

bwunicluster Registration (3a)
Step B, web registration at https://bwidm.scc.kit.edu/
University of Heidelberg: login with Uni-ID

bwunicluster Registration (3b)
Step B, web registration at https://bwidm.scc.kit.edu/
University of Mannheim: login with RUM account

bwunicluster Registration (4)
Select the bwunicluster service description

bwunicluster Registration (5)
Read the terms of usage and accept

bwunicluster Registration (6)
Read the registry information
Look up your local user ID (localuid)
Deregistration is possible here

bwunicluster Registration (7)
Set a service password (required)

bwunicluster - Login
Name of the login server: bwunicluster.scc.kit.edu
Login with localuid + service password via SSH
Examples:
HD: $ ssh hd_ab123@bwunicluster.scc.kit.edu
MA: $ ssh ma_amuster@bwunicluster.scc.kit.edu

3. Usage

Software modules
Environment modules:
dynamic modification of the session environment
instructions stored in modulefiles
Why environment modules? Multiple versions of the same software can be installed and used in a controlled manner, i.e. by loading and unloading modulefiles.
How to use modulefiles in general? $ module help
More information: http://www.bwhpc-c5.de/wiki/index.php/bwunicluster_environment_modules

Modulefiles (1)
Display all modulefiles: $ module avail
Display all modulefiles in category devel: $ module avail devel
Show the help of a modulefile: $ module help <modulefile>
List all instructions of a modulefile: $ module show <modulefile>
Display all loaded modules: $ module list
Modulefiles are sorted by category, software name and version: $ module load <category>/<software_name>/<version>
Load the default version of a software: $ module load <category>/<software_name>, e.g. the Intel compiler: $ module load compiler/intel
Remove a module: $ module unload <modulefile> or $ module remove <modulefile>
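Combining the commands above, a typical session could look like the following sketch; the version number is only a placeholder for whatever is actually installed on the cluster:
$ module avail compiler/intel        # which Intel compiler versions are installed?
$ module load compiler/intel/13.1    # load one specific version
$ module list                        # verify that it is loaded
$ module unload compiler/intel/13.1  # clean up again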

Modulefiles (2)
Conflicts:
Loading a different version of the same software in one session fails, e.g. for Intel:
compiler/intel/12.1(376):error:150: Module 'compiler/intel/12.1' conflicts with the currently loaded module(s) 'compiler/intel/13.1'
Loading a module with dependencies on other modules:
$ module load mpi/openmpi/1.6.5-intel-13.1
Loading module dependency 'compiler/intel/13.1'.
compiler/intel/13.1(386):error:150: Module 'compiler/intel/13.1' conflicts with the currently loaded module(s) 'compiler/intel/12.1'
Be aware that you can create inconsistencies: e.g. you can remove compiler/intel/13.1 while mpi/openmpi/1.6.5-intel-13.1 is still loaded.
swap = remove + load, e.g.: $ module swap compiler/intel/13.1 compiler/intel/12.1

File System Characteristics of bwunicluster
$HOME, $WORK and workspaces are on the parallel file system Lustre.

$HOME and $WORK
$HOME:
Quota: $ lfs quota -u $USER $HOME
$WORK:
Change to it via: $ cd $WORK
Quota: $ lfs quota -u $USER $WORK
Files older than 28 days will be deleted; the guaranteed lifetime for files is 7 days.

Workspaces
Workspaces: allocated folders with a lifetime
Howto: http://www.bwhpc-c5.de/wiki/index.php/bwunicluster_file_system#workspaces
$ ws_allocate foo 10 : allocate a workspace named foo for 10 days
$ ws_list -a : list all your workspaces
$ ws_find foo : get the absolute path of workspace foo
$ ws_extend foo 5 : extend the lifetime of workspace foo by 5 days from now
$ ws_release foo : manually erase workspace foo
Maximum lifetime: 60 days
Number of extensions: 3

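Putting the workspace commands together, a minimal workflow sketch might look like this (the workspace name mysim and the file names are hypothetical):
$ ws_allocate mysim 30      # scratch space for a 30-day campaign
$ cd $(ws_find mysim)       # work inside the workspace
$ cp $HOME/input.dat .      # stage input data, run jobs here
$ cp results.tar $HOME/     # save what must survive the lifetime
$ ws_release mysim          # release the workspace when done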

4. Batch System

Resource and workload manager
Job submission via MOAB commands
Use of MOAB commands is planned for all bwforcluster
Example job submission: $ msub <resource_options> <job_script>
Compute nodes are shared according to resource requests
Fairshare-based queues: the waiting time depends on your university's share, your job demands and your demand history

msub options
http://www.bwhpc-c5.de/wiki/index.php/bwunicluster_batch_jobs#msub_command
msub options can be given on the command line or in the job script; a command line option overrides the corresponding option in the script.

msub -l resources
http://www.bwhpc-c5.de/wiki/index.php/bwunicluster_batch_jobs#msub_-l_resource_list
Resources can be combined, but must be separated by commas, e.g.:
$ msub -l nodes=1:ppn=1,walltime=00:01:00,pmem=1gb <job_script>
Request exclusive usage of nodes with the option -l naccesspolicy=singlejob
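As a further illustration, a request for a two-node parallel job on the 16-core thin nodes might look like this (walltime, memory and the script name are purely illustrative):
$ msub -l nodes=2:ppn=16,walltime=02:00:00,pmem=2gb parallel_job.sh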

msub -q queues
http://www.bwhpc-c5.de/wiki/index.php/bwunicluster_batch_jobs#msub_-q_queues
If no queue is specified, the job is assigned to develop, singlenode or multinode based on the requested walltime, nodes and processes.
There is no automatic assignment to the queues verylong and fat.
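Jobs for those queues therefore have to name them explicitly with -q, e.g. for one of the 32-core fat nodes (resource values and script name again only illustrative):
$ msub -q fat -l nodes=1:ppn=32,walltime=12:00:00,pmem=16gb fat_job.sh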

Environment variables
MOAB adds a number of variables to the job's environment.
MOAB variables can be used to generalize your job scripts, e.g.:
# add a suffix to the job output file
./program > program_${MOAB_JOBID}.log
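As a sketch of how this might look inside a job script: $MOAB_JOBID is taken from the slide above, while $MOAB_SUBMITDIR (the directory msub was called from) is an assumption here and should be checked against the MOAB documentation:
#!/bin/bash
#MSUB -l nodes=1:ppn=1
#MSUB -l walltime=00:10:00
cd $MOAB_SUBMITDIR                      # assumed variable holding the submit directory
./program > program_${MOAB_JOBID}.log   # one log file per job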

Check/change the status of your jobs
After submission msub returns the <job-id>:
$ msub job.sh
uc1.22108
Monitoring commands:
$ showq : all your active, eligible, blocked, and/or recently completed jobs
$ showstart <job-id> : get information about the start time of the job with <job-id>
$ showstart 16@12:00:00 : get information about the start time of 16 procs with a run time of 12 hours
$ checkjob <job-id> : get detailed information about your job; explains why your job is pending
$ showq -c -u $(whoami) : display completed jobs
$ canceljob <job-id> : cancel the job with <job-id>

5. Job scripts

Bash job script
Define workload manager options via #MSUB
The job starts in the submit directory
Minimal job script:
#!/bin/bash
#MSUB -l nodes=1:ppn=4
#MSUB -l walltime=00:10:00
# Load required module files
module load mpi/openmpi/1.6.5-gnu-4.4
# Job starts in submit directory, change if necessary
cd $HOME/example
# Start program
mpiexec simple > simple.out
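The binary simple in this script is assumed to be an MPI program built beforehand on a login node, roughly like this (the source file name simple.c is hypothetical; the module name is the one loaded in the script):
$ module load mpi/openmpi/1.6.5-gnu-4.4
$ mpicc -O2 -o simple simple.c
$ msub job.sh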

bwfilestorage

bwfilestorage
Replacement for the bwgrid central storage
Location: KIT Karlsruhe
Starting size: 600 TB
For users of bwgrid and bwhpc
Requirements:
Entitlement bwfilestorage (granted to users with the bwgrid and/or bwunicluster entitlement)
Web registration

bwfilestorage Limits
100 GB quota for new users
40 TB quota per organization
Snapshots: 7 daily, 4 weekly, 2 monthly
Backup: for disaster recovery only
Temp:
7 days unchanged
100,000 files (soft), 200,000 files for 7 days (hard)
100 GB (soft), 200 GB for 7 days (hard)

bwfilestorage Registration
Web registration: https://bwidm.scc.kit.edu/
Choose your organisation

bwfilestorage Registration
Web registration: https://bwidm.scc.kit.edu/
University of Heidelberg: login with Uni-ID

bwfilestorage Registration
Web registration: https://bwidm.scc.kit.edu/
University of Mannheim: login with RUM account

bwfilestorage Registration
Web registration: https://bwidm.scc.kit.edu/
Select the bwfilestorage service description

bwfilestorage Summary
Web registration: https://bwidm.scc.kit.edu/
Hosts:
bwfilestorage.lsdf.kit.edu / http://bwfilestorage.lsdf.kit.edu
bwfilestorage-login.lsdf.kit.edu (SSH)
Commands:
scp -c aes128-cbc testfile hd_jsmith@bwfilestorage.lsdf.kit.edu:
scp -c arcfour128 testfile hd_jsmith@bwfilestorage.lsdf.kit.edu:
Performance you should expect (transfer of a 32 GB file):
uc1 -> bwfilestorage: 110-150 MB/s (aes128), 3-4 min
frbw4 -> bwfilestorage: 65-75 MB/s (arcfour128), 8-12 min
For more information please consult the user manual (German): http://www.scc.kit.edu/downloads/sdm/nutzerhandbuch-bwfilestorage.pdf
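Two further examples in the same style (the user name hd_jsmith and the file name are placeholders as above): copying a file back from bwfilestorage, and logging in interactively on the dedicated SSH host:
$ scp -c aes128-cbc hd_jsmith@bwfilestorage.lsdf.kit.edu:testfile .
$ ssh hd_jsmith@bwfilestorage-login.lsdf.kit.edu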