Hands-On Workshop bwunicluster, June 29th 2015

Agenda
- Welcome
- Introduction to bwhpc and the bwunicluster
- Modules - Software Environment Management
- Job Submission and Monitoring
- Interactive Work and Remote Visualisation
- Questions and Answers, Open Discussion
- End
High performance computing in Baden-Württemberg
An introduction to bwhpc and the bwunicluster
Jürgen Salk (bwhpc-c5)
1. bwhpc concept
bwhpc: Where do we come from?
[Picture: bwgrid@ulm]
- bwhpc is the successor of bwgrid
- bwgrid: clusters located at 9 universities in BW
- Homogeneous resources, common hardware
- Feel at home on all 9 bwgrid sites
- One-size-fits-all approach
  "One-size-fits-all: describes a piece of clothing that is designed to fit a person of any size."
  Source: http://dictionary.cambridge.org/dictionary/british/one-size-fits-all
bwhpc
- Strategy for high performance computing in BW from 2013 to 2018, in particular for Tier 3
- Provision of computing systems tailored to the needs of specific scientific communities
[Map of Baden-Württemberg with the sites Mannheim, Heidelberg, Karlsruhe, Tübingen, Ulm and Freiburg and the scientific communities economics & social sciences, general sciences supply, molecular life science, bioinformatics, neurosciences, astrophysics, micro systems engineering, elementary particle physics and computational chemistry; highlighted: JUSTUS (computational chemistry, Ulm) and the bwunicluster (Karlsruhe)]
2. Introduction to the bwunicluster
bwunicluster
- Physically located at KIT in Karlsruhe
- Co-financed by Baden-Württemberg's ministry of science, research and arts and the shareholder universities: Stuttgart, Freiburg, Ulm, Hohenheim, Konstanz, Heidelberg, Tübingen, Mannheim and KIT
  [Pie chart of the shareholders' financing shares not reproduced here]
- Usage:
  - Free of charge
  - General purpose, teaching
  - Technical computing (sequential & weakly parallel) & parallel computing
- Access / limitations:
  - Open to all members of a shareholder university, but users need to be entitled by their home university
  - Registration at https://bwidm.scc.kit.edu
  - Participate in the questionnaire at https://www.bwhpc-c5.de/en/zas/bwunicluster_survey.php
  - File system quota and computation share are based on the user's own university's share
bwunicluster hardware architecture
- 2 x login nodes
  - Directly accessible by end users: interactive login, file management, program development and interactive pre- and postprocessing
- 520 x compute nodes
  - 512 thin nodes: 16-way (2 x 8) Intel Xeon E5-2670, clock speed 2.6 GHz, 64 GB RAM, 2 TB local disk space
  - 8 fat nodes: 32-way (4 x 8) Intel Xeon E5-4640, clock speed 2.4 GHz, 1 TB RAM, 7 TB local disk space
- Fast interconnect: InfiniBand 4 x FDR (4 x 14 Gbit/s)
- Access to the compute nodes is managed by a batch system:
  - Jobs are submitted via MOAB
  - A job is executed, depending on its priority, as soon as the required resources are available
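To give a first impression of how jobs reach the compute nodes, a minimal MOAB job script might look like the sketch below; the job name, resource requests and program are placeholders and should be adapted with the help of the best-practices guide.

#!/bin/bash
# Minimal single-node MOAB job script (sketch, values are placeholders)
#MSUB -N my_first_job           # job name
#MSUB -l nodes=1:ppn=16         # one thin node with 16 cores
#MSUB -l walltime=00:30:00      # maximum run time (hh:mm:ss)

cd $HOME
./my_program                    # placeholder for your own executable

Submit and monitor it with the MOAB command line tools:

$ msub jobscript.sh             # returns the job ID
$ showq -u $USER                # list your queued and running jobs
$ checkjob <jobid>              # detailed status of a single job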
bwunicluster hardware architecture
[Diagram of the node-local disks and the global shared storage not reproduced here]
- $HOME: 469 TB
- $WORK / workspaces: 938 TB
- Global shared storage provided by the parallel file system Lustre
bwunicluster $HOME file system
- Any user is automatically placed into $HOME upon login
- Environment variable: $HOME (e.g. /home/ul/ul_theophys/ul_<username>)
- Intended only for important permanent user files, e.g. program source code, final result files, personal configuration files
- Daily backups
- Group quotas for disk space and number of files (no quotas for individual users)
- How to check quota and disk usage: $ cat $HOME/../diskusage
- For users from Ulm the group quota is regularly adjusted to reflect the group size
- Aggregated read/write performance is low (~8 GB/s)
- DO NOT COMPUTE IN $HOME!
bwunicluster work file systems
- Aggregated read/write performance is much better than for $HOME (~16 GB/s)
- Intended for parallel access (shared across multiple nodes) and for high throughput to large files, e.g. temporary job files, intermediate result files (checkpoint files)
- No backups!!!
- Limited lifetime of files!!!
- 2 different concepts to access the work file system:
  (a) via the $WORK environment variable
  (b) via the workspace tools
bwunicluster work file systems: (a) $WORK
- Automatically created for any user upon first login
- Environment variable: $WORK (e.g. /work/ul/ul_theophys/ul_<username>)
- Change to it: $ cd $WORK
- Limited lifetime: any file in $WORK not accessed for more than 28 days will be automatically deleted; the maximum lifetime of a file is 280 days
- Files no longer needed should be removed by the user
- Group quotas for disk space and number of files may be introduced if required
- How to check quota and disk usage: $ cat $WORK/../diskusage
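Since deletion in $WORK is based on access time, it can help to check which of your files are approaching the 28-day limit; one possible check (assuming GNU find, as usually available on Linux clusters) is:

$ find $WORK -type f -atime +21    # list files not accessed for more than 3 weeks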
bwunicluster work file systems: (b) workspace tools (highly recommended)
- Advantage: provides more control over lifetime and location of files
- Create a workspace folder named Simulation with a lifetime of 30 days (max. 60 days) from now:
  $ ws_allocate Simulation 30
- List your workspaces with location, creation date and remaining lifetime:
  $ ws_list
- Extend the lifetime of an existing workspace (up to 3 times):
  $ ws_extend Simulation 60
- Find the location of a workspace folder by its name:
  $ ws_find Simulation
- Release (delete!) a workspace (remember: there is no backup):
  $ ws_release Simulation
- Example usage:
  $ ws_allocate Simulation 30
  $ SIMWS=`ws_find Simulation`
  $ ln -s $SIMWS $HOME
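Workspaces combine naturally with batch jobs: allocate the workspace once interactively, then resolve its path by name inside the job script. The sketch below assumes a workspace named Simulation already exists; the program name is a placeholder.

#!/bin/bash
#MSUB -l nodes=1:ppn=16
#MSUB -l walltime=02:00:00

SIMWS=$(ws_find Simulation)        # resolve the workspace path by its name
cd $SIMWS
$HOME/bin/my_program > result.log  # placeholder program; results stay in the workspace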
bwunicluster local file systems
- Higher aggregated read/write performance than the global file systems
- A temporary subdirectory is automatically created for every individual job on the compute node
- Environment variable: $TMP (e.g. /scratch/slurm_tmpdir/job_<jobnumber>)
- Intended for single-node jobs with massive I/O demands
- Data stored in $TMP will be deleted at the end of the job
- Copy important results to $HOME, $WORK or an allocated workspace at the end of the job
- No backup!!!
- Example usage (somewhat simplified):
  cp $HOME/inputfile $TMP
  cd $TMP
  program <inputfile >outfile
  cp outfile $HOME
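Filled out as a complete job script, the staging pattern above might look like the following sketch; the input file, program and workspace name are placeholders, and the results are copied to a previously allocated workspace instead of $HOME:

#!/bin/bash
#MSUB -N tmp_example
#MSUB -l nodes=1:ppn=16
#MSUB -l walltime=04:00:00

RESULTS=$(ws_find Simulation)      # workspace allocated beforehand with ws_allocate
cp $HOME/inputfile $TMP            # stage input to the fast node-local disk
cd $TMP
program < inputfile > outfile      # placeholder for the actual application
cp outfile $RESULTS/               # save results before $TMP is wiped at job end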
bwunicluster file systems at a glance

Property                 $TMP                                  $HOME      $WORK / workspace
Visibility               local                                 global     global
Lifetime                 batch job runtime                     permanent  max. 240 days
Disk space               2 TB @ thin nodes, 7 TB @ fat nodes   469 TB     938 TB
Quotas                   no                                    yes        if required
Backup                   no                                    yes        no
Aggr. read/write perf.   very high                             low        high
Documentation and Support
- Website:
  - General info: www.bwhpc-c5.de (in English and German)
  - Best practices guide (documentation on the clusters): www.bwhpc-c5.de/wiki (in English)
- User support:
  - Send email to: <bwunicluster-hotline@lists.kit.edu>
  - Ticket system: http://www.support.bwhpc-c5.de
Thank you for your attention! Questions?
3. Get ready to start
Prerequisites
- Register at the bwunicluster and/or check your registration status in a web browser: https://bwidm.scc.kit.edu
  - What is your localuid?
  - Optionally set a reasonably strong password for the bwunicluster
- Check your status and/or participate in the questionnaire in the web browser at https://www.bwhpc-c5.de/en/zas/bwunicluster_survey.php
- On your local desktop open a terminal window in KDE: press <ALT>+<F2>, type konsole, press <Enter>
- Log into the bwunicluster; at the local desktop's terminal command prompt type:
  $ ssh -X <UserID>@bwunicluster.scc.kit.edu
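For the hands-on exercises you may also want to move files between your desktop and the cluster; a typical way to do this (file names are placeholders) is scp:

$ scp myinput.dat <UserID>@bwunicluster.scc.kit.edu:    # copy a file to your $HOME on the cluster
$ scp <UserID>@bwunicluster.scc.kit.edu:outfile .       # copy a result file back to the local directory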