Blue Gene/Q User Workshop. User Environment & Job submission
Joshua Walker
Slide 2: Topics
- Blue Joule User Environment
- LoadLeveler
- Task Placement & BG/Q Personality
Slide 3: Blue Joule User Accounts
- Home directories are organised on a project basis: /home/[project name]/[pi]/[username]-[pi]
- Each project has a shared directory, $HOME/../shared, readable and writable by all project members
- All members of a project share the same group and can read each other's home directories
- You cannot read your own home directories in other projects
- Per-user storage: unlimited (4.6 PB in total)
- Archiving and back-up: none
- The $HOME directory is used for all runs; there is no $WORK or $TEMP directory
Slide 4: Blue Joule System Access
- Front-End Node (FEN): joule.hartree.stfc.ac.uk
- Access is via ssh key exchange only
- To access from other hosts:
  - Create a new private key and add the public key to ~/.ssh/authorized_keys
  - Copy the private key to ~/.ssh/ on the other host (it must not be world-readable)
- Remote copying: use scp or rsync as usual
  - However, to copy directly between your home directories in different projects, private/public key pairs must be present in both the source and destination accounts
- Accessing outside services (wget, curl, etc.):
  - export http_proxy=
  - Set ftp_proxy and https_proxy to the same value
- Subversion: edit ~/.subversion/servers, adding
  - http-proxy-host =
  - http-proxy-port =
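Put together, the key and proxy steps above might look like the following sketch. The proxy address is a placeholder, not the real site value (the slide intentionally leaves it blank), and the key-generation and copy steps are shown as comments since they touch a second host:

```shell
# Prepare the key store on the FEN
mkdir -p ~/.ssh && chmod 700 ~/.ssh

# A new key pair would be created with, e.g.:
#   ssh-keygen -t rsa -f ~/.ssh/id_rsa_joule
# and its public half appended to the FEN's authorized_keys:
touch ~/.ssh/authorized_keys
#   cat ~/.ssh/id_rsa_joule.pub >> ~/.ssh/authorized_keys

# The private half, copied to the other host, must not be world-readable:
#   chmod 600 ~/.ssh/id_rsa_joule

# Proxy settings for outbound tools (wget, curl, svn over http);
# proxy.example.com:8080 is a placeholder for the real site proxy
export http_proxy=http://proxy.example.com:8080
export https_proxy=$http_proxy
export ftp_proxy=$http_proxy
```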
Slide 5: Blue Joule Environment Modules
- Modules is an open-source utility used to manage centrally available software on Hartree systems
- module avail: shows available software
- module load [name]: loads the software into your environment
  - Sets various environment variables to make the software available: PATH, LD_RUN_PATH, LD_LIBRARY_PATH, etc.
- module show [name]: shows the variables set by loading the module
- Other useful commands: module unload [name], module list
- The main module you will need to load is ibmmpi
- You have to explicitly define the module function in job scripts if you want to load modules:
  - source /etc/profile.d/modules.sh (Bourne-based shells)
  - source /etc/profile.d/modules.csh (csh-derived shells)
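Put together, the opening of a Bourne-shell job script that uses modules might look like this sketch (ibmmpi is the module named above; the rest of the script is illustrative):

```shell
#!/bin/bash
# Define the module function first -- it is not available in job scripts by default
source /etc/profile.d/modules.sh

# Load the MPI environment (sets PATH, LD_LIBRARY_PATH, etc.)
module load ibmmpi
module list          # record the loaded modules in the job output
```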
Slide 6: Blue Joule Configuration
- 6 production racks
  - Torus: 12*4*8*8*2; midplanes: 3*1*2*2
- 1 development rack (BGAS)
  - Torus: 4*4*4*8*2; midplanes: 1*1*1*2
- 1 I/O node per 4 node boards, so the minimum block size is 128 nodes
- Job scheduler: LoadLeveler
  - Jobs are queued based on required resources (nodes and walltime)
- For the workshop hands-on sessions we have exclusive use of the development rack: 8 blocks of 128 nodes
Slide 7: BG/Q Driver
- The collection of BG/Q-specific software is referred to as a driver
  - Kernel interfaces, the Compute Node Kernel (CNK), the gcc toolchain/binutils, BG/Q commands, MPI wrapper scripts, communication libraries
- There are multiple driver versions (current: V1M1R2), located under /bgsys/drivers/
  - This makes it possible to switch drivers, e.g. on discovering a problem
  - /bgsys/drivers/ppcfloor is a link to the current driver
- The ibmmpi module sets a number of the paths necessary to use the current driver software
  - MPI wrappers, communication libraries (XL version), kernel headers and libraries
- Driver directories of interest to developers:
  - spi: headers/libraries for accessing BG/Q hardware features
  - comm: various communication libraries
  - gnu-linux: the gnu-linux toolchain (gcc, cpp, binutils, etc.)
  - bgpm: hardware performance monitor
- Note: the driver does not contain the XL compilers, but it does contain the gcc compilers (and the rest of the toolchain). The XL compilers are located under /opt/ibmcmp/
Slide 8: The runjob Command
Slide 9: runjob: Overview
- Command syntax: runjob <options> : <executable> <arguments>
- Example: runjob --cwd /scratch/nt05984s/hello_world --env-all --label --np 2 --ranks-per-node 32 : ./hw
- Important: memory is allocated per process based on ranks-per-node: (16 GB of shared node memory) / ranks-per-node

  Argument             Value               Purpose
  --cwd                working directory   Change to the execution directory
  --env-all / --envs   - / <var>=<value>   Export the environment (all, or a specific variable)
  --label              -                   Prefix stdout records with the MPI rank
  --np                 # MPI tasks         Total number of MPI tasks
Slide 10: runjob: np, ranks-per-node & Threads
- The total number of MPI processes to run is controlled by the --np parameter, e.g. --np 1024
- The maximum number of MPI processes placed on each node is controlled by the --ranks-per-node parameter, e.g. --ranks-per-node 16
  - Processes are assigned to a node up to this value before moving to the next node
- The total number of nodes required is (--np)/(--ranks-per-node), e.g. 1024/16 = 64
- The number of OpenMP threads per process is controlled by OMP_NUM_THREADS
  - There is a maximum of 64 OpenMP threads per node, therefore (--ranks-per-node)*(OMP_NUM_THREADS) <= 64
- Memory is assigned to a process based on --ranks-per-node
  - Each process gets 16/(--ranks-per-node) GB
  - Note: if np < ranks-per-node, each process still gets only this amount of memory, e.g. np=1 and ranks-per-node=16 gives one process on one node with 1 GB of memory
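The arithmetic above can be checked with a quick shell calculation; a sketch, assuming the 16 GB of shared node memory stated on the previous slide:

```shell
np=1024    # total MPI tasks (--np)
rpn=16     # tasks per node (--ranks-per-node)

nodes=$(( np / rpn ))            # nodes required
mem_mb=$(( 16 * 1024 / rpn ))    # memory per process in MB (16 GB node / rpn)
max_threads=$(( 64 / rpn ))      # upper bound for OMP_NUM_THREADS

echo "$nodes nodes, ${mem_mb} MB per process, up to $max_threads threads each"
# -> 64 nodes, 1024 MB per process, up to 4 threads each
```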
Slide 11: runjob: Process Mapping
- The default mapping places MPI ranks on the system in ABCDET order, where the rightmost letter increments first; <A,B,C,D,E> are torus coordinates and T is the processor ID in each node (T = 0 to N-1, where N is the number of processes per node being used)
- To change the default mapping: runjob --mapping TEDCBA, or runjob --mapping my.map
- Note: mapping will be covered in detail later
Slide 12: MPMD Execution
- Multiple Program Multiple Data (MPMD) jobs are jobs for which a different executable and arguments can be supplied within a single job
- All tasks of the job share MPI_COMM_WORLD and can share data between the different executables via the torus
- To enable MPMD support, specify a mapping file with the runjob --mapping option; within the mapping file, keywords control MPMD behaviour on the nodes:
  - #mpmdbegin {ranks}
  - #mpmdcmd <executable> <arg0> <arg1> ... <argn>
  - #mpmdend
- {ranks} specifies the MPI rank numbers:
  - Multiple MPI ranks can be specified with commas: #mpmdbegin 3,6,9
  - Ranges of MPI ranks can be specified with a dash: #mpmdbegin 0-15
  - Ranges can take a stride with 'x': #mpmdbegin 0-15x2 (ranks 0, 2, 4, 6, 8, 10, 12 and 14)
  - Sets and ranges can be mixed: #mpmdbegin 0,2,5-15
- Care must be taken to avoid assigning a rank to multiple programs
- Restriction: all ranks on the same node must run the same program
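As an illustration, the MPMD directives for a 32-rank job that runs a hypothetical ./master binary on one node and ./worker on another might read as follows (at --ranks-per-node 16, ranks 0-15 share node 0 and ranks 16-31 share node 1, so each range respects the one-program-per-node restriction; the executable names are invented):

```
#mpmdbegin 0-15
#mpmdcmd ./master --verbose
#mpmdend
#mpmdbegin 16-31
#mpmdcmd ./worker
#mpmdend
```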
Slide 13: Tools
- If the user wants to launch a tool during their job, they need to tell the Control System
- The start_tool executable tells the Control System to start tool daemons on all of the I/O nodes servicing the job to be controlled or monitored
- The tools communicate with the Common I/O Services (CIOS) daemons running on the I/O nodes to pass messages back and forth to the compute nodes, where the user code is running
- No user code runs on the I/O node; it acts as a proxy/manager for the compute nodes
- Examples:
  $~> start_tool --id 123 --tool /path/to/my_great_tool --args one hello two world
  $~> end_tool --id 123 --tool 1
Slide 14: LoadLeveler
Slide 15: Changes in Blue Gene Terminology
Terminology changes have been made in Blue Gene/Q:

  Terminology in BG/[L,P]   Terminology in BG/Q
  Base Partitions           Midplanes
  Partitions                Blocks
  Wires                     Cables
  NodeCards                 NodeBoards

LoadLeveler externals reflect these changes to be consistent with Blue Gene/Q.
Slide 16: LoadLeveler Job Command File
The following table summarises the main keywords required:

  Keyword           Notes
  class             queue to submit to (prod)
  bg_size           number of nodes
  executable        runjob / command-file name (script)
  job_type          bluegene
  wall_clock_limit  determines the queue
Slide 17: LoadLeveler Command File Variables
- $(home): the home directory for the user on the cluster selected to run the job
- $(jobid): the sequential number assigned to this job by the Schedd daemon; $(jobid) and $(cluster) are equivalent
- $(stepid): the sequential number assigned to this job step when multiple queue statements are used in the job command file; $(stepid) and $(process) are equivalent
- $(user): the user name on the cluster selected to run the job
- The following keywords are also available as variables if defined in the job command file: $(executable), $(class), $(comment), $(job_name), $(step_name)
Slide 18: LoadLeveler Job Command File
Sample job command file:

  # @ job_type = bluegene
  # @ class = prod
  # @ error = size512.$(host).$(cluster).$(process).err
  # @ output = size512.$(host).$(cluster).$(process).out
  # @ executable = /bgsys/drivers/ppcfloor/hlcs/bin/runjob
  # @ arguments = --exe /bin/date
  # @ bg_size = 512
  # @ bg_connectivity = Torus
  # @ queue
Slide 19: LoadLeveler Job Command File
Sample job command file (script):

  #!/bin/bash
  # @ job_type = bluegene
  # @ class = prod
  # @ error = size512.$(host).$(cluster).$(process).err
  # @ output = size512.$(host).$(cluster).$(process).out
  # @ executable = loadleveler-script.sh
  # @ bg_size = 512
  # @ bg_connectivity = Torus
  # @ queue
  export BG_THREADLAYOUT=2
  llq -l $LOADL_STEP_ID
  runjob --env-all --ranks-per-node 16 : ./myexec
Slide 20: LoadLeveler Multistep Jobs
- A job command file can specify multiple jobs; each job is termed a job step
- Useful when a computation is composed of multiple parts
- Each job step is delimited by the queue keyword
- Values of keywords are inherited from previous job steps
- Each job step is treated independently by default
  - Use the dependency keyword to specify a dependency: the exit status of a previous job step determines whether the step should be executed
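A two-step job command file using keyword inheritance and a dependency might look like the following sketch (the step names, executables, and runjob arguments are illustrative):

```
# @ job_type = bluegene
# @ class = prod
# @ bg_size = 128
# @ wall_clock_limit = 00:30:00
# @ executable = /bgsys/drivers/ppcfloor/hlcs/bin/runjob
# @ step_name = prepare
# @ arguments = --exe ./prepare
# @ queue
#
# The second step inherits class, bg_size, etc., and runs only if
# the "prepare" step exited with status 0.
# @ step_name = compute
# @ dependency = (prepare == 0)
# @ arguments = --exe ./compute
# @ queue
```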
Slide 21: Shapes and Connectivity
- BG/Q supports 5-dimensional shapes, AxBxCxDxE
- The 5th dimension (E) is located on the node board and is always of size 2
- Therefore, for scheduling large-block jobs (>= 1 midplane), only a 4-dimensional shape is allowed
- In BG/P, connectivity was specified for the block as a whole: Torus or Mesh
- In BG/Q, connectivity can now be specified for large blocks per dimension (A, B, C and D), with values Torus or Mesh
Slide 22: LoadLeveler Job Command File
The following table summarises changes in the job command file (JCF) keywords applicable to Blue Gene jobs:

  Keyword                Type      Notes
  bg_block               new       replaces bg_partition
  bg_shape               existing  accepts 4D shapes only
  bg_connectivity        new       replaces bg_connection
  bg_requirements        existing  no change
  bg_rotate              existing  no change
  bg_node_configuration  new       -
Slide 23: LoadLeveler Job Command File
- bg_shape = AxBxCxD
  - Specifies a 4-dimensional shape (large block) to create for the BG job
  - Example: bg_shape = 1x2x1x1 requests 2 midplanes in the B dimension
- bg_connectivity = Torus | Mesh | Either | Xa Xb Xc Xd, where each of Xa, Xb, Xc, Xd is Torus or Mesh
  - Specifies the connectivity of large blocks, either for the entire block or per dimension
  - Example: bg_connectivity = Torus Mesh Torus Torus requests Mesh connectivity in the B dimension and Torus connectivity in the A, C and D dimensions
Slide 24: LoadLeveler Job Command File
- bg_rotate = true | false
  - Specifies whether the scheduler should rotate the shape when trying to find a block to run the job; connectivity is preserved per dimension when rotating the shape
  - Example: bg_rotate = false, do not rotate the shape when scheduling the job
- bg_node_configuration = node_configuration
  - Allows users to specify a custom node configuration to use when booting the block
  - Node configurations are created by the administrator from bg_console; the default node configuration is CNKDefault
Slide 25: LoadLeveler Commands
Normal LoadLeveler operating commands are identical on BG/Q and BG/P. Some examples:

  Action                             Command
  Submitting jobs                    llsubmit
  Monitoring jobs                    llq
  Modifying attributes of idle jobs  llmodify
  Making reservations                llmkres
  Changing reservations              llchres
Slide 27: Query Blue Gene System
- A new command, llbgstatus, is provided to query Blue Gene system information
- Usage: llbgstatus [-? | -H | -v | [-X cluster_list | -X all] [-l] [-M all | -M midplane_list | -B all | -B block_list]]
- Blue Gene query options have been removed from the llstatus command
Slide 28: Query Blue Gene System
The table below shows the previous llstatus commands used on BG/[L,P] and the equivalent llbgstatus commands on BG/Q:

  Description                 BG/[L,P] Command     BG/Q Command
  Query BG machine            llstatus -b          llbgstatus
  Query BG machine (details)  llstatus -b -l       llbgstatus -l
  Query midplanes             llstatus -B R00-M0   llbgstatus -M R00-M0
  Query blocks                llstatus -P LL       llbgstatus -B LL
Slide 29: Task Placement / Processor Affinity
Slide 30: Compute Node Resources
- The compute node contains 17 physical cores
  - 16 physical cores are dedicated to the user application; 1 core is dedicated to the system
- Each core is 4-way SMT (can run 4 threads): 64 hardware threads in total
- Each hardware thread can support a fixed maximum number of software threads (pthreads); the current number is five
- Once a pthread is bound to a hardware thread, it does not move from that hardware thread unless acted upon by a set-affinity action such as pthread_setaffinity
- The main thread of a process cannot be moved to another hardware thread; it remains on the hardware thread it was started on
- Naming conventions:
  - Processor core IDs identify physical cores: values 0 to 16 (17 per node)
  - Processor thread IDs identify (SMT) threads on the same physical core: values 0 to 3 (4 per physical core)
  - Hardware threads can also be identified by the processor ID: values 0 to 67 (68 per node)
Slide 31: Processor Core IDs / Thread IDs / Processor IDs

  Processor Core ID  Thread ID  Processor ID
  0                  0,1,2,3    0,1,2,3
  1                  0,1,2,3    4,5,6,7
  2                  0,1,2,3    8,9,10,11
  ...                0,1,2,3    ...
  14                 0,1,2,3    56,57,58,59
  15                 0,1,2,3    60,61,62,63
  16                 0,1,2,3    64,65,66,67
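The table follows a simple rule: the flat processor ID is 4 * (core ID) + (thread ID). A quick sanity check in shell:

```shell
# Processor ID = 4 * (processor core ID) + (thread ID); e.g. core 14, thread 3
core=14
thread=3
pid=$(( 4 * core + thread ))
echo "core $core, thread $thread -> processor ID $pid"   # -> 59
```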
Slide 32: Execution Modes
- Jobs can be run with varying numbers of processes per node, from 1 process per node to 64
- 1 process per node: minimum node utilisation, but multi-threading can occur
- 64 processes per node: every hardware thread is occupied by one distinct process
- Hardware threads are dedicated to a single user process/thread
- All processes are given an equal number of hardware threads across the node; these hardware threads can be used for multi-threading
# 
Slide 33: Processes, Cores and Hardware Threads per Node
(The table body did not survive transcription; the values below follow from the equal division of the node's 16 user cores and 64 hardware threads described on the previous slide.)

  # processes per node  # physical cores per process  # hardware threads per process
  1                     16                            64
  2                     8                             32
  4                     4                             16
  8                     2                             8
  16                    1                             4
  32                    1/2                           2
  64                    1/4                           1
Slide 34: Thread Affinity/Layout
- Breadth-first assignment (round-robin allocation)
  - The default thread layout algorithm; corresponds to BG_THREADLAYOUT = 1
  - The hardware thread-selection algorithm progresses across the cores assigned to the process before selecting additional threads within a given core
- Depth-first assignment (fill-up allocation)
  - Corresponds to BG_THREADLAYOUT = 2
  - The hardware thread-selection algorithm progresses within each core before moving to another core assigned to the process
- Processor affinity is enforced by the Blue Gene control system
  - Processes are assigned to one or more hardware threads at job initialisation time
  - The assignment does not change for the life of the job
Slide 35: Thread Layout, Breadth-First Assignment [BG_THREADLAYOUT=1]

  Processes per node  Processor ID assignment order
  1                   0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,
                      2,6,10,14,18,22,26,30,34,38,42,46,50,54,58,62,
                      1,5,9,13,17,21,25,29,33,37,41,45,49,53,57,61,
                      3,7,11,15,19,23,27,31,35,39,43,47,51,55,59,63
  2                   0,4,8,12,16,20,24,28,2,6,10,14,18,22,26,30,
                      1,5,9,13,17,21,25,29,3,7,11,15,19,23,27,31
                      32,36,40,44,48,52,56,60,34,38,42,46,50,54,58,62,
                      33,37,41,45,49,53,57,61,35,39,43,47,51,55,59,63
  4                   0,4,8,12,2,6,10,14,1,5,9,13,3,7,11,15
                      16,20,24,28,18,22,26,30,17,21,25,29,19,23,27,31
                      32,36,40,44,34,38,42,46,33,37,41,45,35,39,43,47
                      48,52,56,60,50,54,58,62,49,53,57,61,51,55,59,63
Slide 36: Thread Layout, Depth-First Assignment [BG_THREADLAYOUT=2]

  Processes per node  Processor ID assignment order
  1                   0,2,1,3,...,60,62,61,63
  2                   0,2,1,3,...,28,30,29,31
                      32,34,33,35,...,60,62,61,63
  4                   0,2,1,3,...,12,14,13,15
                      16,18,17,19,...,28,30,29,31
                      32,34,33,35,...,44,46,45,47
                      48,50,49,51,...,60,62,61,63
Slide 37: BG/Q Personality
Slide 38: Personality Definition
- Two definitions:
  - Static data given to every compute node and I/O node at boot time by the Control System; it contains information specific to the node with respect to the block being booted
  - A set of C-language structures containing such items as the node's coordinates on the torus network
- Useful to determine, at run time, where the tasks of the application are running
- Allows fine-tuning of application performance, for instance by knowing which set of tasks shares the same I/O node
Slide 39: Personality Usage
- Include file: #include <spi/include/kernel/location.h>
- Structure: Personality_t pers
- Query function: Kernel_GetPersonality(&pers, sizeof(pers));
- Properties:
  - pers.network_config.[a-e]nodes: number of nodes in each torus dimension
  - pers.network_config.[a-e]coord: coordinates of the node in the torus
  - pers.network_config.[a-e]bridge: coordinates of the I/O bridges in the torus
- Other routines:
  - Kernel_ProcessorID(): processor ID (0-63)
  - Kernel_ProcessorCoreID(): processor core ID (0-15)
  - Kernel_ProcessorThreadID(): processor thread ID (0-3)
Slide 40: Additional Slides
Slide 41: Job Submission Architecture (diagram)
Slide 42: LoadLeveler Job Command File
- job_type = bluegene
  - Must be specified in the JCF to identify a Blue Gene job
- bg_size = <number>
  - Specifies the number of compute nodes requested by the BG job
  - Example: bg_size = 256 requests 256 compute nodes
- bg_block = <block name>
  - Specifies the name of a block, created outside of LoadLeveler, on which to run the job
  - Example: bg_block = s32 requests that the job runs on compute block s32
More informationIntroduction to SLURM on the High Performance Cluster at the Center for Computational Research
Introduction to SLURM on the High Performance Cluster at the Center for Computational Research Cynthia Cornelius Center for Computational Research University at Buffalo, SUNY 701 Ellicott St Buffalo, NY
More informationRunning in parallel. Total number of cores available after hyper threading (virtual cores)
First at all, to know how many processors/cores you have available in your computer, type in the terminal: $> lscpu The output for this particular workstation is the following: Architecture: x86_64 CPU
More informationEffective Use of CCV Resources
Effective Use of CCV Resources Mark Howison User Services & Support This talk... Assumes you have some familiarity with a Unix shell Provides examples and best practices for typical usage of CCV systems
More informationHigh Performance Computing (HPC) Using zcluster at GACRC
High Performance Computing (HPC) Using zcluster at GACRC On-class STAT8060 Georgia Advanced Computing Resource Center University of Georgia Zhuofei Hou, HPC Trainer zhuofei@uga.edu Outline What is GACRC?
More informationTECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 6 th CALL (Tier-0)
TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 6 th CALL (Tier-0) Contributing sites and the corresponding computer systems for this call are: GCS@Jülich, Germany IBM Blue Gene/Q GENCI@CEA, France Bull Bullx
More informationIntroduction to HPC Using zcluster at GACRC
Introduction to HPC Using zcluster at GACRC On-class PBIO/BINF8350 Georgia Advanced Computing Resource Center University of Georgia Zhuofei Hou, HPC Trainer zhuofei@uga.edu Outline What is GACRC? What
More informationCluster Clonetroop: HowTo 2014
2014/02/25 16:53 1/13 Cluster Clonetroop: HowTo 2014 Cluster Clonetroop: HowTo 2014 This section contains information about how to access, compile and execute jobs on Clonetroop, Laboratori de Càlcul Numeric's
More informationUsing SDSC Systems (part 2)
Using SDSC Systems (part 2) Running vsmp jobs, Data Transfer, I/O SDSC Summer Institute August 6-10 2012 Mahidhar Tatineni San Diego Supercomputer Center " 1 vsmp Runtime Guidelines: Overview" Identify
More informationHigh Performance Beowulf Cluster Environment User Manual
High Performance Beowulf Cluster Environment User Manual Version 3.1c 2 This guide is intended for cluster users who want a quick introduction to the Compusys Beowulf Cluster Environment. It explains how
More informationLab: Hybrid Programming and NUMA Control
Lab: Hybrid Programming and NUMA Control Steve Lantz Workshop: Parallel Computing on Ranger and Longhorn May 17, 2012 Based on materials developed by by Kent Milfeld at TACC 1 What You Will Learn How to
More informationThe IBM Blue Gene/Q: Application performance, scalability and optimisation
The IBM Blue Gene/Q: Application performance, scalability and optimisation Mike Ashworth, Andrew Porter Scientific Computing Department & STFC Hartree Centre Manish Modani IBM STFC Daresbury Laboratory,
More informationIntroduction to HPC Using zcluster at GACRC
Introduction to HPC Using zcluster at GACRC Georgia Advanced Computing Resource Center University of Georgia Zhuofei Hou, HPC Trainer zhuofei@uga.edu Outline What is GACRC? What is HPC Concept? What is
More informationQuick Start Guide. by Burak Himmetoglu. Supercomputing Consultant. Enterprise Technology Services & Center for Scientific Computing
Quick Start Guide by Burak Himmetoglu Supercomputing Consultant Enterprise Technology Services & Center for Scientific Computing E-mail: bhimmetoglu@ucsb.edu Contents User access, logging in Linux/Unix
More informationKohinoor queuing document
List of SGE Commands: qsub : Submit a job to SGE Kohinoor queuing document qstat : Determine the status of a job qdel : Delete a job qhost : Display Node information Some useful commands $qstat f -- Specifies
More informationIntroduction to HPC Using zcluster at GACRC
Introduction to HPC Using zcluster at GACRC On-class STAT8330 Georgia Advanced Computing Resource Center University of Georgia Suchitra Pakala pakala@uga.edu Slides courtesy: Zhoufei Hou 1 Outline What
More informationMPICH User s Guide Version Mathematics and Computer Science Division Argonne National Laboratory
MPICH User s Guide Version 3.1.4 Mathematics and Computer Science Division Argonne National Laboratory Pavan Balaji Wesley Bland William Gropp Rob Latham Huiwei Lu Antonio J. Peña Ken Raffenetti Sangmin
More informationA Hands-On Tutorial: RNA Sequencing Using High-Performance Computing
A Hands-On Tutorial: RNA Sequencing Using Computing February 11th and 12th, 2016 1st session (Thursday) Preliminaries: Linux, HPC, command line interface Using HPC: modules, queuing system Presented by:
More informationCOMP4510 Introduction to Parallel Computation. Shared Memory and OpenMP. Outline (cont d) Shared Memory and OpenMP
COMP4510 Introduction to Parallel Computation Shared Memory and OpenMP Thanks to Jon Aronsson (UofM HPC consultant) for some of the material in these notes. Outline (cont d) Shared Memory and OpenMP Including
More informationHybrid MPI/OpenMP parallelization. Recall: MPI uses processes for parallelism. Each process has its own, separate address space.
Hybrid MPI/OpenMP parallelization Recall: MPI uses processes for parallelism. Each process has its own, separate address space. Thread parallelism (such as OpenMP or Pthreads) can provide additional parallelism
More informationUsing and Administering
IBM LoadLeeler Version 5 Release 1 Using and Administering SC23-6792-04 IBM LoadLeeler Version 5 Release 1 Using and Administering SC23-6792-04 Note Before using this information and the product it supports,
More informationXeon Phi Native Mode - Sharpen Exercise
Xeon Phi Native Mode - Sharpen Exercise Fiona Reid, Andrew Turner, Dominic Sloan-Murphy, David Henty, Adrian Jackson Contents June 19, 2015 1 Aims 1 2 Introduction 1 3 Instructions 2 3.1 Log into yellowxx
More informationImplementation of Parallelization
Implementation of Parallelization OpenMP, PThreads and MPI Jascha Schewtschenko Institute of Cosmology and Gravitation, University of Portsmouth May 9, 2018 JAS (ICG, Portsmouth) Implementation of Parallelization
More informationbwunicluster Tutorial Access, Data Transfer, Compiling, Modulefiles, Batch Jobs
bwunicluster Tutorial Access, Data Transfer, Compiling, Modulefiles, Batch Jobs Frauke Bösert, SCC, KIT 1 Material: Slides & Scripts https://indico.scc.kit.edu/indico/event/263/ @bwunicluster/forhlr I/ForHLR
More informationSlurm and Abel job scripts. Katerina Michalickova The Research Computing Services Group SUF/USIT November 13, 2013
Slurm and Abel job scripts Katerina Michalickova The Research Computing Services Group SUF/USIT November 13, 2013 Abel in numbers Nodes - 600+ Cores - 10000+ (1 node->2 processors->16 cores) Total memory
More informationMulticore Performance and Tools. Part 1: Topology, affinity, clock speed
Multicore Performance and Tools Part 1: Topology, affinity, clock speed Tools for Node-level Performance Engineering Gather Node Information hwloc, likwid-topology, likwid-powermeter Affinity control and
More informationIntroduction to Discovery.
Introduction to Discovery http://discovery.dartmouth.edu The Discovery Cluster 2 Agenda What is a cluster and why use it Overview of computer hardware in cluster Help Available to Discovery Users Logging
More informationShell Scripting. With Applications to HPC. Edmund Sumbar Copyright 2007 University of Alberta. All rights reserved
AICT High Performance Computing Workshop With Applications to HPC Edmund Sumbar research.support@ualberta.ca Copyright 2007 University of Alberta. All rights reserved High performance computing environment
More informationSlurm and Abel job scripts. Katerina Michalickova The Research Computing Services Group SUF/USIT October 23, 2012
Slurm and Abel job scripts Katerina Michalickova The Research Computing Services Group SUF/USIT October 23, 2012 Abel in numbers Nodes - 600+ Cores - 10000+ (1 node->2 processors->16 cores) Total memory
More informationIntroduction to Discovery.
Introduction to Discovery http://discovery.dartmouth.edu The Discovery Cluster 2 Agenda What is a cluster and why use it Overview of computer hardware in cluster Help Available to Discovery Users Logging
More informationGrid Examples. Steve Gallo Center for Computational Research University at Buffalo
Grid Examples Steve Gallo Center for Computational Research University at Buffalo Examples COBALT (Computational Fluid Dynamics) Ercan Dumlupinar, Syracyse University Aerodynamic loads on helicopter rotors
More informationPorting Applications to Blue Gene/P
Porting Applications to Blue Gene/P Dr. Christoph Pospiech pospiech@de.ibm.com 05/17/2010 Agenda What beast is this? Compile - link go! MPI subtleties Help! It doesn't work (the way I want)! Blue Gene/P
More informationUser Guide of High Performance Computing Cluster in School of Physics
User Guide of High Performance Computing Cluster in School of Physics Prepared by Sue Yang (xue.yang@sydney.edu.au) This document aims at helping users to quickly log into the cluster, set up the software
More informationPractical Introduction to Message-Passing Interface (MPI)
1 Outline of the workshop 2 Practical Introduction to Message-Passing Interface (MPI) Bart Oldeman, Calcul Québec McGill HPC Bart.Oldeman@mcgill.ca Theoretical / practical introduction Parallelizing your
More informationIntroduction to HPC Using zcluster at GACRC On-Class GENE 4220
Introduction to HPC Using zcluster at GACRC On-Class GENE 4220 Georgia Advanced Computing Resource Center University of Georgia Suchitra Pakala pakala@uga.edu Slides courtesy: Zhoufei Hou 1 OVERVIEW GACRC
More informationSlurm basics. Summer Kickstart June slide 1 of 49
Slurm basics Summer Kickstart 2017 June 2017 slide 1 of 49 Triton layers Triton is a powerful but complex machine. You have to consider: Connecting (ssh) Data storage (filesystems and Lustre) Resource
More informationSupercomputing environment TMA4280 Introduction to Supercomputing
Supercomputing environment TMA4280 Introduction to Supercomputing NTNU, IMF February 21. 2018 1 Supercomputing environment Supercomputers use UNIX-type operating systems. Predominantly Linux. Using a shell
More informationDebugging Intel Xeon Phi KNC Tutorial
Debugging Intel Xeon Phi KNC Tutorial Last revised on: 10/7/16 07:37 Overview: The Intel Xeon Phi Coprocessor 2 Debug Library Requirements 2 Debugging Host-Side Applications that Use the Intel Offload
More informationIntroduction to Discovery.
Introduction to Discovery http://discovery.dartmouth.edu March 2014 The Discovery Cluster 2 Agenda Resource overview Logging on to the cluster with ssh Transferring files to and from the cluster The Environment
More informationKISTI TACHYON2 SYSTEM Quick User Guide
KISTI TACHYON2 SYSTEM Quick User Guide Ver. 2.4 2017. Feb. SupercomputingCenter 1. TACHYON 2 System Overview Section Specs Model SUN Blade 6275 CPU Intel Xeon X5570 2.93GHz(Nehalem) Nodes 3,200 total Cores
More informationAn Introduction to Cluster Computing Using Newton
An Introduction to Cluster Computing Using Newton Jason Harris and Dylan Storey March 25th, 2014 Jason Harris and Dylan Storey Introduction to Cluster Computing March 25th, 2014 1 / 26 Workshop design.
More informationbwunicluster Tutorial Access, Data Transfer, Compiling, Modulefiles, Batch Jobs
bwunicluster Tutorial Access, Data Transfer, Compiling, Modulefiles, Batch Jobs Frauke Bösert, SCC, KIT 1 Material: Slides & Scripts https://indico.scc.kit.edu/indico/event/263/ @bwunicluster/forhlr I/ForHLR
More information