June Workshop Series June 27th: All About SLURM University of Nebraska Lincoln Holland Computing Center. Carrie Brown, Adam Caprez
2 Setup Instructions
Please complete these steps before the lessons start at 1:00 PM.
Setup instructions:
If you need to use a demo account, please speak with one of the helpers.
If you need help with the setup, please put a red sticky note at the top of your laptop.
When you are done with the setup, please put a green sticky note at the top of your laptop.
3 June Workshop Series Schedule
June 6th: Introductory Bash
June 13th: Advanced Bash and Git
June 20th: Introductory HCC
June 27th: All about SLURM - Learn all about the Simple Linux Utility for Resource Management (SLURM), HCC's workload manager (scheduler), and how to select the best options to streamline your jobs.
Upcoming Software Carpentry Workshops:
UNL: HCC Kickstart (Bash, Git and HCC Basics) - September 5th and 6th
UNO: Software Carpentry (Bash, Git and R) - October 16th and 17th
4 Logistics
Name tags, sign-in sheet
Sticky notes: Red = need help, Green = all good
Link to Workshop Materials:
Etherpad:
Terminal commands are in this font
Any entries surrounded by <brackets> need to be filled in with information
Example: <username>@crane.unl.edu becomes demo01@crane.unl.edu if your username is demo01.
Today we will be using the reservation hccjune for all jobs.
Make sure your submit scripts include the line: #SBATCH --reservation=hccjune
5 What is a Cluster?
6 Exercises
1. If you aren't already, connect to the Crane cluster.
2. Navigate to your $WORK directory.
3. If you were not here last week, or do not have the tutorial directory, clone the files to your $WORK directory with the command: git clone
4. Make a new directory inside the tutorial directory (./HCCWorkshops/) named slurm. This is where we will put all of our tutorial files for today.
Once you have finished, put up your green sticky note. If you have issues, put up your red sticky note and one of the helpers will be around to assist.
7 SLURM
Simple Linux Utility for Resource Management
Open source, scalable cluster management and job scheduling system
Used on ~60% of the TOP500 supercomputers
3 key functions:
- Allocates exclusive or non-exclusive access to resources
- Provides a framework for starting, executing, and monitoring work
- Manages a queue of pending jobs
Uses a best-fit algorithm to assign tasks
Fair Tree fairshare algorithm
8 Slurm vs PBS
Task | PBS/SGE Command | Slurm Equivalent
Submit a job | qsub <script_file> | sbatch <script_file>
Cancel a job | qdel <job_id> | scancel <job_id>
Check the status of a job | qstat <job_id> | squeue -j <job_id>
Check the status of all jobs by user | qstat -u <user_name> | squeue -u <user_name>
Hold a job | qhold <job_id> | scontrol hold <job_id>
Release a job | qrls <job_id> | scontrol release <job_id>
More commands and schedulers:
9 sinfo
Shows a listing of all partitions on a cluster
Use #SBATCH --partition=<partition_name>
All partitions have a 7 day run-time limitation
Publicly available partitions:
Partition | Description | Limitations | Clusters
batch | Default partition | 2000 max CPUs per user | Crane, Tusker
guest | Uses free time on owned or leased Infiniband (IB) or Omni-Path Architecture (OPA) nodes | Pre-emptable; max 158 IB CPUs and 2000 OPA CPUs per user | Crane
highmem | High memory nodes (512 and 1024 GB) | 192 max CPUs per user | Tusker
gpu_k20 | GPU nodes with 3x Tesla K20m per node, with IB | 48 max CPUs per user | Crane
gpu_m2070 | GPU nodes with 2x Tesla M2070 per node, non-IB | 48 max CPUs per user | Crane
gpu_p100 | GPU nodes with 2x Tesla P100 per node, with OPA | 40 max CPUs per user | Crane
10 Fair Tree Fairshare Algorithm
Fair Tree prioritizes users such that if accounts A and B are siblings and A has a higher fairshare factor than B, all children of A will have higher fairshare factors than all children of B.
Benefits:
- All users in a higher priority account receive a higher fairshare factor than all users from a lower priority account
- Users in a more active group have lower priority than users in a less active group
- Users are sorted and ranked to prevent precision loss
- Priority is calculated based on rank, not directly off of the Level FS value
- New jobs are immediately assigned a priority
User ranking is recalculated at 5 minute intervals.
11 Calculation of Level FS (LF)

LF = S / U

Where:
S = Shares Norm: assigned shares, normalized to the shares assigned to itself and its siblings:
  S = S_raw(self) / S_raw(self + siblings)
U = Effective Usage: usage, normalized to the usage of the account and its siblings:
  U = U_raw(self) / U_raw(self + siblings)
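The ratio can be sketched numerically; the raw share and usage values below are made up purely for illustration:

```shell
# Made-up raw values: the account holds 100 of the 300 shares at its level (S),
# and 50 of the 100 units of usage among itself and its siblings (U).
awk 'BEGIN {
  S = 100 / 300                       # Shares Norm
  U = 50  / 100                       # Effective Usage
  printf "S=%.4f U=%.4f LF=%.4f\n", S, U, S / U
}'
# → S=0.3333 U=0.5000 LF=0.6667
```

Here LF < 1 because the account has used more than its normalized share, so its priority drops relative to its siblings.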
12 Fairshare Algorithm
Example tree: root → groups (gprof1, gprof2) → users (uprof1, ustudent3, uprof2, ucollab78, uphd17)
Uses a rooted plane tree (aka rooted ordered tree), sorted by Level FS descending from left to right.
The tree is traversed depth-first; users are assigned a rank and given a fairshare factor.
Process:
1. Calculate Level FS for the subtree's children
2. Sort the children of the subtree
3. Visit children in descending order and assign each a fairshare factor
fairshare factor = rank / total # of users
13 Exercises
1. You can check on the share division and usage on Holland clusters with the sshare command. The output of this command can be quite long; combine it with head or grep to see individual portions of it.
- Can you write a command so you only see the first 10 lines of output?
- Modify the previous command to use grep to find your user and group information.
- Compare the amount of your EffectvUsage to your NormShares. Have you used more than your NormShares? How about your group overall? How does the group's EffectvUsage compare to the NormShares?
2. The sshare argument -l shows extended output, including the current calculated LevelFS values. Repeat the steps in #1, but with the -l argument this time.
- How does your LevelFS value compare to your group's LevelFS value?
- Does the calculated LevelFS value correspond to the differences you observed in EffectvUsage?
Once you have finished, put up your green sticky note. If you have issues, put up your red sticky note and one of the helpers will be around to assist.
14 sbatch
Used to asynchronously submit a batch job to execute on allocated resources.
Sequence of events:
1. User submits a script via sbatch
2. When resources become available, they are allocated to the job
3. The script is executed on one node (the master node)
- The script must launch other tasks on allocated nodes
- STDOUT and STDERR are captured and redirected to the output file(s)
4. When the script terminates, the allocation is released
- Any non-zero exit code will be interpreted as a failure
15 Submit Scripts
Shebang: The shebang tells Slurm what interpreter to use for this file. This one is for the shell (Bash).
Name of the submit file: This can be anything. Here we are using invert_single.slurm; the .slurm extension makes it easy to recognize that this is a submit file.
Commands: Any commands after the SBATCH lines will be executed by the interpreter specified in the shebang, similar to what would happen if you were to type the commands interactively.
SBATCH options: These must be immediately after the shebang and before any commands. The only required SBATCH options are time, nodes, and mem, but there are many that you can use to fully customize your allocation.
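The slide walks through an example file; a sketch of what invert_single.slurm might look like (the resource values, module name, and MATLAB command are assumptions, not the workshop's actual file):

```shell
# Write a hypothetical submit script showing the structure described above:
# shebang first, SBATCH options next, commands last.
cat > invert_single.slurm <<'EOF'
#!/bin/bash
#SBATCH --time=00:30:00            # required: walltime
#SBATCH --nodes=1                  # required: node count
#SBATCH --mem=4gb                  # required: memory per node
#SBATCH --job-name=invert_single
#SBATCH --output=invert_single.%j.out

# Module loads go right after the SBATCH lines (hypothetical module/version)
module load matlab/r2017a

# Commands are run by the interpreter named in the shebang
matlab -nodisplay -r "invertrand; exit"
EOF
```

On the cluster you would then submit it with `sbatch invert_single.slurm`.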
16 Submit Files Best Practices
- Put all module loads immediately after the SBATCH lines: quickly locate what modules and versions were used.
- Specify versions on module loads: allows you to see what versions were used during the analysis.
- Use a separate submit file for each analysis: instead of editing and resubmitting a submit file, copy a previous one and make changes to it. This keeps a running record of your analyses.
- Redirect output and error to separate files: allows you to see quickly whether a job completed with errors or not.
- Separate individual workflow steps into individual jobs: avoid putting too many steps into a single job.
17 Shebang! - Interpreters
- Must be included in the first line of the submit script
- Must be an absolute path
- Specifies which program is used to execute the contents of the script
The shebang in the submit file can be one of the following:
#!/bin/bash - the most common shell and also the default shell at HCC
#!/bin/csh - symlink to tcsh
#!/usr/bin/perl
#!/usr/bin/python
Using Perl or Python interpreters can make loading modules difficult.
Scripts that return anything but 0 will be interpreted as a failed job by Slurm.
18 Common SBATCH Options
Option | What it does
--nodes | Number of nodes requested
--time | Maximum walltime for the job in DD-HH:MM:SS format; maximum of 7 days on the batch partition
--mem | Real memory (RAM) required per node; can use KB, MB, and GB units (default is MB). Request less memory than the total available on the node: the maximum available on a 512 GB RAM node is 500 GB, and on a 256 GB RAM node is 250 GB
--ntasks-per-node | Number of tasks per node; used to request a specific number of cores
--mem-per-cpu | Minimum memory required per allocated CPU (default is 1 GB)
--output | Filename where all STDOUT will be directed (default is slurm-<jobid>.out)
--error | Filename where all STDERR will be directed (default is slurm-<jobid>.out)
--job-name | How the job will show up in the queue
For more information: sbatch --help
SLURM Documentation:
19 scancel
Used to cancel jobs prior to completion.
Usage: scancel <job_id>
Use other arguments to cancel multiple jobs at once, or combine them with a job id to prevent accidentally canceling the wrong job.
Other arguments:
--name=<job_name> - cancel jobs with this name
--partition=<partition> - cancel jobs in this partition
--user=<user_name> - cancel jobs of this user
--state=<job_state> - cancel jobs in this state (valid states: PENDING, RUNNING, and SUSPENDED)
20 Short qos
Increases a job's priority, allowing it to run as soon as possible.
Useful for testing and developmental work.
Limitations:
- 6 hour runtime
- 1 job of 16 CPUs or fewer
- Max of 2 jobs per user
- Max of 256 CPUs in use for all short jobs from all users
To use, include this line in your submit script: #SBATCH --qos=short
For more information:
21 Exercise
1. Write a submit script from scratch (no copying previous ones!). The script should use the following parameters:
- Uses 1 node
- Uses 10 GB RAM
- 10 minutes runtime
- Executes the command: echo "I can write submit scripts!"
Submit your script and watch for output. If you run into errors, copy the error to the Etherpad. If you were able to fix the error, add a brief note explaining how you did.
Once you have finished, put up your green sticky note. If you have issues, put up your red sticky note and one of the helpers will be around to assist.
22 Exercise Solution
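One possible solution, as a sketch (the job name and output filename are assumptions; only the node, memory, time, and echo requirements come from the exercise):

```shell
# Write a submit script meeting the exercise's requirements:
# 1 node, 10 GB RAM, 10 minutes, one echo command.
cat > solution.slurm <<'EOF'
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --mem=10gb
#SBATCH --time=00:10:00
#SBATCH --job-name=first-script
#SBATCH --output=first-script.%j.out

echo "I can write submit scripts!"
EOF

# On the cluster, submit with: sbatch solution.slurm
# (Since #SBATCH lines are just comments to bash, running the script
# directly also prints the message.)
bash solution.slurm
# → I can write submit scripts!
```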
23 squeue
- Job ID: The ID number assigned to your job by Slurm
- Name: The name you gave the job, as specified in the submit script
- Time: The length of time the job has been running
- Nodes: The number of nodes the job is running on
- Partition: The partition the job is running on or assigned to
- User: The user that owns the job
- State: The current status of the job. Common states include:
  CD - Completed
  CA - Canceled
  F - Failed
  PD - Pending
  R - Running
- Nodelist: If the job is running, the names of the nodes the job is running on; if the job is pending, the reason the job is pending
For more information:
24 Common Job Reason Codes
Reason | Description
Dependency | This job is waiting for a dependent job to complete.
NodeDown | A node required by the job is down.
PartitionDown | The partition (queue) required by this job is in a DOWN state and is temporarily accepting no jobs, for instance because of maintenance. Note that this message may be displayed for a time even after the system is back up.
Priority | One or more higher priority jobs exist for this partition or advanced reservation. Other jobs in the queue have higher priority than yours.
ReqNodeNotAvail | No nodes can be found satisfying your limits, for instance because maintenance is scheduled and the job cannot finish before it.
Reservation | The job is waiting for its advanced reservation to become available.
More information: squeue --help
25 Common squeue Options
Option | Displays information about
-j <job_list> | specified job(s) *
-u <user_name> / --user=<user_name> | jobs owned by the specified user_name(s) *
-p <part_list> | jobs in the specified partition(s) *
-t <state_list> | jobs in the specified state(s) {PD, R, S, CG, CD, CF, CA, F, TO, PR, NF} *
-i <interval> / --iterate=<interval> | jobs repeatedly reported at intervals (in seconds)
-S <sort_list> / --sort=<sort_list> | jobs sorted by the specified field(s) *
--start | pending jobs and their scheduled start times
* Indicates arguments that can take a comma-separated list
For more options:
26 Exercise
1. Use the squeue command to determine the following. Hint: don't forget about wc -l
- How many jobs are currently running?
- How many jobs are currently pending?
- The grid partition is composed of resources that are made available to the Open Science Grid. How many jobs are currently in the queue for this partition?
- How many jobs are currently in the queue for the user root?
2. Edit the submit script you made previously. Add the following command to execute after the echo command: sleep 120
Submit the updated script file and monitor its progress with squeue. If it is pending for a while, use --start to see how much longer until it is expected to start. How accurate was the estimate?
Can you guess what sleep does just by how your job changes? If not, take a look at the documentation (sleep --help).
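On the cluster these counts come straight from squeue; the counting pipeline can be sketched against a small sample of made-up squeue output (on Crane you would pipe real `squeue` output instead):

```shell
# Made-up squeue output so the pipeline can run without a live cluster.
squeue_output='  JOBID PARTITION     NAME     USER ST       TIME  NODES
    101     batch     job1   demo01  R       5:02      1
    102     batch     job2   demo01 PD       0:00      1
    103      grid     job3     root PD       0:00      2
    104     batch     job4   demo02  R      12:44      1'

# Filter on the state column (field 5), then count matching lines with wc -l.
running=$(echo "$squeue_output" | awk '$5 == "R"' | wc -l)
pending=$(echo "$squeue_output" | awk '$5 == "PD"' | wc -l)
echo "Running: $running, Pending: $pending"
```

With real output, `squeue -t R | wc -l` and `squeue -t PD | wc -l` do the same job (minus one for the header line).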
27 Customizing squeue output
Use the --Format argument (must be capitalized).
Fields you want displayed are specified in a comma-separated list, without spaces, after the argument.
Fields of note: priority, reason, dependency, eligibletime, endtime, state / statecompact, submittime
Even more customization options are available for --Format and the --format flag; check out man squeue for more information.
28 Environmental Variables and Replacement Symbols
Environmental Variables:
- Can be used in the command section of a submit file (passed to scripts or programs via arguments)
- Cannot be used within an #SBATCH directive; use replacement symbols instead
Environment Variable | Description
SLURM_JOB_ID | batch job id assigned by Slurm upon submission
SLURM_JOB_NAME | user-assigned job name
SLURM_NNODES | number of nodes
SLURM_NODELIST | list of nodes
SLURM_NTASKS | total number of tasks
SLURM_QUEUE | queue (partition)
SLURM_SUBMIT_DIR | directory of submission
SLURM_TASKS_PER_NODE | number of tasks per node
Replacement Symbols:
Symbol | Value
%A | Job array's master job allocation number
%a | Job array ID (index) number
%j | Job allocation number (job id)
%N | Node name; will be replaced by the name of the first node in the job (the one that runs the script)
%u | User name
%% | The character %
A number can be placed between % and the following character to zero-pad the result. For example, job%9j.out zero-pads the job id to 9 digits.
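Slurm performs the padding itself when it expands %9j, but the effect can be previewed with plain printf (nothing Slurm-specific here; the job id 123 is made up):

```shell
# printf's %09d pads to 9 digits with leading zeros,
# mirroring what Slurm does for an output pattern like job%9j.out
printf 'job%09d.out\n' 123
# → job000000123.out
```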
29 Additional sbatch Options
Argument | Details
--begin=<time> | The controller will wait to allocate the job until the specified time. Specific time: HH:MM:SS; specific date: MMDDYY, MM/DD/YY, or YYYY-MM-DD; specific date and time: YYYY-MM-DD[THH:MM:SS]. The keywords now, today, and tomorrow can be used. Can also be relative, in the format now+<time>
--deadline=<time> | Remove the job if it cannot finish before the deadline. Valid time formats: HH:MM[:SS] [AM|PM]; MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]; MM/DD[/YY]-HH:MM[:SS]; YYYY-MM-DD[THH:MM[:SS]]
--hold | Will hold the job in a held state until released manually using the command scontrol release <job_id>
--immediate | Will only release the job if the resources are immediately available
--mail-type=<type> | Notify user by email when certain event types occur. Valid types include: BEGIN, END, FAIL, ALL, TIME_LIMIT, TIME_LIMIT_X (when X% of the time is up, where X is 90, 80, or 50)
--mail-user=<user_email> | Specify an email address to send event notifications to
--open-mode=<append|truncate> | Specify how to open output files (default is truncate)
--test-only | Validates the script and returns a starting estimate based on the current queue and job requirements. Does not submit the job
--tmp=<MB> | Minimum amount of temporary disk space on the allocated node
30 3. \ Exercises 1. Edit the submit script you created previous to: Include at least two of the additional options we discussed. Submit the script to see how they work. Try changing some of the parameters (number of nodes, memory, or time) and use the #SBATCH --testonly argument to see how the estimated start time changes. Which parameter seems to affect it the most? 2. Using the cd command, navigate to the matlab directory inside of HCCWorkshops. Use less to view the contents of the invertrand.submit file. Can you find all of the environmental variables and replacement symbols used? What role do each of them play in this script? 4. Navigate back into the directory which contains the submit script you made today. Edit the script to include one environmental variable and one replacement symbol. Submit the script and check to see if your changes worked the way you expected. Once you have finished, put up your green sticky note. If you have issues, put up your red sticky note and one of the helpers will be around to assist.
31 Array Job Submissions
Submits a specified number of identical jobs.
Use environmental variables and replacement symbols to separate output.
Usage: #SBATCH --array=<array numbers or ranges>
The array list can be any combination of the following:
- A comma-separated list of values. #SBATCH --array=1,5,10 submits 3 array jobs with array ids 1, 5, 10
- A range of values with a - separator. #SBATCH --array=0-5 submits 6 array jobs with array ids 0, 1, 2, 3, 4, 5
- A range of values with a : to indicate a step value. #SBATCH --array=1-9:2 submits 5 array jobs with array ids 1, 3, 5, 7, 9
- A % to specify the maximum number of simultaneous tasks (default is 1000). #SBATCH --array=1-10%4 submits 10 array jobs with at most 4 running simultaneously
To cancel array jobs:
- Usage: scancel <job_id>_<array numbers>
- Cancel all array jobs: scancel <job_id>
- Cancel single array ids: scancel <job_id>_<array id>
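A minimal array-job sketch tying the pieces together (the filenames, resource values, and 1-9:2 range are illustrative; SLURM_ARRAY_TASK_ID is set by Slurm for each task, and %A/%a keep each task's output separate):

```shell
# Hypothetical array submit script: 5 tasks with ids 1, 3, 5, 7, 9.
cat > array_demo.slurm <<'EOF'
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --mem=1gb
#SBATCH --time=00:05:00
#SBATCH --array=1-9:2
#SBATCH --output=array_demo.%A_%a.out   # %A = master job id, %a = array id

# Every task runs this same script; SLURM_ARRAY_TASK_ID tells it which one it is
echo "I am array task $SLURM_ARRAY_TASK_ID"
EOF

# On the cluster: sbatch array_demo.slurm
# Locally we can preview one task by setting the variable ourselves:
SLURM_ARRAY_TASK_ID=3 bash array_demo.slurm
# → I am array task 3
```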
32 Exercises
1. Specify how many jobs these commands will create. What are their array ids? How many will run simultaneously?
- #SBATCH --array=5-10
- #SBATCH --array=0-4,15-20
- #SBATCH --array=1,3-10:2
- #SBATCH --array=0-20:2%10
2. When we looked at the output of the example array job, the output was not in numeric order. Can you think of a reason why that happens?
3. Edit the example array job to do the following:
- Run 15 array tasks, each one with an odd array id
- Run 5 array tasks, each one with a unique 3 digit id
Once you have finished, put up your green sticky note. If you have issues, put up your red sticky note and one of the helpers will be around to assist.
33 Job Dependencies
Allows you to queue multiple jobs that depend on the completion of one or more previous jobs.
When submitting the job, use the -d argument followed by a specification of which jobs and when to execute: <when_to_execute>:<job_id>
- After successful completion: afterok:<job_id>
- After non-successful completion: afternotok:<job_id>
- Multiple job ids can be specified, separated with colons: afterok:<job_id1>:<job_id2>
Dependent jobs can use output and files created by previous jobs.
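Chaining jobs requires the job id from each sbatch call. One common pattern is to parse it out of sbatch's "Submitted batch job <id>" message (sketch: the message is simulated here so the parsing step can run without a cluster, and the JobA/JobB script names come from the exercise):

```shell
# sbatch prints "Submitted batch job <id>" on success; simulate that message.
submit_msg="Submitted batch job 12345"

# The job id is the 4th whitespace-separated field.
jid=$(echo "$submit_msg" | awk '{print $4}')
echo "captured job id: $jid"

# On the cluster the full pattern would be (not run here):
#   jidA=$(sbatch JobA.submit | awk '{print $4}')
#   sbatch -d afterok:$jidA JobB.submit
```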
34 Exercises
1. Copy the JobB.submit script, calling the new one JobC.submit, and edit the contents accordingly (replace all instances of B with C). Using sbatch, queue JobA. Then queue JobB and JobC, setting them both to begin after the successful completion of JobA.
2. Using the previous three submit scripts, create a new submit script which will do the following:
- Combine the output from both JobB and JobC into a text file called JobD.txt
- Add the line "Sample job D output" to this new text file
3. Using these four submit scripts, run them so the jobs trigger in the order shown in the diagram to the right.
Once you have finished, put up your green sticky note. If you have issues, put up your red sticky note and one of the helpers will be around to assist.
35 Exercise Solution
36 srun
Used to synchronously submit a single command.
Commonly used to start interactive sessions.
Sequence of events:
1. User submits a command for execution. It may include command line arguments and will be executed exactly as specified
2. If an allocation exists, the job executes immediately; otherwise, the job will block until a new allocation is established
3. n identical copies of the command are run simultaneously on the allocated resources as individual tasks. --pty induces pseudo-terminal mode: input and output are directed to the user's shell
4. Once all tasks terminate, the srun session will terminate. If the allocation was created with srun, it will be released
37 Using srun to monitor batch jobs
1. Connect to the node running the job:
srun --jobid=<job_id> --pty bash (or top)
srun --nodelist=<node_id> --pty bash (or top)
2. Monitor:
top (if not already running)
- Use to monitor core use; ideal for multi-core processes
- Press u to search for your username
cat /cgroup/memory/slurm/uid_<uid>/job_<job_id>/memory.max_usage_in_bytes
- Use to monitor memory use
- To determine your uid, use: id -u <user_name>
- Combine with watch -n <interval> to specify a refresh interval (default is 2 seconds)
- CTRL + C to exit
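The cgroup path above is built from your numeric uid and the job id; assembling it can be sketched as follows (the job id 12345 is made up, and the path itself only exists on a compute node running your job):

```shell
# Find your numeric uid (works anywhere)
uid=$(id -u)

# Assemble the memory-usage path for a hypothetical job id
job_id=12345
mem_file="/cgroup/memory/slurm/uid_${uid}/job_${job_id}/memory.max_usage_in_bytes"
echo "$mem_file"

# On the compute node you would then watch it refresh, e.g.:
#   watch -n 5 cat "$mem_file"
```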
User Guide of High Performance Computing Cluster in School of Physics Prepared by Sue Yang (xue.yang@sydney.edu.au) This document aims at helping users to quickly log into the cluster, set up the software
More informationNBIC TechTrack PBS Tutorial
NBIC TechTrack PBS Tutorial by Marcel Kempenaar, NBIC Bioinformatics Research Support group, University Medical Center Groningen Visit our webpage at: http://www.nbic.nl/support/brs 1 NBIC PBS Tutorial
More informationPBS Pro Documentation
Introduction Most jobs will require greater resources than are available on individual nodes. All jobs must be scheduled via the batch job system. The batch job system in use is PBS Pro. Jobs are submitted
More informationECE 574 Cluster Computing Lecture 4
ECE 574 Cluster Computing Lecture 4 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 31 January 2017 Announcements Don t forget about homework #3 I ran HPCG benchmark on Haswell-EP
More informationBash for SLURM. Author: Wesley Schaal Pharmaceutical Bioinformatics, Uppsala University
Bash for SLURM Author: Wesley Schaal Pharmaceutical Bioinformatics, Uppsala University wesley.schaal@farmbio.uu.se Lab session: Pavlin Mitev (pavlin.mitev@kemi.uu.se) it i slides at http://uppmax.uu.se/support/courses
More informationSlurm at UPPMAX. How to submit jobs with our queueing system. Jessica Nettelblad sysadmin at UPPMAX
Slurm at UPPMAX How to submit jobs with our queueing system Jessica Nettelblad sysadmin at UPPMAX Free! Watch! Futurama S2 Ep.4 Fry and the Slurm factory Simple Linux Utility for Resource Management Open
More informationA Hands-On Tutorial: RNA Sequencing Using High-Performance Computing
A Hands-On Tutorial: RNA Sequencing Using Computing February 11th and 12th, 2016 1st session (Thursday) Preliminaries: Linux, HPC, command line interface Using HPC: modules, queuing system Presented by:
More informationTroubleshooting Jobs on Odyssey
Troubleshooting Jobs on Odyssey Paul Edmon, PhD ITC Research CompuGng Associate Bob Freeman, PhD Research & EducaGon Facilitator XSEDE Campus Champion Goals Tackle PEND, FAIL, and slow performance issues
More informationChoosing Resources Wisely Plamen Krastev Office: 38 Oxford, Room 117 FAS Research Computing
Choosing Resources Wisely Plamen Krastev Office: 38 Oxford, Room 117 Email:plamenkrastev@fas.harvard.edu Objectives Inform you of available computational resources Help you choose appropriate computational
More informationUsing Compute Canada. Masao Fujinaga Information Services and Technology University of Alberta
Using Compute Canada Masao Fujinaga Information Services and Technology University of Alberta Introduction to cedar batch system jobs are queued priority depends on allocation and past usage Cedar Nodes
More informationIntroduction to the Cluster
Follow us on Twitter for important news and updates: @ACCREVandy Introduction to the Cluster Advanced Computing Center for Research and Education http://www.accre.vanderbilt.edu The Cluster We will be
More informationAnswers to Federal Reserve Questions. Training for University of Richmond
Answers to Federal Reserve Questions Training for University of Richmond 2 Agenda Cluster Overview Software Modules PBS/Torque Ganglia ACT Utils 3 Cluster overview Systems switch ipmi switch 1x head node
More informationIntroduction to High-Performance Computing (HPC)
Introduction to High-Performance Computing (HPC) Computer components CPU : Central Processing Unit cores : individual processing units within a CPU Storage : Disk drives HDD : Hard Disk Drive SSD : Solid
More informationIntroduction to UBELIX
Science IT Support (ScITS) Michael Rolli, Nico Färber Informatikdienste Universität Bern 06.06.2017, Introduction to UBELIX Agenda > Introduction to UBELIX (Overview only) Other topics spread in > Introducing
More informationIntroduction to GALILEO
Introduction to GALILEO Parallel & production environment Mirko Cestari m.cestari@cineca.it Alessandro Marani a.marani@cineca.it Domenico Guida d.guida@cineca.it Maurizio Cremonesi m.cremonesi@cineca.it
More informationUsing Cartesius and Lisa. Zheng Meyer-Zhao - Consultant Clustercomputing
Zheng Meyer-Zhao - zheng.meyer-zhao@surfsara.nl Consultant Clustercomputing Outline SURFsara About us What we do Cartesius and Lisa Architectures and Specifications File systems Funding Hands-on Logging
More information1 Bull, 2011 Bull Extreme Computing
1 Bull, 2011 Bull Extreme Computing Table of Contents Overview. Principal concepts. Architecture. Scheduler Policies. 2 Bull, 2011 Bull Extreme Computing SLURM Overview Ares, Gerardo, HPC Team Introduction
More informationLinux Tutorial. Ken-ichi Nomura. 3 rd Magics Materials Software Workshop. Gaithersburg Marriott Washingtonian Center November 11-13, 2018
Linux Tutorial Ken-ichi Nomura 3 rd Magics Materials Software Workshop Gaithersburg Marriott Washingtonian Center November 11-13, 2018 Wireless Network Configuration Network Name: Marriott_CONFERENCE (only
More informationGrid Engine Users Guide. 5.5 Edition
Grid Engine Users Guide 5.5 Edition Grid Engine Users Guide : 5.5 Edition Published May 08 2012 Copyright 2012 University of California and Scalable Systems This document is subject to the Rocks License
More informationAn Introduction to Gauss. Paul D. Baines University of California, Davis November 20 th 2012
An Introduction to Gauss Paul D. Baines University of California, Davis November 20 th 2012 What is Gauss? * http://wiki.cse.ucdavis.edu/support:systems:gauss * 12 node compute cluster (2 x 16 cores per
More informationImage Sharpening. Practical Introduction to HPC Exercise. Instructions for Cirrus Tier-2 System
Image Sharpening Practical Introduction to HPC Exercise Instructions for Cirrus Tier-2 System 2 1. Aims The aim of this exercise is to get you used to logging into an HPC resource, using the command line
More informationIntroduction to PICO Parallel & Production Enviroment
Introduction to PICO Parallel & Production Enviroment Mirko Cestari m.cestari@cineca.it Alessandro Marani a.marani@cineca.it Domenico Guida d.guida@cineca.it Nicola Spallanzani n.spallanzani@cineca.it
More informationGPU Cluster Usage Tutorial
GPU Cluster Usage Tutorial How to make caffe and enjoy tensorflow on Torque 2016 11 12 Yunfeng Wang 1 PBS and Torque PBS: Portable Batch System, computer software that performs job scheduling versions
More informationTraining day SLURM cluster. Context Infrastructure Environment Software usage Help section SLURM TP For further with SLURM Best practices Support TP
Training day SLURM cluster Context Infrastructure Environment Software usage Help section SLURM TP For further with SLURM Best practices Support TP Context PRE-REQUISITE : LINUX connect to «genologin»
More informationFor Dr Landau s PHYS8602 course
For Dr Landau s PHYS8602 course Shan-Ho Tsai (shtsai@uga.edu) Georgia Advanced Computing Resource Center - GACRC January 7, 2019 You will be given a student account on the GACRC s Teaching cluster. Your
More informationKimmo Mattila Ari-Matti Sarén. CSC Bioweek Computing intensive bioinformatics analysis on Taito
Kimmo Mattila Ari-Matti Sarén CSC Bioweek 2018 Computing intensive bioinformatics analysis on Taito 7. 2. 2018 CSC Environment Sisu Cray XC40 Massively Parallel Processor (MPP) supercomputer 3376 12-core
More informationRunning Jobs on Blue Waters. Greg Bauer
Running Jobs on Blue Waters Greg Bauer Policies and Practices Placement Checkpointing Monitoring a job Getting a nodelist Viewing the torus 2 Resource and Job Scheduling Policies Runtime limits expected
More informationCompiling applications for the Cray XC
Compiling applications for the Cray XC Compiler Driver Wrappers (1) All applications that will run in parallel on the Cray XC should be compiled with the standard language wrappers. The compiler drivers
More informationGraham vs legacy systems
New User Seminar Graham vs legacy systems This webinar only covers topics pertaining to graham. For the introduction to our legacy systems (Orca etc.), please check the following recorded webinar: SHARCNet
More informationOpenPBS Users Manual
How to Write a PBS Batch Script OpenPBS Users Manual PBS scripts are rather simple. An MPI example for user your-user-name: Example: MPI Code PBS -N a_name_for_my_parallel_job PBS -l nodes=7,walltime=1:00:00
More informationCOSC 6374 Parallel Computation. Debugging MPI applications. Edgar Gabriel. Spring 2008
COSC 6374 Parallel Computation Debugging MPI applications Spring 2008 How to use a cluster A cluster usually consists of a front-end node and compute nodes Name of the front-end node: shark.cs.uh.edu You
More informationLAB. Preparing for Stampede: Programming Heterogeneous Many-Core Supercomputers
LAB Preparing for Stampede: Programming Heterogeneous Many-Core Supercomputers Dan Stanzione, Lars Koesterke, Bill Barth, Kent Milfeld dan/lars/bbarth/milfeld@tacc.utexas.edu XSEDE 12 July 16, 2012 1 Discovery
More informationUsing the SLURM Job Scheduler
Using the SLURM Job Scheduler [web] [email] portal.biohpc.swmed.edu biohpc-help@utsouthwestern.edu 1 Updated for 2015-05-13 Overview Today we re going to cover: Part I: What is SLURM? How to use a basic
More informationContents. Note: pay attention to where you are. Note: Plaintext version. Note: pay attention to where you are... 1 Note: Plaintext version...
Contents Note: pay attention to where you are........................................... 1 Note: Plaintext version................................................... 1 Hello World of the Bash shell 2 Accessing
More informationCSC209H Lecture 1. Dan Zingaro. January 7, 2015
CSC209H Lecture 1 Dan Zingaro January 7, 2015 Welcome! Welcome to CSC209 Comments or questions during class? Let me know! Topics: shell and Unix, pipes and filters, C programming, processes, system calls,
More informationBatch Systems. Running calculations on HPC resources
Batch Systems Running calculations on HPC resources Outline What is a batch system? How do I interact with the batch system Job submission scripts Interactive jobs Common batch systems Converting between
More informationMartinos Center Compute Cluster
Why-N-How: Intro to Launchpad 8 September 2016 Lee Tirrell Laboratory for Computational Neuroimaging Adapted from slides by Jon Kaiser 1. Intro 2. Using launchpad 3. Summary 4. Appendix: Miscellaneous
More informationUoW HPC Quick Start. Information Technology Services University of Wollongong. ( Last updated on October 10, 2011)
UoW HPC Quick Start Information Technology Services University of Wollongong ( Last updated on October 10, 2011) 1 Contents 1 Logging into the HPC Cluster 3 1.1 From within the UoW campus.......................
More informationNBIC TechTrack PBS Tutorial. by Marcel Kempenaar, NBIC Bioinformatics Research Support group, University Medical Center Groningen
NBIC TechTrack PBS Tutorial by Marcel Kempenaar, NBIC Bioinformatics Research Support group, University Medical Center Groningen 1 NBIC PBS Tutorial This part is an introduction to clusters and the PBS
More informationUsing the Yale HPC Clusters
Using the Yale HPC Clusters Robert Bjornson Yale Center for Research Computing Yale University Feb 2017 What is the Yale Center for Research Computing? Independent center under the Provost s office Created
More informationSGE Roll: Users Guide. Version Edition
SGE Roll: Users Guide Version 4.2.1 Edition SGE Roll: Users Guide : Version 4.2.1 Edition Published Sep 2006 Copyright 2006 University of California and Scalable Systems This document is subject to the
More informationSlurm Overview. Brian Christiansen, Marshall Garey, Isaac Hartung SchedMD SC17. Copyright 2017 SchedMD LLC
Slurm Overview Brian Christiansen, Marshall Garey, Isaac Hartung SchedMD SC17 Outline Roles of a resource manager and job scheduler Slurm description and design goals Slurm architecture and plugins Slurm
More informationBefore We Start. Sign in hpcxx account slips Windows Users: Download PuTTY. Google PuTTY First result Save putty.exe to Desktop
Before We Start Sign in hpcxx account slips Windows Users: Download PuTTY Google PuTTY First result Save putty.exe to Desktop Research Computing at Virginia Tech Advanced Research Computing Compute Resources
More informationIntroduction to HPC Using zcluster at GACRC
Introduction to HPC Using zcluster at GACRC On-class STAT8330 Georgia Advanced Computing Resource Center University of Georgia Suchitra Pakala pakala@uga.edu Slides courtesy: Zhoufei Hou 1 Outline What
More informationThe cluster system. Introduction 22th February Jan Saalbach Scientific Computing Group
The cluster system Introduction 22th February 2018 Jan Saalbach Scientific Computing Group cluster-help@luis.uni-hannover.de Contents 1 General information about the compute cluster 2 Available computing
More informationIntroduction to Slurm
Introduction to Slurm Tim Wickberg SchedMD Slurm User Group Meeting 2017 Outline Roles of resource manager and job scheduler Slurm description and design goals Slurm architecture and plugins Slurm configuration
More informationBrigham Young University
Brigham Young University Fulton Supercomputing Lab Ryan Cox Slurm User Group September 16, 2015 Washington, D.C. Open Source Code I'll reference several codes we have open sourced http://github.com/byuhpc
More informationIntroduction to Linux Part 2b: basic scripting. Brett Milash and Wim Cardoen CHPC User Services 18 January, 2018
Introduction to Linux Part 2b: basic scripting Brett Milash and Wim Cardoen CHPC User Services 18 January, 2018 Overview Scripting in Linux What is a script? Why scripting? Scripting languages + syntax
More informationSCALABLE HYBRID PROTOTYPE
SCALABLE HYBRID PROTOTYPE Scalable Hybrid Prototype Part of the PRACE Technology Evaluation Objectives Enabling key applications on new architectures Familiarizing users and providing a research platform
More informationOBTAINING AN ACCOUNT:
HPC Usage Policies The IIA High Performance Computing (HPC) System is managed by the Computer Management Committee. The User Policies here were developed by the Committee. The user policies below aim to
More informationHigh Performance Computing (HPC) Using zcluster at GACRC
High Performance Computing (HPC) Using zcluster at GACRC On-class STAT8060 Georgia Advanced Computing Resource Center University of Georgia Zhuofei Hou, HPC Trainer zhuofei@uga.edu Outline What is GACRC?
More informationMinnesota Supercomputing Institute Regents of the University of Minnesota. All rights reserved.
Minnesota Supercomputing Institute Introduction to Job Submission and Scheduling Andrew Gustafson Interacting with MSI Systems Connecting to MSI SSH is the most reliable connection method Linux and Mac
More informationIntel Manycore Testing Lab (MTL) - Linux Getting Started Guide
Intel Manycore Testing Lab (MTL) - Linux Getting Started Guide Introduction What are the intended uses of the MTL? The MTL is prioritized for supporting the Intel Academic Community for the testing, validation
More informationTITANI CLUSTER USER MANUAL V.1.3
2016 TITANI CLUSTER USER MANUAL V.1.3 This document is intended to give some basic notes in order to work with the TITANI High Performance Green Computing Cluster of the Civil Engineering School (ETSECCPB)
More informationKamiak Cheat Sheet. Display text file, one page at a time. Matches all files beginning with myfile See disk space on volume
Kamiak Cheat Sheet Logging in to Kamiak ssh your.name@kamiak.wsu.edu ssh -X your.name@kamiak.wsu.edu X11 forwarding Transferring Files to and from Kamiak scp -r myfile your.name@kamiak.wsu.edu:~ Copy to
More information