Boost your efficiency when dealing with multiple jobs on the Cray XC40 supercomputer Shaheen II. KAUST Supercomputing Laboratory KSL Workshop Series


1 Boost your efficiency when dealing with multiple jobs on the Cray XC40 supercomputer Shaheen II Samuel KORTAS KAUST Supercomputing Laboratory KSL Workshop Series June 5th, 2016

2 Agenda A few tips when dealing with numerous jobs The Slurm way (up to a limit) Four KSL tools to move you further: Breakit (1 to 10000s, all the same), KTF (1 to 100, tuned), Maestro (1 to 1000s, programmed), Decimate (dependent jobs) Hands-on session: /scratch/tmp/ksl_workshop Documentation on hpc.kaust.edu.sa/1001_jobs (to be completed today) Conclusion

3 Launching thousands of jobs Some of our users use Shaheen to run parameter sweeps involving thousands of jobs and saving thousands of temporary files; they need a result in a guaranteed time; they are not HPC experts, but their problems are challenging in terms of scheduling and file-system stress; they implement complex workflows feeding the output of one code into the input of others and producing a lot of small files.

4 Scheduling thousands of jobs KSL does its best, but it's not that easy, folks! The Tetris game gets rough with long rectangles ;-( (figure: scheduling timeline, jobs packed over time onto the 6144 available nodes)

5 Let's help the scheduler! (1/5) Put the right elapsed time: a realistic time limit lets Slurm backfill your jobs earlier.
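For example, a hedged sketch of requesting an accurate elapsed time in a job script (the 10-minute value, job name and executable are illustrative, not from the slides):

    #!/bin/bash
    #SBATCH --job-name=my_run        # illustrative name
    #SBATCH --nodes=1
    #SBATCH --time=00:10:00          # ask for what the run really needs, plus a small margin
    srun ./my_exe                    # hypothetical executable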

6 Let's help the scheduler! (2/5) Let's share resources better among us. The current policy of the scheduler is first in, first served. Your priority increases as long as you are waiting 'actively' in the queue; held or dependent jobs are not counted. Slurm takes your backfilling potential into account. But we have to share, guys: the number of jobs in the queue is limited. The Slurm fair-share implementation is reported to work well only with a small number of projects.
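As a quick check of where your jobs stand, the standard Slurm commands below can help (a sketch using generic Slurm options, not Shaheen-specific policy):

    squeue -u $USER --start     # estimated start times computed by the backfill scheduler
    sprio -u $USER              # priority components of your pending jobs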

7 Let's help the scheduler! (3/5) Let's lower the stress on the filesystem. Each one of the 1000s of jobs may need to read, probe or write a file, and we have a single filesystem shared by all the jobs, so let's preserve it. Lustre is not tuned for small files. Let's use the ramdisk when possible and save only the data that matters to Lustre (see next slide). Let's communicate in memory instead of via files. Let's choose the right stripe count.
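For instance, a hedged sketch of setting the Lustre stripe count on an output directory (the count of 8 and the directory name are illustrative):

    lfs setstripe -c 8 /scratch/$USER/big_output   # stripe large files over 8 OSTs
    lfs getstripe /scratch/$USER/big_output        # verify the striping that was applied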

8 Let's help the scheduler! (4/5) How to use the ramdisk? On each Shaheen II compute node, /tmp is a ramdisk, a POSIX filesystem hosted directly in memory. Starting at 64 GB, it shrinks as your program uses more and more memory; an additional memory request or a write to /tmp fails when: size(OS) + size(program instructions) + size(program variables) + size(/tmp) > 128 GB. Still, /tmp is the fastest filesystem of all (compared to Lustre and DataWarp), but it is distributed (local to each compute node) and lost at the end of the job. Think of storing temporary files in /tmp and saving them at the end of the job; think of storing frequently accessed files in /tmp.
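A minimal sketch of this pattern inside a single-node job script, assuming a hypothetical executable my_exe that reads input.dat and writes result.dat:

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --time=00:30:00
    cp input.dat /tmp/                               # stage frequently accessed input into the ramdisk
    cd /tmp
    srun -n 32 $SLURM_SUBMIT_DIR/my_exe input.dat    # temporary files stay in /tmp
    cp result.dat $SLURM_SUBMIT_DIR/                 # save only the data that matters back to Lustre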

9 Let's help the scheduler! (5/5) Off-loading the cdls to the compute nodes. You may need to pre/postprocess, monitor a job, relaunch it, get notified when it's starting or ending... Automate all this and move the load from the cdl to the compute nodes: use #SBATCH --mail-user; use breakit, ktf, maestro, decimate; ask the KSL team for help: it's only a script away.
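For reference, the corresponding Slurm mail directives look like this (a sketch; the address is a placeholder):

    #SBATCH --mail-user=first.last@kaust.edu.sa   # placeholder address
    #SBATCH --mail-type=BEGIN,END,FAIL            # notified when the job starts, ends or fails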

10 Managing 1001 jobs 1 - the Slurm way: submitting arrays...

11 Slurm Way (1/3) Slurm can submit and manage collections of similar jobs easily with job arrays. To submit a 500-element job array: sbatch --array=1-500 -i my_in_%a -o my_out_%a job.sh where %a in a file name is mapped to the array task ID (1-500). squeue -r --user=<my_user_name> 'unfolds' jobs queued as a job array. More info in the Slurm job-array documentation.

12 Slurm Way (2/3) Job environment variables: each task gets SLURM_ARRAY_JOB_ID and SLURM_ARRAY_TASK_ID. The squeue and scancel commands, plus some scontrol options, can operate on an entire job array or on selected task IDs; the squeue -r option prints each task ID separately.
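For instance, assuming a submitted array whose job ID is 12345 (illustrative), individual tasks can be targeted like this:

    squeue -r -u $USER          # one line per array task
    scancel 12345_7             # cancel only task 7 of the array
    scancel 12345               # cancel the whole array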

13 Slurm Way (3/3) Job example. Possible commands: sbatch --array=1-16 my_job ; sbatch --array=1-500%20 my_job allows only 20 array tasks to run at a given time. Taken from
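The job example itself is not legible in this transcription; a minimal sketch of such an array job script, assuming a hypothetical executable ./my_exe and the my_in_%a / my_out_%a naming from slide 11:

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --time=00:10:00
    # each array task works on its own input and output, indexed by the task ID
    srun ./my_exe < my_in_${SLURM_ARRAY_TASK_ID} > my_out_${SLURM_ARRAY_TASK_ID}

Submitted with sbatch --array=1-16 my_job, task 7 then reads my_in_7 and writes my_out_7.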

14 Slurm Way But... Slurm counts each element of the array as a job per se: for now, the total number of jobs in the queue is limited to 800 per user. Pending jobs do not gain priority. Only one parameter can vary: if you need to work on several parameters, the script itself has to deduce them from the array index, as sketched below...
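As a sketch of that last point, two hypothetical parameters can be recovered from the single array index with a bit of shell arithmetic (executable, options and sweep sizes are assumptions):

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --time=00:10:00
    # hypothetical 2D sweep: 10 values of NX times several values of NY, folded into one index
    i=$SLURM_ARRAY_TASK_ID
    NX=$(( (i - 1) % 10 + 1 ))     # varies fastest: 1..10
    NY=$(( (i - 1) / 10 + 1 ))     # varies slowest
    srun ./my_exe --nx=$NX --ny=$NY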

15 Slurm Way hands-on Submit the job /scratch/tmp/ksl_workshop/slurm/demo.job as an array of 20 occurrences; check the script, its output, and the queue; then cancel it.

16 Slurm Way hands-on solution Submit the job as an array of 20 occurrences: sbatch --array=1-20 /scratch/tmp/ksl_workshop/slurm/demo.job Check the script, its output, and the queue: squeue -r --user=<my_user> Cancel it: scancel -n <my_job_name>

17 Managing 1001 jobs? 4 KSL open source Tools

18 Why? Ease your life and centralize some common developments. breakit, ktf, maestro: available on Shaheen as modules. decimate: under development for 2 PIs, to be released soon on bitbucket.org (GNU GPL License). All are written in Python 2.7, installed on Shaheen II, and portable to a workstation or to Noor. Our goal: hiding complexity. All share a common API and internal library engine, also available on bitbucket.org/kaust_ksl. Maintained by KSL (samuel.kortas (at) kaust.edu.sa)

19 Managing 1001 jobs Using the breakit wrapper

20 Breakit (1/3) Idea and status To let you cope seamlessly with the limit of 800 jobs in the queue. No need to change your job array: breakit automatically monitors the process for you. Version 0.1: I need your feedback!

21 Slurm way (1/2) How to handle it with Slurm? (diagram: you, or a program running on the cdl, submits jobs up to the maximum number of jobs allowed in the queue)

22 Slurm way (2/2) How to handle it with Slurm (diagram, continued: you, or a program on the cdl, must keep watching the queue and submit the remaining jobs as slots free up)

23 Breakit (2/3) How does it work? (diagram: breakit submits jobs, respecting the maximum number of jobs allowed in the queue)

24 Breakit (2/3) How does it work? (same diagram, next animation step: the jobs sit in the queue, below the maximum number of jobs)

25 Breakit (2/3) How does it work? Gone! Breakit is not active anymore, yet the jobs stay within the maximum number of jobs allowed in the queue.

26 Breakit (2/3) How does it work? Gone! The jobs are starting, still within the maximum number of jobs in the queue.

27 Breakit (2/3) How does it work? The running jobs submit the next jobs with a dependency; the queue stays within the maximum number of jobs.

28 Breakit (2/3) How does it work? The first jobs are done, the dependency is resolved, and the next ones are pending, within the maximum number of jobs.

29 Breakit (2/3) How does it work? They in turn submit the next jobs with a dependency.

30 Breakit (2/3) How does it work? Instead of submitting all the jobs at once, they are submitted in chunks. Chunk #n is running or pending. Chunk #n+1 depends on chunk #n and starts only when every job of chunk #n has completed; it then submits chunk #n+2 with a dependency on chunk #n+1. We did offload some tasks from the cdl to the compute nodes ;-) A concrete sketch of the mechanism follows.
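Here is a hedged sketch of how chunked submission with a dependency can be expressed directly in Slurm; this is an illustration of the idea, not breakit's actual implementation:

    # submit chunk #n (tasks 1-16), then chunk #n+1 (tasks 17-32) held until chunk #n is done
    jid=$(sbatch --parsable --array=1-16 my_job)
    sbatch --dependency=afterany:$jid --array=17-32 my_job

Breakit goes further by having the jobs of a chunk themselves issue the submission of the following chunk, which is how the work is offloaded from the cdl to the compute nodes.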

31 Breakit (3/3) How to use it? 1) Load the breakit module: module load breakit ; man breakit (to be completed) ; breakit -h 2) Launch your job: breakit --job=your.job --array=<nb of jobs> --chunk=<max_nb_of_jobs_in_queue> 3) Manage it: squeue -r -u <user> -n <job_name> ; scancel -n <job_name>

32 Breakit Hands on Via breakit, submit an array of 100 occurrences of the job /scratch/tmp/breakit/demo.job, having only 16 jobs simultaneously in the queue.

33 Breakit Hands on (solution) Via breakit, submit an array of 100 occurrences of the job /scratch/tmp/breakit/demo.job, having only 16 jobs simultaneously in the queue: module load breakit ; breakit --job=/scratch/tmp/breakit/demo.job --range=100 --chunk=16

34 Breakit Next steps Find a better name! Support all array ranges (not only 1-n) Provide an easy restart Provide an easier way to kill jobs

35 Managing 101 jobs Using KTF

36 KTF Idea At a certain point, you may need to evaluate the performance of a code under different conditions, or to run a parametric study: the same executable is run several times with a different set of parameters (physical values characterizing the problem; number of processors, threads and/or nodes; compiler used; compilation options; parameters passed on the srun command line to experiment with different placement strategies). KTF (Kaust Test Framework) can help you with this!

37 What is KTF? KTF (Kaust Test Framework) has been designed and used during the Shaheen II procurement in order to ease the generation, submission, monitoring, and result collection of a set of jobs depending on a set of parameters to explore. Written in Python 2.7, self-contained and portable. Available on bitbucket.org/kaust_ksl/ktf

38 How does KTF work? A few definitions: an 'experiment' (for instance zephyr/strong in the examples below); a 'case', one single run of this experiment with a given set of parameters; a 'test', which gathers a number of cases.

39 How does KTF work? KTF relies on: a centralized file listing all combinations of parameters to address, i.e. shaheen_cases.ktf; a set of template files, ending in .template, in which the parameters are replaced before submission.

40 KTF hands-on! (1/) Initialize the environment 1) Load the environment and check that ktf is available: module load ktf ; man ktf ; ktf -h 2) Create and initialize your working directory: mkdir <my_test_dir> ; cd <my_test_dir> ; ktf --init You should get a ktf-like tree structure with some examples of centralized case files and associated templates. 3) Examine the case file shaheen_cases.ktf, understand the ktf syntax, modify parameters and check your changes by listing all the combinations: ktf --exp

41 KTF Centralized case file (see file shaheen_zephyr0.ktf) (annotated listing: a KTF comment, the list of parameters, the third test case) A line starting with # is a comment not parsed by KTF. The first line gives the names of the parameters; Case and Experiment are absolutely mandatory. Each following line is a test case, setting a value for EACH parameter. According to this case file, for the third test case, in each file ending in .template: Case will be replaced by 128, Experiment by zephyr/strong, NX by 255, NY by 255, NB_CORES by 128, and ELLAPSED_TIME by 0:05:00.
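Based on the values quoted above, a sketch of what the corresponding lines of the case file could look like; the column layout and the first two rows are assumptions, only the third case is taken from the slide:

    # shaheen_zephyr0.ktf -- lines starting with '#' are comments not parsed by KTF
    Case   Experiment      NX    NY    NB_CORES   ELLAPSED_TIME
    32     zephyr/strong   255   255   32         0:05:00
    64     zephyr/strong   255   255   64         0:05:00
    128    zephyr/strong   255   255   128        0:05:00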

42 KTF Directory initial structure (directory tree: a subdirectory containing files common to all the experiments, one directory per experiment, the default case file, and ktf itself)

43 KTF job.shaheen.template (see files in tests/zephyr/strong/) (annotated listing of the file job.shaheen.template, the job script template that runs ./zephyr input, with the parameter names highlighted for the third test case)
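The template's contents are not legible in this transcription; a hedged sketch of what job.shaheen.template might contain, using the parameter names from the case file (the SBATCH lines are assumptions, only ./zephyr input comes from the slide):

    #!/bin/bash
    #SBATCH --job-name=zephyr_Case
    #SBATCH --ntasks=NB_CORES
    #SBATCH --time=ELLAPSED_TIME
    # KTF replaces Case, NB_CORES, ELLAPSED_TIME, ... before submission
    srun -n NB_CORES ./zephyr input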

44 KTF job.shaheen.template (see files in tests/zephyr/strong/) (annotated listing, this time of the file input.template, with the parameter names highlighted for the third test case)

45 KTF commands ktf ... --help: get help on the command line --init: initialize the environment, copying example .template and .ktf files --build: generate all combinations listed in the case file --launch: generate all combinations listed in the case file and submit them --exp: list all combinations present in the case .ktf file --monitor: monitor all the experiments and display all results in a dashboard --kill: kill all jobs related to this ktf session --status: list all date stamps and cases of the experiments made or currently occurring

46 KTF hands-on! (2/) Prepare a first experiment 4) Examine the case file shaheen_cases.ktf, understand the ktf syntax, modify parameters and check your changes by listing all the combinations: ktf --exp 5) Build an experiment and check that the templated files have been well processed: ktf --build should create one tests directory: tests_shaheen_<date>_<time>

47 KTF Directory after --build (directory tree: the initial template files plus the directories generated for each case, for instance the third one)

48 KTF Directory after --launch (directory tree: zephyr is copied from the common directory, job.shaheen is processed from job.shaheen.template, input is processed from input.template)

49 KTF Centralized case file Handling constant parameters The file shaheen_zephyr0.ktf (a KTF comment, the list of parameters, the third test case) describes strictly identical cases to the file shaheen_zephyr1.ktf, in which a #KTF pragma declares parameters that keep the same value ever after, so they do not need to be repeated on every line.

50 Another example of a KTF case file (listing with the Case and Experiment columns)

51 KTF filters and flags ktf --xxx ... --case-file=<case file>: use another case file than shaheen_cases.ktf --what=zzzz: filter on some cases --reservation=<reservation name>: submit within a reservation Examples: ktf --exp --what=128 ; ktf --launch --what=64 --reservation=workshop ; ktf --exp --case-file=shaheen_zephyr1.ktf

52 KTF filters and flags ktf --xxx ... --ktf-file=<case file>: use another case file than shaheen_cases.ktf --what=zzzz: filter on some cases --when=yyyy --today --now: filter on some date stamps --times=<nb>: repeat the submission <nb> times --info: switch on informative traces --info-level=[ ]: change the informative trace level --debug: switch on debugging traces --debug-level=[ ]: change the debugging trace level

53 KTF hands-on! (3/) Playing with the what filter 4) Examine the case file shaheen_cases.ktf, understand the ktf syntax, modify parameters and check your changes by listing all the combinations, with or without filtering, and using other case files: ktf --exp ; ktf --exp --what=<your filter> ; ktf --exp --case-file=shaheen_zephyr1.ktf 5) Build an experiment and check that the templated files have been well processed: ktf --build ; ktf --build --what=<your filter> should create two tests directories, in the place from where you called ktf: tests_shaheen_<date>_<time>

54 KTF hands-on! (4/) Launch and monitor your first experiment 6) Build an experiment and submit it: ktf --launch [ --reservation=workshop ] should create a new tests directory and spawn the jobs: ./tests_shaheen_<date>_<time> ktf --monitor will monitor your current ktf session; check what shows up in the R/ directory 7) Play with repeating experiments and filtering results: ktf --launch --what=<your filter> [ --reservation=workshop ] ; ktf --launch --times=5 [ --reservation=workshop ] ; ktf --monitor ; ktf --monitor --what=<your case filter> --when=<your date filter> check what shows up in the R/ directory

55 KTF results dashboard reading the result dashboard % ktf --monitor

56 KTF results dashboard Reading the result dashboard: % ktf --monitor (annotated screenshot: columns show when each test ran and what case it was, plus its status and elapsed time; a '!' flags a non-empty job.err; each test gets a subdirectory in R/; tests not yet finished are marked as such)

57 KTF R/ directory Quick access to results This R/ directory is updated each time you call ktf --monitor. It builds symbolic links to the results directories in order to give you quick access to the results you want to check.

58 KTF R/ directory Quick access to results (screenshot of the R/ directory)

59 KTF results configuration Implementation and default printing In fact: alias ktf = python run_test.py ; alias ki = python run_test.py --init ; alias km = python run_test.py --monitor In run_test.py is encoded the value to be displayed in the dashboard (printed when calling --monitor). By default, it is <elapsed time taken by the whole test>/<status of the test>, with a '!' after the status if ever job.err is not empty, and a '!' before the status if ever the job did not terminate properly. Remember you can use cat, more or tail R/*/job.err to scan all these files!

60 KTF results configuration Changing the default printing But you can change the displayed values at will and adapt them to your own needs: other values (flops, intermediate results, total number of iterations, convergence rate, ...), several values (<flops>/<time>/<status>), another event to trigger the '!' sign, other typographic signs. How to do it? See the run_test.py file on the next slide.

61 KTF run_test.py file (listing of the run_test.py file)

62 KTF hands-on! (5/) Modifying the printed result 8) Check what ktf prints of it: ktf --monitor and understand how run_test.py is working 9) Modify run_test.py in order to print the time per iteration

63 KTF Next steps Gather tests into campaigns Have a better display: --monitor option, web interface, automated generation of plots Enrich the filtering feature: regular expressions, several filters possible Enable coding capability inside the case file Complete the documentation Save results into a database and be able to compute statistics Cover the compiling step

64 KTF Next steps Support clean-up and campaigns Chain several jobs into one Support job arrays, dependencies, mail to the user Port to Noor and workstations Offload from a workstation to Shaheen Better versioning of the template files Provide one ktf initial environment per science field

65 Managing 1001 jobs using Maestro

66 Maestro principles (1/2) Handling these studies should be the same on: a Linux box; Shaheen, Noor, Stampede; a laptop under Windows or macOS; a given set of Linux boxes. The only prerequisites: Python > 2.4 and MPI on a supercomputer; Python > 2.4 on a workstation.

67 Maestro principles (2/2) Minimal or no knowledge of the HPC environment required. Easy management of the jobs, handled as a whole.

68 A set of tools adapted to a distributed execution (1/3) No pre-installation needed on the machines: maestro is self-contained. Easy and quick prototyping on a workstation with immediate porting to a supercomputer. Global error signals, easy to throw and trace. Global handling of the jobs as a whole study (launching, monitoring, killing and restarting through one command).

69 A set of tools adapted to a distributed execution (2/3) All the flexibility of Python available to the user in a distributed environment (class inheritance, modules, ...): production of robust, easy-to-read code with an explicit error stack in case a problem has to be debugged. Transparent replication of the environment on each of the compute nodes. Work in /tmp of each compute node to minimize the stress on the filesystem.

70 A set of tools adapted to a distributed execution (3/3) Extended grep (multi-line, multi-column, regular expressions) to postprocess the output files. Centralized management of the templates to replace. Global selection of the files to be kept and parametrization of the receiving directory. A console to explore easily the subdirectories where results are saved. Each running process can write to the same global file.

71 Maestro Principles (diagram: overview of maestro)

72-74 Maestro Principles Maestro allocates a pool of nodes and runs the elementary jobs in it (animation built up over three slides)

75 An example (annotated example script: the file to save, the directory name where results are saved, the elementary computation, sending local and global messages, the parametrized Z range, and the definition of the domain to sweep)

76 Command line options <no option>: classical sequential run on 1 core, stopping at the first error encountered --cores=<n>: parallel run on n cores --depth=<p>: partial parallelisation up to level p --stat: live status of the ongoing computation --reservation=<id>: run inside a reservation --time=hh:mm:ss: set the elapsed duration of the overall job --kill: kill the ongoing computation and clean the environment --resume: resume a computation --restart: restart a computation from scratch --help: help screen

77 Demo!

78 Next Steps Allow maestro to launch multicore jobs Smarter sweeping algorithms (decimate project) Support a given set of workstations Couple maestro with a website Remote launching and dynamic off-loading from a workstation to a supercomputer

79 Managing dependent jobs in complex workflows Using Decimate

80 Idea Some workflows involve several steps depending on one another: several jobs with a dependency between them. Some intermediate step may break; the dependency will then break, and the workflow will remain idle, requesting an action. We want to automate this.
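Done by hand with plain Slurm, such a chain looks like the sketch below (the job script names are illustrative); decimate automates this and, above all, reacts when a step breaks:

    # step2 starts only if step1 completed successfully, step3 only after step2
    jid1=$(sbatch --parsable step1.job)
    jid2=$(sbatch --parsable --dependency=afterok:$jid1 step2.job)
    sbatch --dependency=afterok:$jid2 step3.job
    # if step1 fails, the later jobs stay pending with reason DependencyNeverSatisfied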

81 What is decimate? A tool in Python written for two different PIs with the same need: launch, monitor and heal dependent jobs, making things automated and smooth (plus the add-ons and goodies on the next slide).

82 What is decimate? Add-ons Centralized log files; global resume, --status and kill commands; sends a mail at any time to keep the user updated; can make a decision when a dependency is broken: relaunch the same job again and fix the dependency, change the input data, relaunch and fix the dependency, cancel only this job and move on, or cancel the whole workflow.

83 Some examples of workflows

84 Conclusion We have presented some useful tools to handle many jobs at a time:
  slurm:    identical jobs, 1 varying parameter, same #nodes/job, dependent jobs one at a time, typically < 800 jobs
  breakit:  identical jobs, 1 varying parameter, same #nodes/job, dependent jobs one at a time, beyond the 800-job limit
  ktf:      different jobs, several parameters, any #nodes/job, no dependent jobs
  maestro:  different jobs, many parameters, same #nodes/job, no dependent jobs
  decimate: different jobs, any parameters, any #nodes/job, dependent jobs supported
Your feedback is needed! help@hpc.kaust.edu.sa
