Advanced cluster techniques with LoadLeveler
1 Advanced cluster techniques with LoadLeveler
How to get your jobs to the top of the queue
Ciaron Linstead, 10th May 2012

2 Outline
1. Introduction
2. LoadLeveler recap
3. CPUs
4. Memory
5. Factors affecting job priority
6. Multi-step jobs
7. Common errors
8. Useful bits and pieces
9. New tools on the cluster
10. Finally
Ciaron Linstead, IT Services

3 Introduction
- Resources on the cluster
- Job priority
- Using multiple jobsteps
- Useful techniques and new tools

5 LoadLeveler Recap
- Schedules workload by matching jobs to available resources
- Typical workflow:
  - Write a Job Command File (JCF): a shell script with LL-specific instructions (lines beginning with # @)
  - Use llsubmit to start a run: llsubmit example.jcf
  - Check progress with llq [-l]
  - Check cluster load with llstatus [-l]
  - Check class load with llclass [-l]

6 The Job Command File
Simple serial (one-task) example

    1  #!/bin/bash
    2  # @ job_name = hello_world
    3  # @ class = short
    4  # @ group = its
    5  # @ notify_user = linstead
    6  # @ output = /scratch/01/$(user)/$(job_name)_$(cluster)_$(stepid).out
    7  # @ error = /scratch/01/$(user)/$(job_name)_$(cluster)_$(stepid).err
    8  # @ queue
    9
    10 time /home/linstead/examples/c/hello

Lines 6 and 7 use variables to send output/error from different runs to different files.

8 CPU layout on the iDataPlex cluster
- 320 machines (nodes) with 8 CPUs each
- 2x 4-core Intel Xeon processors
- 1 task gets 1 CPU

9 Mapping tasks to nodes
- Parallel applications: layout can help performance
- Minimise network connections with dense packing of tasks on nodes
  - Tasks on the same node use (faster) shared memory to communicate
- Maximise memory or disk IO bandwidth per task by sparse packing (and perhaps not sharing the node with other users' jobs)

10 Method 1: total_tasks and blocking
- total_tasks = 24 ...I have this many tasks/processes
- blocking = unlimited ...and I don't care where they're located
- or blocking = 4 ...put at most 4 of my tasks on a node

11 Method 2: node and tasks_per_node
- node = 3 ...I want this many nodes
- tasks_per_node = 8 ...and this many tasks per node
- Same as total_tasks = 24 && blocking = 8
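
The equivalence between the two methods comes down to a ceiling division. The sketch below is only an illustration: it takes the slides' reading of blocking as an upper bound on tasks per node, and the helper name nodes_needed is made up for this example.

```shell
#!/bin/sh
# nodes_needed TOTAL_TASKS BLOCKING
# Smallest number of nodes that can hold TOTAL_TASKS when at most
# BLOCKING tasks are placed on each node (ceiling division).
# Hypothetical helper, not a LoadLeveler command.
nodes_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

nodes_needed 24 8   # prints 3: matches node = 3, tasks_per_node = 8
nodes_needed 24 4   # prints 6: blocking = 4 spreads the job over more nodes
```

A smaller blocking value spreads the same 24 tasks over more nodes, which is the sparse-packing trade-off described on the "Mapping tasks to nodes" slide.
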

12 Method 3: task_geometry
- Useful if I want to take advantage of the communication pattern of my program
- Tasks on the same node use shared memory instead of the network to communicate
- e.g. six tasks on four different nodes:
    task_geometry={(0,1) (3) (5,4) (2)}

13 Shared vs. not shared nodes
- Jobs on a shared node share its memory, network and IO bandwidth
- I can specify that I need all the resources:
    node_usage = not_shared
- or use the LoadLeveler resources keyword (ConsumableCpus means the number of CPUs each task needs):
    resources = ConsumableCpus(8)

14 Unshared nodes lead to:
- long queue times (LL needs to reserve entire nodes)
- low overall utilisation of the cluster (bad for our statistics!)

16 Physical memory
- Total RAM per node: 32GB (minus operating system) = 28GB
- default: 3.5GB per core (28/8)
- largemem class: 14GB for each of 2 cores (6 cores idle)
- set in system configuration with ulimit
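
The per-core figures follow from dividing the 28GB of usable RAM by the number of tasks sharing a node. A minimal sketch of that arithmetic, working in MB because shell arithmetic is integer-only; the helper name mem_per_task_mb is hypothetical, and the slides do not say which ulimit resource the cluster sets, so the final line (virtual memory) is only one plausible check.

```shell
#!/bin/sh
# mem_per_task_mb USABLE_GB TASKS_PER_NODE
# Memory available to each task, in MB, when the node's usable RAM
# is split evenly. Hypothetical helper for illustration only.
mem_per_task_mb() {
    echo $(( $1 * 1024 / $2 ))
}

mem_per_task_mb 28 8   # prints 3584  (3.5GB, the default)
mem_per_task_mb 28 2   # prints 14336 (14GB, the largemem class)

# Show the virtual memory limit of the current shell (KB, or "unlimited").
# Assumption: the cluster may enforce a different ulimit resource.
ulimit -v
```
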

17 Available memory
- Linux will kill processes if the node runs out of memory (OOM-killer)
  - Sometimes includes the LoadLeveler Starter Daemon
  - Very bad things happen on an interactive (login) node (filesystem daemons, LoadLeveler daemons disappearing)
- Per-process memory is limited with ulimit
  - malloc-like functions will fail if the ulimit bound is reached: check return values
- R loads workspaces on startup from .RData (use --no-restore-data or --no-restore)

18 LoadLeveler's Consumable Resources: Memory
- resources = ConsumableMemory(count)
- Doesn't enforce limits (unlike ulimit)
- Just use the default

19 Measure memory usage
- valgrind/massif, gprof (C, Fortran)
- Intel VTune
- memory profiler, Heapy, PySizer (Python)

21 How LoadLeveler calculates job priority
- Jobs are dispatched based on priority, but can be out-of-order, depending on resources
    SYSPRIO = (ClassSysprio * 100) + (UserSysprio * 10) + (GroupSysprio * 1) - (GroupRunningJobs) - (UserTotalJobs)
- Class priority goes from short (high) to long (low)
- All Users (and Groups) have equal priority
- You can prioritise your own jobs:
  - user_priority = n (0-100, default 50)
  - re-prioritise queued jobs with llprio
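
To see how strongly the class term dominates, the SYSPRIO expression can be evaluated by hand. The sketch below simply transcribes the formula from the slide; the function name sysprio and all the input values are invented for illustration, since the real ClassSysprio, UserSysprio and GroupSysprio values are set by the administrators.

```shell
#!/bin/sh
# sysprio CLASS USER GROUP GROUP_RUNNING_JOBS USER_TOTAL_JOBS
# Direct transcription of the SYSPRIO expression from the slide.
sysprio() {
    echo $(( $1 * 100 + $2 * 10 + $3 - $4 - $5 ))
}

# Hypothetical values: same user and group, 4 running group jobs,
# 6 jobs in total for the user; only the class priority differs.
sysprio 30 10 10 4 6   # prints 3100 (a higher-priority class)
sysprio 10 10 10 4 6   # prints 1100 (a lower-priority class)
```

With users and groups weighted equally, the class weight of 100 decides the ordering, while each additional queued or running job only subtracts 1.
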

22 How you can influence job dispatch time
- Use a different class: shorter running classes have higher priority
- Also important: wall_clock_limit
  - defaults for short, medium, long are 1, 7 and 30 days
  - jobs are stopped (SIGTERM) at the limit
  - New (lower) limit can be set: wall_clock_limit = HH:MM:SS

23 Aside: how the backfill scheduler works
- runs jobs out-of-order according to available resources
- jobs have a known start time and a wall clock limit, so LoadLeveler knows the latest start time of the highest priority queued job (could be earlier, if jobs finish before their wall clock limit is reached)
- LL won't start lower priority jobs if they would delay the start of the highest priority job (the "top-dog") ...even if there are unused resources

24 Aside: how the backfill scheduler works (continued)
[Figure: Job 4 sets a lower wall_clock_limit than the default and can be backfilled.]

25 How you can influence job dispatch time
- Use a lower wall clock limit: wall_clock_limit = HH:MM:SS
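
The backfill rule can be reduced to one comparison: a lower-priority job may start now only if its wall clock limit guarantees it finishes before the top-dog's latest start time. This is a deliberately simplified sketch, not LoadLeveler's scheduler; it assumes the job would otherwise occupy resources the top-dog needs, and the name can_backfill and the times used are hypothetical.

```shell
#!/bin/sh
# can_backfill NOW WALL_CLOCK_LIMIT TOPDOG_START   (all in seconds)
# Succeeds if a job started at NOW is guaranteed by its wall clock
# limit to end no later than the top-dog's latest start time.
can_backfill() {
    [ $(( $1 + $2 )) -le "$3" ]
}

# Hypothetical situation: the top-dog can start in 2 hours at the latest.
if can_backfill 0 3600 7200; then echo "1h limit: backfill"; else echo "1h limit: wait"; fi
if can_backfill 0 86400 7200; then echo "1d limit: backfill"; else echo "1d limit: wait"; fi
```

The second job is the same program with the default 1-day limit of the short class: it has to wait even though nodes are idle, which is why setting a realistic wall_clock_limit helps.
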

27 Using multiple jobsteps
- run a program multiple times with different input/output data with one JCF
- or do data-staging or post-processing, e.g.
  - 1st jobstep: Use class io with 1 task to fetch archived input data
  - 2nd jobstep: Use class short with n tasks to do model run
  - 3rd jobstep: Use class io with 1 task to archive output data

28 Multi-step jobs: Independent jobsteps
Run multiple independent steps with one JCF

    # @ executable = longjob
    # @ input = longjob.in.$(stepid)
    # @ output = longjob.out.$(jobid).$(stepid)
    # @ error = longjob.err.$(jobid).$(stepid)
    # @ queue
    # @ queue
    # @ queue
    # @ queue
    # @ queue

(Use $(stepid) to differentiate input, output and error files)

29 Multi-step jobs: Dependent jobsteps
Run job steps with dependencies on previous steps

    # @ step_name = step1
    # @ executable = executable1
    # @ input = step1.in1
    # @ output = step1.out1
    # @ error = step1.err1
    # @ queue
    # @ dependency = (step1 == 0)
    # @ step_name = step2
    # @ input = step2.in1
    # @ output = step2.out1
    # @ error = step2.err1
    # @ queue

(Both steps use the same executable)

30 Multi-step jobs: Dependent jobsteps
Run job steps with dependencies on previous steps, different executables (lines 2 and 8)

    1  # @ step_name = step1
    2  # @ executable = executable1
    3  # @ input = step1.in1
    4  # @ output = step1.out1
    5  # @ error = step1.err1
    6  # @ queue
    7  # @ dependency = (step1 == 0)
    8  # @ executable = executable2
    9  # @ step_name = step2
    10 # @ input = step2.in1
    11 # @ output = step2.out1
    12 # @ error = step2.err1
    13 # @ queue

Status indicators in llq: [C]ompleted, [N]ot[Q]ueued
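
Putting the dependency syntax together with the fetch / model run / archive workflow described earlier gives a three-step job along the following lines. This is a sketch under assumptions: the step names and the echoed actions are placeholders, task counts and output/error keywords are left out, and it relies on LoadLeveler exporting LOADL_STEP_NAME to each running step (check the cluster documentation before depending on it).

```shell
#!/bin/bash
# @ job_name = staged_run
# @ step_name = fetch
# @ class = io
# @ queue
# @ step_name = model
# @ dependency = (fetch == 0)
# @ class = short
# @ queue
# @ step_name = archive
# @ dependency = (model == 0)
# @ class = io
# @ queue

# The same script body runs for every step, so branch on the step name.
# The echo lines stand in for the real staging and model commands.
run_step() {
    case "$1" in
        fetch)   echo "fetch archived input data" ;;
        model)   echo "run the model" ;;
        archive) echo "archive output data" ;;
        *)       echo "unknown step: $1" >&2; return 1 ;;
    esac
}

# Only act when LoadLeveler has set the step name (assumed variable).
if [ -n "${LOADL_STEP_NAME:-}" ]; then
    run_step "$LOADL_STEP_NAME"
fi
```

Each later step starts only if the step it depends on exited with status 0, so a failed fetch leaves the model and archive steps unrun.
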

32 Common errors
- Most errors cause submission to fail:
    llsubmit: Class "short" is not valid for group "itss".
    llsubmit: This job has not been submitted to LoadLeveler.
- Job stays Idle:
    cws02a linstead 5/7 11:06 I 50 short
  Waiting for resources, e.g. ConsumableCpus=9
- Job keeps switching between [I]dle and [ST]arting:
    cws02a linstead 5/7 11:06 I 50 short
    cws02a linstead 5/7 11:06 ST 50 short
  Checking for input, e.g. executable = /file/doesn't/exist
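
The Idle/Starting loop caused by a missing executable is not caught at submission, so it can be worth checking the JCF beforehand. The helper below is a hypothetical pre-submission check, not part of LoadLeveler: it assumes keyword lines of the form "# @ executable = path" and paths without spaces.

```shell
#!/bin/sh
# jcf_executables FILE
# Print the value of every "# @ executable = ..." line in a JCF.
jcf_executables() {
    sed -n 's/^#[[:space:]]*@[[:space:]]*executable[[:space:]]*=[[:space:]]*//p' "$1"
}

# check_jcf FILE
# Return non-zero if any listed executable is missing or not executable.
check_jcf() {
    rc=0
    for exe in $(jcf_executables "$1"); do
        if [ ! -x "$exe" ]; then
            echo "missing or not executable: $exe" >&2
            rc=1
        fi
    done
    return $rc
}
```

It could be combined with submission as check_jcf example.jcf && llsubmit example.jcf, so that a wrong path is reported before the job ever reaches the queue.
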

34 Watch out for:
- unwanted restarts: set restart = no
  - prevents LoadLeveler from restarting your job after a machine failure
  - ...unless your code can cope with a restart (e.g. overwriting output files)
- what your application does with SIGTERM
  - LL uses SIGTERM to cancel jobs
  - some models trap SIGTERM to clean up but don't exit
  - LoadLeveler gets confused
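
For job scripts that trap SIGTERM themselves, the important point is that the handler must end by exiting. A minimal sketch of such a handler follows; the scratch file, the function names and the commented-out model command are all hypothetical.

```shell
#!/bin/bash
# Hypothetical scratch file used only by this sketch.
scratch_file="${TMPDIR:-/tmp}/model_partial.$$"

cleanup() {
    # Remove partial output so a later run starts clean.
    rm -f "$scratch_file"
}

on_term() {
    cleanup
    # Exit after cleaning up so the job really ends.
    # 143 = 128 + 15 (SIGTERM), the usual shell convention.
    exit 143
}

trap on_term TERM

# ... start the model here, e.g. a hypothetical: ./model_run > "$scratch_file"
```

Note that the shell runs a trap handler only after the current foreground command returns, so a long-running model that receives SIGTERM directly still needs to exit on its own.
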

36 New tools
- Python 2.7 and 3.2
- Python pip and virtualenv for installing your own packages (2.7 only)
- Distributed and Parallel Matlab (with 16 worker licences)
  - Submit Matlab tasks to compute nodes instead of running on login nodes
  - No need to keep the Matlab client open
- Intel VTune with sampling driver on login01 (profile code hotspots/cache misses/branch mispredicts)

38 Summary
- Default resources
- Specify job requirements for better performance
- Use multiple jobsteps for more flexible runs
- New tools and useful options

39 New cluster 2014
- Bid invitation preparation: mid-2013
- What do you like about the cluster?
- What do you dislike?
- What's on your wishlist?

40 Thank you!

41 References
- Cluster documentation (inc. these slides):
- TWS LoadLeveler - Using and Administering: pik-potsdam.de/members/linstead/documentation
- IBM Redbook - Workload Management with LoadLeveler: documentation
- Distributed Matlab at PIK http: //
- Python memory profiler: /line-by-line-report-of-memory-usage/
- Valgrind:
Building Campus HTC Sharing Infrastructures Derek Weitzel University of Nebraska Lincoln (Open Science Grid Hat) HCC: Campus Grids Motivation We have 3 clusters in 2 cities. Our largest (4400 cores) is
More informationProcess Description and Control. Chapter 3
Process Description and Control Chapter 3 Contents Process states Process description Process control Unix process management Process From processor s point of view execute instruction dictated by program
More informationHiperDispatch Logical Processors and Weight Management
HiperDispatch Logical Processors and Weight Management Fabio Massimo Ottaviani EPV Technologies August 2008 1 Introduction In the last few years, the power and number of the physical processors available
More informationPrograms. Program: Set of commands stored in a file Stored on disk Starting a program creates a process static Process: Program loaded in RAM dynamic
Programs Program: Set of commands stored in a file Stored on disk Starting a program creates a process static Process: Program loaded in RAM dynamic Types of Processes 1. User process: Process started
More informationAn introduction to checkpointing. for scientific applications
damien.francois@uclouvain.be UCL/CISM - FNRS/CÉCI An introduction to checkpointing for scientific applications November 2013 CISM/CÉCI training session What is checkpointing? Without checkpointing: $./count
More informationProcess Description and Control. Major Requirements of an Operating System
Process Description and Control Chapter 3 1 Major Requirements of an Operating System Interleave the execution of several processes to maximize processor utilization while providing reasonable response
More informationMajor Requirements of an Operating System Process Description and Control
Major Requirements of an Operating System Process Description and Control Chapter 3 Interleave the execution of several processes to maximize processor utilization while providing reasonable response time
More informationPARDA: Proportional Allocation of Resources for Distributed Storage Access
PARDA: Proportional Allocation of Resources for Distributed Storage Access Ajay Gulati, Irfan Ahmad, Carl Waldspurger Resource Management Team VMware Inc. USENIX FAST 09 Conference February 26, 2009 The
More informationDistributed OrcaFlex. 1. Introduction. 2. What s New. Distributed OrcaFlex
1. Introduction is a suite of programs that enables a collection of networked, OrcaFlex licensed, computers to run OrcaFlex jobs as background tasks using spare processor time. consists of four separate
More informationSUBMITTING JOBS TO ARTEMIS FROM MATLAB
INFORMATION AND COMMUNICATION TECHNOLOGY SUBMITTING JOBS TO ARTEMIS FROM MATLAB STEPHEN KOLMANN, INFORMATION AND COMMUNICATION TECHNOLOGY AND SYDNEY INFORMATICS HUB 8 August 2017 Table of Contents GETTING
More informationXSEDE New User Tutorial
April 2, 2014 XSEDE New User Tutorial Jay Alameda National Center for Supercomputing Applications XSEDE Training Survey Make sure you sign the sign in sheet! At the end of the module, I will ask you to
More informationARCHER/RDF Overview. How do they fit together? Andy Turner, EPCC
ARCHER/RDF Overview How do they fit together? Andy Turner, EPCC a.turner@epcc.ed.ac.uk www.epcc.ed.ac.uk www.archer.ac.uk Outline ARCHER/RDF Layout Available file systems Compute resources ARCHER Compute
More informationHigh Performance Computing Cluster Advanced course
High Performance Computing Cluster Advanced course Jeremie Vandenplas, Gwen Dawes 9 November 2017 Outline Introduction to the Agrogenomics HPC Submitting and monitoring jobs on the HPC Parallel jobs on
More informationUser Guide of High Performance Computing Cluster in School of Physics
User Guide of High Performance Computing Cluster in School of Physics Prepared by Sue Yang (xue.yang@sydney.edu.au) This document aims at helping users to quickly log into the cluster, set up the software
More informationShadow: Real Applications, Simulated Networks. Dr. Rob Jansen U.S. Naval Research Laboratory Center for High Assurance Computer Systems
Shadow: Real Applications, Simulated Networks Dr. Rob Jansen Center for High Assurance Computer Systems Cyber Modeling and Simulation Technical Working Group Mark Center, Alexandria, VA October 25 th,
More informationUsing the computational resources at the GACRC
An introduction to zcluster Georgia Advanced Computing Resource Center (GACRC) University of Georgia Dr. Landau s PHYS4601/6601 course - Spring 2017 What is GACRC? Georgia Advanced Computing Resource Center
More informationGuillimin HPC Users Meeting March 17, 2016
Guillimin HPC Users Meeting March 17, 2016 guillimin@calculquebec.ca McGill University / Calcul Québec / Compute Canada Montréal, QC Canada Outline Compute Canada News System Status Software Updates Training
More informationOPERATING SYSTEMS & UTILITY PROGRAMS
OPERATING SYSTEMS & UTILITY PROGRAMS System Software System software consists of the programs that control the operations of the computer and its devices. Functions that system software performs include:
More informationOperating Systems. Introduction & Overview. Outline for today s lecture. Administrivia. ITS 225: Operating Systems. Lecture 1
ITS 225: Operating Systems Operating Systems Lecture 1 Introduction & Overview Jan 15, 2004 Dr. Matthew Dailey Information Technology Program Sirindhorn International Institute of Technology Thammasat
More information