Hosts & Partitions. Slurm Training 15. Jordi Blasco & Alfred Gil (HPCNow!)
1 Slurm Training 15
2 Agenda
1. Compute Hosts: state of the node
2. FrontEnd Hosts: control machine
3. Define Partitions: job preemption, limits, ACLs, shared resources, partition states
4. Hierarchical Networks: configuration and examples
3 Compute Hosts
- NodeName - Name that SLURM uses to refer to a node (or base partition for BlueGene systems).
- NodeHostname - Typically the string that "/bin/hostname -s" returns.
- NodeAddr - Name by which a node should be referred to in establishing a communications path.
- Feature - A comma-delimited list of arbitrary strings indicative of some characteristic associated with the node.
- Gres - A comma-delimited list of generic resource specifications for a node.
- RealMemory - Size of real memory on the node in MegaBytes (e.g. "2048"). The default value is 1.
4 Compute Hosts
- Boards - Number of baseboards in nodes with a baseboard controller.
- SocketsPerBoard - Number of physical processor sockets on a baseboard.
- CoresPerSocket - Number of cores in a single physical processor socket.
- CPUs - Number of logical processors on the node.
- Sockets - Number of physical processor sockets on the node.
- ThreadsPerCore - Number of logical threads in a single physical core (1).
(1) If you have more than one thread per core and the select/cons_res plugin active, you will want to avoid CR_CPU in the SelectTypeParameters variable.
5 Compute Hosts
- TmpDisk - Total size of temporary disk storage in TmpFS in MegaBytes.
- TmpFS ("Temporary File System") - Identifies the location which jobs should use for temporary storage.
The Prolog and/or Epilog programs can be used to create a per-job folder and destroy its contents once the job is done.
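Putting the parameters above together, a node entry in slurm.conf might look like the following sketch. The host names, sizes, features and the gpu Gres name are illustrative assumptions, not values from the slides:

```
# Hypothetical 2-socket Haswell nodes with 64 GB RAM, local scratch and 2 GPUs each
NodeName=hwl[01-32] NodeHostname=hwl[01-32] NodeAddr=hwl[01-32]-ib \
    Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=65536 \
    TmpDisk=102400 Feature=haswell,infiniband Gres=gpu:2
```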
6 Compute Hosts
Weight - Sets the priority of the node for scheduling purposes. Jobs will be allocated the nodes with the lowest weight that satisfy their requirements.
Usage example: it is preferable to allocate nodes with fewer features rather than nodes with more features if either will satisfy a job's requirements - for example, smaller-memory nodes rather than larger-memory nodes. The units of weight are arbitrary, but larger weights should be assigned to more "expensive" nodes: more processors, memory, disk space, higher processor speed, etc.
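The weighting policy above can be sketched in slurm.conf like this; node names and sizes are hypothetical. Because lower weights are allocated first, the small-memory nodes are consumed before the large-memory ones:

```
# Prefer the cheaper small-memory nodes when either type satisfies the job
NodeName=small[01-16] RealMemory=32768  Weight=10
NodeName=big[01-04]   RealMemory=262144 Weight=50
```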
8 State of the compute hosts
State of the node with respect to the initiation of user jobs:
- CLOUD - The node exists in the cloud.
- DOWN - The node failed and is unavailable to be allocated work.
- DRAIN - The node is unavailable to be allocated work.
- FAIL - The node is expected to fail soon.
- FAILING - The node is running some jobs but is expected to fail soon.
- FUTURE - The node is defined for future use.
- UNKNOWN - The node's state is undefined.
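The State parameter is set on the node line in slurm.conf. As a sketch (hypothetical node names), hardware that has been ordered but not yet installed can be declared as FUTURE so the configuration is ready in advance:

```
# Planned nodes: defined now, invisible to users until their state is changed
NodeName=gpu[01-08] CPUs=32 RealMemory=131072 State=FUTURE
```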
9 FrontEnd Hosts
Some systems like BlueGene or Cray use frontend nodes, rather than compute nodes, to execute batch scripts. The options are very similar to those used for compute nodes (2).
(2) These options may only work on systems configured and built with the appropriate parameters (i.e. have-front-end or enable-bluegene-emulation).
10 FrontEnd Hosts setup
- FrontendName - Name that SLURM uses to refer to a frontend node.
- FrontendAddr - Name by which a frontend node should be referred to in establishing a communications path. By default, FrontendAddr is identical in value to FrontendName.
- Port - The port number that the SLURM compute node daemon, slurmd, listens to for work on this particular frontend node. Use of this option is NOT generally recommended except for development or testing purposes.
11 FrontEnd Hosts ACLs
- AllowGroups - Comma-separated list of group names which may execute jobs on this frontend node.
- AllowUsers - Comma-separated list of user names which may execute jobs on this frontend node.
- DenyGroups - Comma-separated list of group names which are prevented from executing jobs on this frontend node. May not be used with the AllowGroups option.
- DenyUsers - Comma-separated list of user names which are prevented from executing jobs on this frontend node. May not be used with the AllowUsers option.
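A sketch of a frontend entry using these ACLs (the node names and group are hypothetical). Note that because AllowGroups is set, DenyGroups could not also be used here:

```
# Only members of group hpcstaff may execute batch scripts on these frontends
FrontendName=fe[1-2] FrontendAddr=fe[1-2]-eth0 AllowGroups=hpcstaff
```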
12 FrontEnd Hosts status
- Reason - Identifies the reason for a frontend node being in state DOWN, DRAINED, DRAINING, FAIL or FAILING.
- State - State of the frontend node with respect to the initiation of user jobs. Acceptable values are:
  - DOWN - The frontend node has failed and is unavailable to allocate work.
  - DRAIN - The frontend node is unavailable to allocate work.
  - FAIL - The frontend node is expected to fail soon, has no jobs allocated to it, and will not allocate new jobs.
  - FAILING - The frontend node is expected to fail soon, has one or more jobs allocated to it, but no more jobs will be allocated.
  - UNKNOWN - The frontend node's state is undefined (BUSY or IDLE), but will be established when the slurmd daemon on that node registers. The default value is "UNKNOWN".
13 Control Machine
Control nodes in an HA cluster:
ControlMachine=slurm01
ControlAddr=slurm01-eth0
BackupController=slurm02
BackupAddr=slurm02-eth0
StateSaveLocation - The backup controller recovers state information from the StateSaveLocation directory, which must be readable and writable from both the primary and backup controllers. Use a shared file system for this folder.
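A sketch of the corresponding StateSaveLocation setting; the path is a hypothetical mount point, standing in for whatever shared file system the site uses:

```
# Must be readable and writable from both slurm01 and slurm02
StateSaveLocation=/shared/slurm/state
```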
14 The partition configuration permits you to establish different job limits or access controls for various groups of nodes. Nodes may be in more than one partition, making partitions serve as general purpose queues.
15
- PartitionName - This name can be specified by users when submitting jobs.
- DefMemPerCPU - Default real memory size available per allocated CPU in MegaBytes.
- DefMemPerNode - Default real memory size available per allocated node in MegaBytes.
- DefaultTime - Run time limit used for jobs that don't specify a value. If not set, then MaxTime will be used.
- Nodes - Comma-separated list of nodes (or base partitions for BlueGene systems) which are associated with this partition.
16
- Default - If this keyword is set, jobs submitted without a partition specification will utilize this partition. Possible values are "YES" and "NO". The default value is "NO".
- Alternate - Name of the alternate partition to be used if the state of this partition is "DRAIN" or "INACTIVE".
- Hidden - Specifies if the partition and its jobs are to be hidden by default. Possible values are "YES" and "NO".
17
- Priority - Jobs submitted to a higher-priority partition will be dispatched before pending jobs in lower-priority partitions and, if possible, they will preempt running jobs from lower-priority partitions.
- ReqResv - Specifies that users of this partition are required to designate a reservation when submitting a job. Possible values are "YES" and "NO". The default value is "NO".
- SelectTypeParameters - Partition-specific resource allocation type. Supported values are CR_Core and CR_Socket. Use requires the system-wide SelectTypeParameters value to be set, plus CR_ALLOCATE_FULL_SOCKET.
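Combining the partition parameters from the last few slides, a definition might look like the following sketch; the partition name, node list and limits are all hypothetical:

```
# Default partition: 1 h default walltime, 4 h cap, 2 GB default memory per CPU
PartitionName=short Nodes=hwl[01-96] Default=YES DefaultTime=01:00:00 \
    MaxTime=04:00:00 DefMemPerCPU=2048 Priority=10
```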
18 Job Preemption based on Partition Priority
- PreemptMode - Mechanism used to preempt jobs from this partition when PreemptType=preempt/partition_prio is configured. The cluster-level PreemptMode must include the GANG option if PreemptMode is configured to SUSPEND for any partition. The cluster-level PreemptMode must not be OFF if PreemptMode is enabled for any partition.
- GraceTime - Specifies, in seconds, the preemption grace time to be extended to a job which has been selected for preemption. The default value is zero (no preemption grace time on this partition).
19 Job Preemption based on Partition Priority
SLURM offers two ways for a queued job to preempt a running job: based on partition priority and based on QoS priority. Section 5 of this training will cover job preemption in detail.
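A minimal sketch of partition-priority preemption following the rules above; the cluster-level PreemptMode carries GANG because a partition suspends jobs. Partition names and values are hypothetical:

```
# Cluster level
PreemptType=preempt/partition_prio
PreemptMode=SUSPEND,GANG
# Jobs in "low" get 60 s of grace, then are suspended by "high" priority work
PartitionName=low  Nodes=hwl[01-96] Priority=1  PreemptMode=SUSPEND GraceTime=60
PartitionName=high Nodes=hwl[01-96] Priority=10
```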
20
- MaxCPUsPerNode - Maximum number of CPUs on any node available to all jobs from this partition (useful to schedule GPUs).
- MaxMemPerCPU - Maximum real memory size available per allocated CPU in MegaBytes.
- MaxMemPerNode - Maximum real memory size available per allocated node in MegaBytes.
- MaxNodes - Maximum count of nodes which may be allocated to any single job.
- MaxTime - Maximum run time limit for jobs.
- MinNodes - Minimum count of nodes which may be allocated to any single job.
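The GPU use case mentioned for MaxCPUsPerNode can be sketched like this: on hypothetical 16-core nodes with 2 GPUs each, the CPU-only partition is capped so that cores always remain free for jobs in the GPU partition:

```
# CPU jobs may never take more than 12 of the 16 cores on a node,
# leaving 4 cores free to drive the GPUs
PartitionName=cpu Nodes=gpu[01-08] MaxCPUsPerNode=12
PartitionName=gpu Nodes=gpu[01-08] MaxCPUsPerNode=4
```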
21 Partition ACLs
- AllocNodes - Comma-separated list of nodes from which users can submit jobs in the partition. Node names may be specified using the node range expression syntax described above. The default value is "ALL".
- AllowAccounts - Comma-separated list of accounts which may execute jobs in the partition. The default value is "ALL". NOTE: if AllowAccounts is used, then DenyAccounts will not be enforced. Also refer to DenyAccounts.
- AllowGroups - Comma-separated list of group names which may execute jobs in the partition. The default value is "ALL".
- AllowQos - Comma-separated list of QoS which may execute jobs in the partition. Jobs executed as user root can use any partition without regard to the value of AllowQos. The default value is "ALL".
22 Partition ACLs
- DenyAccounts - Comma-separated list of accounts which may not execute jobs in the partition. By default, no accounts are denied access.
- DenyQos - Comma-separated list of QoS which may not execute jobs in the partition. By default, no QoS are denied access.
- DisableRootJobs - If set to "YES", then user root will be prevented from running any jobs on this partition.
- RootOnly - Specifies if only user ID zero may allocate resources in this partition (3). Possible values are "YES" and "NO". The default value is "NO".
(3) This option can be useful for a partition to be managed by some external entity; it prevents users from directly using those resources.
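A sketch combining some of these ACLs; the partition, account and QoS names are hypothetical. Since AllowAccounts is used, DenyAccounts would not be enforced here:

```
# Only users running under the "physics" account, and never with the "guest" QoS
PartitionName=phys Nodes=hwl[33-64] AllowAccounts=physics DenyQos=guest
```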
23 Sharing resources
Shared - Controls the ability of the partition to execute more than one job at a time on each resource, according to SelectTypeParameters. SelectTypeParameters should be configured to treat memory as a consumable resource, and the --mem option should be used for job allocations. Sharing of resources is typically useful only when using gang scheduling (PreemptMode=suspend or PreemptMode=kill).
24 Sharing resources
- EXCLUSIVE - Allocates entire nodes to jobs even with select/cons_res configured.
- FORCE - Makes all resources in the partition available for sharing without any means for users to disable it. May be followed with a colon and the maximum number of jobs in running or suspended state (4).
- YES - Makes all resources in the partition available for sharing upon request by the job (--share).
- NO - Selected resources are allocated to a single job. No resource will be allocated to more than one job.
Note that a value of YES or FORCE can negatively impact performance for systems with many thousands of running jobs. The default value is NO.
(4) Recommended only for BlueGene systems configured with small blocks or
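As a sketch of the FORCE:count form above (partition and node names hypothetical), up to four jobs may time-share each resource in this partition, which only makes sense together with gang scheduling:

```
# Each resource may be shared by at most 4 running or suspended jobs
PartitionName=serial Nodes=hwl[65-96] Shared=FORCE:4
```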
25 Partition States
States of a partition, or its availability for use, are:
- UP - New jobs may be queued on the partition, and jobs may be allocated nodes and run from the partition.
- DOWN - New jobs may be queued on the partition, but queued jobs may not be allocated nodes and run from the partition. Jobs already running on the partition continue to run.
- DRAIN - No new jobs may be queued on the partition, but jobs already queued on the partition may be allocated nodes and run.
- INACTIVE - No new jobs may be queued on the partition, and jobs already queued may not be allocated nodes and run.
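Combining the partition states with the Alternate parameter from earlier, a maintenance drain could be sketched like this (partition names hypothetical):

```
# Drain before maintenance: queued jobs still run, new submissions go to "backup"
PartitionName=short Nodes=hwl[01-96] State=DRAIN Alternate=backup
```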
26 Configuration
The TopologyPlugin optimizes job allocations in order to minimize network contention. It can also be used to set up low-latency islands, or to prevent scheduling MPI jobs across heterogeneous architectures.
- topology/none - Best-fit logic over a one-dimensional topology; the default for systems other than those below.
- topology/3d_torus - The default for Sun Constellation systems (three-dimensional topology).
- topology/node_rank - Slurm performs a best-fit algorithm over the ordered nodes.
- topology/tree - Hierarchical network topology described in topology.conf.
27 Hierarchical Networks (topology/tree)
topology.conf describes the cluster's network topology for optimized job resource allocation. The available configuration parameters are:
- SwitchName - The (unique and internal) name of a switch.
- Switches - Child switches of the named switch.
- Nodes - Child nodes of the named leaf switch.
- LinkSpeed - An optional value specifying the performance of this communication link (not used yet).
28 TopologyPlugin Tree - User Options
Users can specify the maximum number of leaf switches to be used and the maximum time the job should wait for this optimized setup:
--switches=count[@time]
TopologyPlugin Tree - Admin Options
The system administrator can limit the maximum time that any job can wait for this optimized configuration using the SchedulerParameters configuration parameter with the max_switch_wait option.
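A sketch of both sides of this option; the job script name and the limit values are hypothetical, and max_switch_wait is assumed to be expressed in seconds:

```
# User side: request at most 2 leaf switches, waiting up to 60 minutes for them
#   sbatch --switches=2@60 job.sh
# Admin side, in slurm.conf: cap any such wait at 7200 s (2 hours)
SchedulerParameters=max_switch_wait=7200
```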
29 Three Nodes Ring Topology (8:1 blocking)
30 Three Nodes Ring Topology (8:1 blocking)
Three-node ring topology - each switch has four connections, 32 compute nodes per switch, 96 compute nodes per island.
SwitchName=s0 Switches=s[1,2] Nodes=hwl[01-32]
SwitchName=s1 Switches=s[0,2] Nodes=hwl[33-64]
SwitchName=s2 Switches=s[0,1] Nodes=hwl[65-96]
31 Pruned Tree Topology (CLOS-3 Non-Blocking)
32 Pruned Tree Topology (CLOS-3 Non-Blocking)
Pruned tree network topology with two levels; each switch has nine connections, 18 compute nodes per switch, and up to 64 nodes with a non-blocking setup.
SwitchName=s0 Nodes=hwl[01-18] LinkSpeed=80
SwitchName=s1 Nodes=hwl[19-32] LinkSpeed=80
SwitchName=s2 Nodes=hwl[33-40] LinkSpeed=80
SwitchName=s3 Nodes=hwl[41-64] LinkSpeed=80
SwitchName=s4 Switches=s[0-3] LinkSpeed=720
SwitchName=s5 Switches=s[0-3] LinkSpeed=720
33 There is still room for improvement
1. AUCSCHED3 Plugin: Topologically Aware Job Scheduling for SLURM.
2. netloc: Towards a Comprehensive View of the HPC System Topology.
3. Topology-to-pattern matching (TreeMatch).
35 Hands-On Session 1
More informationOVERVIEW OF THE SAS GRID
OVERVIEW OF THE SAS GRID Host Caroline Scottow Presenter Peter Hobart MANAGING THE WEBINAR In Listen Mode Control bar opened with the white arrow in the orange box Copyr i g ht 2012, SAS Ins titut e Inc.
More informationSolved MCQs on Operating System Principles. Set-1
Solved MCQs on Operating System Principles Set-1 1. Which of the following is/ are the part of operating system? A) Kernel services B) Library services C) Application level services D) All of the above
More informationBatch Systems. Running calculations on HPC resources
Batch Systems Running calculations on HPC resources Outline What is a batch system? How do I interact with the batch system Job submission scripts Interactive jobs Common batch systems Converting between
More informationX Grid Engine. Where X stands for Oracle Univa Open Son of more to come...?!?
X Grid Engine Where X stands for Oracle Univa Open Son of more to come...?!? Carsten Preuss on behalf of Scientific Computing High Performance Computing Scheduler candidates LSF too expensive PBS / Torque
More informationExtending SLURM with Support for GPU Ranges
Available on-line at www.prace-ri.eu Partnership for Advanced Computing in Europe Extending SLURM with Support for GPU Ranges Seren Soner a, Can Özturana,, Itir Karac a a Computer Engineering Department,
More informationRESOURCE MANAGEMENT MICHAEL ROITZSCH
Department of Computer Science Institute for System Architecture, Operating Systems Group RESOURCE MANAGEMENT MICHAEL ROITZSCH AGENDA done: time, drivers today: misc. resources architectures for resource
More informationGraham vs legacy systems
New User Seminar Graham vs legacy systems This webinar only covers topics pertaining to graham. For the introduction to our legacy systems (Orca etc.), please check the following recorded webinar: SHARCNet
More informationChapter 13: I/O Systems
Chapter 13: I/O Systems Chapter 13: I/O Systems I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations Streams Performance 13.2 Silberschatz, Galvin
More informationMid Term from Feb-2005 to Nov 2012 CS604- Operating System
Mid Term from Feb-2005 to Nov 2012 CS604- Operating System Latest Solved from Mid term Papers Resource Person Hina 1-The problem with priority scheduling algorithm is. Deadlock Starvation (Page# 84) Aging
More informationSPOS MODEL ANSWER MAY 2018
SPOS MODEL ANSWER MAY 2018 Q 1. a ) Write Algorithm of pass I of two pass assembler. [5] Ans :- begin if starting address is given LOCCTR = starting address; else LOCCTR = 0; while OPCODE!= END do ;; or
More informationSlurm Burst Buffer Support
Slurm Burst Buffer Support Tim Wickberg (SchedMD LLC) SC15 Burst Buffer Overview A cluster-wide high-performance storage resource Burst buffer (BB) support added Slurm version 15.08 Two types of BB allocations:
More informationIntroduction to the Cluster
Follow us on Twitter for important news and updates: @ACCREVandy Introduction to the Cluster Advanced Computing Center for Research and Education http://www.accre.vanderbilt.edu The Cluster We will be
More informationSubmitting and running jobs on PlaFRIM2 Redouane Bouchouirbat
Submitting and running jobs on PlaFRIM2 Redouane Bouchouirbat Summary 1. Submitting Jobs: Batch mode - Interactive mode 2. Partition 3. Jobs: Serial, Parallel 4. Using generic resources Gres : GPUs, MICs.
More informationChanging landscape of computing at BNL
Changing landscape of computing at BNL Shared Pool and New Users and Tools HTCondor Week May 2018 William Strecker-Kellogg Shared Pool Merging 6 HTCondor Pools into 1 2 What? Current Situation
More informationAnnouncement. Exercise #2 will be out today. Due date is next Monday
Announcement Exercise #2 will be out today Due date is next Monday Major OS Developments 2 Evolution of Operating Systems Generations include: Serial Processing Simple Batch Systems Multiprogrammed Batch
More informationOperating System. Chapter 4. Threads. Lynn Choi School of Electrical Engineering
Operating System Chapter 4. Threads Lynn Choi School of Electrical Engineering Process Characteristics Resource ownership Includes a virtual address space (process image) Ownership of resources including
More informationChapter 13: I/O Systems
Chapter 13: I/O Systems I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations Streams Performance I/O Hardware Incredible variety of I/O devices Common
More informationIntroduction to Abel/Colossus and the queuing system
Introduction to Abel/Colossus and the queuing system November 14, 2018 Sabry Razick Research Infrastructure Services Group, USIT Topics First 7 slides are about us and links The Research Computing Services
More informationChapter 3. Design of Grid Scheduler. 3.1 Introduction
Chapter 3 Design of Grid Scheduler The scheduler component of the grid is responsible to prepare the job ques for grid resources. The research in design of grid schedulers has given various topologies
More informationIntroduction to Process in Computing Systems SEEM
Introduction to Process in Computing Systems SEEM 3460 1 Programs and Processes One way to describe the hardware of a computer system is to say that it provides a framework for executing programs and storing
More informationA declarative programming style job submission filter.
A declarative programming style job submission filter. Douglas Jacobsen Computational Systems Group Lead NERSC -1- Slurm User Group 2018 NERSC Vital Statistics 860 projects 7750 users Edison NERSC-7 Cray
More informationDuke Compute Cluster Workshop. 11/10/2016 Tom Milledge h:ps://rc.duke.edu/
Duke Compute Cluster Workshop 11/10/2016 Tom Milledge h:ps://rc.duke.edu/ rescompu>ng@duke.edu Outline of talk Overview of Research Compu>ng resources Duke Compute Cluster overview Running interac>ve and
More informationUsing Docker in High Performance Computing in OpenPOWER Environment
Using Docker in High Performance Computing in OpenPOWER Environment Zhaohui Ding, Senior Product Architect Sam Sanjabi, Advisory Software Engineer IBM Platform Computing #OpenPOWERSummit Join the conversation
More informationIntroduction to the Cluster
Introduction to the Cluster Advanced Computing Center for Research and Education http://www.accre.vanderbilt.edu Follow us on Twitter for important news and updates: @ACCREVandy The Cluster We will be
More informationBlackBerry Enterprise Server for Microsoft Office 365. Version: 1.0. Administration Guide
BlackBerry Enterprise Server for Microsoft Office 365 Version: 1.0 Administration Guide Published: 2013-01-29 SWD-20130131125552322 Contents 1 Related resources... 18 2 About BlackBerry Enterprise Server
More informationIntroduction to SLURM & SLURM batch scripts
Introduction to SLURM & SLURM batch scripts Anita Orendt Assistant Director Research Consulting & Faculty Engagement anita.orendt@utah.edu 16 Feb 2017 Overview of Talk Basic SLURM commands SLURM batch
More informationLSF HPC :: getting most out of your NUMA machine
Leopold-Franzens-Universität Innsbruck ZID Zentraler Informatikdienst (ZID) LSF HPC :: getting most out of your NUMA machine platform computing conference, Michael Fink who we are & what we do university
More informationIntroduction to the NCAR HPC Systems. 25 May 2018 Consulting Services Group Brian Vanderwende
Introduction to the NCAR HPC Systems 25 May 2018 Consulting Services Group Brian Vanderwende Topics to cover Overview of the NCAR cluster resources Basic tasks in the HPC environment Accessing pre-built
More informationProcess Description and Control. Chapter 3
Process Description and Control 1 Chapter 3 2 Processes Working definition: An instance of a program Processes are among the most important abstractions in an OS all the running software on a computer,
More informationSLURM. User's Guide. ITS Research Computing Northeastern University Nilay K Roy, PhD
SLURM User's Guide ITS Research Computing Northeastern University Nilay K Roy, PhD Table of Contents Chapter 1. SLURM Overview... 1 1.1 SLURM Key Functions... 1 1.2 SLURM Components... 2 1.3 SLURM Daemons...
More informationIntroduction to High Performance Computing and an Statistical Genetics Application on the Janus Supercomputer. Purpose
Introduction to High Performance Computing and an Statistical Genetics Application on the Janus Supercomputer Daniel Yorgov Department of Mathematical & Statistical Sciences, University of Colorado Denver
More informationSlurm at CEA. status and evolutions. 13 septembre 2013 CEA 10 AVRIL 2012 PAGE 1. SLURM User Group - September 2013 F. Belot, F. Diakhaté, M.
status and evolutions SLURM User Group - September 2013 F. Belot, F. Diakhaté, M. Hautreux 13 septembre 2013 CEA 10 AVRIL 2012 PAGE 1 Agenda Supercomputing projects Slurm usage and configuration specificities
More information