
Jobs Resource Utilization as a Metric for Clusters Comparison and Optimization
Joseph Emeras, Cristian Ruiz, Jean-Marc Vincent, Olivier Richard
Slurm User Group Meeting, 9-10 October 2012

Tracing of jobs' resource consumption for about a year: processor, disk, memory and network (but unfortunately not energy), on 2 production clusters.
We analyze the processor resource consumption vs. the resource allocation.
The results were not exactly what we expected...

Outline
1. Context: CIMENT Mesocenter; System Utilization Metric
2. Monitoring Tool
3. Analysis: Data Processing; Resource Utilization Criterion
4. Discussion

CIMENT Mesocenter
Concepts:
- Mesocenter: several universities and labs share computational resources.
- Lightweight grid software: CiGri. Submission of multi-parametric jobs and bags of tasks, i.e. large numbers of jobs.
- Based on the submission of low-priority, preemptible jobs called Best-effort jobs.
- Improves the cluster utilization by using the unused space (free resources).
- Production clusters, from various domains: chemistry, astronomy, physics, etc.
- The underlying batch scheduler is OAR (it could have been Slurm...).

CiGri problematics
Questions:
- Best-effort jobs seem a good idea: platform resources are highly allocated. But is it really a good thing to over-allocate? (bottlenecks?)
- Are other optimizations possible?
We need data about the jobs' consumption. OAR, like Slurm, provides an accounting plugin, which gives mean values. We suspected that consumption means are not enough, since we have very long jobs (days). Indeed, 30% of the jobs had a standard deviation over 20% of the mean for CPU consumption. We need finer granularity.
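To illustrate why a per-job mean can hide this variability, here is a minimal sketch (our illustration, not the authors' code) that flags jobs whose CPU consumption samples deviate from their mean by more than a relative threshold; the sample data layout is assumed:

```python
from statistics import mean, stdev

def high_variability_jobs(samples_by_job, threshold=0.20):
    """Return job ids whose CPU samples have a relative standard
    deviation above `threshold`: a flat mean hides these jobs."""
    flagged = []
    for job_id, cpu_samples in samples_by_job.items():
        if len(cpu_samples) < 2:
            continue  # stdev needs at least two samples
        m = mean(cpu_samples)
        if m > 0 and stdev(cpu_samples) / m > threshold:
            flagged.append(job_id)
    return flagged

# Example: a job alternating between compute and I/O phases.
print(high_variability_jobs({42: [100, 100, 10, 100, 5, 100]}))  # -> [42]
```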

System Utilization as a metric
A metric commonly used to evaluate the utilization of computational infrastructures.
Definition: the ratio of the computing resources allocated to the jobs over the resources available in the system.
Downside: it misses information about the computer sub-systems (disk, memory, processor). Hence: jobs resource utilization.
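Formally, over an observation period T, the definition above (and the computational utilization it is compared against on the next slide) can be written as follows; this is our notation, not taken from the slides:

```latex
U_{\text{system}} = \frac{\int_{T} \text{allocated}(t)\,dt}{\int_{T} \text{available}(t)\,dt}
\qquad
U_{\text{comp}} = \frac{\int_{T} \text{consumed}(t)\,dt}{\int_{T} \text{available}(t)\,dt}
```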

Figure: System Utilization vs. Computational Resource Utilization over time, June 2011 to March 2012 (for normal jobs from one of the CIMENT clusters).

Jobs traces
The idea is to have a trace of resource consumption per job. This differs from other monitoring approaches such as Ganglia [2] or Nagios [1]: it is a job-centric monitor, with information about CPU, memory, IO and network consumption.
Figure: Job CPU consumption over time.

Choices for Tracing Jobs Consumptions
Characteristics:
- Independent: no synchronization among the nodes.
- Uses mechanisms supported by most of the batch schedulers.
- Lightweight (sampling approach): 0.35% slowdown.
Lightweight approach: 1-minute sampling frequency, to avoid storage overhead and to not interfere with the jobs' execution. Capture is inexpensive, using the /proc directory. Data processing is done off-line.
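As an illustration of this sampling approach, here is a minimal sketch of a per-process /proc CPU sampler in Python. It is an assumption of ours, not the authors' implementation (their tool also captures memory, IO and network):

```python
import time

def cpu_jiffies(pid):
    """Total CPU time (utime + stime, in clock ticks) of one process,
    read from fields 14 and 15 of /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        # The comm field may contain spaces, so split after the closing ')'.
        rest = f.read().rsplit(")", 1)[1].split()
    return int(rest[11]) + int(rest[12])  # utime + stime

def sample(job_pids, interval=60):
    """Print one (timestamp, pid, delta_jiffies) record per process
    at each sampling interval, until all processes have exited."""
    last = {pid: cpu_jiffies(pid) for pid in job_pids}
    while last:
        time.sleep(interval)
        for pid in list(last):
            try:
                now = cpu_jiffies(pid)
            except FileNotFoundError:
                del last[pid]  # process has terminated
                continue
            print(f"{time.time():.0f} {pid} {now - last[pid]}")
            last[pid] = now
```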

Clusters characteristics

                  Foehn        Gofree
CPU Model         Xeon X5550   Xeon L5640
Nodes             16           28
CPU cores         128          336
Total memory      480 GB       2016 GB
Memory/node       24/48 GB     72 GB
Total storage     7 TB         30 TB
Network           IB DDR       IB QDR
Total Gflop/s     1367.04      3177.6
Buy date          2010-03-01   2011-01-01

Trace summary

                   Foehn        Gofree
Capture start      2011-06-01   2011-05-24
Capture months     9            9
Number of jobs     41230        9052
Log size           2.5 GB       3.0 GB
Best-effort jobs   38558        5451
Normal jobs        2672         3601

Data Processing
Pre-analysis: off-line data processing. Correlation of two trace sources:
- Batch Scheduler and Resource Manager logs (SWF format)
- Jobs resource consumption logs

Standard Workload Format
The SWF format consists of header comments plus one record per job.
Header comments: Version, Computer, Installation, Acknowledge, Information, Conversion, MaxRecords, Preemption, UnixStartTime, TimeZone, TimeZoneString, StartTime, EndTime, MaxNodes, MaxProcs, MaxRuntime, MaxMemory, AllowOveruse, MaxQueues, Queues, Queue, MaxPartitions, Partitions, Partition, Note.
Job record fields: Job Number, Submit Time, Wait Time, Run Time, Number of Allocated Processors, Average CPU Time Used, Used Memory, Requested Number of Processors, Requested Time, Requested Memory, Status, User ID, Group ID, Executable Number, Queue Number, Partition Number, Preceding Job Number, Think Time from Preceding Job.
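A minimal sketch of reading such a log (the field names below are our own; the 18 record fields follow the SWF order listed above, and header comment lines start with ';'):

```python
SWF_FIELDS = [
    "job_number", "submit_time", "wait_time", "run_time",
    "allocated_procs", "avg_cpu_time", "used_memory",
    "requested_procs", "requested_time", "requested_memory",
    "status", "user_id", "group_id", "executable_number",
    "queue_number", "partition_number", "preceding_job_number",
    "think_time",
]

def parse_swf(path):
    """Yield one dict per job record, skipping ';' header comments."""
    with open(path) as f:
        for line in f:
            values = line.split()
            if not values or line.lstrip().startswith(";"):
                continue
            if len(values) != len(SWF_FIELDS):
                continue  # malformed line
            # SWF uses -1 for missing values; all fields are numeric.
            yield dict(zip(SWF_FIELDS, (float(v) for v in values)))
```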

Jobs Resource Consumption Logs

Trace fields:
Name       Description
Time       Unix time stamp in seconds
JOB ID     Job id assigned by the batch scheduler
PID        PID of a process that belongs to the job
Node ID    Provenance of the capture
Measure    List of measures of the resource consumption

Measures (simplified version):
Name             Description
command          Name of the binary executed
memory_faults    Number of memory faults and their type
virtual_memory   Virtual memory size
pages            Pages used by the process and their type
IO_r_w           Number of bytes read/written from/to the storage layer
core_usage       Core utilization percentage
net_r_w          Network read and written bytes
...              ...

Resource Utilization Criterion
Criterion presentation: match resource consumption with resource allocation.
- X axis: allocated resources over available resources.
- Y axis: core consumption as a function of X.
- Green diagonal: the theoretical optimum; the distance from the line = computing power loss.
- Density graphs: a darker point means a denser distribution (alpha = 0.1).
Graphs are shown for the Gofree and Foehn clusters.
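Under our reading of this criterion, each job contributes one point (allocated share, consumed share), and its computing power loss is the vertical distance below the diagonal. A hypothetical sketch, with assumed key names:

```python
def criterion_points(jobs, available_cores):
    """For each job dict (assumed keys "allocated_cores" and
    "mean_core_usage", the latter a 0..1 fraction per core), return
    (x, y, loss): x = allocated/available, y = consumed/available,
    loss = distance below the y = x optimum diagonal."""
    points = []
    for job in jobs:
        x = job["allocated_cores"] / available_cores
        y = job["mean_core_usage"] * job["allocated_cores"] / available_cores
        points.append((x, y, x - y))
    return points

# A job using only half of its 64 allocated cores on a 336-core cluster:
print(criterion_points(
    [{"allocated_cores": 64, "mean_core_usage": 0.5}], 336))
```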

Gofree Cluster
Gofree Best-effort jobs: full consumption of the reserved cores. Best-effort jobs are core-efficient.
Gofree Normal jobs: the red stripe covers 80% of the values; a linear regression gave a 0.998 slope. (The red stripe is the stripe between the optimum and the mean of the distances from the optimum.)

Foehn Cluster
SA = Σ_{j ∈ jobs} alloc_j × run_time_j
Foehn Best-effort jobs: 2 patterns, one optimal and one at 1/4 utilization; one user with particular needs for memory bandwidth.
Foehn Normal jobs: 2 low-utilization clouds. Outliers (up to 110%) are due to /proc roundings when the consumption jitters.
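SA here appears to be the total allocated area (cores × seconds) summed over the jobs; that interpretation is ours. A one-line worked instance of the formula, with made-up numbers:

```python
# SA = sum over jobs of allocated cores x run time (core-seconds).
jobs = [(64, 3600), (8, 7200)]  # (alloc_j, run_time_j), made-up values
sa = sum(alloc * run_time for alloc, run_time in jobs)
print(sa)  # 64*3600 + 8*7200 = 288000 core-seconds
```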

Users' Special Needs
Some needs, such as an IO or memory bandwidth constraint, are not expressible by the user to the batch scheduler. Should we accept the loss? Modify the batch scheduler? Treat bandwidth as a resource?

Discussion
System Instrumentation: interesting and at little cost (our implementation: 0.35% slowdown). Need to correlate with other metrics (memory size and bandwidth usage, IO).
Results: processor consumption on 2 clusters of the same grid can be very different; users' behaviors have an impact; shows what the batch scheduler's request constraints lack.
Future Works: characterize jobs' consumption patterns; learn from the past and classify <User, Code> couples; on-line tagging of jobs at submission, useful to anticipate IO-intensive jobs.
Ideas: for SWF, fields for Max Memory and particular user constraints (license, hardware, locality); for Slurm, a plugin for tracing the jobs by sampling.

Data available in a git repository.

Thank you for your attention. joseph.emeras@imag.fr

References
[1] Emir Imamagic and Dobrisa Dobrenic. Grid infrastructure monitoring system based on Nagios. In Proceedings of the 2007 Workshop on Grid Monitoring (GMW '07), pages 23-28, New York, NY, USA, 2007. ACM.
[2] Matthew L. Massie, Brent N. Chun, and David E. Culler. The Ganglia distributed monitoring system: Design, implementation, and experience. 2004.