MULTITHERMAN: Out-of-band High-Resolution HPC Power and Performance Monitoring Support for Big-Data Analysis

Similar documents
MULTITHERMAN: Out-of-band High-Resolution HPC Power and Performance Monitoring Support for Big-Data Analysis

A Dwarf in a Giant Embedded systems in High Performance Computing. Andrea Bartolini

D.A.V.I.D.E. (Development of an Added-Value Infrastructure Designed in Europe) IWOPH 17 E4. WHEN PERFORMANCE MATTERS

Scaling performance In Power Limited HPC Systems

CAS 2K13 Sept Jean-Pierre Panziera Chief Technology Director

Energy Efficiency Tuning: READEX. Madhura Kumaraswamy Technische Universität München

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info

S8765 Performance Optimization for Deep- Learning on the Latest POWER Systems

GOING ARM A CODE PERSPECTIVE

A Breakthrough in Non-Volatile Memory Technology FUJITSU LIMITED

Power Attack Defense: Securing Battery-Backed Data Centers

Slurm BOF SC13 Bull s Slurm roadmap

INtelligent solutions 2ward the Development of Railway Energy and Asset Management Systems in Europe.

Employing HPC DEEP-EST for HEP Data Analysis. Viktor Khristenko (CERN, DEEP-EST), Maria Girone (CERN)

HPC projects. Grischa Bolls

Fluentd + MongoDB + Spark = Awesome Sauce

Approaches to I/O Scalability Challenges in the ECMWF Forecasting System

Using the SDACK Architecture to Build a Big Data Product. Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver

TECHNICAL OVERVIEW ACCELERATED COMPUTING AND THE DEMOCRATIZATION OF SUPERCOMPUTING

IBM CORAL HPC System Solution

Energy efficient mapping of virtual machines

Flash Storage Complementing a Data Lake for Real-Time Insight

Chapter 1. Introduction

Deliverable D7.4 Report on the integration of the infrastructure with the new Services/Aggregator platforms V1.0

Tools and Methodology for Ensuring HPC Programs Correctness and Performance. Beau Paisley

@joerg_schad Nightmares of a Container Orchestration System

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect

The SMACK Stack: Spark*, Mesos*, Akka, Cassandra*, Kafka* Elizabeth K. Dublin Apache Kafka Meetup, 30 August 2017.

Challenges in HPC I/O

LAMMPS-KOKKOS Performance Benchmark and Profiling. September 2015

Performance analysis basics

Portable Power/Performance Benchmarking and Analysis with WattProf

Data Management and Analysis for Energy Efficient HPC Centers

I/O Monitoring at JSC, SIONlib & Resiliency

CERN openlab & IBM Research Workshop Trip Report

Machine Learning In A Snap. Thomas Parnell Research Staff Member IBM Research - Zurich

An Open Accelerator Infrastructure Project for OCP Accelerator Module (OAM)

Cloud Computing Economies of Scale

IPv6 and Energy Efficiency in School Networks

Techniques to improve the scalability of Checkpoint-Restart

ECMWF's Next Generation IO for the IFS Model and Product Generation

Qualys Cloud Platform

What Can We Learn from Four Years of Data Center Hardware Failures? Guosai Wang, Lifei Zhang, Wei Xu

Auto Management for Apache Kafka and Distributed Stateful System in General

IoT Sensor Analytics with Apache Kafka, KSQL and TensorFlow

LRZ SuperMUC One year of Operation

Optimizing Efficiency of Deep Learning Workloads through GPU Virtualization

Technical Computing Suite supporting the hybrid system

INSIDE THE PROTOTYPE BODENTYPE DATA CENTER: OPENSOURCE MONITORING OF OPENSOURCE HARDWARE

ControlUp v7.1 Release Notes

TRESCCA Trustworthy Embedded Systems for Secure Cloud Computing

Héctor Fernández and G. Pierre Vrije Universiteit Amsterdam

Search Engines and Time Series Databases

Design, Development and Improvement of Nagios System Monitoring for Large Clusters

Managing IoT and Time Series Data with Amazon ElastiCache for Redis

Nowcasting. D B M G Data Base and Data Mining Group of Politecnico di Torino. Big Data: Hype or Hallelujah? Big data hype?

Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda

Data Centre Energy & Cost Efficiency Simulation Software. Zahl Limbuwala

Webinar Series TMIP VISION

Plattformübergreifende Softwareentwicklung für heterogene Multicore-Systeme

Algorithms, System and Data Centre Optimisation for Energy Efficient HPC

Data Sheet FUJITSU Server PRIMERGY CX2550 M1 Dual Socket Server Node

Application monitoring with BELK. Nishant Sahay, Sr. Architect Bhavani Ananth, Architect

EU Liaison Update. General Assembly. Matthew Scott & Edit Herczog. Reference : GA(18)021. Trondheim. 14 June 2018

PBS PROFESSIONAL VS. MICROSOFT HPC PACK

IBM Power Systems HPC Cluster

The Green Computing Observatory

Europeana Core Service Platform

Data Sheet FUJITSU Server PRIMERGY CX400 S2 Multi-Node Server Enclosure

Dynamic Analytics Extended to all layers Utilizing P4

Using Alluxio to Improve the Performance and Consistency of HDFS Clusters

@unterstein #bedcon. Operating microservices with Apache Mesos and DC/OS

Inspur AI Computing Platform

Hitachi Converged Platform for Oracle

OCCUPANCY TRACKING: Your Secret Source of Savings

Co-designing an Energy Efficient System

Building a Scalable Recommender System with Apache Spark, Apache Kafka and Elasticsearch

Understanding the latent value in all content

Butterfly effect of porting scientific applications to ARM-based platforms

System Design of Kepler Based HPC Solutions. Saeed Iqbal, Shawn Gao and Kevin Tubbs HPC Global Solutions Engineering.

Extending SLURM with Support for GPU Ranges

Evaluating the Effectiveness of Model Based Power Characterization

Monitoring for IT Services and WLCG. Alberto AIMAR CERN-IT for the MONIT Team

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved

Inside Broker How Broker Leverages the C++ Actor Framework (CAF)

Altair OptiStruct 13.0 Performance Benchmark and Profiling. May 2015

Python, PySpark and Riak TS. Stephen Etheridge Lead Solution Architect, EMEA

IBM Leading High Performance Computing and Deep Learning Technologies

TPCX-BB (BigBench) Big Data Analytics Benchmark

Micron and Hortonworks Power Advanced Big Data Solutions

EU LEIT-ICT program and SE position on FP9

19. prosince 2018 CIIRC Praha. Milan Král, IBM Radek Špimr

Physicals: Scope (Extrapolate) William Tschudi, LBNL

Automatic Tuning of HPC Applications with Periscope. Michael Gerndt, Michael Firbach, Isaias Compres Technische Universität München

Building supercomputers from embedded technologies

Optimizing Out-of-Core Nearest Neighbor Problems on Multi-GPU Systems Using NVLink

Introduction to Parallel Programming

Efficient Evaluation and Management of Temperature and Reliability for Multiprocessor Systems

CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav

Transcription:

MULTITHERMAN: Out-of-band High-Resolution HPC Power and Performance Monitoring Support for Big-Data Analysis EU H2020 FETHPC project ANTAREX (g.a. 671623) EU FP7 ERC Project MULTITHERMAN (g.a.291125) HPC Summit Week, Ljubljana 30.05.2018 Antonio Libri A. Bartolini, F. Beneventi, A. Borghesi and L. Benini

Team & Research Activity A. Bartolini, and L. Benini Topics: 1. Examon: holistic and scalable real-time monitoring for HPC centers open source, beta release Q2/Q3 2018 (F. Beneventi, https://github.com/fbeneventi/examon) 2. Large-scale multiscale Power and Thermal modeling, and control (F. Pittino, C. Conficoni) a. Development of power and thermal models at core level for monitoring and control of large HPC clusters in production b. Datacentre cooling modeling and optimization 2

Team & Research Activity 3. Fine grain monitoring and management (A. Libri, D. Cesarini) a. DiG: multi-platform and production ready solution, based on low-cost open-source HW, already integrated with E4 technology b. Application aware power and thermal management: open source, beta release Q3 2018 4. Power and performance aware Job scheduler (A. Borghesi) a. System level power capping and optimization 5. Datacentre automation (A. Borghesi, A. Libri) a. Big-data and deep learning based automated power optimization b. Big-data and deep learning based automated anomaly detection DiG 3

Outline Motivation & D.A.V.I.D.E. Overview DiG: High Res Out-of-Band Pow & Perf Mon ExaMon: Scalable Data Collection and Analytics Monitored Metrics and Data Visualization Smart Job Scheduler SLURM Plugin 2

Motivation Innovative Scenarios Coarse grain DIMM ACC Node CPU CPU ACC req Job Scheduler req util Fine grain Fine Grain Power and Performance Measurements: Verify and classify node performance (e.g. In spec / out of spec behaviour, aging and wear out) Predictive maintenance Per user - Energy / Performance accounting System Power Capping (bound from grid): Job scheduler decides the job that enters in the system Ensures operating power below a maximum power consumption level Reduce Power budget during new Installations and Power Shortage 3

D.A.V.I.D.E. (#18 Green500 Nov 17) D.A.V.I.D.E. SUPERCOMPUTER (Development of an Added Value Infrastructure Designed in Europe) 4

D.A.V.I.D.E. SUPERCOMPUTER (Development of an Added Value Infrastructure Designed in Europe) OCP form factor compute node based on IBM Minsky 4x Tesla P100 HSMX2 2 x POWER8 with NVLink 2xIB EDR BusBar LIQUID COOLING DiG ETH Zurich / University of Bologna OUT-OF-BAND HIGH RESOLUTION POWER AND PERFORMANCE MONITORING 5

Outline Motivation & D.A.V.I.D.E. Overview DiG: High Res Out-of-Band Pow & Perf Mon ExaMon: Scalable Data Collection and Analytics Monitored Metrics and Data Visualization Smart Job Scheduler SLURM Plugin 6

DiG: High Resolution Out-of-band Power Monitoring Beaglebone Black Out-of-band Zero overhead Collect more than 1.5 ks/s, 7/7d, 24/24h, for all users Architecture independent (i.e. tested on Intel, ARM and IBM) Fine grain down to ms scale (sampling @800 ks/s + avg) IoT communication technology (MQTT) scalable Time synchronous (NTP, PTP) 7

DiG: Example of fine grain monitoring Coarse Grain View BB @1s 1 Node - 15 min BB View BB @1ms 45 Nodes -4s 15 min BB @1ms 45 Nodes -1s 8

DiG: Example of fine grain monitoring Coarse Grain View 1 Node - 15 min 15 min BB View BB @1s How do we correlate these activities with applications phases? BB @1ms 45 Nodes -4s BB @1ms 45 Nodes -1s 8

DiG: Fine grain correlation with jobs activities μs resolved time stamps Clock CPU1 CPUn Cold air/water Clock BBB1 BBBn CRAC RPM FAN Power C node1 GPU Perf counters Rack HPC cluster Hot air/water 9

DiG: Fine grain correlation with jobs activities Several Metrics Cache Miss P0 Pn Time APP MPI Synch Parallel Application Node n Power Temp Time Node n Node 1 Node 1 Clock CPU1 CPUn Cold air/water Clock BBB1 BBBn CRAC RPM FAN Power C node1 GPU Perf counters Rack HPC cluster Hot air/water 10

DiG: live FFT on the power traces Real-time Frequency analysis on power supply and more a live oscilloscope Application 1 Application 2 14 14

DiG: live FFT on the power traces Real-time Frequency analysis on power supply and more a live oscilloscope Application 1 Application 2 User -> to discriminate application phases Sys Admin -> to detect malicious users Designers -> to debug and optimize power delivery network 15 15

Outline Motivation & D.A.V.I.D.E. Overview DiG: High Res Out-of-Band Pow & Perf Mon ExaMon: Scalable Data Collection and Analytics Monitored Metrics and Data Visualization Smart Job Scheduler SLURM Plugin 13

ExaMon: Scalable Data Collection and Analytics Aggregates job s power and performance in real-time and at a fine granularity for Big Data analysis Based on opensource SW Store, Process, Visualize and Analyze Monitored Data Historical Buffer (i.e. 2 Weeks) Perpetual per Job Web frontend 17

ExaMon: Scalable Data Collection and Analytics Applications Grafana Apache Spark Python Matlab ADMIN Cassandra node 1 NoSQL Kairosdb Cassandra node M Front-end Host: davidefe01 Docker containers ~45KS/s MQTT2Kairos MQTT Brokers MQTT2kairos Broker 1 Broker M MQTT Target Facility Back-end MQTT enabled monitoring agents (e.g. Dig) 18

ExaMon: Scalable Data Collection and Analytics Applications Grafana Apache Spark Python Matlab NoSQL Cassandra node 1 Kairosdb Cassandra node M ADMIN MQTT MQTT2Kairos Broker 1 MQTT Brokers MQTT2kairos Broker M Monitoring Agents: Send Power and Performance measurements to the FE Target Facility 19

ExaMon: Scalable Data Collection and Analytics Grafana Apache Spark Applications Python NoSQL Matlab Broker: Forward data to the listeners (e.g. kairosdb) ADMIN Cassandra node 1 MQTT2Kairos Broker 1 Kairosdb MQTT Brokers Cassandra node M MQTT2kairos Broker M Mqtt2kairosdb: Interface between MQTT and KairosDB KairosDB is a front-end to handle time series in Cassandra MQTT Target Facility Cassandra: NoSQL database Highly scalable Optimized to balance the load on multiple nodes 20

ExaMon: Scalable Data Collection and Analytics Applications ADMIN Grafana Cassandra node 1 MQTT2Kairos Apache Spark Python Matlab NoSQL Kairosdb MQTT Brokers Cassandra node M MQTT2kairos Application layer: Grafana, Apache Spark, etc Aggregate metrics for Data Visualization, ML Analysis, Post Processing, etc Broker 1 Broker M MQTT Target Facility 21

facility/sensors/b ExaMon: Scalable Data Collection and Analytics MQTT Broker facility/sensors/# MQTT2Kairosdb _A _B _C MQTT Publishers Metric: A Tags: facility Sensors Metric: B Tags: Facility Sensors Metric: C Tags: facility sensors {Key,Value} = TS, Measurement Topic = /davide/node1/metric Cassandra Column Family 22

ExaMon: Batch & Streaming Pandas dataframe (Batch) examon-client (REST) Bahir-mqtt (Spark connector) (Streaming) 23

Outline Motivation & D.A.V.I.D.E. Overview DiG: High Res Out-of-Band Pow & Perf Mon ExaMon: Scalable Data Collection and Analytics Monitored Metrics and Data Visualization Smart Job Scheduler SLURM Plugin 21

D.A.V.I.D.E. Fine Grain Power Metric Broker Pow_pub MQTT Metric Name power Ts Description Unit 1s,1ms Power consumption at node power plug W DiG Power: ~45 ks/s 25

D.A.V.I.D.E. IPMI Metrics Broker IPMI_pub MQTT IPMI DiG IPMI: 89 metrics per node every 5s 26

D.A.V.I.D.E. OCC Metrics (1/2) Broker OCC_pub MQTT IPMI DiG OCC: 242 metrics per node every 10s 27

D.A.V.I.D.E. OCC Metrics (2/2) Broker OCC_pub MQTT Per-component power IPMI DiG OCC: 242 metrics per node every 10s 28

D.A.V.I.D.E. PSU Metrics Broker PSU_pub MQTT IPMI Running in the front-end Full Rack info 29

D.A.V.I.D.E. Liquid Cooling Metrics Broker MQTT Cooling_pub IPMI Running in the front-end Per Rack Cooling info 30

Real-Time Monitoring, Processing, Visualization we provide for all the users a debug portal that can be accessed to gather and visualize its job data DiG 31

Real-Time Monitoring, Processing, Visualization DiG Users can now see power breakdown for each node, live 32

Outline Motivation & D.A.V.I.D.E. Overview DiG: High Res Out-of-Band Pow & Perf Mon ExaMon: Scalable Data Collection and Analytics Monitored Metrics and Data Visualization Smart Job Scheduler SLURM Plugin 33

Smart Job Scheduler SLURM Plugin Monitors Job submission and scheduling Estimates Job Power Consumption Controls Power Consumption under system level power capping 34

Smart Job Scheduler SLURM Plugin 1. ML models to predict power consumption of HPC applications 2. SLURM Custom Extensions to schedule jobs based on their power 3. Interaction with power management Frequency scaling/rapllike mechanism 35

Smart Job Scheduler SLURM Plugin Different from previous power capping approaches We do not perturb the duration of the application, thus it doesn t change the way users pay the resources 36

Conclusion With D.A.V.I.D.E. and its monitoring infrastructure we provide real-time analysis on computing nodes high resolution power and performance measurements up to ms scale upcoming FFT live analysis Data Storage, Processing and Visualization, along with Big Data Analysis support Power Aware Job Scheduling 30

Thanks for your interest. Co-funded by EU FP7 ERC Project Multitherman (No 291125) This project has received funding from the European Union s Horizon 2020 research and innovation programme under grant agreement No 671623 (ANTAREX) Contact: Antonio Libri, a.libri@iis.ee.ethz.ch Unibo/ETH: A. Bartolini, F. Beneventi, A. Borghesi, and L. Benini