MULTITHERMAN: Out-of-band High-Resolution HPC Power and Performance Monitoring Support for Big-Data Analysis
|
|
- Myles Shanon Barker
- 5 years ago
- Views:
Transcription
1 MULTITHERMAN: Out-of-band High-Resolution HPC Power and Performance Monitoring Support for Big-Data Analysis EU H2020 FETHPC project ANTAREX (g.a ) EU FP7 ERC Project MULTITHERMAN (g.a ) EETHPC, Frankfurt Antonio Libri A. Bartolini, F. Beneventi, A. Borghesi and L. Benini
2 Outline Motivation & D.A.V.I.D.E. Overview DiG: High Res Out-of-Band Pow & Perf Mon ExaMon: Scalable Data Collection and Analytics Monitored Metrics and Data Visualization 2
3 Motivation Innovative Scenarios Coarse grain DIMM Node ACC CPU CPU ACC req Job Scheduler req util Fine grain Fine Grain Power and Performance Measurements: Verify and classify node performance (e.g. In spec / out of spec behaviour, aging and wear out) Predictive maintenance Per user - Energy / Performance accounting System Power Capping (bound from grid): Job scheduler decides the job that enters in the system Ensures operating power below a maximum power consumption level Reduce Power budget during new Installations and Power Shortage 3
4 D.A.V.I.D.E. (#18 Green500 Nov 17) D.A.V.I.D.E. SUPERCOMPUTER (Development of an Added Value Infrastructure Designed in Europe) 4
5 D.A.V.I.D.E. SUPERCOMPUTER (Development of an Added Value Infrastructure Designed in Europe) OCP form factor compute node based on IBM Minsky 4x HSMX2 Tesla P100 2 x POWER8 with NVLink BusBar 2xIB EDR LIQUID COOLING DiG ETH Zurich / University of Bologna OUT-OF-BAND HIGH RESOLUTION POWER AND PERFORMANCE MONITORING 5
6 Outline Motivation & D.A.V.I.D.E. Overview DiG: High Res Out-of-Band Pow & Perf Mon ExaMon: Scalable Data Collection and Analytics Monitored Metrics and Data Visualization 6
7 DiG: High Resolution Out-of-band Power Monitoring Beaglebone Black Out-of-band Zero overhead Collect more than 1.5 ks/s, 7/7d, 24/24h, for all users Architecture independent (i.e. tested on Intel, ARM and IBM) Fine grain down to ms scale ks/s + avg) IoT communication technology (MQTT) scalable Time synchronous (NTP, PTP) 7
8 DiG: Example of fine grain monitoring Coarse Grain View 1 Node - 15 min BB View How do we correlate these activities with applications phases? 45 Nodes -4s 15 min 45 Nodes -1s 8
9 DiG: Fine grain correlation with jobs activities μs resolved time stamps Clock CPU1 CPUn Cold air/water Clock BBB1 BBBn CRAC RPM FAN Power C node1 GPU Perf counters Rack HPC cluster Hot air/water 9
10 DiG: Fine grain correlation with jobs activities Several Metrics Cache Miss Power BBB1 BBBn P0 Pn Time APP MPI Synch Node n CPU1 Node 1 Temp Time Node Ok nup to 1ms CPUn but what about real-time Node 1analysis at higher Clock frequencies for a footballfield size cluster of computing nodes? Clock Parallel Application Cold air/water CRAC RPM FAN Power GPU Perf counters C node1 Rack HPC cluster Hot air/water 10
11 DiG: live FFT on the power traces Real-time Frequency analysis on power supply and more a live oscilloscope Application 1 Application 2 User -> to discriminate application phases Sys Admin -> to detect malicious users Designers -> to debug and optimize power delivery network 11 11
12 Outline Motivation & D.A.V.I.D.E. Overview DiG: High Res Out-of-Band Pow & Perf Mon ExaMon: Scalable Data Collection and Analytics Monitored Metrics and Data Visualization 12
13 ExaMon: Scalable Data Collection and Analytics Aggregates job s power and performance in real-time and at a fine granularity for Big Data analysis Based on open-src SW Store, Process, Visualize and Analyze Monitored Data Historical Buffer (i.e. 2 Weeks) Perpetual per Job Web frontend Open-src beta release: 13
14 ExaMon: Scalable Data Collection and Analytics Applications Grafana Apache Spark Python Matlab ADMIN Cassandra node 1 NoSQL Kairosdb Cassandra node M Front-end Host: davidefe01 Docker containers ~45KS/s MQTT2Kairos MQTT Brokers MQTT2kairos Broker 1 Broker M MQTT Target Facility Back-end MQTT enabled monitoring agents (e.g. Dig) 14
15 ExaMon: Scalable Data Collection and Analytics Applications Grafana Apache Spark Python Matlab NoSQL Cassandra node 1 Kairosdb Cassandra node M ADMIN MQTT2Kairos MQTT Brokers MQTT2kairos MQTT Broker 1 Target Facility Broker M Monitoring Agents: Send Power and Performance measurements to the FE Key-Value pairs (TS, Data) Topic (Metric) 15
16 ExaMon: Scalable Data Collection and Analytics Grafana Apache Spark Applications Python NoSQL Matlab Broker: Forward data to the listeners (e.g. kairosdb) ADMIN Cassandra node 1 MQTT2Kairos Broker 1 Kairosdb MQTT Brokers Cassandra node M MQTT2kairos Broker M Mqtt2kairosdb: Interface between MQTT and KairosDB KairosDB is a front-end to handle time series in Cassandra MQTT Target Facility Cassandra: NoSQL database Highly scalable Optimized to balance the load on multiple nodes 16
17 ExaMon: Scalable Data Collection and Analytics Applications ADMIN Grafana Cassandra node 1 MQTT2Kairos Apache Spark Python Matlab NoSQL Kairosdb MQTT Brokers Cassandra node M MQTT2kairos Application layer: Grafana, Apache Spark, etc Aggregate metrics for Data Visualization, ML Analysis, Post Processing, etc Broker 1 Broker M MQTT Target Facility 17
18 Outline Motivation & D.A.V.I.D.E. Overview DiG: High Res Out-of-Band Pow & Perf Mon ExaMon: Scalable Data Collection and Analytics Monitored Metrics and Data Visualization 19
19 D.A.V.I.D.E. Monitoring Agents DiG SW Daemons Pow_pub MQTT Broker DiG Power (1s,1ms): ~45 ks/s 19
20 D.A.V.I.D.E. Monitoring Agents DiG SW Daemons MQTT Broker Pow_pub IPMI_pub IPMI: 89 metrics per node every 5s DiG 20
21 D.A.V.I.D.E. Monitoring Agents DiG SW Daemons MQTT Broker Pow_pub IPMI_pub OCC_pub OCC: 242 metrics per node every 10s DiG 21
22 D.A.V.I.D.E. Monitoring Agents DiG SW Daemons MQTT Broker MQTT D.A.V.I.D.E. Front-End Pow_pub IPMI_pub OCC_pub PSU_pub IPMI DiG Liteon Overall Rack info (e.g., Total Power) 22
23 D.A.V.I.D.E. Monitoring Agents DiG SW Daemons MQTT Broker MQTT D.A.V.I.D.E. Front-End Pow_pub IPMI_pub OCC_pub PSU_pub Cooling_pub IPMI Info Liquid Cooling Asetek DiG Liteon 23
24 Real-Time Monitoring, Processing, Visualization we provide for all the users a debug portal that can be accessed to gather and visualize its job data DiG 24
25 Real-Time Monitoring, Processing, Visualization DiG Users can now see power breakdown for each node, live 25
26 Conclusion With D.A.V.I.D.E. and its monitoring infrastructure we provide real-time analysis on computing nodes high resolution power and performance measurements up to ms scale upcoming FFT live analysis Data Storage, Processing and Visualization, along with Big Data Analysis support 27
27 Thanks for your interest. Co-funded by EU FP7 ERC Project Multitherman (No ) This project has received funding from the European Union s Horizon 2020 research and innovation programme under grant agreement No (ANTAREX) Contact: Antonio Libri, a.libri@iis.ee.ethz.ch Unibo/ETH: A. Bartolini, F. Beneventi, A. Borghesi, and L. Benini
28 Backup Slides 28
29 Team & Research Activity A. Bartolini, and L. Benini Topics: 1. Examon: holistic and scalable real-time monitoring for HPC centers open source, beta release Q2/Q (F. Beneventi, 2. Large-scale multiscale Power and Thermal modeling, and control (F. Pittino, C. Conficoni) a. Development of power and thermal models at core level for monitoring and control of large HPC clusters in production b. Datacentre cooling modeling and optimization 29
30 Team & Research Activity 3. Fine grain monitoring and management (A. Libri, D. Cesarini) a. DiG: multi-platform and production ready solution, based on low-cost open-source HW, already integrated with E4 technology b. Application aware power and thermal management: open source, beta release Q Power and performance aware Job scheduler (A. Borghesi) a. System level power capping and optimization 5. Datacentre automation (A. Borghesi, A. Libri) a. Big-data and deep learning based automated power optimization b. Big-data and deep learning based automated anomaly detection DiG 30
31 D.A.V.I.D.E. Fine Grain Power Metric Broker Pow_pub MQTT Metric Name power Ts Description Unit 1s,1ms Power consumption at node power plug W DiG Power: ~45 ks/s 31
32 D.A.V.I.D.E. IPMI Metrics Broker IPMI_pub MQTT IPMI DiG IPMI: 89 metrics per node every 5s 32
33 D.A.V.I.D.E. OCC Metrics Broker OCC_pub MQTT Per-component power IPMI DiG OCC: 242 metrics per node every 10s 33
34 D.A.V.I.D.E. PSU Metrics Broker PSU_pub MQTT IPMI Running in the front-end Full Rack info 34
35 D.A.V.I.D.E. Liquid Cooling Metrics Broker MQTT Cooling_pub IPMI Running in the front-end Per Rack Cooling info 35
36 ExaMon: Batch & Streaming Pandas dataframe (Batch) examon-client (REST) Bahir-mqtt (Spark connector) (Streaming) Open-src beta release: 36
MULTITHERMAN: Out-of-band High-Resolution HPC Power and Performance Monitoring Support for Big-Data Analysis
MULTITHERMAN: Out-of-band High-Resolution HPC Power and Performance Monitoring Support for Big-Data Analysis EU H2020 FETHPC project ANTAREX (g.a. 671623) EU FP7 ERC Project MULTITHERMAN (g.a.291125) HPC
More informationA Dwarf in a Giant Embedded systems in High Performance Computing. Andrea Bartolini
A Dwarf in a Giant Embedded systems in High Performance Computing Andrea Bartolini a.bartolini@unibo.it 2 HPC High Performance Computing High-performance computing (HPC) is the use of parallel processing
More informationD.A.V.I.D.E. (Development of an Added-Value Infrastructure Designed in Europe) IWOPH 17 E4. WHEN PERFORMANCE MATTERS
D.A.V.I.D.E. (Development of an Added-Value Infrastructure Designed in Europe) IWOPH 17 E4. WHEN PERFORMANCE MATTERS THE COMPANY Since 2002, E4 Computer Engineering has been innovating and actively encouraging
More informationScaling performance In Power Limited HPC Systems
Scaling performance In Power Limited HPC Systems Prof. Dr. Luca Benini ERC Multitherman Lab University of Bologna Italy D ITET, Chair of Digital Dircuits & Systems Switzerland Outline Power and Thermal
More informationEnergy Efficiency Tuning: READEX. Madhura Kumaraswamy Technische Universität München
Energy Efficiency Tuning: READEX Madhura Kumaraswamy Technische Universität München Project Overview READEX Starting date: 1. September 2015 Duration: 3 years Runtime Exploitation of Application Dynamism
More informationS8765 Performance Optimization for Deep- Learning on the Latest POWER Systems
S8765 Performance Optimization for Deep- Learning on the Latest POWER Systems Khoa Huynh Senior Technical Staff Member (STSM), IBM Jonathan Samn Software Engineer, IBM Evolving from compute systems to
More informationCAS 2K13 Sept Jean-Pierre Panziera Chief Technology Director
CAS 2K13 Sept. 2013 Jean-Pierre Panziera Chief Technology Director 1 personal note 2 Complete solutions for Extreme Computing b ubullx ssupercomputer u p e r c o p u t e r suite s u e Production ready
More informationWe are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info
We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info START DATE : TIMINGS : DURATION : TYPE OF BATCH : FEE : FACULTY NAME : LAB TIMINGS : PH NO: 9963799240, 040-40025423
More informationUsing the SDACK Architecture to Build a Big Data Product. Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver
Using the SDACK Architecture to Build a Big Data Product Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver Outline A Threat Analytic Big Data product The SDACK Architecture Akka Streams and data
More informationPower Attack Defense: Securing Battery-Backed Data Centers
Power Attack Defense: Securing Battery-Backed Data Centers Presented by Chao Li, PhD Shanghai Jiao Tong University 2016.06.21, Seoul, Korea Risk of Power Oversubscription 2 3 01. Access Control 02. Central
More informationOverview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::
Title Duration : Apache Spark Development : 4 days Overview Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized
More information@joerg_schad Nightmares of a Container Orchestration System
@joerg_schad Nightmares of a Container Orchestration System 2017 Mesosphere, Inc. All Rights Reserved. 1 Jörg Schad Distributed Systems Engineer @joerg_schad Jan Repnak Support Engineer/ Solution Architect
More informationEmploying HPC DEEP-EST for HEP Data Analysis. Viktor Khristenko (CERN, DEEP-EST), Maria Girone (CERN)
Employing HPC DEEP-EST for HEP Data Analysis Viktor Khristenko (CERN, DEEP-EST), Maria Girone (CERN) 1 Outline The DEEP-EST Project Goals and Motivation HEP Data Analysis on HPC with Apache Spark on HPC
More informationHPC projects. Grischa Bolls
HPC projects Grischa Bolls Outline Why projects? 7th Framework Programme Infrastructure stack IDataCool, CoolMuc Mont-Blanc Poject Deep Project Exa2Green Project 2 Why projects? Pave the way for exascale
More informationChapter 1. Introduction
Chapter 1. Introduction João Bispo 1, Pedro Pinto 1, João M.P. Cardoso 1, Jorge G. Barbosa 1, Hamid Arabnejad 1, Davide Gadioli 2, Emanuele Vitali 2, Gianluca Palermo 2, Cristina Silvano 2, Stefano Cherubin
More informationApproaches to I/O Scalability Challenges in the ECMWF Forecasting System
Approaches to I/O Scalability Challenges in the ECMWF Forecasting System PASC 16, June 9 2016 Florian Rathgeber, Simon Smart, Tiago Quintino, Baudouin Raoult, Stephan Siemen, Peter Bauer Development Section,
More informationFluentd + MongoDB + Spark = Awesome Sauce
Fluentd + MongoDB + Spark = Awesome Sauce Nishant Sahay, Sr. Architect, Wipro Limited Bhavani Ananth, Tech Manager, Wipro Limited Your company logo here Wipro Open Source Practice: Vision & Mission Vision
More informationTECHNICAL OVERVIEW ACCELERATED COMPUTING AND THE DEMOCRATIZATION OF SUPERCOMPUTING
TECHNICAL OVERVIEW ACCELERATED COMPUTING AND THE DEMOCRATIZATION OF SUPERCOMPUTING Table of Contents: The Accelerated Data Center Optimizing Data Center Productivity Same Throughput with Fewer Server Nodes
More informationAn Open Accelerator Infrastructure Project for OCP Accelerator Module (OAM)
An Open Accelerator Infrastructure Project for OCP Accelerator Module (OAM) SERVER Whitney Zhao, Hardware Engineer, Facebook Siamak Tavallaei, Principal Architect, Microsoft Richard Ding, AI System Architect,
More informationA Breakthrough in Non-Volatile Memory Technology FUJITSU LIMITED
A Breakthrough in Non-Volatile Memory Technology & 0 2018 FUJITSU LIMITED IT needs to accelerate time-to-market Situation: End users and applications need instant access to data to progress faster and
More informationLAMMPS-KOKKOS Performance Benchmark and Profiling. September 2015
LAMMPS-KOKKOS Performance Benchmark and Profiling September 2015 2 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell, Mellanox, NVIDIA
More informationChallenges in HPC I/O
Challenges in HPC I/O Universität Basel Julian M. Kunkel German Climate Computing Center / Universität Hamburg 10. October 2014 Outline 1 High-Performance Computing 2 Parallel File Systems and Challenges
More informationEnergy efficient mapping of virtual machines
GreenDays@Lille Energy efficient mapping of virtual machines Violaine Villebonnet Thursday 28th November 2013 Supervisor : Georges DA COSTA 2 Current approaches for energy savings in cloud Several actions
More informationPouya Kousha Fall 2018 CSE 5194 Prof. DK Panda
Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda 1 Motivation And Intro Programming Model Spark Data Transformation Model Construction Model Training Model Inference Execution Model Data Parallel Training
More informationPortable Power/Performance Benchmarking and Analysis with WattProf
Portable Power/Performance Benchmarking and Analysis with WattProf Amir Farzad, Boyana Norris University of Oregon Mohammad Rashti RNET Technologies, Inc. Motivation Energy efficiency is becoming increasingly
More informationCloud Computing Economies of Scale
Cloud Computing Economies of Scale AWS Executive Symposium 2010 James Hamilton, 2010/7/15 VP & Distinguished Engineer, Amazon Web Services email: James@amazon.com web: mvdirona.com/jrh/work blog: perspectives.mvdirona.com
More informationMachine Learning In A Snap. Thomas Parnell Research Staff Member IBM Research - Zurich
Machine Learning In A Snap Thomas Parnell Research Staff Member IBM Research - Zurich What are GLMs? Ridge Regression Support Vector Machines Regression Generalized Linear Models Classification Lasso Regression
More informationINtelligent solutions 2ward the Development of Railway Energy and Asset Management Systems in Europe.
INtelligent solutions 2ward the Development of Railway Energy and Asset Management Systems in Europe. Introduction to IN2DREAMS The predicted growth of transport, especially in European railway infrastructures,
More informationGOING ARM A CODE PERSPECTIVE
GOING ARM A CODE PERSPECTIVE ISC18 Guillaume Colin de Verdière JUNE 2018 GCdV PAGE 1 CEA, DAM, DIF, F-91297 Arpajon, France June 2018 A history of disruptions All dates are installation dates of the machines
More informationI/O Monitoring at JSC, SIONlib & Resiliency
Mitglied der Helmholtz-Gemeinschaft I/O Monitoring at JSC, SIONlib & Resiliency Update: I/O Infrastructure @ JSC Update: Monitoring with LLview (I/O, Memory, Load) I/O Workloads on Jureca SIONlib: Task-Local
More informationThe SMACK Stack: Spark*, Mesos*, Akka, Cassandra*, Kafka* Elizabeth K. Dublin Apache Kafka Meetup, 30 August 2017.
Dublin Apache Kafka Meetup, 30 August 2017 The SMACK Stack: Spark*, Mesos*, Akka, Cassandra*, Kafka* Elizabeth K. Joseph @pleia2 * ASF projects 1 Elizabeth K. Joseph, Developer Advocate Developer Advocate
More informationLRZ SuperMUC One year of Operation
LRZ SuperMUC One year of Operation IBM Deep Computing 13.03.2013 Klaus Gottschalk IBM HPC Architect Leibniz Computing Center s new HPC System is now installed and operational 2 SuperMUC Technical Highlights
More informationCERN openlab & IBM Research Workshop Trip Report
CERN openlab & IBM Research Workshop Trip Report Jakob Blomer, Javier Cervantes, Pere Mato, Radu Popescu 2018-12-03 Workshop Organization 1 full day at IBM Research Zürich ~25 participants from CERN ~10
More informationOptimizing Efficiency of Deep Learning Workloads through GPU Virtualization
Optimizing Efficiency of Deep Learning Workloads through GPU Virtualization Presenters: Tim Kaldewey Performance Architect, Watson Group Michael Gschwind Chief Engineer ML & DL, Systems Group David K.
More informationAuto Management for Apache Kafka and Distributed Stateful System in General
Auto Management for Apache Kafka and Distributed Stateful System in General Jiangjie (Becket) Qin Data Infrastructure @LinkedIn GIAC 2017, 12/23/17@Shanghai Agenda Kafka introduction and terminologies
More informationIBM CORAL HPC System Solution
IBM CORAL HPC System Solution HPC and HPDA towards Cognitive, AI and Deep Learning Deep Learning AI / Deep Learning Strategy for Power Power AI Platform High Performance Data Analytics Big Data Strategy
More informationData Centre Energy & Cost Efficiency Simulation Software. Zahl Limbuwala
Data Centre Energy & Cost Efficiency Simulation Software Zahl Limbuwala BCS Data Centre Simulator Overview of Tools Structure of the BCS Simulator Input Data Sample Output Development Path Overview of
More informationBlended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)
Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a) Cloudera s Developer Training for Apache Spark and Hadoop delivers the key concepts and expertise need to develop high-performance
More informationPlattformübergreifende Softwareentwicklung für heterogene Multicore-Systeme
Plattformübergreifende Softwareentwicklung für heterogene Multicore-Systeme Dr.-Ing. Timo Stripf 1 Managing Director Technolgy Outline Multicore Motivation Automatic Parallelization Interactive Parallelization
More informationDesign, Development and Improvement of Nagios System Monitoring for Large Clusters
Available online at www.prace-ri.eu Partnership for Advanced Computing in Europe Design, Development and Improvement of Nagios System Monitoring for Large Clusters Daniela Galetti 1, Federico Paladin 2
More informationInspur AI Computing Platform
Inspur Server Inspur AI Computing Platform 3 Server NF5280M4 (2CPU + 3 ) 4 Server NF5280M5 (2 CPU + 4 ) Node (2U 4 Only) 8 Server NF5288M5 (2 CPU + 8 ) 16 Server SR BOX (16 P40 Only) Server target market
More informationECMWF's Next Generation IO for the IFS Model and Product Generation
ECMWF's Next Generation IO for the IFS Model and Product Generation Future workflow adaptations Tiago Quintino, B. Raoult, S. Smart, A. Bonanni, F. Rathgeber, P. Bauer ECMWF tiago.quintino@ecmwf.int ECMWF
More informationEU Liaison Update. General Assembly. Matthew Scott & Edit Herczog. Reference : GA(18)021. Trondheim. 14 June 2018
EU Liaison Update General Assembly Matthew Scott & Edit Herczog Reference : GA(18)021 Trondheim 14 June 2018 Agenda Introduction Outcomes of meetings with DG CNECT & RTD, & other stakeholders Latest FP9
More informationLOAD BALANCING DISTRIBUTED OPERATING SYSTEMS, SCALABILITY, SS Hermann Härtig
LOAD BALANCING DISTRIBUTED OPERATING SYSTEMS, SCALABILITY, SS 2016 Hermann Härtig LECTURE OBJECTIVES starting points independent Unix processes and block synchronous execution which component (point in
More informationAltair OptiStruct 13.0 Performance Benchmark and Profiling. May 2015
Altair OptiStruct 13.0 Performance Benchmark and Profiling May 2015 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell, Mellanox Compute
More informationHadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved
Hadoop 2.x Core: YARN, Tez, and Spark YARN Hadoop Machine Types top-of-rack switches core switch client machines have client-side software used to access a cluster to process data master nodes run Hadoop
More informationUsing Alluxio to Improve the Performance and Consistency of HDFS Clusters
ARTICLE Using Alluxio to Improve the Performance and Consistency of HDFS Clusters Calvin Jia Software Engineer at Alluxio Learn how Alluxio is used in clusters with co-located compute and storage to improve
More informationSystem Design of Kepler Based HPC Solutions. Saeed Iqbal, Shawn Gao and Kevin Tubbs HPC Global Solutions Engineering.
System Design of Kepler Based HPC Solutions Saeed Iqbal, Shawn Gao and Kevin Tubbs HPC Global Solutions Engineering. Introduction The System Level View K20 GPU is a powerful parallel processor! K20 has
More informationTechnical Computing Suite supporting the hybrid system
Technical Computing Suite supporting the hybrid system Supercomputer PRIMEHPC FX10 PRIMERGY x86 cluster Hybrid System Configuration Supercomputer PRIMEHPC FX10 PRIMERGY x86 cluster 6D mesh/torus Interconnect
More informationPython, PySpark and Riak TS. Stephen Etheridge Lead Solution Architect, EMEA
Python, PySpark and Riak TS Stephen Etheridge Lead Solution Architect, EMEA Agenda Introduction to Riak TS The Riak Python client The Riak Spark connector and PySpark CONFIDENTIAL Basho Technologies 3
More informationIBM Leading High Performance Computing and Deep Learning Technologies
IBM Leading High Performance Computing and Deep Learning Technologies Yubo Li ( 李玉博 ) Chief Architect, on Cloud IBM Research -- China email: liyubobj@cn.ibm.com QQ: 395238640 GTC China 2016 Sept. 13, 2016
More informationFlash Storage Complementing a Data Lake for Real-Time Insight
Flash Storage Complementing a Data Lake for Real-Time Insight Dr. Sanhita Sarkar Global Director, Analytics Software Development August 7, 2018 Agenda 1 2 3 4 5 Delivering insight along the entire spectrum
More informationBig Data. Big Data Analyst. Big Data Engineer. Big Data Architect
Big Data Big Data Analyst INTRODUCTION TO BIG DATA ANALYTICS ANALYTICS PROCESSING TECHNIQUES DATA TRANSFORMATION & BATCH PROCESSING REAL TIME (STREAM) DATA PROCESSING Big Data Engineer BIG DATA FOUNDATION
More informationTPCX-BB (BigBench) Big Data Analytics Benchmark
TPCX-BB (BigBench) Big Data Analytics Benchmark Bhaskar D Gowda Senior Staff Engineer Analytics & AI Solutions Group Intel Corporation bhaskar.gowda@intel.com 1 Agenda Big Data Analytics & Benchmarks Industry
More information19. prosince 2018 CIIRC Praha. Milan Král, IBM Radek Špimr
19. prosince 2018 CIIRC Praha Milan Král, IBM Radek Špimr CORAL CORAL 2 CORAL Installation at ORNL CORAL Installation at LLNL Order of Magnitude Leap in Computational Power Real, Accelerated Science ACME
More informationWhat Can We Learn from Four Years of Data Center Hardware Failures? Guosai Wang, Lifei Zhang, Wei Xu
What Can We Learn from Four Years of Data Center Hardware Failures? Guosai Wang, Lifei Zhang, Wei Xu Motivation: Evolving Failure Model Failures in data centers are common and costly - Violate service
More informationSlurm BOF SC13 Bull s Slurm roadmap
Slurm BOF SC13 Bull s Slurm roadmap SC13 Eric Monchalin Head of Extreme Computing R&D 1 Bullx BM values Bullx BM bullx MPI integration ( runtime) Automatic Placement coherency Scalable launching through
More informationIoT Sensor Analytics with Apache Kafka, KSQL and TensorFlow
1 IoT Sensor Analytics with Apache Kafka, KSQL and TensorFlow Kafka-Native End-to-End IoT Data Integration and Processing Kai Waehner - Technology Evangelist kontakt@kai-waehner.de - LinkedIn Twitter :
More informationCUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav
CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav CMPE655 - Multiple Processor Systems Fall 2015 Rochester Institute of Technology Contents What is GPGPU? What s the need? CUDA-Capable GPU Architecture
More informationOptimizing Out-of-Core Nearest Neighbor Problems on Multi-GPU Systems Using NVLink
Optimizing Out-of-Core Nearest Neighbor Problems on Multi-GPU Systems Using NVLink Rajesh Bordawekar IBM T. J. Watson Research Center bordaw@us.ibm.com Pidad D Souza IBM Systems pidsouza@in.ibm.com 1 Outline
More informationTools and Methodology for Ensuring HPC Programs Correctness and Performance. Beau Paisley
Tools and Methodology for Ensuring HPC Programs Correctness and Performance Beau Paisley bpaisley@allinea.com About Allinea Over 15 years of business focused on parallel programming development tools Strong
More informationHéctor Fernández and G. Pierre Vrije Universiteit Amsterdam
Héctor Fernández and G. Pierre Vrije Universiteit Amsterdam Cloud Computing Day, November 20th 2012 contrail is co-funded by the EC 7th Framework Programme under Grant Agreement nr. 257438 1 Typical Cloud
More informationBuilding supercomputers from embedded technologies
http://www.montblanc-project.eu Building supercomputers from embedded technologies Alex Ramirez Barcelona Supercomputing Center Technical Coordinator This project and the research leading to these results
More informationMonitoring for IT Services and WLCG. Alberto AIMAR CERN-IT for the MONIT Team
Monitoring for IT Services and WLCG Alberto AIMAR CERN-IT for the MONIT Team 2 Outline Scope and Mandate Architecture and Data Flow Technologies and Usage WLCG Monitoring IT DC and Services Monitoring
More informationSTAR-CCM+ Performance Benchmark and Profiling. July 2014
STAR-CCM+ Performance Benchmark and Profiling July 2014 Note The following research was performed under the HPC Advisory Council activities Participating vendors: CD-adapco, Intel, Dell, Mellanox Compute
More informationIntroduction to Parallel Programming
Introduction to Parallel Programming David Lifka lifka@cac.cornell.edu May 23, 2011 5/23/2011 www.cac.cornell.edu 1 y What is Parallel Programming? Using more than one processor or computer to complete
More informationData Sheet FUJITSU Server PRIMERGY CX2550 M1 Dual Socket Server Node
Data Sheet FUJITSU Server PRIMERGY CX2550 M1 Dual Socket Server Node Data Sheet FUJITSU Server PRIMERGY CX2550 M1 Dual Socket Server Node Standard server node for PRIMERGY CX400 M1 multi-node server system
More informationPowerExecutive. Tom Brey IBM Agenda. Why PowerExecutive. Fundamentals of PowerExecutive. - The Data Center Power/Cooling Crisis.
PowerExecutive IBM Agenda Why PowerExecutive - The Data Center Power/Cooling Crisis Fundamentals of PowerExecutive 1 The Data Center Power/Cooling Crisis Customers want more IT processing cycles to run
More informationProfiling and diagnosing large-scale decentralized systems David Oppenheimer
Profiling and diagnosing large-scale decentralized systems David Oppenheimer ROC Retreat Thursday, June 5, 2003 1 Why focus on P2P systems? There are a few real ones file trading, backup, IM Look a lot
More informationPlay2SDG: Bridging the Gap between Serving and Analytics in Scalable Web Applications
Play2SDG: Bridging the Gap between Serving and Analytics in Scalable Web Applications Panagiotis Garefalakis M.Res Thesis Presentation, 7 September 2015 Outline Motivation Challenges Scalable web app design
More informationThe Green Computing Observatory
The Green Computing Observatory Michel Jouvin (LAL) Cécile Germain-Renaud (LRI), Thibaut Jacob (LRI), Gilles Kassel (MIS), Julien Nauroy (LRI), Guillaume Philippon (LAL) Outline Contexts Acquisition Status
More informationBest Practices for Setting BIOS Parameters for Performance
White Paper Best Practices for Setting BIOS Parameters for Performance Cisco UCS E5-based M3 Servers May 2013 2014 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public. Page
More informationMonitoring system for geographically distributed datacenters based on Openstack. Gioacchino Vino
Monitoring system for geographically distributed datacenters based on Openstack Gioacchino Vino Tutor: Dott. Domenico Elia Tutor: Dott. Giacinto Donvito Borsa di studio GARR Orio Carlini 2016-2017 INFN
More informationAlgorithms, System and Data Centre Optimisation for Energy Efficient HPC
2015-09-14 Algorithms, System and Data Centre Optimisation for Energy Efficient HPC Vincent Heuveline URZ Computing Centre of Heidelberg University EMCL Engineering Mathematics and Computing Lab 1 Energy
More informationJapan s post K Computer Yutaka Ishikawa Project Leader RIKEN AICS
Japan s post K Computer Yutaka Ishikawa Project Leader RIKEN AICS HPC User Forum, 7 th September, 2016 Outline of Talk Introduction of FLAGSHIP2020 project An Overview of post K system Concluding Remarks
More informationPlanning for Liquid Cooling Patrick McGinn Product Manager, Rack DCLC
Planning for Liquid Cooling -------------------------------- Patrick McGinn Product Manager, Rack DCLC February 3 rd, 2015 Expertise, Innovation & Delivery 13 years in the making with a 1800% growth rate
More informationQualys Cloud Platform
18 QUALYS SECURITY CONFERENCE 2018 Qualys Cloud Platform Looking Under the Hood: What Makes Our Cloud Platform so Scalable and Powerful Dilip Bachwani Vice President, Engineering, Qualys, Inc. Cloud Platform
More information@unterstein #bedcon. Operating microservices with Apache Mesos and DC/OS
@unterstein @dcos @bedcon #bedcon Operating microservices with Apache Mesos and DC/OS 1 Johannes Unterstein Software Engineer @Mesosphere @unterstein @unterstein.mesosphere 2017 Mesosphere, Inc. All Rights
More informationS THE MAKING OF DGX SATURNV: BREAKING THE BARRIERS TO AI SCALE. Presenter: Louis Capps, Solution Architect, NVIDIA,
S7750 - THE MAKING OF DGX SATURNV: BREAKING THE BARRIERS TO AI SCALE Presenter: Louis Capps, Solution Architect, NVIDIA, lcapps@nvidia.com A TALE OF ENLIGHTENMENT Basic OK List 10 for x = 1 to 3 20 print
More informationPredicting Program Phases and Defending against Side-Channel Attacks using Hardware Performance Counters
Predicting Program Phases and Defending against Side-Channel Attacks using Hardware Performance Counters Junaid Nomani and Jakub Szefer Computer Architecture and Security Laboratory Yale University junaid.nomani@yale.edu
More informationTechniques to improve the scalability of Checkpoint-Restart
Techniques to improve the scalability of Checkpoint-Restart Bogdan Nicolae Exascale Systems Group IBM Research Ireland 1 Outline A few words about the lab and team Challenges of Exascale A case for Checkpoint-Restart
More informationIBM Power Systems HPC Cluster
IBM Power Systems HPC Cluster Highlights Complete and fully Integrated HPC cluster for demanding workloads Modular and Extensible: match components & configurations to meet demands Integrated: racked &
More informationDELL POWERVAULT ML6000
DELL POWERVAULT ML6000 Intelligent Backup with the PowerVault ML6000 Modular Tape Library A whitepaper from Dell PowerVault Storage CONTENTS INTELLIGENT BACKUP WITH THE POWERVAULT ML6000 MODULAR TAPE LIBRARY
More informationADABAS & NATURAL 2050+
ADABAS & NATURAL 2050+ Guido Falkenberg SVP Global Customer Innovation DIGITAL TRANSFORMATION #WITHOUTCOMPROMISE 2017 Software AG. All rights reserved. ADABAS & NATURAL 2050+ GLOBAL INITIATIVE INNOVATION
More informationRolf Brink Founder/CEO
Rolf Brink Founder/CEO +31 88 96 000 00 www.asperitas.com COPYRIGHT 2017 BY ASPERITAS Asperitas, Robertus Nurksweg 5, 2033AA Haarlem, The Netherlands Information in this document can be freely used and
More informationExercises: April 11. Hermann Härtig, TU Dresden, Distributed OS, Load Balancing
Exercises: April 11 1 PARTITIONING IN MPI COMMUNICATION AND NOISE AS HPC BOTTLENECK LOAD BALANCING DISTRIBUTED OPERATING SYSTEMS, SCALABILITY, SS 2017 Hermann Härtig THIS LECTURE Partitioning: bulk synchronous
More informationPBS PROFESSIONAL VS. MICROSOFT HPC PACK
PBS PROFESSIONAL VS. MICROSOFT HPC PACK On the Microsoft Windows Platform PBS Professional offers many features which are not supported by Microsoft HPC Pack. SOME OF THE IMPORTANT ADVANTAGES OF PBS PROFESSIONAL
More informationELFms industrialisation plans
ELFms industrialisation plans CERN openlab workshop 13 June 2005 German Cancio CERN IT/FIO http://cern.ch/elfms ELFms industrialisation plans, 13/6/05 Outline Background What is ELFms Collaboration with
More informationEvaluating the Effectiveness of Model Based Power Characterization
Evaluating the Effectiveness of Model Based Power Characterization John McCullough, Yuvraj Agarwal, Jaideep Chandrashekhar (Intel), Sathya Kuppuswamy, Alex C. Snoeren, Rajesh Gupta Computer Science and
More informationData Management and Analysis for Energy Efficient HPC Centers
Data Management and Analysis for Energy Efficient HPC Centers Presented by Ghaleb Abdulla, Anna Maria Bailey and John Weaver; Copyright 2015 OSIsoft, LLC Data management and analysis for energy efficient
More informationParallel Architectures
Parallel Architectures Part 1: The rise of parallel machines Intel Core i7 4 CPU cores 2 hardware thread per core (8 cores ) Lab Cluster Intel Xeon 4/10/16/18 CPU cores 2 hardware thread per core (8/20/32/36
More informationManaging IoT and Time Series Data with Amazon ElastiCache for Redis
Managing IoT and Time Series Data with ElastiCache for Redis Darin Briskman, ElastiCache Developer Outreach Michael Labib, Specialist Solutions Architect 2016, Web Services, Inc. or its Affiliates. All
More informationBENEFITS OF ASETEK LIQUID COOLING FOR DATA CENTERS
BENEFITS OF ASETEK LIQUID COOLING FOR DATA CENTERS Asetek has leveraged its expertise as the world-leading provider of efficient liquid cooling systems to create its RackCDU and ISAC direct-to-chip liquid
More informationEfficient Evaluation and Management of Temperature and Reliability for Multiprocessor Systems
Efficient Evaluation and Management of Temperature and Reliability for Multiprocessor Systems Ayse K. Coskun Electrical and Computer Engineering Department Boston University http://people.bu.edu/acoskun
More informationBring Context To Your Machine Data With Hadoop, RDBMS & Splunk
Bring Context To Your Machine Data With Hadoop, RDBMS & Splunk Raanan Dagan and Rohit Pujari September 25, 2017 Washington, DC Forward-Looking Statements During the course of this presentation, we may
More informationArchitecting High Performance Computing Systems for Fault Tolerance and Reliability
Architecting High Performance Computing Systems for Fault Tolerance and Reliability Blake T. Gonzales HPC Computer Scientist Dell Advanced Systems Group blake_gonzales@dell.com Agenda HPC Fault Tolerance
More informationAnyMiner 3.0, Real-time Big Data Analysis Solution for Everything Data Analysis. Mar 25, TmaxSoft Co., Ltd. All Rights Reserved.
AnyMiner 3.0, Real-time Big Analysis Solution for Everything Analysis Mar 25, 2015 2015 TmaxSoft Co., Ltd. All Rights Reserved. Ⅰ Ⅱ Ⅲ Platform for Net IT AnyMiner, Real-time Big Analysis Solution AnyMiner
More informationIntegrated CPU and Cache Power Management in Multiple Clock Domain Processors
Integrated CPU and Cache Power Management in Multiple Clock Domain Processors Nevine AbouGhazaleh, Bruce Childers, Daniel Mossé & Rami Melhem Department of Computer Science University of Pittsburgh HiPEAC
More informationCertified Big Data and Hadoop Course Curriculum
Certified Big Data and Hadoop Course Curriculum The Certified Big Data and Hadoop course by DataFlair is a perfect blend of in-depth theoretical knowledge and strong practical skills via implementation
More informationMPI RUNTIMES AT JSC, NOW AND IN THE FUTURE
, NOW AND IN THE FUTURE Which, why and how do they compare in our systems? 08.07.2018 I MUG 18, COLUMBUS (OH) I DAMIAN ALVAREZ Outline FZJ mission JSC s role JSC s vision for Exascale-era computing JSC
More information