Going further with Damaris: Energy/Performance Study in Post-Petascale I/O Approaches

1 Going further with Damaris: Energy/Performance Study in Post-Petascale I/O Approaches
Matthieu Dorier, Orçun Yildiz, Shadi Ibrahim, Gabriel Antoniu, Anne-Cécile Orgerie
2nd workshop of the JLESC, Chicago, November

2 Challenge (2009): make CM1's I/O scale for the future Blue Waters system (image credit: Leigh Orf, Bob Wilhelmson)

3 Traditional I/O approach. [Diagram omitted.] Periodic checkpoints from 100,000+ processes produce too much data and too many files; analysis happens offline, after the simulation, on another cluster (10,000+ processes) after a data transfer; 100 to 1000 I/O servers absorb the I/O bursts.

4 Time-partitioning I/O (or why you didn't get results in time for the deadline): the simulation periodically stops to perform I/O.
- File-per-process approach: too many files, hard to read back, high metadata overhead
- Collective I/O approach: requires synchronization and data communication steps
(A sketch of the two variants follows below.)
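As an illustration of the two variants, here is a minimal, self-contained MPI-IO sketch; it is not CM1's actual output code, and the file names, sizes and checkpoint layout are made up. Both calls block the whole simulation while they run, which is precisely the cost of time partitioning.

```c
#include <mpi.h>
#include <stdio.h>

#define N 1024  /* doubles written per process (illustrative) */

/* File-per-process: trivial to write, but 100,000 ranks means
 * 100,000 files and heavy metadata load on the file system. */
static void write_file_per_process(int rank, const double* data) {
    char path[64];
    snprintf(path, sizeof(path), "ckpt_%06d.dat", rank);
    MPI_File f;
    MPI_File_open(MPI_COMM_SELF, path,
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &f);
    MPI_File_write(f, data, N, MPI_DOUBLE, MPI_STATUS_IGNORE);
    MPI_File_close(&f);
}

/* Collective I/O: one shared file, easy to read back, but every rank
 * must take part in the call (synchronization + data communication). */
static void write_collective(int rank, const double* data) {
    MPI_File f;
    MPI_File_open(MPI_COMM_WORLD, "ckpt.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &f);
    MPI_Offset off = (MPI_Offset)rank * N * sizeof(double);
    MPI_File_write_at_all(f, off, data, N, MPI_DOUBLE, MPI_STATUS_IGNORE);
    MPI_File_close(&f);
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double data[N];
    for (int i = 0; i < N; i++) data[i] = rank + i;

    write_file_per_process(rank, data);  /* variant 1 */
    write_collective(rank, data);        /* variant 2 */

    MPI_Finalize();
    return 0;
}
```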

5 Solution: Damaris

6 Damaris in a few key concepts
- Dedicated I/O cores: these cores do not perform any computation
- Shared memory: improves memory usage by avoiding copies
- Plugin system: adaptability/flexibility, connection with visualization software
- Simple API and external XML-based metadata (a sketch follows below)
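For illustration, here is a minimal sketch of what the client side of a Damaris-instrumented simulation can look like. The calls (damaris_initialize, damaris_start, damaris_client_comm_get, damaris_write, damaris_end_iteration, damaris_stop, damaris_finalize) are taken from the public Damaris C API as documented for later releases; their exact availability in the 1.0 release described here, as well as the variable name, grid size and XML file name, are assumptions.

```c
#include <mpi.h>
#include "Damaris.h"

#define N      32    /* local grid size per dimension (illustrative) */
#define MAX_IT 100   /* number of simulation iterations (illustrative) */

/* Stand-in for CM1's compute phase (hypothetical). */
static void compute_step(MPI_Comm comm, double* field, int n) {
    (void)comm;
    for (int i = 0; i < n; i++) field[i] += 1.0;
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    /* The external XML file declares variables, layouts and plugins. */
    damaris_initialize("cm1_damaris.xml", MPI_COMM_WORLD);

    int is_client = 0;
    damaris_start(&is_client);  /* dedicated cores enter their event loop here */
    if (is_client) {
        MPI_Comm comm;
        damaris_client_comm_get(&comm);  /* communicator without the I/O cores */

        double field[N * N * N] = {0};
        for (int it = 0; it < MAX_IT; it++) {
            compute_step(comm, field, N * N * N);
            /* The write only places data in shared memory: no extra copy,
             * no blocking on the file system. */
            damaris_write("pressure", field);
            damaris_end_iteration();  /* plugins / VisIt run on the I/O cores */
        }
        damaris_stop();  /* releases the dedicated cores */
    }

    damaris_finalize();
    MPI_Finalize();
    return 0;
}
```

The point to notice is that damaris_write only hands data over through shared memory, so the simulation never blocks on the file system; the dedicated cores, which stay inside damaris_start until damaris_stop is called, perform the actual I/O or in situ processing.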

7 Damaris in the context of the Joint Lab
- Starting point, Nov. 2009: preliminary discussions on I/O challenges for Blue Waters at the 2nd JLPC workshop
- First steps, 2010: start of the collaboration of the KerData team (INRIA) with NCSA; Matthieu Dorier's MS internship at UIUC with Franck Cappello and Marc Snir
- 2011: collaboration extended to ANL, ongoing since (Rob Ross, Tom Peterka); other internships and mutual visits followed
- Damaris at the core of several joint projects:
  - 2012: FACCTS, PIs: Rob Ross, Gabriel Antoniu
  - Data@Exascale Associate Team (INRIA, ANL, UIUC), PIs: Gabriel Antoniu (INRIA), Rob Ross (ANL), Marc Snir (UIUC)
  - a WP within the NextGN PUF ANL-INRIA (+partners)
  - other joint projects in preparation

8 Evolution of Damaris. [Timeline figure omitted.] Development started with the time-partitioning version (ICS ACM SRC 2nd Prize, Cluster 2012, IPDPS 2013 PhD forum), followed by in situ visualization enabled with VisIt (LDAV 2013) and then dedicated nodes (IPDPS 2014 PhD forum, DIDC 2014). Applications and platforms over time: CM1, OLAM, GTC, Nek5000 on Grid 5000, Kraken, Titan, Intrepid and Blue Waters.

9 People involved
- INRIA: Matthieu Dorier, Gabriel Antoniu, Lokman Rahmani, Shadi Ibrahim, Orçun Yildiz, Anne-Cécile Orgerie
- NCSA: Roberto Sisneros, Dave Semeraro
- ANL: Franck Cappello, Marc Snir, Rob Ross, Tom Peterka, Dries Kimpe
- Internships: Matthieu Dorier (1st year master), Matthieu Perin (1st year master), Sergiu Vicol (1st year bachelor), Catalina Nita (1st year master), Orçun Yildiz (2nd year master)
- External users/contributors: Leigh Orf (Central Michigan), Francieli Zanon Boito (UFRGS), Rodrigo Kassick (UFRGS)

10 Damaris 1.0: state of the implementation
- 3 modes: synchronous (time-partitioning), dedicated core(s), dedicated node(s)
- Very simple API for C, C++ and Fortran simulations
- XML-based data description
- Shared memory can be enabled/disabled
- Plugin system (C++ plugins)
- Connection to VisIt for in situ visualization
- About 20,000 lines of C++ code, based on MPI
- Depends on Boost, Xerces-C, XSD (optionally VisIt)
- http://damaris.gforge.inria.fr
- Potential plans for integration within the VisIt package
- Potential plans for use as one of the default backends in CM1

11 Three I/O approaches in Damaris: time partitioning, dedicated core(s), dedicated node(s). Switch between modes using the configuration file: <dedicated cores="X" nodes="Y" />
- Time partitioning: good at small scale, bad at larger scales
- Dedicated cores: good when there are many cores per node and the memory can be afforded
- Dedicated nodes: good when there are few cores per node and the memory of one node is enough
(See the configuration sketch below.)
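As an illustrative sketch, a configuration file built around the <dedicated> element above could look as follows. Only <dedicated cores="..." nodes="..."/> comes from the slide; the surrounding elements (<architecture>, <buffer>, <data>, <layout>, <variable>) follow the structure of the Damaris documentation and should be treated as assumptions for the 1.0 schema, as should all attribute values.

```xml
<?xml version="1.0"?>
<simulation name="cm1" language="c">
  <architecture>
    <!-- The only thing to change when switching I/O approach:
         cores="0" nodes="0"  -> time partitioning (synchronous)
         cores="1" nodes="0"  -> one dedicated core per node
         cores="0" nodes="2"  -> dedicated nodes -->
    <dedicated cores="1" nodes="0" />
    <!-- Shared-memory segment used to pass data without copies
         (name and size are assumed values). -->
    <buffer name="data-buffer" size="268435456" />
  </architecture>
  <data>
    <!-- Layout and variable written via damaris_write("pressure", ...) -->
    <layout name="cells" type="double" dimensions="32,32,32" />
    <variable name="pressure" layout="cells" />
  </data>
</simulation>
```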

12 Focus of this talk: how much does the I/O approach impact energy efficiency? Other talks related to this collaboration: Lokman Rahmani, Gabriel Antoniu

13 Goals: study the impact of the I/O approach (dedicated cores, nodes, etc.), of the I/O frequency (time between I/O phases) and of the underlying architecture on simulation performance and energy consumption.

14 Experimental setup: CM1 on Grid 5000
- Nancy site: Graphene cluster (4 cores/node), 20G InfiniBand network, 6 PVFS servers, EATON Power Distribution Units; CM1 on 32 nodes (128 cores)
- Rennes site: Parapluie cluster (24 cores/node), 20G InfiniBand network, 3 PVFS servers, EATON Power Distribution Units; CM1 on 16 nodes (384 cores)

15 Impact of the I/O approach (G5K/Nancy). [Plot omitted; lower = better.] Takeaway: a longer run time plus I/O variability translates into lower power usage.

16 Impact of the I/O frequency (G5K/Nancy). [Plot omitted; lower = better.]
- Time-partitioning: linear dependency between the I/O frequency and the energy consumption
- Dedicated resources: when doing I/O every 10 iterations, DC(1), DC(2) and DN(7:1) cannot keep up, resulting in higher energy consumption
- When the dedicated resources can keep up, the energy consumption no longer depends on the I/O

17 Impact of the architecture (G5K/Nancy, Rennes). [Plot omitted; lower = better.]
- Dedicating 1 core is the best approach on the Rennes site (24 cores per node)
- Dedicating 1 node out of every 8 is the best approach on the Nancy site (4 cores per node)
- A different number of cores per node means a different optimal I/O approach

18 Overall power/runtime results. [Plots omitted.] Best configurations: Nancy, 1 dedicated node for 7 simulation nodes; Rennes, 1 dedicated core per 24-core node.

19 Can we model the energy efficiency under different I/O approaches? Can we predict the best one?

20 Model hypotheses: the application is computation-intensive, and I/O is fully overlapped with computation.

21 Energy model: general case. The energy consumed by a run is

$E = T_{\mathrm{sim}} \cdot P_{\mathrm{sim}}$

With $T_{\mathrm{base}}$ the time for 1 iteration on 1 core and $n_{\mathrm{iterations}}$ the number of iterations, the simulation time is

$T_{\mathrm{sim}} = \dfrac{T_{\mathrm{base}} \cdot n_{\mathrm{iterations}}}{\big(n_{\mathrm{cores/node}} \cdot s_{\mathrm{core}}(n_{\mathrm{cores/node}})\big) \cdot \big(n_{\mathrm{nodes}} \cdot s_{\mathrm{node}}(n_{\mathrm{nodes}})\big)}$

where $s_{\mathrm{core}}$ and $s_{\mathrm{node}}$ are scalability functions w.r.t. the number of cores per node and the number of nodes.

Simplification 1: the scalability w.r.t. the number of cores per node does not depend on the number of nodes, and the scalability w.r.t. the number of nodes does not depend on the number of cores per node.

22 Energy model for dedicated nodes. With $P_{\mathrm{max}}$ the max power of a node, $P_{\mathrm{idle}}$ its idle power, $c$ the number of simulation nodes and $d$ the number of dedicated nodes:

$P_{\mathrm{sim}} = \dfrac{P_{\mathrm{max}} \cdot c + \frac{1}{2}(P_{\mathrm{idle}} + P_{\mathrm{max}}) \cdot d}{c + d}$

Simplification 2: the power of a dedicated node is the average of the max and idle powers.

23 Energy model for dedicated cores:

$P_{\mathrm{sim}} = P_{\mathrm{max}}$

Simplification 3: the power of a node running the simulation does not change significantly when some of its cores are dedicated to I/O. (A numerical sketch of the full model follows below.)
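Putting slides 21-23 together, here is a small numerical sketch of the model in C. The calibration constants and the perfect-scaling s_core/s_node functions are placeholders (slide 24 measures the real ones), the node counts loosely mirror the Rennes setup, and none of the printed numbers are the study's results.

```c
#include <stdio.h>

/* Placeholder calibration constants (slide 24 measures the real ones). */
#define T_BASE   2.0    /* time for 1 iteration on 1 core, seconds (assumed) */
#define P_MAX  220.0    /* max power of a node, watts (assumed)  */
#define P_IDLE 100.0    /* idle power of a node, watts (assumed) */

/* Scalability functions w.r.t. cores/node and nodes; Simplification 1 lets
 * us keep them independent. Perfect scaling here, purely as a placeholder. */
static double s_core(int cores) { (void)cores; return 1.0; }
static double s_node(int nodes) { (void)nodes; return 1.0; }

/* T_sim = T_base * n_iter / ((cores * s_core(cores)) * (nodes * s_node(nodes))) */
static double t_sim(int n_iter, int cores, int nodes) {
    return T_BASE * n_iter
         / ((cores * s_core(cores)) * (nodes * s_node(nodes)));
}

/* Dedicated nodes: average node power with c simulation nodes and d
 * dedicated nodes at (P_idle + P_max)/2 each (Simplification 2). */
static double p_sim_dedicated_nodes(int c, int d) {
    return (P_MAX * c + 0.5 * (P_IDLE + P_MAX) * d) / (c + d);
}

int main(void) {
    const int n_iter = 1000, total_nodes = 16, cores = 24; /* Rennes-like */

    /* Dedicated cores: all nodes run at P_max (Simplification 3),
     * one core per node is lost to I/O. */
    double e_dc = t_sim(n_iter, cores - 1, total_nodes)
                * P_MAX * total_nodes;

    /* Dedicated nodes DN(7:1): 14 simulation nodes + 2 dedicated ones. */
    double e_dn = t_sim(n_iter, cores, 14)
                * p_sim_dedicated_nodes(14, 2) * total_nodes;

    printf("E(DC(1))   = %.0f J\n", e_dc);
    printf("E(DN(7:1)) = %.0f J\n", e_dn);
    printf("predicted best: %s\n",
           e_dc < e_dn ? "dedicated cores" : "dedicated nodes");
    return 0;
}
```

Under these placeholder assumptions the dedicated-core variant comes out ahead, which happens to match the Rennes observation; in practice it is the measured scalability and power values that decide between the approaches.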

24 Model calibration (with CM1 on G5K/Rennes): measured the scalability of CM1 w.r.t. the number of cores per node and w.r.t. the number of nodes, and the power consumption of 8 nodes on the Rennes site of G5K (max power and idle power). [Plots omitted.]

25 Model validation (CM1 on G5K/Rennes). Five runs (error bars = min-max). Worst relative error between model and observation: 4%. Larger variations with DN(7:1), probably due to network contention. Best approach predicted (and observed): 1 dedicated core per node.

26 Model validation (CM1 on G5K/Nancy). Five runs (error bars = min-max). Worst relative error between model and observation: 5.7%. Best approach predicted (and observed): 1 dedicated node for 7 non-dedicated nodes.

27 Model accuracy: summary

Site   | Approach               | Accuracy
-------|------------------------|---------
Rennes | Dedicated cores (1)    | 96.0%
Rennes | Dedicated cores (2)    | 96.9%
Rennes | Dedicated nodes (15:1) | 97.3%
Rennes | Dedicated nodes (7:1)  | 98.0%
Nancy  | Dedicated cores (1)    | 95.0%
Nancy  | Dedicated nodes (15:1) | 94.3%
Nancy  | Dedicated nodes (7:1)  | 95.0%

28 Conclusion

29 Conclusion and future directions
Contributions:
- Insight into the energy/performance of I/O approaches, all available within Damaris
- Energy model for dedicated cores and dedicated nodes
- Validation on Grid 5000 with CM1
Model limitations:
- Valid for computation-intensive applications
- Does not include network-related energy consumption
- Does not include the energy consumption of the storage system
Future work:
- Validation with other simulations
- Tradeoff between compression, performance and energy consumption

30 A bit of advertisement: Darshan-Web. Demo: Installation tutorial:
