Mining Supercomputer Jobs' I/O Behavior from System Logs. Xiaosong Ma

Size: px
Start display at page:

Download "Mining Supercomputer Jobs' I/O Behavior from System Logs. Xiaosong Ma"

Transcription

1 Mining Supercomputer Jobs' I/O Behavior from System Logs Xiaosong Ma

2 OLCF Architecture Overview Rhea node Development Cluster Eos 76 Node Cray XC Cluster Scalable IO Network (SION) - Infiniband Servers Servers 8 OST(LUN) Atlas 8 OST(LUN) Atlas

3 OLCF Architecture Overview Rhea node Development Cluster Eos 76 Node Cray XC Cluster Scalable IO Network (SION) - Infiniband Servers Servers Per-OST I/O throughput Monitoring tool 8 OST(LUN) Atlas MySQL database 8 OST(LUN) Server-side I/O throughput logs Atlas

4 Server-side I/O Throughput Logs RAID controller Coarse-granule logging

5 Server-side I/O Throughput Logs RAID controller Coarse-granule logging

6 I/O throughput logs Zero overhead No impact on user IO No user effort Server-side I/O Throughput Logs RAID controller Coarse-granule logging 6

7 I/O throughput logs Zero overhead No impact on user IO No user effort Mixed I/O traffic RAID controller Coarse-granule logging Server-side I/O Throughput Logs 7

8 6 8 Job scheduler logs Prior Work: IOSI Workflow Target App (User ID + App ID) Throughput logs Start_time End_time --6 : --6 : --7 : --7 : --8 : --8 7: IOSI Input Sample set IOSI Data preprocessing Per-sample wavelet transform Cross-sample I/O burst identification. IOSI paper: Automatic Identification of Application I/O Signatures from Noisy Server-Side Traces, FAST '... 8 IOSI Output 8

9 6 8 Job scheduler logs Prior Work: IOSI Workflow Target App (User ID + App ID) Throughput logs Start_time End_time --6 : --6 : --7 : --7 : --8 : --8 7: IOSI Input Sample set IOSI Data preprocessing Per-sample wavelet transform Cross-sample I/O burst identification Strong assumption: identical runs of app.. IOSI paper: Automatic Identification of Application I/O Signatures from Noisy Server-Side Traces, FAST '... 9 IOSI Output 9

10 AID: Automatic I/O Diverter Start_time End_time --6 : --6 : --7 : --7 : --8 : --8 7: Time App Job Job Job App App App Job Job Job Job Job Job Job Job Job Job Job Job Job Job Job App App6 Job Job Job Job Job Job Job Job Job Job 6 Scheduling suggestion Automatically identifying I/O-heavy apps (No prior knowledge, no user involvement)

11 AID: Automatic I/O Diverter SC 6 Tech paper presentation: Thursday pm, D Start_time End_time --6 : --6 : --7 : --7 : --8 : --8 7: Time App Job Job Job App App App Job Job Job Job Job Job Job Job Job Job Job Job Job Job Job App App6 Job Job Job Job Job Job Job Job Job Job 6 Scheduling suggestion Automatically identifying I/O-heavy apps (No prior knowledge, no user involvement)

12 Application I/O Characterization Results Name Value Total number of logged jobs 8,969 Unique applications identified 9,998 Initial I/O-intensive candidates 9 Candidates passing scope checking 67 Candidates passing minimum support User-verfied candidates 8 Result from months Titan I/O traffic and job logs (User verification by )

13 Application I/O Characterization Results Name Value Total number of logged jobs 8,969 Unique applications identified ID Node Time(m) OST 9,998 App. Domain Initial I/O-intensive candidates Geo-sciences Candidates passing scope checking Combustion Candidates passing minimum support Astrophysics User-verfied candidates Combustion - 8 Systems research Combustion Computer Science Environmental User-verified I/O-intensive applications

14 Application I/O Characterization Results Name Value Total number of logged jobs 8,969 Unique applications identified 9,998 Initial I/O-intensive candidates 9 Candidates passing scope checking 67 Candidates passing minimum support User-verfied candidates 8

15 Application I/O Characterization Results Name Value Total number of logged jobs 8,969 Unique applications identified 9,998 Initial I/O-intensive candidates 9 Candidates passing scope checking 67 Candidates passing minimum support User-verfied candidates 8

16 Application I/O Characterization Results Name Value Total number of logged jobs 8,969 Unique applications identified 9,998 Initial I/O-intensive candidates 9 Candidates passing scope checking 67 Candidates passing minimum support User-verfied candidates 8 Applications not using parallel I/O systems well! Similar finding as Huong HPDC work (Darshan) Motivates better I/O performance data analysis Connecting programs to systems 6

17 Questions? Xiaosong Ma Qatar Computing Research Institute, Hamad Bin Khalifa University 7

18 I/O Contention on Large-Scale HPC Systems ORNL s Titan (World s # Supercomputer) 7. PF Peak performance 8,688 compute nodes 6-core AMD Opteron Nvidia Tesla GPU + 6 GB memory -D Torus interconnect Performance variance on HPC Shared parallel file system I/O-heavy jobs collision -> I/O performance degradation I/O performance variance on Titan with IOR [6] 8

19 CDF of per-ost I/O throughput 99.6% time < % capacity (MB/s) 98.% time < % capacity (MB/s) 88.% time < % capacity (MB/s) 9

Automatic Identification of Application I/O Signatures from Noisy Server-Side Traces. Yang Liu Raghul Gunasekaran Xiaosong Ma Sudharshan S.

Automatic Identification of Application I/O Signatures from Noisy Server-Side Traces. Yang Liu Raghul Gunasekaran Xiaosong Ma Sudharshan S. Automatic Identification of Application I/O Signatures from Noisy Server-Side Traces Yang Liu Raghul Gunasekaran Xiaosong Ma Sudharshan S. Vazhkudai Instance of Large-Scale HPC Systems ORNL s TITAN (World

More information

Server-side Log Data Analytics for I/O Workload Characterization and Coordination on Large Shared Storage Systems

Server-side Log Data Analytics for I/O Workload Characterization and Coordination on Large Shared Storage Systems Server-side Log Data Analytics for I/O Workload Characterization and Coordination on Large Shared Storage Systems Yang Liu, Raghul Gunasekaran, Xiaosong Ma, and Sudharshan S. Vazhkudai North Carolina State

More information

AID (Automatic I/O Diverter), a system that mines a small set of I/O-intensive jobs out of the server-side logs, including per-storage-server traffic

AID (Automatic I/O Diverter), a system that mines a small set of I/O-intensive jobs out of the server-side logs, including per-storage-server traffic ABSTRACT LIU, YANG. Server-side Log Data Analytics for I/O Workload Characterization and Coordination on Large Shared Storage Systems. (Under the direction of Xiaosong Ma and Frank Mueller.) Competing

More information

Managing HPC Active Archive Storage with HPSS RAIT at Oak Ridge National Laboratory

Managing HPC Active Archive Storage with HPSS RAIT at Oak Ridge National Laboratory Managing HPC Active Archive Storage with HPSS RAIT at Oak Ridge National Laboratory Quinn Mitchell HPC UNIX/LINUX Storage Systems ORNL is managed by UT-Battelle for the US Department of Energy U.S. Department

More information

Titan - Early Experience with the Titan System at Oak Ridge National Laboratory

Titan - Early Experience with the Titan System at Oak Ridge National Laboratory Office of Science Titan - Early Experience with the Titan System at Oak Ridge National Laboratory Buddy Bland Project Director Oak Ridge Leadership Computing Facility November 13, 2012 ORNL s Titan Hybrid

More information

Introduction to HPC Parallel I/O

Introduction to HPC Parallel I/O Introduction to HPC Parallel I/O Feiyi Wang (Ph.D.) and Sarp Oral (Ph.D.) Technology Integration Group Oak Ridge Leadership Computing ORNL is managed by UT-Battelle for the US Department of Energy Outline

More information

Preparing GPU-Accelerated Applications for the Summit Supercomputer

Preparing GPU-Accelerated Applications for the Summit Supercomputer Preparing GPU-Accelerated Applications for the Summit Supercomputer Fernanda Foertter HPC User Assistance Group Training Lead foertterfs@ornl.gov This research used resources of the Oak Ridge Leadership

More information

HPC Saudi Jeffrey A. Nichols Associate Laboratory Director Computing and Computational Sciences. Presented to: March 14, 2017

HPC Saudi Jeffrey A. Nichols Associate Laboratory Director Computing and Computational Sciences. Presented to: March 14, 2017 Creating an Exascale Ecosystem for Science Presented to: HPC Saudi 2017 Jeffrey A. Nichols Associate Laboratory Director Computing and Computational Sciences March 14, 2017 ORNL is managed by UT-Battelle

More information

Functional Partitioning to Optimize End-to-End Performance on Many-core Architectures

Functional Partitioning to Optimize End-to-End Performance on Many-core Architectures Functional Partitioning to Optimize End-to-End Performance on Many-core Architectures Min Li, Sudharshan S. Vazhkudai, Ali R. Butt, Fei Meng, Xiaosong Ma, Youngjae Kim,Christian Engelmann, and Galen Shipman

More information

The Titan Tools Experience

The Titan Tools Experience The Titan Tools Experience Michael J. Brim, Ph.D. Computer Science Research, CSMD/NCCS Petascale Tools Workshop 213 Madison, WI July 15, 213 Overview of Titan Cray XK7 18,688+ compute nodes 16-core AMD

More information

Do You Know What Your I/O Is Doing? (and how to fix it?) William Gropp

Do You Know What Your I/O Is Doing? (and how to fix it?) William Gropp Do You Know What Your I/O Is Doing? (and how to fix it?) William Gropp www.cs.illinois.edu/~wgropp Messages Current I/O performance is often appallingly poor Even relative to what current systems can achieve

More information

MDHIM: A Parallel Key/Value Store Framework for HPC

MDHIM: A Parallel Key/Value Store Framework for HPC MDHIM: A Parallel Key/Value Store Framework for HPC Hugh Greenberg 7/6/2015 LA-UR-15-25039 HPC Clusters Managed by a job scheduler (e.g., Slurm, Moab) Designed for running user jobs Difficult to run system

More information

A More Realistic Way of Stressing the End-to-end I/O System

A More Realistic Way of Stressing the End-to-end I/O System A More Realistic Way of Stressing the End-to-end I/O System Verónica G. Vergara Larrea Sarp Oral Dustin Leverman Hai Ah Nam Feiyi Wang James Simmons CUG 2015 April 29, 2015 Chicago, IL ORNL is managed

More information

ABySS Performance Benchmark and Profiling. May 2010

ABySS Performance Benchmark and Profiling. May 2010 ABySS Performance Benchmark and Profiling May 2010 Note The following research was performed under the HPC Advisory Council activities Participating vendors: AMD, Dell, Mellanox Compute resource - HPC

More information

CRAY XK6 REDEFINING SUPERCOMPUTING. - Sanjana Rakhecha - Nishad Nerurkar

CRAY XK6 REDEFINING SUPERCOMPUTING. - Sanjana Rakhecha - Nishad Nerurkar CRAY XK6 REDEFINING SUPERCOMPUTING - Sanjana Rakhecha - Nishad Nerurkar CONTENTS Introduction History Specifications Cray XK6 Architecture Performance Industry acceptance and applications Summary INTRODUCTION

More information

HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA

HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA STATE OF THE ART 2012 18,688 Tesla K20X GPUs 27 PetaFLOPS FLAGSHIP SCIENTIFIC APPLICATIONS

More information

CHARACTERIZING HPC I/O: FROM APPLICATIONS TO SYSTEMS

CHARACTERIZING HPC I/O: FROM APPLICATIONS TO SYSTEMS erhtjhtyhy CHARACTERIZING HPC I/O: FROM APPLICATIONS TO SYSTEMS PHIL CARNS carns@mcs.anl.gov Mathematics and Computer Science Division Argonne National Laboratory April 20, 2017 TU Dresden MOTIVATION FOR

More information

Oak Ridge National Laboratory Computing and Computational Sciences

Oak Ridge National Laboratory Computing and Computational Sciences Oak Ridge National Laboratory Computing and Computational Sciences OFA Update by ORNL Presented by: Pavel Shamis (Pasha) OFA Workshop Mar 17, 2015 Acknowledgments Bernholdt David E. Hill Jason J. Leverman

More information

A Large-Scale Study of Soft- Errors on GPUs in the Field

A Large-Scale Study of Soft- Errors on GPUs in the Field A Large-Scale Study of Soft- Errors on GPUs in the Field Bin Nie*, Devesh Tiwari +, Saurabh Gupta +, Evgenia Smirni*, and James H. Rogers + *College of William and Mary + Oak Ridge National Laboratory

More information

The Spider Center-Wide File System

The Spider Center-Wide File System The Spider Center-Wide File System Presented by Feiyi Wang (Ph.D.) Technology Integration Group National Center of Computational Sciences Galen Shipman (Group Lead) Dave Dillow, Sarp Oral, James Simmons,

More information

AMBER 11 Performance Benchmark and Profiling. July 2011

AMBER 11 Performance Benchmark and Profiling. July 2011 AMBER 11 Performance Benchmark and Profiling July 2011 Note The following research was performed under the HPC Advisory Council activities Participating vendors: AMD, Dell, Mellanox Compute resource -

More information

OLCF's next- genera0on Spider file system

OLCF's next- genera0on Spider file system OLCF's next- genera0on Spider file system Sarp Oral, PhD File and Storage Systems Team Lead Technology Integra0on Group Oak Ridge Leadership Compu0ng Facility Oak Ridge Na0onal Laboratory April 18, 2013

More information

Steve Scott, Tesla CTO SC 11 November 15, 2011

Steve Scott, Tesla CTO SC 11 November 15, 2011 Steve Scott, Tesla CTO SC 11 November 15, 2011 What goal do these products have in common? Performance / W Exaflop Expectations First Exaflop Computer K Computer ~10 MW CM5 ~200 KW Not constant size, cost

More information

Present and Future Leadership Computers at OLCF

Present and Future Leadership Computers at OLCF Present and Future Leadership Computers at OLCF Al Geist ORNL Corporate Fellow DOE Data/Viz PI Meeting January 13-15, 2015 Walnut Creek, CA ORNL is managed by UT-Battelle for the US Department of Energy

More information

Overlapping Computation and Communication for Advection on Hybrid Parallel Computers

Overlapping Computation and Communication for Advection on Hybrid Parallel Computers Overlapping Computation and Communication for Advection on Hybrid Parallel Computers James B White III (Trey) trey@ucar.edu National Center for Atmospheric Research Jack Dongarra dongarra@eecs.utk.edu

More information

Lustre2.5 Performance Evaluation: Performance Improvements with Large I/O Patches, Metadata Improvements, and Metadata Scaling with DNE

Lustre2.5 Performance Evaluation: Performance Improvements with Large I/O Patches, Metadata Improvements, and Metadata Scaling with DNE Lustre2.5 Performance Evaluation: Performance Improvements with Large I/O Patches, Metadata Improvements, and Metadata Scaling with DNE Hitoshi Sato *1, Shuichi Ihara *2, Satoshi Matsuoka *1 *1 Tokyo Institute

More information

Pedraforca: a First ARM + GPU Cluster for HPC

Pedraforca: a First ARM + GPU Cluster for HPC www.bsc.es Pedraforca: a First ARM + GPU Cluster for HPC Nikola Puzovic, Alex Ramirez We ve hit the power wall ALL computers are limited by power consumption Energy-efficient approaches Multi-core Fujitsu

More information

Exploring Emerging Technologies in the Extreme Scale HPC Co- Design Space with Aspen

Exploring Emerging Technologies in the Extreme Scale HPC Co- Design Space with Aspen Exploring Emerging Technologies in the Extreme Scale HPC Co- Design Space with Aspen Jeffrey S. Vetter SPPEXA Symposium Munich 26 Jan 2016 ORNL is managed by UT-Battelle for the US Department of Energy

More information

Analyzing the High Performance Parallel I/O on LRZ HPC systems. Sandra Méndez. HPC Group, LRZ. June 23, 2016

Analyzing the High Performance Parallel I/O on LRZ HPC systems. Sandra Méndez. HPC Group, LRZ. June 23, 2016 Analyzing the High Performance Parallel I/O on LRZ HPC systems Sandra Méndez. HPC Group, LRZ. June 23, 2016 Outline SuperMUC supercomputer User Projects Monitoring Tool I/O Software Stack I/O Analysis

More information

Efficient Object Storage Journaling in a Distributed Parallel File System

Efficient Object Storage Journaling in a Distributed Parallel File System Efficient Object Storage Journaling in a Distributed Parallel File System Presented by Sarp Oral Sarp Oral, Feiyi Wang, David Dillow, Galen Shipman, Ross Miller, and Oleg Drokin FAST 10, Feb 25, 2010 A

More information

MVAPICH2 vs. OpenMPI for a Clustering Algorithm

MVAPICH2 vs. OpenMPI for a Clustering Algorithm MVAPICH2 vs. OpenMPI for a Clustering Algorithm Robin V. Blasberg and Matthias K. Gobbert Naval Research Laboratory, Washington, D.C. Department of Mathematics and Statistics, University of Maryland, Baltimore

More information

Optimizing LS-DYNA Productivity in Cluster Environments

Optimizing LS-DYNA Productivity in Cluster Environments 10 th International LS-DYNA Users Conference Computing Technology Optimizing LS-DYNA Productivity in Cluster Environments Gilad Shainer and Swati Kher Mellanox Technologies Abstract Increasing demand for

More information

Paper Summary Problem Uses Problems Predictions/Trends. General Purpose GPU. Aurojit Panda

Paper Summary Problem Uses Problems Predictions/Trends. General Purpose GPU. Aurojit Panda s Aurojit Panda apanda@cs.berkeley.edu Summary s SIMD helps increase performance while using less power For some tasks (not everything can use data parallelism). Can use less power since DLP allows use

More information

DVS, GPFS and External Lustre at NERSC How It s Working on Hopper. Tina Butler, Rei Chi Lee, Gregory Butler 05/25/11 CUG 2011

DVS, GPFS and External Lustre at NERSC How It s Working on Hopper. Tina Butler, Rei Chi Lee, Gregory Butler 05/25/11 CUG 2011 DVS, GPFS and External Lustre at NERSC How It s Working on Hopper Tina Butler, Rei Chi Lee, Gregory Butler 05/25/11 CUG 2011 1 NERSC is the Primary Computing Center for DOE Office of Science NERSC serves

More information

What is Parallel Computing?

What is Parallel Computing? What is Parallel Computing? Parallel Computing is several processing elements working simultaneously to solve a problem faster. 1/33 What is Parallel Computing? Parallel Computing is several processing

More information

Illinois Proposal Considerations Greg Bauer

Illinois Proposal Considerations Greg Bauer - 2016 Greg Bauer Support model Blue Waters provides traditional Partner Consulting as part of its User Services. Standard service requests for assistance with porting, debugging, allocation issues, and

More information

Exploring Use-cases for Non-Volatile Memories in support of HPC Resilience

Exploring Use-cases for Non-Volatile Memories in support of HPC Resilience Exploring Use-cases for Non-Volatile Memories in support of HPC Resilience Onkar Patil 1, Saurabh Hukerikar 2, Frank Mueller 1, Christian Engelmann 2 1 Dept. of Computer Science, North Carolina State University

More information

Overview. CS 472 Concurrent & Parallel Programming University of Evansville

Overview. CS 472 Concurrent & Parallel Programming University of Evansville Overview CS 472 Concurrent & Parallel Programming University of Evansville Selection of slides from CIS 410/510 Introduction to Parallel Computing Department of Computer and Information Science, University

More information

Parallel Performance Studies for a Clustering Algorithm

Parallel Performance Studies for a Clustering Algorithm Parallel Performance Studies for a Clustering Algorithm Robin V. Blasberg and Matthias K. Gobbert Naval Research Laboratory, Washington, D.C. Department of Mathematics and Statistics, University of Maryland,

More information

Oak Ridge Leadership Computing Facility: Summit and Beyond

Oak Ridge Leadership Computing Facility: Summit and Beyond Oak Ridge Leadership Computing Facility: Summit and Beyond Justin L. Whitt OLCF-4 Deputy Project Director, Oak Ridge Leadership Computing Facility Oak Ridge National Laboratory March 2017 ORNL is managed

More information

Challenges for Future Interconnection Networks Hot Interconnects Panel August 24, Dennis Abts Sr. Principal Engineer

Challenges for Future Interconnection Networks Hot Interconnects Panel August 24, Dennis Abts Sr. Principal Engineer Challenges for Future Interconnection Networks Hot Interconnects Panel August 24, 2006 Sr. Principal Engineer Panel Questions How do we build scalable networks that balance power, reliability and performance

More information

ZEST Snapshot Service. A Highly Parallel Production File System by the PSC Advanced Systems Group Pittsburgh Supercomputing Center 1

ZEST Snapshot Service. A Highly Parallel Production File System by the PSC Advanced Systems Group Pittsburgh Supercomputing Center 1 ZEST Snapshot Service A Highly Parallel Production File System by the PSC Advanced Systems Group Pittsburgh Supercomputing Center 1 Design Motivation To optimize science utilization of the machine Maximize

More information

Ian Foster, CS554: Data-Intensive Computing

Ian Foster, CS554: Data-Intensive Computing The advent of computation can be compared, in terms of the breadth and depth of its impact on research and scholarship, to the invention of writing and the development of modern mathematics. Ian Foster,

More information

An Overview of Fujitsu s Lustre Based File System

An Overview of Fujitsu s Lustre Based File System An Overview of Fujitsu s Lustre Based File System Shinji Sumimoto Fujitsu Limited Apr.12 2011 For Maximizing CPU Utilization by Minimizing File IO Overhead Outline Target System Overview Goals of Fujitsu

More information

CP2K Performance Benchmark and Profiling. April 2011

CP2K Performance Benchmark and Profiling. April 2011 CP2K Performance Benchmark and Profiling April 2011 Note The following research was performed under the HPC Advisory Council activities Participating vendors: AMD, Dell, Mellanox Compute resource - HPC

More information

An Introduction to the SPEC High Performance Group and their Benchmark Suites

An Introduction to the SPEC High Performance Group and their Benchmark Suites An Introduction to the SPEC High Performance Group and their Benchmark Suites Robert Henschel Manager, Scientific Applications and Performance Tuning Secretary, SPEC High Performance Group Research Technologies

More information

MICROWAY S NVIDIA TESLA V100 GPU SOLUTIONS GUIDE

MICROWAY S NVIDIA TESLA V100 GPU SOLUTIONS GUIDE MICROWAY S NVIDIA TESLA V100 GPU SOLUTIONS GUIDE LEVERAGE OUR EXPERTISE sales@microway.com http://microway.com/tesla NUMBERSMASHER TESLA 4-GPU SERVER/WORKSTATION Flexible form factor 4 PCI-E GPUs + 3 additional

More information

I/O at the Center for Information Services and High Performance Computing

I/O at the Center for Information Services and High Performance Computing Mich ael Kluge, ZIH I/O at the Center for Information Services and High Performance Computing HPC-I/O in the Data Center Workshop @ ISC 2015 Zellescher Weg 12 Willers-Bau A 208 Tel. +49 351-463 34217 Michael

More information

MM5 Modeling System Performance Research and Profiling. March 2009

MM5 Modeling System Performance Research and Profiling. March 2009 MM5 Modeling System Performance Research and Profiling March 2009 Note The following research was performed under the HPC Advisory Council activities AMD, Dell, Mellanox HPC Advisory Council Cluster Center

More information

Opportunities for container environments on Cray XC30 with GPU devices

Opportunities for container environments on Cray XC30 with GPU devices Opportunities for container environments on Cray XC30 with GPU devices Cray User Group 2016, London Sadaf Alam, Lucas Benedicic, T. Schulthess, Miguel Gila May 12, 2016 Agenda Motivation Container technologies,

More information

Treelogy: A Benchmark Suite for Tree Traversals

Treelogy: A Benchmark Suite for Tree Traversals Purdue University Programming Languages Group Treelogy: A Benchmark Suite for Tree Traversals Nikhil Hegde, Jianqiao Liu, Kirshanthan Sundararajah, and Milind Kulkarni School of Electrical and Computer

More information

Scaling to Petaflop. Ola Torudbakken Distinguished Engineer. Sun Microsystems, Inc

Scaling to Petaflop. Ola Torudbakken Distinguished Engineer. Sun Microsystems, Inc Scaling to Petaflop Ola Torudbakken Distinguished Engineer Sun Microsystems, Inc HPC Market growth is strong CAGR increased from 9.2% (2006) to 15.5% (2007) Market in 2007 doubled from 2003 (Source: IDC

More information

Data Partitioning on Heterogeneous Multicore and Multi-GPU systems Using Functional Performance Models of Data-Parallel Applictions

Data Partitioning on Heterogeneous Multicore and Multi-GPU systems Using Functional Performance Models of Data-Parallel Applictions Data Partitioning on Heterogeneous Multicore and Multi-GPU systems Using Functional Performance Models of Data-Parallel Applictions Ziming Zhong Vladimir Rychkov Alexey Lastovetsky Heterogeneous Computing

More information

On the Role of Burst Buffers in Leadership- Class Storage Systems

On the Role of Burst Buffers in Leadership- Class Storage Systems On the Role of Burst Buffers in Leadership- Class Storage Systems Ning Liu, Jason Cope, Philip Carns, Christopher Carothers, Robert Ross, Gary Grider, Adam Crume, Carlos Maltzahn Contact: liun2@cs.rpi.edu,

More information

The Optimal CPU and Interconnect for an HPC Cluster

The Optimal CPU and Interconnect for an HPC Cluster 5. LS-DYNA Anwenderforum, Ulm 2006 Cluster / High Performance Computing I The Optimal CPU and Interconnect for an HPC Cluster Andreas Koch Transtec AG, Tübingen, Deutschland F - I - 15 Cluster / High Performance

More information

Leveraging Burst Buffer Coordination to Prevent I/O Interference

Leveraging Burst Buffer Coordination to Prevent I/O Interference Leveraging Burst Buffer Coordination to Prevent I/O Interference Anthony Kougkas akougkas@hawk.iit.edu Matthieu Dorier, Rob Latham, Rob Ross, Xian-He Sun Wednesday, October 26th Baltimore, USA Outline

More information

Opportunities of the rcuda remote GPU virtualization middleware. Federico Silla Universitat Politècnica de València Spain

Opportunities of the rcuda remote GPU virtualization middleware. Federico Silla Universitat Politècnica de València Spain Opportunities of the rcuda remote virtualization middleware Federico Silla Universitat Politècnica de València Spain st Outline What is rcuda? HPC Advisory Council China Conference 2017 2/45 s are the

More information

IBM CORAL HPC System Solution

IBM CORAL HPC System Solution IBM CORAL HPC System Solution HPC and HPDA towards Cognitive, AI and Deep Learning Deep Learning AI / Deep Learning Strategy for Power Power AI Platform High Performance Data Analytics Big Data Strategy

More information

GPU > CPU. FOR HIGH PERFORMANCE COMPUTING PRESENTATION BY - SADIQ PASHA CHETHANA DILIP

GPU > CPU. FOR HIGH PERFORMANCE COMPUTING PRESENTATION BY - SADIQ PASHA CHETHANA DILIP GPU > CPU. FOR HIGH PERFORMANCE COMPUTING PRESENTATION BY - SADIQ PASHA CHETHANA DILIP INTRODUCTION or With the exponential increase in computational power of todays hardware, the complexity of the problem

More information

HYCOM Performance Benchmark and Profiling

HYCOM Performance Benchmark and Profiling HYCOM Performance Benchmark and Profiling Jan 2011 Acknowledgment: - The DoD High Performance Computing Modernization Program Note The following research was performed under the HPC Advisory Council activities

More information

AN INTRODUCTION TO CLUSTER COMPUTING

AN INTRODUCTION TO CLUSTER COMPUTING CLUSTERS AND YOU AN INTRODUCTION TO CLUSTER COMPUTING Engineering IT BrownBag Series 29 October, 2015 Gianni Pezzarossi Linux Systems Administrator Mark Smylie Hart Research Technology Facilitator WHAT

More information

Burst Buffers Simulation in Dragonfly Network

Burst Buffers Simulation in Dragonfly Network Burst Buffers Simulation in Dragonfly Network Jian Peng Department of Computer Science Illinois Institute of Technology Chicago, IL, USA jpeng10@hawk.iit.edu Michael Lang Los Alamos National Laboratory

More information

EN2910A: Advanced Computer Architecture Topic 06: Supercomputers & Data Centers Prof. Sherief Reda School of Engineering Brown University

EN2910A: Advanced Computer Architecture Topic 06: Supercomputers & Data Centers Prof. Sherief Reda School of Engineering Brown University EN2910A: Advanced Computer Architecture Topic 06: Supercomputers & Data Centers Prof. Sherief Reda School of Engineering Brown University Material from: The Datacenter as a Computer: An Introduction to

More information

The Cray CX1 puts massive power and flexibility right where you need it in your workgroup

The Cray CX1 puts massive power and flexibility right where you need it in your workgroup The Cray CX1 puts massive power and flexibility right where you need it in your workgroup Up to 96 cores of Intel 5600 compute power 3D visualization Up to 32TB of storage GPU acceleration Small footprint

More information

Utilizing Unused Resources To Improve Checkpoint Performance

Utilizing Unused Resources To Improve Checkpoint Performance Utilizing Unused Resources To Improve Checkpoint Performance Ross Miller Oak Ridge Leadership Computing Facility Oak Ridge National Laboratory Oak Ridge, Tennessee Email: rgmiller@ornl.gov Scott Atchley

More information

Supercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC?

Supercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC? Supercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC? Nikola Rajovic, Paul M. Carpenter, Isaac Gelado, Nikola Puzovic, Alex Ramirez, Mateo Valero SC 13, November 19 th 2013, Denver, CO, USA

More information

I/O Router Placement and Fine-Grained Routing on Titan to Support Spider II

I/O Router Placement and Fine-Grained Routing on Titan to Support Spider II I/O Router Placement and Fine-Grained Routing on Titan to Support Spider II Matt Ezell, Sarp Oral, Feiyi Wang, Devesh Tiwari, Don Maxwell, Dustin Leverman, and Jason Hill Oak Ridge National Laboratory;

More information

Future Trends in Hardware and Software for use in Simulation

Future Trends in Hardware and Software for use in Simulation Future Trends in Hardware and Software for use in Simulation Steve Feldman VP/IT, CD-adapco April, 2009 HighPerformanceComputing Building Blocks CPU I/O Interconnect Software General CPU Maximum clock

More information

INTEGRATING HPFS IN A CLOUD COMPUTING ENVIRONMENT

INTEGRATING HPFS IN A CLOUD COMPUTING ENVIRONMENT INTEGRATING HPFS IN A CLOUD COMPUTING ENVIRONMENT Abhisek Pan 2, J.P. Walters 1, Vijay S. Pai 1,2, David Kang 1, Stephen P. Crago 1 1 University of Southern California/Information Sciences Institute 2

More information

CS4230 Parallel Programming. Lecture 3: Introduction to Parallel Architectures 8/28/12. Homework 1: Parallel Programming Basics

CS4230 Parallel Programming. Lecture 3: Introduction to Parallel Architectures 8/28/12. Homework 1: Parallel Programming Basics CS4230 Parallel Programming Lecture 3: Introduction to Parallel Architectures Mary Hall August 28, 2012 Homework 1: Parallel Programming Basics Due before class, Thursday, August 30 Turn in electronically

More information

Improved Solutions for I/O Provisioning and Application Acceleration

Improved Solutions for I/O Provisioning and Application Acceleration 1 Improved Solutions for I/O Provisioning and Application Acceleration August 11, 2015 Jeff Sisilli Sr. Director Product Marketing jsisilli@ddn.com 2 Why Burst Buffer? The Supercomputing Tug-of-War A supercomputer

More information

University at Buffalo Center for Computational Research

University at Buffalo Center for Computational Research University at Buffalo Center for Computational Research The following is a short and long description of CCR Facilities for use in proposals, reports, and presentations. If desired, a letter of support

More information

LAMMPSCUDA GPU Performance. April 2011

LAMMPSCUDA GPU Performance. April 2011 LAMMPSCUDA GPU Performance April 2011 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Dell, Intel, Mellanox Compute resource - HPC Advisory Council

More information

CUDA Experiences: Over-Optimization and Future HPC

CUDA Experiences: Over-Optimization and Future HPC CUDA Experiences: Over-Optimization and Future HPC Carl Pearson 1, Simon Garcia De Gonzalo 2 Ph.D. candidates, Electrical and Computer Engineering 1 / Computer Science 2, University of Illinois Urbana-Champaign

More information

rcuda: hybrid CPU-GPU clusters Federico Silla Technical University of Valencia Spain

rcuda: hybrid CPU-GPU clusters Federico Silla Technical University of Valencia Spain rcuda: hybrid - clusters Federico Silla Technical University of Valencia Spain Outline 1. Hybrid - clusters 2. Concerns with hybrid clusters 3. One possible solution: virtualize s! 4. rcuda what s that?

More information

High-Performance Key-Value Store on OpenSHMEM

High-Performance Key-Value Store on OpenSHMEM High-Performance Key-Value Store on OpenSHMEM Huansong Fu*, Manjunath Gorentla Venkata, Ahana Roy Choudhury*, Neena Imam, Weikuan Yu* *Florida State University Oak Ridge National Laboratory Outline Background

More information

Short Talk: System abstractions to facilitate data movement in supercomputers with deep memory and interconnect hierarchy

Short Talk: System abstractions to facilitate data movement in supercomputers with deep memory and interconnect hierarchy Short Talk: System abstractions to facilitate data movement in supercomputers with deep memory and interconnect hierarchy François Tessier, Venkatram Vishwanath Argonne National Laboratory, USA July 19,

More information

Habanero Operating Committee. January

Habanero Operating Committee. January Habanero Operating Committee January 25 2017 Habanero Overview 1. Execute Nodes 2. Head Nodes 3. Storage 4. Network Execute Nodes Type Quantity Standard 176 High Memory 32 GPU* 14 Total 222 Execute Nodes

More information

Fra superdatamaskiner til grafikkprosessorer og

Fra superdatamaskiner til grafikkprosessorer og Fra superdatamaskiner til grafikkprosessorer og Brødtekst maskinlæring Prof. Anne C. Elster IDI HPC/Lab Parallel Computing: Personal perspective 1980 s: Concurrent and Parallel Pascal 1986: Intel ipsc

More information

John Levesque Nov 16, 2001

John Levesque Nov 16, 2001 1 We see that the GPU is the best device available for us today to be able to get to the performance we want and meet our users requirements for a very high performance node with very high memory bandwidth.

More information

AI for HPC and HPC for AI Workflows: The Differences, Gaps and Opportunities with Data Management

AI for HPC and HPC for AI Workflows: The Differences, Gaps and Opportunities with Data Management AI for HPC and HPC for AI Workflows: The Differences, Gaps and Opportunities with Data Management @SC Asia 2018 Rangan Sukumar, PhD Office of the CTO, Cray Inc. Safe Harbor Statement This presentation

More information

The Use of Cloud Computing Resources in an HPC Environment

The Use of Cloud Computing Resources in an HPC Environment The Use of Cloud Computing Resources in an HPC Environment Bill, Labate, UCLA Office of Information Technology Prakashan Korambath, UCLA Institute for Digital Research & Education Cloud computing becomes

More information

Scalable, Automated Characterization of Parallel Application Communication Behavior

Scalable, Automated Characterization of Parallel Application Communication Behavior Scalable, Automated Characterization of Parallel Application Communication Behavior Philip C. Roth Computer Science and Mathematics Division Oak Ridge National Laboratory 12 th Scalable Tools Workshop

More information

CS500 SMARTER CLUSTER SUPERCOMPUTERS

CS500 SMARTER CLUSTER SUPERCOMPUTERS CS500 SMARTER CLUSTER SUPERCOMPUTERS OVERVIEW Extending the boundaries of what you can achieve takes reliable computing tools matched to your workloads. That s why we tailor the Cray CS500 cluster supercomputer

More information

The Constellation Project. Andrew W. Nash 14 November 2016

The Constellation Project. Andrew W. Nash 14 November 2016 The Constellation Project Andrew W. Nash 14 November 2016 The Constellation Project: Representing a High Performance File System as a Graph for Analysis The Titan supercomputer utilizes high performance

More information

System Design of Kepler Based HPC Solutions. Saeed Iqbal, Shawn Gao and Kevin Tubbs HPC Global Solutions Engineering.

System Design of Kepler Based HPC Solutions. Saeed Iqbal, Shawn Gao and Kevin Tubbs HPC Global Solutions Engineering. System Design of Kepler Based HPC Solutions Saeed Iqbal, Shawn Gao and Kevin Tubbs HPC Global Solutions Engineering. Introduction The System Level View K20 GPU is a powerful parallel processor! K20 has

More information

NVIDIA Update and Directions on GPU Acceleration for Earth System Models

NVIDIA Update and Directions on GPU Acceleration for Earth System Models NVIDIA Update and Directions on GPU Acceleration for Earth System Models Stan Posey, HPC Program Manager, ESM and CFD, NVIDIA, Santa Clara, CA, USA Carl Ponder, PhD, Applications Software Engineer, NVIDIA,

More information

How то Use HPC Resources Efficiently by a Message Oriented Framework.

How то Use HPC Resources Efficiently by a Message Oriented Framework. How то Use HPC Resources Efficiently by a Message Oriented Framework www.hp-see.eu E. Atanassov, T. Gurov, A. Karaivanova Institute of Information and Communication Technologies Bulgarian Academy of Science

More information

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance 11 th International LS-DYNA Users Conference Computing Technology LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance Gilad Shainer 1, Tong Liu 2, Jeff Layton

More information

Co-designing an Energy Efficient System

Co-designing an Energy Efficient System Co-designing an Energy Efficient System Luigi Brochard Distinguished Engineer, HPC&AI Lenovo lbrochard@lenovo.com MaX International Conference 2018 Trieste 29.01.2018 Industry Thermal Challenges NVIDIA

More information

A ClusterStor update. Torben Kling Petersen, PhD. Principal Architect, HPC

A ClusterStor update. Torben Kling Petersen, PhD. Principal Architect, HPC A ClusterStor update Torben Kling Petersen, PhD Principal Architect, HPC Sonexion (ClusterStor) STILL the fastest file system on the planet!!!! Total system throughput in excess on 1.1 TB/s!! 2 Software

More information

GROMACS Performance Benchmark and Profiling. September 2012

GROMACS Performance Benchmark and Profiling. September 2012 GROMACS Performance Benchmark and Profiling September 2012 Note The following research was performed under the HPC Advisory Council activities Participating vendors: AMD, Dell, Mellanox Compute resource

More information

HPC Architectures. Types of resource currently in use

HPC Architectures. Types of resource currently in use HPC Architectures Types of resource currently in use Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us

More information

ANSYS Improvements to Engineering Productivity with HPC and GPU-Accelerated Simulation

ANSYS Improvements to Engineering Productivity with HPC and GPU-Accelerated Simulation ANSYS Improvements to Engineering Productivity with HPC and GPU-Accelerated Simulation Ray Browell nvidia Technology Theater SC12 1 2012 ANSYS, Inc. nvidia Technology Theater SC12 HPC Revolution Recent

More information

Computational Challenges and Opportunities for Nuclear Astrophysics

Computational Challenges and Opportunities for Nuclear Astrophysics Computational Challenges and Opportunities for Nuclear Astrophysics Bronson Messer Acting Group Leader Scientific Computing Group National Center for Computational Sciences Theoretical Astrophysics Group

More information

Using DDN IME for Harmonie

Using DDN IME for Harmonie Irish Centre for High-End Computing Using DDN IME for Harmonie Gilles Civario, Marco Grossi, Alastair McKinstry, Ruairi Short, Nix McDonnell April 2016 DDN IME: Infinite Memory Engine IME: Major Features

More information

The Fusion Distributed File System

The Fusion Distributed File System Slide 1 / 44 The Fusion Distributed File System Dongfang Zhao February 2015 Slide 2 / 44 Outline Introduction FusionFS System Architecture Metadata Management Data Movement Implementation Details Unique

More information

DELIVERABLE D5.5 Report on ICARUS visualization cluster installation. John BIDDISCOMBE (CSCS) Jerome SOUMAGNE (CSCS)

DELIVERABLE D5.5 Report on ICARUS visualization cluster installation. John BIDDISCOMBE (CSCS) Jerome SOUMAGNE (CSCS) DELIVERABLE D5.5 Report on ICARUS visualization cluster installation John BIDDISCOMBE (CSCS) Jerome SOUMAGNE (CSCS) 02 May 2011 NextMuSE 2 Next generation Multi-mechanics Simulation Environment Cluster

More information

Pervasive DataRush TM

Pervasive DataRush TM Pervasive DataRush TM Parallel Data Analysis with KNIME www.pervasivedatarush.com Company Overview Global Software Company Tens of thousands of users across the globe Americas, EMEA, Asia ~230 employees

More information

Portable SHMEMCache: A High-Performance Key-Value Store on OpenSHMEM and MPI

Portable SHMEMCache: A High-Performance Key-Value Store on OpenSHMEM and MPI Portable SHMEMCache: A High-Performance Key-Value Store on OpenSHMEM and MPI Huansong Fu*, Manjunath Gorentla Venkata, Neena Imam, Weikuan Yu* *Florida State University Oak Ridge National Laboratory Outline

More information