Lessons from Post-processing Climate Data on Modern Flash-based HPC Systems
1 Lessons from Post-processing Climate Data on Modern Flash-based HPC Systems. Adnan Haider (1), Sheri Mickelson (2), John Dennis (2). (1) Illinois Institute of Technology, USA; (2) National Center for Atmospheric Research, USA
2 Post-processing Climate Data Software. Post-processing software analyzes climate data. An important goal of post-processing software is to let scientists do more science in less time. Two post-processing tools: PyAverager, which computes averages, and PyReshaper, which converts input to a different file layout.
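The two tools above can be pictured with a minimal in-memory sketch. It assumes the input arrives as time slices (every variable for one time step) and the target layout is one series per variable; the function names `reshape_to_time_series` and `average`, and the sample variables, are hypothetical and are not the PyReshaper or PyAverager API.

```python
from collections import defaultdict


def reshape_to_time_series(time_slices):
    """Regroup per-time-step records into one list of values per variable."""
    series = defaultdict(list)
    for time_slice in time_slices:
        for name, value in time_slice.items():
            series[name].append(value)
    return dict(series)


def average(series):
    """Mean of each variable's series (a stand-in for an averaging step)."""
    return {name: sum(values) / len(values) for name, values in series.items()}


# Hypothetical data: three time steps, two variables per step.
slices = [
    {"temperature": 280.0, "salinity": 34.0},
    {"temperature": 282.0, "salinity": 35.0},
    {"temperature": 284.0, "salinity": 36.0},
]
by_variable = reshape_to_time_series(slices)
means = average(by_variable)
```

In the real tools the slices and series live in files rather than dictionaries, which is why both read and write performance matter for them.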
3 I/O Workload Characteristics. [Figure: percentage of runtime spent doing I/O (0% to 100%) and average I/O request size (MB) for PyReshaper on the Ice, Land, Atmosphere, Atmosphere S.E., and Ocean datasets.] I/O bound! Varying I/O workloads. Can flash-based HPC systems reduce execution time?
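The two metrics on this slide can be computed from basic timing counters. The sketch below is illustrative only: the function names, the sample numbers, and the 50% threshold for calling a run I/O bound are assumptions, not values from the study.

```python
def io_profile(total_runtime_s, io_time_s, bytes_transferred, num_requests):
    """Return (percent of runtime spent in I/O, average request size in MB)."""
    io_percent = 100.0 * io_time_s / total_runtime_s
    avg_request_mb = bytes_transferred / num_requests / (1024 * 1024)
    return io_percent, avg_request_mb


def is_io_bound(io_percent, threshold=50.0):
    """Treat a run as I/O bound when I/O exceeds the (assumed) threshold."""
    return io_percent > threshold


# Hypothetical run: 200 s total, 160 s in I/O, 64 MiB moved in 16 requests.
pct, avg_mb = io_profile(200.0, 160.0, 64 * 1024 * 1024, 16)
bound = is_io_bound(pct)
```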
4 What is Flash? Faster hardware which accelerates I/O. Flash in HPC systems.
6-9 Two Flash System Architectures. Flash devices: Gordon, SSD; Wrangler, (unlabeled). Storage architecture: Gordon, local; Wrangler, pooled. Interconnect: Gordon, InfiniBand; Wrangler, PCI Express. Yellowstone has disks. [Diagrams: Local Flash Design (Gordon) and Pooled Flash Design (Wrangler). Labels: compute nodes, RDMA via InfiniBand, flash-based I/O node (memory, four SSDs), PCI Express interface, parallel file system/object store.]
10-12 PyReshaper on 1 Compute Node, Ocean (large). [Figure: metadata, read, and write time in seconds per storage configuration. Yellowstone: read and write HDD. Gordon: read HDD/write SSD; read SSD/write HDD; read and write SSD*; read and write HDD. Wrangler: read HDD/write flash; read flash/write HDD; read and write flash; read and write HDD.] Single SSD runs out of capacity! Reading from SSD increases runtime by 75%. 3.6x reduction in execution time compared to Yellowstone.
13-15 PyReshaper on 1 Compute Node, Ice (small). [Figure: metadata, read, and write time in seconds per storage configuration. Yellowstone: read and write HDD. Gordon: read HDD/write SSD; read SSD/write HDD; read and write SSD; read and write HDD. Wrangler: read HDD/write flash; read flash/write HDD; read and write flash; read and write HDD.] SSDs decrease runtime by 47%. Hybrid I/O decreases runtime by 6x. 11x reduction in execution time compared to Yellowstone.
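The results above mix two ways of reporting improvement: a percentage decrease in runtime and an "Nx" reduction factor. A small sketch of the conversion follows, assuming "Nx reduction" means the new runtime is the old runtime divided by N; the function names are hypothetical.

```python
def speedup_from_reduction(percent_reduction):
    """Convert 'runtime decreased by P%' into an Nx speedup factor."""
    return 1.0 / (1.0 - percent_reduction / 100.0)


def reduction_from_speedup(factor):
    """Convert an Nx speedup factor into a percentage decrease in runtime."""
    return 100.0 * (1.0 - 1.0 / factor)


# Figures quoted for the Ice dataset, expressed in both forms.
ssd_factor = speedup_from_reduction(47.0)       # 47% decrease as a factor
hybrid_percent = reduction_from_speedup(6.0)    # 6x as a percentage
wrangler_percent = reduction_from_speedup(11.0) # 11x as a percentage
```

Under that reading, a 47% decrease is a bit less than a 2x speedup, while 6x and 11x correspond to runtime decreases of roughly 83% and 91%.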
16 Lesson 1. A mismatch between storage architecture and I/O workload can hide the benefits of flash devices by increasing runtime by 4x. Single SSD and interconnect. [Diagram: Local Flash Design (Gordon). Labels: compute nodes, RDMA via InfiniBand, flash-based I/O node (memory, four SSDs).]
17 Lesson 2. Local flash architecture is more common. The number of flash devices per compute node should increase. [Figure: Performance on Gordon with 16 processes. Runtime in seconds versus # of compute nodes (SSDs) / # of processes per node (1/16, 2/8, 4/4, 8/2, 16/1) for the Ice, Atmosphere, Land, and Atmosphere S.E. datasets; annotation: optimal number of SSDs.]
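The x-axis of the Gordon experiment spreads a fixed process count over more compute nodes, and therefore over more SSDs. A minimal sketch that enumerates those splits, assuming each split must divide the process count evenly; the function name is hypothetical.

```python
def node_process_splits(total_processes):
    """List (compute nodes, processes per node) pairs that use every process."""
    return [
        (nodes, total_processes // nodes)
        for nodes in range(1, total_processes + 1)
        if total_processes % nodes == 0
    ]


# The 16-process case from the slide: 1/16, 2/8, 4/4, 8/2, 16/1.
splits = node_process_splits(16)
```

Moving right along this list gives each process a larger share of a flash device, which is the effect the lesson attributes the runtime change to.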
18 Lesson 3. Hybrid I/O (reading from and writing to different device types) decreases flash storage consumption by half while decreasing runtime by 6x.
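Hybrid I/O amounts to pointing the read path and the write path at different storage tiers. The sketch below is an illustration under stated assumptions: two temporary directories stand in for an HDD mount and a flash mount, the byte-reversing transform stands in for real post-processing, and all function names are hypothetical.

```python
import tempfile
from pathlib import Path


def process_hybrid(read_dir, write_dir, transform):
    """Read each file from read_dir and write its transformed copy to write_dir."""
    read_dir, write_dir = Path(read_dir), Path(write_dir)
    write_dir.mkdir(parents=True, exist_ok=True)
    written = 0
    for src in sorted(read_dir.iterdir()):
        if src.is_file():
            data = transform(src.read_bytes())
            (write_dir / src.name).write_bytes(data)
            written += len(data)
    return written


def flash_bytes_used(input_bytes, output_bytes, read_on_flash, write_on_flash):
    """Flash capacity consumed, counting only the data placed on flash."""
    used = input_bytes if read_on_flash else 0
    used += output_bytes if write_on_flash else 0
    return used


# Stand-ins for the two tiers: 'hdd' holds the input, 'flash' receives the output.
with tempfile.TemporaryDirectory() as hdd, tempfile.TemporaryDirectory() as flash:
    (Path(hdd) / "slice0.bin").write_bytes(b"abcd")
    (Path(hdd) / "slice1.bin").write_bytes(b"efgh")
    bytes_written = process_hybrid(hdd, flash, lambda blob: blob[::-1])
    outputs = sorted(p.name for p in Path(flash).iterdir())

all_flash = flash_bytes_used(8, 8, read_on_flash=True, write_on_flash=True)
hybrid = flash_bytes_used(8, 8, read_on_flash=False, write_on_flash=True)
```

When input and output are about the same size, keeping only one of them on flash halves the flash footprint, which matches the capacity saving the lesson describes.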
19-22 Conclusion. Pooled architecture performs better than local architecture, but if the local architecture alleviates bottlenecks it can be a more feasible solution. Moving from Yellowstone's HDD to Wrangler's HDD provided up to a 3.6x reduction in execution time. Moving from Yellowstone's HDD to Wrangler's flash provided up to an 11x reduction in execution time. With data volumes growing, a cost-effective I/O architecture must be considered.
23 Acknowledgements. Sheri Mickelson and John Dennis; Kevin Paul and the ASAP group.
24 Flash Based Systems in Future: Comet, Trinity, Gordon, Wrangler, Aurora.
25 Evolution of Flash Systems. [Timeline figure covering 2012, 2013, and 2015. Systems shown: Gordon-std, Gordon-vsmp, Catalyst, Wrangler, Comet, Cori, Trinity, Aurora. Design labels: Local Flash Architecture, flash devices (SSD) on remote nodes; Pooled Flash, aggregates 16 flash devices at job config; Pooled Flash, all-to-all connection; Burst Buffer, 750 TB of flash and 750 GB/s bandwidth; Local Flash, 800 GB of flash on compute node via PCI Express; Local Flash, 320 GB of flash on each compute node; Burst Buffer, Xeon processor based burst buffer nodes.]
26-27 Gordon Performance Analysis. [Figure: throughput of SSD / throughput of HDD. Scalability panel: # of processes / amount of data written (GB). Workload panel: I/O request size (KB).]
28 Wrangler Performance Analysis. [Figure: throughput of SSD / throughput of HDD versus # of processes / amount of data written (GB) and versus I/O request size (KB); annotation: consistent.]
29 Performance Comparison. [Figure: speedup provided by flash on Gordon and Wrangler for the Ice, Land, ATM, and ATM S.E. datasets.]
30 Speedup. [Figure: Gordon best time over Wrangler HDD time, and Wrangler HDD time over Wrangler flash time, for the Atmosphere, Atmosphere S.E., and Ocean datasets.]
CESM Workflow Refactor Project Land Model and Biogeochemistry Working Groups 2015 Winter Meeting CSEG & ASAP/CISL
CESM Workflow Refactor Project Land Model and Biogeochemistry Working Groups 2015 Winter Meeting Alice Bertini Sheri Mickelson CSEG & ASAP/CISL CESM Workflow Refactor Project Who s involved? Joint project
More informationStore Process Analyze Collaborate Archive Cloud The HPC Storage Leader Invent Discover Compete
Store Process Analyze Collaborate Archive Cloud The HPC Storage Leader Invent Discover Compete 1 DDN Who We Are 2 We Design, Deploy and Optimize Storage Systems Which Solve HPC, Big Data and Cloud Business
More informationCESM (Community Earth System Model) Performance Benchmark and Profiling. August 2011
CESM (Community Earth System Model) Performance Benchmark and Profiling August 2011 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell,
More informationAdvances of parallel computing. Kirill Bogachev May 2016
Advances of parallel computing Kirill Bogachev May 2016 Demands in Simulations Field development relies more and more on static and dynamic modeling of the reservoirs that has come a long way from being
More informationImproved Solutions for I/O Provisioning and Application Acceleration
1 Improved Solutions for I/O Provisioning and Application Acceleration August 11, 2015 Jeff Sisilli Sr. Director Product Marketing jsisilli@ddn.com 2 Why Burst Buffer? The Supercomputing Tug-of-War A supercomputer
More informationIsilon Performance. Name
1 Isilon Performance Name 2 Agenda Architecture Overview Next Generation Hardware Performance Caching Performance Streaming Reads Performance Tuning OneFS Architecture Overview Copyright 2014 EMC Corporation.
More informationSNAP Performance Benchmark and Profiling. April 2014
SNAP Performance Benchmark and Profiling April 2014 Note The following research was performed under the HPC Advisory Council activities Participating vendors: HP, Mellanox For more information on the supporting
More informationA ClusterStor update. Torben Kling Petersen, PhD. Principal Architect, HPC
A ClusterStor update Torben Kling Petersen, PhD Principal Architect, HPC Sonexion (ClusterStor) STILL the fastest file system on the planet!!!! Total system throughput in excess on 1.1 TB/s!! 2 Software
More informationUsing DDN IME for Harmonie
Irish Centre for High-End Computing Using DDN IME for Harmonie Gilles Civario, Marco Grossi, Alastair McKinstry, Ruairi Short, Nix McDonnell April 2016 DDN IME: Infinite Memory Engine IME: Major Features
More informationIntel Xeon Phi архитектура, модели программирования, оптимизация.
Нижний Новгород, 2017 Intel Xeon Phi архитектура, модели программирования, оптимизация. Дмитрий Прохоров, Дмитрий Рябцев, Intel Agenda What and Why Intel Xeon Phi Top 500 insights, roadmap, architecture
More informationAzor: Using Two-level Block Selection to Improve SSD-based I/O caches
Azor: Using Two-level Block Selection to Improve SSD-based I/O caches Yannis Klonatos, Thanos Makatos, Manolis Marazakis, Michail D. Flouris, Angelos Bilas {klonatos, makatos, maraz, flouris, bilas}@ics.forth.gr
More informationHigh Performance File System and I/O Middleware Design for Big Data on HPC Clusters
High Performance File System and I/O Middleware Design for Big Data on HPC Clusters by Nusrat Sharmin Islam Advisor: Dhabaleswar K. (DK) Panda Department of Computer Science and Engineering The Ohio State
More informationThe RAMDISK Storage Accelerator
The RAMDISK Storage Accelerator A Method of Accelerating I/O Performance on HPC Systems Using RAMDISKs Tim Wickberg, Christopher D. Carothers wickbt@rpi.edu, chrisc@cs.rpi.edu Rensselaer Polytechnic Institute
More informationBuilding NVLink for Developers
Building NVLink for Developers Unleashing programmatic, architectural and performance capabilities for accelerated computing Why NVLink TM? Simpler, Better and Faster Simplified Programming No specialized
More informationGROMACS Performance Benchmark and Profiling. August 2011
GROMACS Performance Benchmark and Profiling August 2011 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell, Mellanox Compute resource
More information朱义普. Resolving High Performance Computing and Big Data Application Bottlenecks with Application-Defined Flash Acceleration. Director, North Asia, HPC
October 28, 2013 Resolving High Performance Computing and Big Data Application Bottlenecks with Application-Defined Flash Acceleration 朱义普 Director, North Asia, HPC DDN Storage Vendor for HPC & Big Data
More informationENERGY-EFFICIENT VISUALIZATION PIPELINES A CASE STUDY IN CLIMATE SIMULATION
ENERGY-EFFICIENT VISUALIZATION PIPELINES A CASE STUDY IN CLIMATE SIMULATION Vignesh Adhinarayanan Ph.D. (CS) Student Synergy Lab, Virginia Tech INTRODUCTION Supercomputers are constrained by power Power
More informationInfiniBand Networked Flash Storage
InfiniBand Networked Flash Storage Superior Performance, Efficiency and Scalability Motti Beck Director Enterprise Market Development, Mellanox Technologies Flash Memory Summit 2016 Santa Clara, CA 1 17PB
More informationHADP Talk BlueDBM: An appliance for Big Data Analytics
HADP Talk BlueDBM: An appliance for Big Data Analytics Sang-Woo Jun* Ming Liu* Sungjin Lee* Jamey Hicks+ John Ankcorn+ Myron King+ Shuotao Xu* Arvind* *MIT Computer Science and Artificial Intelligence
More informationNext-Generation NVMe-Native Parallel Filesystem for Accelerating HPC Workloads
Next-Generation NVMe-Native Parallel Filesystem for Accelerating HPC Workloads Liran Zvibel CEO, Co-founder WekaIO @liranzvibel 1 WekaIO Matrix: Full-featured and Flexible Public or Private S3 Compatible
More informationIBM System Storage DS8870 Release R7.3 Performance Update
IBM System Storage DS8870 Release R7.3 Performance Update Enterprise Storage Performance Yan Xu Agenda Summary of DS8870 Hardware Changes I/O Performance of High Performance Flash Enclosure (HPFE) Easy
More informationIME (Infinite Memory Engine) Extreme Application Acceleration & Highly Efficient I/O Provisioning
IME (Infinite Memory Engine) Extreme Application Acceleration & Highly Efficient I/O Provisioning September 22 nd 2015 Tommaso Cecchi 2 What is IME? This breakthrough, software defined storage application
More informationFunctional Partitioning to Optimize End-to-End Performance on Many-core Architectures
Functional Partitioning to Optimize End-to-End Performance on Many-core Architectures Min Li, Sudharshan S. Vazhkudai, Ali R. Butt, Fei Meng, Xiaosong Ma, Youngjae Kim,Christian Engelmann, and Galen Shipman
More informationDell EMC Ready Bundle for HPC Digital Manufacturing Dassault Systѐmes Simulia Abaqus Performance
Dell EMC Ready Bundle for HPC Digital Manufacturing Dassault Systѐmes Simulia Abaqus Performance This Dell EMC technical white paper discusses performance benchmarking results and analysis for Simulia
More informationHIGH-PERFORMANCE STORAGE FOR DISCOVERY THAT SOARS
HIGH-PERFORMANCE STORAGE FOR DISCOVERY THAT SOARS OVERVIEW When storage demands and budget constraints collide, discovery suffers. And it s a growing problem. Driven by ever-increasing performance and
More informationlibhio: Optimizing IO on Cray XC Systems With DataWarp
libhio: Optimizing IO on Cray XC Systems With DataWarp May 9, 2017 Nathan Hjelm Cray Users Group May 9, 2017 Los Alamos National Laboratory LA-UR-17-23841 5/8/2017 1 Outline Background HIO Design Functionality
More informationDesign and Architecture of Dell Acceleration Appliances for Database (DAAD): A Practical Approach with High Availability Guaranteed
Design and Architecture of Dell Acceleration Appliances for Database (DAAD): A Practical Approach with High Availability Guaranteed Kai Yu, Yuxiang Gao Dell Global Solutions Engineering Group Peng Zhang,
More informationAPI and Usage of libhio on XC-40 Systems
API and Usage of libhio on XC-40 Systems May 24, 2018 Nathan Hjelm Cray Users Group May 24, 2018 Los Alamos National Laboratory LA-UR-18-24513 5/24/2018 1 Outline Background HIO Design HIO API HIO Configuration
More informationEnhancing Checkpoint Performance with Staging IO & SSD
Enhancing Checkpoint Performance with Staging IO & SSD Xiangyong Ouyang Sonya Marcarelli Dhabaleswar K. Panda Department of Computer Science & Engineering The Ohio State University Outline Motivation and
More informationA Distributed Hash Table for Shared Memory
A Distributed Hash Table for Shared Memory Wytse Oortwijn Formal Methods and Tools, University of Twente August 31, 2015 Wytse Oortwijn (Formal Methods and Tools, AUniversity Distributed of Twente) Hash
More informationBeeGFS. Parallel Cluster File System. Container Workshop ISC July Marco Merkel VP ww Sales, Consulting
BeeGFS The Parallel Cluster File System Container Workshop ISC 28.7.18 www.beegfs.io July 2018 Marco Merkel VP ww Sales, Consulting HPC & Cognitive Workloads Demand Today Flash Storage HDD Storage Shingled
More informationPerformance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference
The 2017 IEEE International Symposium on Workload Characterization Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference Shin-Ying Lee
More informationOn the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows
On the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows Rafael Ferreira da Silva, Scott Callaghan, Ewa Deelman 12 th Workflows in Support of Large-Scale Science (WORKS) SuperComputing
More informationHybrid Storage Architecture Marries Performance and Efficiency
Hybrid Storage Architecture Marries Performance and Efficiency Bill Mottram VP of Marketing at Atrato Au to nom ic 2010 Storage [aw-tuh-nom-ik Developer Conference. ] Insert Your Company Name. All Rights
More informationLeveraging Software-Defined Storage to Meet Today and Tomorrow s Infrastructure Demands
Leveraging Software-Defined Storage to Meet Today and Tomorrow s Infrastructure Demands Unleash Your Data Center s Hidden Power September 16, 2014 Molly Rector CMO, EVP Product Management & WW Marketing
More informationThe Oracle Database Appliance I/O and Performance Architecture
Simple Reliable Affordable The Oracle Database Appliance I/O and Performance Architecture Tammy Bednar, Sr. Principal Product Manager, ODA 1 Copyright 2012, Oracle and/or its affiliates. All rights reserved.
More informationGPUs and Emerging Architectures
GPUs and Emerging Architectures Mike Giles mike.giles@maths.ox.ac.uk Mathematical Institute, Oxford University e-infrastructure South Consortium Oxford e-research Centre Emerging Architectures p. 1 CPUs
More informationApplication Performance on IME
Application Performance on IME Toine Beckers, DDN Marco Grossi, ICHEC Burst Buffer Designs Introduce fast buffer layer Layer between memory and persistent storage Pre-stage application data Buffer writes
More informationChunkStash: Speeding Up Storage Deduplication using Flash Memory
ChunkStash: Speeding Up Storage Deduplication using Flash Memory Biplob Debnath +, Sudipta Sengupta *, Jin Li * * Microsoft Research, Redmond (USA) + Univ. of Minnesota, Twin Cities (USA) Deduplication
More informationArchitecting For Availability, Performance & Networking With ScaleIO
Architecting For Availability, Performance & Networking With ScaleIO Performance is a set of bottlenecks Performance related components:, Operating Systems Network Drives Performance features: Caching
More informationBig Data Meets HPC: Exploiting HPC Technologies for Accelerating Big Data Processing and Management
Big Data Meets HPC: Exploiting HPC Technologies for Accelerating Big Data Processing and Management SigHPC BigData BoF (SC 17) by Dhabaleswar K. (DK) Panda The Ohio State University E-mail: panda@cse.ohio-state.edu
More informationLustre2.5 Performance Evaluation: Performance Improvements with Large I/O Patches, Metadata Improvements, and Metadata Scaling with DNE
Lustre2.5 Performance Evaluation: Performance Improvements with Large I/O Patches, Metadata Improvements, and Metadata Scaling with DNE Hitoshi Sato *1, Shuichi Ihara *2, Satoshi Matsuoka *1 *1 Tokyo Institute
More informationHarmonia: An Interference-Aware Dynamic I/O Scheduler for Shared Non-Volatile Burst Buffers
I/O Harmonia Harmonia: An Interference-Aware Dynamic I/O Scheduler for Shared Non-Volatile Burst Buffers Cluster 18 Belfast, UK September 12 th, 2018 Anthony Kougkas, Hariharan Devarajan, Xian-He Sun,
More informationLS-DYNA Performance Benchmark and Profiling. October 2017
LS-DYNA Performance Benchmark and Profiling October 2017 2 Note The following research was performed under the HPC Advisory Council activities Participating vendors: LSTC, Huawei, Mellanox Compute resource
More informationAutomatic Identification of Application I/O Signatures from Noisy Server-Side Traces. Yang Liu Raghul Gunasekaran Xiaosong Ma Sudharshan S.
Automatic Identification of Application I/O Signatures from Noisy Server-Side Traces Yang Liu Raghul Gunasekaran Xiaosong Ma Sudharshan S. Vazhkudai Instance of Large-Scale HPC Systems ORNL s TITAN (World
More informationMELLANOX EDR UPDATE & GPUDIRECT MELLANOX SR. SE 정연구
MELLANOX EDR UPDATE & GPUDIRECT MELLANOX SR. SE 정연구 Leading Supplier of End-to-End Interconnect Solutions Analyze Enabling the Use of Data Store ICs Comprehensive End-to-End InfiniBand and Ethernet Portfolio
More informationIntroduction to High Performance Parallel I/O
Introduction to High Performance Parallel I/O Richard Gerber Deputy Group Lead NERSC User Services August 30, 2013-1- Some slides from Katie Antypas I/O Needs Getting Bigger All the Time I/O needs growing
More informationNetApp: Solving I/O Challenges. Jeff Baxter February 2013
NetApp: Solving I/O Challenges Jeff Baxter February 2013 1 High Performance Computing Challenges Computing Centers Challenge of New Science Performance Efficiency directly impacts achievable science Power
More informationFlashed-Optimized VPSA. Always Aligned with your Changing World
Flashed-Optimized VPSA Always Aligned with your Changing World Yair Hershko Co-founder, VP Engineering, Zadara Storage 3 Modern Data Storage for Modern Computing Innovating data services to meet modern
More informationSpark Over RDMA: Accelerate Big Data SC Asia 2018 Ido Shamay Mellanox Technologies
Spark Over RDMA: Accelerate Big Data SC Asia 2018 Ido Shamay 1 Apache Spark - Intro Spark within the Big Data ecosystem Data Sources Data Acquisition / ETL Data Storage Data Analysis / ML Serving 3 Apache
More informationNCAR Workload Analysis on Yellowstone. March 2015 V5.0
NCAR Workload Analysis on Yellowstone March 2015 V5.0 Purpose and Scope of the Analysis Understanding the NCAR application workload is a critical part of making efficient use of Yellowstone and in scoping
More informationLS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance
11 th International LS-DYNA Users Conference Computing Technology LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance Gilad Shainer 1, Tong Liu 2, Jeff Layton
More informationHighly Scalable, Non-RDMA NVMe Fabric. Bob Hansen,, VP System Architecture
A Cost Effective,, High g Performance,, Highly Scalable, Non-RDMA NVMe Fabric Bob Hansen,, VP System Architecture bob@apeirondata.com Storage Developers Conference, September 2015 Agenda 3 rd Platform
More informationIBM Emulex 16Gb Fibre Channel HBA Evaluation
IBM Emulex 16Gb Fibre Channel HBA Evaluation Evaluation report prepared under contract with Emulex Executive Summary The computing industry is experiencing an increasing demand for storage performance
More informationThe rcuda middleware and applications
The rcuda middleware and applications Will my application work with rcuda? rcuda currently provides binary compatibility with CUDA 5.0, virtualizing the entire Runtime API except for the graphics functions,
More informationScheduling Strategies for HPC as a Service (HPCaaS) for Bio-Science Applications
Scheduling Strategies for HPC as a Service (HPCaaS) for Bio-Science Applications Sep 2009 Gilad Shainer, Tong Liu (Mellanox); Jeffrey Layton (Dell); Joshua Mora (AMD) High Performance Interconnects for
More informationHPC Storage Use Cases & Future Trends
Oct, 2014 HPC Storage Use Cases & Future Trends Massively-Scalable Platforms and Solutions Engineered for the Big Data and Cloud Era Atul Vidwansa Email: atul@ DDN About Us DDN is a Leader in Massively
More informationOncilla - a Managed GAS Runtime for Accelerating Data Warehousing Queries
Oncilla - a Managed GAS Runtime for Accelerating Data Warehousing Queries Jeffrey Young, Alex Merritt, Se Hoon Shon Advisor: Sudhakar Yalamanchili 4/16/13 Sponsors: Intel, NVIDIA, NSF 2 The Problem Big
More informationBlueDBM: An Appliance for Big Data Analytics*
BlueDBM: An Appliance for Big Data Analytics* Arvind *[ISCA, 2015] Sang-Woo Jun, Ming Liu, Sungjin Lee, Shuotao Xu, Arvind (MIT) and Jamey Hicks, John Ankcorn, Myron King(Quanta) BigData@CSAIL Annual Meeting
More informationDDN s Vision for the Future of Lustre LUG2015 Robert Triendl
DDN s Vision for the Future of Lustre LUG2015 Robert Triendl 3 Topics 1. The Changing Markets for Lustre 2. A Vision for Lustre that isn t Exascale 3. Building Lustre for the Future 4. Peak vs. Operational
More informationSpeeding Up Cloud/Server Applications Using Flash Memory
Speeding Up Cloud/Server Applications Using Flash Memory Sudipta Sengupta and Jin Li Microsoft Research, Redmond, WA, USA Contains work that is joint with Biplob Debnath (Univ. of Minnesota) Flash Memory
More informationMeltdown and Spectre Interconnect Performance Evaluation Jan Mellanox Technologies
Meltdown and Spectre Interconnect Evaluation Jan 2018 1 Meltdown and Spectre - Background Most modern processors perform speculative execution This speculation can be measured, disclosing information about
More informationInterconnect Your Future
Interconnect Your Future Gilad Shainer 2nd Annual MVAPICH User Group (MUG) Meeting, August 2014 Complete High-Performance Scalable Interconnect Infrastructure Comprehensive End-to-End Software Accelerators
More informationThe Benefits of Solid State in Enterprise Storage Systems. David Dale, NetApp
The Benefits of Solid State in Enterprise Storage Systems David Dale, NetApp SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted. Member companies
More informationMeet the Walkers! Accelerating Index Traversals for In-Memory Databases"
Meet the Walkers! Accelerating Index Traversals for In-Memory Databases Onur Kocberber Boris Grot, Javier Picorel, Babak Falsafi, Kevin Lim, Parthasarathy Ranganathan Our World is Data-Driven! Data resides
More informationIntel Xeon Phi архитектура, модели программирования, оптимизация.
Нижний Новгород, 2016 Intel Xeon Phi архитектура, модели программирования, оптимизация. Дмитрий Прохоров, Intel Agenda What and Why Intel Xeon Phi Top 500 insights, roadmap, architecture How Programming
More informationThe PowerEdge M830 blade server
The PowerEdge M830 blade server No-compromise compute and memory scalability for data centers and remote or branch offices Now you can boost application performance, consolidation and time-to-value in
More informationNEC Express5800 A2040b 22TB Data Warehouse Fast Track. Reference Architecture with SW mirrored HGST FlashMAX III
NEC Express5800 A2040b 22TB Data Warehouse Fast Track Reference Architecture with SW mirrored HGST FlashMAX III Based on Microsoft SQL Server 2014 Data Warehouse Fast Track (DWFT) Reference Architecture
More informationParallel File Systems. John White Lawrence Berkeley National Lab
Parallel File Systems John White Lawrence Berkeley National Lab Topics Defining a File System Our Specific Case for File Systems Parallel File Systems A Survey of Current Parallel File Systems Implementation
More informationDeploying remote GPU virtualization with rcuda. Federico Silla Technical University of Valencia Spain
Deploying remote virtualization with rcuda Federico Silla Technical University of Valencia Spain st Outline What is remote virtualization? HPC ADMINTECH 2016 2/53 It deals with s, obviously! HPC ADMINTECH
More informationA New NSF TeraGrid Resource for Data-Intensive Science
A New NSF TeraGrid Resource for Data-Intensive Science Michael L. Norman Principal Investigator Director, SDSC Allan Snavely Co-Principal Investigator Project Scientist Slide 1 Coping with the data deluge
More informationJoin Processing for Flash SSDs: Remembering Past Lessons
Join Processing for Flash SSDs: Remembering Past Lessons Jaeyoung Do, Jignesh M. Patel Department of Computer Sciences University of Wisconsin-Madison $/MB GB Flash Solid State Drives (SSDs) Benefits of
More informationNVM Express over Fabrics Storage Solutions for Real-time Analytics
NVM Express over Fabrics Storage Solutions for Real-time Analytics Presented by Paul Prince, CTO Santa Clara, CA 1 NVMe Over Fabrics NVMf Why do we need NVMf? What is it? How does it fit in the Market?
More informationThe Future of Interconnect Technology
The Future of Interconnect Technology Michael Kagan, CTO HPC Advisory Council Stanford, 2014 Exponential Data Growth Best Interconnect Required 44X 0.8 Zetabyte 2009 35 Zetabyte 2020 2014 Mellanox Technologies
More informationIntelligent Hybrid Flash Management
Intelligent Hybrid Flash Management Jérôme Gaysse Senior Technology&Market Analyst jerome.gaysse@silinnov-consulting.com Flash Memory Summit 2018 Santa Clara, CA 1 Research context Analysis of system &
More informationTuning I/O Performance for Data Intensive Computing. Nicholas J. Wright. lbl.gov
Tuning I/O Performance for Data Intensive Computing. Nicholas J. Wright njwright @ lbl.gov NERSC- National Energy Research Scientific Computing Center Mission: Accelerate the pace of scientific discovery
More informationAccelerating Parallel Analysis of Scientific Simulation Data via Zazen
Accelerating Parallel Analysis of Scientific Simulation Data via Zazen Tiankai Tu, Charles A. Rendleman, Patrick J. Miller, Federico Sacerdoti, Ron O. Dror, and David E. Shaw D. E. Shaw Research Motivation
More informationSTAR-CCM+ Performance Benchmark and Profiling. July 2014
STAR-CCM+ Performance Benchmark and Profiling July 2014 Note The following research was performed under the HPC Advisory Council activities Participating vendors: CD-adapco, Intel, Dell, Mellanox Compute
More informationS4D-Cache: Smart Selective SSD Cache for Parallel I/O Systems
S4D-Cache: Smart Selective SSD Cache for Parallel I/O Systems Shuibing He, Xian-He Sun, Bo Feng Department of Computer Science Illinois Institute of Technology Speed Gap Between CPU and Hard Drive http://www.velobit.com/storage-performance-blog/bid/114532/living-with-the-2012-hdd-shortage
More informationNew HPE 3PAR StoreServ 8000 and series Optimized for Flash
New HPE 3PAR StoreServ 8000 and 20000 series Optimized for Flash AGENDA HPE 3PAR StoreServ architecture fundamentals HPE 3PAR Flash optimizations HPE 3PAR portfolio overview HPE 3PAR Flash example from
More informationPerformance Optimizations via Connect-IB and Dynamically Connected Transport Service for Maximum Performance on LS-DYNA
Performance Optimizations via Connect-IB and Dynamically Connected Transport Service for Maximum Performance on LS-DYNA Pak Lui, Gilad Shainer, Brian Klaff Mellanox Technologies Abstract From concept to
More informationIBM FlashSystem. IBM FLiP Tool Wie viel schneller kann Ihr IBM i Power Server mit IBM FlashSystem 900 / V9000 Storage sein?
FlashSystem Family 2015 IBM FlashSystem IBM FLiP Tool Wie viel schneller kann Ihr IBM i Power Server mit IBM FlashSystem 900 / V9000 Storage sein? PiRT - Power i Round Table 17 Sep. 2015 Daniel Gysin IBM
More informationIn-Network Computing. Sebastian Kalcher, Senior System Engineer HPC. May 2017
In-Network Computing Sebastian Kalcher, Senior System Engineer HPC May 2017 Exponential Data Growth The Need for Intelligent and Faster Interconnect CPU-Centric (Onload) Data-Centric (Offload) Must Wait
NAMD Performance Benchmark and Profiling. February 2012
NAMD Performance Benchmark and Profiling February 2012 Note The following research was performed under the HPC Advisory Council activities Participating vendors: AMD, Dell, Mellanox Compute resource -
Deploy a High-Performance Database Solution: Cisco UCS B420 M4 Blade Server with Fusion iomemory PX600 Using Oracle Database 12c
White Paper Deploy a High-Performance Database Solution: Cisco UCS B420 M4 Blade Server with Fusion iomemory PX600 Using Oracle Database 12c What You Will Learn This document demonstrates the benefits
Introduction to Parallel Programming. Tuesday, April 17, 12
Introduction to Parallel Programming 1 Overview Parallel programming allows the user to use multiple CPUs concurrently. Reasons for parallel execution: shorten execution time by spreading the computational
NVMe Takes It All, SCSI Has To Fall. Brave New Storage World. Lugano April Alexander Ruebensaal
Lugano April 2018 NVMe Takes It All, SCSI Has To Fall freely adapted from ABBA Brave New Storage World Alexander Ruebensaal 1 Design, Implementation, Support & Operating of optimized IT Infrastructures
ICON Performance Benchmark and Profiling. March 2012
ICON Performance Benchmark and Profiling March 2012 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell, Mellanox Compute resource - HPC
Memory-Based Cloud Architectures
Memory-Based Cloud Architectures (Or: Technical Challenges for OnDemand Business Software) Jan Schaffner Enterprise Platform and Integration Concepts Group Example: Enterprise Benchmarking
UCS Invicta: A New Generation of Storage Performance. Mazen Abou Najm DC Consulting Systems Engineer
UCS Invicta: A New Generation of Storage Performance Mazen Abou Najm DC Consulting Systems Engineer HDDs Aren't Designed For High Performance Disk 101 Can't spin faster (200 IOPS/Drive) Can't seek faster
MICROWAY'S NVIDIA TESLA V100 GPU SOLUTIONS GUIDE
MICROWAY'S NVIDIA TESLA V100 GPU SOLUTIONS GUIDE LEVERAGE OUR EXPERTISE sales@microway.com http://microway.com/tesla NUMBERSMASHER TESLA 4-GPU SERVER/WORKSTATION Flexible form factor 4 PCI-E GPUs + 3 additional
HPC Solution. Technology for a New Era in Computing
HPC Solution Technology for a New Era in Computing TEL IN HPC & Storage.. 20 years of changing with Technology Complete Solution Integrators for Select Verticals Mechanical Design & Engineering High Performance
IN11E: Architecture and Integration Testbed for Earth/Space Science Cyberinfrastructures
IN11E: Architecture and Integration Testbed for Earth/Space Science Cyberinfrastructures A Future Accelerated Cognitive Distributed Hybrid Testbed for Big Data Science Analytics Milton Halem 1, John Edward
Building a High IOPS Flash Array: A Software-Defined Approach
Building a High IOPS Flash Array: A Software-Defined Approach Weafon Tsao Ph.D. VP of R&D Division, AccelStor, Inc. Santa Clara, CA Clarification Myth 1: S High-IOPS SSDs = High-IOPS All-Flash Array SSDs
Leveraging Flash in Scalable Environments: A Systems Perspective on How FLASH Storage is Displacing Disk Storage
Leveraging Flash in Scalable Environments: A Systems Perspective on How FLASH Storage is Displacing Disk Storage Roark Hilomen, Engineering Fellow Systems & Software Solutions May 3, 2016 Forward-Looking
Experiences with HP SFS / Lustre in HPC Production
Experiences with HP SFS / Lustre in HPC Production Computing Centre (SSCK) University of Karlsruhe Laifer@rz.uni-karlsruhe.de Outline: What is HP StorageWorks Scalable File Share (HP SFS)? A Lustre
Data storage services at KEK/CRC -- status and plan
Data storage services at KEK/CRC -- status and plan KEK/CRC Hiroyuki Matsunaga Most of the slides are prepared by Koichi Murakami and Go Iwai KEKCC System Overview KEKCC (Central Computing System) The
DDN. DDN Updates. DataDirect Networks Japan, Inc Shuichi Ihara. DDN Storage 2017 DDN Storage
DDN DDN Updates DataDirect Networks Japan, Inc Shuichi Ihara DDN A Broad Range of Technologies to Best Address Your Needs Protection Security Data Distribution and Lifecycle Management Open Monitoring Your
Big Data in HPC. John Shalf Lawrence Berkeley National Laboratory
Big Data in HPC John Shalf Lawrence Berkeley National Laboratory 1 Evolving Role of Supercomputing Centers Traditional Pillars of science Theory: mathematical models of nature Experiment: empirical data
Robert Gottstein, Ilia Petrov, Guillermo G. Almeida, Todor Ivanov, Alex Buchmann
Using Flash SSDs as Primary Database Storage Robert Gottstein, Ilia Petrov, Guillermo G. Almeida, Todor Ivanov, Alex Buchmann {lastname}@dvs.tu-darmstadt.de Fachgebiet DVS Ilia Petrov 1 Flash SSDs,