Early Evaluation of the "Infinite Memory Engine" Burst Buffer Solution

1 Early Evaluation of the "Infinite Memory Engine" Burst Buffer Solution. Wolfram Schenck (Faculty of Engineering and Mathematics, Bielefeld University of Applied Sciences, Bielefeld, Germany); Salem El Sayed, Maciej Foszczynski, Wilhelm Homberg, Dirk Pleiter (Jülich Supercomputing Centre, Forschungszentrum Jülich, Jülich, Germany). WOPSSS 2016, Frankfurt,

2 Outline: Introduction: The Burst Buffer Concept; Test System; General Benchmarks (IOR); NEST Benchmarks; Data Retention Time Analysis; Conclusions and Outlook. Slide 2

3 Introduction: The Burst Buffer Concept Slide 3

4 Need for New Storage Architectures. Address the growing performance gap: floating-point performance B_fp grows faster than I/O bandwidth B_io, i.e. B_io/B_fp becomes smaller; for JUQUEEN we have B_io/B_fp = 1 Byte / 40,000 Flops. Mitigation strategy: hierarchical storage architecture with a fast but low-capacity storage tier and a large-capacity but slow storage tier. Emerging data-intensive applications need large storage capacity C_io, high bandwidth B_io, and high IOPS rates. Slide 4

5 Application Classes. Dominant read: applications processing data retrieved by experiments or collected by observatories; applications analyzing data from huge databases ("big data"). Dominant write: applications from the area of simulation science, generating large amounts of data. Transient write/read: applications (or sets of applications) producing and consuming significant amounts of data on the same system; transient data: long-term storage often not necessary. [Diagram: Cluster, Main Storage System] Slide 5

6 Conventional Storage System. [Diagram: Cluster and Main Storage System; arrow direction: dominant write. Timeline over t showing time steps spent with I/O and time steps spent with non-I/O operations; 10 time steps in total.] Slide 6

7 Enhanced by Burst Buffer. Scenario: Sustained Performance. [Diagram: with Cluster and Main Storage System only, a full simulation cycle with I/O bursts takes 10 time steps; with Cluster, Burst Buffer and Main Storage System, it takes 6 time steps.] SPEEDUP = 10/6 = 1.67. Slide 7

8 Enhanced by Burst Buffer. Scenario: Short-Term Peak Performance. [Diagram: with Cluster and Main Storage System only, a full simulation cycle with I/O bursts takes 18 time steps; with Cluster, Burst Buffer and Main Storage System, it takes 6 time steps.] SPEEDUP = 18/6 = 3.0. Slide 8
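
The speedup values on slides 7 and 8 are the ratio of the cycle length without a burst buffer to the cycle length with one. A minimal sketch of that arithmetic; the function name `speedup` is introduced here for illustration and does not come from the slides.

```python
def speedup(steps_without_bb: int, steps_with_bb: int) -> float:
    """Ratio of time steps needed without vs. with a burst buffer."""
    if steps_with_bb <= 0:
        raise ValueError("steps_with_bb must be positive")
    return steps_without_bb / steps_with_bb


sustained = speedup(10, 6)   # slide 7: sustained performance
short_peak = speedup(18, 6)  # slide 8: short-term peak performance
print(f"sustained: {sustained:.2f}, short-term peak: {short_peak:.1f}")
```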

9 Burst Buffer Concept. Capacities: conventional main storage: large; burst buffer: small. Bandwidth: between cluster and burst buffer: high; between burst buffer and main storage: low. For the dominant write class, the speedup obtained via a burst buffer depends theoretically on: I/O pattern of the application (continuous vs. in bursts); I/O intensity of the application (low vs. high); runtime of the application (long vs. short). [Arrow: increasing speedup] Slide 9
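
How the speedup depends on I/O intensity can be illustrated with a toy steady-state model. This is a sketch under our own assumptions, not the model behind the slides: the application alternates a compute phase with one write burst, the burst buffer is small enough that each burst must be drained to main storage before the next one arrives, and draining overlaps with computation. All parameter names and numbers are hypothetical.

```python
def cycle_time(compute, data, bw_main, bw_bb=None):
    """Steady-state duration of one compute-then-write cycle (toy model)."""
    if bw_bb is None:
        # No burst buffer: the application waits for main storage.
        return compute + data / bw_main
    # With a burst buffer the application only waits for the fast absorb,
    # but the cycle cannot be shorter than the background drain time.
    return max(compute + data / bw_bb, data / bw_main)


def speedup(compute, data, bw_main, bw_bb):
    """Cycle time without a burst buffer divided by cycle time with one."""
    return cycle_time(compute, data, bw_main) / cycle_time(
        compute, data, bw_main, bw_bb
    )


# Hypothetical units: time in steps, data in GiB, bandwidth in GiB/step.
high_io = speedup(compute=1.0, data=1.0, bw_main=1.0, bw_bb=4.0)
low_io = speedup(compute=1.0, data=0.25, bw_main=1.0, bw_bb=4.0)
print(f"high I/O intensity: {high_io:.2f}, low I/O intensity: {low_io:.2f}")
```

In this model the higher I/O intensity gives the larger speedup, which matches the qualitative dependence listed on the slide.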

10 Infinite Memory Engine (by DDN). Realisation of storage hierarchy. Upper tier = IME: very small C_io/B_io (10 min); leverages NVM technologies. External storage: very large C_io/B_io, O(1 day); leverages HDD technologies. Benefits: high bandwidth and IOPS rate; compatibility with and support of any POSIX-compliant parallel file system. Challenges: re-organisation of I/O may be required to leverage performance. [Diagram: Compute servers, IME, External storage] Slide 10

11 Using IME. MPI I/O interface: use of the namespace of the parallel file system (PFS); a prefix controls where a created file is allocated, e.g. ime://gpfs/data/pleiter/file.dat; software-controlled sync from IME to PFS. POSIX interface: IME storage devices mounted using FUSE; use of the namespace of the PFS, but with a special mountpoint for IME (use a path via this mountpoint for direct access to IME); the choice of path controls whether IME or PFS is used; software-controlled sync from IME to PFS. Slide 11
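
The path-based selection described on slide 11 could look roughly as follows. This is a sketch only: the `ime://` prefix is taken from the slide, while the mountpoint `/ime`, the helper names, and the assumption that the slide's example corresponds to the PFS path `/gpfs/data/pleiter/file.dat` are hypothetical.

```python
from pathlib import PurePosixPath

IME_MPIIO_PREFIX = "ime://"               # prefix shown on the slide
IME_POSIX_MOUNT = PurePosixPath("/ime")   # hypothetical FUSE mountpoint


def mpiio_name(pfs_path: str, use_ime: bool) -> str:
    """File name to pass to MPI I/O: prefixed for IME, plain for the PFS."""
    path = PurePosixPath(pfs_path)
    if not path.is_absolute():
        raise ValueError("expected an absolute PFS path")
    if use_ime:
        return IME_MPIIO_PREFIX + str(path.relative_to("/"))
    return str(path)


def posix_name(pfs_path: str, use_ime: bool) -> str:
    """Path for POSIX I/O: below the IME mountpoint, or the PFS path itself."""
    path = PurePosixPath(pfs_path)
    if not path.is_absolute():
        raise ValueError("expected an absolute PFS path")
    if use_ime:
        return str(IME_POSIX_MOUNT / path.relative_to("/"))
    return str(path)


print(mpiio_name("/gpfs/data/pleiter/file.dat", use_ime=True))
print(posix_name("/gpfs/data/pleiter/file.dat", use_ime=True))
```

Because only the file name changes, an application that takes its output path from configuration can switch between IME and the PFS without code changes.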

12 Benchmarking. Central goal of our study: benchmarking with a real-world system to check whether IME fulfills theoretical expectations. Benchmarks: General performance: IOR [LLNL, 2003], a benchmarking tool for testing the performance of parallel file systems using various interfaces and access patterns. Computational science software from the dominant write class: NEST. Slide 12

13 Test System Slide 13

14 JUlich Dedicated GPU Environment (JUDGE) (decommissioned end of 2015) JUDGE: For our tests: Up to 64 compute nodes from JUDGE Scientific Linux 6.7 Pre-release version of IME software stack (Dec. 2015) Figure: JSC Slide 14

15 Test System. Schematic overview of the integration of the IME servers at JSC. [Diagram link labels: (64 Gbit/s), (10 Gbit/s), JUST, (32 Gbit/s), (64 Gbit/s), (20 Gbit/s)] Bandwidth to IME: 128 Gbit/s = 16 GByte/s. IME = IME server: 24 SSDs with 200 GiB each (overall ca. 4.7 TiB), 2 IB host adapters (QDR). Bandwidth to GPFS: 20 Gbit/s = 2.5 GByte/s. Slide 15
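
The nominal figures on slide 15 can be checked with a few lines of arithmetic. The helper name below is ours; the numbers are those given on the slide.

```python
def gbit_to_gbyte(gbit_per_s: float) -> float:
    """Convert a link rate from Gbit/s to GByte/s (8 bits per byte)."""
    return gbit_per_s / 8


ime_nominal = gbit_to_gbyte(128)   # bandwidth to IME
gpfs_nominal = gbit_to_gbyte(20)   # bandwidth to GPFS
ssd_total_tib = 24 * 200 / 1024    # 24 SSDs with 200 GiB each, in TiB

print(f"IME: {ime_nominal} GByte/s, GPFS: {gpfs_nominal} GByte/s")
print(f"SSD capacity: {ssd_total_tib:.1f} TiB")
```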

16 General Benchmarks (IOR) IOR Settings Slide 16

17 IOR Read Performance Bandwidth saturation reached with 4 nodes (GPFS) or 8 nodes (IME) Max. GPFS read bandwidth: 0.63 GByte/s (25% of nominal value) Max. IME read bandwidth: 13.8 GByte/s (86% of nominal value) Slide 17
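
The percentages on slide 17 relate the measured maxima to the nominal link bandwidths of the test system (2.5 GByte/s to GPFS, 16 GByte/s to IME). A short sketch of that calculation; the dictionary and function names are illustrative only.

```python
NOMINAL_GBYTE_S = {"GPFS": 2.5, "IME": 16.0}     # from the test system slide
MAX_READ_GBYTE_S = {"GPFS": 0.63, "IME": 13.8}   # from the IOR read slide


def percent_of_nominal(system: str) -> float:
    """Measured maximum read bandwidth as a percentage of the nominal value."""
    return 100 * MAX_READ_GBYTE_S[system] / NOMINAL_GBYTE_S[system]


for system in NOMINAL_GBYTE_S:
    print(f"{system}: {percent_of_nominal(system):.0f}% of nominal")
```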

18 IOR Write Performance Bandwidth saturation reached with 4 nodes (GPFS) or 8 nodes (IME) Max. GPFS write bandwidth: 0.75 GByte/s (33% of nominal value) Max. IME write bandwidth: GByte/s (98% of nominal value) Slide 18

19 NEST Benchmarks Slide 19

20 The Human Brain Project HBP: Future & Emerging Technologies flagship project (co-)funded by European Commission Science-driven, seeded from FET, extending beyond ICT Ambitious, unifying goal, large-scale Goal To build an integrated ICT infrastructure enabling a global collaborative effort towards understanding the human brain, and ultimately to emulate its computational capabilities Slide 20

21 Brain Simulation (1). Simulation software: NEST (NEural Simulation Tool). Open source: / Purpose: large-scale simulations of biologically realistic neuronal networks (focus on large networks, use of simple point neurons). [Figure labels: dendrites, axon, soma, neuron, spike] Slide 21

22 Brain Simulation (2) In the human brain: ca. 100 bn neurons ca. 10,000 incoming connections per neuron Largest simulation so far: Simulation with 1 bn neurons (feasibility study on the K computer in Japan) I/O challenge: Simulations can produce huge amounts of data Right fig.: E. Torre, INM-6, Forschungszentrum Jülich Slide 22

23 Parallel Processing in NEST (VP: Virtual Process). [Diagram: grid of virtual processes VP0 to VP5, each holding N_VP neurons; axes: number of MPI ranks M and number of threads per rank T.] In the whole network: N neurons with N = M · T · N_VP. Slide 23
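
The relation N = M · T · N_VP can be written out directly. The values below are illustrative: M = 3 and T = 2 are chosen only because they give the six virtual processes (VP0 to VP5) shown on the slide, and N_VP = 1000 is made up.

```python
def num_virtual_processes(m_ranks: int, t_threads: int) -> int:
    """Each MPI rank runs t_threads threads; every thread is one VP."""
    return m_ranks * t_threads


def total_neurons(m_ranks: int, t_threads: int, n_vp: int) -> int:
    """N = M * T * N_VP, with N_VP neurons hosted on every VP."""
    return num_virtual_processes(m_ranks, t_threads) * n_vp


vps = num_virtual_processes(m_ranks=3, t_threads=2)
n = total_neurons(m_ranks=3, t_threads=2, n_vp=1000)
print(f"{vps} virtual processes, {n} neurons in total")
```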

24 Simulation Cycle. Communication interval: process-internal routing of spike events to their target neurons (incl. synapse update); updating of neuronal states (incl. spike generation); exchange of spike events between MPI processes. Slide 24

25 Creating Spike Events during Neuron Update. [Diagram: grid of virtual processes VP0 to VP5, each holding N_VP neurons; axes: number of MPI ranks M and number of threads per rank T. Red dot: single spike event.] Slide 25

26 Simulation Cycle (revisited). Communication interval: process-internal routing of spike events to their target neurons (incl. synapse update); updating of neuronal states (incl. spike generation); exchange of spike events between MPI processes. Slide 26

27 Creation of Rank-Local Spike Buffers. [Diagram: grid of virtual processes VP0 to VP5, each holding N_VP neurons; axes: number of MPI ranks M and number of threads per rank T.] Slide 27

28 MPI Communication: Every rank receives all spike events. [Diagram: grid of virtual processes VP0 to VP5, each holding N_VP neurons; axes: number of MPI ranks M and number of threads per rank T; label: MPI.] Slide 28
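
The exchange pattern on slide 28, in which every rank ends up with the spike events of all ranks, has the semantics of an all-gather. The following single-process sketch simulates that pattern with plain lists; it does not use MPI, the event layout `(neuron_id, time_step)` is hypothetical, and the slides do not state which MPI call NEST uses.

```python
from typing import List, Tuple

SpikeEvent = Tuple[int, int]  # hypothetical layout: (neuron_id, time_step)


def allgather(rank_local_buffers: List[List[SpikeEvent]]) -> List[List[SpikeEvent]]:
    """Return, for every rank, the concatenation of all rank-local buffers."""
    combined = [event for buffer in rank_local_buffers for event in buffer]
    # Every rank receives its own copy of the complete event list.
    return [list(combined) for _ in rank_local_buffers]


# Three simulated ranks with different numbers of spikes in this interval.
local = [[(0, 5), (3, 7)], [], [(8, 6)]]
received = allgather(local)
print(received[1])
```

Since every rank receives all events, the volume received per rank grows with the total spike count, independent of how many of those events target local neurons.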

29 Simulation Cycle (revisited). Communication interval: process-internal routing of spike events to their target neurons (incl. synapse update); updating of neuronal states (incl. spike generation); exchange of spike events between MPI processes. Slide 29

30 I/O in NEST. Data collected during simulations: spike events (recording device: spike detector); state variables, e.g. membrane potential of neurons (recording device: multimeter). Recording devices belong to the abstract node class: they are connected to neurons (from which measurements are collected), receive spike events (spike detector), send out measurement events (multimeter), and are updated like neurons (writing data during update). Each recording device exists on every virtual process (VP) and writes data via a C++ output stream into a text file (one file per device per VP). Slide 30

31 Simulation Script for Benchmark: Random Balanced Network. One spike detector and one multimeter per population (created last, after all neurons). Overall 4 recording devices (= C++ output streams) per VP. Fig.: Nadine Daivandy (JSC). Slide 31
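
The "one file per device per VP" scheme determines how many output files a run creates. The sketch below counts them for the benchmark's 4 recording devices per VP; the device labels and the file naming pattern are hypothetical, and the figure of 23 VPs per node assumes one thread per MPI rank (the experiment design states 23 ranks per node but not the thread count).

```python
from typing import List

# Hypothetical labels: one spike detector and one multimeter per population.
DEVICES = ["spikes_pop1", "spikes_pop2", "multimeter_pop1", "multimeter_pop2"]


def output_files(devices: List[str], num_vps: int) -> List[str]:
    """One text file per recording device per virtual process."""
    return [f"{device}-vp{vp}.dat" for device in devices for vp in range(num_vps)]


vps_per_node = 23  # assumes one thread per rank
per_node = output_files(DEVICES, vps_per_node)
print(len(per_node), "files per node;", len(per_node) * 16, "files on 16 nodes")
```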

32 Simulation Cycle (revisited). Communication interval: process-internal routing of spike events to their target neurons (incl. synapse update); update of recording devices (I/O BURST); updating of neuronal states (incl. spike generation); exchange of spike events between MPI processes. Slide 32

33 Design of Experiment. Factor 1: number of compute nodes: 1, 2, 4, 8, 16; strict weak scaling design: number of neurons per node constant. Factor 2: amount of written data per node, manipulated via the number of state variables recorded by each multimeter: 1 to 22, corresponding to 1 GiB/node to 8 GiB/node (amount of spike data insignificant). Factor 3: output file system: (1) POSIX I/O to GPFS; (2) POSIX I/O to IME; (3) POSIX I/O to /dev/null as baseline condition ("infinitely fast storage device"). Further experimental settings: simulated biological time: 100 ms; network size: 258,750 neurons per compute node, ca. 3e8 synapses per compute node; 23 MPI ranks per compute node; 5 runs per task condition, minimum reported. Slide 33
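
The factorial design on slide 33 can be enumerated programmatically. This sketch uses only the two data loads that appear in the later bandwidth plots (1 and 8 GiB/node), since intermediate levels are not listed on the slide; all names are illustrative.

```python
from itertools import product

NODES = [1, 2, 4, 8, 16]
DATA_GIB_PER_NODE = [1, 8]            # only the two plotted loads
TARGETS = ["GPFS", "IME", "/dev/null"]
NEURONS_PER_NODE = 258_750
RANKS_PER_NODE = 23
RUNS_PER_CONDITION = 5


def network_size(nodes: int) -> int:
    """Weak scaling: the network grows linearly with the node count."""
    return nodes * NEURONS_PER_NODE


def reported_time(run_times: list) -> float:
    """The minimum over the repeated runs of one condition is reported."""
    if len(run_times) != RUNS_PER_CONDITION:
        raise ValueError(f"expected {RUNS_PER_CONDITION} runs")
    return min(run_times)


conditions = list(product(NODES, DATA_GIB_PER_NODE, TARGETS))
print(len(conditions), "conditions,", len(conditions) * RUNS_PER_CONDITION, "runs")
print("neurons on 16 nodes:", network_size(16))
print("neurons per MPI rank:", NEURONS_PER_NODE // RANKS_PER_NODE)
```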

34 Bandwidth (1 GiB/node) Slide 34

35 Bandwidth (8 GiB/node) Slide 35

36 Bandwidth (1 and 8 GiB/node). POSIX2IME very close to POSIX2DEVNULL: IME close to "ideal" performance. Very good scaling behavior of IME: observed bandwidth nearly doubles with each doubling of the number of compute nodes. Poor scaling behavior of GPFS beyond 4 compute nodes. Observed bandwidth small compared to IOR measurements. Slide 36

37 Simulation Cycle (revisited). Communication interval: process-internal routing of spike events to their target neurons (incl. synapse update); update of recording devices (I/O BURST); updating of neuronal states (incl. spike generation); exchange of spike events between MPI processes. Slide 37

38 Simulation Time (1 GiB/node) Effective simulation time = simulation time without step 3 (MPI synchr.) Slide 38

39 Simulation Times (8 GiB/node) Effective simulation time = simulation time without step 3 (MPI synchr.) Slide 39

40 Simulation Time: Observations. The larger the number of nodes, the stronger the advantage of writing to IME or /dev/null. Very good scaling behavior of IME clearly visible in plots. GPFS setting suffers heavily from imbalance between ranks. IME nearly reaches the performance of /dev/null; barely any I/O-induced additional imbalance between ranks. Slide 40

41 Relative Runtime Reduction Reported values based on average over all measured I/O loads Slide 41
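
The quantities on slides 38 to 41 can be related as follows. This is a sketch under assumptions: the slides define the effective simulation time as the simulation time without the MPI synchronization step, but they do not spell out the formula for the relative runtime reduction, so the usual definition (t_ref - t_new) / t_ref is assumed here. All timing values are hypothetical.

```python
def effective_time(simulation_time: float, mpi_sync_time: float) -> float:
    """Simulation time with the MPI synchronization step subtracted."""
    return simulation_time - mpi_sync_time


def relative_runtime_reduction(t_ref: float, t_new: float) -> float:
    """Assumed definition: fraction of the reference runtime that is saved."""
    return (t_ref - t_new) / t_ref


def speedup(t_ref: float, t_new: float) -> float:
    """Reference runtime divided by the new runtime."""
    return t_ref / t_new


# Hypothetical runtimes in seconds for one condition.
t_gpfs, t_ime = 40.0, 10.0
print(f"reduction: {relative_runtime_reduction(t_gpfs, t_ime):.0%}")
print(f"speedup: {speedup(t_gpfs, t_ime):.1f}")
```

Under this definition a speedup of 4 corresponds to a runtime reduction of 75%, since the reduction equals 1 - 1/speedup.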

42 Data Retention Time Analysis Slide 42

43 Motivation: Interactive Supercomputing. Data retention time analysis: classification of data depending on how long it will be retained. Interactive supercomputing/HPC: the user can interact with the application(s) that run on the supercomputer/cluster. Misc. use cases for NEST. Slide 43

44 NEST: Data Retention Times Data retention time analysis: Classification of data depending on how long it will be retained Slide 44

45 Conclusions and Outlook Slide 45

46 Conclusions. IOR results: IME reached ca. 90% of nominal bandwidth in reading and writing; promising finding for all considered application classes. NEST results: barely any I/O-induced imbalance between ranks with IME (in contrast to GPFS); IME performance close to the baseline condition (/dev/null), nearly perfect weak scaling behavior; at the largest problem size, a speedup of nearly 4 was achieved vs. GPFS; easy handling: no code changes in NEST required. Conclusions: IME actually works as theoretically expected for applications from the dominant write class (writing in bursts). NEST users would strongly profit from the incorporation of IME in compute clusters (I/O no longer a limiting factor in gathering simulation results). Slide 46

47 Outlook and Recommendations. Recommendations for the future development of IME: Data pre-fetching: for "dominant read" applications, data pre-fetching before job start would be highly beneficial (integration into job managers?). Development of tools for managing short-term and transient data, with integration into job managers. Support for end-to-end data integrity as within GPFS. Final word: IME shows that working burst buffer solutions exist for complex parallel applications. Opportunity to scale compute and I/O performance; alternatively, opportunity to reduce bandwidth requirements for the external storage system. Slide 47

48 Questions? Thank you for your attention! Acknowledgements: We would like to thank DDN for making an IME test system available at Jülich Supercomputing Centre. In particular, we gratefully acknowledge the continuous support by Tommaso Cecchi and Toine Beckers. Slide 48


An Exploration into Object Storage for Exascale Supercomputers. Raghu Chandrasekar An Exploration into Object Storage for Exascale Supercomputers Raghu Chandrasekar Agenda Introduction Trends and Challenges Design and Implementation of SAROJA Preliminary evaluations Summary and Conclusion

More information

L3/L4 Multiple Level Cache concept using ADS

L3/L4 Multiple Level Cache concept using ADS L3/L4 Multiple Level Cache concept using ADS Hironao Takahashi 1,2, Hafiz Farooq Ahmad 2,3, Kinji Mori 1 1 Department of Computer Science, Tokyo Institute of Technology 2-12-1 Ookayama Meguro, Tokyo, 152-8522,

More information

CSD3 The Cambridge Service for Data Driven Discovery. A New National HPC Service for Data Intensive science

CSD3 The Cambridge Service for Data Driven Discovery. A New National HPC Service for Data Intensive science CSD3 The Cambridge Service for Data Driven Discovery A New National HPC Service for Data Intensive science Dr Paul Calleja Director of Research Computing University of Cambridge Problem statement Today

More information

Data storage services at KEK/CRC -- status and plan

Data storage services at KEK/CRC -- status and plan Data storage services at KEK/CRC -- status and plan KEK/CRC Hiroyuki Matsunaga Most of the slides are prepared by Koichi Murakami and Go Iwai KEKCC System Overview KEKCC (Central Computing System) The

More information

Analyzing the Performance of IWAVE on a Cluster using HPCToolkit

Analyzing the Performance of IWAVE on a Cluster using HPCToolkit Analyzing the Performance of IWAVE on a Cluster using HPCToolkit John Mellor-Crummey and Laksono Adhianto Department of Computer Science Rice University {johnmc,laksono}@rice.edu TRIP Meeting March 30,

More information

Introduction to High-Performance Computing

Introduction to High-Performance Computing Introduction to High-Performance Computing Dr. Axel Kohlmeyer Associate Dean for Scientific Computing, CST Associate Director, Institute for Computational Science Assistant Vice President for High-Performance

More information

ZEST Snapshot Service. A Highly Parallel Production File System by the PSC Advanced Systems Group Pittsburgh Supercomputing Center 1

ZEST Snapshot Service. A Highly Parallel Production File System by the PSC Advanced Systems Group Pittsburgh Supercomputing Center 1 ZEST Snapshot Service A Highly Parallel Production File System by the PSC Advanced Systems Group Pittsburgh Supercomputing Center 1 Design Motivation To optimize science utilization of the machine Maximize

More information

Chapter 4. Routers with Tiny Buffers: Experiments. 4.1 Testbed experiments Setup

Chapter 4. Routers with Tiny Buffers: Experiments. 4.1 Testbed experiments Setup Chapter 4 Routers with Tiny Buffers: Experiments This chapter describes two sets of experiments with tiny buffers in networks: one in a testbed and the other in a real network over the Internet2 1 backbone.

More information

Moneta: A High-Performance Storage Architecture for Next-generation, Non-volatile Memories

Moneta: A High-Performance Storage Architecture for Next-generation, Non-volatile Memories Moneta: A High-Performance Storage Architecture for Next-generation, Non-volatile Memories Adrian M. Caulfield Arup De, Joel Coburn, Todor I. Mollov, Rajesh K. Gupta, Steven Swanson Non-Volatile Systems

More information

MPI RUNTIMES AT JSC, NOW AND IN THE FUTURE

MPI RUNTIMES AT JSC, NOW AND IN THE FUTURE , NOW AND IN THE FUTURE Which, why and how do they compare in our systems? 08.07.2018 I MUG 18, COLUMBUS (OH) I DAMIAN ALVAREZ Outline FZJ mission JSC s role JSC s vision for Exascale-era computing JSC

More information

MOHA: Many-Task Computing Framework on Hadoop

MOHA: Many-Task Computing Framework on Hadoop Apache: Big Data North America 2017 @ Miami MOHA: Many-Task Computing Framework on Hadoop Soonwook Hwang Korea Institute of Science and Technology Information May 18, 2017 Table of Contents Introduction

More information

Can FPGAs beat GPUs in accelerating next-generation Deep Neural Networks? Discussion of the FPGA 17 paper by Intel Corp. (Nurvitadhi et al.

Can FPGAs beat GPUs in accelerating next-generation Deep Neural Networks? Discussion of the FPGA 17 paper by Intel Corp. (Nurvitadhi et al. Can FPGAs beat GPUs in accelerating next-generation Deep Neural Networks? Discussion of the FPGA 17 paper by Intel Corp. (Nurvitadhi et al.) Andreas Kurth 2017-12-05 1 In short: The situation Image credit:

More information

Users and utilization of CERIT-SC infrastructure

Users and utilization of CERIT-SC infrastructure Users and utilization of CERIT-SC infrastructure Equipment CERIT-SC is an integral part of the national e-infrastructure operated by CESNET, and it leverages many of its services (e.g. management of user

More information

Intel Enterprise Edition Lustre (IEEL-2.3) [DNE-1 enabled] on Dell MD Storage

Intel Enterprise Edition Lustre (IEEL-2.3) [DNE-1 enabled] on Dell MD Storage Intel Enterprise Edition Lustre (IEEL-2.3) [DNE-1 enabled] on Dell MD Storage Evaluation of Lustre File System software enhancements for improved Metadata performance Wojciech Turek, Paul Calleja,John

More information

2 Improved Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers [1]

2 Improved Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers [1] EE482: Advanced Computer Organization Lecture #7 Processor Architecture Stanford University Tuesday, June 6, 2000 Memory Systems and Memory Latency Lecture #7: Wednesday, April 19, 2000 Lecturer: Brian

More information

SpiNNaker a Neuromorphic Supercomputer. Steve Temple University of Manchester, UK SOS21-21 Mar 2017

SpiNNaker a Neuromorphic Supercomputer. Steve Temple University of Manchester, UK SOS21-21 Mar 2017 SpiNNaker a Neuromorphic Supercomputer Steve Temple University of Manchester, UK SOS21-21 Mar 2017 Outline of talk Introduction Modelling neurons Architecture and technology Principles of operation Summary

More information

The Leading Parallel Cluster File System

The Leading Parallel Cluster File System The Leading Parallel Cluster File System www.thinkparq.com www.beegfs.io ABOUT BEEGFS What is BeeGFS BeeGFS (formerly FhGFS) is the leading parallel cluster file system, developed with a strong focus on

More information

I/O-500 Status. Julian M. Kunkel 1, Jay Lofstead 2, John Bent 3, George S. Markomanolis

I/O-500 Status. Julian M. Kunkel 1, Jay Lofstead 2, John Bent 3, George S. Markomanolis I/O-500 Status Julian M. Kunkel 1, Jay Lofstead 2, John Bent 3, George S. Markomanolis 4 1. Deutsches Klimarechenzentrum GmbH (DKRZ) 2. Sandia National Laboratory 3. Seagate Government Solutions 4. KAUST

More information

Oncilla - a Managed GAS Runtime for Accelerating Data Warehousing Queries

Oncilla - a Managed GAS Runtime for Accelerating Data Warehousing Queries Oncilla - a Managed GAS Runtime for Accelerating Data Warehousing Queries Jeffrey Young, Alex Merritt, Se Hoon Shon Advisor: Sudhakar Yalamanchili 4/16/13 Sponsors: Intel, NVIDIA, NSF 2 The Problem Big

More information

LHCb Distributed Conditions Database

LHCb Distributed Conditions Database LHCb Distributed Conditions Database Marco Clemencic E-mail: marco.clemencic@cern.ch Abstract. The LHCb Conditions Database project provides the necessary tools to handle non-event time-varying data. The

More information

Assessment of LS-DYNA Scalability Performance on Cray XD1

Assessment of LS-DYNA Scalability Performance on Cray XD1 5 th European LS-DYNA Users Conference Computing Technology (2) Assessment of LS-DYNA Scalability Performance on Cray Author: Ting-Ting Zhu, Cray Inc. Correspondence: Telephone: 651-65-987 Fax: 651-65-9123

More information

Triton file systems - an introduction. slide 1 of 28

Triton file systems - an introduction. slide 1 of 28 Triton file systems - an introduction slide 1 of 28 File systems Motivation & basic concepts Storage locations Basic flow of IO Do's and Don'ts Exercises slide 2 of 28 File systems: Motivation Case #1:

More information

simulation framework for piecewise regular grids

simulation framework for piecewise regular grids WALBERLA, an ultra-scalable multiphysics simulation framework for piecewise regular grids ParCo 2015, Edinburgh September 3rd, 2015 Christian Godenschwager, Florian Schornbaum, Martin Bauer, Harald Köstler

More information

Cray XC Scalability and the Aries Network Tony Ford

Cray XC Scalability and the Aries Network Tony Ford Cray XC Scalability and the Aries Network Tony Ford June 29, 2017 Exascale Scalability Which scalability metrics are important for Exascale? Performance (obviously!) What are the contributing factors?

More information

Mapping MPI+X Applications to Multi-GPU Architectures

Mapping MPI+X Applications to Multi-GPU Architectures Mapping MPI+X Applications to Multi-GPU Architectures A Performance-Portable Approach Edgar A. León Computer Scientist San Jose, CA March 28, 2018 GPU Technology Conference This work was performed under

More information

Automatic Identification of Application I/O Signatures from Noisy Server-Side Traces. Yang Liu Raghul Gunasekaran Xiaosong Ma Sudharshan S.

Automatic Identification of Application I/O Signatures from Noisy Server-Side Traces. Yang Liu Raghul Gunasekaran Xiaosong Ma Sudharshan S. Automatic Identification of Application I/O Signatures from Noisy Server-Side Traces Yang Liu Raghul Gunasekaran Xiaosong Ma Sudharshan S. Vazhkudai Instance of Large-Scale HPC Systems ORNL s TITAN (World

More information

Moneta: A High-performance Storage Array Architecture for Nextgeneration, Micro 2010

Moneta: A High-performance Storage Array Architecture for Nextgeneration, Micro 2010 Moneta: A High-performance Storage Array Architecture for Nextgeneration, Non-volatile Memories Micro 2010 NVM-based SSD NVMs are replacing spinning-disks Performance of disks has lagged NAND flash showed

More information

An Introduction to GPFS

An Introduction to GPFS IBM High Performance Computing July 2006 An Introduction to GPFS gpfsintro072506.doc Page 2 Contents Overview 2 What is GPFS? 3 The file system 3 Application interfaces 4 Performance and scalability 4

More information

SDA: Software-Defined Accelerator for Large- Scale DNN Systems

SDA: Software-Defined Accelerator for Large- Scale DNN Systems SDA: Software-Defined Accelerator for Large- Scale DNN Systems Jian Ouyang, 1 Shiding Lin, 1 Wei Qi, 1 Yong Wang, 1 Bo Yu, 1 Song Jiang, 2 1 Baidu, Inc. 2 Wayne State University Introduction of Baidu A

More information

NVMFS: A New File System Designed Specifically to Take Advantage of Nonvolatile Memory

NVMFS: A New File System Designed Specifically to Take Advantage of Nonvolatile Memory NVMFS: A New File System Designed Specifically to Take Advantage of Nonvolatile Memory Dhananjoy Das, Sr. Systems Architect SanDisk Corp. 1 Agenda: Applications are KING! Storage landscape (Flash / NVM)

More information

Accelerating sequential computer vision algorithms using commodity parallel hardware

Accelerating sequential computer vision algorithms using commodity parallel hardware Accelerating sequential computer vision algorithms using commodity parallel hardware Platform Parallel Netherlands GPGPU-day, 28 June 2012 Jaap van de Loosdrecht NHL Centre of Expertise in Computer Vision

More information

MPI Optimizations via MXM and FCA for Maximum Performance on LS-DYNA

MPI Optimizations via MXM and FCA for Maximum Performance on LS-DYNA MPI Optimizations via MXM and FCA for Maximum Performance on LS-DYNA Gilad Shainer 1, Tong Liu 1, Pak Lui 1, Todd Wilde 1 1 Mellanox Technologies Abstract From concept to engineering, and from design to

More information

Peta-Scale Simulations with the HPC Software Framework walberla:

Peta-Scale Simulations with the HPC Software Framework walberla: Peta-Scale Simulations with the HPC Software Framework walberla: Massively Parallel AMR for the Lattice Boltzmann Method SIAM PP 2016, Paris April 15, 2016 Florian Schornbaum, Christian Godenschwager,

More information

The Computation and Data Needs of Canadian Astronomy

The Computation and Data Needs of Canadian Astronomy Summary The Computation and Data Needs of Canadian Astronomy The Computation and Data Committee In this white paper, we review the role of computing in astronomy and astrophysics and present the Computation

More information

Quantifying power consumption variations of HPC systems using SPEC MPI benchmarks

Quantifying power consumption variations of HPC systems using SPEC MPI benchmarks Center for Information Services and High Performance Computing (ZIH) Quantifying power consumption variations of HPC systems using SPEC MPI benchmarks EnA-HPC, Sept 16 th 2010, Robert Schöne, Daniel Molka,

More information

Out-Of-Core Sort-First Parallel Rendering for Cluster-Based Tiled Displays

Out-Of-Core Sort-First Parallel Rendering for Cluster-Based Tiled Displays Out-Of-Core Sort-First Parallel Rendering for Cluster-Based Tiled Displays Wagner T. Corrêa James T. Klosowski Cláudio T. Silva Princeton/AT&T IBM OHSU/AT&T EG PGV, Germany September 10, 2002 Goals Render

More information

Structuring PLFS for Extensibility

Structuring PLFS for Extensibility Structuring PLFS for Extensibility Chuck Cranor, Milo Polte, Garth Gibson PARALLEL DATA LABORATORY Carnegie Mellon University What is PLFS? Parallel Log Structured File System Interposed filesystem b/w

More information

DDN and Flash GRIDScaler, Flashscale Infinite Memory Engine

DDN and Flash GRIDScaler, Flashscale Infinite Memory Engine 1! DDN and Flash GRIDScaler, Flashscale Infinite Memory Engine T. Cecchi - September 21 st 2016 HPC Advisory Council 2! DDN END-TO-END DATA LIFECYCLE MANAGEMENT BURST & COMPUTE SSD, DISK & FILE SYSTEM

More information

OVERHEADS ENHANCEMENT IN MUTIPLE PROCESSING SYSTEMS BY ANURAG REDDY GANKAT KARTHIK REDDY AKKATI

OVERHEADS ENHANCEMENT IN MUTIPLE PROCESSING SYSTEMS BY ANURAG REDDY GANKAT KARTHIK REDDY AKKATI CMPE 655- MULTIPLE PROCESSOR SYSTEMS OVERHEADS ENHANCEMENT IN MUTIPLE PROCESSING SYSTEMS BY ANURAG REDDY GANKAT KARTHIK REDDY AKKATI What is MULTI PROCESSING?? Multiprocessing is the coordinated processing

More information

Advanced Data Placement via Ad-hoc File Systems at Extreme Scales (ADA-FS)

Advanced Data Placement via Ad-hoc File Systems at Extreme Scales (ADA-FS) Advanced Data Placement via Ad-hoc File Systems at Extreme Scales (ADA-FS) Understanding I/O Performance Behavior (UIOP) 2017 Sebastian Oeste, Mehmet Soysal, Marc-André Vef, Michael Kluge, Wolfgang E.

More information