Early Evaluation of the "Infinite Memory Engine" Burst Buffer Solution
1 Early Evaluation of the "Infinite Memory Engine" Burst Buffer Solution
Wolfram Schenck, Faculty of Engineering and Mathematics, Bielefeld University of Applied Sciences, Bielefeld, Germany
Salem El Sayed, Maciej Foszczynski, Wilhelm Homberg, Dirk Pleiter, Jülich Supercomputing Centre, Forschungszentrum Jülich, Jülich, Germany
WOPSSS 2016, Frankfurt
2 Outline
- Introduction: The Burst Buffer Concept
- Test System
- General Benchmarks (IOR)
- NEST Benchmarks
- Data Retention Time Analysis
- Conclusions and Outlook
Slide 2
3 Introduction: The Burst Buffer Concept Slide 3
4 Need for New Storage Architectures
Address the growing performance gap: floating-point performance B_fp grows faster than I/O bandwidth B_io, i.e. B_io/B_fp becomes smaller. For JUQUEEN we have B_io/B_fp = 1 Byte / 40,000 Flops.
Mitigation strategy: hierarchical storage architecture
- Fast but low-capacity storage tier
- Large-capacity but slow storage tier
Emerging data-intensive applications need large storage capacity C_io, high bandwidth B_io, and high IOPS rates.
Slide 4
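To put the B_io/B_fp balance into concrete numbers, a small sketch: the slide gives the JUQUEEN ratio of 1 Byte per 40,000 Flops; the peak floating-point performance used below (~5.9 PFlop/s) is an assumed illustrative value, not taken from this deck.

```python
# Illustrative calculation of the I/O bandwidth implied by the
# B_io/B_fp balance quoted for JUQUEEN.
PEAK_FLOPS = 5.9e15    # assumed peak performance in Flop/s (illustrative)
RATIO = 1.0 / 40_000   # B_io/B_fp = 1 Byte per 40,000 Flops (from the slide)

implied_io_bandwidth = PEAK_FLOPS * RATIO  # bytes/s
print(f"Implied I/O bandwidth: {implied_io_bandwidth / 1e9:.1f} GB/s")
# → Implied I/O bandwidth: 147.5 GB/s
```

The point of the slide is the trend: as B_fp grows faster than B_io, this implied bandwidth becomes an ever smaller fraction of what the compute side could consume.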
5 Application Classes
- Dominant read: applications processing data retrieved by experiments or collected by observatories; applications analyzing data from huge databases ("big data")
- Dominant write: applications from the area of simulation science, generating large amounts of data
- Transient write/read: applications (or sets of applications) producing and consuming significant amounts of data on the same system; such transient data often does not require long-term storage
[Diagram: cluster and main storage system]
Slide 5
6 Conventional Storage System
[Diagram: cluster writing to the main storage system (arrow direction: dominant write); timeline of 10 time steps, each split between time spent with I/O and time spent with non-I/O operations]
Slide 6
7 Enhanced by Burst Buffer, Scenario: Sustained Performance
[Diagram: without a burst buffer, a full simulation cycle takes 10 time steps; with a burst buffer between cluster and main storage system, the I/O bursts are absorbed and the cycle takes 6 time steps]
SPEEDUP = 10/6 = 1.67
Slide 7
8 Enhanced by Burst Buffer, Scenario: Short-Term Peak Performance
[Diagram: without a burst buffer, a full simulation cycle takes 18 time steps; with a burst buffer between cluster and main storage system, the cycle takes 6 time steps]
SPEEDUP = 18/6 = 3.0
Slide 8
9 Burst Buffer Concept
Capacities: conventional main storage large, burst buffer small.
Bandwidth: high between cluster and burst buffer, low between burst buffer and main storage.
The speedup obtained via the burst buffer depends theoretically on (for dominant write):
- I/O pattern of the application: continuous vs. in bursts
- I/O intensity of the application: low vs. high
- Runtime of the application: long vs. short
Slide 9
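The two scenarios on the previous slides reduce to a simple ratio of cycle lengths; a minimal sketch of that arithmetic:

```python
def burst_buffer_speedup(cycle_without_bb, cycle_with_bb):
    """Speedup of a full simulation cycle when the I/O bursts are
    absorbed by a burst buffer (cycle lengths in time steps)."""
    return cycle_without_bb / cycle_with_bb

# Sustained-performance scenario: 10 -> 6 time steps per cycle
print(round(burst_buffer_speedup(10, 6), 2))  # 1.67
# Short-term peak-performance scenario: 18 -> 6 time steps per cycle
print(round(burst_buffer_speedup(18, 6), 2))  # 3.0
```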
10 Infinite Memory Engine (by DDN)
Realisation of a storage hierarchy:
- Upper tier = IME: very small capacity, C_io/B_io ≈ 10 min; leverages NVM technologies
- External storage: very large capacity, C_io/B_io = O(1 day); leverages HDD technologies
Benefits: high bandwidth and IOPS rate; compatibility with and support of any POSIX-compliant parallel file system.
Challenges: re-organisation of I/O may be required to leverage the performance.
[Diagram: compute servers → IME → external storage]
Slide 10
11 Using IME
MPI-I/O interface:
- Uses the namespace of the parallel file system (PFS); a prefix controls where a created file is allocated, e.g. ime://gpfs/data/pleiter/file.dat
- Software-controlled sync from IME to PFS
POSIX interface:
- IME storage devices mounted using FUSE
- Uses the namespace of the PFS, but via a special mountpoint for IME (use the path via this mountpoint for direct access to IME); the choice of path thus controls whether IME or the PFS is used
- Software-controlled sync from IME to PFS
Slide 11
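For the POSIX interface, "choice of path controls use of IME or PFS" can be sketched as plain path construction. The mountpoints below are hypothetical placeholders; the actual paths are site-specific.

```python
import os

# Hypothetical mountpoints (assumed for illustration; actual paths differ per site).
PFS_ROOT = "/gpfs/data"       # direct access to the parallel file system
IME_ROOT = "/ime/gpfs/data"   # FUSE mountpoint exposing the same namespace via IME

def output_path(relpath, use_ime):
    """Same file in the PFS namespace, reached either directly or through IME."""
    root = IME_ROOT if use_ime else PFS_ROOT
    return os.path.join(root, relpath)

print(output_path("pleiter/file.dat", use_ime=True))   # /ime/gpfs/data/pleiter/file.dat
print(output_path("pleiter/file.dat", use_ime=False))  # /gpfs/data/pleiter/file.dat
```

This is why no code changes are needed in an application like NEST: redirecting its output directory to the IME mountpoint is enough.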
12 Benchmarking
Central goal of our study: benchmarking on a real-world system to check whether IME fulfils the theoretical expectations.
Benchmarks:
- General performance: IOR [LLNL, 2003], a benchmarking tool for testing the performance of parallel file systems using various interfaces and access patterns
- Computational science software from the dominant-write class: NEST
Slide 12
13 Test System Slide 13
14 JUlich Dedicated GPU Environment (JUDGE) (decommissioned end of 2015)
For our tests: up to 64 compute nodes from JUDGE, Scientific Linux 6.7, pre-release version of the IME software stack (Dec. 2015).
Figure: JSC
Slide 14
15 Test System
[Schematic overview of the integration of the IME servers at JSC; links of 64, 32, 20, and 10 Gbit/s connect the compute nodes, the IME servers, and the JUST/GPFS storage]
IME servers: 24 SSDs with 200 GiB each (overall ca. 4.7 TiB), 2 InfiniBand host adapters (QDR) per server.
Bandwidth to IME: 128 Gbit/s = 16 GByte/s. Bandwidth to GPFS: 20 Gbit/s = 2.5 GByte/s.
Slide 15
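The nominal bandwidths above are simple unit conversions from the link speeds (8 bits per byte):

```python
def gbit_to_gbyte(gbit_per_s):
    """Convert a link bandwidth from Gbit/s to GByte/s (8 bits per byte)."""
    return gbit_per_s / 8.0

print(gbit_to_gbyte(128))  # 16.0  -> nominal GByte/s towards IME
print(gbit_to_gbyte(20))   # 2.5   -> nominal GByte/s towards GPFS
```

These two nominal values are the reference points for the percentages quoted in the IOR results that follow.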
16 General Benchmarks (IOR) IOR Settings Slide 16
17 IOR Read Performance
Bandwidth saturation reached with 4 nodes (GPFS) or 8 nodes (IME).
Max. GPFS read bandwidth: 0.63 GByte/s (25% of nominal value).
Max. IME read bandwidth: 13.8 GByte/s (86% of nominal value).
Slide 17
18 IOR Write Performance
Bandwidth saturation reached with 4 nodes (GPFS) or 8 nodes (IME).
Max. GPFS write bandwidth: 0.75 GByte/s (33% of nominal value).
Max. IME write bandwidth: 98% of nominal value.
Slide 18
19 NEST Benchmarks Slide 19
20 The Human Brain Project
HBP: Future & Emerging Technologies (FET) flagship project, (co-)funded by the European Commission; science-driven, seeded from FET, extending beyond ICT; ambitious, unifying goal, large-scale.
Goal: to build an integrated ICT infrastructure enabling a global collaborative effort towards understanding the human brain, and ultimately to emulate its computational capabilities.
Slide 20
21 Brain Simulation (1)
Simulation software: NEST (NEural Simulation Tool), open source.
Purpose: large-scale simulations of biologically realistic neuronal networks (focus on large networks, use of simple point neurons).
[Figure: neuron with dendrites, soma, and axon; spike propagation]
Slide 21
22 Brain Simulation (2)
In the human brain: ca. 100 bn neurons, ca. 10,000 incoming connections per neuron.
Largest simulation so far: 1 bn neurons (feasibility study on the K computer in Japan).
I/O challenge: simulations can produce huge amounts of data.
Right fig.: E. Torre, INM-6, Forschungszentrum Jülich
Slide 22
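The scale of the I/O challenge follows directly from the two numbers on the slide; a back-of-the-envelope sketch:

```python
# Scale of the human brain, using the figures from the slide.
neurons = 100e9                  # ca. 100 bn neurons
connections_per_neuron = 10_000  # ca. 10,000 incoming connections per neuron

total_synapses = neurons * connections_per_neuron
print(f"{total_synapses:.0e} synapses")  # 1e+15
```

A quadrillion synapses is why even recording a few state variables per neuron quickly produces gigabytes of output per simulated second.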
23 Parallel Processing in NEST (VP: Virtual Process)
[Diagram: M MPI ranks × T threads per rank; virtual processes VP0 ... VP5, each simulating N_VP neurons]
In the whole network: N neurons with N = M · T · N_VP
Slide 23
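The network-size formula can be sketched directly; the example numbers match the 2×3 layout of the diagram (6 virtual processes, VP0 to VP5):

```python
def total_neurons(m_ranks, t_threads, n_vp):
    """Total network size N = M * T * N_VP: each of the M * T
    virtual processes simulates N_VP neurons."""
    return m_ranks * t_threads * n_vp

# Example matching the diagram: 2 MPI ranks x 3 threads = 6 VPs
print(total_neurons(2, 3, 1000))  # 6000
```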
24 Simulation Cycle
Within each communication interval:
1. Process-internal routing of spike events to their target neurons (incl. synapse update)
2. Updating of neuronal states (incl. spike generation)
3. Exchange of spike events between MPI processes
Slide 24
25 Creating Spike Events during Neuron Update
[Diagram as on slide 23; red dot: single spike event created during the update of a neuron]
Slide 25
26 Simulation Cycle (revisited)
1. Process-internal routing of spike events to their target neurons (incl. synapse update)
2. Updating of neuronal states (incl. spike generation)
3. Exchange of spike events between MPI processes
Slide 26
27 Creation of Rank-Local Spike Buffers
[Diagram as on slide 23; the spike events of all VPs of a rank are collected into a rank-local spike buffer]
Slide 27
28 MPI Communication: Every Rank Receives All Spike Events
[Diagram as on slide 23; the rank-local spike buffers are exchanged via MPI so that every rank receives all spike events]
Slide 28
29 Simulation Cycle (revisited)
1. Process-internal routing of spike events to their target neurons (incl. synapse update)
2. Updating of neuronal states (incl. spike generation)
3. Exchange of spike events between MPI processes
Slide 29
30 I/O in NEST
Data collected during simulations:
- Spike events; recording device: spike detector
- State variables (e.g., membrane potential of neurons); recording device: multimeter
Recording devices belong to an abstract node class:
- Connected to the neurons from which measurements are collected
- Receive spike events (spike detector) or send out measurement events (multimeter)
- Updated like neurons (writing data during the update)
Each recording device exists on every virtual process (VP) and writes data via a C++ output stream into a text file (one file per device per VP).
Slide 30
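The resulting file layout (one text file per recording device per VP) can be mimicked in a few lines. The naming scheme below is hypothetical; NEST's actual file names are different.

```python
import os
import tempfile

def open_recording_files(outdir, devices, n_vp):
    """Create one text file per recording device per virtual process,
    mirroring NEST's 'one output stream per device per VP' layout.
    File naming is a hypothetical placeholder."""
    paths = []
    for dev in devices:
        for vp in range(n_vp):
            path = os.path.join(outdir, f"{dev}-vp{vp}.dat")
            with open(path, "w") as f:
                f.write("# time  sender  value\n")  # illustrative header line
            paths.append(path)
    return paths

with tempfile.TemporaryDirectory() as d:
    files = open_recording_files(d, ["spike_detector", "multimeter"], n_vp=4)
    print(len(files))  # 2 devices x 4 VPs = 8 files
```

With many VPs per node, this many-small-streams pattern is exactly the kind of concurrent burst that a burst buffer is designed to absorb.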
31 Simulation Script for Benchmark: Random Balanced Network
One spike detector and one multimeter per population (created last, after all neurons); overall 4 recording devices (= C++ output streams) per VP.
Fig.: Nadine Daivandy (JSC)
Slide 31
32 Simulation Cycle (revisited)
1. Process-internal routing of spike events to their target neurons (incl. synapse update)
2. Updating of neuronal states (incl. spike generation) and update of the recording devices → I/O BURST
3. Exchange of spike events between MPI processes
Slide 32
33 Design of Experiment
Factor 1: number of compute nodes: 1, 2, 4, 8, 16; strict weak scaling design (number of neurons per node constant).
Factor 2: amount of written data per node, manipulated via the number of state variables recorded by each multimeter (1 to 22), corresponding to 1 GiB/node to 8 GiB/node (the amount of spike data is insignificant).
Factor 3: output file system:
1. POSIX I/O to GPFS
2. POSIX I/O to IME
3. POSIX I/O to /dev/null: baseline condition, "infinitely fast storage device"
Further experimental settings: simulated biological time: 100 ms; network size: 258,750 neurons per compute node, ca. 3e8 synapses per compute node; 23 MPI ranks per compute node; 5 runs per task condition, minimum reported.
Slide 33
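The weak-scaling design above keeps the per-node load constant, so the total problem size grows linearly with the node count; a minimal sketch:

```python
# Weak-scaling design of the NEST benchmark: per-node load is constant,
# so network size and rank count grow linearly with the number of nodes.
NEURONS_PER_NODE = 258_750
RANKS_PER_NODE = 23

for nodes in (1, 2, 4, 8, 16):
    neurons = nodes * NEURONS_PER_NODE
    ranks = nodes * RANKS_PER_NODE
    print(f"{nodes:2d} nodes: {neurons:,} neurons, {ranks} MPI ranks")
```

At the largest configuration (16 nodes) the benchmark thus simulates 4,140,000 neurons on 368 MPI ranks.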
34 Bandwidth (1 GiB/node) Slide 34
35 Bandwidth (8 GiB/node) Slide 35
36 Bandwidth (1 and 8 GiB/node)
POSIX2IME very close to POSIX2DEVNULL: IME close to "ideal" performance.
Very good scaling behavior of IME: the observed bandwidth nearly doubles with each doubling of the number of compute nodes.
Bad scaling behavior of GPFS beyond 4 compute nodes; the observed bandwidth is small compared to the IOR measurements.
Slide 36
37 Simulation Cycle (revisited)
1. Process-internal routing of spike events to their target neurons (incl. synapse update)
2. Updating of neuronal states (incl. spike generation) and update of the recording devices → I/O BURST
3. Exchange of spike events between MPI processes
Slide 37
38 Simulation Time (1 GiB/node) Effective simulation time = simulation time without step 3 (MPI synchr.) Slide 38
39 Simulation Times (8 GiB/node) Effective simulation time = simulation time without step 3 (MPI synchr.) Slide 39
40 Simulation Time: Observations
The larger the number of nodes, the stronger the advantage of writing to IME or /dev/null.
The very good scaling behavior of IME is clearly visible in the plots.
The GPFS setting suffers heavily from imbalance between ranks.
IME nearly reaches the performance of /dev/null; barely any I/O-induced additional imbalance between ranks.
Slide 40
41 Relative Runtime Reduction Reported values based on average over all measured I/O loads Slide 41
42 Data Retention Time Analysis Slide 42
43 Motivation: Interactive Supercomputing
Data retention time analysis: classification of data depending on how long it will be retained.
Interactive supercomputing/HPC: the user can interact with the application(s) running on the supercomputer/cluster.
Misc. use cases for NEST.
Slide 43
44 NEST: Data Retention Times Data retention time analysis: Classification of data depending on how long it will be retained Slide 44
45 Conclusions and Outlook Slide 45
46 Conclusions
IOR results: IME saturated ca. 90% of the nominal bandwidth in reading and writing; a promising finding for all considered application classes.
NEST results:
- Barely any I/O-induced imbalance between ranks with IME (in contrast to GPFS)
- IME performance close to the baseline condition (/dev/null), nearly perfect weak scaling behavior
- At the largest problem size: a speedup of nearly 4 achieved vs. GPFS
- Easy handling: no code changes in NEST required
Conclusions: IME actually works as theoretically expected for applications from the dominant-write class (writing in bursts). NEST users would strongly profit from the incorporation of IME in compute clusters (I/O would no longer be a limiting factor in gathering simulation results).
Slide 46
47 Outlook and Recommendations
Recommendations for the future development of IME:
- Data pre-fetching: for "dominant read" applications, pre-fetching data before job start would be highly beneficial; integration into job managers?
- Development of tools for managing short-term and transient data, integration into job managers
- Support for end-to-end data integrity, as within GPFS
Final word: IME shows that working burst buffer solutions exist for complex parallel applications; an opportunity to scale compute and I/O performance, or alternatively to reduce the bandwidth requirements for the external storage system.
Slide 47
48 Questions? Thank you for your attention! Acknowledgements: We would like to thank DDN for making an IME test system available at Jülich Supercomputing Centre. In particular, we gratefully acknowledge the continuous support by Tommaso Cecchi and Toine Beckers. Slide 48
More information2 Improved Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers [1]
EE482: Advanced Computer Organization Lecture #7 Processor Architecture Stanford University Tuesday, June 6, 2000 Memory Systems and Memory Latency Lecture #7: Wednesday, April 19, 2000 Lecturer: Brian
More informationSpiNNaker a Neuromorphic Supercomputer. Steve Temple University of Manchester, UK SOS21-21 Mar 2017
SpiNNaker a Neuromorphic Supercomputer Steve Temple University of Manchester, UK SOS21-21 Mar 2017 Outline of talk Introduction Modelling neurons Architecture and technology Principles of operation Summary
More informationThe Leading Parallel Cluster File System
The Leading Parallel Cluster File System www.thinkparq.com www.beegfs.io ABOUT BEEGFS What is BeeGFS BeeGFS (formerly FhGFS) is the leading parallel cluster file system, developed with a strong focus on
More informationI/O-500 Status. Julian M. Kunkel 1, Jay Lofstead 2, John Bent 3, George S. Markomanolis
I/O-500 Status Julian M. Kunkel 1, Jay Lofstead 2, John Bent 3, George S. Markomanolis 4 1. Deutsches Klimarechenzentrum GmbH (DKRZ) 2. Sandia National Laboratory 3. Seagate Government Solutions 4. KAUST
More informationOncilla - a Managed GAS Runtime for Accelerating Data Warehousing Queries
Oncilla - a Managed GAS Runtime for Accelerating Data Warehousing Queries Jeffrey Young, Alex Merritt, Se Hoon Shon Advisor: Sudhakar Yalamanchili 4/16/13 Sponsors: Intel, NVIDIA, NSF 2 The Problem Big
More informationLHCb Distributed Conditions Database
LHCb Distributed Conditions Database Marco Clemencic E-mail: marco.clemencic@cern.ch Abstract. The LHCb Conditions Database project provides the necessary tools to handle non-event time-varying data. The
More informationAssessment of LS-DYNA Scalability Performance on Cray XD1
5 th European LS-DYNA Users Conference Computing Technology (2) Assessment of LS-DYNA Scalability Performance on Cray Author: Ting-Ting Zhu, Cray Inc. Correspondence: Telephone: 651-65-987 Fax: 651-65-9123
More informationTriton file systems - an introduction. slide 1 of 28
Triton file systems - an introduction slide 1 of 28 File systems Motivation & basic concepts Storage locations Basic flow of IO Do's and Don'ts Exercises slide 2 of 28 File systems: Motivation Case #1:
More informationsimulation framework for piecewise regular grids
WALBERLA, an ultra-scalable multiphysics simulation framework for piecewise regular grids ParCo 2015, Edinburgh September 3rd, 2015 Christian Godenschwager, Florian Schornbaum, Martin Bauer, Harald Köstler
More informationCray XC Scalability and the Aries Network Tony Ford
Cray XC Scalability and the Aries Network Tony Ford June 29, 2017 Exascale Scalability Which scalability metrics are important for Exascale? Performance (obviously!) What are the contributing factors?
More informationMapping MPI+X Applications to Multi-GPU Architectures
Mapping MPI+X Applications to Multi-GPU Architectures A Performance-Portable Approach Edgar A. León Computer Scientist San Jose, CA March 28, 2018 GPU Technology Conference This work was performed under
More informationAutomatic Identification of Application I/O Signatures from Noisy Server-Side Traces. Yang Liu Raghul Gunasekaran Xiaosong Ma Sudharshan S.
Automatic Identification of Application I/O Signatures from Noisy Server-Side Traces Yang Liu Raghul Gunasekaran Xiaosong Ma Sudharshan S. Vazhkudai Instance of Large-Scale HPC Systems ORNL s TITAN (World
More informationMoneta: A High-performance Storage Array Architecture for Nextgeneration, Micro 2010
Moneta: A High-performance Storage Array Architecture for Nextgeneration, Non-volatile Memories Micro 2010 NVM-based SSD NVMs are replacing spinning-disks Performance of disks has lagged NAND flash showed
More informationAn Introduction to GPFS
IBM High Performance Computing July 2006 An Introduction to GPFS gpfsintro072506.doc Page 2 Contents Overview 2 What is GPFS? 3 The file system 3 Application interfaces 4 Performance and scalability 4
More informationSDA: Software-Defined Accelerator for Large- Scale DNN Systems
SDA: Software-Defined Accelerator for Large- Scale DNN Systems Jian Ouyang, 1 Shiding Lin, 1 Wei Qi, 1 Yong Wang, 1 Bo Yu, 1 Song Jiang, 2 1 Baidu, Inc. 2 Wayne State University Introduction of Baidu A
More informationNVMFS: A New File System Designed Specifically to Take Advantage of Nonvolatile Memory
NVMFS: A New File System Designed Specifically to Take Advantage of Nonvolatile Memory Dhananjoy Das, Sr. Systems Architect SanDisk Corp. 1 Agenda: Applications are KING! Storage landscape (Flash / NVM)
More informationAccelerating sequential computer vision algorithms using commodity parallel hardware
Accelerating sequential computer vision algorithms using commodity parallel hardware Platform Parallel Netherlands GPGPU-day, 28 June 2012 Jaap van de Loosdrecht NHL Centre of Expertise in Computer Vision
More informationMPI Optimizations via MXM and FCA for Maximum Performance on LS-DYNA
MPI Optimizations via MXM and FCA for Maximum Performance on LS-DYNA Gilad Shainer 1, Tong Liu 1, Pak Lui 1, Todd Wilde 1 1 Mellanox Technologies Abstract From concept to engineering, and from design to
More informationPeta-Scale Simulations with the HPC Software Framework walberla:
Peta-Scale Simulations with the HPC Software Framework walberla: Massively Parallel AMR for the Lattice Boltzmann Method SIAM PP 2016, Paris April 15, 2016 Florian Schornbaum, Christian Godenschwager,
More informationThe Computation and Data Needs of Canadian Astronomy
Summary The Computation and Data Needs of Canadian Astronomy The Computation and Data Committee In this white paper, we review the role of computing in astronomy and astrophysics and present the Computation
More informationQuantifying power consumption variations of HPC systems using SPEC MPI benchmarks
Center for Information Services and High Performance Computing (ZIH) Quantifying power consumption variations of HPC systems using SPEC MPI benchmarks EnA-HPC, Sept 16 th 2010, Robert Schöne, Daniel Molka,
More informationOut-Of-Core Sort-First Parallel Rendering for Cluster-Based Tiled Displays
Out-Of-Core Sort-First Parallel Rendering for Cluster-Based Tiled Displays Wagner T. Corrêa James T. Klosowski Cláudio T. Silva Princeton/AT&T IBM OHSU/AT&T EG PGV, Germany September 10, 2002 Goals Render
More informationStructuring PLFS for Extensibility
Structuring PLFS for Extensibility Chuck Cranor, Milo Polte, Garth Gibson PARALLEL DATA LABORATORY Carnegie Mellon University What is PLFS? Parallel Log Structured File System Interposed filesystem b/w
More informationDDN and Flash GRIDScaler, Flashscale Infinite Memory Engine
1! DDN and Flash GRIDScaler, Flashscale Infinite Memory Engine T. Cecchi - September 21 st 2016 HPC Advisory Council 2! DDN END-TO-END DATA LIFECYCLE MANAGEMENT BURST & COMPUTE SSD, DISK & FILE SYSTEM
More informationOVERHEADS ENHANCEMENT IN MUTIPLE PROCESSING SYSTEMS BY ANURAG REDDY GANKAT KARTHIK REDDY AKKATI
CMPE 655- MULTIPLE PROCESSOR SYSTEMS OVERHEADS ENHANCEMENT IN MUTIPLE PROCESSING SYSTEMS BY ANURAG REDDY GANKAT KARTHIK REDDY AKKATI What is MULTI PROCESSING?? Multiprocessing is the coordinated processing
More informationAdvanced Data Placement via Ad-hoc File Systems at Extreme Scales (ADA-FS)
Advanced Data Placement via Ad-hoc File Systems at Extreme Scales (ADA-FS) Understanding I/O Performance Behavior (UIOP) 2017 Sebastian Oeste, Mehmet Soysal, Marc-André Vef, Michael Kluge, Wolfgang E.
More information