EMPRESS: Extensible Metadata PRovider for Extreme-scale Scientific Simulations


EMPRESS: Extensible Metadata PRovider for Extreme-scale Scientific Simulations

Margaret Lawson, Jay Lofstead, Scott Levy, Patrick Widener, Craig Ulmer, Shyamali Mukherjee, Gary Templet, Todd Kordenbrock

SAND2017-12103 C. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA0003525.

Problems Faced

Simulations produce hundreds of terabytes per output, written every few minutes (e.g., XGC1, the Square Kilometer Array radio telescope (SKA)). Storage devices are too slow to sift through all of that output to find the interesting data, yet scientists have specific data they want to retrieve, such as a blob in a fusion reactor or a particular phenomenon in astronomy.

Motivating Question

How can we facilitate scientific discovery from simulations in the exascale age?

EMPRESS Solution

Allow users to label data and retrieve data based on those labels. Features:
- Robust, standard per-process metadata
- User-created metadata that is fully customizable at runtime
- A programmatic query API to retrieve data contents based on metadata
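To make the label-and-retrieve idea concrete, here is a minimal, self-contained Python toy; every name and call signature in it is invented for illustration and is not EMPRESS's actual API:

    # Toy in-memory illustration of "label data, then retrieve by label".
    # Invented for illustration only; EMPRESS itself runs as dedicated
    # server processes backed by a relational store.

    class ToyMetadataStore:
        def __init__(self):
            self.entries = []  # one record of tags per labeled chunk

        def label(self, chunk_name, **tags):
            # Attach arbitrary user-defined tags to a chunk at runtime.
            self.entries.append({"chunk": chunk_name, **tags})

        def query(self, **predicates):
            # Return the chunk names whose tags match every predicate.
            return [e["chunk"] for e in self.entries
                    if all(e.get(k) == v for k, v in predicates.items())]

    store = ToyMetadataStore()
    store.label("density.t100.rank3", blob_detected=True, max_val=4.2)
    store.label("density.t100.rank7", blob_detected=False, max_val=0.3)
    print(store.query(blob_detected=True))  # -> ['density.t100.rank3']

The point of the sketch is the division of labor: writers attach tags as they go, and readers never touch the bulk data until the metadata query has named exactly the chunks they want.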

Previous Solutions

- HDF5 and NetCDF: rudimentary attribute capabilities, basic metadata
- ADIOS: per-process metadata

None of these address efficient attribute searching. FastBit offers data querying based on values, but has very limited support for spatial queries and attributes.

Why Not Use a Key-Value Store?

Custom keys can go a long way, but not far enough. Two problems: inexact matches and custom metadata. Relational databases with indices are radically faster at this kind of searching.
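To illustrate why, here is a small self-contained Python sketch using SQLite as a stand-in metadata store (the table, column, and index names are assumptions for illustration, not EMPRESS's schema). An index turns an inexact, range-style predicate into a tree descent, whereas a key-value store keyed on chunk ID can only answer it by scanning every key:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE chunk_md (
                      chunk_id INTEGER PRIMARY KEY,
                      var      TEXT,
                      max_val  REAL)""")
    # The index makes range predicates a tree descent, not a full scan.
    db.execute("CREATE INDEX idx_max ON chunk_md (var, max_val)")
    db.executemany("INSERT INTO chunk_md VALUES (?, ?, ?)",
                   [(i, "density", i * 0.1) for i in range(10000)])

    # Inexact match: "all density chunks whose max exceeds a threshold".
    # A key-value store keyed on chunk_id would scan all 10,000 entries.
    hot = db.execute("SELECT chunk_id FROM chunk_md "
                     "WHERE var = ? AND max_val > ?",
                     ("density", 999.0)).fetchall()
    print(len(hot), "matching chunks")

With the index in place, the database answers the predicate in roughly logarithmic time plus the number of matches; the same query over a flat key-value namespace degenerates into a scan of every key.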

SIRIUS Architecture

[Architecture diagram.] Applications sit atop an I/O API. Beneath it, cross-layer services and storage and I/O system services (refactoring, reduction, QoS, resource management, data placement and movement, migration, purging, and other plugins) manage the storage resources, which are Ceph-managed: NVRAM, a parallel file system, campaign storage, and long-term storage. EMPRESS maintains the description of the data.

SIRIUS Workflow: Write Process

[Workflow diagram.] The simulation's output passes through lightweight analysis, which generates tags; the metadata plus tags go to EMPRESS, while the data itself goes to Ceph.

SIRIUS Workflow: Read Process

[Workflow diagram.] 1. The user issues a query through ADIOS. 2. ADIOS forwards it to EMPRESS's programmatic query API. 3. EMPRESS returns the matching object names. 4. ADIOS passes the object names to Ceph. 5. Ceph returns the data. 6. The data is delivered to the user.
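Spelled out in code, the read path might look like the sketch below; empress_query and ceph_get are hypothetical stand-ins for the real EMPRESS and Ceph client calls, stubbed here so the example runs on its own:

    # Hypothetical end-to-end read path. empress_query and ceph_get are
    # invented stand-ins for the real EMPRESS and Ceph client calls,
    # stubbed so the sketch is self-contained.

    def empress_query(query):
        # Steps 2-3: EMPRESS resolves the query to matching object names.
        catalog = {"blob_detected": ["density.t100.rank3"]}
        return catalog.get(query, [])

    def ceph_get(name):
        # Steps 4-5: Ceph returns the stored object for a given name.
        return "<bytes of %s>" % name

    def read_matching_data(user_query):
        # Step 1: the user issues a query through the ADIOS read path.
        object_names = empress_query(user_query)    # steps 2-3
        return [ceph_get(n) for n in object_names]  # steps 4-6

    print(read_matching_data("blob_detected"))

The key design point the steps illustrate: the bulk data in Ceph is never touched until the metadata service has narrowed the request down to named objects.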

High-Level Design

[Design diagram.] On each simulation node, simulation processes write through ADIOS, which talks to the EMPRESS API and the Ceph API; a programmatic query API provides access to the dedicated EMPRESS server processes.

Faodail

[Diagram slide on the Faodail infrastructure.]

Storage: Tracked Metadata

- Dataset information: application, run, and timestep information
- Variable information: catalogs the types of data stored for an output operation
- Variable chunk information: a subdivision of the simulation space associated with a particular variable
- Custom metadata class: a metadata category the user adds for a particular dataset (e.g., max)
- Custom metadata instance: e.g., a flag for a chunk, or a bounding box spanning chunks
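These categories map naturally onto relational tables. The following is a plausible schema sketch in SQLite; the table and column names are assumptions for illustration, not EMPRESS's published schema:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
    -- Dataset information: application, run, and timestep.
    CREATE TABLE dataset (id INTEGER PRIMARY KEY,
                          app TEXT, run TEXT, timestep INTEGER);
    -- Variable information: what was stored for an output operation.
    CREATE TABLE variable (id INTEGER PRIMARY KEY,
                           dataset_id INTEGER REFERENCES dataset(id),
                           name TEXT, dtype TEXT);
    -- Variable chunk: one subdivision of the simulation space.
    CREATE TABLE chunk (id INTEGER PRIMARY KEY,
                        variable_id INTEGER REFERENCES variable(id),
                        x0 INT, y0 INT, z0 INT, x1 INT, y1 INT, z1 INT);
    -- Custom metadata class: a user-added category, e.g. "max" or "flag".
    CREATE TABLE md_class (id INTEGER PRIMARY KEY,
                           dataset_id INTEGER REFERENCES dataset(id),
                           name TEXT);
    -- Custom metadata instance: a value of a class attached to a chunk
    -- (or, via a bounding box, to a region spanning chunks).
    CREATE TABLE md_instance (id INTEGER PRIMARY KEY,
                              class_id INTEGER REFERENCES md_class(id),
                              chunk_id INTEGER REFERENCES chunk(id),
                              value TEXT);
    """)
    print([r[0] for r in db.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")])

Under this kind of layout, an index over md_instance(class_id, value) would make class-based searches, such as "all chunks flagged by a given class," efficient.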

Testing Goals

Is EMPRESS scalable?
- Number of client processes: 1024-2048
- Effect of client-to-server ratio; ratios tested: 32:1 and 128:1
- Overhead of including a large number of custom metadata items: 0 or 10 custom metadata classes, with on average 2.641 custom metadata instances per chunk

Testing Goals (Continued)

Proof of concept: can EMPRESS efficiently support
- Common writing operations: 2 datasets written, each with 10 globally distributed 3-D arrays
- Common reading operations: 6 different read patterns that scientists frequently use (Lofstead et al., "Six Degrees of Scientific Data")
- A broad range of custom metadata: 10 custom metadata classes, including max, flag, and bounding box (two 3-D points)
- Scientific validity: a minimum of 5 runs per configuration on 3 computing clusters: Serrano (1,122 nodes), Skybridge (1,848 nodes), and Chama (1,232 nodes)

Testing: Query Times

EMPRESS efficiently supports a wide variety of operations, including custom metadata operations.

Testing: Chunk Retrieval Time

Most time is spent waiting for the server to respond; there is room for improvement in the Faodail infrastructure.

Testing: Writing and Reading Time

- Good scalability for a fixed client-server ratio
- No significant overhead from adding custom metadata
- The client-server ratio greatly affects performance

Future Work

Increasing EMPRESS flexibility, efficiency, and scalability:
- Support more kinds of queries
- A different metadata distribution?

Acknowledgements

Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA0003525. This work was supported under U.S. Department of Energy National Nuclear Security Administration ATDM project funding. This work was also supported by the U.S. Department of Energy Office of Science under the SSIO grant series (SIRIUS project) and the Data Management grant series (Decaf project), program manager Lucy Nowell.
