EIOW Exa-scale I/O workgroup (exascale10)


EIOW Exa-scale I/O Workgroup (exascale10) Meghan McClelland, Peter Braam LUG 2013

Problem statement: large-scale data management is fundamentally broken, but functions somewhat successfully as an awkward patchwork. This talk covers current practices, future needs, what is wrong with current approaches, and what framework can be built to handle this. The Exa-scale IO Workgroup (EIOW) has worked with application developers and storage experts and made exciting progress. 2

EIOW (exascale10) mission: let HPC application experts explain requirements for next-generation storage; architect, design, and implement an open source set of exascale I/O middleware. So far around 40 organizations participate. 3

EIOW participants (apologies, some probably omitted): University of Paderborn, University of Mainz, Barcelona Supercomputing Center (BSC), DDN, Fujitsu, TU Dresden, University of Tsukuba, Hamburg University, TACC, NCSA, The HDF Group, MPG/RZG, Jülich, Goethe Universität Frankfurt, ZIH, DKRZ, NetApp, Tokyo Institute of Technology, Micron, Xyratex, DSSD, Sandia, PNNL, Cray, DOE, PSC, LRZ, HLRS, CEA, T-Platforms, ParTec-EOFS, STFC, Intel, NEC 4

EIOW is an open effort: a European Open File System (EOFS) supported workgroup since inception, and a core EOFS project (like Lustre) since Sep 2012. Everything is being published on the web, and is actively being copied and amended. We will move in the direction of Internet Engineering Task Force (IETF) style controlled openness. 5

EIOW intends to be a ubiquitous middleware: an agreed, eventually standardized API for applications and data management. We hope to be an implementation of choice for researchers to study, amend, influence and change; such research projects are now numerous. It also provides a storage access API, allowing storage vendors to bolt it onto their favorite data object and metadata stores. 6
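The standardized API did not yet exist at the time of this talk, so the following is only a minimal sketch of what a vendor-neutral storage access layer of this kind could look like; every name here (eiow_container_t, eiow_ops_t, eiow_register_backend) is hypothetical and not an actual EIOW interface.

/* Hypothetical sketch only: EIOW has not published this API.
 * Illustrates a vendor-neutral object/metadata access layer that a
 * storage vendor could back with its own object and metadata stores. */
#include <stddef.h>

typedef struct eiow_container eiow_container_t;   /* opaque container handle */

typedef struct eiow_ops {
    int (*container_open)(const char *uri, eiow_container_t **out);
    int (*obj_put)(eiow_container_t *c, const char *key,
                   const void *buf, size_t len);
    int (*obj_get)(eiow_container_t *c, const char *key,
                   void *buf, size_t len, size_t *nread);
    int (*md_set)(eiow_container_t *c, const char *key,
                  const char *attr, const char *value);
    int (*container_close)(eiow_container_t *c);
} eiow_ops_t;

/* A vendor "bolts on" its backend by registering its own ops table. */
int eiow_register_backend(const char *name, const eiow_ops_t *ops);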

Application middleware issues: there are 100s of middleware packages, sometimes layered. A typical stack is I/O middleware layers (e.g. HDF5) on top of MPI-IO, on top of a parallel file system, on top of RAID and a block storage API. Application developers regard these packages as very useful and convenient, but they are generally very difficult to get working well. This is not ready for future hardware; the stack isn't working well. 7
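As a concrete illustration of that layering, the sketch below writes a small dataset with parallel HDF5 over its MPI-IO driver; every call funnels through MPI-IO into the parallel file system underneath. A minimal sketch, assuming an MPI-enabled HDF5 build; the file and dataset names are arbitrary.

/* Minimal parallel HDF5 write: HDF5 -> MPI-IO -> parallel file system. */
#include <mpi.h>
#include <hdf5.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    /* Tell HDF5 to use the MPI-IO virtual file driver. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);

    hid_t file = H5Fcreate("demo.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    hsize_t dims[1] = {1024};
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dset  = H5Dcreate2(file, "data", H5T_NATIVE_DOUBLE, space,
                             H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    double buf[1024] = {0};
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);

    H5Dclose(dset); H5Sclose(space); H5Fclose(file); H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}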

Middleware issues: a proliferation of middleware packages (100s), many with a great deal of overlap (MPI-IO, PLFS, HDF5, NetCDF, Hercule, ...). Many have strengths and weaknesses; e.g. HDF5 is very highly regarded. Because there is no common stack they are nearly impossible to debug, and they re-implement major parts of file systems, which leads to inefficiencies, incorrectness, and huge code bases. It is nearly impossible to define HA properly. Neither file systems nor middleware are ready for new hardware, particularly memory-class storage. 8

From 10PF to 100PF to 1EF:
10PF: handled by large (mostly Lustre) storage systems at 1 TB/sec with several billions of files.
100PF: flash cache approach at 10 TB/sec; flash takes the bursts, disks are more continuously used. Takes ~20,000 disks (0.5 MW, lots of heat, lots of failed drives); probably a metadata server becomes a scalability limit.
1EF, the gap: the paradigm appears to break. 100K drives is not acceptable, and most data can no longer make it to disks. What data management can help? 9
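The drive count can be sanity-checked with a rough estimate; the per-drive rate and the sustained drain bandwidth below are assumptions for illustration, not figures from the slide. If the disk tier must drain de-burstified traffic at about 3 TB/s and a hard drive sustains roughly 150 MB/s, then

\[
N_{\mathrm{disks}} \approx \frac{3\ \mathrm{TB/s}}{150\ \mathrm{MB/s}}
                   = \frac{3\times 10^{12}\ \mathrm{B/s}}{1.5\times 10^{8}\ \mathrm{B/s}}
                   = 2\times 10^{4} \approx 20{,}000 .
\]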

Big Data in the Military Warfighter Problem: [chart] a notional gap between sensor data volumes (Global Hawk data, Predator UAV video, UUVs, Firescout VTUAV data, future sensors X/Y/Z), growing from ~10^12 toward 10^18, and GIG data capacity (services, transport & storage), plotted from 2000 through Today, 2010, and 2015 & beyond (Large Data JCTD). 10

Future needs and technology revolutions: file system clients will have ~10,000 cores; architectures will be heterogeneous; flash and/or PCM storage leads to tiered storage. Anti-revolution: disks will only be a bit faster than today. Hence: tiered storage, in part memory-class storage; data management to move less data to drives; scale performance 100x from today. 11

Pre-SQL (1972) databases were in this situation; we need manageable APIs for unstructured data. EIOW is an emerging framework providing rich I/O and management interfaces: a platform to build layered I/O applications efficiently and correctly (e.g. HDF5 metadata without layering it on other file systems), logging and analytics through the stack, transactions and data integrity through the stack, and not a 1980s approach to availability. What we've seen is that most requirements can be addressed by adding plugins to a base system. 12

Component decomposition: applications, tools and the job scheduler sit on APIs for I/O, schema use & definition, management and ILM. Storage API: objects & a distributed DB. Schema & distributed containers: internal format descriptions, tables, relationships, layout, performance, ILM. Data collection & reduction: operation-log (OL) editor, simulation drivers, simulation engine. Sample schemas: NetCDF, HDF5, POSIX, Hadoop. Configuration management: OL, analysis data (AD), visualization interface. Collections: I/O, MD, transactions, schema, resource mgmt, FDMI. Availability is an FDMI plugin; sample FDMI plugins: HSM, migration, ILM, FSCK, search/IDX. Interfaces: RAS, adaptive APIs, machine learning (real time). Simulation and modeling: batched or delayed processing; input is operation logs (OL) and analysis data (AD). 13
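The slide names FDMI plugins (HSM, migration, ILM, FSCK, search/IDX) without defining the interface. Below is a purely hypothetical C sketch of what an event-driven plugin registration of this kind could look like; every identifier (eiow_event_t, eiow_fdmi_plugin_t, eiow_fdmi_register) is invented for illustration and is not an actual EIOW API.

/* Hypothetical sketch: an FDMI-style plugin observes filtered storage
 * events (e.g. from the operation log) and reacts asynchronously,
 * like an HSM mover or an index builder. */
#include <stdint.h>

typedef enum { EIOW_EV_CREATE, EIOW_EV_WRITE, EIOW_EV_DELETE } eiow_event_kind_t;

typedef struct eiow_event {
    eiow_event_kind_t kind;
    uint64_t          object_id;
    uint64_t          offset, length;        /* valid for EIOW_EV_WRITE */
} eiow_event_t;

typedef struct eiow_fdmi_plugin {
    const char *name;                        /* e.g. "hsm", "search-idx" */
    int  (*filter)(const eiow_event_t *ev);  /* returns 1 if interested */
    int  (*handle)(const eiow_event_t *ev, void *priv);
    void *priv;
} eiow_fdmi_plugin_t;

int eiow_fdmi_register(const eiow_fdmi_plugin_t *plugin);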

Non-blocking availability: failures will be common in very large systems. Failover (wait until the resource recovers) doesn't work well. Instead, focus on availability: on no reply, change the resource, adapt the layout, and clean up asynchronously. 1. The client sees the failure of a data server. 2. The client, with the MDS, changes the layout. 3. The client writes using the new layout. The failed server's data can be rebuilt asynchronously. 14
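A minimal client-side sketch of this non-blocking approach, under stated assumptions: the slides do not specify the design at this level, and the names below (layout_t, write_stripe, request_new_layout, schedule_async_rebuild) are hypothetical.

/* Hypothetical sketch of non-blocking availability on the client side:
 * on a failed write, ask for a new layout that excludes the dead server,
 * retry there, and let the system rebuild the old data asynchronously. */
#include <stddef.h>

typedef struct layout layout_t;              /* opaque: maps stripes to servers */

extern int  write_stripe(layout_t *l, const void *buf, size_t len);  /* 0 = ok */
extern layout_t *request_new_layout(layout_t *old);  /* MDS excludes failed server */
extern void schedule_async_rebuild(layout_t *old);   /* cleanup off the I/O path */

int resilient_write(layout_t **layout, const void *buf, size_t len, int max_retries)
{
    for (int attempt = 0; attempt <= max_retries; attempt++) {
        if (write_stripe(*layout, buf, len) == 0)
            return 0;                          /* step 3: write succeeded */
        layout_t *fresh = request_new_layout(*layout);   /* step 2: adapt layout */
        if (fresh == NULL)
            break;
        schedule_async_rebuild(*layout);       /* asynchronous cleanup/rebuild */
        *layout = fresh;                       /* retry with the new layout */
    }
    return -1;                                 /* no layout avoided the failure */
}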

HA 15

Workshops. Requirements gathering: 1st workshop (Munich, 02/12), 2nd workshop (Portland, 4/12), 3rd workshop (Tokyo, 5/12). Architectural design and funding: 4th workshop (Barcelona, 9/12). Alternative approaches: 5th workshop (Salt Lake City, 11/12). Design discussion of code components: 6th workshop (San Jose, 2/13). Next workshop: Leipzig, Germany, June 20th 2013, on implementation-level design and future efforts. 16

Current efforts: community phone calls and a new web site. Prototype code is being developed: the core system (schemas, interfaces, HA) and simulation / monitoring. Next steps: evaluate ideas with prototypes, research proposals, and evaluation in next-generation systems. 17

Conclusion: a framework like SQL for HPC data / big data is 40 years overdue. We aim to change that. 18

Thank You Meghan_Mcclelland@xyratex.com EIOW Website: https://sites.google.com/a/eiow.org/exascale-io-workgroup/