Abstract Storage: Moving file-format-specific abstractions into petabyte-scale storage systems. Joe Buck, Noah Watkins, Carlos Maltzahn & Scott Brandt



Introduction
- Current HPC environments separate computation from storage
- Traditional focus on computation, not I/O
- Applications require I/O architecture independence
- Many scientific applications are data intensive
- Performance is increasingly limited by data movement

HPC Architecture
[Diagram courtesy of Rob Ross, Argonne National Laboratory]

HPC Architecture
- Current bottleneck: in the controllers (hardware bottleneck)
[Diagram courtesy of Rob Ross, Argonne National Laboratory]

HPC Architecture
- Future bottleneck: the network between I/O nodes and storage nodes (hardware bottleneck)
[Diagram courtesy of Rob Ross, Argonne National Laboratory]

Approach: Move functions closer to data
- Use spare CPU cycles at intelligent storage nodes
- Replace communication with CPU cycles
- Provide storage interfaces with higher abstractions
- Enable file system optimizations due to knowledge of data structure
- Do this for a small selection of data structures
- This is not another object-oriented database!
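The trade of communication for CPU cycles can be illustrated with a minimal sketch. Everything here is hypothetical and not part of the talk: `StorageNode`, `read_bytes`, and `run_max` are made-up names, and the "network" is just a byte count. The point is only that running an operator on the storage node returns a small result instead of the whole data set.

```python
import struct

class StorageNode:
    """Hypothetical storage node holding float64 values as a raw byte stream."""

    def __init__(self, values):
        self.blob = struct.pack(f"<{len(values)}d", *values)

    def read_bytes(self):
        # Conventional path: ship every stored byte to the client.
        return self.blob

    def run_max(self):
        # Abstract-storage path: spend CPU cycles on the node,
        # then ship a single 8-byte result.
        count = len(self.blob) // 8
        values = struct.unpack(f"<{count}d", self.blob)
        return struct.pack("<d", max(values))

node = StorageNode([float(i) for i in range(1000)])

# Client-side computation: move all the data, then reduce it.
raw = node.read_bytes()
client_side_max = max(struct.unpack(f"<{len(raw) // 8}d", raw))
bytes_moved_client_side = len(raw)

# Node-side computation: reduce on the node, move only the answer.
reply = node.run_max()
node_side_max = struct.unpack("<d", reply)[0]
bytes_moved_node_side = len(reply)
```

Both paths produce the same answer; they differ only in how many bytes cross the (simulated) network.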

Why Now?
- Parallel file systems are moving more intelligence into storage nodes anyway
- Advances in performance management and virtualization
- Moving bytes is slated to be a dominant cost in exascale systems
- Scientific file formats and operators are increasingly standard
  - NetCDF, HDF
- Structured abstractions have seen recent success
  - BigTable, MapReduce, CouchDB

Abstract Storage
- Storage as an Abstract Data Type (ADT)
- An ADT decouples interface from implementation
- Only a few ADTs are necessary, e.g.:
  - Dictionary (key/value pairs)
  - Hypercube (coordinate systems)
  - Queue
- Optimize each one for each parallel architecture
  - Data placement
  - Performance management
  - Buffer cache management (incl. prefetching)
  - Coherence
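As a minimal sketch of how an ADT decouples interface from implementation, the example below defines one dictionary interface with two interchangeable backends. The class names (`Dictionary`, `HashDictionary`, `SortedDictionary`) and the helper `store_readings` are illustrative assumptions, not names from the talk or from Ceph.

```python
from abc import ABC, abstractmethod
import bisect

class Dictionary(ABC):
    """The ADT: callers see only put/get, never the storage layout."""

    @abstractmethod
    def put(self, key, value): ...

    @abstractmethod
    def get(self, key): ...

class HashDictionary(Dictionary):
    """Backend 1: hash table, no key ordering."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value

    def get(self, key):
        return self._items.get(key)

class SortedDictionary(Dictionary):
    """Backend 2: keys kept sorted, which would suit range scans."""

    def __init__(self):
        self._keys, self._values = [], []

    def put(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value          # overwrite existing key
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def get(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

def store_readings(d):
    # Application code is written once against the ADT.
    d.put("temp", 21.5)
    d.put("pressure", 1013)
    d.put("temp", 22.0)
    return d.get("temp"), d.get("missing")
```

The application function runs unchanged on either backend, which is the property that would let a storage system pick placement or caching per architecture without touching client code.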

ADTs and Scientific Data
- Scientific data is normally multi-dimensional, lending itself well to this approach
- Multi-dimensional and hierarchical structures are readily mapped onto data types
- Multiple structures can be mapped onto (portions of) the same data for more efficient access
- Operate on the appropriate structure (matrix, row, element, etc.)
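A small sketch of mapping several structures onto the same data: one flat, row-major sequence is exposed as a matrix, a row, a column, or a single element. The class `Matrix2D` and its method names are hypothetical and chosen only for illustration.

```python
class Matrix2D:
    """Hypothetical view that maps matrix, row, column, and element
    structures onto one flat, row-major sequence of values."""

    def __init__(self, rows, cols, flat):
        if len(flat) != rows * cols:
            raise ValueError("flat data does not match rows * cols")
        self.rows, self.cols, self.flat = rows, cols, flat

    def element(self, r, c):
        # One value at a computed offset.
        return self.flat[r * self.cols + c]

    def row(self, r):
        # A row is one contiguous run in row-major order.
        return self.flat[r * self.cols:(r + 1) * self.cols]

    def column(self, c):
        # A column is a strided walk over the same data.
        return self.flat[c::self.cols]

# 3 x 4 matrix stored as the values 0..11
m = Matrix2D(3, 4, list(range(12)))
```

A storage layer that knows these shapes could, for instance, treat a row read as one contiguous request and a column read as a strided one.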

Implementation Challenges
- Programming model for implementing ADTs
- Everything is based on byte streams
  - Current storage APIs (e.g. POSIX)
  - Current file system subsystems
    - Buffer cache
    - Striping strategies
  - Storage node interfaces
- Need awareness of structured data
  - New interfaces at various storage layers

Prototype: Ceph Doodle
- Focus: programming model for implementing ADTs
- Construction and test framework for:
  - Storage abstractions
  - ADT implementations
  - Programming models (flexibility, ease of use)
- Based on an object-based parallel file system architecture (e.g. Ceph)

Ceph Doodle Features
- Rapid prototyping:
  - Uses an RPC mechanism
  - Written in Python
- Support for plugins for different ADTs
  - Byte stream (implemented as storage objects)
  - Dictionary (implemented as skip lists)

Ceph Doodle Overview
- Client application: calls ADT Operation( )
  - Clients use application-specific interfaces
- Data type client: turns ADT Operation( ) into RPC_X(Op, ObjID, Context), RPC_Y(Op, ObjID, Context), RPC_Z(Op, ObjID, Context)
  - RPC to OSD with object striping and caching strategy
  - Data types are cross-cutting system modules
  - Striping and caching are optimized per data type
- OSD RPC: invokes ADT Operation(Object, Context)
  - Mappings route ADT RPCs to storage nodes
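The flow in this overview can be sketched roughly as follows. This is not the Ceph Doodle code: the class names, the `rpc` method, the `dict.N` object IDs, and the CRC-based striping rule are all assumptions made for illustration, and plain in-process calls stand in for real RPC.

```python
import zlib

class DictionaryPlugin:
    """Server-side ADT logic, applied to one storage object (hypothetical)."""

    def handle(self, obj, op, context):
        if op == "put":
            obj[context["key"]] = context["value"]
            return None
        if op == "get":
            return obj.get(context["key"])
        raise ValueError(f"unknown op: {op}")

class OSD:
    """Stand-in for an object storage device that hosts ADT plugins."""

    def __init__(self, plugins):
        self.plugins = plugins   # data type name -> plugin
        self.objects = {}        # object ID -> object state

    def rpc(self, adt, op, obj_id, context):
        # Mirrors the slide's RPC(Op, ObjID, Context) shape.
        obj = self.objects.setdefault(obj_id, {})
        return self.plugins[adt].handle(obj, op, context)

class DictionaryClient:
    """Data type client: owns the striping rule for this ADT."""

    def __init__(self, osds, stripes=4):
        self.osds = osds
        self.stripes = stripes

    def _locate(self, key):
        # Assumed striping strategy: hash the key to a stripe object,
        # then map the stripe to an OSD.
        stripe = zlib.crc32(key.encode("utf-8")) % self.stripes
        return self.osds[stripe % len(self.osds)], f"dict.{stripe}"

    def put(self, key, value):
        osd, obj_id = self._locate(key)
        osd.rpc("dictionary", "put", obj_id, {"key": key, "value": value})

    def get(self, key):
        osd, obj_id = self._locate(key)
        return osd.rpc("dictionary", "get", obj_id, {"key": key})

osds = [OSD({"dictionary": DictionaryPlugin()}) for _ in range(2)]
client = DictionaryClient(osds)
for name, value in [("alpha", 1), ("beta", 2), ("gamma", 3)]:
    client.put(name, value)
```

The application only ever calls `put` and `get`; the striping decision lives in the data type client and the operation logic lives in the plugin on the storage node.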

Dictionary Implementation: Skip lists
[Figure: skip list with levels 0 to 4 running from head to tail, holding the keys 9, 23, 1024, 1025]
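For readers unfamiliar with the structure, here is a minimal skip list dictionary. It is a textbook-style sketch rather than the prototype's implementation; `SkipListDict`, its methods, and the coin-flip probability of 0.5 are assumptions, while the five levels (0 to 4) follow the figure.

```python
import random

class _Node:
    def __init__(self, key, value, level):
        self.key, self.value = key, value
        self.forward = [None] * (level + 1)   # one next-pointer per level

class SkipListDict:
    MAX_LEVEL = 4   # levels 0..4, as in the figure

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._head = _Node(None, None, self.MAX_LEVEL)
        self._level = 0   # highest level currently in use

    def _random_level(self):
        level = 0
        while level < self.MAX_LEVEL and self._rng.random() < 0.5:
            level += 1
        return level

    def _predecessors(self, key):
        # For each level, the last node whose key is smaller than `key`.
        update = [self._head] * (self.MAX_LEVEL + 1)
        node = self._head
        for i in range(self._level, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def put(self, key, value):
        update = self._predecessors(key)
        candidate = update[0].forward[0]
        if candidate is not None and candidate.key == key:
            candidate.value = value           # overwrite existing key
            return
        level = self._random_level()
        self._level = max(self._level, level)
        node = _Node(key, value, level)
        for i in range(level + 1):            # splice in at each level
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def get(self, key, default=None):
        candidate = self._predecessors(key)[0].forward[0]
        if candidate is not None and candidate.key == key:
            return candidate.value
        return default

    def keys(self):
        node = self._head.forward[0]
        while node is not None:
            yield node.key
            node = node.forward[0]

d = SkipListDict(seed=1)
for k in (1024, 9, 1025, 23):
    d.put(k, str(k))
```

Level 0 is an ordinary sorted linked list, so iterating it yields the keys in order; the upper levels only shorten searches.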

Splitting skip lists across nodes
[Figure: the same skip list (levels 0 to 4, head to tail, keys 9, 23, 1024, 1025) split across storage nodes]
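The slide does not say how the list is divided, so the sketch below assumes one plausible scheme: range partitioning by key, with each partition living on a different node. Sorted Python lists stand in for the per-node skip lists, and `RangePartitionedDict`, `split_keys`, and `split` are hypothetical names.

```python
import bisect

class RangePartitionedDict:
    """Assumed scheme: partition i holds keys k with
    split_keys[i-1] <= k < split_keys[i]; each partition would
    live on its own storage node."""

    def __init__(self, split_keys):
        self.split_keys = sorted(split_keys)
        # One (keys, values) pair of sorted lists per partition.
        self.partitions = [([], []) for _ in range(len(self.split_keys) + 1)]

    def _index(self, key):
        return bisect.bisect_right(self.split_keys, key)

    def put(self, key, value):
        keys, values = self.partitions[self._index(key)]
        i = bisect.bisect_left(keys, key)
        if i < len(keys) and keys[i] == key:
            values[i] = value
        else:
            keys.insert(i, key)
            values.insert(i, value)

    def get(self, key):
        keys, values = self.partitions[self._index(key)]
        i = bisect.bisect_left(keys, key)
        if i < len(keys) and keys[i] == key:
            return values[i]
        return None

    def split(self, index, split_key):
        # Cut one partition in two at split_key, e.g. to move the
        # upper half to a new node.
        if split_key in self.split_keys or self._index(split_key) != index:
            raise ValueError("split_key must fall inside the partition")
        keys, values = self.partitions[index]
        cut = bisect.bisect_left(keys, split_key)
        self.partitions[index:index + 1] = [
            (keys[:cut], values[:cut]),
            (keys[cut:], values[cut:]),
        ]
        bisect.insort(self.split_keys, split_key)

d = RangePartitionedDict([100])
for k in (9, 23, 1024, 1025):
    d.put(k, str(k))
d.split(1, 1025)   # keys >= 1025 move to their own partition
```

Because each partition is sorted and the split points are ordered, a lookup first picks a node with one binary search and then searches only that node's list.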

Future Work
- Building on top of Ceph
  - New dynamically loadable object libraries
- Redesigning caching
  - Aware of data structure boundaries vs. pages
  - Prefetching = access patterns = ADT parameters
- Rethinking striping strategies
- Unified views supported by a virtual ADT layer
- Embedding versioning and provenance capturing into the file system

Thank you buck@cs.ucsc.edu