DeclStore: Layering is for the Faint of Heart

Size: px
Start display at page:

Download "DeclStore: Layering is for the Faint of Heart"

Transcription

1 DeclStore: Layering is for the Faint of Heart [HotStorage, July 2017] Noah Watkins, Michael Sevilla, Ivo Jimenez, Kathryn Dahlgren, Peter Alvaro, Shel Finkelstein, Carlos Maltzahn

2 Layers on layers on layers Application Special App Object Block File Logging Key-value Special App Middleware Storage Interfaces POSIX Fancy Storage Unified Storage 2

3 Application-specific storage systems Application Special App Object Block File Logging Key-value Special App Middleware Storage Interfaces POSIX Fancy Storage Unified Storage 3

4 Unified storage systems Application Special App Object Block File Logging Key-value Special App Middleware POSIX Examples OnTap Fancy Storage Swift Ceph GPFS EMC Storage Interfaces Unified Storage 4

5 Eliminating layers: big challenges, big rewards Big Question If unified storage becomes the new normal, what new challenges will need to be addressed in system design? Talk Outline Object Block File Logging Key-value Special App Storage Interfaces Unified Storage Motivating unified storage Storage interface design space Declarative storage interfaces 5

6 Motivation: systems trend towards generality Technology Trend Example Specialized unstructured data management systems JSON data type in relational database management systems Specialized real-time and embedded systems Embedded Linux and RT_PREEMPT Specialized storage systems Embedded Ceph and unified system 6

7 Motivation: breaking the narrow waist model User Defined Classes Examples Locking, Logging, Concurrency control, Metadata management Remote compute Developers willing to break layers and use non-standard APIs 7

8 Why in practice do people break down layers? Application I in AP e Cod and h t pamiddleware POSIX ef cies n e fici Special App es, rvic risks e s f t, n o, cos o i t lica ioning p u D rovis r-p ove Fancy Storage Object Block File Logging Key-value Special App Storage Interfaces Unified Storage 8

9 Unified storage has its own set of challenges Special App es, rvic risks e s f t, n o, cos o i t lica ioning p u D rovis r-p ove Logging Key-value Special App ty afe ies) S, os ficienc Q, f ility th ine b a t Por ode pa c s Storage u l Interfaces p ( Fancy Storage Unified Storage Application I in AP e Cod and h t pamiddleware POSIX ef cies n e fici Object Block File 9

10 Exploring design challenges in unified storage Application ies e Cod PI i A and h t a pmiddleware POSIX c cien i f f ne, Special App ices sks v r e ri of s cost, n atio ning, c i l Dup rovisio r-p ove Fancy Storage Object Block File Logging Key-value Special App Storage Interfaces Unified Storage 10

11 Malacology: programmable interface research platform Target storage interface (Goal) Composed, generic service glue layer Internal sys services (Building blocks) Malacology: A Programmable Storage System, M. Sevilla, N. Watkins, I. Jimenez, P. Alvaro, S. Finkelstein, J. LeFevre, C. Maltzahn, EuroSys 17 Mantle: A Programmable Metadata Load Balancer for the Ceph File System, M. Sevilla, N. Watkins, C. Maltzahn, I. Nassi, S. Brandt, et. al, SC 15 CORFU: A Shared Log Design for Flash Clusters, Mahesh Balakrishnan, Dahlia Malkhi, Vijayan Prabhakaran, and Ted Wobber, Michael Wei, et. al, NSDI 12 11

12 Keeping pace with evolving storage systems Hard-wired prototype implementations How to take advantage of... Software upgrades New hardware Additional services Performance features 12

13 Example: building the CORFU shared-log interface CORFU1 is a dist. shared-log High-performance design ?? Append and random reads I/O parallelism (striping) Soft-state network counter Challenges Find a good mapping Performance optimizations Minimal maintenance CORFU: A Shared Log Design for Flash Clusters, Mahesh Balakrishnan, Dahlia Malkhi, Vijayan Prabhakaran, and Ted Wobber, Michael Wei, et. al, NSDI 12 13

14 Example: building CORFU distributed shared-log Implementation options Partitioning Log partitioning?: 1-1, striping, etc... 14

15 Example: building CORFU distributed shared-log Implementation options Partitioning Metadata Log partitioning?: 1-1, striping, etc... 15

16 Example: building CORFU distributed shared-log Implementation options 0 Partitioning Metadata I/O interfaces Log partitioning?: 1-1, striping, etc... ByteStream, K/V FS (xfs, btrfs) BlueStore K/V (Rocks, LMDB) 16

17 Example: building CORFU distributed shared-log Implementation options 0 Partitioning Metadata I/O interfaces Log partitioning?: 1-1, striping, etc... 2x Constructed 4 versions 2 partitioning * 2 I/O interfaces Partitioning (1-1, striping) I/O interface (bytestream, K/V) ByteStream, K/V FS (xfs, btrfs) 2x BlueStore K/V (Rocks, LMDB) 17

18 Example: building CORFU distributed shared-log Implementation options Partitioning Metadata I/O interfaces Log partitioning?: 1-1, striping, etc... 2x Constructed 4 versions 0 2 partitioning * 2 I/O interfaces Partitioning (1-1, striping) I/O interface (bytestream, K/V) Hard-wired implementations Few 1000 s LOC Manually optimize and switch between approaches ByteStream, K/V FS (xfs, btrfs) 2x BlueStore K/V (Rocks, LMDB) 18

19 Example: shared-log performance toss-up in 2014 Four implementations Ceph version circa 2014 Graph takeaways Similar top performers Our claim Clear performance losers Performance Comparison of 4 Designs Appends / Sec Ceph 2014 Select simpler implementation Added complexity for no benefit What is complexity? Lines of code Conceptual 19

20 Example: clear design choice in 2016 Takeaway: A reasonable choice in 2014 would be a poor choice in 2016 after a simple upgrade Performance Comparison of 4 Designs Appends / Sec Same implementations Same hardware / benchmark Newer version of Ceph Clear performance winner in 2016 Ceph 2014 Appends / Sec Ceph

21 Problem: navigation of large design space interfaces / data models you are here heterogeneous hardware / memories features & tunables (>1000 in Ceph) time / versions 21

22 DeclStore: express storage interfaces declaratively Freedom from storage system and domain expertise Abstractions over storage services and interfaces The point is we need a high-level language Many possible mappings Many degrees of freedom Generate storage interface implementations Exploit relational database optimization research 22

23 DeclStore: building on a strong foundation Based on Bloom programming language High-level declarative language (Alvaro, CIDR `11) Many degrees of freedom on reordering Abstract relational data model Programs are queries Subject to optimization Evidence this is possible... LogicBlox (Aref, SIGMOD 2015) Dedalus (Alvaro, et. al, Datalog 2010) Declarative Networking (Loo, SIGMOD 2006) 23

24 Example: a declarative specification for CORFU System state modeled as a set of relations Interfaces are queries over a request stream Persistent state Input/Output Operate on persistent state Collection operations Full specification Just another slide s worth Compare 1K s LOC C++ # persistent relations table :epoch, [:epoch] table :log, [:pos] => [:state, :data] # interfaces are also like tables interface input, :op, [:type, :pos, :epoch] => [:data] interface output, :ret, [:type, :pos, :epoch] => [:retval] bloom :write do temp :valid_write <= write_op.notin(found_op) log <+ valid_write { o [o.pos, 'valid', o.data] } ret <= valid_write { o [o.type, o.pos, o.epoch, 'ok'] } ret <= write_op.notin(valid_write) { o [o.type, o.pos, o.epoch, 'read-only'] } end Brados: Declarative, Programmable Object Storage, Noah Watkins, Michael Sevilla, Ivo Jimenez, Neha Ojha, Peter Alvaro, Carlos Maltzahn, UCSC-SOE

25 What s the catch? This is a storage system, not a database... Generality introduces overhead Additional complexity and layers We don t want layers! The cost of online optimization 25

26 Taking advantage of time scales: offline optimization We are generating storage service implementations Automate integration on new features and hardware Planned upgrade points v.s. per-request optimization DeclStore Interface Definition System Upgrade Point Opt. Timescale (Month / Year) System Upgrade Point Opt. System Upgrade Point Request (< 1ms) 26

27 Example: a declarative specification for CORFU No hard-wired choice of storage interface! Bytestream vs K/V Partitioning strategy Take advantage of existing analysis tools and techniques! Convincingly correct! # persistent relations table :epoch, [:epoch] table :log, [:pos] => [:state, :data] # interfaces are also like tables interface input, :op, [:type, :pos, :epoch] => [:data] interface output, :ret, [:type, :pos, :epoch] => [:retval] bloom :write do temp :valid_write <= write_op.notin(found_op) log <+ valid_write { o [o.pos, 'valid', o.data] } ret <= valid_write { o [o.type, o.pos, o.epoch, 'ok'] } ret <= write_op.notin(valid_write) { o [o.type, o.pos, o.epoch, 'read-only'] } end 27

28 Example: performance benefit of group commit Ceph I/O handling is conservative Log operations are independent Lots of internal code traversal Using efficient interfaces Determined through analysis Amortize across I/O and code path Basic Operation Batching Queuing, locking, ordering Vector or range-based interfaces See paper for more details Cost models and outliers 28

29 Conclusion: expanding the set of storage interfaces... is becoming increasingly common will create an entirely new set of challenges; we showed new systems being built old systems being adapted hard-wired solutions are difficult to maintain even a basic software upgrade can have a big impact through declarative specifications can address many concerns lower maintenance costs automate system evolution 29

30 Thank you Questions? DeclStore: Layering is for the Faint of Heart Noah Watkins, Michael Sevilla, Ivo Jimenez, Kathryn Dahlgren, Peter Alvaro, Shel Finkelstein, Carlos Maltzahn

31 What else we are shooting for... Volatile views of persistent data Physical design migration Example: soft-state network counter in CORFU Example: indexes, row vs. column. vs. hybrid Program analysis and optimization Example: exploit non-interference 31

32 Cost models are an important component Batching with range operations Sensitive to outliers Wasted I/O Simple heuristics for identifying outliers outperform Basic-Batch Order of magnitude > IOPS Minimal reduction over Opt-Batch Cost models needed for automatic strategy selection orthogonal to program analysis 32

Malacology. A Programmable Storage System [Sevilla et al. EuroSys '17]

Malacology. A Programmable Storage System [Sevilla et al. EuroSys '17] Malacology A Programmable Storage System [Sevilla et al. EuroSys '17] Michael A. Sevilla, Noah Watkins, Ivo Jimenez, Peter Alvaro, Shel Finkelstein, Jeff LeFevre, Carlos Maltzahn University of California,

More information

DeclStore: Layering is for the Faint of Heart

DeclStore: Layering is for the Faint of Heart DeclStore: Layering is for the Faint of Heart Noah Watkins, Michael A. Sevilla, Ivo Jimenez, Kathryn Dahlgren, Peter Alvaro, Shel Finkelstein, Carlos Maltzahn The University of California, Santa Cruz {jayhawk,

More information

Curriculum Vitae. April 5, 2018

Curriculum Vitae. April 5, 2018 Curriculum Vitae April 5, 2018 Michael A. Sevilla mikesevilla3@gmail.com website: users.soe.ucsc.edu/~msevilla 127 Storey St., Santa Cruz, CA 95060 code: github.com/michaelsevilla mobile: (858) 449-3086

More information

Brados: Declarative, Programmable Object Storage

Brados: Declarative, Programmable Object Storage Brados: Declarative, Programmable Object Storage Noah Watkins, Michael Sevilla, Ivo Jimenez, Neha Ojha, Peter Alvaro, Carlos Maltzahn University of California, Santa Cruz Submission Type: Research Abstract

More information

CORFU: A Shared Log Design for Flash Clusters

CORFU: A Shared Log Design for Flash Clusters CORFU: A Shared Log Design for Flash Clusters Authors: Mahesh Balakrishnan, Dahlia Malkhi, Vijayan Prabhakaran, Ted Wobber, Michael Wei, John D. Davis EECS 591 11/7/18 Presented by Evan Agattas and Fanzhong

More information

Tintenfisch: File System Namespace Schemas and Generators

Tintenfisch: File System Namespace Schemas and Generators System Namespace, Reza Nasirigerdeh, Carlos Maltzahn, Jeff LeFevre, Noah Watkins, Peter Alvaro, Margaret Lawson*, Jay Lofstead*, Jim Pivarski^ UC Santa Cruz, *Sandia National Laboratories, ^Princeton University

More information

Malacology: A Programmable Storage System

Malacology: A Programmable Storage System To appear in EuroSys 17 Malacology: A Programmable Storage System Michael A. Sevilla, Noah Watkins, Ivo Jimenez, Peter Alvaro, Shel Finkelstein, Jeff LeFevre, Carlos Maltzahn University of California,

More information

A Design for Networked Flash

A Design for Networked Flash A Design for Networked Flash (Clusters Of Raw Flash Units) Mahesh Balakrishnan, John Davis, Dahlia Malkhi, Vijayan Prabhakaran, Michael Wei*, Ted Wobber Microso- Research Silicon Valley * Graduate student

More information

An Introduction to Big Data Formats

An Introduction to Big Data Formats Introduction to Big Data Formats 1 An Introduction to Big Data Formats Understanding Avro, Parquet, and ORC WHITE PAPER Introduction to Big Data Formats 2 TABLE OF TABLE OF CONTENTS CONTENTS INTRODUCTION

More information

Cudele: An API and Framework for Programmable Consistency and Durability in a Global Namespace

Cudele: An API and Framework for Programmable Consistency and Durability in a Global Namespace Cudele: An API and Framework for Programmable Consistency and Durability in a Global Namespace Michael A. Sevilla, Ivo Jimenez, Noah Watkins, Jeff LeFevre Peter Alvaro, Shel Finkelstein, Patrick Donnelly*,

More information

Distributed Consensus Paxos

Distributed Consensus Paxos Distributed Consensus Paxos Ethan Cecchetti October 18, 2016 CS6410 Some structure taken from Robert Burgess s 2009 slides on this topic State Machine Replication (SMR) View a server as a state machine.

More information

Lustre A Platform for Intelligent Scale-Out Storage

Lustre A Platform for Intelligent Scale-Out Storage Lustre A Platform for Intelligent Scale-Out Storage Rumi Zahir, rumi. May 2003 rumi.zahir@intel.com Agenda Problem Statement Trends & Current Data Center Storage Architectures The Lustre File System Project

More information

Storage in HPC: Scalable Scientific Data Management. Carlos Maltzahn IEEE Cluster 2011 Storage in HPC Panel 9/29/11

Storage in HPC: Scalable Scientific Data Management. Carlos Maltzahn IEEE Cluster 2011 Storage in HPC Panel 9/29/11 Storage in HPC: Scalable Scientific Data Management Carlos Maltzahn IEEE Cluster 2011 Storage in HPC Panel 9/29/11 Who am I? Systems Research Lab (SRL), UC Santa Cruz LANL/UCSC Institute for Scalable Scientific

More information

Tackling the Reproducibility Problem in Systems Research with Declarative Experiment Specifications

Tackling the Reproducibility Problem in Systems Research with Declarative Experiment Specifications Tackling the Reproducibility Problem in Systems Research with Declarative Experiment Specifications Ivo Jimenez, Carlos Maltzahn (UCSC) Adam Moody, Kathryn Mohror (LLNL) Jay Lofstead (Sandia) Andrea Arpaci-Dusseau,

More information

Balancing Fairness and Efficiency in Tiered Storage Systems with Bottleneck-Aware Allocation

Balancing Fairness and Efficiency in Tiered Storage Systems with Bottleneck-Aware Allocation Balancing Fairness and Efficiency in Tiered Storage Systems with Bottleneck-Aware Allocation Hui Wang, Peter Varman Rice University FAST 14, Feb 2014 Tiered Storage Tiered storage: HDs and SSDs q Advantages:

More information

MATE-EC2: A Middleware for Processing Data with Amazon Web Services

MATE-EC2: A Middleware for Processing Data with Amazon Web Services MATE-EC2: A Middleware for Processing Data with Amazon Web Services Tekin Bicer David Chiu* and Gagan Agrawal Department of Compute Science and Engineering Ohio State University * School of Engineering

More information

Using Lua in the Ceph distributed storage system

Using Lua in the Ceph distributed storage system Using Lua in the Ceph distributed storage system Noah Watkins Michael Sevilla noahwatkins@gmail.com @noahdesu mikesevilla3@gmail.com Lua Workshop, October 2017, San Francisco yet another Lua embedding

More information

EECS 482 Introduction to Operating Systems

EECS 482 Introduction to Operating Systems EECS 482 Introduction to Operating Systems Winter 2018 Baris Kasikci Slides by: Harsha V. Madhyastha OS Abstractions Applications Threads File system Virtual memory Operating System Next few lectures:

More information

Enosis: Bridging the Semantic Gap between

Enosis: Bridging the Semantic Gap between Enosis: Bridging the Semantic Gap between File-based and Object-based Data Models Anthony Kougkas - akougkas@hawk.iit.edu, Hariharan Devarajan, Xian-He Sun Outline Introduction Background Approach Evaluation

More information

ROCK INK PAPER COMPUTER

ROCK INK PAPER COMPUTER Introduction to Ceph and Architectural Overview Federico Lucifredi Product Management Director, Ceph Storage Boston, December 16th, 2015 CLOUD SERVICES COMPUTE NETWORK STORAGE the future of storage 2 ROCK

More information

Solving Linux File System Pain Points. Steve French Samba Team & Linux Kernel CIFS VFS maintainer Principal Software Engineer Azure Storage

Solving Linux File System Pain Points. Steve French Samba Team & Linux Kernel CIFS VFS maintainer Principal Software Engineer Azure Storage Solving Linux File System Pain Points Steve French Samba Team & Linux Kernel CIFS VFS maintainer Principal Software Engineer Azure Storage Legal Statement This work represents the views of the author(s)

More information

What's new in Jewel for RADOS? SAMUEL JUST 2015 VAULT

What's new in Jewel for RADOS? SAMUEL JUST 2015 VAULT What's new in Jewel for RADOS? SAMUEL JUST 2015 VAULT QUICK PRIMER ON CEPH AND RADOS CEPH MOTIVATING PRINCIPLES All components must scale horizontally There can be no single point of failure The solution

More information

Techniques to improve the scalability of Checkpoint-Restart

Techniques to improve the scalability of Checkpoint-Restart Techniques to improve the scalability of Checkpoint-Restart Bogdan Nicolae Exascale Systems Group IBM Research Ireland 1 Outline A few words about the lab and team Challenges of Exascale A case for Checkpoint-Restart

More information

A New Key-value Data Store For Heterogeneous Storage Architecture Intel APAC R&D Ltd.

A New Key-value Data Store For Heterogeneous Storage Architecture Intel APAC R&D Ltd. A New Key-value Data Store For Heterogeneous Storage Architecture Intel APAC R&D Ltd. 1 Agenda Introduction Background and Motivation Hybrid Key-Value Data Store Architecture Overview Design details Performance

More information

Architectural challenges for building a low latency, scalable multi-tenant data warehouse

Architectural challenges for building a low latency, scalable multi-tenant data warehouse Architectural challenges for building a low latency, scalable multi-tenant data warehouse Mataprasad Agrawal Solutions Architect, Services CTO 2017 Persistent Systems Ltd. All rights reserved. Our analytics

More information

CS-580K/480K Advanced Topics in Cloud Computing. Object Storage

CS-580K/480K Advanced Topics in Cloud Computing. Object Storage CS-580K/480K Advanced Topics in Cloud Computing Object Storage 1 When we use object storage When we check Facebook, twitter Gmail Docs on DropBox Check share point Take pictures with Instagram 2 Object

More information

Data Consistency with SPLICE Middleware. Leslie Madden Chad Offenbacker Naval Surface Warfare Center Dahlgren Division

Data Consistency with SPLICE Middleware. Leslie Madden Chad Offenbacker Naval Surface Warfare Center Dahlgren Division Data Consistency with SPLICE Middleware Leslie Madden Chad Offenbacker Naval Surface Warfare Center Dahlgren Division Slide 1 6/30/2005 Background! US Navy Combat Systems are evolving to distributed systems

More information

Using Transparent Compression to Improve SSD-based I/O Caches

Using Transparent Compression to Improve SSD-based I/O Caches Using Transparent Compression to Improve SSD-based I/O Caches Thanos Makatos, Yannis Klonatos, Manolis Marazakis, Michail D. Flouris, and Angelos Bilas {mcatos,klonatos,maraz,flouris,bilas}@ics.forth.gr

More information

Practical MySQL Performance Optimization. Peter Zaitsev, CEO, Percona July 02, 2015 Percona Technical Webinars

Practical MySQL Performance Optimization. Peter Zaitsev, CEO, Percona July 02, 2015 Percona Technical Webinars Practical MySQL Performance Optimization Peter Zaitsev, CEO, Percona July 02, 2015 Percona Technical Webinars In This Presentation We ll Look at how to approach Performance Optimization Discuss Practical

More information

The Case for Programmable Object Storage Systems

The Case for Programmable Object Storage Systems The Case for Programmable Object Storage Systems Noah Watkins Michael Sevilla Carlos Maltzahn University of California, Santa Cruz {jayhawk,msevilla,carlosm}@soe.ucsc.edu Abstract As applications scale

More information

Tackling the Reproducibility Problem in Systems Research with Declarative Experiment Specifications

Tackling the Reproducibility Problem in Systems Research with Declarative Experiment Specifications Tackling the Reproducibility Problem in Systems Research with Declarative Experiment Specifications Ivo Jimenez, Carlos Maltzahn (UCSC) Adam Moody, Kathryn Mohror (LLNL) Jay Lofstead (Sandia) Andrea Arpaci-Dusseau,

More information

I/O and file systems. Dealing with device heterogeneity

I/O and file systems. Dealing with device heterogeneity I/O and file systems Abstractions provided by operating system for storage devices Heterogeneous -> uniform One/few storage objects (disks) -> many storage objects (files) Simple naming -> rich naming

More information

FS Consistency & Journaling

FS Consistency & Journaling FS Consistency & Journaling Nima Honarmand (Based on slides by Prof. Andrea Arpaci-Dusseau) Why Is Consistency Challenging? File system may perform several disk writes to serve a single request Caching

More information

Cost-Effective Virtual Petabytes Storage Pools using MARS. LCA 2018 Presentation by Thomas Schöbel-Theuer

Cost-Effective Virtual Petabytes Storage Pools using MARS. LCA 2018 Presentation by Thomas Schöbel-Theuer Cost-Effective Virtual Petabytes Storage Pools using MARS LCA 2018 Presentation by Thomas Schöbel-Theuer 1 Virtual Petabytes Storage Pools: Agenda Storage Architectures Scalability && Costs HOWTO Background

More information

A Gentle Introduction to Ceph

A Gentle Introduction to Ceph A Gentle Introduction to Ceph Narrated by Tim Serong tserong@suse.com Adapted from a longer work by Lars Marowsky-Brée lmb@suse.com Once upon a time there was a Free and Open Source distributed storage

More information

Distributed Data Analytics Transactions

Distributed Data Analytics Transactions Distributed Data Analytics G-3.1.09, Campus III Hasso Plattner Institut must ensure that interactions succeed consistently An OLTP Topic Motivation Most database interactions consist of multiple, coherent

More information

Big and Fast. Anti-Caching in OLTP Systems. Justin DeBrabant

Big and Fast. Anti-Caching in OLTP Systems. Justin DeBrabant Big and Fast Anti-Caching in OLTP Systems Justin DeBrabant Online Transaction Processing transaction-oriented small footprint write-intensive 2 A bit of history 3 OLTP Through the Years relational model

More information

Dongjun Shin Samsung Electronics

Dongjun Shin Samsung Electronics 2014.10.31. Dongjun Shin Samsung Electronics Contents 2 Background Understanding CPU behavior Experiments Improvement idea Revisiting Linux I/O stack Conclusion Background Definition 3 CPU bound A computer

More information

Improving throughput for small disk requests with proximal I/O

Improving throughput for small disk requests with proximal I/O Improving throughput for small disk requests with proximal I/O Jiri Schindler with Sandip Shete & Keith A. Smith Advanced Technology Group 2/16/2011 v.1.3 Important Workload in Datacenters Serial reads

More information

NPTEL Course Jan K. Gopinath Indian Institute of Science

NPTEL Course Jan K. Gopinath Indian Institute of Science Storage Systems NPTEL Course Jan 2012 (Lecture 39) K. Gopinath Indian Institute of Science Google File System Non-Posix scalable distr file system for large distr dataintensive applications performance,

More information

INTRODUCTION TO CEPH. Orit Wasserman Red Hat August Penguin 2017

INTRODUCTION TO CEPH. Orit Wasserman Red Hat August Penguin 2017 INTRODUCTION TO CEPH Orit Wasserman Red Hat August Penguin 2017 CEPHALOPOD A cephalopod is any member of the molluscan class Cephalopoda. These exclusively marine animals are characterized by bilateral

More information

"Software-defined storage Crossing the right bridge"

Software-defined storage Crossing the right bridge Navigating the software-defined storage world shaping tomorrow with you René Huebel "Software-defined storage Crossing the right bridge" SDS the model and the promises Control Abstraction The promises

More information

Improved Solutions for I/O Provisioning and Application Acceleration

Improved Solutions for I/O Provisioning and Application Acceleration 1 Improved Solutions for I/O Provisioning and Application Acceleration August 11, 2015 Jeff Sisilli Sr. Director Product Marketing jsisilli@ddn.com 2 Why Burst Buffer? The Supercomputing Tug-of-War A supercomputer

More information

Eventually Consistent HTTP with Statebox and Riak

Eventually Consistent HTTP with Statebox and Riak Eventually Consistent HTTP with Statebox and Riak Author: Bob Ippolito (@etrepum) Date: November 2011 Venue: QCon San Francisco 2011 1/62 Introduction This talk isn't really about web. It's about how we

More information

Abstract Storage Moving file format specific abstrac7ons into petabyte scale storage systems. Joe Buck, Noah Watkins, Carlos Maltzahn & ScoD Brandt

Abstract Storage Moving file format specific abstrac7ons into petabyte scale storage systems. Joe Buck, Noah Watkins, Carlos Maltzahn & ScoD Brandt Abstract Storage Moving file format specific abstrac7ons into petabyte scale storage systems Joe Buck, Noah Watkins, Carlos Maltzahn & ScoD Brandt Introduc7on Current HPC environment separates computa7on

More information

TRANSACTIONAL FLASH CARSTEN WEINHOLD. Vijayan Prabhakaran, Thomas L. Rodeheffer, Lidong Zhou

TRANSACTIONAL FLASH CARSTEN WEINHOLD. Vijayan Prabhakaran, Thomas L. Rodeheffer, Lidong Zhou Department of Computer Science Institute for System Architecture, Operating Systems Group TRANSACTIONAL FLASH Vijayan Prabhakaran, Thomas L. Rodeheffer, Lidong Zhou CARSTEN WEINHOLD MOTIVATION Transactions

More information

Unifying Big Data Workloads in Apache Spark

Unifying Big Data Workloads in Apache Spark Unifying Big Data Workloads in Apache Spark Hossein Falaki @mhfalaki Outline What s Apache Spark Why Unification Evolution of Unification Apache Spark + Databricks Q & A What s Apache Spark What is Apache

More information

Choosing the Right Container Infrastructure for Your Organization

Choosing the Right Container Infrastructure for Your Organization WHITE PAPER Choosing the Right Container Infrastructure for Your Organization Container adoption is accelerating rapidly. Gartner predicts that by 2018 more than 50% of new workloads will be deployed into

More information

IME (Infinite Memory Engine) Extreme Application Acceleration & Highly Efficient I/O Provisioning

IME (Infinite Memory Engine) Extreme Application Acceleration & Highly Efficient I/O Provisioning IME (Infinite Memory Engine) Extreme Application Acceleration & Highly Efficient I/O Provisioning September 22 nd 2015 Tommaso Cecchi 2 What is IME? This breakthrough, software defined storage application

More information

Forget IOPS: A Proper Way to Characterize & Test Storage Performance Peter Murray SwiftTest

Forget IOPS: A Proper Way to Characterize & Test Storage Performance Peter Murray SwiftTest Forget IOPS: A Proper Way to Characterize & Test Storage Performance Peter Murray peter@swifttest.com SwiftTest Storage Performance Validation Rely on vendor IOPS claims Test in production and pray Validate

More information

ZBD: Using Transparent Compression at the Block Level to Increase Storage Space Efficiency

ZBD: Using Transparent Compression at the Block Level to Increase Storage Space Efficiency ZBD: Using Transparent Compression at the Block Level to Increase Storage Space Efficiency Thanos Makatos, Yannis Klonatos, Manolis Marazakis, Michail D. Flouris, and Angelos Bilas {mcatos,klonatos,maraz,flouris,bilas}@ics.forth.gr

More information

Distributed Systems 16. Distributed File Systems II

Distributed Systems 16. Distributed File Systems II Distributed Systems 16. Distributed File Systems II Paul Krzyzanowski pxk@cs.rutgers.edu 1 Review NFS RPC-based access AFS Long-term caching CODA Read/write replication & disconnected operation DFS AFS

More information

CSE 444: Database Internals. Lectures 26 NoSQL: Extensible Record Stores

CSE 444: Database Internals. Lectures 26 NoSQL: Extensible Record Stores CSE 444: Database Internals Lectures 26 NoSQL: Extensible Record Stores CSE 444 - Spring 2014 1 References Scalable SQL and NoSQL Data Stores, Rick Cattell, SIGMOD Record, December 2010 (Vol. 39, No. 4)

More information

Flash Storage Complementing a Data Lake for Real-Time Insight

Flash Storage Complementing a Data Lake for Real-Time Insight Flash Storage Complementing a Data Lake for Real-Time Insight Dr. Sanhita Sarkar Global Director, Analytics Software Development August 7, 2018 Agenda 1 2 3 4 5 Delivering insight along the entire spectrum

More information

TDDB68 Concurrent Programming and Operating Systems. Lecture: File systems

TDDB68 Concurrent Programming and Operating Systems. Lecture: File systems TDDB68 Concurrent Programming and Operating Systems Lecture: File systems Mikael Asplund, Senior Lecturer Real-time Systems Laboratory Department of Computer and Information Science Copyright Notice: Thanks

More information

Ceph Intro & Architectural Overview. Federico Lucifredi Product Management Director, Ceph Storage Vancouver & Guadalajara, May 18th, 2015

Ceph Intro & Architectural Overview. Federico Lucifredi Product Management Director, Ceph Storage Vancouver & Guadalajara, May 18th, 2015 Ceph Intro & Architectural Overview Federico Lucifredi Product anagement Director, Ceph Storage Vancouver & Guadalajara, ay 8th, 25 CLOUD SERVICES COPUTE NETWORK STORAGE the future of storage 2 ROCK INK

More information

Kodewerk. Java Performance Services. The War on Latency. Reducing Dead Time Kirk Pepperdine Principle Kodewerk Ltd.

Kodewerk. Java Performance Services. The War on Latency. Reducing Dead Time Kirk Pepperdine Principle Kodewerk Ltd. Kodewerk tm Java Performance Services The War on Latency Reducing Dead Time Kirk Pepperdine Principle Kodewerk Ltd. Me Work as a performance tuning freelancer Nominated Sun Java Champion www.kodewerk.com

More information

DISTRIBUTED STORAGE AND COMPUTE WITH LIBRADOS SAGE WEIL VAULT

DISTRIBUTED STORAGE AND COMPUTE WITH LIBRADOS SAGE WEIL VAULT DISTRIBUTED STORAGE AND COMPUTE WITH LIBRADOS SAGE WEIL VAULT - 2015.03.11 AGENDA motivation what is Ceph? what is librados? what can it do? other RADOS goodies a few use cases 2 MOTIVATION MY FIRST WEB

More information

LINUX KERNEL UPDATES FOR AUTOMOTIVE: LESSONS LEARNED

LINUX KERNEL UPDATES FOR AUTOMOTIVE: LESSONS LEARNED LINUX KERNEL UPDATES FOR AUTOMOTIVE: LESSONS LEARNED TOM MCREYNOLDS, VLAD BUZOV AUTOMOTIVE SOFTWARE OCTOBER 15TH, 2013 Why kernel upgrades : the problem Linux Kernel cadence doesn t match Automotive s

More information

Webscale, All Flash, Distributed File Systems. Avraham Meir Elastifile

Webscale, All Flash, Distributed File Systems. Avraham Meir Elastifile Webscale, All Flash, Distributed File Systems Avraham Meir Elastifile 1 Outline The way to all FLASH The way to distributed storage Scale-out storage management Conclusion 2 Storage Technology Trend NAND

More information

References. What is Bigtable? Bigtable Data Model. Outline. Key Features. CSE 444: Database Internals

References. What is Bigtable? Bigtable Data Model. Outline. Key Features. CSE 444: Database Internals References CSE 444: Database Internals Scalable SQL and NoSQL Data Stores, Rick Cattell, SIGMOD Record, December 2010 (Vol 39, No 4) Lectures 26 NoSQL: Extensible Record Stores Bigtable: A Distributed

More information

Improvements in MySQL 5.5 and 5.6. Peter Zaitsev Percona Live NYC May 26,2011

Improvements in MySQL 5.5 and 5.6. Peter Zaitsev Percona Live NYC May 26,2011 Improvements in MySQL 5.5 and 5.6 Peter Zaitsev Percona Live NYC May 26,2011 State of MySQL 5.5 and 5.6 MySQL 5.5 Released as GA December 2011 Percona Server 5.5 released in April 2011 Proven to be rather

More information

BigTable. CSE-291 (Cloud Computing) Fall 2016

BigTable. CSE-291 (Cloud Computing) Fall 2016 BigTable CSE-291 (Cloud Computing) Fall 2016 Data Model Sparse, distributed persistent, multi-dimensional sorted map Indexed by a row key, column key, and timestamp Values are uninterpreted arrays of bytes

More information

Executive Summary. It is important for a Java Programmer to understand the power and limitations of concurrent programming in Java using threads.

Executive Summary. It is important for a Java Programmer to understand the power and limitations of concurrent programming in Java using threads. Executive Summary. It is important for a Java Programmer to understand the power and limitations of concurrent programming in Java using threads. Poor co-ordination that exists in threads on JVM is bottleneck

More information

Cisco Tetration Analytics

Cisco Tetration Analytics Cisco Tetration Analytics Enhanced security and operations with real time analytics John Joo Tetration Business Unit Cisco Systems Security Challenges in Modern Data Centers Securing applications has become

More information

RED HAT CLOUD STRATEGY (OPEN HYBRID CLOUD) Ahmed El-Rayess Solutions Architect

RED HAT CLOUD STRATEGY (OPEN HYBRID CLOUD) Ahmed El-Rayess Solutions Architect RED HAT CLOUD STRATEGY (OPEN HYBRID CLOUD) Ahmed El-Rayess Solutions Architect AGENDA Cloud Concepts Market Overview Evolution to Cloud Workloads Evolution to Cloud Infrastructure CLOUD TYPES AND DEPLOYMENT

More information

Current Topics in OS Research. So, what s hot?

Current Topics in OS Research. So, what s hot? Current Topics in OS Research COMP7840 OSDI Current OS Research 0 So, what s hot? Operating systems have been around for a long time in many forms for different types of devices It is normally general

More information

Joe Butler, Principal Engineer, Director Cloud Services Lab. Nov , OpenStack Summit Paris.

Joe Butler, Principal Engineer, Director Cloud Services Lab. Nov , OpenStack Summit Paris. Telemetry the foundation of intelligent cloud orchestration. Joe Butler, Principal Engineer, Director Cloud Services Lab. Nov 3 2014, OpenStack Summit Paris. http://sched.co/1xj2lm9 Datacenter Trends and

More information

Distributed File Systems II

Distributed File Systems II Distributed File Systems II To do q Very-large scale: Google FS, Hadoop FS, BigTable q Next time: Naming things GFS A radically new environment NFS, etc. Independence Small Scale Variety of workloads Cooperation

More information

Process simulation as a domain- specific OPC UA information model

Process simulation as a domain- specific OPC UA information model Process simulation as a domain- specific OPC UA information model Paolo Greppi, consultant, 3iP, Italy ESCAPE 20 June 6 th to 9 th 2010 Ischia, Naples (Italy) Presentation outline Classic OPC OPC Unified

More information

Differential RAID: Rethinking RAID for SSD Reliability

Differential RAID: Rethinking RAID for SSD Reliability Differential RAID: Rethinking RAID for SSD Reliability Mahesh Balakrishnan Asim Kadav 1, Vijayan Prabhakaran, Dahlia Malkhi Microsoft Research Silicon Valley 1 The University of Wisconsin-Madison Solid

More information

Debugging Applications in Pervasive Computing

Debugging Applications in Pervasive Computing Debugging Applications in Pervasive Computing Larry May 1, 2006 SMA 5508; MIT 6.883 1 Outline Video of Speech Controlled Animation Survey of approaches to debugging Turning bugs into features Speech recognition

More information

Shared File System Requirements for SAS Grid Manager. Table Talk #1546 Ben Smith / Brian Porter

Shared File System Requirements for SAS Grid Manager. Table Talk #1546 Ben Smith / Brian Porter Shared File System Requirements for SAS Grid Manager Table Talk #1546 Ben Smith / Brian Porter About the Presenters Main Presenter: Ben Smith, Technical Solutions Architect, IBM smithbe1@us.ibm.com Brian

More information

What is the Future of PostgreSQL?

What is the Future of PostgreSQL? What is the Future of PostgreSQL? Robert Haas 2013 EDB All rights reserved. 1 PostgreSQL Popularity By The Numbers Date Rating Increase vs. Prior Year % Increase January 2016 282.401 +27.913 +11% January

More information

Dynamic Reconfiguration of Primary/Backup Clusters

Dynamic Reconfiguration of Primary/Backup Clusters Dynamic Reconfiguration of Primary/Backup Clusters (Apache ZooKeeper) Alex Shraer Yahoo! Research In collaboration with: Benjamin Reed Dahlia Malkhi Flavio Junqueira Yahoo! Research Microsoft Research

More information

The Future of Real-Time in Spark

The Future of Real-Time in Spark The Future of Real-Time in Spark Reynold Xin @rxin Spark Summit, New York, Feb 18, 2016 Why Real-Time? Making decisions faster is valuable. Preventing credit card fraud Monitoring industrial machinery

More information

Bigtable: A Distributed Storage System for Structured Data By Fay Chang, et al. OSDI Presented by Xiang Gao

Bigtable: A Distributed Storage System for Structured Data By Fay Chang, et al. OSDI Presented by Xiang Gao Bigtable: A Distributed Storage System for Structured Data By Fay Chang, et al. OSDI 2006 Presented by Xiang Gao 2014-11-05 Outline Motivation Data Model APIs Building Blocks Implementation Refinement

More information

SolidFire and Ceph Architectural Comparison

SolidFire and Ceph Architectural Comparison The All-Flash Array Built for the Next Generation Data Center SolidFire and Ceph Architectural Comparison July 2014 Overview When comparing the architecture for Ceph and SolidFire, it is clear that both

More information

Introducing the Cray XMT. Petr Konecny May 4 th 2007

Introducing the Cray XMT. Petr Konecny May 4 th 2007 Introducing the Cray XMT Petr Konecny May 4 th 2007 Agenda Origins of the Cray XMT Cray XMT system architecture Cray XT infrastructure Cray Threadstorm processor Shared memory programming model Benefits/drawbacks/solutions

More information

A Comparative Experimental Study of Parallel File Systems for Large-Scale Data Processing

A Comparative Experimental Study of Parallel File Systems for Large-Scale Data Processing A Comparative Experimental Study of Parallel File Systems for Large-Scale Data Processing Z. Sebepou, K. Magoutis, M. Marazakis, A. Bilas Institute of Computer Science (ICS) Foundation for Research and

More information

ParaFS: A Log-Structured File System to Exploit the Internal Parallelism of Flash Devices

ParaFS: A Log-Structured File System to Exploit the Internal Parallelism of Flash Devices ParaFS: A Log-Structured File System to Exploit the Internal Parallelism of Devices Jiacheng Zhang, Jiwu Shu, Youyou Lu Tsinghua University 1 Outline Background and Motivation ParaFS Design Evaluation

More information

Vertica s Design: Basics, Successes, and Failures

Vertica s Design: Basics, Successes, and Failures Vertica s Design: Basics, Successes, and Failures Chuck Bear CIDR 2015 January 5, 2015 1. Vertica Basics: Storage Format Design Goals SQL (for the ecosystem and knowledge pool) Clusters of commodity hardware

More information

Cluster-Level Google How we use Colossus to improve storage efficiency

Cluster-Level Google How we use Colossus to improve storage efficiency Cluster-Level Storage @ Google How we use Colossus to improve storage efficiency Denis Serenyi Senior Staff Software Engineer dserenyi@google.com November 13, 2017 Keynote at the 2nd Joint International

More information

Data Movement & Tiering with DMF 7

Data Movement & Tiering with DMF 7 Data Movement & Tiering with DMF 7 Kirill Malkin Director of Engineering April 2019 Why Move or Tier Data? We wish we could keep everything in DRAM, but It s volatile It s expensive Data in Memory 2 Why

More information

Swift 5, ABI Stability and

Swift 5, ABI Stability and Swift 5, ABI Stability and Concurrency @phillfarrugia Important Documents Concurrency Manifesto by Chris Lattner https: /gist.github.com/lattner/ 31ed37682ef1576b16bca1432ea9f782 Kicking off Concurrency

More information

Algorithms and Data Structures for Efficient Free Space Reclamation in WAFL

Algorithms and Data Structures for Efficient Free Space Reclamation in WAFL Algorithms and Data Structures for Efficient Free Space Reclamation in WAFL Ram Kesavan Technical Director, WAFL NetApp, Inc. SDC 2017 1 Outline Garbage collection in WAFL Usenix FAST 2017 ACM Transactions

More information

A Comparison of Performance and Accuracy of Measurement Algorithms in Software

A Comparison of Performance and Accuracy of Measurement Algorithms in Software A Comparison of Performance and Accuracy of Measurement Algorithms in Software Omid Alipourfard, Masoud Moshref 1, Yang Zhou 2, Tong Yang 2, Minlan Yu 3 Yale University, Barefoot Networks 1, Peking University

More information

NVMFS: A New File System Designed Specifically to Take Advantage of Nonvolatile Memory

NVMFS: A New File System Designed Specifically to Take Advantage of Nonvolatile Memory NVMFS: A New File System Designed Specifically to Take Advantage of Nonvolatile Memory Dhananjoy Das, Sr. Systems Architect SanDisk Corp. 1 Agenda: Applications are KING! Storage landscape (Flash / NVM)

More information

Performance improvements in MySQL 5.5

Performance improvements in MySQL 5.5 Performance improvements in MySQL 5.5 Percona Live Feb 16, 2011 San Francisco, CA By Peter Zaitsev Percona Inc -2- Performance and Scalability Talk about Performance, Scalability, Diagnostics in MySQL

More information

2011 IBM Research Strategic Initiative: Workload Optimized Systems

2011 IBM Research Strategic Initiative: Workload Optimized Systems PIs: Michael Hind, Yuqing Gao Execs: Brent Hailpern, Toshio Nakatani, Kevin Nowka 2011 IBM Research Strategic Initiative: Workload Optimized Systems Yuqing Gao IBM Research 2011 IBM Corporation Motivation

More information

Advanced Format in Legacy Infrastructures More Transparent than Disruptive

Advanced Format in Legacy Infrastructures More Transparent than Disruptive Advanced Format in Legacy Infrastructures More Transparent than Disruptive Sponsored by IDEMA Presented by Curtis E. Stevens Agenda AF History Enterprise AF Futures SMR & LBA Indirection Hybrids & SSDs

More information

Ceph Intro & Architectural Overview. Abbas Bangash Intercloud Systems

Ceph Intro & Architectural Overview. Abbas Bangash Intercloud Systems Ceph Intro & Architectural Overview Abbas Bangash Intercloud Systems About Me Abbas Bangash Systems Team Lead, Intercloud Systems abangash@intercloudsys.com intercloudsys.com 2 CLOUD SERVICES COMPUTE NETWORK

More information

Sandor Heman, Niels Nes, Peter Boncz. Dynamic Bandwidth Sharing. Cooperative Scans: Marcin Zukowski. CWI, Amsterdam VLDB 2007.

Sandor Heman, Niels Nes, Peter Boncz. Dynamic Bandwidth Sharing. Cooperative Scans: Marcin Zukowski. CWI, Amsterdam VLDB 2007. Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS Marcin Zukowski Sandor Heman, Niels Nes, Peter Boncz CWI, Amsterdam VLDB 2007 Outline Scans in a DBMS Cooperative Scans Benchmarks DSM version VLDB,

More information

UK LUG 10 th July Lustre at Exascale. Eric Barton. CTO Whamcloud, Inc Whamcloud, Inc.

UK LUG 10 th July Lustre at Exascale. Eric Barton. CTO Whamcloud, Inc Whamcloud, Inc. UK LUG 10 th July 2012 Lustre at Exascale Eric Barton CTO Whamcloud, Inc. eeb@whamcloud.com Agenda Exascale I/O requirements Exascale I/O model 3 Lustre at Exascale - UK LUG 10th July 2012 Exascale I/O

More information

Speeding Up Cloud/Server Applications Using Flash Memory

Speeding Up Cloud/Server Applications Using Flash Memory Speeding Up Cloud/Server Applications Using Flash Memory Sudipta Sengupta and Jin Li Microsoft Research, Redmond, WA, USA Contains work that is joint with Biplob Debnath (Univ. of Minnesota) Flash Memory

More information

OPERATING SYSTEM TRANSACTIONS

OPERATING SYSTEM TRANSACTIONS OPERATING SYSTEM TRANSACTIONS Donald E. Porter, Owen S. Hofmann, Christopher J. Rossbach, Alexander Benn, and Emmett Witchel The University of Texas at Austin OS APIs don t handle concurrency 2 OS is weak

More information

Toward Energy-efficient and Fault-tolerant Consistent Hashing based Data Store. Wei Xie TTU CS Department Seminar, 3/7/2017

Toward Energy-efficient and Fault-tolerant Consistent Hashing based Data Store. Wei Xie TTU CS Department Seminar, 3/7/2017 Toward Energy-efficient and Fault-tolerant Consistent Hashing based Data Store Wei Xie TTU CS Department Seminar, 3/7/2017 1 Outline General introduction Study 1: Elastic Consistent Hashing based Store

More information

Storage Virtualization. Eric Yen Academia Sinica Grid Computing Centre (ASGC) Taiwan

Storage Virtualization. Eric Yen Academia Sinica Grid Computing Centre (ASGC) Taiwan Storage Virtualization Eric Yen Academia Sinica Grid Computing Centre (ASGC) Taiwan Storage Virtualization In computer science, storage virtualization uses virtualization to enable better functionality

More information

Bigtable. Presenter: Yijun Hou, Yixiao Peng

Bigtable. Presenter: Yijun Hou, Yixiao Peng Bigtable Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber Google, Inc. OSDI 06 Presenter: Yijun Hou, Yixiao Peng

More information

Summary: Issues / Open Questions:

Summary: Issues / Open Questions: Summary: The paper introduces Transitional Locking II (TL2), a Software Transactional Memory (STM) algorithm, which tries to overcomes most of the safety and performance issues of former STM implementations.

More information