HYDRAstor: a Scalable Secondary Storage

Similar documents
HYDRAstor: a Scalable Secondary Storage

HydraFS: a High-Throughput File System for the HYDRAstor Content-Addressable Storage System

Scale-out Object Store for PB/hr Backups and Long Term Archive April 24, 2014

White Paper Simplified Backup and Reliable Recovery

Scale-out Data Deduplication Architecture

Scale-Out Architectures for Secondary Storage

Maximizing Data Efficiency: Benefits of Global Deduplication

Hedvig as backup target for Veeam

Reducing The De-linearization of Data Placement to Improve Deduplication Performance

DELL EMC DATA DOMAIN SISL SCALING ARCHITECTURE

Data Deduplication Methods for Achieving Data Efficiency

Milestone Solution Partner IT Infrastructure Components Certification Report

Deduplication Storage System

IBM Tivoli Storage Manager Version Introduction to Data Protection Solutions IBM

Foster B-Trees. Lucas Lersch. M. Sc. Caetano Sauer Advisor

The Google File System

COM Verification. PRESENTATION TITLE GOES HERE Alan G. Yoder, Ph.D. SNIA Technical Council Huawei Technologies, LLC

How to Reduce Data Capacity in Objectbased Storage: Dedup and More

ADVANCED DEDUPLICATION CONCEPTS. Thomas Rivera, BlueArc Gene Nagle, Exar

CA485 Ray Walshe Google File System

IBM Spectrum Protect Version Introduction to Data Protection Solutions IBM

DEMYSTIFYING DATA DEDUPLICATION A WHITE PAPER

ADVANCED DATA REDUCTION CONCEPTS

Copyright 2010 EMC Corporation. Do not Copy - All Rights Reserved.

Purity: building fast, highly-available enterprise flash storage from commodity components

In-line Deduplication for Cloud storage to Reduce Fragmentation by using Historical Knowledge

SolidFire and Ceph Architectural Comparison

Veeam with Cohesity Data Platform

Data Reduction Meets Reality What to Expect From Data Reduction

Multi-level Selective Deduplication for VM Snapshots in Cloud Storage

White paper ETERNUS CS800 Data Deduplication Background

The What, Why and How of the Pure Storage Enterprise Flash Array. Ethan L. Miller (and a cast of dozens at Pure Storage)

HICAMP Bitmap. A Space-Efficient Updatable Bitmap Index for In-Memory Databases! Bo Wang, Heiner Litz, David R. Cheriton Stanford University DAMON 14

Software-defined Storage: Fast, Safe and Efficient

WHY DO I NEED FALCONSTOR OPTIMIZED BACKUP & DEDUPLICATION?

The Google File System (GFS)

! Design constraints. " Component failures are the norm. " Files are huge by traditional standards. ! POSIX-like

Google File System. Arun Sundaram Operating Systems

ActiveScale Erasure Coding and Self Protecting Technologies

Microsoft DPM Meets BridgeSTOR Advanced Data Reduction and Security

The Logic of Physical Garbage Collection in Deduplicating Storage

TIBX NEXT-GENERATION ARCHIVE FORMAT IN ACRONIS BACKUP CLOUD

GFS: The Google File System. Dr. Yingwu Zhu

Deduplication File System & Course Review

WHITE PAPER. DATA DEDUPLICATION BACKGROUND: A Technical White Paper

ChunkStash: Speeding Up Storage Deduplication using Flash Memory

The World s Fastest Backup Systems

SMORE: A Cold Data Object Store for SMR Drives

IBM řešení pro větší efektivitu ve správě dat - Store more with less

File systems CS 241. May 2, University of Illinois

CLOUD-SCALE FILE SYSTEMS

IME Infinite Memory Engine Technical Overview

Delta Compressed and Deduplicated Storage Using Stream-Informed Locality

ECS High Availability Design

HP D2D & STOREONCE OVERVIEW

The Google File System

The Google File System

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

Operating Systems. Lecture File system implementation. Master of Computer Science PUF - Hồ Chí Minh 2016/2017

Cohesity SpanFS and SnapTree. White Paper

COS 318: Operating Systems. NSF, Snapshot, Dedup and Review

Backup and archiving need not to create headaches new pain relievers are around

Introducing Tegile. Company Overview. Product Overview. Solutions & Use Cases. Partnering with Tegile

arxiv: v3 [cs.dc] 27 Jun 2013

Tom Sas HP. Author: SNIA - Data Protection & Capacity Optimization (DPCO) Committee

Cumulus: Filesystem Backup to the Cloud

File Systems Management and Examples

A Review on Backup-up Practices using Deduplication

Cold Storage: The Road to Enterprise Ilya Kuznetsov YADRO

Distributed Filesystem

FGDEFRAG: A Fine-Grained Defragmentation Approach to Improve Restore Performance

Administração e Optimização de Bases de Dados 2012/2013 Index Tuning

NPTEL Course Jan K. Gopinath Indian Institute of Science

Technology Insight Series

ICS Principles of Operating Systems

Google File System, Replication. Amin Vahdat CSE 123b May 23, 2006

Design Tradeoffs for Data Deduplication Performance in Backup Workloads

COMP 530: Operating Systems File Systems: Fundamentals

Distributed File Systems II

ENCRYPTED DATA MANAGEMENT WITH DEDUPLICATION IN CLOUD COMPUTING

Lattice: A Decentralized, Distributed Datastore

ActiveScale Erasure Coding and Self Protecting Technologies

Veeam and HP: Meet your backup data protection goals

Speeding Up Cloud/Server Applications Using Flash Memory

Chapter 7: File-System

EMC DATA DOMAIN PRODUCT OvERvIEW

GFS: The Google File System

CS720 - Operating Systems

Reducing Replication Bandwidth for Distributed Document Databases

FLAT DATACENTER STORAGE. Paper-3 Presenter-Pratik Bhatt fx6568

Today CSCI Coda. Naming: Volumes. Coda GFS PAST. Instructor: Abhishek Chandra. Main Goals: Volume is a subtree in the naming space

The term "physical drive" refers to a single hard disk module. Figure 1. Physical Drive

Bigtable: A Distributed Storage System for Structured Data. Andrew Hon, Phyllis Lau, Justin Ng

SED: SCALABLE & EFFICIENT DE-DUPLICATION FILE SYSTEM FOR VIRTUAL MACHINE IMAGES

Design of Global Data Deduplication for A Scale-out Distributed Storage System

Clotho: Transparent Data Versioning at the Block I/O Level

Technical White Paper: IntelliFlash Architecture

Index Tuning. Index. An index is a data structure that supports efficient access to data. Matching records. Condition on attribute value

50 TB. Traditional Storage + Data Protection Architecture. StorSimple Cloud-integrated Storage. Traditional CapEx: $375K Support: $75K per Year

Data De-duplication for Distributed Segmented Parallel FS

Transcription:

HYDRAstor: a Scalable Secondary Storage 7th USENIX Conference on File and Storage Technologies (FAST '09) February 26 th 2009 C. Dubnicki, L. Gryz, L. Heldt, M. Kaczmarczyk, W. Kilian, P. Strzelczak, J. Szczepkowski, M. Welnicki C. Ungureanu

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 2 Scalable secondary storage Characteristics Huge amount of data Small backup windows Duplication between backup streams Reliable, on-line retrieval Varying value of data Requirements - Scalability (dynamic) - Low cost per TB - Very high write performance - Global deduplication - Failure tolerance - High restore performance - Adjust resilience overhead - Data deletion

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 3 Scalable secondary storage Characteristics Huge amount of data Small backup windows Duplication between backup streams Reliable, on-line retrieval Varying value of data Requirements - Scalability (dynamic) - Low cost per TB - Very high write performance - Global deduplication - Failure tolerance - High restore performance - Adjust resilience overhead - Data deletion

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 4 Scalable secondary storage Characteristics Huge amount of data Small backup windows Duplication between backup streams Reliable, on-line retrieval Varying value of data Requirements - Scalability (dynamic) - Low cost per TB - Very high write performance - Global deduplication - Failure tolerance - High restore performance - Adjust resilience overhead - Data deletion

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 5 Scalable secondary storage Characteristics Huge amount of data Small backup windows Duplication between backup streams Reliable, on-line retrieval Varying value of data Requirements - Scalability (dynamic) - Low cost per TB - Very high write performance - Global deduplication - Failure tolerance - High restore performance - Adjust resilience overhead - Data deletion

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 6 Scalable secondary storage Characteristics Huge amount of data Small backup windows Duplication between backup streams Reliable, on-line retrieval Varying value of data Requirements - Scalability (dynamic) - Low cost per TB - Very high write performance - Global deduplication - Failure tolerance - High restore performance - Adjust resilience overhead - Data deletion

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 7 Challenges High-performance, decentralized global deduplication... in a dynamic, distributed system... with deletion and failures Combination introduces complexity Tension between: Deduplication and dynamic scalability Deduplication and on-demand deletion Failure tolerance and deletion

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 8 Satisfies Scalable secondary storage requirements Started as a research project at NEC Laboratories America, in Princeton, NJ Successfully commercialized Today: real-world, commercial system Sold by NEC in the US and Japan Development of back-end continues at 9LivesData, LLC in Warsaw, Poland Spinoff from NEC Laboratories

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 9 HYDRAstor functionality Content addressable storage (CAS) Vast data repository Storing and extracting streams of blocks Single system image built of independent nodes Support for standard access methods Filesystem, VTL Dynamic capacity sharing Self-recovery from failures On-demand deletion

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 10 Programming Model Repository of blocks Content-addressed Immutable Variable-sized hash=011..0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 11 Programming Model Repository of blocks Content-addressed Immutable Variable-sized Exposed pointers to other blocks E 011..0 hash=011..0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 12 Programming Model Repository of blocks Content-addressed hash=010..1 Root1 E Immutable Variable-sized Exposed pointers to other blocks E Trees of blocks E 011..0 hash=011..0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 13 Programming Model Repository of blocks Content-addressed hash=010..1 Root1 E Root2 E Immutable hash=110..0 Variable-sized Exposed pointers to other blocks Trees of blocks DAGs due to deduplication No cycles possible E E 011..0 011..0 hash=011..0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 14 Programming Model Repository of blocks Content-addressed hash=010..1 Root1 E Root2 E Immutable hash=110..0 Variable-sized Exposed pointers to other blocks Trees of blocks DAGs due to deduplication No cycles possible E E 011..0 011..0 Deletion of whole trees hash=011..0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 15 Programming Model Repository of blocks Content-addressed hash=010..1 Root1 E Root2 E Immutable hash=110..0 Variable-sized Exposed pointers to other blocks Trees of blocks DAGs due to deduplication No cycles possible E E 011..0 011..0 Deletion of whole trees hash=011..0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 16 Programming Model Repository of blocks Content-addressed hash=010..1 Root1 E Root2 E Immutable hash=110..0 Variable-sized Exposed pointers to other blocks Trees of blocks DAGs due to deduplication No cycles possible E E 011..0 011..0 Deletion of whole trees hash=011..0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 17 Programming Model Repository of blocks Content-addressed Root2 E Immutable hash=110..0 Variable-sized Exposed pointers to other blocks Trees of blocks DAGs due to deduplication No cycles possible E 011..0 011..0 Deletion of whole trees hash=011..0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 18 Architecture overview Standard server-grade hardware running Linux Scalability on data-center level Access Nodes NFS / CIFS Front-end Internal Network Back-end (CAS Layer) Storage Nodes

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 19 Requirements on scalable storage Failure tolerance Data organization: selected requirements Required internal data services Identify data resilience reduction Fast data rebuilding High performance Preserve locality of data streams Prefetching Dynamic scalability Decentralized data management Load balancing Fast data transfer to new location Deduplication Location of potential duplicates Availability & resiliency verification On-demand deletion Failure-tolerant, distributed deletion

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 20 Requirements on scalable storage Failure tolerance Data organization: selected requirements Required internal data services Identify data resilience reduction Fast data rebuilding High performance Preserve locality of data streams Prefetching Dynamic scalability Decentralized data management Load balancing Fast data transfer to new location Deduplication Location of potential duplicates Availability & resiliency verification On-demand deletion Failure-tolerant, distributed deletion

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 21 Requirements on scalable storage Failure tolerance Data organization: selected requirements Required internal data services Identify data resilience reduction Fast data rebuilding High performance Preserve locality of data streams Prefetching Dynamic scalability Decentralized data management Load balancing Fast data transfer to new location Deduplication Location of potential duplicates Availability & resiliency verification On-demand deletion Failure-tolerant, distributed deletion

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 22 Requirements on scalable storage Failure tolerance Data organization: selected requirements Required internal data services Identify data resilience reduction Fast data rebuilding High performance Preserve locality of data streams Prefetching Dynamic scalability Decentralized data management Load balancing Fast data transfer to new location Deduplication Location of potential duplicates Availability & resiliency verification On-demand deletion Failure-tolerant, distributed deletion

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 23 Requirements on scalable storage Failure tolerance Data organization: selected requirements Required internal data services Identify data resilience reduction Fast data rebuilding High performance Preserve locality of data streams Prefetching Dynamic scalability Decentralized data management Load balancing Fast data transfer to new location Deduplication Location of potential duplicates Availability & resiliency verification On-demand deletion Failure-tolerant, distributed deletion

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 24 Failure tolerance: erasure coding Block erasure-coded into N fragments Storage overhead tunable Example: N=8, m=5 Original block Encode Redundant Fragments Original Fragments Decode Any 3 fragments can be lost

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 25 Scalability with DHT: data placement Block location: DHT with prefix routing empty prefix 0 1 00 01 10 11

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 26 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix hash=011..0 empty prefix Block 0 1 00 01 10 11

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 27 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components Hosted on SNs empty prefix 0 1 hash=011..0 Block N=4 N components per prefix Node 1 00 1 01 10 11 0 2 Node 12 3 1 1 Node 13 2 3 Node 14 0 0 2 Node 15 2 1 Node 16 3 3 0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 28 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components Hosted on SNs empty prefix 0 1 hash=011..0 Block N=4 N components per prefix Node 1 00 1 01 10 11 0 2 Store fragments Node 12 3 1 1 Node 13 2 3 Node 14 0 0 2 Node 15 2 1 Node 16 3 3 0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 29 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components empty prefix hash=011..0 Block Hosted on SNs 0 1 N=4 N components per prefix Node 1 00 1 01 10 11 0 2 Store fragments Node 12 3 1 1 Distributed consensus Node 13 Node 14 Node 15 2 0 0 2 2 3 1 Node 16 3 3 0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 30 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components empty prefix hash=011..0 Block Hosted on SNs 0 1 N=4 N components per prefix Node 1 00 1 01 10 11 0 2 Store fragments Node 12 3 1 1 Distributed consensus Node 13 Node 14 Node 15 2 0 0 2 2 3 1 Node 16 3 3 0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 31 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components empty prefix hash=011..0 Block Hosted on SNs 0 1 N=4 N components per prefix Node 1 00 01 10 11 Store fragments Node 12 3 1 1 Distributed consensus Node 13 Node 14 Node 15 2 0 0 1 2 0 2 3 2 1 Node 16 3 3 0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 32 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components empty prefix hash=011..0 Block Hosted on SNs 0 1 N=4 N components per prefix Node 1 00 01 10 11 Store fragments Node 12 3 1 1 Distributed consensus Node 13 Node 14 Node 15 2 0 0 1 2 0 2 3 2 1 Node 16 3 3 0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 33 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components empty prefix hash=011..0 Block Hosted on SNs 0 1 N=4 N components per prefix Node 1 00 01 10 11 Store fragments Node 12 3 1 1 Distributed consensus Load balancing Node 13 Node 14 Node 15 Node 16 2 0 0 1 2 3 0 2 3 3 2 1 0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 34 Data organization: synchrun chains A B C D E F G Data stream split to blocks

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 35 Data organization: synchrun chains A B C D E F G 010 101 110 011 000 011 100 Data stream split to blocks es of blocks computed

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 36 Data organization: synchrun chains A B C D E F G 010 101 110 011 000 011 100 Data stream split to blocks es of blocks computed Routing through DHT

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 37 Data organization: synchrun chains A B C D E F G 010 101 Prefix 01 110 011 000 011 100 Data stream split to blocks es of blocks computed Routing through DHT

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 38 Data organization: synchrun chains A B C D E F G 010 101 Prefix 01 110 011 000 Compression 011 100 Data stream split to blocks es of blocks computed Routing through DHT Erasure Coding

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 39 Data organization: synchrun chains 0 1 2 3 A B C D E F G 010 Prefix 01 101 110 011 000 Compression Erasure Coding 011 100 Data stream split to blocks es of blocks computed Routing through DHT Erasure-coded fragments stored by components

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 40 Data organization: synchrun chains A B C D E F G 010 Prefix 01 101 110 011 000 Compression 011 100 Data stream split to blocks es of blocks computed Routing through DHT 0 Erasure Coding A D F Erasure-coded fragments stored by components 1 A D F 2 3 A D F A D F

Synchrun HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 41 Data organization: synchrun chains A B C D E F G 010 Prefix 01 101 110 011 000 Compression 011 100 Data stream split to blocks es of blocks computed Routing through DHT 0 Erasure Coding Synchrun 1 Synchrun 2 Synchrun 3 A D F Erasure-coded fragments stored by components Grouped into synchruns 1 A D F 2 A D F 3 A D F

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 42 Data organization: synchrun chains A B C D E F G 010 Prefix 01 101 110 011 000 Compression 011 100 Data stream split to blocks es of blocks computed Routing through DHT 0 Erasure Coding Synchrun 1 Synchrun 2 Synchrun 3 A D F Erasure-coded fragments stored by components Grouped into synchruns 1 2 A D F A D F Containers stored on disks Fragment metadata separately from data 3 A D F Container Synchrun

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 43 Data organization: synchrun chains A B C D E F G 010 Prefix 01 101 110 011 000 Compression 011 100 Data stream split to blocks es of blocks computed Routing through DHT 0 Erasure Coding Synchrun 1 Synchrun 2 Synchrun 3 A D F Erasure-coded fragments stored by components Grouped into synchruns 1 2 3 A D F A D F A D F Containers stored on disks Fragment metadata separately from data Ordered synchrun chains Preserve order & locality Container Synchrun Manageable

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 44 Synchrun chains in a dynamic system 01:1

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 45 System growth: split 01:1 010:1 011:1

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 46 System growth: split 01:1 010:1 011:1

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 47 System growth: split 01:1 010:1 011:1

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 48 Concatenation 01:1 010:1

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 49 Concatenation 01:1 010:1 010:1

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 50 Marking blocks to reclaim 01:1 010:1 010:1

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 51 Space reclamation & Concatenation 01:1 010:1 010:1 010:1

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 52 Data Services: Identification of data resiliency level Missing fragments 01:0 01:1 01:2 01:3

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 53 Data Services: Identification of data resiliency level 01:0 01:1 01:2 01:3 Chain scanning

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 54 Data Services: Identification of data resiliency level 01:0 01:1 01:2 01:3 Chain scanning

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 55 Data Services: Identification of data resiliency level 01:0 01:1 01:2 01:3 Chain scanning

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 56 Data Services: Identification of data resiliency level 01:0 01:1 01:2 01:3 Chain scanning

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 57 Data services: reconstruction 01:0 01:1 01:2 01:3 Sequential read/write of entire Containers Erasure decoding and re-encoding

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 58 Data services: reconstruction 01:0 01:1 01:2 01:3 Sequential read/write of entire Containers Erasure decoding and re-encoding

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 59 Data services: reconstruction 01:0 01:1 01:2 01:3 Sequential read/write of entire Containers Erasure decoding and re-encoding

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 60 Data services: fast data transfer 01:0 01:1 01:2 01:3 Old component 01:3 Location of new node (DHT)

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 61 Data services: fast data transfer 01:0 01:1 01:2 01:3 Data transfer Old component 01:3

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 62 Data services: fast data transfer 01:0 01:1 01:2 01:3 Data transfer Old component 01:3

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 63 Data services: fast data transfer 01:0 01:1 01:2 01:3 Data transfer Old component 01:3

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 64 Data services: fast data transfer 01:0 01:1 01:2 01:3 Old component 01:3

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 65 Data services for deduplication Global: duplicates detected in entire system DHT routing based on content Inline deduplication: has to be high-performance Prefetching Containers for streams of duplicates Block hashes stored separately

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 66 Data services for deduplication hash=011.. Block 01:0 Choose complete chain 01:1 01:2 01:3 Completeness: definitely not a duplicate Deletion interaction: wasn't the block scheduled for deletion?

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 67 Data services for deduplication hash=011.. Block 01:0 01:1 Query 01:2 01:3

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 68 Data services for deduplication hash=011.. Block 01:0 01:1 01:2 01:3 Local candidate found

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 69 Data services for deduplication hash=011.. Block 01:0 01:1 01:2 Successful dedup 01:3 Candidate verification

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 70 On-demand data deletion Distributed garbage collection Per-block reference counter stored perfragment Failure-tolerant Block reference counter calculated independently on peer Container chains Interference with duplicate elimination: read-only phase for block tree traversal space reclamation in background

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 71 Writes during node failure Writing Reconstruction

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 72 Write Scaling nodes added while writing

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 73 Write Scaling nodes added while writing

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 74 Questions?