HYDRAstor: a Scalable Secondary Storage

Similar documents
HYDRAstor: a Scalable Secondary Storage

Scale-out Object Store for PB/hr Backups and Long Term Archive April 24, 2014

White Paper Simplified Backup and Reliable Recovery

Scale-out Data Deduplication Architecture

Scale-Out Architectures for Secondary Storage

Maximizing Data Efficiency: Benefits of Global Deduplication

HP StoreOnce: reinventing data deduplication

HP D2D & STOREONCE OVERVIEW

How to solve your backup problems with HP StoreOnce

Milestone Solution Partner IT Infrastructure Components Certification Report

Copyright 2010 EMC Corporation. Do not Copy - All Rights Reserved.

Get More Out of Storage with Data Domain Deduplication Storage Systems

NEC Sets High Bar in Scale-Out Secondary Storage Market

Backup and archiving need not to create headaches new pain relievers are around

Deduplication Storage System

Data Deduplication Methods for Achieving Data Efficiency

De-dupe: It s not a question of if, rather where and when! What to Look for and What to Avoid

Dell DR4100. Disk based Data Protection and Disaster Recovery. April 3, Birger Ferber, Enterprise Technologist Storage EMEA

Veritas NetBackup Appliance Family OVERVIEW BROCHURE

Software-defined Storage: Fast, Safe and Efficient

WHY DO I NEED FALCONSTOR OPTIMIZED BACKUP & DEDUPLICATION?

The World s Fastest Backup Systems

The Google File System (GFS)

White paper ETERNUS CS800 Data Deduplication Background

STOREONCE OVERVIEW. Neil Fleming Mid-Market Storage Development Manager. Copyright 2010 Hewlett-Packard Development Company, L.P.

Setting Up the DR Series System on Veeam

Protect enterprise data, achieve long-term data retention

Deduplication File System & Course Review

Chapter 7. GridStor Technology. Adding Data Paths. Data Paths for Global Deduplication. Data Path Properties

IBM Tivoli Storage Manager Version Introduction to Data Protection Solutions IBM

Hedvig as backup target for Veeam

IBM řešení pro větší efektivitu ve správě dat - Store more with less

HydraFS: a High-Throughput File System for the HYDRAstor Content-Addressable Storage System

EMC Data Domain for Archiving Are You Kidding?

Data Reduction Meets Reality What to Expect From Data Reduction

Cohesity Flash Protect for Pure FlashBlade: Simple, Scalable Data Protection

HIGH PERFORMANCE STORAGE SOLUTION PRESENTATION All rights reserved RAIDIX

UCS Invicta: A New Generation of Storage Performance. Mazen Abou Najm DC Consulting Systems Engineer

Midsize Enterprise Solutions Selling Guide. Sell NetApp s midsize enterprise solutions and take your business and your customers further, faster

Veeam and HP: Meet your backup data protection goals

NEC HYDRAstor Date: September, 2009 Author: Terri McClure, Senior Analyst, and Lauren Whitehouse, Senior Analyst

Introducing Tegile. Company Overview. Product Overview. Solutions & Use Cases. Partnering with Tegile

DELL EMC DATA DOMAIN SISL SCALING ARCHITECTURE

Cohesity SpanFS and SnapTree. White Paper

HP Dynamic Deduplication achieving a 50:1 ratio

DELL EMC DATA DOMAIN EXTENDED RETENTION SOFTWARE

Business Benefits of Policy Based Data De-Duplication Data Footprint Reduction with Quality of Service (QoS) for Data Protection

Quest QoreStor User Guide

Virtualization Selling with IBM Tape

ADVANCED DATA REDUCTION CONCEPTS

The Microsoft Large Mailbox Vision

GFS: The Google File System. Dr. Yingwu Zhu

Distributed Filesystem

EMC DATA DOMAIN OPERATING SYSTEM

powered by Cloudian and Veritas

Google File System. Arun Sundaram Operating Systems

COM Verification. PRESENTATION TITLE GOES HERE Alan G. Yoder, Ph.D. SNIA Technical Council Huawei Technologies, LLC

Offloaded Data Transfers (ODX) Virtual Fibre Channel for Hyper-V. Application storage support through SMB 3.0. Storage Spaces

! Design constraints. " Component failures are the norm. " Files are huge by traditional standards. ! POSIX-like

Vendor: IBM. Exam Code: Exam Name: Storage Sales V2. Version: DEMO

Quest DR Series Disk Backup Appliances

TIBX NEXT-GENERATION ARCHIVE FORMAT IN ACRONIS BACKUP CLOUD

IBM Spectrum Protect Version Introduction to Data Protection Solutions IBM

Cold Storage: The Road to Enterprise Ilya Kuznetsov YADRO

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

Network and storage settings of ES NAS high-availability network storage services

WHITE PAPER. DATA DEDUPLICATION BACKGROUND: A Technical White Paper

IT Certification Exams Provider! Weofferfreeupdateserviceforoneyear! h ps://

Introducing SUSE Enterprise Storage 5

Your World is Hybrid: Protecting your VMs with Veeam and HPE Storage. Federico Venier HPE Storage Technical marketing

Setting Up the Dell DR Series System on Veeam

ADVANCED DEDUPLICATION CONCEPTS. Thomas Rivera, BlueArc Gene Nagle, Exar

Quest DR Series Disk Backup Appliances

Chapter 11: File System Implementation. Objectives

Network and storage settings of ES NAS high-availability network storage services

HOW DATA DEDUPLICATION WORKS A WHITE PAPER

CA485 Ray Walshe Google File System

Deduplication and Incremental Accelleration in Bacula with NetApp Technologies. Peter Buschman EMEA PS Consultant September 25th, 2012

HP Data Protector 9.0 Deduplication

NEC Express5800 R320f Fault Tolerant Servers & NEC ExpressCluster Software

ActiveScale Erasure Coding and Self Protecting Technologies

Automated Storage Tiering on Infortrend s ESVA Storage Systems

HPE Data Protector Deduplication

Opendedupe & Veritas NetBackup ARCHITECTURE OVERVIEW AND USE CASES

Nutanix Tech Note. Virtualizing Microsoft Applications on Web-Scale Infrastructure

TechTour. Winning Technology Roadmaps. Sig Knapstad. Cutting Edge

Current Topics in OS Research. So, what s hot?

P R O D U C T I N D E P T H. Evaluating Grid Storage for Enterprise Backup, DR and Archiving: NEC HYDRAstor

HP Designing and Implementing HP Enterprise Backup Solutions. Download Full Version :

Benefits of Storage Capacity Optimization Methods (COMs) And. Performance Optimization Methods (POMs)

How to Reduce Data Capacity in Objectbased Storage: Dedup and More

Setting Up the Dell DR Series System as an NFS Target on Amanda Enterprise 3.3.5

Enterprise Cloud Data Protection With Storage Director and IBM Cloud Object Storage October 24, 2017

Veritas Exam VCS-371 Administration of Veritas NetBackup 7.5 for Windows Version: 7.2 [ Total Questions: 207 ]

50 TB. Traditional Storage + Data Protection Architecture. StorSimple Cloud-integrated Storage. Traditional CapEx: $375K Support: $75K per Year

Exadata Database Machine: 12c Administration Workshop Ed 2

Exadata Database Machine: 12c Administration Workshop Ed 2 Duration: 5 Days

The Logic of Physical Garbage Collection in Deduplicating Storage

DEMYSTIFYING DATA DEDUPLICATION A WHITE PAPER

Exadata Database Machine: 12c Administration Workshop Ed 2

Transcription:

HYDRAstor: a Scalable Secondary Storage 7th TF-Storage Meeting September 9 th 00 Łukasz Heldt

Largest Japanese IT company $4 Billion in annual revenue 4,000 staff www.nec.com Polish R&D company 50 engineers and scientists www.9livesdata.com Owns & sells R&D of critical backend component Scalable disk based storage for backup with global deduplication Started in 00 in NEC Labs by Cezary Dubnicki 007 Product of the year award by SearchStorage.com 008 Product innovation award by Network Products Guide 009/00 FAST conference publication in San Jose Sold in US and Japan since 007 Will be sold in Poland in 0 by 9LivesData in coop. with NEC

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Backup storage Tapes are most common, despite: Sensitive environment requirements Unreliable restore Low performance Manual labor or expensive robots Problematic replication

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 4 Backup storage size Usual backup policy 4-+ full backups 7-0+ incremental Majority of data does not change Data compression : Secondary storage size: 5x-0x more than primary storage Includes many copies of the same data Each data chunk stored 5-0+ times

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 5 Backup storage size Usual backup policy 4-+ full backups 7-0+ incremental Majority of data does not change Data compression : Secondary storage size: 5x-0x more than primary storage Includes many copies of the same data Each data chunk stored 5-0+ times High potential for the deduplication technology.

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 6 Deduplication Save disk space by eliminating duplicates Sample reduction ratio 0: (depends on backup policy) Lowers price of gigabyte Sub-file level deduplication File A File B A A B C D E Only unique blocks are stored Stored blocks A B C D E File A A B C

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 7 Global deduplication Prevent silos of deduped data One system to manage Global vs. siloed dedup

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 8 HYDRAstor product Provides global deduplication using DataRedux performance, storage scalability and data resiliency using Distributed Resilient Data

HYDRAstor deployment 9 Interface: CIFS, NFS, Symantec OST Marker filtering for: Tivoli, Netbackup, Networker, CommVault

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 0 HYDRAstor architecture Accelerator Nodes realize performance Storage Nodes realize capacity Accelerator Nodes NFS / CIFS / OST over Ethernet Storage Nodes Internal Network

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC HYDRAstor architecture Accelerator Nodes realize performance Storage Nodes realize capacity Accelerator Nodes NFS / CIFS / OST over Ethernet Internal Network Non-disruptive grid expansion Storage Nodes

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC HYDRAstor scalability MiniHYDRA single server Storage: TB 40 TB* Performance:. TB / hour AN 4SN Storage: 48 TB 960 TB* Performance:.6 TB / hour 0AN 40SN (4 racks) Storage: 480 TB 9600 TB* Performance: 6 TB / hour * - assuming 0x data reduction through DataRedux

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC HYDRAstor scalability Slide from Curtis Preston presentation Curtis Preston is a famous storage analyst owning independent consulting company

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 4 HYDRAstor other features Fully automatic/non-disruptive mgmt Recovery of lost data resiliency Periodic data scrubbing Machine and disk failure recovery Configurable redundancy level erasure coding better than RAID6 Optimized replication Smart resource management

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 5 HYDRAstor backend design Details of the design: http://www.usenix.org/events/fast09/tech/full_papers/dubnicki/dubnicki.pdf

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 6 Programming Model Repository of blocks Content-addressed Immutable Variable-sized hash=0..0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 7 Programming Model Repository of blocks Content-addressed Immutable Variable-sized Exposed pointers to other blocks E 0..0 hash=0..0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 8 Programming Model Repository of blocks Content-addressed hash=00.. Root E Immutable Variable-sized Exposed pointers to other blocks E Trees of blocks E 0..0 hash=0..0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 9 Programming Model Repository of blocks Content-addressed hash=00.. Root E Root E Immutable hash=0..0 Variable-sized Exposed pointers to other blocks Trees of blocks DAGs due to deduplication No cycles possible E E 0..0 0..0 hash=0..0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 0 Programming Model Repository of blocks Content-addressed hash=00.. Root E Root E Immutable hash=0..0 Variable-sized Exposed pointers to other blocks Trees of blocks DAGs due to deduplication No cycles possible E E 0..0 0..0 Deletion of whole trees hash=0..0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Programming Model Repository of blocks Content-addressed hash=00.. Root E Root E Immutable hash=0..0 Variable-sized Exposed pointers to other blocks Trees of blocks DAGs due to deduplication No cycles possible E E 0..0 0..0 Deletion of whole trees hash=0..0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Programming Model Repository of blocks Content-addressed hash=00.. Root E Root E Immutable hash=0..0 Variable-sized Exposed pointers to other blocks Trees of blocks DAGs due to deduplication No cycles possible E E 0..0 0..0 Deletion of whole trees hash=0..0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Programming Model Repository of blocks Content-addressed Root E Immutable hash=0..0 Variable-sized Exposed pointers to other blocks Trees of blocks DAGs due to deduplication No cycles possible E 0..0 0..0 Deletion of whole trees hash=0..0

Decode HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 4 Failure tolerance: erasure coding Example: N=8, m=5 Original block Encode Redundant Fragments Original Fragments Any fragments can be lost

Decode HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 5 Failure tolerance: erasure coding Example: N=8, m=5 Original block Encode Redundant Fragments Original Fragments Any fragments can be lost Assuming disks array Mirror -copy RAID6 Erasure coding Resiliency Overhead 00% 00% 0% 0% %

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 6 Scalability with DHT: data placement Block location: DHT with prefix routing empty prefix 0 00 0 0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 7 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix hash=0..0 empty prefix Block 0 00 0 0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 8 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components Hosted on SNs empty prefix 0 hash=0..0 Block N=4 N components per prefix Node 00 0 0 0 Node Node Node 4 0 0 Node 5 Node 6 0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 9 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components Hosted on SNs empty prefix 0 hash=0..0 Block N=4 N components per prefix Node 00 0 0 0 Store fragments Node Node Node 4 0 0 Node 5 Node 6 0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 0 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components empty prefix hash=0..0 Block Hosted on SNs 0 N=4 N components per prefix Node 00 0 0 0 Store fragments Node Distributed consensus Node Node 4 Node 5 0 0 Node 6 0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components empty prefix hash=0..0 Block Hosted on SNs 0 N=4 N components per prefix Node 00 0 0 0 Store fragments Node Distributed consensus Node Node 4 Node 5 0 0 Node 6 0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components empty prefix hash=0..0 Block Hosted on SNs 0 N=4 N components per prefix Node 00 0 0 Store fragments Node Distributed consensus Node Node 4 Node 5 0 0 0 Node 6 0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components empty prefix hash=0..0 Block Hosted on SNs 0 N=4 N components per prefix Node 00 0 0 Store fragments Node Distributed consensus Node Node 4 Node 5 0 0 0 Node 6 0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 4 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components empty prefix hash=0..0 Block Hosted on SNs 0 N=4 N components per prefix Node 00 0 0 Store fragments Node Distributed consensus Load balancing Node Node 4 Node 5 Node 6 0 0 0 0

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 5 Data organization: synchrun chains A B C D E F G Data stream split to blocks

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 6 Data organization: synchrun chains A B C D E F G 00 0 0 0 000 0 00 Data stream split to blocks es of blocks computed

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 7 Data organization: synchrun chains A B C D E F G 00 0 0 0 000 0 00 Data stream split to blocks es of blocks computed Routing through DHT

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 8 Data organization: synchrun chains A B C D E F G 00 0 Prefix 0 0 0 000 0 00 Data stream split to blocks es of blocks computed Routing through DHT

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 9 Data organization: synchrun chains A B C D E F G 00 0 Prefix 0 0 0 000 Compression 0 00 Data stream split to blocks es of blocks computed Routing through DHT Erasure Coding

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 40 Data organization: synchrun chains 0 A B C D E F G 00 Prefix 0 0 0 0 000 Compression Erasure Coding 0 00 Data stream split to blocks es of blocks computed Routing through DHT Erasure-coded fragments stored by components

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 4 Data organization: synchrun chains A B C D E F G 00 Prefix 0 0 0 0 000 Compression 0 00 Data stream split to blocks es of blocks computed Routing through DHT 0 Erasure Coding A D F Erasure-coded fragments stored by components A D F A D F A D F

Synchrun HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 4 Data organization: synchrun chains A B C D E F G 00 Prefix 0 0 0 0 000 Compression 0 00 Data stream split to blocks es of blocks computed Routing through DHT 0 Erasure Coding Synchrun Synchrun Synchrun A D F Erasure-coded fragments stored by components Grouped into synchruns A D F A D F A D F

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 4 Data organization: synchrun chains A B C D E F G 00 Prefix 0 0 0 0 000 Compression 0 00 Data stream split to blocks es of blocks computed Routing through DHT 0 Erasure Coding Synchrun Synchrun Synchrun A D F Erasure-coded fragments stored by components Grouped into synchruns A D F A D F Containers stored on disks Fragment metadata separately from data A D F Container Synchrun

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 44 Data organization: synchrun chains A B C D E F G 00 Prefix 0 0 0 0 000 Compression 0 00 Data stream split to blocks es of blocks computed Routing through DHT 0 Erasure Coding Synchrun Synchrun Synchrun A D F Erasure-coded fragments stored by components Grouped into synchruns A D F A D F A D F Containers stored on disks Fragment metadata separately from data Ordered synchrun chains Preserve order & locality Container Synchrun Manageable

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 45 Data Services: Identification of data resiliency level Missing fragments 0:0 0: 0: 0:

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 46 Data Services: Identification of data resiliency level 0:0 0: 0: 0: Chain scanning

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 47 Data Services: Identification of data resiliency level 0:0 0: 0: 0: Chain scanning

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 48 Data Services: Identification of data resiliency level 0:0 0: 0: 0: Chain scanning

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 49 Data Services: Identification of data resiliency level 0:0 0: 0: 0: Chain scanning

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 50 Data services: reconstruction 0:0 0: 0: 0: Sequential read/write of entire Containers Erasure decoding and re-encoding

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 5 Data services: reconstruction 0:0 0: 0: 0: Sequential read/write of entire Containers Erasure decoding and re-encoding

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 5 Data services: reconstruction 0:0 0: 0: 0: Sequential read/write of entire Containers Erasure decoding and re-encoding

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 5 Data services: fast data transfer 0:0 0: 0: 0: Old component 0: Location of new node (DHT)

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 54 Data services: fast data transfer 0:0 0: 0: 0: Data transfer Old component 0:

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 55 Data services: fast data transfer 0:0 0: 0: 0: Data transfer Old component 0:

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 56 Data services: fast data transfer 0:0 0: 0: 0: Data transfer Old component 0:

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 57 Data services: fast data transfer 0:0 0: 0: 0: Old component 0:

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 58 Data services for deduplication hash=0.. Block 0:0 Choose complete chain 0: 0: 0: Completeness: definitely not a duplicate Deletion interaction: wasn't the block scheduled for deletion?

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 59 Data services for deduplication hash=0.. Block 0:0 0: Query 0: 0:

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 60 Data services for deduplication hash=0.. Block 0:0 0: 0: 0: Local candidate found

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 6 Data services for deduplication hash=0.. Block 0:0 0: 0: Successful dedup 0: Candidate verification

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 6 On-demand data deletion Distributed garbage collection Per-block reference counter stored perfragment Failure-tolerant Block reference counter calculated independently on peer Container chains Interference with duplicate elimination: duplicates resurrection after garbage collection space reclamation in background

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 6 Resource management Configurable load balancing between: backup/restore background tasks (reconstruction, transfer, etc.) garbage collection Shares depend on system state Assigns priority of tasks automatically e.g. reconstruction before transfer or space reclamation Maximizes resources utilization

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 64 Topics for further discussion Features and technical details of HYDRAstor Sales of HYDRAstor in Poland Cooperation with 9LivesData on other projects

HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 65 Questions? Contact: heldt@9livesdata.com www.9livesdata.com www.hydrastor.com