HYDRAstor: a Scalable Secondary Storage 7th TF-Storage Meeting September 9 th 00 Łukasz Heldt
Largest Japanese IT company $4 Billion in annual revenue 4,000 staff www.nec.com Polish R&D company 50 engineers and scientists www.9livesdata.com Owns & sells R&D of critical backend component Scalable disk based storage for backup with global deduplication Started in 00 in NEC Labs by Cezary Dubnicki 007 Product of the year award by SearchStorage.com 008 Product innovation award by Network Products Guide 009/00 FAST conference publication in San Jose Sold in US and Japan since 007 Will be sold in Poland in 0 by 9LivesData in coop. with NEC
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Backup storage Tapes are most common, despite: Sensitive environment requirements Unreliable restore Low performance Manual labor or expensive robots Problematic replication
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 4 Backup storage size Usual backup policy 4-+ full backups 7-0+ incremental Majority of data does not change Data compression : Secondary storage size: 5x-0x more than primary storage Includes many copies of the same data Each data chunk stored 5-0+ times
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 5 Backup storage size Usual backup policy 4-+ full backups 7-0+ incremental Majority of data does not change Data compression : Secondary storage size: 5x-0x more than primary storage Includes many copies of the same data Each data chunk stored 5-0+ times High potential for the deduplication technology.
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 6 Deduplication Save disk space by eliminating duplicates Sample reduction ratio 0: (depends on backup policy) Lowers price of gigabyte Sub-file level deduplication File A File B A A B C D E Only unique blocks are stored Stored blocks A B C D E File A A B C
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 7 Global deduplication Prevent silos of deduped data One system to manage Global vs. siloed dedup
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 8 HYDRAstor product Provides global deduplication using DataRedux performance, storage scalability and data resiliency using Distributed Resilient Data
HYDRAstor deployment 9 Interface: CIFS, NFS, Symantec OST Marker filtering for: Tivoli, Netbackup, Networker, CommVault
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 0 HYDRAstor architecture Accelerator Nodes realize performance Storage Nodes realize capacity Accelerator Nodes NFS / CIFS / OST over Ethernet Storage Nodes Internal Network
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC HYDRAstor architecture Accelerator Nodes realize performance Storage Nodes realize capacity Accelerator Nodes NFS / CIFS / OST over Ethernet Internal Network Non-disruptive grid expansion Storage Nodes
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC HYDRAstor scalability MiniHYDRA single server Storage: TB 40 TB* Performance:. TB / hour AN 4SN Storage: 48 TB 960 TB* Performance:.6 TB / hour 0AN 40SN (4 racks) Storage: 480 TB 9600 TB* Performance: 6 TB / hour * - assuming 0x data reduction through DataRedux
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC HYDRAstor scalability Slide from Curtis Preston presentation Curtis Preston is a famous storage analyst owning independent consulting company
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 4 HYDRAstor other features Fully automatic/non-disruptive mgmt Recovery of lost data resiliency Periodic data scrubbing Machine and disk failure recovery Configurable redundancy level erasure coding better than RAID6 Optimized replication Smart resource management
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 5 HYDRAstor backend design Details of the design: http://www.usenix.org/events/fast09/tech/full_papers/dubnicki/dubnicki.pdf
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 6 Programming Model Repository of blocks Content-addressed Immutable Variable-sized hash=0..0
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 7 Programming Model Repository of blocks Content-addressed Immutable Variable-sized Exposed pointers to other blocks E 0..0 hash=0..0
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 8 Programming Model Repository of blocks Content-addressed hash=00.. Root E Immutable Variable-sized Exposed pointers to other blocks E Trees of blocks E 0..0 hash=0..0
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 9 Programming Model Repository of blocks Content-addressed hash=00.. Root E Root E Immutable hash=0..0 Variable-sized Exposed pointers to other blocks Trees of blocks DAGs due to deduplication No cycles possible E E 0..0 0..0 hash=0..0
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 0 Programming Model Repository of blocks Content-addressed hash=00.. Root E Root E Immutable hash=0..0 Variable-sized Exposed pointers to other blocks Trees of blocks DAGs due to deduplication No cycles possible E E 0..0 0..0 Deletion of whole trees hash=0..0
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Programming Model Repository of blocks Content-addressed hash=00.. Root E Root E Immutable hash=0..0 Variable-sized Exposed pointers to other blocks Trees of blocks DAGs due to deduplication No cycles possible E E 0..0 0..0 Deletion of whole trees hash=0..0
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Programming Model Repository of blocks Content-addressed hash=00.. Root E Root E Immutable hash=0..0 Variable-sized Exposed pointers to other blocks Trees of blocks DAGs due to deduplication No cycles possible E E 0..0 0..0 Deletion of whole trees hash=0..0
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Programming Model Repository of blocks Content-addressed Root E Immutable hash=0..0 Variable-sized Exposed pointers to other blocks Trees of blocks DAGs due to deduplication No cycles possible E 0..0 0..0 Deletion of whole trees hash=0..0
Decode HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 4 Failure tolerance: erasure coding Example: N=8, m=5 Original block Encode Redundant Fragments Original Fragments Any fragments can be lost
Decode HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 5 Failure tolerance: erasure coding Example: N=8, m=5 Original block Encode Redundant Fragments Original Fragments Any fragments can be lost Assuming disks array Mirror -copy RAID6 Erasure coding Resiliency Overhead 00% 00% 0% 0% %
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 6 Scalability with DHT: data placement Block location: DHT with prefix routing empty prefix 0 00 0 0
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 7 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix hash=0..0 empty prefix Block 0 00 0 0
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 8 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components Hosted on SNs empty prefix 0 hash=0..0 Block N=4 N components per prefix Node 00 0 0 0 Node Node Node 4 0 0 Node 5 Node 6 0
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 9 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components Hosted on SNs empty prefix 0 hash=0..0 Block N=4 N components per prefix Node 00 0 0 0 Store fragments Node Node Node 4 0 0 Node 5 Node 6 0
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 0 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components empty prefix hash=0..0 Block Hosted on SNs 0 N=4 N components per prefix Node 00 0 0 0 Store fragments Node Distributed consensus Node Node 4 Node 5 0 0 Node 6 0
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components empty prefix hash=0..0 Block Hosted on SNs 0 N=4 N components per prefix Node 00 0 0 0 Store fragments Node Distributed consensus Node Node 4 Node 5 0 0 Node 6 0
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components empty prefix hash=0..0 Block Hosted on SNs 0 N=4 N components per prefix Node 00 0 0 Store fragments Node Distributed consensus Node Node 4 Node 5 0 0 0 Node 6 0
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components empty prefix hash=0..0 Block Hosted on SNs 0 N=4 N components per prefix Node 00 0 0 Store fragments Node Distributed consensus Node Node 4 Node 5 0 0 0 Node 6 0
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 4 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components empty prefix hash=0..0 Block Hosted on SNs 0 N=4 N components per prefix Node 00 0 0 Store fragments Node Distributed consensus Load balancing Node Node 4 Node 5 Node 6 0 0 0 0
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 5 Data organization: synchrun chains A B C D E F G Data stream split to blocks
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 6 Data organization: synchrun chains A B C D E F G 00 0 0 0 000 0 00 Data stream split to blocks es of blocks computed
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 7 Data organization: synchrun chains A B C D E F G 00 0 0 0 000 0 00 Data stream split to blocks es of blocks computed Routing through DHT
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 8 Data organization: synchrun chains A B C D E F G 00 0 Prefix 0 0 0 000 0 00 Data stream split to blocks es of blocks computed Routing through DHT
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 9 Data organization: synchrun chains A B C D E F G 00 0 Prefix 0 0 0 000 Compression 0 00 Data stream split to blocks es of blocks computed Routing through DHT Erasure Coding
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 40 Data organization: synchrun chains 0 A B C D E F G 00 Prefix 0 0 0 0 000 Compression Erasure Coding 0 00 Data stream split to blocks es of blocks computed Routing through DHT Erasure-coded fragments stored by components
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 4 Data organization: synchrun chains A B C D E F G 00 Prefix 0 0 0 0 000 Compression 0 00 Data stream split to blocks es of blocks computed Routing through DHT 0 Erasure Coding A D F Erasure-coded fragments stored by components A D F A D F A D F
Synchrun HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 4 Data organization: synchrun chains A B C D E F G 00 Prefix 0 0 0 0 000 Compression 0 00 Data stream split to blocks es of blocks computed Routing through DHT 0 Erasure Coding Synchrun Synchrun Synchrun A D F Erasure-coded fragments stored by components Grouped into synchruns A D F A D F A D F
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 4 Data organization: synchrun chains A B C D E F G 00 Prefix 0 0 0 0 000 Compression 0 00 Data stream split to blocks es of blocks computed Routing through DHT 0 Erasure Coding Synchrun Synchrun Synchrun A D F Erasure-coded fragments stored by components Grouped into synchruns A D F A D F Containers stored on disks Fragment metadata separately from data A D F Container Synchrun
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 44 Data organization: synchrun chains A B C D E F G 00 Prefix 0 0 0 0 000 Compression 0 00 Data stream split to blocks es of blocks computed Routing through DHT 0 Erasure Coding Synchrun Synchrun Synchrun A D F Erasure-coded fragments stored by components Grouped into synchruns A D F A D F A D F Containers stored on disks Fragment metadata separately from data Ordered synchrun chains Preserve order & locality Container Synchrun Manageable
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 45 Data Services: Identification of data resiliency level Missing fragments 0:0 0: 0: 0:
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 46 Data Services: Identification of data resiliency level 0:0 0: 0: 0: Chain scanning
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 47 Data Services: Identification of data resiliency level 0:0 0: 0: 0: Chain scanning
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 48 Data Services: Identification of data resiliency level 0:0 0: 0: 0: Chain scanning
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 49 Data Services: Identification of data resiliency level 0:0 0: 0: 0: Chain scanning
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 50 Data services: reconstruction 0:0 0: 0: 0: Sequential read/write of entire Containers Erasure decoding and re-encoding
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 5 Data services: reconstruction 0:0 0: 0: 0: Sequential read/write of entire Containers Erasure decoding and re-encoding
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 5 Data services: reconstruction 0:0 0: 0: 0: Sequential read/write of entire Containers Erasure decoding and re-encoding
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 5 Data services: fast data transfer 0:0 0: 0: 0: Old component 0: Location of new node (DHT)
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 54 Data services: fast data transfer 0:0 0: 0: 0: Data transfer Old component 0:
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 55 Data services: fast data transfer 0:0 0: 0: 0: Data transfer Old component 0:
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 56 Data services: fast data transfer 0:0 0: 0: 0: Data transfer Old component 0:
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 57 Data services: fast data transfer 0:0 0: 0: 0: Old component 0:
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 58 Data services for deduplication hash=0.. Block 0:0 Choose complete chain 0: 0: 0: Completeness: definitely not a duplicate Deletion interaction: wasn't the block scheduled for deletion?
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 59 Data services for deduplication hash=0.. Block 0:0 0: Query 0: 0:
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 60 Data services for deduplication hash=0.. Block 0:0 0: 0: 0: Local candidate found
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 6 Data services for deduplication hash=0.. Block 0:0 0: 0: Successful dedup 0: Candidate verification
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 6 On-demand data deletion Distributed garbage collection Per-block reference counter stored perfragment Failure-tolerant Block reference counter calculated independently on peer Container chains Interference with duplicate elimination: duplicates resurrection after garbage collection space reclamation in background
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 6 Resource management Configurable load balancing between: backup/restore background tasks (reconstruction, transfer, etc.) garbage collection Shares depend on system state Assigns priority of tasks automatically e.g. reconstruction before transfer or space reclamation Maximizes resources utilization
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 64 Topics for further discussion Features and technical details of HYDRAstor Sales of HYDRAstor in Poland Cooperation with 9LivesData on other projects
HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 65 Questions? Contact: heldt@9livesdata.com www.9livesdata.com www.hydrastor.com