HYDRAstor: a Scalable Secondary Storage
|
|
- Angelina Atkinson
- 5 years ago
- Views:
Transcription
1 HYDRAstor: a Scalable Secondary Storage 7th USENIX Conference on File and Storage Technologies (FAST '09) February 26 th 2009 C. Dubnicki, L. Gryz, L. Heldt, M. Kaczmarczyk, W. Kilian, P. Strzelczak, J. Szczepkowski, M. Welnicki C. Ungureanu
2 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 2 Scalable secondary storage Characteristics Huge amount of data Small backup windows Duplication between backup streams Reliable, on-line retrieval Varying value of data Requirements - Scalability (dynamic) - Low cost per TB - Very high write performance - Global deduplication - Failure tolerance - High restore performance - Adjust resilience overhead - Data deletion
3 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 3 Scalable secondary storage Characteristics Huge amount of data Small backup windows Duplication between backup streams Reliable, on-line retrieval Varying value of data Requirements - Scalability (dynamic) - Low cost per TB - Very high write performance - Global deduplication - Failure tolerance - High restore performance - Adjust resilience overhead - Data deletion
4 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 4 Scalable secondary storage Characteristics Huge amount of data Small backup windows Duplication between backup streams Reliable, on-line retrieval Varying value of data Requirements - Scalability (dynamic) - Low cost per TB - Very high write performance - Global deduplication - Failure tolerance - High restore performance - Adjust resilience overhead - Data deletion
5 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 5 Scalable secondary storage Characteristics Huge amount of data Small backup windows Duplication between backup streams Reliable, on-line retrieval Varying value of data Requirements - Scalability (dynamic) - Low cost per TB - Very high write performance - Global deduplication - Failure tolerance - High restore performance - Adjust resilience overhead - Data deletion
6 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 6 Scalable secondary storage Characteristics Huge amount of data Small backup windows Duplication between backup streams Reliable, on-line retrieval Varying value of data Requirements - Scalability (dynamic) - Low cost per TB - Very high write performance - Global deduplication - Failure tolerance - High restore performance - Adjust resilience overhead - Data deletion
7 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 7 Challenges High-performance, decentralized global deduplication... in a dynamic, distributed system... with deletion and failures Combination introduces complexity Tension between: Deduplication and dynamic scalability Deduplication and on-demand deletion Failure tolerance and deletion
8 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 8 Satisfies Scalable secondary storage requirements Started as a research project at NEC Laboratories America, in Princeton, NJ Successfully commercialized Today: real-world, commercial system Sold by NEC in the US and Japan Development of back-end continues at 9LivesData, LLC in Warsaw, Poland Spinoff from NEC Laboratories
9 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 9 HYDRAstor functionality Content addressable storage (CAS) Vast data repository Storing and extracting streams of blocks Single system image built of independent nodes Support for standard access methods Filesystem, VTL Dynamic capacity sharing Self-recovery from failures On-demand deletion
10 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 10 Programming Model Repository of blocks Content-addressed Immutable Variable-sized hash=011..0
11 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 11 Programming Model Repository of blocks Content-addressed Immutable Variable-sized Exposed pointers to other blocks E hash=011..0
12 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 12 Programming Model Repository of blocks Content-addressed hash= Root1 E Immutable Variable-sized Exposed pointers to other blocks E Trees of blocks E hash=011..0
13 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 13 Programming Model Repository of blocks Content-addressed hash= Root1 E Root2 E Immutable hash= Variable-sized Exposed pointers to other blocks Trees of blocks DAGs due to deduplication No cycles possible E E hash=011..0
14 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 14 Programming Model Repository of blocks Content-addressed hash= Root1 E Root2 E Immutable hash= Variable-sized Exposed pointers to other blocks Trees of blocks DAGs due to deduplication No cycles possible E E Deletion of whole trees hash=011..0
15 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 15 Programming Model Repository of blocks Content-addressed hash= Root1 E Root2 E Immutable hash= Variable-sized Exposed pointers to other blocks Trees of blocks DAGs due to deduplication No cycles possible E E Deletion of whole trees hash=011..0
16 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 16 Programming Model Repository of blocks Content-addressed hash= Root1 E Root2 E Immutable hash= Variable-sized Exposed pointers to other blocks Trees of blocks DAGs due to deduplication No cycles possible E E Deletion of whole trees hash=011..0
17 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 17 Programming Model Repository of blocks Content-addressed Root2 E Immutable hash= Variable-sized Exposed pointers to other blocks Trees of blocks DAGs due to deduplication No cycles possible E Deletion of whole trees hash=011..0
18 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 18 Architecture overview Standard server-grade hardware running Linux Scalability on data-center level Access Nodes NFS / CIFS Front-end Internal Network Back-end (CAS Layer) Storage Nodes
19 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 19 Requirements on scalable storage Failure tolerance Data organization: selected requirements Required internal data services Identify data resilience reduction Fast data rebuilding High performance Preserve locality of data streams Prefetching Dynamic scalability Decentralized data management Load balancing Fast data transfer to new location Deduplication Location of potential duplicates Availability & resiliency verification On-demand deletion Failure-tolerant, distributed deletion
20 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 20 Requirements on scalable storage Failure tolerance Data organization: selected requirements Required internal data services Identify data resilience reduction Fast data rebuilding High performance Preserve locality of data streams Prefetching Dynamic scalability Decentralized data management Load balancing Fast data transfer to new location Deduplication Location of potential duplicates Availability & resiliency verification On-demand deletion Failure-tolerant, distributed deletion
21 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 21 Requirements on scalable storage Failure tolerance Data organization: selected requirements Required internal data services Identify data resilience reduction Fast data rebuilding High performance Preserve locality of data streams Prefetching Dynamic scalability Decentralized data management Load balancing Fast data transfer to new location Deduplication Location of potential duplicates Availability & resiliency verification On-demand deletion Failure-tolerant, distributed deletion
22 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 22 Requirements on scalable storage Failure tolerance Data organization: selected requirements Required internal data services Identify data resilience reduction Fast data rebuilding High performance Preserve locality of data streams Prefetching Dynamic scalability Decentralized data management Load balancing Fast data transfer to new location Deduplication Location of potential duplicates Availability & resiliency verification On-demand deletion Failure-tolerant, distributed deletion
23 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 23 Requirements on scalable storage Failure tolerance Data organization: selected requirements Required internal data services Identify data resilience reduction Fast data rebuilding High performance Preserve locality of data streams Prefetching Dynamic scalability Decentralized data management Load balancing Fast data transfer to new location Deduplication Location of potential duplicates Availability & resiliency verification On-demand deletion Failure-tolerant, distributed deletion
24 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 24 Failure tolerance: erasure coding Block erasure-coded into N fragments Storage overhead tunable Example: N=8, m=5 Original block Encode Redundant Fragments Original Fragments Decode Any 3 fragments can be lost
25 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 25 Scalability with DHT: data placement Block location: DHT with prefix routing empty prefix
26 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 26 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix hash= empty prefix Block
27 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 27 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components Hosted on SNs empty prefix 0 1 hash= Block N=4 N components per prefix Node Node Node Node Node Node
28 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 28 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components Hosted on SNs empty prefix 0 1 hash= Block N=4 N components per prefix Node Store fragments Node Node Node Node Node
29 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 29 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components empty prefix hash= Block Hosted on SNs 0 1 N=4 N components per prefix Node Store fragments Node Distributed consensus Node 13 Node 14 Node Node
30 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 30 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components empty prefix hash= Block Hosted on SNs 0 1 N=4 N components per prefix Node Store fragments Node Distributed consensus Node 13 Node 14 Node Node
31 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 31 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components empty prefix hash= Block Hosted on SNs 0 1 N=4 N components per prefix Node Store fragments Node Distributed consensus Node 13 Node 14 Node Node
32 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 32 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components empty prefix hash= Block Hosted on SNs 0 1 N=4 N components per prefix Node Store fragments Node Distributed consensus Node 13 Node 14 Node Node
33 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 33 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components empty prefix hash= Block Hosted on SNs 0 1 N=4 N components per prefix Node Store fragments Node Distributed consensus Load balancing Node 13 Node 14 Node 15 Node
34 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 34 Data organization: synchrun chains A B C D E F G Data stream split to blocks
35 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 35 Data organization: synchrun chains A B C D E F G Data stream split to blocks es of blocks computed
36 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 36 Data organization: synchrun chains A B C D E F G Data stream split to blocks es of blocks computed Routing through DHT
37 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 37 Data organization: synchrun chains A B C D E F G Prefix Data stream split to blocks es of blocks computed Routing through DHT
38 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 38 Data organization: synchrun chains A B C D E F G Prefix Compression Data stream split to blocks es of blocks computed Routing through DHT Erasure Coding
39 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 39 Data organization: synchrun chains A B C D E F G 010 Prefix Compression Erasure Coding Data stream split to blocks es of blocks computed Routing through DHT Erasure-coded fragments stored by components
40 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 40 Data organization: synchrun chains A B C D E F G 010 Prefix Compression Data stream split to blocks es of blocks computed Routing through DHT 0 Erasure Coding A D F Erasure-coded fragments stored by components 1 A D F 2 3 A D F A D F
41 Synchrun HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 41 Data organization: synchrun chains A B C D E F G 010 Prefix Compression Data stream split to blocks es of blocks computed Routing through DHT 0 Erasure Coding Synchrun 1 Synchrun 2 Synchrun 3 A D F Erasure-coded fragments stored by components Grouped into synchruns 1 A D F 2 A D F 3 A D F
42 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 42 Data organization: synchrun chains A B C D E F G 010 Prefix Compression Data stream split to blocks es of blocks computed Routing through DHT 0 Erasure Coding Synchrun 1 Synchrun 2 Synchrun 3 A D F Erasure-coded fragments stored by components Grouped into synchruns 1 2 A D F A D F Containers stored on disks Fragment metadata separately from data 3 A D F Container Synchrun
43 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 43 Data organization: synchrun chains A B C D E F G 010 Prefix Compression Data stream split to blocks es of blocks computed Routing through DHT 0 Erasure Coding Synchrun 1 Synchrun 2 Synchrun 3 A D F Erasure-coded fragments stored by components Grouped into synchruns A D F A D F A D F Containers stored on disks Fragment metadata separately from data Ordered synchrun chains Preserve order & locality Container Synchrun Manageable
44 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 44 Synchrun chains in a dynamic system 01:1
45 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 45 System growth: split 01:1 010:1 011:1
46 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 46 System growth: split 01:1 010:1 011:1
47 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 47 System growth: split 01:1 010:1 011:1
48 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 48 Concatenation 01:1 010:1
49 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 49 Concatenation 01:1 010:1 010:1
50 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 50 Marking blocks to reclaim 01:1 010:1 010:1
51 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 51 Space reclamation & Concatenation 01:1 010:1 010:1 010:1
52 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 52 Data Services: Identification of data resiliency level Missing fragments 01:0 01:1 01:2 01:3
53 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 53 Data Services: Identification of data resiliency level 01:0 01:1 01:2 01:3 Chain scanning
54 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 54 Data Services: Identification of data resiliency level 01:0 01:1 01:2 01:3 Chain scanning
55 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 55 Data Services: Identification of data resiliency level 01:0 01:1 01:2 01:3 Chain scanning
56 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 56 Data Services: Identification of data resiliency level 01:0 01:1 01:2 01:3 Chain scanning
57 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 57 Data services: reconstruction 01:0 01:1 01:2 01:3 Sequential read/write of entire Containers Erasure decoding and re-encoding
58 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 58 Data services: reconstruction 01:0 01:1 01:2 01:3 Sequential read/write of entire Containers Erasure decoding and re-encoding
59 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 59 Data services: reconstruction 01:0 01:1 01:2 01:3 Sequential read/write of entire Containers Erasure decoding and re-encoding
60 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 60 Data services: fast data transfer 01:0 01:1 01:2 01:3 Old component 01:3 Location of new node (DHT)
61 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 61 Data services: fast data transfer 01:0 01:1 01:2 01:3 Data transfer Old component 01:3
62 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 62 Data services: fast data transfer 01:0 01:1 01:2 01:3 Data transfer Old component 01:3
63 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 63 Data services: fast data transfer 01:0 01:1 01:2 01:3 Data transfer Old component 01:3
64 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 64 Data services: fast data transfer 01:0 01:1 01:2 01:3 Old component 01:3
65 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 65 Data services for deduplication Global: duplicates detected in entire system DHT routing based on content Inline deduplication: has to be high-performance Prefetching Containers for streams of duplicates Block hashes stored separately
66 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 66 Data services for deduplication hash=011.. Block 01:0 Choose complete chain 01:1 01:2 01:3 Completeness: definitely not a duplicate Deletion interaction: wasn't the block scheduled for deletion?
67 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 67 Data services for deduplication hash=011.. Block 01:0 01:1 Query 01:2 01:3
68 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 68 Data services for deduplication hash=011.. Block 01:0 01:1 01:2 01:3 Local candidate found
69 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 69 Data services for deduplication hash=011.. Block 01:0 01:1 01:2 Successful dedup 01:3 Candidate verification
70 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 70 On-demand data deletion Distributed garbage collection Per-block reference counter stored perfragment Failure-tolerant Block reference counter calculated independently on peer Container chains Interference with duplicate elimination: read-only phase for block tree traversal space reclamation in background
71 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 71 Writes during node failure Writing Reconstruction
72 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 72 Write Scaling nodes added while writing
73 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 73 Write Scaling nodes added while writing
74 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 74 Questions?
HYDRAstor: a Scalable Secondary Storage
HYDRAstor: a Scalable Secondary Storage 7th TF-Storage Meeting September 9 th 00 Łukasz Heldt Largest Japanese IT company $4 Billion in annual revenue 4,000 staff www.nec.com Polish R&D company 50 engineers
More informationHydraFS: a High-Throughput File System for the HYDRAstor Content-Addressable Storage System
HydraFS: a High-Throughput File System for the HYDRAstor Content-Addressable Storage System Cristian Ungureanu, Benjamin Atkin, Akshat Aranya, Salil Gokhale, Steve Rago, Grzegorz Calkowski, Cezary Dubnicki,
More informationScale-out Object Store for PB/hr Backups and Long Term Archive April 24, 2014
Scale-out Object Store for PB/hr Backups and Long Term Archive April 24, 2014 Gideon Senderov Director, Advanced Storage Products NEC Corporation of America Long-Term Data in the Data Center (EB) 140 120
More informationWhite Paper Simplified Backup and Reliable Recovery
Simplified Backup and Reliable Recovery NEC Corporation of America necam.com Overview Amanda Enterprise from Zmanda - A Carbonite company, is a backup and recovery solution that offers fast installation,
More informationScale-out Data Deduplication Architecture
Scale-out Data Deduplication Architecture Gideon Senderov Product Management & Technical Marketing NEC Corporation of America Outline Data Growth and Retention Deduplication Methods Legacy Architecture
More informationScale-Out Architectures for Secondary Storage
Technology Insight Paper Scale-Out Architectures for Secondary Storage NEC is a Pioneer with HYDRAstor By Steve Scully, Sr. Analyst February 2018 Scale-Out Architectures for Secondary Storage 1 Scale-Out
More informationMaximizing Data Efficiency: Benefits of Global Deduplication
Maximizing Data Efficiency: Benefits of Global Deduplication Advanced Storage Products Group Table of Contents 1 Understanding Deduplication 2 Scalability Limitations 4 Scope Limitations 5 Islands of Capacity
More informationHedvig as backup target for Veeam
Hedvig as backup target for Veeam Solution Whitepaper Version 1.0 April 2018 Table of contents Executive overview... 3 Introduction... 3 Solution components... 4 Hedvig... 4 Hedvig Virtual Disk (vdisk)...
More informationReducing The De-linearization of Data Placement to Improve Deduplication Performance
Reducing The De-linearization of Data Placement to Improve Deduplication Performance Yujuan Tan 1, Zhichao Yan 2, Dan Feng 2, E. H.-M. Sha 1,3 1 School of Computer Science & Technology, Chongqing University
More informationDELL EMC DATA DOMAIN SISL SCALING ARCHITECTURE
WHITEPAPER DELL EMC DATA DOMAIN SISL SCALING ARCHITECTURE A Detailed Review ABSTRACT While tape has been the dominant storage medium for data protection for decades because of its low cost, it is steadily
More informationData Deduplication Methods for Achieving Data Efficiency
Data Deduplication Methods for Achieving Data Efficiency Matthew Brisse, Quantum Gideon Senderov, NEC... SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies
More informationMilestone Solution Partner IT Infrastructure Components Certification Report
Milestone Solution Partner IT Infrastructure Components Certification Report NEC HYDRAstor 30-05-2016 Table of Contents Introduction... 4 Certified Products... 4 Solution Architecture... 5 Topology...
More informationDeduplication Storage System
Deduplication Storage System Kai Li Charles Fitzmorris Professor, Princeton University & Chief Scientist and Co-Founder, Data Domain, Inc. 03/11/09 The World Is Becoming Data-Centric CERN Tier 0 Business
More informationIBM Tivoli Storage Manager Version Introduction to Data Protection Solutions IBM
IBM Tivoli Storage Manager Version 7.1.6 Introduction to Data Protection Solutions IBM IBM Tivoli Storage Manager Version 7.1.6 Introduction to Data Protection Solutions IBM Note: Before you use this
More informationFoster B-Trees. Lucas Lersch. M. Sc. Caetano Sauer Advisor
Foster B-Trees Lucas Lersch M. Sc. Caetano Sauer Advisor 14.07.2014 Motivation Foster B-Trees Blink-Trees: multicore concurrency Write-Optimized B-Trees: flash memory large-writes wear leveling defragmentation
More informationThe Google File System
The Google File System Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung ACM SIGOPS 2003 {Google Research} Vaibhav Bajpai NDS Seminar 2011 Looking Back time Classics Sun NFS (1985) CMU Andrew FS (1988) Fault
More informationCOM Verification. PRESENTATION TITLE GOES HERE Alan G. Yoder, Ph.D. SNIA Technical Council Huawei Technologies, LLC
COM Verification PRESENTATION TITLE GOES HERE Alan G. Yoder, Ph.D. SNIA Technical Council Huawei Technologies, LLC Outline COM overview How they work Verifying the COMs SNIA Emerald TM Training ~ June
More informationHow to Reduce Data Capacity in Objectbased Storage: Dedup and More
How to Reduce Data Capacity in Objectbased Storage: Dedup and More Dong In Shin G-Cube, Inc. http://g-cube.kr Unstructured Data Explosion A big paradigm shift how to generate and consume data Transactional
More informationADVANCED DEDUPLICATION CONCEPTS. Thomas Rivera, BlueArc Gene Nagle, Exar
ADVANCED DEDUPLICATION CONCEPTS Thomas Rivera, BlueArc Gene Nagle, Exar SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies and individual members may
More informationCA485 Ray Walshe Google File System
Google File System Overview Google File System is scalable, distributed file system on inexpensive commodity hardware that provides: Fault Tolerance File system runs on hundreds or thousands of storage
More informationIBM Spectrum Protect Version Introduction to Data Protection Solutions IBM
IBM Spectrum Protect Version 8.1.2 Introduction to Data Protection Solutions IBM IBM Spectrum Protect Version 8.1.2 Introduction to Data Protection Solutions IBM Note: Before you use this information
More informationDEMYSTIFYING DATA DEDUPLICATION A WHITE PAPER
DEMYSTIFYING DATA DEDUPLICATION A WHITE PAPER DEMYSTIFYING DATA DEDUPLICATION ABSTRACT While data redundancy was once an acceptable operational part of the backup process, the rapid growth of digital content
More informationADVANCED DATA REDUCTION CONCEPTS
ADVANCED DATA REDUCTION CONCEPTS Thomas Rivera, Hitachi Data Systems Gene Nagle, BridgeSTOR Author: Thomas Rivera, Hitachi Data Systems Author: Gene Nagle, BridgeSTOR SNIA Legal Notice The material contained
More informationCopyright 2010 EMC Corporation. Do not Copy - All Rights Reserved.
1 Using patented high-speed inline deduplication technology, Data Domain systems identify redundant data as they are being stored, creating a storage foot print that is 10X 30X smaller on average than
More informationPurity: building fast, highly-available enterprise flash storage from commodity components
Purity: building fast, highly-available enterprise flash storage from commodity components J. Colgrove, J. Davis, J. Hayes, E. Miller, C. Sandvig, R. Sears, A. Tamches, N. Vachharajani, and F. Wang 0 Gala
More informationIn-line Deduplication for Cloud storage to Reduce Fragmentation by using Historical Knowledge
In-line Deduplication for Cloud storage to Reduce Fragmentation by using Historical Knowledge Smitha.M. S, Prof. Janardhan Singh Mtech Computer Networking, Associate Professor Department of CSE, Cambridge
More informationSolidFire and Ceph Architectural Comparison
The All-Flash Array Built for the Next Generation Data Center SolidFire and Ceph Architectural Comparison July 2014 Overview When comparing the architecture for Ceph and SolidFire, it is clear that both
More informationVeeam with Cohesity Data Platform
Veeam with Cohesity Data Platform Table of Contents About This Guide: 2 Data Protection for VMware Environments: 2 Benefits of using the Cohesity Data Platform with Veeam Backup & Replication: 4 Appendix
More informationData Reduction Meets Reality What to Expect From Data Reduction
Data Reduction Meets Reality What to Expect From Data Reduction Doug Barbian and Martin Murrey Oracle Corporation Thursday August 11, 2011 9961: Data Reduction Meets Reality Introduction Data deduplication
More informationMulti-level Selective Deduplication for VM Snapshots in Cloud Storage
Multi-level Selective Deduplication for VM Snapshots in Cloud Storage Wei Zhang, Hong Tang, Hao Jiang, Tao Yang, Xiaogang Li, Yue Zeng Dept. of Computer Science, UC Santa Barbara. Email: {wei, tyang}@cs.ucsb.edu
More informationWhite paper ETERNUS CS800 Data Deduplication Background
White paper ETERNUS CS800 - Data Deduplication Background This paper describes the process of Data Deduplication inside of ETERNUS CS800 in detail. The target group consists of presales, administrators,
More informationThe What, Why and How of the Pure Storage Enterprise Flash Array. Ethan L. Miller (and a cast of dozens at Pure Storage)
The What, Why and How of the Pure Storage Enterprise Flash Array Ethan L. Miller (and a cast of dozens at Pure Storage) Enterprise storage: $30B market built on disk Key players: EMC, NetApp, HP, etc.
More informationHICAMP Bitmap. A Space-Efficient Updatable Bitmap Index for In-Memory Databases! Bo Wang, Heiner Litz, David R. Cheriton Stanford University DAMON 14
HICAMP Bitmap A Space-Efficient Updatable Bitmap Index for In-Memory Databases! Bo Wang, Heiner Litz, David R. Cheriton Stanford University DAMON 14 Database Indexing Databases use precomputed indexes
More informationSoftware-defined Storage: Fast, Safe and Efficient
Software-defined Storage: Fast, Safe and Efficient TRY NOW Thanks to Blockchain and Intel Intelligent Storage Acceleration Library Every piece of data is required to be stored somewhere. We all know about
More informationWHY DO I NEED FALCONSTOR OPTIMIZED BACKUP & DEDUPLICATION?
WHAT IS FALCONSTOR? FAQS FalconStor Optimized Backup and Deduplication is the industry s market-leading virtual tape and LAN-based deduplication solution, unmatched in performance and scalability. With
More informationThe Google File System (GFS)
1 The Google File System (GFS) CS60002: Distributed Systems Antonio Bruto da Costa Ph.D. Student, Formal Methods Lab, Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur 2 Design constraints
More information! Design constraints. " Component failures are the norm. " Files are huge by traditional standards. ! POSIX-like
Cloud background Google File System! Warehouse scale systems " 10K-100K nodes " 50MW (1 MW = 1,000 houses) " Power efficient! Located near cheap power! Passive cooling! Power Usage Effectiveness = Total
More informationGoogle File System. Arun Sundaram Operating Systems
Arun Sundaram Operating Systems 1 Assumptions GFS built with commodity hardware GFS stores a modest number of large files A few million files, each typically 100MB or larger (Multi-GB files are common)
More informationActiveScale Erasure Coding and Self Protecting Technologies
WHITE PAPER AUGUST 2018 ActiveScale Erasure Coding and Self Protecting Technologies BitSpread Erasure Coding and BitDynamics Data Integrity and Repair Technologies within The ActiveScale Object Storage
More informationMicrosoft DPM Meets BridgeSTOR Advanced Data Reduction and Security
2011 Microsoft DPM Meets BridgeSTOR Advanced Data Reduction and Security BridgeSTOR Deduplication, Compression, Thin Provisioning and Encryption Transform DPM from Good to Great BridgeSTOR, LLC 4/4/2011
More informationThe Logic of Physical Garbage Collection in Deduplicating Storage
The Logic of Physical Garbage Collection in Deduplicating Storage Fred Douglis Abhinav Duggal Philip Shilane Tony Wong Dell EMC Shiqin Yan University of Chicago Fabiano Botelho Rubrik 1 Deduplication in
More informationTIBX NEXT-GENERATION ARCHIVE FORMAT IN ACRONIS BACKUP CLOUD
TIBX NEXT-GENERATION ARCHIVE FORMAT IN ACRONIS BACKUP CLOUD 1 Backup Speed and Reliability Are the Top Data Protection Mandates What are the top data protection mandates from your organization s IT leadership?
More informationGFS: The Google File System. Dr. Yingwu Zhu
GFS: The Google File System Dr. Yingwu Zhu Motivating Application: Google Crawl the whole web Store it all on one big disk Process users searches on one big CPU More storage, CPU required than one PC can
More informationDeduplication File System & Course Review
Deduplication File System & Course Review Kai Li 12/13/13 Topics u Deduplication File System u Review 12/13/13 2 Storage Tiers of A Tradi/onal Data Center $$$$ Mirrored storage $$$ Dedicated Fibre Clients
More informationWHITE PAPER. DATA DEDUPLICATION BACKGROUND: A Technical White Paper
WHITE PAPER DATA DEDUPLICATION BACKGROUND: A Technical White Paper CONTENTS Data Deduplication Multiple Data Sets from a Common Storage Pool.......................3 Fixed-Length Blocks vs. Variable-Length
More informationChunkStash: Speeding Up Storage Deduplication using Flash Memory
ChunkStash: Speeding Up Storage Deduplication using Flash Memory Biplob Debnath +, Sudipta Sengupta *, Jin Li * * Microsoft Research, Redmond (USA) + Univ. of Minnesota, Twin Cities (USA) Deduplication
More informationThe World s Fastest Backup Systems
3 The World s Fastest Backup Systems Erwin Freisleben BRS Presales Austria 4 EMC Data Domain: Leadership and Innovation A history of industry firsts 2003 2004 2005 2006 2007 2008 2009 2010 2011 First deduplication
More informationSMORE: A Cold Data Object Store for SMR Drives
SMORE: A Cold Data Object Store for SMR Drives Peter Macko, Xiongzi Ge, John Haskins Jr.*, James Kelley, David Slik, Keith A. Smith, and Maxim G. Smith Advanced Technology Group NetApp, Inc. * Qualcomm
More informationIBM řešení pro větší efektivitu ve správě dat - Store more with less
IBM řešení pro větší efektivitu ve správě dat - Store more with less IDG StorageWorld 2012 Rudolf Hruška Information Infrastructure Leader IBM Systems & Technology Group rudolf_hruska@cz.ibm.com IBM Agenda
More informationFile systems CS 241. May 2, University of Illinois
File systems CS 241 May 2, 2014 University of Illinois 1 Announcements Finals approaching, know your times and conflicts Ours: Friday May 16, 8-11 am Inform us by Wed May 7 if you have to take a conflict
More informationCLOUD-SCALE FILE SYSTEMS
Data Management in the Cloud CLOUD-SCALE FILE SYSTEMS 92 Google File System (GFS) Designing a file system for the Cloud design assumptions design choices Architecture GFS Master GFS Chunkservers GFS Clients
More informationIME Infinite Memory Engine Technical Overview
1 1 IME Infinite Memory Engine Technical Overview 2 Bandwidth, IOPs single NVMe drive 3 What does Flash mean for Storage? It's a new fundamental device for storing bits. We must treat it different from
More informationDelta Compressed and Deduplicated Storage Using Stream-Informed Locality
Delta Compressed and Deduplicated Storage Using Stream-Informed Locality Philip Shilane, Grant Wallace, Mark Huang, and Windsor Hsu Backup Recovery Systems Division EMC Corporation Abstract For backup
More informationECS High Availability Design
ECS High Availability Design March 2018 A Dell EMC white paper Revisions Date Mar 2018 Aug 2017 July 2017 Description Version 1.2 - Updated to include ECS version 3.2 content Version 1.1 - Updated to include
More informationHP D2D & STOREONCE OVERVIEW
HP D2D & STOREONCE OVERVIEW Robert Clifford Pre-Sales Storage Consultant 1 Data loss (Recovery Point Objective) Let s put it in context.. Last Transaction Increased availability Mission critical support
More informationThe Google File System
October 13, 2010 Based on: S. Ghemawat, H. Gobioff, and S.-T. Leung: The Google file system, in Proceedings ACM SOSP 2003, Lake George, NY, USA, October 2003. 1 Assumptions Interface Architecture Single
More informationThe Google File System
The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google SOSP 03, October 19 22, 2003, New York, USA Hyeon-Gyu Lee, and Yeong-Jae Woo Memory & Storage Architecture Lab. School
More informationDisclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme
STO1926BU A Day in the Life of a VSAN I/O Diving in to the I/O Flow of vsan John Nicholson (@lost_signal) Pete Koehler (@vmpete) VMworld 2017 Content: Not for publication #VMworld #STO1926BU Disclaimer
More informationOperating Systems. Lecture File system implementation. Master of Computer Science PUF - Hồ Chí Minh 2016/2017
Operating Systems Lecture 7.2 - File system implementation Adrien Krähenbühl Master of Computer Science PUF - Hồ Chí Minh 2016/2017 Design FAT or indexed allocation? UFS, FFS & Ext2 Journaling with Ext3
More informationCohesity SpanFS and SnapTree. White Paper
Cohesity SpanFS and SnapTree White Paper Cohesity SpanFS TM and SnapTree TM Web-Scale File System, Designed for Secondary Storage in the Cloud Era Why Do Enterprises Need a New File System? The Enterprise
More informationCOS 318: Operating Systems. NSF, Snapshot, Dedup and Review
COS 318: Operating Systems NSF, Snapshot, Dedup and Review Topics! NFS! Case Study: NetApp File System! Deduplication storage system! Course review 2 Network File System! Sun introduced NFS v2 in early
More informationBackup and archiving need not to create headaches new pain relievers are around
Backup and archiving need not to create headaches new pain relievers are around Frank Reichart Senior Director Product Marketing Storage Copyright 2012 FUJITSU Hot Spots in Data Protection 1 Copyright
More informationIntroducing Tegile. Company Overview. Product Overview. Solutions & Use Cases. Partnering with Tegile
Tegile Systems 1 Introducing Tegile Company Overview Product Overview Solutions & Use Cases Partnering with Tegile 2 Company Overview Company Overview Te gile - [tey-jile] Tegile = technology + agile Founded
More informationarxiv: v3 [cs.dc] 27 Jun 2013
RevDedup: A Reverse Deduplication Storage System Optimized for Reads to Latest Backups arxiv:1302.0621v3 [cs.dc] 27 Jun 2013 Chun-Ho Ng and Patrick P. C. Lee The Chinese University of Hong Kong, Hong Kong
More informationTom Sas HP. Author: SNIA - Data Protection & Capacity Optimization (DPCO) Committee
Advanced PRESENTATION Data Reduction TITLE GOES HERE Concepts Tom Sas HP Author: SNIA - Data Protection & Capacity Optimization (DPCO) Committee SNIA Legal Notice The material contained in this tutorial
More informationCumulus: Filesystem Backup to the Cloud
Cumulus: Filesystem Backup to the Cloud 7th USENIX Conference on File and Storage Technologies (FAST 09) Michael Vrable Stefan Savage Geoffrey M. Voelker University of California, San Diego February 26,
More informationFile Systems Management and Examples
File Systems Management and Examples Today! Efficiency, performance, recovery! Examples Next! Distributed systems Disk space management! Once decided to store a file as sequence of blocks What s the size
More informationA Review on Backup-up Practices using Deduplication
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 9, September 2015,
More informationCold Storage: The Road to Enterprise Ilya Kuznetsov YADRO
Cold Storage: The Road to Enterprise Ilya Kuznetsov YADRO Agenda Technical challenge Custom product Growth of aspirations Enterprise requirements Making an enterprise cold storage product 2 Technical Challenge
More informationDistributed Filesystem
Distributed Filesystem 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributing Code! Don t move data to workers move workers to the data! - Store data on the local disks of nodes in the
More informationFGDEFRAG: A Fine-Grained Defragmentation Approach to Improve Restore Performance
FGDEFRAG: A Fine-Grained Defragmentation Approach to Improve Restore Performance Yujuan Tan, Jian Wen, Zhichao Yan, Hong Jiang, Witawas Srisa-an, aiping Wang, Hao Luo College of Computer Science, Chongqing
More informationAdministração e Optimização de Bases de Dados 2012/2013 Index Tuning
Administração e Optimização de Bases de Dados 2012/2013 Index Tuning Bruno Martins DEI@Técnico e DMIR@INESC-ID Index An index is a data structure that supports efficient access to data Condition on Index
More informationNPTEL Course Jan K. Gopinath Indian Institute of Science
Storage Systems NPTEL Course Jan 2012 (Lecture 39) K. Gopinath Indian Institute of Science Google File System Non-Posix scalable distr file system for large distr dataintensive applications performance,
More informationTechnology Insight Series
IBM ProtecTIER Deduplication for z/os John Webster March 04, 2010 Technology Insight Series Evaluator Group Copyright 2010 Evaluator Group, Inc. All rights reserved. Announcement Summary The many data
More informationICS Principles of Operating Systems
ICS 143 - Principles of Operating Systems Lectures 17-20 - FileSystem Interface and Implementation Prof. Ardalan Amiri Sani Prof. Nalini Venkatasubramanian ardalan@ics.uci.edu nalini@ics.uci.edu Outline
More informationGoogle File System, Replication. Amin Vahdat CSE 123b May 23, 2006
Google File System, Replication Amin Vahdat CSE 123b May 23, 2006 Annoucements Third assignment available today Due date June 9, 5 pm Final exam, June 14, 11:30-2:30 Google File System (thanks to Mahesh
More informationDesign Tradeoffs for Data Deduplication Performance in Backup Workloads
Design Tradeoffs for Data Deduplication Performance in Backup Workloads Min Fu,DanFeng,YuHua,XubinHe, Zuoning Chen *, Wen Xia,YuchengZhang,YujuanTan Huazhong University of Science and Technology Virginia
More informationCOMP 530: Operating Systems File Systems: Fundamentals
File Systems: Fundamentals Don Porter Portions courtesy Emmett Witchel 1 Files What is a file? A named collection of related information recorded on secondary storage (e.g., disks) File attributes Name,
More informationDistributed File Systems II
Distributed File Systems II To do q Very-large scale: Google FS, Hadoop FS, BigTable q Next time: Naming things GFS A radically new environment NFS, etc. Independence Small Scale Variety of workloads Cooperation
More informationENCRYPTED DATA MANAGEMENT WITH DEDUPLICATION IN CLOUD COMPUTING
ENCRYPTED DATA MANAGEMENT WITH DEDUPLICATION IN CLOUD COMPUTING S KEERTHI 1*, MADHAVA REDDY A 2* 1. II.M.Tech, Dept of CSE, AM Reddy Memorial College of Engineering & Technology, Petlurivaripalem. 2. Assoc.
More informationLattice: A Decentralized, Distributed Datastore
1 Lattice: A Decentralized, Distributed Datastore Computes, inc. February 2018 Abstract Lattice is a decentralized, distributed datastore that is used to build a distributed work queue, ledger, and general-purpose
More informationActiveScale Erasure Coding and Self Protecting Technologies
NOVEMBER 2017 ActiveScale Erasure Coding and Self Protecting Technologies BitSpread Erasure Coding and BitDynamics Data Integrity and Repair Technologies within The ActiveScale Object Storage System Software
More informationVeeam and HP: Meet your backup data protection goals
Sponsored by Veeam and HP: Meet your backup data protection goals Eric Machabert Сonsultant and virtualization expert Introduction With virtualization systems becoming mainstream in recent years, backups
More informationSpeeding Up Cloud/Server Applications Using Flash Memory
Speeding Up Cloud/Server Applications Using Flash Memory Sudipta Sengupta and Jin Li Microsoft Research, Redmond, WA, USA Contains work that is joint with Biplob Debnath (Univ. of Minnesota) Flash Memory
More informationChapter 7: File-System
Chapter 7: File-System Interface and Implementation Chapter 7: File-System Interface and Implementation File Concept File-System Structure Access Methods File-System Implementation Directory Structure
More informationEMC DATA DOMAIN PRODUCT OvERvIEW
EMC DATA DOMAIN PRODUCT OvERvIEW Deduplication storage for next-generation backup and archive Essentials Scalable Deduplication Fast, inline deduplication Provides up to 65 PBs of logical storage for long-term
More informationGFS: The Google File System
GFS: The Google File System Brad Karp UCL Computer Science CS GZ03 / M030 24 th October 2014 Motivating Application: Google Crawl the whole web Store it all on one big disk Process users searches on one
More informationCS720 - Operating Systems
CS720 - Operating Systems File Systems File Concept Access Methods Directory Structure File System Mounting File Sharing - Protection 1 File Concept Contiguous logical address space Types: Data numeric
More informationReducing Replication Bandwidth for Distributed Document Databases
Reducing Replication Bandwidth for Distributed Document Databases Lianghong Xu 1, Andy Pavlo 1, Sudipta Sengupta 2 Jin Li 2, Greg Ganger 1 Carnegie Mellon University 1, Microsoft Research 2 Document-oriented
More informationFLAT DATACENTER STORAGE. Paper-3 Presenter-Pratik Bhatt fx6568
FLAT DATACENTER STORAGE Paper-3 Presenter-Pratik Bhatt fx6568 FDS Main discussion points A cluster storage system Stores giant "blobs" - 128-bit ID, multi-megabyte content Clients and servers connected
More informationToday CSCI Coda. Naming: Volumes. Coda GFS PAST. Instructor: Abhishek Chandra. Main Goals: Volume is a subtree in the naming space
Today CSCI 5105 Coda GFS PAST Instructor: Abhishek Chandra 2 Coda Main Goals: Availability: Work in the presence of disconnection Scalability: Support large number of users Successor of Andrew File System
More informationThe term "physical drive" refers to a single hard disk module. Figure 1. Physical Drive
HP NetRAID Tutorial RAID Overview HP NetRAID Series adapters let you link multiple hard disk drives together and write data across them as if they were one large drive. With the HP NetRAID Series adapter,
More informationBigtable: A Distributed Storage System for Structured Data. Andrew Hon, Phyllis Lau, Justin Ng
Bigtable: A Distributed Storage System for Structured Data Andrew Hon, Phyllis Lau, Justin Ng What is Bigtable? - A storage system for managing structured data - Used in 60+ Google services - Motivation:
More informationSED: SCALABLE & EFFICIENT DE-DUPLICATION FILE SYSTEM FOR VIRTUAL MACHINE IMAGES
SED: SCALABLE & EFFICIENT DE-DUPLICATION FILE SYSTEM FOR VIRTUAL MACHINE IMAGES Author Name: Poonguzhali. S B. Tech, Information Technology, M.A.M College of Engineering, Tiruchirapalli. kuzhalit@gmail.com
More informationDesign of Global Data Deduplication for A Scale-out Distributed Storage System
218 IEEE 38th International Conference on Distributed Computing Systems Design of Global Data Deduplication for A Scale-out Distributed Storage System Myoungwon Oh, Sejin Park, Jungyeon Yoon, Sangjae Kim,
More informationClotho: Transparent Data Versioning at the Block I/O Level
Clotho: Transparent Data Versioning at the Block I/O Level Michail Flouris Dept. of Computer Science University of Toronto flouris@cs.toronto.edu Angelos Bilas ICS- FORTH & University of Crete bilas@ics.forth.gr
More informationTechnical White Paper: IntelliFlash Architecture
Executive Summary... 2 IntelliFlash OS... 3 Achieving High Performance & High Capacity... 3 Write Cache... 4 Read Cache... 5 Metadata Acceleration... 5 Data Reduction... 6 Enterprise Resiliency & Capabilities...
More informationIndex Tuning. Index. An index is a data structure that supports efficient access to data. Matching records. Condition on attribute value
Index Tuning AOBD07/08 Index An index is a data structure that supports efficient access to data Condition on attribute value index Set of Records Matching records (search key) 1 Performance Issues Type
More information50 TB. Traditional Storage + Data Protection Architecture. StorSimple Cloud-integrated Storage. Traditional CapEx: $375K Support: $75K per Year
Compelling Economics: Traditional Storage vs. StorSimple Traditional Storage + Data Protection Architecture StorSimple Cloud-integrated Storage Servers Servers Primary Volume Disk Array ($100K; Double
More informationData De-duplication for Distributed Segmented Parallel FS
Data De-duplication for Distributed Segmented Parallel FS Boris Zuckerman & Oskar Batuner Hewlett-Packard Co. Objectives Expose fundamentals of highly distributed segmented parallel file system architecture
More information