HYDRAstor: a Scalable Secondary Storage

Size: px
Start display at page:

Download "HYDRAstor: a Scalable Secondary Storage"

Transcription

1 HYDRAstor: a Scalable Secondary Storage 7th USENIX Conference on File and Storage Technologies (FAST '09) February 26 th 2009 C. Dubnicki, L. Gryz, L. Heldt, M. Kaczmarczyk, W. Kilian, P. Strzelczak, J. Szczepkowski, M. Welnicki C. Ungureanu

2 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 2 Scalable secondary storage Characteristics Huge amount of data Small backup windows Duplication between backup streams Reliable, on-line retrieval Varying value of data Requirements - Scalability (dynamic) - Low cost per TB - Very high write performance - Global deduplication - Failure tolerance - High restore performance - Adjust resilience overhead - Data deletion

3 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 3 Scalable secondary storage Characteristics Huge amount of data Small backup windows Duplication between backup streams Reliable, on-line retrieval Varying value of data Requirements - Scalability (dynamic) - Low cost per TB - Very high write performance - Global deduplication - Failure tolerance - High restore performance - Adjust resilience overhead - Data deletion

4 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 4 Scalable secondary storage Characteristics Huge amount of data Small backup windows Duplication between backup streams Reliable, on-line retrieval Varying value of data Requirements - Scalability (dynamic) - Low cost per TB - Very high write performance - Global deduplication - Failure tolerance - High restore performance - Adjust resilience overhead - Data deletion

5 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 5 Scalable secondary storage Characteristics Huge amount of data Small backup windows Duplication between backup streams Reliable, on-line retrieval Varying value of data Requirements - Scalability (dynamic) - Low cost per TB - Very high write performance - Global deduplication - Failure tolerance - High restore performance - Adjust resilience overhead - Data deletion

6 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 6 Scalable secondary storage Characteristics Huge amount of data Small backup windows Duplication between backup streams Reliable, on-line retrieval Varying value of data Requirements - Scalability (dynamic) - Low cost per TB - Very high write performance - Global deduplication - Failure tolerance - High restore performance - Adjust resilience overhead - Data deletion

7 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 7 Challenges High-performance, decentralized global deduplication... in a dynamic, distributed system... with deletion and failures Combination introduces complexity Tension between: Deduplication and dynamic scalability Deduplication and on-demand deletion Failure tolerance and deletion

8 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 8 Satisfies Scalable secondary storage requirements Started as a research project at NEC Laboratories America, in Princeton, NJ Successfully commercialized Today: real-world, commercial system Sold by NEC in the US and Japan Development of back-end continues at 9LivesData, LLC in Warsaw, Poland Spinoff from NEC Laboratories

9 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 9 HYDRAstor functionality Content addressable storage (CAS) Vast data repository Storing and extracting streams of blocks Single system image built of independent nodes Support for standard access methods Filesystem, VTL Dynamic capacity sharing Self-recovery from failures On-demand deletion

10 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 10 Programming Model Repository of blocks Content-addressed Immutable Variable-sized hash=011..0

11 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 11 Programming Model Repository of blocks Content-addressed Immutable Variable-sized Exposed pointers to other blocks E hash=011..0

12 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 12 Programming Model Repository of blocks Content-addressed hash= Root1 E Immutable Variable-sized Exposed pointers to other blocks E Trees of blocks E hash=011..0

13 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 13 Programming Model Repository of blocks Content-addressed hash= Root1 E Root2 E Immutable hash= Variable-sized Exposed pointers to other blocks Trees of blocks DAGs due to deduplication No cycles possible E E hash=011..0

14 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 14 Programming Model Repository of blocks Content-addressed hash= Root1 E Root2 E Immutable hash= Variable-sized Exposed pointers to other blocks Trees of blocks DAGs due to deduplication No cycles possible E E Deletion of whole trees hash=011..0

15 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 15 Programming Model Repository of blocks Content-addressed hash= Root1 E Root2 E Immutable hash= Variable-sized Exposed pointers to other blocks Trees of blocks DAGs due to deduplication No cycles possible E E Deletion of whole trees hash=011..0

16 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 16 Programming Model Repository of blocks Content-addressed hash= Root1 E Root2 E Immutable hash= Variable-sized Exposed pointers to other blocks Trees of blocks DAGs due to deduplication No cycles possible E E Deletion of whole trees hash=011..0

17 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 17 Programming Model Repository of blocks Content-addressed Root2 E Immutable hash= Variable-sized Exposed pointers to other blocks Trees of blocks DAGs due to deduplication No cycles possible E Deletion of whole trees hash=011..0

18 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 18 Architecture overview Standard server-grade hardware running Linux Scalability on data-center level Access Nodes NFS / CIFS Front-end Internal Network Back-end (CAS Layer) Storage Nodes

19 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 19 Requirements on scalable storage Failure tolerance Data organization: selected requirements Required internal data services Identify data resilience reduction Fast data rebuilding High performance Preserve locality of data streams Prefetching Dynamic scalability Decentralized data management Load balancing Fast data transfer to new location Deduplication Location of potential duplicates Availability & resiliency verification On-demand deletion Failure-tolerant, distributed deletion

20 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 20 Requirements on scalable storage Failure tolerance Data organization: selected requirements Required internal data services Identify data resilience reduction Fast data rebuilding High performance Preserve locality of data streams Prefetching Dynamic scalability Decentralized data management Load balancing Fast data transfer to new location Deduplication Location of potential duplicates Availability & resiliency verification On-demand deletion Failure-tolerant, distributed deletion

21 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 21 Requirements on scalable storage Failure tolerance Data organization: selected requirements Required internal data services Identify data resilience reduction Fast data rebuilding High performance Preserve locality of data streams Prefetching Dynamic scalability Decentralized data management Load balancing Fast data transfer to new location Deduplication Location of potential duplicates Availability & resiliency verification On-demand deletion Failure-tolerant, distributed deletion

22 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 22 Requirements on scalable storage Failure tolerance Data organization: selected requirements Required internal data services Identify data resilience reduction Fast data rebuilding High performance Preserve locality of data streams Prefetching Dynamic scalability Decentralized data management Load balancing Fast data transfer to new location Deduplication Location of potential duplicates Availability & resiliency verification On-demand deletion Failure-tolerant, distributed deletion

23 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 23 Requirements on scalable storage Failure tolerance Data organization: selected requirements Required internal data services Identify data resilience reduction Fast data rebuilding High performance Preserve locality of data streams Prefetching Dynamic scalability Decentralized data management Load balancing Fast data transfer to new location Deduplication Location of potential duplicates Availability & resiliency verification On-demand deletion Failure-tolerant, distributed deletion

24 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 24 Failure tolerance: erasure coding Block erasure-coded into N fragments Storage overhead tunable Example: N=8, m=5 Original block Encode Redundant Fragments Original Fragments Decode Any 3 fragments can be lost

25 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 25 Scalability with DHT: data placement Block location: DHT with prefix routing empty prefix

26 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 26 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix hash= empty prefix Block

27 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 27 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components Hosted on SNs empty prefix 0 1 hash= Block N=4 N components per prefix Node Node Node Node Node Node

28 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 28 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components Hosted on SNs empty prefix 0 1 hash= Block N=4 N components per prefix Node Store fragments Node Node Node Node Node

29 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 29 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components empty prefix hash= Block Hosted on SNs 0 1 N=4 N components per prefix Node Store fragments Node Distributed consensus Node 13 Node 14 Node Node

30 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 30 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components empty prefix hash= Block Hosted on SNs 0 1 N=4 N components per prefix Node Store fragments Node Distributed consensus Node 13 Node 14 Node Node

31 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 31 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components empty prefix hash= Block Hosted on SNs 0 1 N=4 N components per prefix Node Store fragments Node Distributed consensus Node 13 Node 14 Node Node

32 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 32 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components empty prefix hash= Block Hosted on SNs 0 1 N=4 N components per prefix Node Store fragments Node Distributed consensus Node 13 Node 14 Node Node

33 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 33 Scalability with DHT: data placement Block location: DHT with prefix routing Block mapped to hash prefix Prefix components empty prefix hash= Block Hosted on SNs 0 1 N=4 N components per prefix Node Store fragments Node Distributed consensus Load balancing Node 13 Node 14 Node 15 Node

34 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 34 Data organization: synchrun chains A B C D E F G Data stream split to blocks

35 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 35 Data organization: synchrun chains A B C D E F G Data stream split to blocks es of blocks computed

36 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 36 Data organization: synchrun chains A B C D E F G Data stream split to blocks es of blocks computed Routing through DHT

37 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 37 Data organization: synchrun chains A B C D E F G Prefix Data stream split to blocks es of blocks computed Routing through DHT

38 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 38 Data organization: synchrun chains A B C D E F G Prefix Compression Data stream split to blocks es of blocks computed Routing through DHT Erasure Coding

39 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 39 Data organization: synchrun chains A B C D E F G 010 Prefix Compression Erasure Coding Data stream split to blocks es of blocks computed Routing through DHT Erasure-coded fragments stored by components

40 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 40 Data organization: synchrun chains A B C D E F G 010 Prefix Compression Data stream split to blocks es of blocks computed Routing through DHT 0 Erasure Coding A D F Erasure-coded fragments stored by components 1 A D F 2 3 A D F A D F

41 Synchrun HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 41 Data organization: synchrun chains A B C D E F G 010 Prefix Compression Data stream split to blocks es of blocks computed Routing through DHT 0 Erasure Coding Synchrun 1 Synchrun 2 Synchrun 3 A D F Erasure-coded fragments stored by components Grouped into synchruns 1 A D F 2 A D F 3 A D F

42 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 42 Data organization: synchrun chains A B C D E F G 010 Prefix Compression Data stream split to blocks es of blocks computed Routing through DHT 0 Erasure Coding Synchrun 1 Synchrun 2 Synchrun 3 A D F Erasure-coded fragments stored by components Grouped into synchruns 1 2 A D F A D F Containers stored on disks Fragment metadata separately from data 3 A D F Container Synchrun

43 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 43 Data organization: synchrun chains A B C D E F G 010 Prefix Compression Data stream split to blocks es of blocks computed Routing through DHT 0 Erasure Coding Synchrun 1 Synchrun 2 Synchrun 3 A D F Erasure-coded fragments stored by components Grouped into synchruns A D F A D F A D F Containers stored on disks Fragment metadata separately from data Ordered synchrun chains Preserve order & locality Container Synchrun Manageable

44 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 44 Synchrun chains in a dynamic system 01:1

45 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 45 System growth: split 01:1 010:1 011:1

46 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 46 System growth: split 01:1 010:1 011:1

47 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 47 System growth: split 01:1 010:1 011:1

48 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 48 Concatenation 01:1 010:1

49 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 49 Concatenation 01:1 010:1 010:1

50 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 50 Marking blocks to reclaim 01:1 010:1 010:1

51 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 51 Space reclamation & Concatenation 01:1 010:1 010:1 010:1

52 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 52 Data Services: Identification of data resiliency level Missing fragments 01:0 01:1 01:2 01:3

53 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 53 Data Services: Identification of data resiliency level 01:0 01:1 01:2 01:3 Chain scanning

54 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 54 Data Services: Identification of data resiliency level 01:0 01:1 01:2 01:3 Chain scanning

55 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 55 Data Services: Identification of data resiliency level 01:0 01:1 01:2 01:3 Chain scanning

56 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 56 Data Services: Identification of data resiliency level 01:0 01:1 01:2 01:3 Chain scanning

57 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 57 Data services: reconstruction 01:0 01:1 01:2 01:3 Sequential read/write of entire Containers Erasure decoding and re-encoding

58 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 58 Data services: reconstruction 01:0 01:1 01:2 01:3 Sequential read/write of entire Containers Erasure decoding and re-encoding

59 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 59 Data services: reconstruction 01:0 01:1 01:2 01:3 Sequential read/write of entire Containers Erasure decoding and re-encoding

60 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 60 Data services: fast data transfer 01:0 01:1 01:2 01:3 Old component 01:3 Location of new node (DHT)

61 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 61 Data services: fast data transfer 01:0 01:1 01:2 01:3 Data transfer Old component 01:3

62 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 62 Data services: fast data transfer 01:0 01:1 01:2 01:3 Data transfer Old component 01:3

63 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 63 Data services: fast data transfer 01:0 01:1 01:2 01:3 Data transfer Old component 01:3

64 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 64 Data services: fast data transfer 01:0 01:1 01:2 01:3 Old component 01:3

65 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 65 Data services for deduplication Global: duplicates detected in entire system DHT routing based on content Inline deduplication: has to be high-performance Prefetching Containers for streams of duplicates Block hashes stored separately

66 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 66 Data services for deduplication hash=011.. Block 01:0 Choose complete chain 01:1 01:2 01:3 Completeness: definitely not a duplicate Deletion interaction: wasn't the block scheduled for deletion?

67 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 67 Data services for deduplication hash=011.. Block 01:0 01:1 Query 01:2 01:3

68 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 68 Data services for deduplication hash=011.. Block 01:0 01:1 01:2 01:3 Local candidate found

69 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 69 Data services for deduplication hash=011.. Block 01:0 01:1 01:2 Successful dedup 01:3 Candidate verification

70 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 70 On-demand data deletion Distributed garbage collection Per-block reference counter stored perfragment Failure-tolerant Block reference counter calculated independently on peer Container chains Interference with duplicate elimination: read-only phase for block tree traversal space reclamation in background

71 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 71 Writes during node failure Writing Reconstruction

72 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 72 Write Scaling nodes added while writing

73 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 73 Write Scaling nodes added while writing

74 HYDRAstor: a Scalable Secondary Storage. 9LivesData, LLC 74 Questions?

HYDRAstor: a Scalable Secondary Storage

HYDRAstor: a Scalable Secondary Storage HYDRAstor: a Scalable Secondary Storage 7th TF-Storage Meeting September 9 th 00 Łukasz Heldt Largest Japanese IT company $4 Billion in annual revenue 4,000 staff www.nec.com Polish R&D company 50 engineers

More information

HydraFS: a High-Throughput File System for the HYDRAstor Content-Addressable Storage System

HydraFS: a High-Throughput File System for the HYDRAstor Content-Addressable Storage System HydraFS: a High-Throughput File System for the HYDRAstor Content-Addressable Storage System Cristian Ungureanu, Benjamin Atkin, Akshat Aranya, Salil Gokhale, Steve Rago, Grzegorz Calkowski, Cezary Dubnicki,

More information

Scale-out Object Store for PB/hr Backups and Long Term Archive April 24, 2014

Scale-out Object Store for PB/hr Backups and Long Term Archive April 24, 2014 Scale-out Object Store for PB/hr Backups and Long Term Archive April 24, 2014 Gideon Senderov Director, Advanced Storage Products NEC Corporation of America Long-Term Data in the Data Center (EB) 140 120

More information

White Paper Simplified Backup and Reliable Recovery

White Paper Simplified Backup and Reliable Recovery Simplified Backup and Reliable Recovery NEC Corporation of America necam.com Overview Amanda Enterprise from Zmanda - A Carbonite company, is a backup and recovery solution that offers fast installation,

More information

Scale-out Data Deduplication Architecture

Scale-out Data Deduplication Architecture Scale-out Data Deduplication Architecture Gideon Senderov Product Management & Technical Marketing NEC Corporation of America Outline Data Growth and Retention Deduplication Methods Legacy Architecture

More information

Scale-Out Architectures for Secondary Storage

Scale-Out Architectures for Secondary Storage Technology Insight Paper Scale-Out Architectures for Secondary Storage NEC is a Pioneer with HYDRAstor By Steve Scully, Sr. Analyst February 2018 Scale-Out Architectures for Secondary Storage 1 Scale-Out

More information

Maximizing Data Efficiency: Benefits of Global Deduplication

Maximizing Data Efficiency: Benefits of Global Deduplication Maximizing Data Efficiency: Benefits of Global Deduplication Advanced Storage Products Group Table of Contents 1 Understanding Deduplication 2 Scalability Limitations 4 Scope Limitations 5 Islands of Capacity

More information

Hedvig as backup target for Veeam

Hedvig as backup target for Veeam Hedvig as backup target for Veeam Solution Whitepaper Version 1.0 April 2018 Table of contents Executive overview... 3 Introduction... 3 Solution components... 4 Hedvig... 4 Hedvig Virtual Disk (vdisk)...

More information

Reducing The De-linearization of Data Placement to Improve Deduplication Performance

Reducing The De-linearization of Data Placement to Improve Deduplication Performance Reducing The De-linearization of Data Placement to Improve Deduplication Performance Yujuan Tan 1, Zhichao Yan 2, Dan Feng 2, E. H.-M. Sha 1,3 1 School of Computer Science & Technology, Chongqing University

More information

DELL EMC DATA DOMAIN SISL SCALING ARCHITECTURE

DELL EMC DATA DOMAIN SISL SCALING ARCHITECTURE WHITEPAPER DELL EMC DATA DOMAIN SISL SCALING ARCHITECTURE A Detailed Review ABSTRACT While tape has been the dominant storage medium for data protection for decades because of its low cost, it is steadily

More information

Data Deduplication Methods for Achieving Data Efficiency

Data Deduplication Methods for Achieving Data Efficiency Data Deduplication Methods for Achieving Data Efficiency Matthew Brisse, Quantum Gideon Senderov, NEC... SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies

More information

Milestone Solution Partner IT Infrastructure Components Certification Report

Milestone Solution Partner IT Infrastructure Components Certification Report Milestone Solution Partner IT Infrastructure Components Certification Report NEC HYDRAstor 30-05-2016 Table of Contents Introduction... 4 Certified Products... 4 Solution Architecture... 5 Topology...

More information

Deduplication Storage System

Deduplication Storage System Deduplication Storage System Kai Li Charles Fitzmorris Professor, Princeton University & Chief Scientist and Co-Founder, Data Domain, Inc. 03/11/09 The World Is Becoming Data-Centric CERN Tier 0 Business

More information

IBM Tivoli Storage Manager Version Introduction to Data Protection Solutions IBM

IBM Tivoli Storage Manager Version Introduction to Data Protection Solutions IBM IBM Tivoli Storage Manager Version 7.1.6 Introduction to Data Protection Solutions IBM IBM Tivoli Storage Manager Version 7.1.6 Introduction to Data Protection Solutions IBM Note: Before you use this

More information

Foster B-Trees. Lucas Lersch. M. Sc. Caetano Sauer Advisor

Foster B-Trees. Lucas Lersch. M. Sc. Caetano Sauer Advisor Foster B-Trees Lucas Lersch M. Sc. Caetano Sauer Advisor 14.07.2014 Motivation Foster B-Trees Blink-Trees: multicore concurrency Write-Optimized B-Trees: flash memory large-writes wear leveling defragmentation

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung ACM SIGOPS 2003 {Google Research} Vaibhav Bajpai NDS Seminar 2011 Looking Back time Classics Sun NFS (1985) CMU Andrew FS (1988) Fault

More information

COM Verification. PRESENTATION TITLE GOES HERE Alan G. Yoder, Ph.D. SNIA Technical Council Huawei Technologies, LLC

COM Verification. PRESENTATION TITLE GOES HERE Alan G. Yoder, Ph.D. SNIA Technical Council Huawei Technologies, LLC COM Verification PRESENTATION TITLE GOES HERE Alan G. Yoder, Ph.D. SNIA Technical Council Huawei Technologies, LLC Outline COM overview How they work Verifying the COMs SNIA Emerald TM Training ~ June

More information

How to Reduce Data Capacity in Objectbased Storage: Dedup and More

How to Reduce Data Capacity in Objectbased Storage: Dedup and More How to Reduce Data Capacity in Objectbased Storage: Dedup and More Dong In Shin G-Cube, Inc. http://g-cube.kr Unstructured Data Explosion A big paradigm shift how to generate and consume data Transactional

More information

ADVANCED DEDUPLICATION CONCEPTS. Thomas Rivera, BlueArc Gene Nagle, Exar

ADVANCED DEDUPLICATION CONCEPTS. Thomas Rivera, BlueArc Gene Nagle, Exar ADVANCED DEDUPLICATION CONCEPTS Thomas Rivera, BlueArc Gene Nagle, Exar SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies and individual members may

More information

CA485 Ray Walshe Google File System

CA485 Ray Walshe Google File System Google File System Overview Google File System is scalable, distributed file system on inexpensive commodity hardware that provides: Fault Tolerance File system runs on hundreds or thousands of storage

More information

IBM Spectrum Protect Version Introduction to Data Protection Solutions IBM

IBM Spectrum Protect Version Introduction to Data Protection Solutions IBM IBM Spectrum Protect Version 8.1.2 Introduction to Data Protection Solutions IBM IBM Spectrum Protect Version 8.1.2 Introduction to Data Protection Solutions IBM Note: Before you use this information

More information

DEMYSTIFYING DATA DEDUPLICATION A WHITE PAPER

DEMYSTIFYING DATA DEDUPLICATION A WHITE PAPER DEMYSTIFYING DATA DEDUPLICATION A WHITE PAPER DEMYSTIFYING DATA DEDUPLICATION ABSTRACT While data redundancy was once an acceptable operational part of the backup process, the rapid growth of digital content

More information

ADVANCED DATA REDUCTION CONCEPTS

ADVANCED DATA REDUCTION CONCEPTS ADVANCED DATA REDUCTION CONCEPTS Thomas Rivera, Hitachi Data Systems Gene Nagle, BridgeSTOR Author: Thomas Rivera, Hitachi Data Systems Author: Gene Nagle, BridgeSTOR SNIA Legal Notice The material contained

More information

Copyright 2010 EMC Corporation. Do not Copy - All Rights Reserved.

Copyright 2010 EMC Corporation. Do not Copy - All Rights Reserved. 1 Using patented high-speed inline deduplication technology, Data Domain systems identify redundant data as they are being stored, creating a storage foot print that is 10X 30X smaller on average than

More information

Purity: building fast, highly-available enterprise flash storage from commodity components

Purity: building fast, highly-available enterprise flash storage from commodity components Purity: building fast, highly-available enterprise flash storage from commodity components J. Colgrove, J. Davis, J. Hayes, E. Miller, C. Sandvig, R. Sears, A. Tamches, N. Vachharajani, and F. Wang 0 Gala

More information

In-line Deduplication for Cloud storage to Reduce Fragmentation by using Historical Knowledge

In-line Deduplication for Cloud storage to Reduce Fragmentation by using Historical Knowledge In-line Deduplication for Cloud storage to Reduce Fragmentation by using Historical Knowledge Smitha.M. S, Prof. Janardhan Singh Mtech Computer Networking, Associate Professor Department of CSE, Cambridge

More information

SolidFire and Ceph Architectural Comparison

SolidFire and Ceph Architectural Comparison The All-Flash Array Built for the Next Generation Data Center SolidFire and Ceph Architectural Comparison July 2014 Overview When comparing the architecture for Ceph and SolidFire, it is clear that both

More information

Veeam with Cohesity Data Platform

Veeam with Cohesity Data Platform Veeam with Cohesity Data Platform Table of Contents About This Guide: 2 Data Protection for VMware Environments: 2 Benefits of using the Cohesity Data Platform with Veeam Backup & Replication: 4 Appendix

More information

Data Reduction Meets Reality What to Expect From Data Reduction

Data Reduction Meets Reality What to Expect From Data Reduction Data Reduction Meets Reality What to Expect From Data Reduction Doug Barbian and Martin Murrey Oracle Corporation Thursday August 11, 2011 9961: Data Reduction Meets Reality Introduction Data deduplication

More information

Multi-level Selective Deduplication for VM Snapshots in Cloud Storage

Multi-level Selective Deduplication for VM Snapshots in Cloud Storage Multi-level Selective Deduplication for VM Snapshots in Cloud Storage Wei Zhang, Hong Tang, Hao Jiang, Tao Yang, Xiaogang Li, Yue Zeng Dept. of Computer Science, UC Santa Barbara. Email: {wei, tyang}@cs.ucsb.edu

More information

White paper ETERNUS CS800 Data Deduplication Background

White paper ETERNUS CS800 Data Deduplication Background White paper ETERNUS CS800 - Data Deduplication Background This paper describes the process of Data Deduplication inside of ETERNUS CS800 in detail. The target group consists of presales, administrators,

More information

The What, Why and How of the Pure Storage Enterprise Flash Array. Ethan L. Miller (and a cast of dozens at Pure Storage)

The What, Why and How of the Pure Storage Enterprise Flash Array. Ethan L. Miller (and a cast of dozens at Pure Storage) The What, Why and How of the Pure Storage Enterprise Flash Array Ethan L. Miller (and a cast of dozens at Pure Storage) Enterprise storage: $30B market built on disk Key players: EMC, NetApp, HP, etc.

More information

HICAMP Bitmap. A Space-Efficient Updatable Bitmap Index for In-Memory Databases! Bo Wang, Heiner Litz, David R. Cheriton Stanford University DAMON 14

HICAMP Bitmap. A Space-Efficient Updatable Bitmap Index for In-Memory Databases! Bo Wang, Heiner Litz, David R. Cheriton Stanford University DAMON 14 HICAMP Bitmap A Space-Efficient Updatable Bitmap Index for In-Memory Databases! Bo Wang, Heiner Litz, David R. Cheriton Stanford University DAMON 14 Database Indexing Databases use precomputed indexes

More information

Software-defined Storage: Fast, Safe and Efficient

Software-defined Storage: Fast, Safe and Efficient Software-defined Storage: Fast, Safe and Efficient TRY NOW Thanks to Blockchain and Intel Intelligent Storage Acceleration Library Every piece of data is required to be stored somewhere. We all know about

More information

WHY DO I NEED FALCONSTOR OPTIMIZED BACKUP & DEDUPLICATION?

WHY DO I NEED FALCONSTOR OPTIMIZED BACKUP & DEDUPLICATION? WHAT IS FALCONSTOR? FAQS FalconStor Optimized Backup and Deduplication is the industry s market-leading virtual tape and LAN-based deduplication solution, unmatched in performance and scalability. With

More information

The Google File System (GFS)

The Google File System (GFS) 1 The Google File System (GFS) CS60002: Distributed Systems Antonio Bruto da Costa Ph.D. Student, Formal Methods Lab, Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur 2 Design constraints

More information

! Design constraints. " Component failures are the norm. " Files are huge by traditional standards. ! POSIX-like

! Design constraints.  Component failures are the norm.  Files are huge by traditional standards. ! POSIX-like Cloud background Google File System! Warehouse scale systems " 10K-100K nodes " 50MW (1 MW = 1,000 houses) " Power efficient! Located near cheap power! Passive cooling! Power Usage Effectiveness = Total

More information

Google File System. Arun Sundaram Operating Systems

Google File System. Arun Sundaram Operating Systems Arun Sundaram Operating Systems 1 Assumptions GFS built with commodity hardware GFS stores a modest number of large files A few million files, each typically 100MB or larger (Multi-GB files are common)

More information

ActiveScale Erasure Coding and Self Protecting Technologies

ActiveScale Erasure Coding and Self Protecting Technologies WHITE PAPER AUGUST 2018 ActiveScale Erasure Coding and Self Protecting Technologies BitSpread Erasure Coding and BitDynamics Data Integrity and Repair Technologies within The ActiveScale Object Storage

More information

Microsoft DPM Meets BridgeSTOR Advanced Data Reduction and Security

Microsoft DPM Meets BridgeSTOR Advanced Data Reduction and Security 2011 Microsoft DPM Meets BridgeSTOR Advanced Data Reduction and Security BridgeSTOR Deduplication, Compression, Thin Provisioning and Encryption Transform DPM from Good to Great BridgeSTOR, LLC 4/4/2011

More information

The Logic of Physical Garbage Collection in Deduplicating Storage

The Logic of Physical Garbage Collection in Deduplicating Storage The Logic of Physical Garbage Collection in Deduplicating Storage Fred Douglis Abhinav Duggal Philip Shilane Tony Wong Dell EMC Shiqin Yan University of Chicago Fabiano Botelho Rubrik 1 Deduplication in

More information

TIBX NEXT-GENERATION ARCHIVE FORMAT IN ACRONIS BACKUP CLOUD

TIBX NEXT-GENERATION ARCHIVE FORMAT IN ACRONIS BACKUP CLOUD TIBX NEXT-GENERATION ARCHIVE FORMAT IN ACRONIS BACKUP CLOUD 1 Backup Speed and Reliability Are the Top Data Protection Mandates What are the top data protection mandates from your organization s IT leadership?

More information

GFS: The Google File System. Dr. Yingwu Zhu

GFS: The Google File System. Dr. Yingwu Zhu GFS: The Google File System Dr. Yingwu Zhu Motivating Application: Google Crawl the whole web Store it all on one big disk Process users searches on one big CPU More storage, CPU required than one PC can

More information

Deduplication File System & Course Review

Deduplication File System & Course Review Deduplication File System & Course Review Kai Li 12/13/13 Topics u Deduplication File System u Review 12/13/13 2 Storage Tiers of A Tradi/onal Data Center $$$$ Mirrored storage $$$ Dedicated Fibre Clients

More information

WHITE PAPER. DATA DEDUPLICATION BACKGROUND: A Technical White Paper

WHITE PAPER. DATA DEDUPLICATION BACKGROUND: A Technical White Paper WHITE PAPER DATA DEDUPLICATION BACKGROUND: A Technical White Paper CONTENTS Data Deduplication Multiple Data Sets from a Common Storage Pool.......................3 Fixed-Length Blocks vs. Variable-Length

More information

ChunkStash: Speeding Up Storage Deduplication using Flash Memory

ChunkStash: Speeding Up Storage Deduplication using Flash Memory ChunkStash: Speeding Up Storage Deduplication using Flash Memory Biplob Debnath +, Sudipta Sengupta *, Jin Li * * Microsoft Research, Redmond (USA) + Univ. of Minnesota, Twin Cities (USA) Deduplication

More information

The World s Fastest Backup Systems

The World s Fastest Backup Systems 3 The World s Fastest Backup Systems Erwin Freisleben BRS Presales Austria 4 EMC Data Domain: Leadership and Innovation A history of industry firsts 2003 2004 2005 2006 2007 2008 2009 2010 2011 First deduplication

More information

SMORE: A Cold Data Object Store for SMR Drives

SMORE: A Cold Data Object Store for SMR Drives SMORE: A Cold Data Object Store for SMR Drives Peter Macko, Xiongzi Ge, John Haskins Jr.*, James Kelley, David Slik, Keith A. Smith, and Maxim G. Smith Advanced Technology Group NetApp, Inc. * Qualcomm

More information

IBM řešení pro větší efektivitu ve správě dat - Store more with less

IBM řešení pro větší efektivitu ve správě dat - Store more with less IBM řešení pro větší efektivitu ve správě dat - Store more with less IDG StorageWorld 2012 Rudolf Hruška Information Infrastructure Leader IBM Systems & Technology Group rudolf_hruska@cz.ibm.com IBM Agenda

More information

File systems CS 241. May 2, University of Illinois

File systems CS 241. May 2, University of Illinois File systems CS 241 May 2, 2014 University of Illinois 1 Announcements Finals approaching, know your times and conflicts Ours: Friday May 16, 8-11 am Inform us by Wed May 7 if you have to take a conflict

More information

CLOUD-SCALE FILE SYSTEMS

CLOUD-SCALE FILE SYSTEMS Data Management in the Cloud CLOUD-SCALE FILE SYSTEMS 92 Google File System (GFS) Designing a file system for the Cloud design assumptions design choices Architecture GFS Master GFS Chunkservers GFS Clients

More information

IME Infinite Memory Engine Technical Overview

IME Infinite Memory Engine Technical Overview 1 1 IME Infinite Memory Engine Technical Overview 2 Bandwidth, IOPs single NVMe drive 3 What does Flash mean for Storage? It's a new fundamental device for storing bits. We must treat it different from

More information

Delta Compressed and Deduplicated Storage Using Stream-Informed Locality

Delta Compressed and Deduplicated Storage Using Stream-Informed Locality Delta Compressed and Deduplicated Storage Using Stream-Informed Locality Philip Shilane, Grant Wallace, Mark Huang, and Windsor Hsu Backup Recovery Systems Division EMC Corporation Abstract For backup

More information

ECS High Availability Design

ECS High Availability Design ECS High Availability Design March 2018 A Dell EMC white paper Revisions Date Mar 2018 Aug 2017 July 2017 Description Version 1.2 - Updated to include ECS version 3.2 content Version 1.1 - Updated to include

More information

HP D2D & STOREONCE OVERVIEW

HP D2D & STOREONCE OVERVIEW HP D2D & STOREONCE OVERVIEW Robert Clifford Pre-Sales Storage Consultant 1 Data loss (Recovery Point Objective) Let s put it in context.. Last Transaction Increased availability Mission critical support

More information

The Google File System

The Google File System October 13, 2010 Based on: S. Ghemawat, H. Gobioff, and S.-T. Leung: The Google file system, in Proceedings ACM SOSP 2003, Lake George, NY, USA, October 2003. 1 Assumptions Interface Architecture Single

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google SOSP 03, October 19 22, 2003, New York, USA Hyeon-Gyu Lee, and Yeong-Jae Woo Memory & Storage Architecture Lab. School

More information

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme STO1926BU A Day in the Life of a VSAN I/O Diving in to the I/O Flow of vsan John Nicholson (@lost_signal) Pete Koehler (@vmpete) VMworld 2017 Content: Not for publication #VMworld #STO1926BU Disclaimer

More information

Operating Systems. Lecture File system implementation. Master of Computer Science PUF - Hồ Chí Minh 2016/2017

Operating Systems. Lecture File system implementation. Master of Computer Science PUF - Hồ Chí Minh 2016/2017 Operating Systems Lecture 7.2 - File system implementation Adrien Krähenbühl Master of Computer Science PUF - Hồ Chí Minh 2016/2017 Design FAT or indexed allocation? UFS, FFS & Ext2 Journaling with Ext3

More information

Cohesity SpanFS and SnapTree. White Paper

Cohesity SpanFS and SnapTree. White Paper Cohesity SpanFS and SnapTree White Paper Cohesity SpanFS TM and SnapTree TM Web-Scale File System, Designed for Secondary Storage in the Cloud Era Why Do Enterprises Need a New File System? The Enterprise

More information

COS 318: Operating Systems. NSF, Snapshot, Dedup and Review

COS 318: Operating Systems. NSF, Snapshot, Dedup and Review COS 318: Operating Systems NSF, Snapshot, Dedup and Review Topics! NFS! Case Study: NetApp File System! Deduplication storage system! Course review 2 Network File System! Sun introduced NFS v2 in early

More information

Backup and archiving need not to create headaches new pain relievers are around

Backup and archiving need not to create headaches new pain relievers are around Backup and archiving need not to create headaches new pain relievers are around Frank Reichart Senior Director Product Marketing Storage Copyright 2012 FUJITSU Hot Spots in Data Protection 1 Copyright

More information

Introducing Tegile. Company Overview. Product Overview. Solutions & Use Cases. Partnering with Tegile

Introducing Tegile. Company Overview. Product Overview. Solutions & Use Cases. Partnering with Tegile Tegile Systems 1 Introducing Tegile Company Overview Product Overview Solutions & Use Cases Partnering with Tegile 2 Company Overview Company Overview Te gile - [tey-jile] Tegile = technology + agile Founded

More information

arxiv: v3 [cs.dc] 27 Jun 2013

arxiv: v3 [cs.dc] 27 Jun 2013 RevDedup: A Reverse Deduplication Storage System Optimized for Reads to Latest Backups arxiv:1302.0621v3 [cs.dc] 27 Jun 2013 Chun-Ho Ng and Patrick P. C. Lee The Chinese University of Hong Kong, Hong Kong

More information

Tom Sas HP. Author: SNIA - Data Protection & Capacity Optimization (DPCO) Committee

Tom Sas HP. Author: SNIA - Data Protection & Capacity Optimization (DPCO) Committee Advanced PRESENTATION Data Reduction TITLE GOES HERE Concepts Tom Sas HP Author: SNIA - Data Protection & Capacity Optimization (DPCO) Committee SNIA Legal Notice The material contained in this tutorial

More information

Cumulus: Filesystem Backup to the Cloud

Cumulus: Filesystem Backup to the Cloud Cumulus: Filesystem Backup to the Cloud 7th USENIX Conference on File and Storage Technologies (FAST 09) Michael Vrable Stefan Savage Geoffrey M. Voelker University of California, San Diego February 26,

More information

File Systems Management and Examples

File Systems Management and Examples File Systems Management and Examples Today! Efficiency, performance, recovery! Examples Next! Distributed systems Disk space management! Once decided to store a file as sequence of blocks What s the size

More information

A Review on Backup-up Practices using Deduplication

A Review on Backup-up Practices using Deduplication Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 9, September 2015,

More information

Cold Storage: The Road to Enterprise Ilya Kuznetsov YADRO

Cold Storage: The Road to Enterprise Ilya Kuznetsov YADRO Cold Storage: The Road to Enterprise Ilya Kuznetsov YADRO Agenda Technical challenge Custom product Growth of aspirations Enterprise requirements Making an enterprise cold storage product 2 Technical Challenge

More information

Distributed Filesystem

Distributed Filesystem Distributed Filesystem 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributing Code! Don t move data to workers move workers to the data! - Store data on the local disks of nodes in the

More information

FGDEFRAG: A Fine-Grained Defragmentation Approach to Improve Restore Performance

FGDEFRAG: A Fine-Grained Defragmentation Approach to Improve Restore Performance FGDEFRAG: A Fine-Grained Defragmentation Approach to Improve Restore Performance Yujuan Tan, Jian Wen, Zhichao Yan, Hong Jiang, Witawas Srisa-an, aiping Wang, Hao Luo College of Computer Science, Chongqing

More information

Administração e Optimização de Bases de Dados 2012/2013 Index Tuning

Administração e Optimização de Bases de Dados 2012/2013 Index Tuning Administração e Optimização de Bases de Dados 2012/2013 Index Tuning Bruno Martins DEI@Técnico e DMIR@INESC-ID Index An index is a data structure that supports efficient access to data Condition on Index

More information

NPTEL Course Jan K. Gopinath Indian Institute of Science

NPTEL Course Jan K. Gopinath Indian Institute of Science Storage Systems NPTEL Course Jan 2012 (Lecture 39) K. Gopinath Indian Institute of Science Google File System Non-Posix scalable distr file system for large distr dataintensive applications performance,

More information

Technology Insight Series

Technology Insight Series IBM ProtecTIER Deduplication for z/os John Webster March 04, 2010 Technology Insight Series Evaluator Group Copyright 2010 Evaluator Group, Inc. All rights reserved. Announcement Summary The many data

More information

ICS Principles of Operating Systems

ICS Principles of Operating Systems ICS 143 - Principles of Operating Systems Lectures 17-20 - FileSystem Interface and Implementation Prof. Ardalan Amiri Sani Prof. Nalini Venkatasubramanian ardalan@ics.uci.edu nalini@ics.uci.edu Outline

More information

Google File System, Replication. Amin Vahdat CSE 123b May 23, 2006

Google File System, Replication. Amin Vahdat CSE 123b May 23, 2006 Google File System, Replication Amin Vahdat CSE 123b May 23, 2006 Annoucements Third assignment available today Due date June 9, 5 pm Final exam, June 14, 11:30-2:30 Google File System (thanks to Mahesh

More information

Design Tradeoffs for Data Deduplication Performance in Backup Workloads

Design Tradeoffs for Data Deduplication Performance in Backup Workloads Design Tradeoffs for Data Deduplication Performance in Backup Workloads Min Fu,DanFeng,YuHua,XubinHe, Zuoning Chen *, Wen Xia,YuchengZhang,YujuanTan Huazhong University of Science and Technology Virginia

More information

COMP 530: Operating Systems File Systems: Fundamentals

COMP 530: Operating Systems File Systems: Fundamentals File Systems: Fundamentals Don Porter Portions courtesy Emmett Witchel 1 Files What is a file? A named collection of related information recorded on secondary storage (e.g., disks) File attributes Name,

More information

Distributed File Systems II

Distributed File Systems II Distributed File Systems II To do q Very-large scale: Google FS, Hadoop FS, BigTable q Next time: Naming things GFS A radically new environment NFS, etc. Independence Small Scale Variety of workloads Cooperation

More information

ENCRYPTED DATA MANAGEMENT WITH DEDUPLICATION IN CLOUD COMPUTING

ENCRYPTED DATA MANAGEMENT WITH DEDUPLICATION IN CLOUD COMPUTING ENCRYPTED DATA MANAGEMENT WITH DEDUPLICATION IN CLOUD COMPUTING S KEERTHI 1*, MADHAVA REDDY A 2* 1. II.M.Tech, Dept of CSE, AM Reddy Memorial College of Engineering & Technology, Petlurivaripalem. 2. Assoc.

More information

Lattice: A Decentralized, Distributed Datastore

Lattice: A Decentralized, Distributed Datastore 1 Lattice: A Decentralized, Distributed Datastore Computes, inc. February 2018 Abstract Lattice is a decentralized, distributed datastore that is used to build a distributed work queue, ledger, and general-purpose

More information

ActiveScale Erasure Coding and Self Protecting Technologies

ActiveScale Erasure Coding and Self Protecting Technologies NOVEMBER 2017 ActiveScale Erasure Coding and Self Protecting Technologies BitSpread Erasure Coding and BitDynamics Data Integrity and Repair Technologies within The ActiveScale Object Storage System Software

More information

Veeam and HP: Meet your backup data protection goals

Veeam and HP: Meet your backup data protection goals Sponsored by Veeam and HP: Meet your backup data protection goals Eric Machabert Сonsultant and virtualization expert Introduction With virtualization systems becoming mainstream in recent years, backups

More information

Speeding Up Cloud/Server Applications Using Flash Memory

Speeding Up Cloud/Server Applications Using Flash Memory Speeding Up Cloud/Server Applications Using Flash Memory Sudipta Sengupta and Jin Li Microsoft Research, Redmond, WA, USA Contains work that is joint with Biplob Debnath (Univ. of Minnesota) Flash Memory

More information

Chapter 7: File-System

Chapter 7: File-System Chapter 7: File-System Interface and Implementation Chapter 7: File-System Interface and Implementation File Concept File-System Structure Access Methods File-System Implementation Directory Structure

More information

EMC DATA DOMAIN PRODUCT OvERvIEW

EMC DATA DOMAIN PRODUCT OvERvIEW EMC DATA DOMAIN PRODUCT OvERvIEW Deduplication storage for next-generation backup and archive Essentials Scalable Deduplication Fast, inline deduplication Provides up to 65 PBs of logical storage for long-term

More information

GFS: The Google File System

GFS: The Google File System GFS: The Google File System Brad Karp UCL Computer Science CS GZ03 / M030 24 th October 2014 Motivating Application: Google Crawl the whole web Store it all on one big disk Process users searches on one

More information

CS720 - Operating Systems

CS720 - Operating Systems CS720 - Operating Systems File Systems File Concept Access Methods Directory Structure File System Mounting File Sharing - Protection 1 File Concept Contiguous logical address space Types: Data numeric

More information

Reducing Replication Bandwidth for Distributed Document Databases

Reducing Replication Bandwidth for Distributed Document Databases Reducing Replication Bandwidth for Distributed Document Databases Lianghong Xu 1, Andy Pavlo 1, Sudipta Sengupta 2 Jin Li 2, Greg Ganger 1 Carnegie Mellon University 1, Microsoft Research 2 Document-oriented

More information

FLAT DATACENTER STORAGE. Paper-3 Presenter-Pratik Bhatt fx6568

FLAT DATACENTER STORAGE. Paper-3 Presenter-Pratik Bhatt fx6568 FLAT DATACENTER STORAGE Paper-3 Presenter-Pratik Bhatt fx6568 FDS Main discussion points A cluster storage system Stores giant "blobs" - 128-bit ID, multi-megabyte content Clients and servers connected

More information

Today CSCI Coda. Naming: Volumes. Coda GFS PAST. Instructor: Abhishek Chandra. Main Goals: Volume is a subtree in the naming space

Today CSCI Coda. Naming: Volumes. Coda GFS PAST. Instructor: Abhishek Chandra. Main Goals: Volume is a subtree in the naming space Today CSCI 5105 Coda GFS PAST Instructor: Abhishek Chandra 2 Coda Main Goals: Availability: Work in the presence of disconnection Scalability: Support large number of users Successor of Andrew File System

More information

The term "physical drive" refers to a single hard disk module. Figure 1. Physical Drive

The term physical drive refers to a single hard disk module. Figure 1. Physical Drive HP NetRAID Tutorial RAID Overview HP NetRAID Series adapters let you link multiple hard disk drives together and write data across them as if they were one large drive. With the HP NetRAID Series adapter,

More information

Bigtable: A Distributed Storage System for Structured Data. Andrew Hon, Phyllis Lau, Justin Ng

Bigtable: A Distributed Storage System for Structured Data. Andrew Hon, Phyllis Lau, Justin Ng Bigtable: A Distributed Storage System for Structured Data Andrew Hon, Phyllis Lau, Justin Ng What is Bigtable? - A storage system for managing structured data - Used in 60+ Google services - Motivation:

More information

SED: SCALABLE & EFFICIENT DE-DUPLICATION FILE SYSTEM FOR VIRTUAL MACHINE IMAGES

SED: SCALABLE & EFFICIENT DE-DUPLICATION FILE SYSTEM FOR VIRTUAL MACHINE IMAGES SED: SCALABLE & EFFICIENT DE-DUPLICATION FILE SYSTEM FOR VIRTUAL MACHINE IMAGES Author Name: Poonguzhali. S B. Tech, Information Technology, M.A.M College of Engineering, Tiruchirapalli. kuzhalit@gmail.com

More information

Design of Global Data Deduplication for A Scale-out Distributed Storage System

Design of Global Data Deduplication for A Scale-out Distributed Storage System 218 IEEE 38th International Conference on Distributed Computing Systems Design of Global Data Deduplication for A Scale-out Distributed Storage System Myoungwon Oh, Sejin Park, Jungyeon Yoon, Sangjae Kim,

More information

Clotho: Transparent Data Versioning at the Block I/O Level

Clotho: Transparent Data Versioning at the Block I/O Level Clotho: Transparent Data Versioning at the Block I/O Level Michail Flouris Dept. of Computer Science University of Toronto flouris@cs.toronto.edu Angelos Bilas ICS- FORTH & University of Crete bilas@ics.forth.gr

More information

Technical White Paper: IntelliFlash Architecture

Technical White Paper: IntelliFlash Architecture Executive Summary... 2 IntelliFlash OS... 3 Achieving High Performance & High Capacity... 3 Write Cache... 4 Read Cache... 5 Metadata Acceleration... 5 Data Reduction... 6 Enterprise Resiliency & Capabilities...

More information

Index Tuning. Index. An index is a data structure that supports efficient access to data. Matching records. Condition on attribute value

Index Tuning. Index. An index is a data structure that supports efficient access to data. Matching records. Condition on attribute value Index Tuning AOBD07/08 Index An index is a data structure that supports efficient access to data Condition on attribute value index Set of Records Matching records (search key) 1 Performance Issues Type

More information

50 TB. Traditional Storage + Data Protection Architecture. StorSimple Cloud-integrated Storage. Traditional CapEx: $375K Support: $75K per Year

50 TB. Traditional Storage + Data Protection Architecture. StorSimple Cloud-integrated Storage. Traditional CapEx: $375K Support: $75K per Year Compelling Economics: Traditional Storage vs. StorSimple Traditional Storage + Data Protection Architecture StorSimple Cloud-integrated Storage Servers Servers Primary Volume Disk Array ($100K; Double

More information

Data De-duplication for Distributed Segmented Parallel FS

Data De-duplication for Distributed Segmented Parallel FS Data De-duplication for Distributed Segmented Parallel FS Boris Zuckerman & Oskar Batuner Hewlett-Packard Co. Objectives Expose fundamentals of highly distributed segmented parallel file system architecture

More information