Deduplication File System & Course Review

Similar documents
Deduplication Storage System

COS 318: Operating Systems. NSF, Snapshot, Dedup and Review

Copyright 2010 EMC Corporation. Do not Copy - All Rights Reserved.

DELL EMC DATA DOMAIN SISL SCALING ARCHITECTURE

EMC DATA DOMAIN PRODUCT OvERvIEW

Speeding Up Cloud/Server Applications Using Flash Memory

EMC DATA DOMAIN OPERATING SYSTEM

ChunkStash: Speeding Up Storage Deduplication using Flash Memory

COS 318: Operating Systems. Journaling, NFS and WAFL

WAN Optimized Replication of Backup Datasets Using Stream-Informed Delta Compression

Get More Out of Storage with Data Domain Deduplication Storage Systems

DELL EMC DATA DOMAIN OPERATING SYSTEM

Sparse Indexing: Large-Scale, Inline Deduplication Using Sampling and Locality

Opendedupe & Veritas NetBackup ARCHITECTURE OVERVIEW AND USE CASES

Delta Compressed and Deduplicated Storage Using Stream-Informed Locality

CSE 153 Design of Operating Systems

See what s new: Data Domain Global Deduplication Array, DD Boost and more. Copyright 2010 EMC Corporation. All rights reserved.

DELL EMC DATA DOMAIN OPERATING SYSTEM

Trends in Data Protection and Restoration Technologies. Mike Fishman, EMC 2 Corporation

2 nd Half. Memory management Disk management Network and Security Virtual machine

Got Isilon? Need IOPS? Get Avere.

EMC Data Domain for Archiving Are You Kidding?

Operating Systems CMPSCI 377 Spring Mark Corner University of Massachusetts Amherst

CSE 451: Operating Systems. Section 10 Project 3 wrap-up, final exam review

HP Dynamic Deduplication achieving a 50:1 ratio

Understanding Primary Storage Optimization Options Jered Floyd Permabit Technology Corp.

Increasing Performance of Existing Oracle RAC up to 10X

Topics. " Start using a write-ahead log on disk " Log all updates Commit

Rethinking Deduplication Scalability

The Logic of Physical Garbage Collection in Deduplicating Storage

The Google File System

Introducing Tegile. Company Overview. Product Overview. Solutions & Use Cases. Partnering with Tegile

Configuring Short RPO with Actifio StreamSnap and Dedup-Async Replication

Backup and archiving need not to create headaches new pain relievers are around

Virtual Memory. Reading. Sections 5.4, 5.5, 5.6, 5.8, 5.10 (2) Lecture notes from MKP and S. Yalamanchili

Introduction Disks RAID Tertiary storage. Mass Storage. CMSC 420, York College. November 21, 2006

Last 2 Classes: Introduction to Operating Systems & C++ tutorial. Today: OS and Computer Architecture

Algorithms and Data Structures for Efficient Free Space Reclamation in WAFL

A. Deduplication rate is less than expected, accounting for the remaining GSAN capacity

The What, Why and How of the Pure Storage Enterprise Flash Array. Ethan L. Miller (and a cast of dozens at Pure Storage)

Storage Systems. Storage Systems

Mass-Storage Structure

CS 537 Fall 2017 Review Session

To Everyone... iii To Educators... v To Students... vi Acknowledgments... vii Final Words... ix References... x. 1 ADialogueontheBook 1

HYDRAstor: a Scalable Secondary Storage

Chapter 10: Mass-Storage Systems

Data Deduplication Methods for Achieving Data Efficiency

V. Mass Storage Systems

CSI3131 Final Exam Review

Chapter 10: Mass-Storage Systems. Operating System Concepts 9 th Edition

Accelerate the Journey to 100% Virtualization with EMC Backup and Recovery. Copyright 2010 EMC Corporation. All rights reserved.

File. File System Implementation. File Metadata. File System Implementation. Direct Memory Access Cont. Hardware background: Direct Memory Access

Software-defined Storage: Fast, Safe and Efficient

CS3600 SYSTEMS AND NETWORKS

NetVault Backup Client and Server Sizing Guide 2.1

QUESTION BANK UNIT I

NetVault Backup Client and Server Sizing Guide 3.0

Considerations to Accurately Measure Solid State Storage Systems

Warsaw. 11 th September 2018

White paper ETERNUS CS800 Data Deduplication Background

TIBX NEXT-GENERATION ARCHIVE FORMAT IN ACRONIS BACKUP CLOUD

PASS4TEST. IT Certification Guaranteed, The Easy Way! We offer free update service for one year

Lecture 18: Reliable Storage

Dell DR4100. Disk based Data Protection and Disaster Recovery. April 3, Birger Ferber, Enterprise Technologist Storage EMEA

dedupv1: Improving Deduplication Throughput using Solid State Drives (SSD)

SMORE: A Cold Data Object Store for SMR Drives

IBM Tivoli Storage Manager for Windows Version Installation Guide IBM

Architectural Support. Processes. OS Structure. Threads. Scheduling. CSE 451: Operating Systems Spring Module 28 Course Review

Google File System. Arun Sundaram Operating Systems

Protect enterprise data, achieve long-term data retention

Balakrishnan Nair. Senior Technology Consultant Back Up & Recovery Systems South Gulf. Copyright 2011 EMC Corporation. All rights reserved.

CS5460: Operating Systems Lecture 20: File System Reliability

Reducing The De-linearization of Data Placement to Improve Deduplication Performance

Flashed-Optimized VPSA. Always Aligned with your Changing World

Review for Midterm. Starring Ari and Tyler

ASN Configuration Best Practices

Lecture 2: September 9

Chapter 11: Implementing File Systems

DELL EMC DATA DOMAIN DEDUPLICATION STORAGE SYSTEMS

Vendor: EMC. Exam Code: E Exam Name: Backup and Recovery Solutions Exam for Technology Architects. Version: Demo

Database Systems II. Secondary Storage

What is a file system

W H I T E P A P E R. Comparison of Storage Protocol Performance in VMware vsphere 4

Computer Science 146. Computer Architecture

CA485 Ray Walshe Google File System

Main Points of the Computer Organization and System Software Module

CS-736 Midterm: Beyond Compare (Spring 2008)

CHAPTER 12 AND 13 - MASS-STORAGE STRUCTURE & I/O- SYSTEMS

Clotho: Transparent Data Versioning at the Block I/O Level

Table of Contents. Introduction 3

Exam : S Title : Snia Storage Network Management/Administration. Version : Demo

The World s Fastest Backup Systems

Hitachi Virtual Storage Platform Family

EI 338: Computer Systems Engineering (Operating Systems & Computer Architecture)

Mainframe Backup Modernization Disk Library for mainframe

DATA DOMAIN INVULNERABILITY ARCHITECTURE: ENHANCING DATA INTEGRITY AND RECOVERABILITY

OPERATING SYSTEM. Chapter 12: File System Implementation

AWS Storage Gateway. Not your father s hybrid storage. University of Arizona IT Summit October 23, Jay Vagalatos, AWS Solutions Architect

Scale-out Object Store for PB/hr Backups and Long Term Archive April 24, 2014

Memory hierarchy review. ECE 154B Dmitri Strukov

Transcription:

Deduplication File System & Course Review Kai Li 12/13/13

Topics u Deduplication File System u Review 12/13/13 2

Storage Tiers of A Tradi/onal Data Center $$$$ Mirrored storage $$$ Dedicated Fibre Clients Server Primary storage 12/13/13 Kai Li 3

Replacing Tape Library using Disk Storage? Dedicate d Fibre Mirrored storage Clients Server Primary storage Costs Purchase: $$$$ Bandwidth: $$$$ Power: $$$ 20 primary (3-month retention) Onsite WAN 20 primary Remote 12/13/13 Kai Li 4

Vision: Deduplica+on Storage Eco- System Dedicated WAN Mirrored storage Clients Server Primary storage Value Proposi/ons: Purchase: ~Tape libraries Space: 10-30X reduc/on WAN BW: 10-50X reduc/on Power: ~10X reduc/on Onsite WAN Remote 12/13/13 Kai Li 5

Before: 17 Tape Libraries 12/13/13 Kai Li 6

After: 3 3-U Systems! 12/13/13 Kai Li 7

DD Protected 26EB & Replaced 33M Tapes - EMC Blog by C. Gordon (4/26/2013)

Local vs. Global Redundancies Local compression Deduplication ~100k Encode a sliding window [Ziv&Lempel77] WAN ~2-3X compression ~10-30X compression Larger windows have more redundancies 12/13/13 Kai Li 9

Dedup Storage System for Backups Backup data streams Media server Media server Consolidated data streams Dedup Storage Controller Replication WAN Meta data Unique Data segments Physical storage For deduped data Dedup Storage 12/13/13 Kai Li 10

How Does It Work? Segmented data stream Dedup Storage Controller Meta data Unique data segments Dedup Storage 12/13/13 Kai Li 11

Fixed vs. Variable Segmenta/on Fixed size 4k 4k 4k... Cannot handle deletes, shifts Content- based, variable size... A X C D A Y C D A B C D A B No problem w/ deletes, shifts [Manber93, Brin94] fp = 10110110 fp = 10110100... fp = 10110000 12/13/13 Kai Li 12

More Details Segmented data stream Segmented data stream Dedup Storage Controller Meta data Unique Data segments Dedup Storage For each segment Compute a strong fingerprint Use an index to lookup Dedup Storage Controller Meta data If unique Locally compress the segment store segment Unique data store meta data segments If duplicate store meta data Dedup Storage 12/13/13 Kai Li 13

Design Challenges Very reliable and self-healing Corrupting a segment may corrupt multiple files NVRAM to store log (transactions) Invulnerability features: Frequent verifications Metadata reconstruction from self-describing containers Self-correction from RAID-6 High-speed high-compression at low HW cost Speed challenge: data 2X/18 months and 24 hours/day Compression: reduce cost Use commodity server hardware 12/13/13 14

High Deduplica/on Ra/o High deduplica/on factor è hardware cost Smaller segments achieve higher compression ra/os? Smaller segments result in higher ra/o of metadata to physical segments Data Domain s approach Use the sweet spots of segment sizes Mul/ple local compression algorithms 12/13/13 Kai Li 15

Segment Sizes for Backup Data Rule of thumb: 2X segment size will u increase space for unique segments by 15% u decrease metadata by about 50% u deduce disk I/Os for writes and reads 12/13/13 Kai Li 16

Real World Example at Datacenter A 12/13/13 Kai Li 17

Real World Compression at Datacenter A 12/13/13 Kai Li 18

Real World Example at Datacenter B 12/13/13 Kai Li 19

Real World Compression at Datacenter B 12/13/13 Kai Li 20

High Throughput Is Challenging Divide data streams into segments Lookup Fingerprint Index Index size for 160TB w/ 8KB segments = (160TB/8KB) * 24B = 480GB! 12/13/13 Kai Li 21

Caching? Divide data streams into segments Lookup Index Cache Miss Fingerprint Index Problem: No locality. 12/13/13 Kai Li 22

Parallel Index Need Many Disks [Venti02] Divide data streams into segments Lookup Index Cache Miss... Fingerprint Index Problem: Need a lot of disks. 7200RPM disk does 120 lookups/sec. 1MB/sec with 8KB segment per disk 1GB/sec needs 1,000 disks! 12/13/13 Kai Li 23

Staging Needs More Disks Data Streams Divide data streams into segments Very Big Disk Buffer Lookup... Fingerprint Index Problem: The Buffer needs to be as large or larger than the full backup! Big delay for replication 12/13/13 Kai Li 24

Stream Informed Segment Layout u Log structured layout (inspired by LFS) u Fixed size large containers to create locality u A container l Segments from the same stream l Metadata (index data) Metadata section Metadata section Metadata section Data section Data section Data section Stream 1 Stream 2 Stream 3 12/13/13 25

A Combination of Techniques u Stream Informed Segment Layout u A sophisticated cache for the fingerprint index l Summary data structure for new data l locality-preserved caching for old data u Parallelized software systems to leverage multicore processors Benjamin Zhu, Kai Li and Hugo Patterson. Avoiding the Disk Bottleneck in the Data Domain Deduplication File System. In Proceedings of The 6 th USENIX Conference on File and Storage Technologies (FAST 08). February 2008 12/13/13 Kai Li 26

Summary Data Structure Goal: Use minimal memory to test for new data Summarize what segments have been stored, with Bloom filter (Bloom 70) in RAM If Summary Vector says no, it s new segment 1 1 1 Summary Vector Approximation Set bits fp(s i ) h 1 h 2 h 3 1 0 1 1 Index Data Structure Read bits fp(s i ) h 1 h 2 h 3 12/13/13 27

Locality Preserved Caching Algorithm u Disk Index has all <fingerprint, containerid> pairs u On a miss, lookup Disk Index to find containerid u Load the container into memory u Load the metadata of a container into Index Cache, replace if needed Fingerprint Disk Index ContainerID Metadata Data Load metadata Index Cache Replacement Container 12/13/13 Kai Li 28

Putting Them Together A fingerprint Duplicate New Index Cache No Summary Vector Replacement Maybe Disk Index metadata metadata metadata data metadata data data data 12/13/13 29

Disk I/O Reduction Results Exchange data (2.56TB) 135-daily full backups Engineering data (2.39TB) 100-day daily inc, weekly full # disk I/Os % of total # disk I/Os % of total No summary, No SISL/LPC 328,613,503 100.00% 318,236,712 100.00% Summary only 274,364,788 83.49% 259,135,171 81.43% SISL/LPC only 57,725,844 17.57% 60,358,875 18.97% Summary & SISL/LPC 3,477,129 1.06% 1,257,316 0.40% 12/13/13 Kai Li 30

Deduplica/on Throughput Revenue ($million) 1100 1000 900 800 700 600 500 400 300 200 100 0 4083 1500 400 40 80 100 200 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 A1 A2 B C IPO M/A Improved by ~100X in 6 years 4000 3500 3000 2500 2000 1500 1000 500 0 Dedup Throughput (MB/sec) 12/13/13 Kai Li 31

Physical Capacity Revenue ($million) 1100 1000 900 800 700 600 500 400 300 200 100 0 290 140 48 20 0.875 4 8 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 A1 A2 B C IPO M/A Increased by ~330X in 6 years 300 250 200 150 100 50 0 Physical Capacity (TB) 12/13/13 Kai Li 32

Even More Details Segmented data stream Dedup Storage Controller Meta data Unique Data Segs Dedup Storage Concurrent Garbage Collec/on A segment may be shared by mul/ple files Backups need to be deleted some/me GC cannot interfere with backups F1 F2 A B C D 12/13/13 Kai Li 33

More Details Segmented data stream Dedup Storage Controller Meta data Unique Data Segs Dedup Storage Concurrent Garbage Collec/on A segment may be shared by mul/ple files Backups need to be deleted some/me GC cannot interfere with backups F1 F2 A B C D 12/13/13 Kai Li 34

Summary Dedup storage systems Replace tape libraries Near- line and archival storage Primary storage More Redundancy in larger windows Locality preserved caching Reduce cost Achieve high throughput 12/13/13 Kai Li 35

Review Topics OS structure Process management CPU scheduling I/O devices Virtual memory Disks and file systems General concepts 36

Operating System Structure Abstraction Protection and security Kernel structure Layered Monolithic Micro-kernel Virtualization Virtual machine monitor 37

Process Management Implementation State, creation, context switch Threads and processes Synchronization Race conditions and inconsistencies Mutual exclusion and critical sections Semaphores: P() and V() Atomic operations: interrupt disable, test-and-set. Monitors and Condition Variables Mesa-style monitor l Deadlocks How deadlocks occur? How to prevent deadlocks? 38

CPU Scheduling Allocation Non-preemptible resources Scheduling -- Preemptible resources FIFO Round-robin STCF Lottery 39

I/O Devices Latency and bandwidth Interrupts and exceptions DMA mechanisms Synchronous I/O operations Asynchronous I/O operations Message passing 12/13/13 40

Virtual Memory Mechanisms Paging Segmentation Page and segmentation TLB and its management Page replacement FIFO with second chance Working sets WSClock 41

Disks and File Systems Disks Disk behavior and disk scheduling RAID4, RAID5 and RAID6 Flash memory Write performance Wear leveling Flash translation layer Directories and implementation File layout Buffer cache Transaction and journaling file system NFS and Stateless file system Snapshot Deduplication file system 42

Major Concepts Locality Spatial and temporal locality Scheduling Use the past to predict the future Layered abstractions Synchronization, transactions, file systems, etc Caching TLB, VM, buffer cache, etc 43

Operating System as Illusionist Physical reality Single CPU Interrupts Limited memory No protection Raw storage device Abstraction Infinite number of CPUs Cooperating sequential threads Unlimited virtual memory Each address has its own machine Organized and reliable storage system 44

Future Courses in Systems Spring COS 461: computer networks COS 598C: Analytics and systems of big data Fall: COS 432: computer security COS 561: Advance computer networks or (or COS 518: Advanced OS) Some grad seminars in systems 12/13/13 45