RAID and AutoRAID

RAID background

Problem: technology trends
- Computers getting larger, need more disk bandwidth
- Disk bandwidth is not riding Moore's law
- Faster CPUs enable more computation to support storage
- Data-intensive applications

Approaches:
- SLED: single large expensive disk
- RAID: redundant array of (independent, inexpensive) disks

NOTE:
- Disk arrays had been done before
- The contribution of this paper is a taxonomy: a way to compare and organize them

Key ideas:
- Striping: write blocks of a file across multiple disks, so they can be read/written in parallel (see the sketch below)
- Redundancy: write extra data to extra disks for failure recovery, e.g. parity, ECC, or duplicate data. Redundancy can also improve performance: with two copies you can pick the better-positioned disk (latency) or read from both (throughput)

Why arrays?
- Cheaper disks
- Lower power
- Smaller enclosures
- Higher reliability
  - Can survive a disk failure
- Larger bandwidth
  - Can read or write multiple disks at a time
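A minimal sketch of the striping idea, assuming a simple round-robin block layout (the function name and layout details are illustrative, not from the paper):

```python
# Striping (RAID 0): logical block b on an array of G disks lands on
# disk (b mod G) at per-disk offset (b div G), so consecutive blocks
# hit different disks and large accesses can use all G spindles at once.

def stripe_map(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk)."""
    return logical_block % num_disks, logical_block // num_disks

# With G = 4, blocks 0..7 land on disks 0,1,2,3,0,1,2,3: an 8-block
# sequential read is serviced by 4 disks in parallel, 2 blocks each.
if __name__ == "__main__":
    for b in range(8):
        print(b, stripe_map(b, 4))
```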

How do you compare disk setups?
- Price?
- Power?
- Size?
- Performance?
  - What performance?
  - Large reads
  - Small reads
  - Large writes
  - Small writes
  - Read/modify/write (TP)

Organization:
- Take N disks, put them into groups of G

RAID versions:

JBOD: just a bunch of disks, mounted as separate volumes
- Read/write performance for a file is limited to a single disk
- Reliability for a byte is the same as a single disk, but the file system can tolerate some disk failures with partial data loss

RAID 0: striping
- Stripe data across disks
- Best overall performance: G reads/sec, G writes/sec
- Worst reliability: MTTF = MTTF(disk) / G

RAID 1: mirroring
- Store all data on two disks
- Write to both disks
- Read from whichever disk is faster (better positioned)
- Write performance = single disk
- Read performance = double
- Overhead is 100%

RAID 2: bit-wise ECC
- Stripe data across disks in small units
- Store ECC bit-wise on check disks
- All reads/writes hit all disks
- Can detect/correct lots of errors
- Bad performance: roughly RAID 3 behavior, but with more check disks (see the comparison notes below)

RAID 3: bit parity
- Rely on the disk itself for error detection
- Still read from all disks (except parity), write to all disks
- Parity allows rebuilding a failed disk's contents (see the sketch below)
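A small sketch of why a single parity block suffices to rebuild any one lost block (toy byte strings stand in for disk blocks):

```python
# Parity (RAID 3/4/5): the parity block is the XOR of the data blocks in
# a stripe, so any single lost block can be rebuilt by XORing all the
# surviving blocks (data plus parity) together.

def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-sized blocks together, byte by byte."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)

if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]                # stripe across 3 data disks
    parity = xor_blocks(data)                         # stored on the parity disk
    rebuilt = xor_blocks([data[0], data[2], parity])  # disk 1 failed
    assert rebuilt == data[1]
    print("rebuilt:", rebuilt)
```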

RAID 4: block parity
- Use a single disk for error correction, rely on controllers for detection
- Can read from a single disk (no need to compute ECC)
- Can write with just two disks (data disk + parity update)
- Bottleneck: the single parity disk sees all writes
- Small writes require 4 accesses: read old block, read old parity, write new block, write new parity

RAID 5: distributed parity
- Same as level 4, but the parity disk changes for each stripe
- Removes the hotspot of a single parity disk
- Large writes are efficient: just one extra access for parity
- (The small-write arithmetic and parity rotation are sketched at the end of this section)

RAID 6: more error correction
- 2 parity disks allow tolerating 2 disk failures

Throughput per dollar (relative to a single disk):

            small read   small write     large read   large write   storage eff.   reason
  RAID 0    1            1               1            1             1
  RAID 1    1            1/2             1            1/2           1/2            extra disk
  RAID 3    1/G          1/G             (G-1)/G      (G-1)/G       (G-1)/G        one disk doesn't contribute
  RAID 5    1            max(1/G, 1/4)   1            (G-1)/G       (G-1)/G

Notes:
- RAID 2 is inferior: like RAID 3 but with more ECC drives (only useful when a drive failure is not self-identifying)
- RAID 4 is inferior to RAID 5: similar best case, but throughput is limited by the single parity disk

Choices of RAID
- QUESTION: what should you choose, when?
- Issues:
  - Cost of disks: is it relevant? Perhaps space/power are more relevant
  - Workload: lots of small reads/writes indicates RAID 1; lots of large reads and writes indicates RAID 5
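A sketch of the RAID 5 small-write update and parity placement, assuming a simple per-stripe rotation (real controllers use various rotations; the names here are illustrative):

```python
# RAID 5 small write: the new parity is computed from the old data, the
# new data, and the old parity (P' = P ^ D_old ^ D_new) without reading
# the rest of the stripe -- but it still costs 4 disk accesses:
# read old data, read old parity, write new data, write new parity.

def new_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))

def parity_disk(stripe: int, num_disks: int) -> int:
    """RAID 5 rotates the parity disk per stripe; RAID 4 would return a constant."""
    return stripe % num_disks

if __name__ == "__main__":
    print(new_parity(b"AAAA", b"ABBA", b"PPPP"))
    print([parity_disk(s, 5) for s in range(10)])  # 0,1,2,3,4,0,1,2,3,4
```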

AutoRAID

1. AutoRAID problem
   a. RAID 1 provides the best performance/reliability
   b. RAID 5 is more cost-efficient
      i. Performance is good for large reads/writes
         1. Bad for small writes
   c. Performance depends on
      i. The number of disks, the size of groups
   d. Managing the variety of RAID configurations is hard
      i. Changing the layout requires copying data off to another system
      ii. Adding a disk requires copying data off to another system
      iii. All disks must be the same size
      iv. Hot spares for fast repair do nothing to improve performance
   e. NOTE: the same thing is true today with disk + flash storage
      i. Cannot migrate a device between the two
      ii. Within flash, different encoding mechanisms (MLC vs. SLC) are possible, with similar tradeoffs

2. Desired goal
   a. A bunch of disks
   b. A workload
   c. The storage system determines the best configuration for the workload (a toy version is sketched below)
      i. QUESTION: what is that?
         1. Mirror as much as possible
         2. Store cold (not overwritten) data in RAID 5
            a. RAID 5 performance is fine for small/large reads
   d. QUESTION: who should do this?
      i. Administrator?
      ii. File system?
      iii. RAID controller?
         1. Depends on what you sell: you want to reach as much of your customer base as possible
            a. Sun: file system
            b. IBM: administrator
            c. HP: RAID controller
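A toy sketch of the goal stated above: hot (recently written) data stays mirrored, cold data goes to RAID 5. The names and capacity model are illustrative; AutoRAID's actual demotion policy appears under "Policies" below:

```python
# Toy version of the stated goal: keep the most recently written blocks
# mirrored, demote the rest to RAID 5. The fixed slot count stands in
# for however much mirrored capacity the array currently has.

def place(last_write_time: dict[str, float], mirrored_slots: int):
    """Return (mirrored, raid5) lists of block ids, hottest writers first."""
    ranked = sorted(last_write_time, key=last_write_time.get, reverse=True)
    return ranked[:mirrored_slots], ranked[mirrored_slots:]

if __name__ == "__main__":
    blocks = {"a": 10.0, "b": 3.0, "c": 7.0, "d": 1.0}
    print(place(blocks, mirrored_slots=2))  # (['a', 'c'], ['b', 'd'])
```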

3. Possible organizations
   a. Cache: treat some set of mirrored disks as a fast cache in front of RAID 5
      i. QUESTION: upsides/downsides?
         1. Less capacity?
            a. The overhead ratio of RAID 1 to RAID 5 is 100% vs. ~10%
            b. The performance ratio is 1-10x
      ii. QUESTION: what about for flash?
         1. The smaller volume of flash makes caching more attractive
         2. A separate physical device allows the disk to be removed and remain consistent while leaving the cache behind
   b. Tiering:
      i. Data lives in either the mirrored tier or the RAID 5 tier
      ii. Data moves between tiers but lives in only one place
      iii. NOTE: Apple's Fusion Drive does this with flash
         1. All writes go to flash
         2. When < 4 GB is left in flash, move data to disk to keep 4 GB available

4. AutoRAID layout terminology
   a. Physical layout: disk blocks are grouped into:
      i. Segments: a contiguous range on one disk allocated to a stripe
         1. RAID stripes write one segment per disk
         2. Size is chosen to get good sequential performance (large) but spread the workload for small accesses (small)
      ii. Physical extent (PEX): a (largish) set of segments on one disk; the unit of allocation to RAID 5/mirroring
      iii. Physical extent group (PEG): a set of PEXs on different disks with the desired redundancy (all on different disks, the correct number of disks)
   b. Logical layout: relocation blocks (RBs, 64 KB)
      i. The unit of storage that can be assigned to different places
         1. Larger than a disk block, for efficiency
      ii. The unit of address translation: AutoRAID persistently stores a table saying where every RB lives (a toy version is sketched below)
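A toy relocation-block map, assuming the 64 KB RB size from the notes. AutoRAID keeps this table persistently; a dict stands in for it here, and the (tier, peg, offset) location format is an illustrative assumption:

```python
RB_SIZE = 64 * 1024  # 64 KB relocation blocks, per the notes

class RBMap:
    def __init__(self) -> None:
        self.table: dict[int, tuple[str, int, int]] = {}  # rb -> (tier, peg, offset)

    def lookup(self, byte_addr: int) -> tuple[str, int, int]:
        """Translate a logical byte address to its RB's current location."""
        return self.table[byte_addr // RB_SIZE]

    def relocate(self, rb: int, tier: str, peg: int, offset: int) -> None:
        """Repoint an RB, e.g. after demotion; the data moves, the address doesn't."""
        self.table[rb] = (tier, peg, offset)

if __name__ == "__main__":
    m = RBMap()
    m.relocate(0, "mirrored", peg=3, offset=0)
    m.relocate(0, "raid5", peg=7, offset=128)  # demotion = copy + map update
    print(m.lookup(4096))                      # -> ('raid5', 7, 128)
```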

5. AutoRAID mechanisms
   a. Mirrored reads
      i. Just like RAID 1 for the disks in the PEG
   b. RAID 5 reads
      i. Pretty much like RAID 5 for the disks in the PEG
   c. Writes
      i. Go to NVRAM buffering for low latency before going anywhere
   d. Demotion
      i. Move data from a mirrored PEG to a RAID 5 PEG
   e. Promotion
      i. Delete/free the data in RAID 5, re-allocate it in mirrored storage
      ii. WHY not move the data?
         1. No point: read performance is the same; the only benefit of RAID 1 is writes
         2. QUESTION: for flash, with fast reads, would this change?

6. AutoRAID policies
   a. Normal access:
      i. On a read:
         1. Read the data wherever it is
      ii. On a write:
         1. All data goes to NVRAM
         2. Then to mirrored storage (unless the array is full)
   b. Demotion: when are blocks demoted from mirroring to RAID 5?
      i. QUESTION: what kinds of blocks benefit from mirroring?
         1. Frequently updated
         2. Randomly written
      ii. Policy: least-recently-written
         1. Read accesses don't matter (see above)
   c. Layout: how are blocks laid out?
      i. Mirroring: random access: find a free RB slot and write there (free-block bitmap)
      ii. RAID 5: logging
         1. Always write sequentially to RAID 5, trying to fill a whole stripe
            a. Gives maximum write performance
            b. Avoids the read/modify/write penalty (the RAID small-write problem)
            c. If not a full stripe:
               i. Recompute parity on the fly
         2. Use address translation to locate data
         3. Safe parity updates (sketched after this section)
            a. Problem: what if you crash between writing the data and the new parity?
            b. Answer: use address translation (like LFS) with no-overwrite updates:
               i. Write the new data
               ii. Write the new parity
               iii. Update the translation table to point at the new data
   d. Cleaning
      i. Mirrored storage: no cleaning necessary; can just overwrite holes
         1. Copying/compaction is used to make free PEXs for RAID 5
            a. Disks all start as mirrored, so the system must compact to begin making RAID 5 space
      ii. RAID 5 storage:
         1. QUESTION: when do holes occur?
            a. When data is overwritten
               i. It now lives in the mirrored tier
               ii. Or somewhere else in RAID 5
         2. QUESTION: how often is this?
            a. Rarely: data in RAID 5 is rarely written
         3. POLICY:
            a. Hole filling: for mostly-utilized ranges, overwrite vacant spots with new RBs from the mirrored tier
            b. Cleaning: LFS-style copy/compact
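A sketch of the no-overwrite update order for safe parity updates. The in-memory store and allocator are toy stand-ins (not AutoRAID's actual interfaces); the point is that the translation-table update is the single commit point:

```python
# No-overwrite parity update: write the new data and new parity to fresh
# locations, then flip the RB map. A crash before the final step leaves
# the old data/parity pair intact and consistent.

store: dict[int, bytes] = {}
rb_map: dict[int, tuple[int, int]] = {}  # rb -> (data location, parity location)
_next = 0

def alloc() -> int:
    global _next
    _next += 1
    return _next

def safe_small_write(rb: int, new_data: bytes, old_data: bytes, old_parity: bytes) -> None:
    new_parity = bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))
    d_loc, p_loc = alloc(), alloc()
    store[d_loc] = new_data      # i.   write new data (old copy untouched)
    store[p_loc] = new_parity    # ii.  write new parity
    rb_map[rb] = (d_loc, p_loc)  # iii. commit: table now points at the new copy

if __name__ == "__main__":
    safe_small_write(0, b"NEW!", b"OLD!", b"PAR!")
    print(rb_map[0], store[rb_map[0][0]])
```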

7. Interesting features:
   a. On read/write, there is no decision about where to put data
      i. Reads happen in place
      ii. Writes go to mirrored storage
   b. Data movement is all asynchronous
      i. Background demotion
      ii. No data movement on promotion (delete and re-allocate instead)
   c. No-overwrite for consistency
      i. Write the new data/parity, then update the map
   d. NVRAM for low latency
      i. Holds blocks before they are written
         1. Can buffer data until demotion makes space
   e. Automatically balances mirrored/RAID 5 space
      i. Uses as much of the capacity as possible for mirroring
         1. No idle spares
      ii. Demotes cold data