Lecture 23: I/O: Redundant Arrays of Inexpensive Disks
Professor Randy H. Katz
Computer Science 252, Spring 1996
RHKS96
Review: Storage System Issues
- Historical Context of Storage I/O
- Storage I/O Performance Measures
- Secondary and Tertiary Storage Devices
- A Little Queuing Theory
- Processor Interface Issues
- I/O & Memory Buses
- RAID
- ABCs of UNIX File Systems
- I/O Benchmarks
- Comparing UNIX File System Performance
Review: Busses
- Bus: a shared communication link between subsystems
- Disadvantage: a communication bottleneck, possibly limiting the maximum I/O throughput
- Bus speed is limited by physical factors
- Two generic types of busses: I/O and CPU
- Bus transaction: sending address & receiving or sending data
Review: Bus Options

Option              High performance                        Low cost
Bus width           Separate address & data lines           Multiplex address & data lines
Data width          Wider is faster (e.g., 32 bits)         Narrower is cheaper (e.g., 8 bits)
Transfer size       Multiple words have less bus overhead   Single-word transfer is simpler
Bus masters         Multiple (requires arbitration)         Single master (no arbitration)
Split transaction?  Yes: separate request and reply         No: continuous connection is
                    packets get higher bandwidth            cheaper and has lower latency
                    (needs multiple masters)
Clocking            Synchronous                             Asynchronous
Review: 1990 Bus Survey

                  VME        FutureBus   Multibus II   IPI           SCSI
Signals           128        96          96            16            8
Addr/Data mux     no         yes         yes           n/a           n/a
Data width        16-32      32          32            16            8
Masters           multi      multi       multi         single        multi
Clocking          Async      Async       Sync          Async         either
MB/s (0ns, word)  25         37          20            25            5 (async)
                                                                     1.5 (sync)
150ns word        12.9       15.5        10            =             =
0ns block         27.9       95.2        40            =             =
150ns block       13.6       20.8        13.3          =             =
Max devices       21         20          21            8             7
Max meters        0.5        0.5         0.5           50            25
Standard          IEEE 1014  IEEE 896    ANSI/IEEE     ANSI X3.129   ANSI X3.131
                                         1296
Review: 1993 I/O Bus Survey

Bus                 SBus      TurboChannel  MicroChannel   PCI
Originator          Sun       DEC           IBM            Intel
Clock Rate (MHz)    16-25     12.5-25       async          33
Addressing          Virtual   Physical      Physical       Physical
Data Sizes (bits)   8,16,32   8,16,24,32    8,16,24,32,64  8,16,24,32,64
Master              Multi     Single        Multi          Multi
Arbitration         Central   Central       Central        Central
32-bit read (MB/s)  33        25            20             33
Peak (MB/s)         89        84            75             111 (222)
Max Power (W)       16        26            13             25
1993 MP Server Memory Bus Survey

Bus                 Summit     Challenge   XDBus
Originator          HP         SGI         Sun
Clock Rate (MHz)    60         48          66
Split transaction?  Yes        Yes         Yes?
Address lines       48         40          ??
Data lines          128        256         144 (parity)
Data Sizes (bits)   512        1024        512
Clocks/transfer     4          5           4?
Peak (MB/s)         960        1200        1056
Master              Multi      Multi       Multi
Arbitration         Central    Central     Central
Addressing          Physical   Physical    Physical
Slots               16         9           10
Busses/system       1          1           2
Length              13 inches  12? inches  17 inches
Review: Improving Bandwidth of Secondary Storage
- Processor performance growth has been phenomenal; what about I/O?
- "I/O certainly has been lagging in the last decade." Seymour Cray, Public Lecture (1976)
- "Also, I/O needs a lot of work." David Kuck, Keynote Address (1988)
Network Attached Storage
- Decreasing disk diameters: 14" » 10" » 8" » 5.25" » 3.5" » 2.5" » 1.8" » 1.3": high-bandwidth disk systems based on arrays of disks
- Increasing network bandwidth: 3 Mb/s » 10 Mb/s » 50 Mb/s » 100 Mb/s » 1 Gb/s » 10 Gb/s: networks capable of sustaining high-bandwidth transfers
- Network provides well-defined physical and logical interfaces: separate CPU and storage system!
- High-Performance Storage Service on a High-Speed Network
- Network File Services: OS structures supporting remote file access
Manufacturing Advantages of Disk Arrays
- Disk Product Families
  - Conventional: 4 disk designs (3.5", 5.25", 10", 14"), spanning low end to high end
  - Disk Array: 1 disk design (3.5")
Replace Small # of Large Disks with Large # of Small Disks!

                IBM 3390 (K)   IBM 3.5" 0061   x70
Data Capacity   20 GBytes      320 MBytes      23 GBytes
Volume          97 cu ft       0.1 cu ft       11 cu ft
Power           3 KW           11 W            1 KW
Data Rate       15 MB/s        1.5 MB/s        120 MB/s
I/O Rate        600 I/Os/s     55 I/Os/s       3900 I/Os/s
MTTF            250 KHrs       50 KHrs         ??? Hrs
Cost            $250K          $2K             $150K

Disk arrays have potential for large data and I/O rates, high MB per cu ft, and high MB per KW, but awful reliability.
Redundant Arrays of Disks
- Files are "striped" across multiple spindles
- Redundancy yields high data availability
  - Disks will fail
  - Contents are reconstructed from data redundantly stored in the array
  - Capacity penalty to store the redundant information
  - Bandwidth penalty to update it
- Techniques:
  - Mirroring/Shadowing (high capacity cost)
  - Horizontal Hamming Codes (overkill)
  - Parity & Reed-Solomon Codes
  - Failure Prediction (no capacity overhead!): VaxSimPlus; technique is controversial
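Block-level striping can be sketched in a few lines. This is an illustrative model, not code from the lecture; the array width of 4 disks is an arbitrary assumption.

```python
NUM_DISKS = 4  # hypothetical array width

def stripe_map(logical_block, num_disks=NUM_DISKS):
    """Map a logical block number to (disk index, block offset on that disk).

    Logical blocks are distributed round-robin across the spindles, so
    consecutive blocks land on different disks and can be read in parallel.
    """
    return logical_block % num_disks, logical_block // num_disks

# Logical blocks 0..7 land on disks 0,1,2,3,0,1,2,3:
print([stripe_map(b) for b in range(8)])
```

Round-robin placement is what lets a large sequential transfer draw bandwidth from every spindle at once.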
Array Reliability
- Reliability of N disks = Reliability of 1 Disk / N
  - 50,000 hours / 70 disks = 700 hours
  - Disk system MTTF drops from 6 years to 1 month!
- Arrays without redundancy are too unreliable to be useful!
- Hot spares support reconstruction in parallel with access: very high media availability can be achieved
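The slide's MTTF arithmetic (50,000-hour disks, 70 of them, assuming independent failures) checks out as follows:

```python
# Array MTTF = single-disk MTTF / number of disks,
# assuming independent, exponentially distributed failures.
disk_mttf_hours = 50_000   # ~6 years for one disk
n_disks = 70

array_mttf = disk_mttf_hours / n_disks
print(array_mttf)  # ~714 hours: about one month for the whole array
```

This is why redundancy is not optional: the very act of ganging disks together divides the system MTTF by the array size.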
Redundant Arrays of Disks (RAID) Techniques
- Disk Mirroring/Shadowing: each disk is fully duplicated onto its "shadow"; logical write = two physical writes; 100% capacity overhead
- Parity Data Bandwidth Array: parity computed horizontally; logically a single high-data-bandwidth disk
- High I/O Rate Parity Array: interleaved parity blocks; independent reads and writes; logical write = 2 reads + 2 writes
- Parity + Reed-Solomon codes
Problems of Disk Arrays: Small Writes
RAID-5 Small Write Algorithm: a logical write becomes 2 physical reads + 2 physical writes.
To update data block D0 to D0' in stripe (D0 D1 D2 D3 P):
  1. Read old data D0
  2. Read old parity P
  3. Write new data D0'
  4. Write new parity P' = P xor D0 xor D0'
The new parity is the old parity XORed with the old and new data, so the other data disks need not be touched.
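The small-write parity update can be checked directly with XOR over byte blocks. A minimal sketch (names like `small_write` are mine, not the lecture's):

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def small_write(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """RAID-5 small write: new parity P' = P xor D_old xor D_new.

    The two reads (old data, old parity) come in as arguments; the two
    writes are the new data block and the returned new parity block.
    """
    return xor_blocks(xor_blocks(old_parity, old_data), new_data)

# Verify against recomputing parity from scratch over a 4-data-disk stripe:
d = [bytes([i] * 4) for i in (1, 2, 3, 4)]
p = xor_blocks(xor_blocks(xor_blocks(d[0], d[1]), d[2]), d[3])
d0_new = bytes([9] * 4)
p_new = small_write(d[0], p, d0_new)
p_full = xor_blocks(xor_blocks(xor_blocks(d0_new, d[1]), d[2]), d[3])
assert p_new == p_full
```

The equivalence holds because XOR is associative and self-inverse: XORing out the old data and XORing in the new data leaves the contributions of D1..D3 untouched.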
Redundant Arrays of Disks: RAID 1: Disk Mirroring/Shadowing
- Each disk is fully duplicated onto its "shadow"; each pair forms a recovery group
- Very high availability can be achieved
- Bandwidth sacrifice on write: logical write = two physical writes
- Reads may be optimized
- Most expensive solution: 100% capacity overhead
- Targeted for high I/O rate, high availability environments
Redundant Arrays of Disks: RAID 3: Parity Disk
- A logical record is striped as physical records across the data disks, with parity P on a dedicated disk
- Parity is computed across the recovery group to protect against hard disk failures
- 33% capacity cost for parity in this configuration; wider arrays reduce the capacity cost but decrease expected availability and increase reconstruction time
- Arms logically synchronized, spindles rotationally synchronized: logically a single high-capacity, high-transfer-rate disk
- Targeted for high-bandwidth applications: scientific computing, image processing
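Reconstruction after a disk failure falls out of the same parity arithmetic: the lost disk's contents are the XOR of all surviving disks plus parity. A small sketch (the 3-disk group and its contents are made up for illustration):

```python
from functools import reduce

def parity(blocks):
    """XOR all blocks together, column by column."""
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))

data = [bytes([i] * 8) for i in (5, 6, 7)]  # 3 hypothetical data disks
p = parity(data)                            # the parity disk

# Disk 1 fails; regenerate its contents from the survivors and parity:
recovered = parity([data[0], data[2], p])
assert recovered == data[1]
```

This is why wider groups lengthen reconstruction: every surviving disk in the group must be read in full to rebuild the failed one.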
Redundant Arrays of Disks: RAID 5+: High I/O Rate Parity
- A logical write becomes four physical I/Os
- Independent writes are possible because of interleaved parity
- Reed-Solomon codes ("Q") for protection during reconstruction
- Targeted for mixed applications

Layout across 5 disk columns (a stripe is one row; a stripe unit is one cell; logical disk addresses increase left to right, and the parity block rotates one column per stripe):

D0   D1   D2   D3   P0
D4   D5   D6   P1   D7
D8   D9   P2   D10  D11
D12  P3   D13  D14  D15
P4   D16  D17  D18  D19
D20  D21  D22  D23  P5
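The rotating-parity layout above is easy to express as an address-mapping function. A sketch under my own naming, matching the 5-column figure (parity moves one column left per stripe, data blocks fill the remaining columns left to right):

```python
NUM_DISKS = 5  # matches the 5-column layout in the figure

def parity_disk(stripe):
    """Column holding parity for this stripe: rightmost for stripe 0,
    then rotating one column to the left per stripe."""
    return (NUM_DISKS - 1) - (stripe % NUM_DISKS)

def block_location(logical_block):
    """Map a logical data block to (stripe, disk), skipping the parity column."""
    stripe, offset = divmod(logical_block, NUM_DISKS - 1)
    pd = parity_disk(stripe)
    disk = offset if offset < pd else offset + 1
    return stripe, disk

# Spot-check against the figure: D7 sits in stripe 1, rightmost column,
# because parity P1 occupies column 3 of that stripe.
print(block_location(7))
```

Because parity rotates, no single disk becomes a write hotspot, which is what enables the independent small writes RAID 5 is targeted at.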
RAID Performance Comparison
- RA90: Ave Seek 18.5 ms, Rotation 16.7 ms, Xfer Rate 2.8 MB/s, Capacity 1200 MB
- IBM Small Disks: Ave Seek 12.5 ms, Rotation 14 ms, Xfer Rate 2.4 MB/s, Capacity 300 MB
[Figure: I/O rate versus workload mix, from all reads to all writes, comparing a 4+2 array group against mirrored RA90s; the normal operating range of most existing systems is marked.]
Subsystem Organization
host -> host adapter -> array controller -> single-board disk controllers -> devices
- Array controller: manages interface to host, DMA control, buffering, parity logic, physical device control
- Striping software is off-loaded from the host to the array controller: no application modifications, no reduction of host performance
- Single-board disk controllers are often piggy-backed in small-format devices
System Availability: Orthogonal RAIDs
- Array controller drives multiple string controllers; each data recovery group is placed orthogonally across the strings
- Data Recovery Group: unit of data redundancy
- Redundant Support Components: fans, power supplies, controller, cables
- End-to-End Data Integrity: internal parity-protected data paths
System-Level Availability
- Fully dual-redundant configuration: duplicated hosts, I/O controllers, and array controllers, with duplicated paths to each recovery group
- Goal: no single points of failure
- With duplicated paths, higher performance can be obtained when there are no failures