UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Computer Architecture ECE 568 Part 6 Input/Output Israel Koren ECE568/Koren Part.6. Motivation: Why Care About I/O? CPU Performance: almost 5% per year I/O system performance limited by mechanical delays (e.g., disk I/O) < % per year (IO per sec) Amdahl's Law: system speed-up limited by the slowest part I/O bottleneck Diminishing fraction of time in CPU Diminishing value of faster CPUs ECE568/Koren Part.6.2 Page
I/O Systems Processor interrupts Cache Memory - I/O Bus Main Memory I/O I/O I/O Disks: Long-term, nonvolatile storage Large, inexpensive, also serve as slow level in the storage hierarchy ECE568/Koren Part.6.3 Disk Disk Graphics Network Disk Terminology Arm Head Sector Inner Track Outer Track Actuator Platter Several platters, with information recorded magnetically on both surfaces (usually) Bits recorded sequentially in tracks, divided into sectors (e.g., 52 Bytes) Actuator moves head (end of arm, /surface) over track ( seek ), select surface, wait for sector rotate under head, then read or write Cylinder : all tracks under heads ECE568/Koren Part.6.4 Page 2
Disk Device Performance Outer Track Inner Track Sector Spindle Head Arm Platter Actuator Disk Latency = Seek Time + Rotation Time + Transfer Time + Overhead Seek Time: depends on number of tracks arm moves Rotation Time: depends on speed disk rotates, how far sector is from head s current position Transfer Time: depends on data rate (bandwidth) of disk (bit density), size of request ECE568/Koren Part.6.5 Disk Device Performance Rotation Time: Avg. distance to sector from head /2 time of a rotation 72 Revolutions Per Minute 2 Rev/sec revolution = / 2 sec 8.33 milliseconds /2 rotation (revolution) 4.7 ms Seek Time: Average number of tracks arm moves Sum all possible seek distances from all possible tracks / Total_#_ops» Assumes random seek distance Disk industry standard benchmark j i Typical: ~8 ms Transfer rate -4 MByte/sec Capacity: s Gbytes to TeraBytes Quadruples every 2 years ECE568/Koren Part.6.6 Page 3
Track Example: Barracuda 8 Track Buffer Arm Head Sector Cylinder Platter 8.6 GB, 3.5 inch disk 2 platters, 24 surfaces 24,247 cylinders 7,2 RPM; (4.2 ms avg. latency) 7.4 ms avg. seek 65 MB/s. ms controller time Calculate time to read 64 KB (28 sectors) using specs Disk latency = avg. seek time + avg. rotational delay + transfer time + controller overhead = 7.4 ms +.5 * /(72 RPM) + 64 KB/(65 MB/s) +. ms =7.4 +4.2 +. +. ms =2.7 ms ECE568/Koren Part.6.7 source: www.seagate.com Disk Performance Example Recalculated Calculate again using /3 quoted seek time (not random, mostly to adjacent tracks), 3/4 of internal bandwidth (check bits and gaps between sectors) Disk latency = average seek time + average rotational delay + transfer time + controller overhead = (.33 * 7.4 ms) +.5 * /(72 RPM) + 64 KB / (.75 * 65 MB/s) +. ms = 2.5 ms +.5 /(72 RPM/(6ms/M)) + 64 KB / (47 KB/ms) +. ms = 2.5 + 4.2 +.4 +. ms = 8.2 ms (64% of 2.7) ECE568/Koren Part.6.8 Page 4
Large Arrays of Disks Servers and data-centers require large storage capacity but large arrays of disks have low reliability Reliability_disk = exp(-λt) where λ is a constant failure rate MTTF = Mean_Time_to_Failure = / λ A single disk has MTTF=5, hours 6 years Reliability of N disks = [exp(- λt)}^n = exp(-nλt) MTTF_array = / N λ For N=7 disks: 5,/7= 7 hours month Arrays (without redundancy) too unreliable ECE568/Koren Part.6.9 RAID Redundant Arrays of Independent Disks Files are "striped" across multiple disks with redundant data added Upon failure: Contents reconstructed from data redundantly stored in the array Redundancy yields high data availability service still provided to user, even if some components fail but Capacity penalty to store redundant info Performance penalty to update redundant info ECE568/Koren Part.6. Page 5
RAID : Disk Mirroring/Shadowing Each disk is fully duplicated onto its mirror Very high availability Bandwidth sacrifice on write: Logical write = two physical writes Reads may be optimized Most expensive solution: % capacity overhead (RAID 2 not interesting, so skip) ECE568/Koren Part.6. RAID 3: Parity Disk P Striped physical record P = sum mod 2 of other disks (parity) If disk fails, subtract P from sum of other disks to find missing information Capacity overhead Wider arrays reduce capacity overhead, but decrease reliability ECE568/Koren Part.6.2 Page 6
Inspiration for RAID 4 RAID 3 relies on parity disk to detect and correct errors on Read Must read from all disks every time But every sector has an error check field Rely on error check field to catch errors on read, not on the parity disk Use parity disk only for complete disk failure ECE568/Koren Part.6.3 Insides of of five disks RAID 4: High I/O Rate 2 3 D D D2 D3 P D4 D5 D6 D7 P 4 Increasing Logical Disk Address D8 D9 D D P Example: small read D & D5, large write D2-D5 ECE568/Koren Part.6.4 D2 D3 D4 D5 D6 D7 D8 D9 D2 D2 D22 D23 P... Disk. Columns.. P P Stripe Page 7
RAID 5: High I/O Rate Interleaved Parity 2 3 4 Independent writes possible because of of interleaved parity D D D2 D3 P D4 D5 D6 P D7 D8 D9 P D D Increasing Logical Disk Addresses Example: write to D & D5 uses disks,, 3 and 4 ECE568/Koren Part.6.5 D2 P D3 D4 D5 P D6 D7 D8 D9 D2 D2 D22 D23 P.... Disk. Columns.... System Availability: Orthogonal RAIDs Array RAID Group: data redundancy Common Support Components: fans, power supplies, controller, cables ECE568/Koren Part.6.6 Page 8
Summary: RAID Techniques Disk Mirroring, Shadowing (RAID ) Each disk is fully duplicated Logical write = two physical writes % capacity overhead High I/O Rate Parity Array (RAID 5) Interleaved parity blocks Independent reads and writes Logical write = 2 reads + 2 writes ECE568/Koren Part.6.7 Solid state drives- flash technology Organization similar to DRAM but basic cell is different ECE568/Koren Part.6.8 Page 9
ECE568/Koren Part.6.9 Solid state drives Most are currently based on NAND flash Read blocks (4K byte) Write erasure (multiple blocks) slower than Read SLC single layer cell; MLC multi-level cell SLC low density, read ~25µs write ~25µs; write endurance ~, cycles TLC (Triple level) high density, read ~75µs; write ~µs; write endurance ~, cycles Wear leveling re-mapping of the contents on every write avoid the high frequency of re-writes to certain blocks Random bit errors due the noisy environment in the chip and the weak signals from the cells Error correction (for transient and permanent bit failures) BCH code (a cyclic code based on finite-fields), e.g., can correct up to 55 bit errors in a 52 byte sector Solid-state vs. hard disk drives Start-up time Access latency Transfer rate Cost Power consumption Capacity Reliability Environment impact ECE568/Koren Part.6.2 Solid-state Drive Instantaneous.ms or less, random access -6 MB/sec $.37/GB About 5-% 52 GB and up Power outage may cause damage No sensitive and less noisy Hard Disk Drive Spin-up 3-2 msec, varies with relative position About 5 MB/sec $.5-./GB.35 2Watt 8 TB Mechanical failures, unpowered: long lifetime Sensitive to shocks and temperature Page