High-Performance Storage Systems

I/O Systems Processor interrupts Cache Memory - I/O Bus Main Memory I/O Controller I/O Controller I/O Controller Disk Disk Graphics Network 2

Storage Technology Drivers Driven by the prevailing computing paradigm 95s: migration from batch to on-line processing 99s: migration to ubiquitous computing computers in phones, books, cars, video cameras, nationwide fiber optical network with wireless tails Today: digital media everywhere Digital forms of voice, picture, and video Data from scientific computing such as earthquake simulation, high energy physical experiments, bioinformatics In forms of personal storages, web server, peer-to-peer storage, grid storage Effects on storage industry: Embedded storage smaller, cheaper, more reliable, lower power Data utilities high capacity, hierarchically managed storage 3

Magnetic Disks Purpose: Long-term, nonvolatile storage Large, inexpensive, slow level in the storage hierarchy Characteristics: Seek Time (~8 ms avg) positional latency rotational latency Transfer rate -4 MByte/sec Blocks Capacity Gigabytes Quadruples every 2 years Head Track Sector Cylinder Platter 72 RPM = 2 RPS => 8 ms per rev ave rot latency = 4 ms 28 sectors per track => 25 ms per sector KB per sector => 6 MB / s Response time = Queue + Controller + Seek + Rot + Xfer Service time 4

Photo of Disk Head, Arm, Actuator Spindle Arm Head Actuator Platters (2) 5

Seagate Barracuda 8 Track Buffer { per access + per byte Track Sector Cylinder Arm Head Platter Latency = Queuing Time + Controller time + Seek Time + Rotation Time + Size / Bandwidth source: wwwseagatecom 86 GB, 35 inch disk 2 platters, 24 surfaces 24,247 cylinders 7,2 RPM; (42 ms avg latency) 74/82 ms avg seek (r/w) 64 to 35 MB/s (internal) ms controller time 3 watts (idle) 6

Disk Performance Factors Actual disk seek and rotation time depends on the current head position Seek time: how far is the head to the track? Disk industry standard: assume random position of the head, eg, average 8ms seek time In practice: disk accesses have locality Rotation time: how far is the head to sector? Can safely assume ½ of rotation time (disk keeps rotating) Revolutions Per Minute 6667 Rev/sec revolution = / 6667 sec 6 ms /2 rotation (revolution) 3 ms Data Transfer time: What are the rotation speed, disk density, and sectors per transfer? RPM a track of data per 6 ms Outer tracks are longer and may support higher bandwidth 7

Disk Performance Example Rule of Thumb: Observed average seek time is typically about /4 to /3 of quoted seek time (ie, 3X-4X faster) Rule of Thumb: disks deliver about 3/4 of internal media rate (3X slower) for data Calculate time to read 64 KB for UltraStar 72, using /3 quoted 74ms seek time, 3/4 of 64MB/s internal outer track bandwidth Disk latency = average seek time + average rotational delay + transfer time + controller overhead = (33 * 74 ms) + 5 * /(72 RPM/(6ms/M)) + 64 KB / (75 * 65 MB/s) + ms = 25 ms + 5 /(72 RPM/(6ms/M)) + 64 KB / (47 KB/ms) + ms = 25 + 42 + 4 + ms = 82 ms (64% of 27) 8

Disk Characteristics in 2 Disk diameter (inches) Formatted data capacity (GB) Seagate Cheetah ST7344LC Ultra6 SCSI IBM Travelstar 32GH DJSA - 232 ATA-4 IBM GB Microdrive DSCM- 35 25 734 32 Cylinders 4, 2,664 7,67 Disks 2 4 Recording Surfaces (Heads) 24 8 2 Bytes per sector 52 to 496 52 52 Avg Sectors per track (52 byte) Max areal density(gbit/sqin) ~ 424 ~ 36 ~ 4 6 4 52 $828 $447 $435 9

Disk Performance/Cost Trends Capacity + %/year (2X / yrs) Transfer rate (BW) + 4%/year (2X / 2 yrs) Rotation + Seek time 8%/ year (/2 in yrs) MB/$ > %/year (2X / yrs) Fewer chips + areal density Seagate 2GB Internal Hard Drive ST3226A, $5 at staple (list price) Maxtor 2GB 8MB Cache Hard Drive $5984 after rebate at OfficeDepot IBM Microdrive

Disk System Performance System-level Metrics: Response Time Throughput Response time = Queue + Controller + service time ( ) 3 2 Response Time (ms) % Throughput (% total BW) % Queue Proc IOC Device Response time = Queue + Device Service time

How About Queuing Time? Queuing time can be the most significant one in disk response time Arrivals Departures More interested in long term, steady state than in startup => Arrivals = Departures Little s Law: Mean number tasks in system = arrival rate x mean reponse time Applies to any system in equilibrium, as long as nothing in black box is creating or destroying tasks 2

A Little Queuing Theory: Notation Queue System server Proc IOC Device Queuing models assume state of equilibrium: input rate = output rate Notation: r average number of arriving customers/second T ser average time to service a customer (tradtionally µ= / T ser ) u server utilization (): u = r x T ser (or u = r / µ ) T q average time/customer in queue =T s er x u / ( u) T sys average time/customer in system: T sys = T q + T ser L q average length of queue: L q = r x T q L sys average length of system: L sys = r x T sys Little s Law: Length server = rate x Time server (Mean number customers = arrival rate x mean service time) 3

A Little Queuing Theory: Example Processor sends 5 x 8KB disk I/Os per sec, requests & service exponentially distrib, avg disk service = 2 ms On average, how is the disk utilized? What is the number of requests in the queue? What is the average time a spent in the queue? What is the average response time for a disk request? Notation: r average number of arriving customers/second= 5 T ser average time to service a customer= 2 ms u server utilization (): u = r x T ser = 5/s x 2s = 6 T q average time/customer in queue = T s er x u / ( u) = 2x 6/(-6) = 2x5 = 8 ms T sys L q average time/customer in system: T sys =T q +T ser = 3 ms average length of queue:l q = r x T q = 5/s x 8s = 9 requests in queue L sys average # tasks in system : L sys = r x T sys = 5/s x 3s = 5 Look into textbook when you need to work on I/O 4

How to build Large Storage: Disk Array Array Controller String Controller String Controller String Controller String Controller String Controller String Controller Not practical to build large disks 5

Array Reliability Reliability of N disks = Reliability of Disk N 5, Hours 7 disks = 7 hours Disk system MTTF: Drops from 6 years to month! (MTTF: Mean Time to Failure) Arrays (without redundancy) too unreliable to be useful! Solution: RAID -- Redundant Arrays of Inexpensive Disks 6

RAID: The Idea logical record Striped physical records P contains sum of other disks per stripe mod 2 ( parity ) If disk fails, subtract P from sum of other disks to find missing information P RAID-3 shown 7

RAID 4: High I/O Rate Parity Insides of 5 disks D D D2 D3 P D4 D5 D6 D7 P Increasing Logical Disk Address Example: small read D & D5, large write D2-D5 D8 D9 D D P D2 D3 D4 D5 D6 D7 D8 D9 D2 D2 D22 D23 P Disk Columns P P Stripe 8

RAID 5: High I/O Rate Interleaved Parity Independent writes possible because of of interleaved parity Example: write to D, D5 uses disks,, 3, 4 D D D2 D3 P D4 D5 D6 P D7 D8 D9 P D D D2 P D3 D4 D5 P D6 D7 D8 D9 D2 D2 D22 D23 P Disk Columns Increasing Logical Disk Addresses No disk hot spot! 9

Future Storage Trends Disks: Extraodinary advance in capacity/drive, $/GB Currently 7 Gbit/sq inch; can continue past Gbit/sq inch? Bandwidth, seek time not keeping up: 35 inch form factor makes sense? 25 inch form factor in near future? inch form factor in long term? Tapes Old technique, no investment in innovation Are they already dead? What is a tapeless backup system? Other Storage CD/DVD Compact Flash, USB key storage, MRAM 2