Review: Major Components of a Computer
[Figure: Processor (Control, Datapath); Memory (Cache, Main Memory, Secondary Memory (Disk)); Devices (Input, Output)]

Magnetic Disk
Purpose
  Long term, nonvolatile storage
  Lowest level in the memory hierarchy
    » slow, large, inexpensive
General structure
  A rotating platter coated with a magnetic surface
  A moveable read/write head to access the information on the disk
Typical numbers
  1 to 4 platters (each platter has two recordable disk surfaces) per disk, 1" to 5.25" in diameter (3.5" dominate in 2004)
  The stack of platters is rotated at speeds of 5,400 to 15,000 RPM
  10,000 to 50,000 tracks per surface
    » cylinder: all the tracks under the heads at a given point on all surfaces
  100 to 500 sectors per track
    » the sector is the smallest unit that can be read/written (typically 512 B)
[Figure: disk surface with a track and a sector labeled]

Magnetic Disk Characteristic
Disk read/write components
  1. Seek time: position the head over the proper track (3 to 14 ms avg)
    » due to locality of disk references, the actual average seek time may be only 25% to 33% of the advertised number
  2. Rotational latency: wait for the desired sector to rotate under the head (1/2 of 1/RPM, converted to ms)
    » 0.5/5400 RPM = 5.6 ms to 0.5/15000 RPM = 2.0 ms
  3. Transfer time: transfer a block of bits (one or more sectors) under the head to the disk controller's cache (30 to 80 MB/s are typical disk transfer rates)
    » the disk controller's cache takes advantage of spatial locality in disk accesses
    » cache transfer rates are much faster (e.g., 320 MB/s)
  4. Controller time: the overhead the disk controller imposes in performing a disk I/O access (typically < 0.2 ms)
[Figure: disk drive with Controller + Cache, Head, Track, Sector, Cylinder, and Platter labeled]
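The four components listed on the Magnetic Disk Characteristic slide simply add up. The following is a minimal sketch of that sum, not a model of any particular drive; the function names are made up for illustration, and 1 MB is assumed to mean 10^6 bytes.

```python
def rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: the time for half a revolution, in ms."""
    revolutions_per_second = rpm / 60.0
    return 0.5 / revolutions_per_second * 1000.0


def transfer_time_ms(num_bytes: float, rate_mb_per_s: float) -> float:
    """Time to move num_bytes at the media transfer rate (assumes 1 MB = 10**6 bytes)."""
    return num_bytes / (rate_mb_per_s * 1e6) * 1000.0


def avg_access_time_ms(seek_ms: float, rpm: float, num_bytes: float,
                       rate_mb_per_s: float, controller_ms: float) -> float:
    """Sum of seek time, rotational latency, transfer time, and controller time."""
    return (seek_ms
            + rotational_latency_ms(rpm)
            + transfer_time_ms(num_bytes, rate_mb_per_s)
            + controller_ms)


# The two ends of the rotation-speed range quoted on the slide.
print(round(rotational_latency_ms(5400), 1))   # 5.6
print(round(rotational_latency_ms(15000), 1))  # 2.0
```

Keeping each component in its own function makes it easy to see which term dominates when one parameter (for example the seek time) is changed.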
Typical Disk Access Time
The average time to read or write a 512 B sector for a disk rotating at 10,000 RPM with an average seek time of 6 ms, a 50 MB/sec transfer rate, and a 0.2 ms controller overhead:
  Avg disk read/write = 6.0 ms + 0.5/(10,000 RPM/(60 sec/minute)) + 0.5 KB/(50 MB/sec) + 0.2 ms
                      = 6.0 + 3.0 + 0.01 + 0.2 = 9.21 ms
If the measured average seek time is 25% of the advertised average seek time, then
  Avg disk read/write = 1.5 + 3.0 + 0.01 + 0.2 = 4.71 ms
The rotational latency is usually the largest component of the access time

Magnetic Disk Examples (www.seagate.com)
Characteristic              Seagate ST37   Seagate ST32   Seagate ST9
Disk diameter (inches)      3.5            3.5            2.5
Capacity (GB)               73.4           200            40
# of surfaces (heads)       8              4              2
Rotation speed (RPM)        15,000         7,200          5,400
Transfer rate (MB/sec)      57-86          32-58          34
Minimum seek (ms)           0.2-0.4        1.0-1.2        1.5-2.0
Average seek (ms)           3.6-3.9        8.5-9.5        12-14
MTTF (hours @ 25 °C)        1,200,000      600,000        330,000
Dimensions (inches)         1 x 4 x 5.8    1 x 4 x 5.8    0.4 x 2.7 x 3.9
GB/cu.inch                  3              9              10
Power: op/idle/sb (watts)   20/12/-        12/8/1         2.4/1/0.4
GB/watt                     4              16             17
Weight (pounds)             1.9            1.4            0.2

Disk Latency & Bandwidth Milestones
                    CDC Wren   SG ST41   SG ST15   SG ST39   SG ST37
RSpeed (RPM)        3600       5400      7200      10000     15000
Year                1983       1990      1994      1998      2003
Capacity (Gbytes)   0.03       1.4       4.3       9.1       73.4
Diameter (inches)   5.25       5.25      3.5       3.0       2.5
Interface           ST-412
Bandwidth (MB/s)    0.6        4         9         24        86
Latency (msec)      48.3       17.1      12.7      8.8       5.7
Disk latency is one average seek time plus the rotational latency.
Disk bandwidth is the peak transfer rate of formatted data from the media (not from the cache).
Patterson, CACM Vol 47, #10, 2004

Latency & Bandwidth Improvements
In the time that the disk bandwidth doubles, the latency improves by a factor of only 1.2 to 1.4
[Figure: Bandwidth (MB/s) and Latency (msec) versus Year of Introduction, 1983 to 2003]
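The "1.2 to 1.4 per bandwidth doubling" rule can be checked roughly against the milestones table. The sketch below is only an illustration: it takes the 1983 and 2003 rows as 0.6 MB/s at 48.3 ms and 86 MB/s at 5.7 ms, and the function name is hypothetical.

```python
import math


def latency_gain_per_bandwidth_doubling(bw_old: float, bw_new: float,
                                        lat_old: float, lat_new: float) -> float:
    """Average latency improvement factor for each doubling of bandwidth."""
    doublings = math.log2(bw_new / bw_old)    # how many times bandwidth doubled
    total_latency_gain = lat_old / lat_new    # overall latency improvement
    return total_latency_gain ** (1.0 / doublings)


# First (1983) and last (2003) rows of the milestones table.
factor = latency_gain_per_bandwidth_doubling(0.6, 86.0, 48.3, 5.7)
print(f"latency improves by about {factor:.2f}x per bandwidth doubling")
```

Over these two decades bandwidth doubled about seven times, so even a modest per-doubling factor compounds into a large gap between bandwidth and latency.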
Aside: Media Bandwidth/Latency Demands
Bandwidth requirements
  High quality video
    » Digital data = (30 frames/s) (640 x 480 pixels) (24-b color/pixel) = 221 Mb/s (27.625 MB/s)
  High quality audio
    » Digital data = (44,100 audio samples/s) (16-b audio samples) (2 audio channels for stereo) = 1.4 Mb/s (0.175 MB/s)
  Compression reduces the bandwidth requirements considerably
Latency issues
  How sensitive is your eye (ear) to variations in video (audio) rates?
  How can you ensure a constant rate of delivery?
  How important is synchronizing the audio and video streams?
    » 15 to 20 ms early to 30 to 40 ms late is tolerable

Dependability, Reliability, Availability
Reliability: measured by the mean time to failure (MTTF). Service interruption is measured by mean time to repair (MTTR)
Availability: a measure of service accomplishment
  Availability = MTTF/(MTTF + MTTR)
To increase MTTF, either improve the quality of the components or design the system to continue operating in the presence of faulty components
  1. Fault avoidance: preventing fault occurrence by construction
  2. Fault tolerance: using redundancy to correct or bypass faulty components (hardware)
    » Fault detection versus fault correction
    » Permanent faults versus transient faults

RAIDs: Disk Arrays
Redundant Array of Inexpensive Disks
Arrays of small and inexpensive disks
  Increase potential throughput by having many disk drives
    » Data is spread over multiple disks
    » Multiple accesses are made to several disks at a time
Reliability is lower than a single disk
But availability can be improved by adding redundant disks (RAID)
  Lost information can be reconstructed from redundant information
  MTTR: mean time to repair is on the order of hours
  MTTF: mean time to failure of disks is tens of years

RAID: Level 0 (No Redundancy; Striping)
[Figure: four disks holding blk1, blk2, blk3, blk4]
Multiple smaller disks as opposed to one big disk
  Spreading the blocks over multiple disks (striping) means that multiple blocks can be accessed in parallel, increasing the performance
    » A 4 disk system gives four times the throughput of a 1 disk system
Same cost as one big disk, assuming 4 small disks cost the same as one big disk
No redundancy, so what if one disk fails?
  Failure of one or more disks is more likely as the number of disks in the system increases
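The availability formula, and the remark that a failure becomes more likely as disks are added, can be illustrated with a short sketch. The array estimate assumes independent disks with a constant failure rate, which the slides do not state; the 20-year MTTF and 24-hour MTTR are hypothetical values picked to match "tens of years" and "order of hours".

```python
HOURS_PER_YEAR = 24 * 365


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Availability = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def array_mttf_hours(disk_mttf_hours: float, num_disks: int) -> float:
    """Expected time to the first disk failure in an array without redundancy.

    Assumption (not from the slides): disks fail independently at a constant
    rate, so the array's failure rate is num_disks times that of one disk.
    """
    return disk_mttf_hours / num_disks


disk_mttf = 20 * HOURS_PER_YEAR   # hypothetical: 20 years
disk_mttr = 24                    # hypothetical: one day to repair

print(f"single disk availability: {availability(disk_mttf, disk_mttr):.6f}")
print(f"4-disk RAID 0 MTTF: {array_mttf_hours(disk_mttf, 4) / HOURS_PER_YEAR:.1f} years")
```

Under these assumptions a four-disk stripe set is expected to lose a disk four times as often as a single drive, which is the motivation for the redundant RAID levels.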
RAID: Level 1 (Redundancy via Mirroring)
[Figure: four data disks holding blk1.1, blk1.2, blk1.3, blk1.4 and four disks holding the same blocks as redundant (check) data]
Uses twice as many disks as RAID 0 (e.g., 8 smaller disks with the second set of 4 duplicating the first set) so there are always two copies of the data
  # redundant disks = # of data disks, so twice the cost of one big disk
    » writes have to be made to both sets of disks, so writes would be only 1/2 the performance of RAID 0
What if one disk fails?
  If a disk fails, the system just goes to the mirror for the data

RAID: Level 0+1 (Striping with Mirroring)
[Figure: four data disks holding blk1, blk2, blk3, blk4 and four disks holding the same blocks as redundant (check) data]
Combines the best of RAID 0 and RAID 1: data is striped across four disks and mirrored to four disks
  Four times the throughput (due to striping)
  # redundant disks = # of data disks, so twice the cost of one big disk
    » writes have to be made to both sets of disks, so writes would be only 1/2 the performance of RAID 0
What if one disk fails?
  If a disk fails, the system just goes to the mirror for the data

RAID: Level 2 (Redundancy via ECC)
RAID 2 has fallen into disuse (skip)
[Figure: data disks 3, 5, 6, 7 hold bits blk1,b0 to blk1,b3; ECC disk 4 checks disks 4,5,6,7; ECC disk 2 checks disks 2,3,6,7; ECC disk 1 checks disks 1,3,5,7]
ECC disks 4 and 2 point to either data disk 6 or 7, but ECC disk 1 says disk 7 is okay, so disk 6 must be in error
ECC disks contain the parity of data on a set of distinct overlapping disks
  # redundant disks = 3 for 4 data disks; almost twice the cost of one big disk
    » writes require computing parity to write to the ECC disks
    » reads require reading the ECC disks and confirming parity

RAID: Level 3 (Bit-Interleaved Parity)
[Figure: four data disks holding bits blk1,b0 to blk1,b3 and one (odd) bit parity disk]
Cost of higher availability is reduced to 1/N where N is the number of disks in a protection group
  # redundant disks = # of protection groups (one parity disk per group)
    » writes require writing the new data to the data disk as well as computing the parity, meaning reading the other disks, so that the parity disk can be updated
    » after a disk fails, reads require reading all the operational data disks as well as the parity disk to calculate the missing data that was stored on the failed disk
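Recovering the contents of a failed disk from the surviving disks and the parity disk can be sketched with XOR. This is an illustration under a simplifying assumption: it uses plain XOR (even parity) over whole byte blocks, whereas the slide's figure shows an odd bit-parity disk; the reconstruction step works the same way. The helper names are hypothetical.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks (even parity)."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def reconstruct_missing(surviving: list, parity: bytes) -> bytes:
    """XOR of the parity with all surviving data blocks yields the lost block."""
    return xor_blocks(parity, *surviving)


# Four data disks in one protection group, plus their parity.
data = [b"\x0f\xa0", b"\x33\x01", b"\xff\x10", b"\x00\x7e"]
parity = xor_blocks(*data)

# Suppose the third disk fails: read every other disk and the parity disk.
surviving = data[:2] + data[3:]
recovered = reconstruct_missing(surviving, parity)
print(recovered == data[2])  # True
```

The same function serves for any single failed disk, which is why one parity disk per protection group is enough to tolerate one failure.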
RAID: Level 4 (Block-Interleaved Parity)
[Figure: four data disks holding blk1, blk2, blk3, blk4 and one block parity disk]
Cost of higher availability still only 1/N but the parity is stored as blocks associated with sets of data blocks
  Four times the throughput (striping)
  # redundant disks = # of protection groups (one parity disk per group)
  Supports small reads and small writes (reads and writes that go to just one (or a few) data disks in a protection group)
    » by watching which bits change when writing new information, need only to change the corresponding bits on the parity disk
    » the parity disk must be updated on every write, so it is a bottleneck for back-to-back writes

Small Writes
[Figure: disks D1 D2 D3 D4 P, with new D1 data being written in each case]
RAID 3 small writes
  3 reads and 2 writes involving all the disks
RAID 4 small writes
  2 reads and 2 writes involving just two disks

RAID: Level 5 (Distributed Block-Interleaved Parity)
[Figure: five disks, one of these assigned as the block parity disk]
Cost of higher availability still only 1/N but the parity block can be located on any of the disks so there is no single bottleneck for writes
  Still four times the throughput (striping)
  # redundant disks = # of protection groups (one disk's worth of parity per group)
  Supports small reads and small writes (reads and writes that go to just one (or a few) data disks in a protection group)
  Allows multiple simultaneous writes as long as the accompanying parity blocks are not located on the same disk
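The "2 reads and 2 writes" small write used by RAID 4 and RAID 5 can be sketched as follows: read the old data block and the old parity block, flip only the parity bits whose data bits changed, then write the new data and the new parity. The sketch assumes XOR (even) parity, and the function names are hypothetical.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def small_write_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """New parity from two reads: old_data XOR new_data marks the changed bits,
    and XORing that into old_parity flips exactly those parity bits."""
    return xor_blocks(old_parity, old_data, new_data)


# One stripe: D1..D4 and their parity P.
stripe = [b"\x12\x34", b"\xab\xcd", b"\x00\xff", b"\x55\xaa"]
old_parity = xor_blocks(*stripe)

# Overwrite D1 without touching D2, D3, or D4.
new_d1 = b"\x9e\x01"
new_parity = small_write_parity(stripe[0], new_d1, old_parity)

# Same result as recomputing the parity from the whole stripe.
print(new_parity == xor_blocks(new_d1, *stripe[1:]))  # True
```

Because only the target data disk and the parity disk are touched, two small writes can proceed at the same time in RAID 5 whenever their data and parity blocks sit on different disks.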
Distributing Parity Blocks
RAID 4
  1    2    3    4    P0
  5    6    7    8    P1
  9    10   11   12   P2
  13   14   15   16   P3
RAID 5
  1    2    3    4    P0
  5    6    7    P1   8
  9    10   P2   11   12
  13   P3   14   15   16
By distributing parity blocks to all disks, some small writes can be performed in parallel

Summary
Four components of disk access time:
  Seek Time: advertised to be 3 to 14 ms but lower in real systems
  Rotational Latency: 5.6 ms at 5400 RPM and 2.0 ms at 15000 RPM
  Transfer Time: 30 to 80 MB/s
  Controller Time: typically less than 0.2 ms
RAIDs can be used to improve availability
  RAID 0 and RAID 5: widely used in servers; one estimate is that 80% of disks in servers are RAIDs
  RAID 1 (mirroring): EMC, Tandem, IBM
  RAID 3: Storage Concepts
  RAID 4: Network Appliance
RAIDs have enough redundancy to allow continuous operation, but not hot swapping
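The two layouts on the Distributing Parity Blocks slide can be generated with a few lines of code. This is a sketch of one possible rotation rule that happens to reproduce the slide's figure (parity starts on the last disk and moves one disk to the left per stripe); real controllers may rotate differently, and the function name is hypothetical.

```python
def parity_layout(num_disks: int, num_stripes: int, rotate: bool) -> list:
    """Return one row per stripe; each row lists what every disk stores.

    rotate=False keeps parity on the last disk (RAID 4 style).
    rotate=True moves parity one disk to the left per stripe (RAID 5 style).
    """
    rows = []
    next_block = 1
    for stripe in range(num_stripes):
        shift = stripe if rotate else 0
        parity_disk = (num_disks - 1 - shift) % num_disks
        row = []
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append(f"P{stripe}")
            else:
                row.append(str(next_block))
                next_block += 1
        rows.append(row)
    return rows


for name, rotate in (("RAID 4", False), ("RAID 5", True)):
    print(name)
    for row in parity_layout(5, 4, rotate):
        print("  " + " ".join(f"{cell:>3}" for cell in row))
```

With rotation, the parity blocks P0 to P3 land on four different disks, so writes to different stripes no longer queue behind a single parity disk.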