Chapter 4: Mass-Storage Systems
Operating System Concepts, Silberschatz, Galvin and Gagne 2002

- Logical Disk Structure
- Disk Scheduling
- Disk Management
- RAID Structure

Logical Disk Structure
- Disk drives are addressed as large one-dimensional arrays of logical blocks, where the logical block is the smallest unit of transfer.
- In the simplest arrangement, the one-dimensional array of logical blocks is mapped onto the sectors of the disk sequentially.
  - Sector 0 is the first sector of the first track on the outermost cylinder.
  - Mapping proceeds in order through that track, then through the rest of the tracks in that cylinder, and then through the rest of the cylinders from outermost to innermost.

Disk Scheduling
- The operating system is responsible for using hardware efficiently. For the disk drives, this means having a fast access time and high disk bandwidth.
- Access time has two major components:
  - Seek time is the time for the disk arm to move the heads to the cylinder containing the desired sector.
  - Rotational latency is the additional time spent waiting for the disk to rotate the desired sector to the disk head.
- Minimize seek time.
- Seek time is roughly proportional to seek distance.
- Disk bandwidth is the total number of bytes transferred divided by the time taken to transfer the data.

Disk Scheduling (Cont.)
- Several algorithms exist to schedule the servicing of disk I/O requests.
- We illustrate them with a request queue to access logical blocks 0-199:
  98, 183, 37, 122, 14, 124, 65, 67
- The head pointer is currently at block 53.

FCFS (First-Come, First-Served)
- Illustration shows total head movement of 640 cylinders.

SSTF (Shortest-Seek-Time-First)
- Selects the request with the minimum seek time from the current head position.
- SSTF scheduling is a form of SJF (shortest-job-first) scheduling. It may cause starvation of some requests.
- Illustration shows total head movement of 236 cylinders.
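The sketch below puts the sequential block mapping and the first two scheduling policies into code. It is a minimal sketch, not code from the slides. The disk geometry (`SECTORS_PER_TRACK`, `TRACKS_PER_CYLINDER`) is hypothetical, and the example numbers cylinders from 0 at the outermost cylinder. With the example queue and the head at 53, FCFS gives the 640 cylinders quoted above and SSTF gives 236.

```python
from typing import List, Tuple

# Hypothetical geometry for illustration only; the slides give no figures,
# and real drives may not have a fixed number of sectors per track.
SECTORS_PER_TRACK = 63
TRACKS_PER_CYLINDER = 16  # one track per read/write head


def block_to_chs(block: int) -> Tuple[int, int, int]:
    """Map a logical block number to (cylinder, track, sector).

    Block 0 is sector 0 of the first track of the outermost cylinder
    (numbered cylinder 0 here). Numbering fills one track, then the rest
    of the tracks in that cylinder, then moves one cylinder inward.
    """
    blocks_per_cylinder = SECTORS_PER_TRACK * TRACKS_PER_CYLINDER
    cylinder, rest = divmod(block, blocks_per_cylinder)
    track, sector = divmod(rest, SECTORS_PER_TRACK)
    return cylinder, track, sector


def fcfs(head: int, queue: List[int]) -> Tuple[List[int], int]:
    """First-come, first-served: return (service order, total head movement)."""
    total = 0
    for cylinder in queue:
        total += abs(cylinder - head)
        head = cylinder
    return list(queue), total


def sstf(head: int, queue: List[int]) -> Tuple[List[int], int]:
    """Shortest-seek-time-first: always serve the closest pending request."""
    pending = list(queue)
    order: List[int] = []
    total = 0
    while pending:
        # On a distance tie, min() returns the request queued earliest.
        nearest = min(pending, key=lambda c: abs(c - head))
        pending.remove(nearest)
        total += abs(nearest - head)
        order.append(nearest)
        head = nearest
    return order, total


if __name__ == "__main__":
    queue = [98, 183, 37, 122, 14, 124, 65, 67]
    print(fcfs(53, queue)[1])   # 640
    print(sstf(53, queue))      # ([65, 67, 37, 14, 98, 122, 124, 183], 236)
    print(block_to_chs(2000))   # (1, 15, 47)
```

The SSTF trace also shows the starvation risk. While new requests keep arriving near the head, a far request such as 183 can wait indefinitely.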
SCAN
- The disk arm starts at one end of the disk and moves toward the other end, servicing requests until it gets to the other end of the disk, where the head movement is reversed and servicing continues.
- Sometimes called the elevator algorithm.
- Illustration shows total head movement of 208 cylinders.

C-SCAN
- Provides a more uniform wait time than SCAN.
- The head moves from one end of the disk to the other, servicing requests as it goes. When it reaches the other end, however, it immediately returns to the beginning of the disk, without servicing any requests on the return trip.
- Treats the cylinders as a circular list that wraps around from the last cylinder to the first one.

C-LOOK
- Version of C-SCAN.
- The arm only goes as far as the last request in each direction, then reverses direction immediately, without first going all the way to the end of the disk.
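This minimal sketch computes service order and total head movement for the sweep-based policies on the same queue, over cylinders 0-199. Some choices here are assumptions. SCAN and LOOK start moving toward cylinder 0. C-SCAN and C-LOOK start moving toward cylinder 199. The wrap-around seek counts as head movement. The helper names are hypothetical. Under these assumptions, SCAN travels all the way to cylinder 0 before reversing and reports 236 cylinders. LOOK reverses at cylinder 14 and reports 208, so the 208 quoted for SCAN corresponds to reversing at the last request.

```python
from typing import List, Tuple

LOW, HIGH = 0, 199  # cylinder range of the example disk


def _travel(head: int, path: List[int]) -> int:
    """Cylinders crossed while visiting the positions in `path` in order."""
    total = 0
    for pos in path:
        total += abs(pos - head)
        head = pos
    return total


def _split(head: int, queue: List[int], upward: bool,
           circular: bool) -> Tuple[List[int], List[int]]:
    """Split requests into the first sweep and the remainder.

    Circular variants serve the remainder in the same direction after
    wrapping around. SCAN and LOOK serve it on the way back.
    """
    if upward:
        first = sorted(c for c in queue if c >= head)
        rest = sorted(c for c in queue if c < head)
    else:
        first = sorted((c for c in queue if c <= head), reverse=True)
        rest = sorted((c for c in queue if c > head), reverse=True)
    if not circular:
        rest.reverse()
    return first, rest


def scan(head: int, queue: List[int], upward: bool = False) -> Tuple[List[int], int]:
    """SCAN: sweep all the way to the edge of the disk, then reverse."""
    first, rest = _split(head, queue, upward, circular=False)
    edge = HIGH if upward else LOW
    return first + rest, _travel(head, first + [edge] + rest)


def look(head: int, queue: List[int], upward: bool = False) -> Tuple[List[int], int]:
    """LOOK: like SCAN, but reverse at the last request in each direction."""
    first, rest = _split(head, queue, upward, circular=False)
    return first + rest, _travel(head, first + rest)


def c_scan(head: int, queue: List[int], upward: bool = True) -> Tuple[List[int], int]:
    """C-SCAN: sweep to the edge, jump to the opposite edge, keep going."""
    first, rest = _split(head, queue, upward, circular=True)
    edges = [HIGH, LOW] if upward else [LOW, HIGH]
    path = first + (edges + rest if rest else edges[:1])
    # The return seek is counted as head movement here (an assumption).
    return first + rest, _travel(head, path)


def c_look(head: int, queue: List[int], upward: bool = True) -> Tuple[List[int], int]:
    """C-LOOK: stop at the last request, then jump to the farthest pending one."""
    first, rest = _split(head, queue, upward, circular=True)
    return first + rest, _travel(head, first + rest)


if __name__ == "__main__":
    q = [98, 183, 37, 122, 14, 124, 65, 67]
    for name, fn in [("SCAN", scan), ("LOOK", look),
                     ("C-SCAN", c_scan), ("C-LOOK", c_look)]:
        print(name, fn(53, q))
```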
Selecting a Disk-Scheduling Algorithm
- SSTF is common and has a natural appeal.
- SCAN and C-SCAN perform better for systems that place a heavy load on the disk.
- Either SSTF or LOOK (SCAN that reverses at the last request in each direction) is a reasonable choice for the default algorithm.
- Performance depends on the number and types of requests.
- Requests for disk service can be influenced by the file-allocation method.
- The disk-scheduling algorithm is written as a separate module of the operating system, allowing it to be replaced with a different algorithm if necessary.

Physical Disk Management
- Low-level formatting, or physical formatting: dividing a disk into sectors that the disk controller can read and write.
- To use a disk to hold files, the operating system still needs to record its own data structures on the disk:
  - Partition the disk into one or more groups of cylinders.
  - Logical formatting, or "making a file system".
- The boot block initializes the system.
  - The bootstrap is stored in ROM.
  - Bootstrap loader program.

Figure: MS-DOS Disk Layout

RAID Structure
- RAID: multiple disk drives provide reliability via redundancy.
- RAID is arranged into six different levels.

RAID (Cont.)
- Several improvements in disk-use techniques involve the use of multiple disks working cooperatively.
- Disk striping spreads the blocks of a file across multiple disks in certain patterns.
- RAID schemes improve performance and improve the reliability of the storage system by storing redundant data.
  - Mirroring or shadowing keeps a duplicate of each disk.
  - Block-interleaved parity uses much less redundancy.
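The slides say only that striping uses "certain patterns". This minimal sketch assumes the simplest one. Each logical block is one stripe unit, and blocks are dealt round-robin across the disks. The function names are hypothetical.

```python
from typing import List, Tuple


def stripe_location(block: int, num_disks: int) -> Tuple[int, int]:
    """Return (disk index, stripe number) for a logical block.

    Assumes a one-block stripe unit dealt round-robin across the disks,
    so consecutive blocks land on different disks and can be transferred
    in parallel.
    """
    if num_disks <= 0:
        raise ValueError("num_disks must be positive")
    stripe, disk = divmod(block, num_disks)
    return disk, stripe


def layout(num_blocks: int, num_disks: int) -> List[List[int]]:
    """List, for each disk, the logical blocks it holds (in stripe order)."""
    disks: List[List[int]] = [[] for _ in range(num_disks)]
    for block in range(num_blocks):
        disk, _ = stripe_location(block, num_disks)
        disks[disk].append(block)
    return disks


if __name__ == "__main__":
    for i, blocks in enumerate(layout(20, 4), start=1):
        print(f"Disk {i}: " + " ".join(f"D{b}" for b in blocks))
```

With 4 disks and 20 blocks, Disk 1 holds D0, D4, D8, D12, D16, and so on. A 4-block read touches all four disks at once.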
RAID Levels
- Data is striped for improved performance:
  - Distributes data over multiple disks to make them appear as a single fast, large disk.
  - Allows multiple I/Os to be serviced in parallel:
    - Multiple independent requests are serviced in parallel.
    - A block request may be serviced in parallel by multiple disks.
- Data is redundant for improved reliability:
  - A large number of disks in an array lowers the reliability of the array:
    - Reliability (MTTF, mean time to failure) of N disks = MTTF of 1 disk / N
    - Example: 50,000 hours / 70 disks ≈ 700 hours
    - Disk-system MTTF drops from 6 years to 1 month.
  - Arrays without redundancy are too unreliable to be useful.

RAID 0 (Non-redundant)
- Stripes data but does not employ redundancy.
- Lowest cost of any RAID.
- Best write performance, since there is no redundant information to update.
- Any single disk failure is catastrophic.
- Used in environments where performance is more important than reliability.

Figure: RAID 0 layout over Disk 1 to Disk 4. Blocks D0-D19 are assigned round-robin, one stripe unit per disk. The first stripe is D0 D1 D2 D3, the next is D4 D5 D6 D7, and so on through D16 D17 D18 D19.

RAID 1 (Mirrored)
- Uses twice as many disks as a non-redundant array: 100% capacity overhead, since two copies of the data are maintained.
- Data is simultaneously written to both arrays.
- Data is read from the array with the shorter queuing, seek, and rotation delays, giving the best read performance.
- When a disk fails, the mirrored copy is still available.
- Used in environments where availability and performance (I/O rate) are more important than storage efficiency.

RAID 2 (Memory-Style ECC)
- Uses a Hamming code: parity is kept for distinct, overlapping subsets of the data.
- The number of redundant disks is proportional to the log of the total number of disks, so RAID 2 is better for large numbers of disks. For example, 4 data disks require 3 redundant disks.
- If a disk fails, the other data in its subset is used to regenerate the lost data.
- Multiple redundant disks are needed to identify the faulty disk.
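The arithmetic behind two claims above can be checked directly. The first claim is that array MTTF falls as MTTF / N. The second is that 4 data disks need 3 Hamming check disks. This is a minimal sketch that assumes independent, exponentially distributed disk failures. It uses the standard single-error-correcting Hamming bound (2^r ≥ d + r + 1); the slides do not name a specific code construction. The function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def array_mttf(disk_mttf_hours: float, num_disks: int) -> float:
    """MTTF of an array with no redundancy.

    Any single disk failure loses data, so with independent, exponentially
    distributed failures the array fails N times as often as one disk.
    """
    if num_disks <= 0:
        raise ValueError("num_disks must be positive")
    return disk_mttf_hours / num_disks


def hamming_check_disks(data_disks: int) -> int:
    """Check disks for a single-error-correcting Hamming code:
    the smallest r with 2**r >= data_disks + r + 1."""
    r = 0
    while 2 ** r < data_disks + r + 1:
        r += 1
    return r


if __name__ == "__main__":
    hours = array_mttf(50_000, 70)
    print(f"single disk: {50_000 / HOURS_PER_YEAR:.1f} years")  # 5.7 years
    print(f"70-disk array: {hours:.0f} hours, about {hours / 24:.0f} days")  # 714 hours, about 30 days
    print(hamming_check_disks(4))  # 3
```

The 70-disk result is about 714 hours, which the slide rounds to 700 hours (roughly one month). A single disk's 50,000 hours is about 5.7 years.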
RAID 3 (Bit-Interleaved Parity)
- Data is striped bit-wise over the data disks.
- Uses a single parity disk to tolerate disk failures. The overhead is 1/N.
- Logically a single high-capacity, high-transfer-rate disk.
- Reads access the data disks only. Writes access both the data and parity disks.
- Used in environments that require high bandwidth (scientific, image processing, etc.), not high I/O rates.

RAID 4 (Block-Interleaved Parity)
- Similar to the bit-interleaved parity disk array, except that data is block-interleaved (in striping units).
- Read requests smaller than one striping unit access one striping unit.
- Write requests update the data block and the parity block.
- Generating parity requires 4 I/O accesses (read-modify-write, RMW).
- The parity disk is updated on all writes, which makes it a bottleneck.

RAID 5 (Block-Interleaved Distributed Parity)
- Eliminates the parity-disk bottleneck of RAID 4 by distributing parity among all the disks.
- Data is distributed among all disks.
- All disks participate in read requests, giving better performance than RAID 4.
- Write requests update the data block and the parity block.
- Generating parity requires 4 I/O accesses (RMW).
- Parity placement can be left-symmetric or right-symmetric. A left-symmetric layout allows each disk to be traversed once before any disk is traversed twice.

Figure: RAID 5 layout, with data blocks D0-D19 and their parity blocks distributed across all disks.

Figure: RAID 5 small write (RMW): 1. read old data, 2. read old parity, 3. write new data, 4. write new parity, where new parity = new data XOR old data XOR old parity.

RAID 6 (P + Q Redundancy)
- Uses Reed-Solomon codes to protect against up to 2 disk failures.
- Data is distributed among all disks.
- Two sets of parity, P and Q.
- Write requests update the data block and both parity blocks.
- Generating parity requires 6 I/O accesses (RMW), since both P and Q must be updated.
- Used in environments with stringent reliability requirements.
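This minimal sketch covers the single-parity mechanics used by RAID 3, 4, and 5. It computes a stripe's XOR parity, rebuilds a lost block, and updates parity on a small write using the 4-I/O read-modify-write described above. It also includes one parity-placement convention for RAID 5, the left-symmetric layout. The layout formula and all function names are assumptions for illustration. RAID 6's Reed-Solomon Q block is not sketched.

```python
from functools import reduce
from typing import List, Tuple


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of one or more equal-length blocks."""
    if not blocks or len({len(b) for b in blocks}) != 1:
        raise ValueError("need one or more blocks of equal length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def parity(stripe: List[bytes]) -> bytes:
    """Single parity block for one stripe of data blocks."""
    return xor_blocks(*stripe)


def rebuild(surviving: List[bytes], parity_block: bytes) -> bytes:
    """Recover the single missing data block of a stripe."""
    return xor_blocks(*surviving, parity_block)


def small_write_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """New parity for a small write (read-modify-write).

    Needs old_data (I/O 1) and old_parity (I/O 2). Writing new_data and
    the returned parity are I/Os 3 and 4.
    """
    return xor_blocks(old_parity, old_data, new_data)


def raid5_left_symmetric(block: int, num_disks: int) -> Tuple[int, int, int]:
    """Return (data disk, parity disk, stripe) in a left-symmetric RAID 5 layout."""
    if num_disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    stripe, index = divmod(block, num_disks - 1)
    parity_disk = (num_disks - 1) - (stripe % num_disks)  # parity rotates leftward
    data_disk = (parity_disk + 1 + index) % num_disks     # data starts after parity
    return data_disk, parity_disk, stripe


if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    p = parity(stripe)
    print(rebuild([stripe[0], stripe[2]], p) == stripe[1])  # True
    print([raid5_left_symmetric(b, 4)[0] for b in range(12)])
```

On 4 disks, consecutive blocks land on disks 0, 1, 2, 3, 0, 1, 2, 3, and so on. That is the "each disk once before any disk twice" property. The small-write path never reads the other data blocks in the stripe, which is why it costs exactly 4 I/Os.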
Comparisons

Read/Write Performance
- RAID 0 provides the best write performance.
- RAID 1 provides the best read performance.

Cost: Total Number of Disks
- RAID 1 is the most expensive: 100% capacity overhead, 2N disks.
- RAID 0 is the least expensive: N disks, no redundancy.
- RAID 2 needs N + ceiling(log2 N) + 1 disks.
- RAID 3, RAID 4, and RAID 5 need N + 1 disks.

Preferred Environments
- RAID 0: performance and capacity are more important than reliability.
- RAID 1: high I/O rate, high-availability environments.
- RAID 2: large I/O data transfer.
- RAID 3: high-bandwidth applications (scientific, image processing).
- RAID 4: high bit-bandwidth applications.
- RAID 5 and RAID 6: mixed applications.

The table below shows throughput per dollar relative to RAID 0, assuming G drives in an error-correcting group.

RAID Level   Small Reads   Small Writes     Large Reads   Large Writes   Storage Efficiency
RAID 0       1             1                1             1              1
RAID 1       1             1/2              1             1/2            1/2
RAID 3       1/G           1/G              (G-1)/G       (G-1)/G        (G-1)/G
RAID 5       1             max(1/G, 1/4)    1             (G-1)/G        (G-1)/G
RAID 6       1             max(1/G, 1/6)    1             (G-2)/G        (G-2)/G

What RAID for Which Application
- Fast workstation:
  - Caching is important to improve the I/O rate.
  - If large files are installed, then RAID 0 may be necessary.
  - It is preferable to put the OS and swap files on drives separate from the user drives, to minimize head movement between the swap-file area and the user area.
- Small server:
  - RAID 1 is preferred.
- Mid-size server:
  - If more capacity is needed, then RAID 5 is recommended.
- Large server (e.g., database servers):
  - RAID 5 is preferred.
  - Separate different I/Os into mechanically independent arrays. Place the index and data files of databases in different arrays.

Figure: Price per Megabyte of DRAM, from 1981 to 2000
Figure: Price per Megabyte of Magnetic Hard Disk, from 1981 to 2000
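The table's entries can be evaluated for a concrete group size. This minimal sketch encodes the rows above as exact fractions, alongside a disk-count helper for N data disks. The helper covers RAID 0, 1, and 3 through 6. RAID 6's N + 2 (one extra disk's worth each for P and Q) is not stated on the slides. The function names and the `g >= 3` restriction are assumptions. The 1/4 and 1/6 caps on small writes line up with the 4 and 6 I/Os per small write described for RAID 5 and RAID 6.

```python
from fractions import Fraction
from typing import Dict

COLUMNS = ("small_reads", "small_writes", "large_reads",
           "large_writes", "storage_efficiency")


def throughput_per_dollar(level: int, g: int) -> Dict[str, Fraction]:
    """One row of the table: throughput per dollar relative to RAID 0
    for an error-correcting group of g drives."""
    if g < 3:
        raise ValueError("this sketch assumes at least 3 drives per group")
    one, half, inv_g = Fraction(1), Fraction(1, 2), Fraction(1, g)
    rows = {
        0: (one, one, one, one, one),
        1: (one, half, one, half, half),
        3: (inv_g, inv_g, 1 - inv_g, 1 - inv_g, 1 - inv_g),
        5: (one, max(inv_g, Fraction(1, 4)), one, 1 - inv_g, 1 - inv_g),
        6: (one, max(inv_g, Fraction(1, 6)), one, 1 - 2 * inv_g, 1 - 2 * inv_g),
    }
    if level not in rows:
        raise ValueError(f"RAID {level} is not in the table")
    return dict(zip(COLUMNS, rows[level]))


def disks_needed(level: int, n: int) -> int:
    """Total disks for n data disks' worth of capacity (RAID 0, 1, 3, 4, 5, 6)."""
    extra = {0: 0, 1: n, 3: 1, 4: 1, 5: 1, 6: 2}
    if level not in extra:
        raise ValueError(f"RAID {level} is not covered by this sketch")
    return n + extra[level]


if __name__ == "__main__":
    for level in (0, 1, 3, 5, 6):
        row = throughput_per_dollar(level, 5)
        print(f"RAID {level}:", {k: str(v) for k, v in row.items()})
```

For example, with G = 5 the RAID 5 small-write entry is max(1/5, 1/4) = 1/4, and storage efficiency is 4/5.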
Figure: Price per Megabyte of a Tape Drive, from 1984 to 2000