Storage System
COSC4201, 1999 UCB

I/O and Disks
Over the years, I/O received much less attention than CPU design. Yet, as frustrating as a CPU crash is, a disk crash is far worse, because the stored data may be lost. Disks are mechanical devices and therefore a performance bottleneck; by Amdahl's law, a slow I/O system limits the benefit of ever-faster CPUs.
Storage types covered here: hard (magnetic) disks, optical disks, and tapes.
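A small numeric sketch of the Amdahl's-law argument. The function is the standard speedup formula; the 90% CPU / 10% I/O split is a hypothetical workload chosen only for illustration.

```python
def amdahl_speedup(enhanced_fraction: float, enhancement: float) -> float:
    """Overall speedup when only `enhanced_fraction` of the run time is made `enhancement` times faster."""
    return 1.0 / ((1.0 - enhanced_fraction) + enhanced_fraction / enhancement)


# Hypothetical workload: 90% of the time is CPU work, 10% waits on disk I/O.
# The CPU gets 10x faster; the disks do not.
speedup = amdahl_speedup(enhanced_fraction=0.9, enhancement=10.0)
print(f"overall speedup: {speedup:.2f}x")  # overall speedup: 5.26x

# Even an arbitrarily fast CPU cannot beat 1 / 0.1 = 10x, because the I/O time remains.
print(f"upper bound: {amdahl_speedup(0.9, 1e12):.2f}x")  # upper bound: 10.00x
```

The unchanged I/O share, not the CPU, sets the ceiling on overall speedup.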

I/O System
(Figure: the processor and its cache connect over the memory/I/O bus to main memory and to several I/O controllers, which serve two disks, a graphics device, and a network; devices signal the processor through interrupts.)

Magnetic Disks
(Figure: platter, arm, head, actuator, sectors, and inner and outer tracks.)
Information is recorded on several platters, usually on both surfaces. Bits are recorded sequentially along tracks, and tracks are divided into sectors. Each head is mounted on an arm, which an actuator moves to position the head over the required track. Heads can be fixed (one per track) or movable. A cylinder is the set of all tracks under the heads at a given arm position. Areal density is the number of bits per square inch (tracks per inch across the surface times bits per inch along a track).

Magnetic Disks
Typical values:
Rotation speed: 3,600 to 15,000 RPM.
Number of platters: 1 to 12.
Diameter: 1 to 3.5 inches.
Tracks per surface: 5,000 to 30,000.
Sector size: typically 512 bytes.
Disk latency (access time) = seek time + rotational latency + transfer time + controller overhead.
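The access-time sum can be written as a small function. This is a sketch: it assumes the average rotational latency is half a revolution and that 1 MB = 1,000 KB, and the parameter values in the call are hypothetical, picked from the typical ranges above.

```python
def disk_access_time_ms(seek_ms: float, rpm: float, block_kb: float,
                        transfer_mb_per_s: float, controller_ms: float) -> float:
    """Average access time = seek + rotational latency + transfer + controller overhead (ms)."""
    rotational_ms = 0.5 * 60_000.0 / rpm          # half a revolution; 60,000 ms per minute
    transfer_ms = block_kb / transfer_mb_per_s     # KB / (MB/s) gives ms when 1 MB = 1,000 KB
    return seek_ms + rotational_ms + transfer_ms + controller_ms


# Hypothetical disk: 8 ms seek, 10,000 RPM, one 512-byte sector, 40 MB/s, 0.1 ms controller overhead.
t = disk_access_time_ms(seek_ms=8.0, rpm=10_000, block_kb=0.5,
                        transfer_mb_per_s=40.0, controller_ms=0.1)
print(f"{t:.2f} ms")  # 11.11 ms
```

For a single small sector, seek and rotation dominate and the transfer itself is almost negligible.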

Disk Performance
Rotational latency: on average the desired sector is half a revolution away, so at 10,000 RPM the average rotational latency is 0.5 × 60 s / 10,000 = 3.0 ms.
Seek time: the time to move the arm to the required track, averaged over all possible seeks; typically about 8 ms. It can partly overlap with rotational latency.
Transfer rate: 3 to 60 MB per second.
Read-ahead: nearby sectors are cached in the disk's on-board cache (0.1 to 4 MB).
Outer tracks are longer than inner ones, so they can hold more sectors if the bit density is kept (roughly) constant. Zone-bit recording approximates this: tracks are grouped into zones, and all tracks in a zone have the same number of sectors.
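To make the zone idea concrete, here is a sketch under simplifying assumptions: the geometry and the maximum linear density are hypothetical, gaps, servo fields, and ECC bytes are ignored, and each zone's sector count is set by its innermost (shortest) track so that no track in the zone exceeds the density limit.

```python
import math

SECTOR_BITS = 512 * 8  # 512-byte sectors, ignoring gaps and ECC overhead


def sectors_per_track_by_zone(inner_radius_in: float, outer_radius_in: float,
                              tracks: int, zones: int, max_bits_per_inch: float) -> list[int]:
    """Zone-bit recording sketch: every track in a zone gets the sector count of the zone's innermost track."""
    tracks_per_zone = tracks // zones
    track_pitch_in = (outer_radius_in - inner_radius_in) / tracks
    counts = []
    for zone in range(zones):
        innermost_radius = inner_radius_in + zone * tracks_per_zone * track_pitch_in
        track_bits = 2 * math.pi * innermost_radius * max_bits_per_inch
        counts.append(int(track_bits // SECTOR_BITS))
    return counts


# Hypothetical geometry, chosen only for illustration.
zone_counts = sectors_per_track_by_zone(0.6, 1.75, tracks=20_000, zones=8, max_bits_per_inch=400_000)
print(zone_counts)  # outer zones hold more sectors per track than inner zones
```

With the same sector count on every track, each track would be limited to `zone_counts[0]` sectors; the outer zones recover capacity that a fixed count would waste.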

Optical Disks
Compact discs: CD-ROM and DVD-ROM (Digital Versatile Disc), plus writable CD-RW and WORM (Write Once, Read Many) media.
Magneto-optical disks use an optical laser to enhance the capabilities of a magnetic disk system. Reading is optical: the direction of magnetization is detected with polarized laser light. The disc is coated with a material whose polarity can be altered only at high temperature; to write, the laser heats tiny spots and a magnetic field is then applied.

Magnetic Tapes
Same recording idea as magnetic disks, but access is sequential.
Helical-scan tapes: information is recorded by a head that spins much faster than the tape moves, writing tracks diagonally across the tape (one limit on tape drives is how fast the tape itself can be moved without jamming).
Tapes wear out.
Automated tape library: a robot automates loading and changing the tapes.

RAID
Redundant Arrays of Inexpensive Disks, used to improve both system performance and reliability. The idea is to distribute the data across more than one physical disk.

RAID Level 0
A misnomer: there is no redundancy.
Strips are distributed across many disks; a strip is a block or a sector.
Access is fast because many strips can be read at the same time.
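A sketch of how RAID 0 might map logical strips to disks, assuming the common round-robin layout (strip i goes to disk i mod N); the function name is hypothetical.

```python
def locate_strip(logical_strip: int, num_disks: int) -> tuple[int, int]:
    """Return (disk index, strip offset on that disk) for a round-robin RAID 0 layout."""
    return logical_strip % num_disks, logical_strip // num_disks


# With 4 disks, strips 0-3 form the first stripe and land on different disks,
# so a request covering all four can be served by the disks in parallel.
for strip in range(8):
    disk, offset = locate_strip(strip, num_disks=4)
    print(f"strip {strip} -> disk {disk}, offset {offset}")
```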

RAID Level 1
Mirroring: each disk is fully duplicated onto its mirror.
Expensive: 100% capacity overhead.
Every write must be done twice (but the two writes proceed in parallel).

RAID Level 2
RAID level 2 performs striping with a strip size of 1 bit or 1 byte.
Extra disks are needed to store error-correcting codes (Hamming codes).
No commercial product was released for level 2 RAID.
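RAID 2 used Hamming codes for its check disks. As an illustration, here is a Hamming(7,4) sketch, assuming each of the 7 bits of a codeword sits on a different disk (4 data disks, 3 check disks), so a single bad bit, that is, one faulty disk, can be located and corrected. The function names are hypothetical.

```python
def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit codeword; index i holds bit position i + 1."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # checks positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (corrected codeword, error position 1-7, or 0 if no error was detected)."""
    bits = list(codeword)
    syndrome = 0
    for position, bit in enumerate(bits, start=1):
        if bit:
            syndrome ^= position   # XOR of the positions of all 1 bits
    if syndrome:
        bits[syndrome - 1] ^= 1    # flip the single bad bit back
    return bits, syndrome


def hamming74_data(codeword):
    """Extract the 4 data bits (positions 3, 5, 6, 7)."""
    return [codeword[2], codeword[4], codeword[5], codeword[6]]


# One bit per "disk": corrupt the bit stored on disk 5 and recover it.
stored = hamming74_encode([1, 0, 1, 1])
damaged = list(stored)
damaged[4] ^= 1
repaired, position = hamming74_correct(damaged)
print(position, hamming74_data(repaired))   # 5 [1, 0, 1, 1]
```

Three check disks for four data disks is a heavy overhead, which is one reason parity-based levels displaced this scheme.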

RAID Level 3
Data is striped in small units, with one extra parity disk.
Spindles are synchronized, so the heads are over the same sector on every disk.
Works well for accessing large sequential data.
Only one I/O request can be served at a time, which is not very good for transaction-based environments.
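The parity disk stores the bitwise XOR of the data strips, so any single lost strip equals the XOR of everything that survives. A minimal sketch with hypothetical 2-byte strips on three data disks:

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (the parity computation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data_disks = [b"\x11\x22", b"\x33\x44", b"\x55\x66"]   # hypothetical strip contents
parity = xor_blocks(data_disks)                          # written to the parity disk

# Disk 1 fails: rebuild its strip from the surviving data disks and the parity disk.
rebuilt = xor_blocks([data_disks[0], data_disks[2], parity])
print(parity.hex(), rebuilt == data_disks[1])   # 7700 True
```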

RAID Level 4
Similar to level 3, but blocks (rather than bits or bytes) are distributed across the disks.
One disk is used for parity, which is a potential bottleneck.
Small writes carry a penalty. For a large (full-stripe) write, the parity is calculated from the newly written data alone; otherwise, it is updated from the difference between the old and new data.
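The small-write penalty can be quantified by counting disk I/Os for one write that falls inside a single stripe. This sketch assumes the read-modify-write strategy described above for partial-stripe writes (a controller could instead read the untouched strips when that is cheaper); the function name is hypothetical.

```python
def write_ios(strips_written: int, data_disks: int) -> dict:
    """Disk reads and writes for one write within a single RAID 4 stripe."""
    if not 1 <= strips_written <= data_disks:
        raise ValueError("a stripe write covers between 1 and data_disks strips")
    if strips_written == data_disks:
        # Large (full-stripe) write: parity comes from the new data alone, no reads.
        return {"reads": 0, "writes": data_disks + 1}
    # Small write: read old data and old parity, then write new data and new parity.
    return {"reads": strips_written + 1, "writes": strips_written + 1}


# Hypothetical array with 4 data disks plus 1 parity disk.
print(write_ios(1, 4))   # {'reads': 2, 'writes': 2}
print(write_ios(4, 4))   # {'reads': 0, 'writes': 5}
```

In RAID 4 every one of those small writes also touches the single parity disk, which is why that disk becomes the bottleneck.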

RAID Level 5
Parity is distributed over all disks, avoiding the single parity-disk bottleneck.
To write a new version $S_0'$ of strip 0: read the old $S_0$ and the old parity $P_0$, compute $P_0' = S_0 \oplus S_0' \oplus P_0$, write $P_0'$, and write $S_0'$.
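The read-modify-write works because XOR cancels the old contribution of $S_0$. A sketch with hypothetical strip contents that checks the shortcut against recomputing parity over the whole stripe:

```python
def small_write_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """New parity P0' = S0 XOR S0' XOR P0, computed byte by byte."""
    return bytes(s ^ s_new ^ p for s, s_new, p in zip(old_data, new_data, old_parity))


def full_parity(strips):
    """Parity of a whole stripe, used here only to check the shortcut."""
    parity = bytes(len(strips[0]))
    for strip in strips:
        parity = bytes(a ^ b for a, b in zip(parity, strip))
    return parity


stripe = [b"\x0f\xf0", b"\xaa\x55", b"\x12\x34"]   # hypothetical strips S0, S1, S2
p0 = full_parity(stripe)

new_s0 = b"\xff\x00"
new_p0 = small_write_parity(stripe[0], new_s0, p0)   # needs only S0 and P0: 2 reads, 2 writes
stripe[0] = new_s0
print(new_p0 == full_parity(stripe))   # True
```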

RAID Level 6
Uses two extra disks' worth of check information, distributed over all disks.
Two different check codes are used: one is the XOR parity, the other is a second, independent code.
Data is lost only if a third disk fails before the first two failed disks have been repaired.
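As a rough comparison of the redundancy cost across levels, the sketch below computes the fraction of raw capacity left for data. It assumes a single group spanning all disks (and an even disk count for mirroring) and leaves out level 2, whose number of check disks depends on the Hamming code size.

```python
def usable_fraction(level: int, disks: int) -> float:
    """Fraction of raw capacity left for user data in an array of `disks` equal disks."""
    if level == 0:
        data_disks = disks          # striping only, no redundancy
    elif level == 1:
        data_disks = disks / 2      # every disk has a mirror
    elif level in (3, 4, 5):
        data_disks = disks - 1      # one disk's worth of parity
    elif level == 6:
        data_disks = disks - 2      # two independent check codes
    else:
        raise ValueError(f"level {level} not modeled in this sketch")
    return data_disks / disks


for level in (0, 1, 3, 4, 5, 6):
    print(f"RAID {level}: {usable_fraction(level, 10):.0%} usable with 10 disks")
```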

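Reliability
A fault creates one or more errors. Errors can be latent; a latent error becomes effective once it is activated. If the error affects the delivered service, the component has failed.
Module reliability is a measure of continuous service accomplishment, measured by the mean time to failure (MTTF).
$\text{Availability} = \dfrac{\text{MTTF}}{\text{MTTF} + \text{MTTR}}$, where MTTR is the mean time to repair.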
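Plugging numbers into the availability formula. The MTTF and MTTR values are hypothetical, and the downtime line assumes a 365-day year.

```python
HOURS_PER_YEAR = 24 * 365


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the module is delivering service: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical module: MTTF of 1,000,000 hours, 24 hours to repair.
a = availability(1_000_000, 24)
downtime_hours = (1 - a) * HOURS_PER_YEAR
print(f"availability {a:.6f}, about {downtime_hours:.2f} hours down per year")
```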

Reliability
If a collection of modules has exponentially distributed lifetimes (and failures are independent), the overall failure rate is the sum of the failure rates of the individual components.
The failure rate is the reciprocal of the MTTF.

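Reliability
Example: assume the system has
10 disks, each with an MTTF of 1,000,000 hours;
1 SCSI controller, MTTF 500,000 hours;
1 power supply, MTTF 200,000 hours;
1 fan, MTTF 200,000 hours;
1 SCSI cable, MTTF 1,000,000 hours.
$$\text{Failure rate}_{\text{system}} = 10 \times \frac{1}{1{,}000{,}000} + \frac{1}{500{,}000} + \frac{1}{200{,}000} + \frac{1}{200{,}000} + \frac{1}{1{,}000{,}000} = \frac{10 + 2 + 5 + 5 + 1}{1{,}000{,}000} = \frac{23}{1{,}000{,}000}\ \text{per hour}$$
$$\text{MTTF}_{\text{system}} = \frac{1}{\text{Failure rate}_{\text{system}}} = \frac{1{,}000{,}000}{23} \approx 43{,}500\ \text{hours} \approx 5\ \text{years}$$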
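The same calculation as a short sketch. It sums per-component failure rates, which is valid under the exponential-lifetime and independent-failure assumptions stated above; the part list is this slide's example, and the helper name is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365

components = [  # (count, MTTF in hours), from the example above
    (10, 1_000_000),  # disks
    (1, 500_000),     # SCSI controller
    (1, 200_000),     # power supply
    (1, 200_000),     # fan
    (1, 1_000_000),   # SCSI cable
]


def system_mttf(parts):
    """MTTF of a system that fails when any one part fails."""
    failure_rate = sum(count / mttf for count, mttf in parts)  # failures per hour
    return 1.0 / failure_rate


mttf = system_mttf(components)
print(round(mttf), round(mttf / HOURS_PER_YEAR, 1))  # 43478 5.0
```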

Example
CPU: 2,500 MIPS, costing $20,000.
Memory: 16-byte-wide interleaved memory with a 10 ns cycle time.
I/O bus: 1,000 MB/sec, with room for 20 Ultra3 SCSI buses and controllers.
SCSI bus: Ultra3 SCSI at 160 MB/sec, supporting up to 15 disks per bus (such a bus is called a SCSI string).
SCSI controller: $500 each, adding 0.3 ms of delay per I/O.
Operating system: 50,000 instructions per I/O.
Disks: large 80 GB or small 40 GB, at $10 per GB; 15,000 RPM, 5 ms average seek, 40 MB/sec transfer rate.
Enclosure: $1,500, providing power and cooling for 8 large or 12 small disks.
Requirement: 1,920 GB of storage, with 32 KB per I/O.

Example
(Figure: CPU and memory attached to the 1,000 MB/sec I/O bus, which connects to several SCSI controllers, each driving a 160 MB/sec SCSI bus.)

Example
Find the cost per IOPS (I/Os per second) for both the small-disk and the large-disk configurations, assuming 100% utilization.
The I/O path is a chain of components: CPU, OS, memory, the various buses, controllers, and disks. The performance of the system is limited by the weakest link in the chain.

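Example
CPU:
$$\text{Maximum IOPS}_{\text{CPU}} = \frac{2{,}500\ \text{MIPS}}{50{,}000\ \text{instructions per I/O}} = 50{,}000\ \text{IOPS}$$
Memory:
$$\text{Maximum IOPS}_{\text{memory}} = \frac{16\ \text{bytes} / 10\ \text{ns}}{32\ \text{KB per I/O}} = \frac{1{,}600\ \text{MB/sec}}{32\ \text{KB per I/O}} = 50{,}000\ \text{IOPS}$$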
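The CPU and memory limits as a few lines of integer arithmetic. As in the slide's numbers, this sketch treats 1 KB as 1,000 bytes; the constant names are ours.

```python
CPU_MIPS = 2_500
INSTRUCTIONS_PER_IO = 50_000
MEMORY_BYTES_PER_ACCESS = 16
MEMORY_CYCLE_NS = 10
IO_SIZE_BYTES = 32_000          # 32 KB per I/O, with 1 KB = 1,000 bytes

# CPU: instructions per second divided by the OS cost of one I/O.
cpu_iops = CPU_MIPS * 1_000_000 // INSTRUCTIONS_PER_IO

# Memory: 16 bytes every 10 ns is 1,600 MB/s; divide by the bytes moved per I/O.
memory_bytes_per_s = MEMORY_BYTES_PER_ACCESS * 1_000_000_000 // MEMORY_CYCLE_NS
memory_iops = memory_bytes_per_s // IO_SIZE_BYTES

print(cpu_iops, memory_iops)   # 50000 50000
```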

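Example
I/O bus:
$$\text{Maximum IOPS}_{\text{I/O bus}} = \frac{1{,}000\ \text{MB/sec}}{32\ \text{KB per I/O}} = 31{,}250\ \text{IOPS}$$
SCSI controllers (this limit is per SCSI string; there may be more than one):
$$\text{Time to transfer a block} = \frac{32\ \text{KB}}{160\ \text{MB/sec}} = 0.2\ \text{ms}$$
$$\text{Total time per block} = 0.2\ \text{ms} + 0.3\ \text{ms} = 0.5\ \text{ms}, \qquad \text{Maximum IOPS}_{\text{SCSI}} = \frac{1}{0.5\ \text{ms}} = 2{,}000\ \text{IOPS}$$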
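The I/O bus and per-string SCSI limits in code, again with 1 KB = 1,000 bytes; the constant names are ours.

```python
IO_SIZE_BYTES = 32_000               # 32 KB per I/O, 1 KB = 1,000 bytes
IO_BUS_BYTES_PER_S = 1_000_000_000   # 1,000 MB/s I/O bus
SCSI_BYTES_PER_S = 160_000_000       # 160 MB/s Ultra3 SCSI string
CONTROLLER_OVERHEAD_MS = 0.3

io_bus_iops = IO_BUS_BYTES_PER_S / IO_SIZE_BYTES

# Per SCSI string: transfer time on the bus plus the controller's fixed overhead.
scsi_transfer_ms = IO_SIZE_BYTES / SCSI_BYTES_PER_S * 1_000
scsi_iops_per_string = 1_000 / (scsi_transfer_ms + CONTROLLER_OVERHEAD_MS)

print(f"I/O bus: {io_bus_iops:,.0f} IOPS, SCSI: {scsi_iops_per_string:,.0f} IOPS per string")
```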

Example
Now the disks themselves:
$$\text{I/O time} = 5\ \text{ms} + \frac{0.5}{15{,}000\ \text{RPM}} + \frac{32\ \text{KB}}{40\ \text{MB/sec}} = 5 + 2 + 0.8 = 7.8\ \text{ms}$$
The maximum per disk is therefore 1,000 / 7.8 ≈ 128 IOPS, again assuming 100% utilization.
How many disks do we need? The total capacity is 1,920 GB, so we need 24 large (80 GB) disks or 48 small (40 GB) disks. We must also make sure there are enough SCSI strings.
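The disk-level numbers in code. Following the slide, the per-I/O time does not include the 0.3 ms controller overhead, and 1 MB = 1,000 KB; the constant names are ours.

```python
import math

SEEK_MS = 5.0
RPM = 15_000
IO_SIZE_KB = 32
DISK_MB_PER_S = 40
CAPACITY_GB = 1_920

rotation_ms = 0.5 * 60_000 / RPM                   # half a revolution: 2.0 ms
transfer_ms = IO_SIZE_KB / DISK_MB_PER_S           # KB / (MB/s) = ms: 0.8 ms
io_time_ms = SEEK_MS + rotation_ms + transfer_ms   # 7.8 ms
disk_iops = math.floor(1_000 / io_time_ms)         # whole I/Os per second at 100% utilization

# Disks needed to reach the required capacity, for each disk size in GB.
disks_needed = {size_gb: math.ceil(CAPACITY_GB / size_gb) for size_gb in (80, 40)}
print(disk_iops, disks_needed)   # 128 {80: 24, 40: 48}
```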

Example
With 24 disks, the maximum is 24 × 128 = 3,072 IOPS; with 48 disks, it is 48 × 128 = 6,144 IOPS.
Choice 1 (24 large disks) needs $\lceil 24/15 \rceil = 2$ SCSI strings; choice 2 (48 small disks) needs $\lceil 48/15 \rceil = 4$ SCSI strings. At 2,000 IOPS per string, both are OK. However, the 24 large disks fill three enclosures (8 disks each), and three enclosures on two SCSI controllers may not be the best arrangement, so we increase the count to three SCSI controllers.
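The weakest-link rule from the problem statement, applied to both configurations. The per-component limits are the values derived on the preceding slides; the helper is a hypothetical sketch.

```python
import math

DISK_IOPS = 128                 # per disk, at 100% utilization
SCSI_IOPS_PER_STRING = 2_000
DISKS_PER_STRING = 15
SHARED_LIMITS = {"CPU": 50_000, "memory": 50_000, "I/O bus": 31_250}


def system_iops(num_disks: int):
    """Return (SCSI strings needed, bottleneck component, its IOPS limit)."""
    strings = math.ceil(num_disks / DISKS_PER_STRING)
    limits = dict(SHARED_LIMITS)
    limits["SCSI strings"] = strings * SCSI_IOPS_PER_STRING
    limits["disks"] = num_disks * DISK_IOPS
    bottleneck = min(limits, key=limits.get)   # weakest link in the chain
    return strings, bottleneck, limits[bottleneck]


for disks in (24, 48):
    print(disks, system_iops(disks))   # 24 -> (2, 'disks', 3072); 48 -> (4, 'disks', 6144)
```

Moving the large-disk system to three controllers only raises its SCSI limit to 6,000 IOPS, so the disks remain the bottleneck.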

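Example
The limit is 3,072 IOPS for large disks and 6,144 IOPS for small disks. Now for the cost:
$$\text{Cost}_{\text{large}} = \$20{,}000 + 3 \times \$500 + 24 \times (80 \times \$10) + 3 \times \$1{,}500 = \$45{,}200$$
$$\text{Cost}_{\text{small}} = \$20{,}000 + 4 \times \$500 + 48 \times (40 \times \$10) + 4 \times \$1{,}500 = \$47{,}200$$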
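The slide stops at the totals; dividing each total by its IOPS limit gives the cost per IOPS asked for at the start of the example. A sketch using the slide's prices, with hypothetical constant and function names.

```python
CPU_COST = 20_000
CONTROLLER_COST = 500
ENCLOSURE_COST = 1_500
DOLLARS_PER_GB = 10


def system_cost(disks: int, disk_gb: int, controllers: int, enclosures: int) -> int:
    """Total price: CPU + SCSI controllers + disks + enclosures."""
    return (CPU_COST + controllers * CONTROLLER_COST
            + disks * disk_gb * DOLLARS_PER_GB + enclosures * ENCLOSURE_COST)


large = system_cost(disks=24, disk_gb=80, controllers=3, enclosures=3)
small = system_cost(disks=48, disk_gb=40, controllers=4, enclosures=4)
print(f"large: ${large:,} -> ${large / 3072:.2f} per IOPS")   # large: $45,200 -> $14.71 per IOPS
print(f"small: ${small:,} -> ${small / 6144:.2f} per IOPS")   # small: $47,200 -> $7.68 per IOPS
```

Under these assumptions the small-disk system costs about half as much per IOPS, even though its total price is higher.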

Example
Calculate the reliability, assuming:
Component          MTTF (hours)
CPU/memory         1,000,000
Disk               1,000,000
SCSI controller      500,000
Power supply         200,000
SCSI cable         1,000,000
Enclosure          1,000,000
Fan                  200,000

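Example
Consider the failure rates. Terms, in order: CPU/memory, disks, SCSI controllers, enclosures, power supplies, fans, cables.
Big disks (24 disks; 3 controllers, enclosures, power supplies, fans, and cables):
$$\text{Failure rate}_{\text{big}} = \frac{1}{1{,}000{,}000} + \frac{24}{1{,}000{,}000} + \frac{3}{500{,}000} + \frac{3}{1{,}000{,}000} + \frac{3}{200{,}000} + \frac{3}{200{,}000} + \frac{3}{1{,}000{,}000} = \frac{67}{1{,}000{,}000}, \qquad \text{MTTF}_{\text{big}} = \frac{1{,}000{,}000}{67} \approx 14{,}925\ \text{hours}$$
Small disks (48 disks; 4 controllers, enclosures, power supplies, fans, and cables):
$$\text{Failure rate}_{\text{small}} = \frac{1}{1{,}000{,}000} + \frac{48}{1{,}000{,}000} + \frac{4}{500{,}000} + \frac{4}{1{,}000{,}000} + \frac{4}{200{,}000} + \frac{4}{200{,}000} + \frac{4}{1{,}000{,}000} = \frac{105}{1{,}000{,}000}, \qquad \text{MTTF}_{\text{small}} = \frac{1{,}000{,}000}{105} \approx 9{,}524\ \text{hours}$$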

Example -- Availability
The configuration from the previous example changes once the utilization limits of the individual components are taken into account. That analysis is not covered here; the final configurations used in the availability example are:
80 GB disks: 4 strings, 4 enclosures, 24 disks.
40 GB disks: 8 strings, 4 enclosures, 48 disks.

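Example -- Availability
What about availability? If a system has N components, each with the same MTTF, then $\text{MTTF}_N = \text{MTTF}/N$.
For RAID, data is lost if a second disk in the same parity group fails before the first failed disk is repaired. The probability of that is approximately $\text{MTTR} / (\text{MTTF}_{\text{disk}} / (G - 1))$, so the mean time to data loss is
$$\text{MTDL} = \frac{\text{MTTF}_{\text{disk}} / N}{\text{MTTR} / \left(\text{MTTF}_{\text{disk}} / (G - 1)\right)} = \frac{\text{MTTF}_{\text{disk}}^2}{N \times (G - 1) \times \text{MTTR}}$$
where G is the number of disks in a group protected by one parity and N is the number of disks in the system.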
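The MTDL formula as a function, assuming independent, exponentially distributed failures and one parity per group. The array size, group size, and times in the call are hypothetical.

```python
def mtdl_hours(mttf_disk: float, mttr: float, n_disks: int, group_size: int) -> float:
    """Mean time to data loss for single-parity groups."""
    mttf_first = mttf_disk / n_disks                    # mean time until some disk fails
    p_second = mttr / (mttf_disk / (group_size - 1))    # another disk in that group fails during repair
    return mttf_first / p_second                        # = MTTF^2 / (N (G - 1) MTTR)


# Hypothetical array: 100 disks in groups of 10, MTTF 1,000,000 hours, 24-hour repair.
mtdl = mtdl_hours(1_000_000, 24, n_disks=100, group_size=10)
print(f"{mtdl:,.0f} hours")
```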

Orthogonal RAID
(Figure: an array controller connected to several string controllers, each driving its own string of disks.)
If each parity group is spread across the strings (one disk of the group per string) and a string controller fails, each group loses only one disk, so the system remains available.
If instead all disks of a group are on one string and that string controller fails, all disks in the group fail and data is lost.

Example -- Availability
So far: large disks, 4 strings, 24 disks, 4 enclosures; small disks, 8 strings, 48 disks, 4 enclosures.

Example -- Availability
(Figure: large-disk configuration with 4 SCSI strings of 6 disks each on the I/O bus; small-disk configuration with 8 SCSI strings of 6 disks each.)

Example -- Availability
Large disks: 4 enclosures, each with one SCSI controller and 6 disks. Add 1 enclosure with a controller and 6 disks.
Small disks: 4 enclosures, each with 2 controllers and 12 disks. Add 1 enclosure with 2 controllers and 12 disks.
The added enclosure supplies the redundancy, so each parity group spans 5 enclosures.

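Example -- Availability
Now calculate the MTTF per enclosure. Terms, in order: disks, SCSI controllers, power supply, fan, cables, enclosure.
$$\text{Enclosure failure rate}_{\text{big}} = \frac{6}{1{,}000{,}000} + \frac{1}{500{,}000} + \frac{1}{200{,}000} + \frac{1}{200{,}000} + \frac{1}{1{,}000{,}000} + \frac{1}{1{,}000{,}000} = \frac{20}{1{,}000{,}000}$$
$$\text{Enclosure failure rate}_{\text{small}} = \frac{12}{1{,}000{,}000} + \frac{2}{500{,}000} + \frac{1}{200{,}000} + \frac{1}{200{,}000} + \frac{2}{1{,}000{,}000} + \frac{1}{1{,}000{,}000} = \frac{29}{1{,}000{,}000}$$
$$\text{MTTF}_{\text{big}} = 50{,}000\ \text{hours}, \qquad \text{MTTF}_{\text{small}} \approx 34{,}500\ \text{hours}$$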

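Example -- Availability
Now consider the MTDL, noting that G = N = 5 (enclosures) and MTTR = 24 hours:
$$\text{MTDL}_{\text{big}} = \frac{50{,}000^2}{5 \times 4 \times 24} \approx 5{,}200{,}000\ \text{hours}, \qquad \text{MTDL}_{\text{small}} = \frac{34{,}500^2}{5 \times 4 \times 24} \approx 2{,}500{,}000\ \text{hours}$$
$$\text{Cost}_{\text{big}} = \$20{,}000 + 5 \times \$500 + 30 \times (80 \times \$10) + 5 \times \$1{,}500 = \$54{,}000$$
$$\text{Cost}_{\text{small}} = \$20{,}000 + 10 \times \$500 + 60 \times (40 \times \$10) + 5 \times \$1{,}500 = \$56{,}500$$
The big-disk system costs about $10 per 1,000 hours of MTDL; the small-disk system costs about $23 per 1,000 hours of MTDL.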
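The final comparison, recomputed without intermediate rounding. Enclosure MTTFs and costs are taken from the slides; treating each enclosure as one unit of a 5-wide parity group follows the slide's G = N = 5, and the names are hypothetical. The results land close to the slide's $10 and $23 per 1,000 hours.

```python
MTTR_HOURS = 24
N = G = 5   # five enclosures forming one parity group


def mtdl_hours(mttf: float, mttr: float, n: int, g: int) -> float:
    """MTDL = MTTF^2 / (N (G - 1) MTTR)."""
    return mttf ** 2 / (n * (g - 1) * mttr)


configs = {   # (enclosure MTTF in hours, total cost in dollars), from the slides
    "big (80 GB)": (50_000, 54_000),
    "small (40 GB)": (34_500, 56_500),
}
for name, (enclosure_mttf, cost) in configs.items():
    mtdl = mtdl_hours(enclosure_mttf, MTTR_HOURS, N, G)
    print(f"{name}: MTDL {mtdl:,.0f} h, ${cost / (mtdl / 1_000):.1f} per 1,000 h of MTDL")
```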