CS 61C: Great Ideas in Computer Architecture (Machine Structures)
Speedup and RAID
Instructors: Randy H. Katz, David A. Patterson
http://inst.eecs.berkeley.edu/~cs61c/fa10
Fall 2010, Lecture #39

Project #3, Part 1: sgemm
[Figure: sgemm performance, multiple levels of blocking plus loop unrolling: class data vs. peak, simple, and blocked. Thanks, TA Andrew!]
[Figure: sgemm performance histogram (no SSE): number of submissions vs. fraction of peak]

SSE
[Figure: sgemm SSE (1 thread), A'A: simple, blocked, and CS61C submissions vs. GotoBLAS, as fraction of single-thread Goto]
[Figure: histogram of performance (SSE): number of submissions vs. % of Goto]

OMP
[Figure: sgemm OMP (8 threads), A'A: naive and CS61C submissions vs. GotoBLAS]
[Figure: histogram of performance (OMP, 8 threads): number of submissions vs. % of Goto]

EC and Part comparisons
[Figure: EC and Part submissions, A'A, serial and 8 threads]
[Figure: performance change, serial and 8 threads: histograms of speedup (EC/Part)]
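The blocked results above come from the classic cache-blocking plus loop-unrolling optimizations. A minimal sketch of the idea, assuming column-major storage, a hypothetical block size of 32, and a 4-way unrolled inner loop (the project's tuned parameters and SSE intrinsics are not shown):

```c
#include <stddef.h>

#define BLOCK 32 /* hypothetical block size; the project tunes this per cache */

/* C += A * B for n x n column-major single-precision matrices, with one
   level of cache blocking and a 4-way unrolled inner loop. For brevity,
   n is assumed to be a multiple of BLOCK, and BLOCK a multiple of 4. */
void sgemm_blocked(size_t n, const float *A, const float *B, float *C) {
    for (size_t jb = 0; jb < n; jb += BLOCK)
        for (size_t kb = 0; kb < n; kb += BLOCK)
            for (size_t ib = 0; ib < n; ib += BLOCK)
                /* work on a BLOCK x BLOCK tile that stays in cache */
                for (size_t j = jb; j < jb + BLOCK; j++)
                    for (size_t k = kb; k < kb + BLOCK; k++) {
                        float b = B[k + j * n];
                        /* 4-way loop unrolling over i */
                        for (size_t i = ib; i < ib + BLOCK; i += 4) {
                            C[i + 0 + j * n] += A[i + 0 + k * n] * b;
                            C[i + 1 + j * n] += A[i + 1 + k * n] * b;
                            C[i + 2 + j * n] += A[i + 2 + k * n] * b;
                            C[i + 3 + j * n] += A[i + 3 + k * n] * b;
                        }
                    }
}
```

Blocking keeps a tile of A and C resident in cache while it is reused, which is what separates the "Blocked" bars from "Simple" in the plots; unrolling reduces loop overhead and exposes independent operations to the compiler.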
Administrivia
- Final Review: Mon Dec 6, PM, Evans
- Final: Mon Dec 13, 8-11 AM (Hearst Gym)
- Like the midterm: T/F, M/C, short answers
- Whole course: readings, lectures, projects, labs, HW
- Emphasis on the 2nd half of 61C, plus midterm mistakes

Cloud Computing: Datacenter Economics

Evolution of the Disk Drive
[Photos: IBM 3390K, 1986; Apple SCSI, 1986; IBM RAMAC, 1956]

Arrays of Small Disks
- Can smaller disks be used to close the gap in performance between disks and CPUs?
- Conventional: 4 disk designs, low end through high end
- Disk array: 1 disk design
Replace Small Number of Large Disks with Large Number of Small Disks! (1988 disks)

            | IBM 3390K   | IBM 3.5" 0061 | x70 (array)
Capacity    | 20 GBytes   | 320 MBytes    | 23 GBytes
Volume      | 97 cu. ft.  | 0.1 cu. ft.   | 11 cu. ft.   (9X)
Power       | 3 KW        | 11 W          | 1 KW         (3X)
Data Rate   | 15 MB/s     | 1.5 MB/s      | 120 MB/s     (8X)
I/O Rate    | 600 I/Os/s  | 55 I/Os/s     | 3900 I/Os/s  (6X)
MTTF        | 250 KHrs    | 50 KHrs       | ??? Hrs
Cost        | $250K       | $2K           | $150K

Disk arrays have the potential for large data and I/O rates, high MB per cu. ft., and high MB per KW, but what about reliability?

RAID: Redundant Arrays of (Inexpensive) Disks
- Files are "striped" across multiple disks
- Redundancy yields high data availability
  - Availability: service still provided to user, even if some components fail
- Disks will still fail
- Contents reconstructed from data redundantly stored in the array
  - Capacity penalty to store redundant info
  - Bandwidth penalty to update redundant info

Redundant Arrays of Inexpensive Disks: RAID 1, Disk Mirroring/Shadowing
- Each disk is fully duplicated onto its mirror (recovery group)
- Very high availability can be achieved
- Bandwidth sacrifice on write: logical write = two physical writes
- Reads may be optimized
- Most expensive solution: 100% capacity overhead

Redundant Array of Inexpensive Disks: RAID 3, Parity Disk
- Each logical record is striped across the data disks as physical records
- P contains the sum of the other disks per stripe, mod 2 ("parity")
- If a disk fails, subtract P from the sum of the other disks to find the missing information

Redundant Arrays of Inexpensive Disks: RAID 4, High I/O Rate Parity
- Insides of 5 disks, with parity on a dedicated disk; logical disk address increases down the disk columns, one stripe per row:

D0   D1   D2   D3   P
D4   D5   D6   D7   P
D8   D9   D10  D11  P
D12  D13  D14  D15  P
D16  D17  D18  D19  P
D20  D21  D22  D23  P

- Example: small read of D0 & D5, large write of D12-D15
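The "sum mod 2" parity above is just byte-wise XOR. A minimal sketch, where the disk count, block length, and function names are illustrative:

```c
#include <stddef.h>
#include <stdint.h>

/* Compute the parity block P as the byte-wise XOR (sum mod 2) of the
   data blocks in one stripe. */
void raid_parity(uint8_t *p, uint8_t *const data[], size_t ndisks,
                 size_t blk_len) {
    for (size_t i = 0; i < blk_len; i++) {
        uint8_t x = 0;
        for (size_t d = 0; d < ndisks; d++)
            x ^= data[d][i];
        p[i] = x;
    }
}

/* Recover a failed disk: XOR the parity with the surviving data blocks,
   i.e. "subtract P from the sum of the other disks". */
void raid_recover(uint8_t *out, const uint8_t *p, uint8_t *const survivors[],
                  size_t nsurvivors, size_t blk_len) {
    for (size_t i = 0; i < blk_len; i++) {
        uint8_t x = p[i];
        for (size_t d = 0; d < nsurvivors; d++)
            x ^= survivors[d][i];
        out[i] = x;
    }
}
```

Because XOR is its own inverse, "subtracting" P is the same operation as adding it, which is why recovery reuses the same loop shape as parity generation.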
Inspiration for RAID 5
- RAID 4 works well for small reads
- Small writes (write to one disk):
  - Option 1: read the other data disks, create the new sum, and write it to the parity disk
  - Option 2: since P has the old sum, compare old data to new data and add the difference to P
- Small writes are limited by the parity disk: writes to D0 and D5 must both also write to the P disk

D0  D1  D2  D3  P
D4  D5  D6  D7  P
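Option 2 above can be sketched as follows; the buffers stand in for one block on the data disk and one on the parity disk (names and sizes are illustrative):

```c
#include <stddef.h>
#include <stdint.h>

/* Small write, Option 2: P already holds the old sum, so instead of
   rereading every data disk, compute the difference between old and new
   data (their XOR) and "add" it into the parity. This costs two reads
   (old data, old parity) and two writes (new data, new parity), no matter
   how many disks are in the stripe. */
void parity_small_write(uint8_t *parity, uint8_t *disk_block,
                        const uint8_t *new_data, size_t blk_len) {
    for (size_t i = 0; i < blk_len; i++) {
        uint8_t diff = disk_block[i] ^ new_data[i]; /* old XOR new */
        parity[i] ^= diff;                          /* fold difference into P */
        disk_block[i] = new_data[i];                /* write the new data */
    }
}
```

This read-modify-write is exactly the RAID-5 small-write algorithm described later in the lecture; the cost stays fixed at 2 reads + 2 writes even for wide stripes.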
RAID 5: High I/O Rate Interleaved Parity
- Independent writes are possible because of interleaved parity
- Logical disk addresses increase down the disk columns:

D0   D1   D2   D3   P
D4   D5   D6   P    D7
D8   D9   P    D10  D11
D12  P    D13  D14  D15
P    D16  D17  D18  D19
D20  D21  D22  D23  P

- Example: writes to D0 and D5 use disks 0, 1, 3, 4

Problems of Disk Arrays: Small Writes
RAID-5: Small Write Algorithm
- 1 logical write = 2 physical reads + 2 physical writes
- New data D0' arrives; read old data D0 (1. Read) and old parity P (2. Read)
- XOR old data with new data, then XOR that difference into the old parity to form P'
- Write D0' (3. Write) and P' (4. Write)

Tech Report Read Round the World (December 1987)

RAID-I (1989)
- Consisted of a Sun 4/280 workstation with 128 MB of DRAM, four dual-string SCSI controllers, 28 5.25-inch SCSI disks, and specialized disk striping software

RAID II (1990-1993)
- Early Network Attached Storage (NAS) system running a Log Structured File System (LFS)
- Impact: $25 Billion/year in 2002; over $150 Billion in RAID devices sold since 1990; 200+ RAID companies (at the peak); software RAID a standard component of modern OSs
Summary
- Logical-to-physical block mapping, parity striping, read-modify-write processing
- Embedded caches, and orchestrating data staging between network interfaces, parity hardware, and file server interfaces
- Failed disk replacement, hot spares, background copies and backup
- Embedded log-structured file systems, compression on the fly
- Software complexity dominates hardware!
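The summary's "logical-to-physical block mapping" with parity striping can be sketched as a small mapping function. This assumes one hypothetical left-rotating parity placement; real RAID 5 layouts vary (left/right, symmetric/asymmetric):

```c
#include <stddef.h>

/* RAID 5 interleaved parity: rotate the parity disk across stripes so
   parity traffic is spread over all disks. Here the parity disk moves one
   position to the left on each successive stripe. */
size_t raid5_parity_disk(size_t stripe, size_t ndisks) {
    return (ndisks - 1) - (stripe % ndisks);
}

/* Map a logical block number to a (disk, stripe) pair, skipping over
   whichever disk holds that stripe's parity. */
void raid5_map(size_t lblock, size_t ndisks, size_t *disk, size_t *stripe) {
    size_t per_stripe = ndisks - 1;           /* data blocks per stripe */
    *stripe = lblock / per_stripe;
    size_t slot = lblock % per_stripe;        /* position within the stripe */
    size_t pdisk = raid5_parity_disk(*stripe, ndisks);
    *disk = (slot < pdisk) ? slot : slot + 1; /* skip the parity disk */
}
```

With 5 disks this reproduces the lecture's example: logical blocks D0 and D5 land on disks 0 and 1, their stripes' parity sits on disks 4 and 3, so the two small writes touch disks 0, 1, 3, and 4 independently.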