CS61C : Machine Structures


inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures
Lecture #26 RAID & Performance, 2005-12-05
Lecturer PSOE, new dad Dan Garcia, www.cs.berkeley.edu/~ddgarcia
CPS today! There is one handout today, at the front and back of the room!
Samsung pleads guilty! They were convicted of price-fixing DRAM from 1999-04 to 2002-06 through emails, etc., and ordered to pay $0.3 billion (the 2nd-largest fine in a criminal antitrust case). www.cnn.com/2005/tech/biztech/12/01/samsung.price.fixing.ap

Review
Protocol suites allow heterogeneous networking: another form of the principle of abstraction. Protocols enable operation in the presence of failures. Standardization is key for LANs and WANs.
Magnetic disks continue their rapid advance: 60%/yr capacity, 40%/yr bandwidth; slow improvements in seek and rotation times; MB/$ improving 100%/yr? Designs fit high-volume form factors.

State of the Art: Two Camps (2005)
Performance (enterprise apps, servers), e.g., Seagate Cheetah 15K.4: Serial-Attached SCSI, Ultra320 SCSI, or 2 Gbit Fibre Channel interface; 146 GB, 3.5-inch disk; 15,000 RPM; 4 discs, 8 heads; 13 watts (idle); 3.5 ms avg. seek; 200 MB/s transfer rate; 1.4 million hrs MTBF; 5-year warranty; $1000 = $6.8/GB.
Capacity (mainstream, home use), e.g., Seagate Barracuda 7200.9: Serial ATA 3 Gb/s or Ultra ATA/100 interface; 500 GB, 3.5-inch disk; 7,200 RPM; ? discs, ? heads; 7 watts (idle); 8.5 ms avg. seek; 300 MB/s transfer rate; ? million hrs MTBF; 5-year warranty; $330 = $0.66/GB.
Source: www.seagate.com

1-inch disk drive!
2005 Hitachi Microdrive: 40 x 30 x 5 mm, 13 g; 8 GB, 3600 RPM, 1 disk, 10 MB/s, 12 ms seek; 400 G operational shock, 2000 G non-operational; can detect a fall within 4 inches and retract its heads to safety. Used in iPods, cameras, phones.
2006 Microdrive? 16 GB, 12 MB/s, assuming past trends continue. www.hitachigst.com

Where does Flash memory come in?
Microdrives and Flash memory (e.g., CompactFlash) are going head-to-head. Both are non-volatile (no power, data still OK).
Flash benefits: durable and lower power (no moving parts).
Flash limitations: a finite number of write cycles (wear on the insulating oxide layer around the charge-storage mechanism); OEMs work around this by spreading writes out.
How does Flash memory work? An NMOS transistor with an additional conductor between the gate and source/drain traps electrons; the presence or absence of trapped charge is read as a 1 or 0. wikipedia.org/wiki/flash_memory

What does Apple put in its iPods? (Thanks to Andy Dahl for the tip)
shuffle: Toshiba 0.5, 1 GB flash
nano: Samsung 2, 4 GB flash
mini: Hitachi 1-inch 4, 6 GB Microdrive
iPod: Toshiba 1.8-inch 30, 60 GB (MK1504GAL)

Use Arrays of Small Disks
Katz and Patterson asked in 1987: can smaller disks be used to close the gap in performance between disks and CPUs?
Conventional approach: 4 disk designs (3.5", 5.25", 10", 14"), spanning low end to high end.
Disk array approach: 1 disk design (3.5").

Replace a Small Number of Large Disks with a Large Number of Small Disks! (1988 disks)
IBM 3390K: 20 GBytes capacity, 97 cu. ft., 3 KW, 15 MB/s data rate, 600 I/Os/s, 250 KHrs MTTF, $250K.
IBM 3.5" 0061: 320 MBytes, 0.1 cu. ft., 11 W, 1.5 MB/s, 55 I/Os/s, 50 KHrs MTTF, $2K.
x70 (array of 70 small disks): 23 GBytes, 11 cu. ft., 1 KW, 120 MB/s, 3900 I/Os/s, ??? Hrs MTTF, $150K.
Versus the single large disk, the array is roughly 9X smaller in volume, uses 3X less power, and delivers 8X the data rate and 6X the I/O rate.
Disk arrays are potentially high performance, high MB per cu. ft., and high MB per KW, but what about reliability?

Array Reliability
Reliability: whether or not a component has failed; measured as Mean Time To Failure (MTTF).
MTTF of N disks = MTTF of 1 disk / N (assuming failures are independent).
50,000 hours / 70 disks ≈ 700 hours.
Disk system MTTF drops from 6 years to 1 month! Disk arrays are too unreliable to be useful!
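
As a quick check on the arithmetic above, here is a minimal sketch in C that applies the independent-failure approximation, using the 1988-era figures from the slide as illustrative constants:

```c
#include <stdio.h>

int main(void) {
    double mttf_one_disk_hrs = 50000.0; /* single-disk MTTF from the slide */
    int    n_disks           = 70;

    /* Independent failures: array MTTF = single-disk MTTF / N */
    double mttf_array_hrs = mttf_one_disk_hrs / n_disks;

    printf("Array MTTF: about %.0f hours (roughly %.1f months)\n",
           mttf_array_hrs, mttf_array_hrs / (30.0 * 24.0));
    return 0;
}
```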

Review
Magnetic disks continue their rapid advance: 2x/yr capacity, 2x/2-yr bandwidth; slow improvements in seek and rotation; MB/$ 2x/yr! Designs fit high-volume form factors.
RAID motivation: in the 1980s there were 2 classes of drives: expensive, big drives for enterprises, and small drives for PCs. The idea: make one big drive out of many small ones! Higher performance with more disk arms per $, plus the option of a small number of extra disks (the "R"). Started at Cal by CS Profs Katz & Patterson.

Redundant Arrays of (Inexpensive) Disks
Files are striped across multiple disks.
Redundancy yields high data availability. Availability: service is still provided to the user even if some components have failed.
Disks will still fail; contents are reconstructed from data redundantly stored in the array.
Capacity penalty to store the redundant info; bandwidth penalty to update the redundant info.

Berkeley History: RAID-I
RAID-I (1989) consisted of a Sun 4/280 workstation with 128 MB of DRAM, four dual-string SCSI controllers, 28 5.25-inch SCSI disks, and specialized disk-striping software.
Today RAID is a > $32 billion industry; 80% of non-PC disks are sold in RAIDs.

RAID 0: No redundancy ("AID")
Assume we have 4 disks of data for this example, organized in blocks. Large accesses are faster since data is transferred from several disks at once.
(This and the next 5 slides are from RAID.edu, http://www.acnc.com/04_01_00.html)
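
To make the striping idea concrete, here is a small sketch (my own illustration, not from the slides) of how a simple RAID 0-style layout maps a logical block number to a disk and an offset on that disk; consecutive blocks land on consecutive disks, which is why a large access can engage all 4 disks at once:

```c
#include <stdio.h>

#define NUM_DISKS 4  /* the 4-disk example from the slide */

/* Simple striping: consecutive logical blocks go to consecutive
 * disks, wrapping around the array. */
static void locate_block(unsigned logical_block,
                         unsigned *disk, unsigned *block_on_disk) {
    *disk          = logical_block % NUM_DISKS;
    *block_on_disk = logical_block / NUM_DISKS;
}

int main(void) {
    for (unsigned b = 0; b < 8; b++) {
        unsigned d, off;
        locate_block(b, &d, &off);
        printf("logical block %u -> disk %u, block %u\n", b, d, off);
    }
    return 0;
}
```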

RAID 1: Mirror data
Each disk is fully duplicated onto its mirror, so very high availability can be achieved.
Bandwidth is reduced on writes: 1 logical write = 2 physical writes.
Most expensive solution: 100% capacity overhead.

RAID 3: Parity (RAID 2 has bit-level striping)
Parity is computed across the group to protect against hard disk failures and stored on a P (parity) disk.
Logically, the array is a single high-capacity, high-transfer-rate disk.
25% capacity cost for parity in this example vs. 100% for RAID 1 (5 disks vs. 8 disks).
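
The parity here is just a bitwise XOR across the data disks, and a lost disk can be rebuilt by XOR-ing the survivors with the parity. A minimal sketch of both steps, with made-up tiny blocks standing in for real disk contents:

```c
#include <stdio.h>
#include <string.h>

#define NUM_DATA_DISKS 4
#define BLOCK_SIZE     8   /* tiny blocks, just for illustration */

/* XOR all data blocks together to produce the parity block. */
static void compute_parity(unsigned char data[NUM_DATA_DISKS][BLOCK_SIZE],
                           unsigned char parity[BLOCK_SIZE]) {
    memset(parity, 0, BLOCK_SIZE);
    for (int d = 0; d < NUM_DATA_DISKS; d++)
        for (int i = 0; i < BLOCK_SIZE; i++)
            parity[i] ^= data[d][i];
}

int main(void) {
    unsigned char data[NUM_DATA_DISKS][BLOCK_SIZE] = {
        "disk0..", "disk1..", "disk2..", "disk3.."
    };
    unsigned char parity[BLOCK_SIZE], rebuilt[BLOCK_SIZE];

    compute_parity(data, parity);

    /* Pretend disk 2 failed: XOR the surviving data disks with the
     * parity block to reconstruct the lost contents. */
    memcpy(rebuilt, parity, BLOCK_SIZE);
    for (int d = 0; d < NUM_DATA_DISKS; d++)
        if (d != 2)
            for (int i = 0; i < BLOCK_SIZE; i++)
                rebuilt[i] ^= data[d][i];

    printf("rebuilt disk 2 block: %.8s\n", (const char *)rebuilt);
    return 0;
}
```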

RAID 4: Parity plus small-sized accesses
RAID 3 relies on the parity disk to discover errors on a read. But every sector already has an error-detection field, so RAID 4 relies on that field to catch read errors rather than on the parity disk.
This allows small, independent reads to go to different disks simultaneously.

Inspiration for RAID 5
Small writes (write to one disk):
Option 1: read the other data disks, create the new parity, and write it to the parity disk (accesses all disks).
Option 2: since P holds the old parity, compare the old data to the new data and add the difference to P: 1 logical write = 2 physical reads + 2 physical writes, to 2 disks.
The parity disk is the bottleneck for small writes: writes to A0 and B1 both write to the P disk.
(Layout with a dedicated parity disk: stripe 1 = A0 B0 C0 D0 P; stripe 2 = A1 B1 C1 D1 P.)
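
Option 2 above is the classic read-modify-write parity update: new parity = old parity XOR old data XOR new data. A sketch of the idea, with single hypothetical bytes standing in for whole blocks:

```c
#include <stdio.h>

int main(void) {
    /* One byte stands in for a whole block; values are made up. */
    unsigned char old_data   = 0x5A;
    unsigned char new_data   = 0x3C;
    unsigned char old_parity = 0xA7;  /* whatever the P disk currently holds */

    /* The 2 physical reads (old data, old parity) are done above;
     * compute the new parity, then do 2 physical writes
     * (new data to the data disk, new parity to the P disk). */
    unsigned char new_parity = old_parity ^ old_data ^ new_data;

    printf("new data = 0x%02X, new parity = 0x%02X\n", new_data, new_parity);
    return 0;
}
```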

RAID 5: Rotated parity, faster small writes
Independent writes are possible because parity is interleaved across the disks.
Example: writes to A0 and B1 use disks 0, 1, 4, and 5, so they can proceed in parallel.
Still, 1 small write = 4 physical disk accesses.

RAID products: Software, Chips, Systems
RAID was a $32 billion industry in 2002; 80% of non-PC disks were sold in RAIDs.

Margin of Safety in CS&E? (Patterson reflects)
Failures in practice: an operator removing the good disk instead of the bad disk; temperature or vibration causing another failure before repair.
In retrospect, we suggested RAID 5 for what we anticipated, but should have suggested RAID 6 (double failure OK) for the unanticipated, as a safety margin.

Peer Instruction
1. RAID 1 (mirror) and RAID 5 (rotated parity) help with performance and availability.
2. RAID 1 has higher cost than RAID 5.
3. Small writes on RAID 5 are slower than on RAID 1.
Answer choices (ABC): 1: FFF, 2: FFT, 3: FTF, 4: FTT, 5: TFF, 6: TFT, 7: TTF, 8: TTT

Peer Instruction Answer
1. All RAID levels (0-5) help with performance; only RAID 0 doesn't help availability. TRUE
2. Surely! You must buy 2x the disks rather than 1.25x (from the diagram; in practice even less). TRUE
3. RAID 5 (2 reads + 2 writes) vs. RAID 1 (2 writes): latency is worse, but throughput (of writes) is better. TRUE
So the answer is 8: TTT.

Administrivia
Please attend Wednesday's lecture! HKN evaluations at the end.
Compete in the Performance contest! The deadline is Mon, 2005-12-12 @ 11:59pm, 1 week from now.
The Sp04 final exam + solutions are online!
Final review: 2005-12-11 @ 2pm in 10 Evans.
Final: 2005-12-17 @ 12:30pm in 2050 VLSB. Only bring pen{,cil}s and two 8.5"x11" handwritten sheets + green sheet. Leave backpacks, books, ...

Upcoming Calendar
Week #15 (Last Week o' Classes): Mon - Performance; Wed - LAST CLASS: Summary, Review, & HKN Evals; Thu Lab - I/O Networking & 61C Feedback Survey.
Week #16: Sun 2pm - Review in 10 Evans; Mon - Performance competition due tonight @ midnight; Sat 12-17 - FINAL EXAM @ 12:30pm-3:30pm in 2050 VLSB (performance awards).

Performance
Purchasing perspective: given a collection of machines (or upgrade options), which has the best performance? the least cost? the best performance/cost?
Computer designer perspective: faced with design options, which has the best performance improvement? the least cost? the best performance/cost?
All require a basis for comparison and a metric for evaluation. Solid metrics lead to solid progress!

Two Notions of Performance
Boeing 747: DC to Paris in 6.5 hours; top speed 610 mph; 470 passengers; throughput 286,700 passenger-mph.
BAC/Sud Concorde: DC to Paris in 3 hours; top speed 1350 mph; 132 passengers; throughput 178,200 passenger-mph.
Which has higher performance? Time to deliver 1 passenger? Time to deliver 400 passengers?
In a computer, the time for 1 job is called Response Time or Execution Time; jobs per day is called Throughput or Bandwidth.

Definitions
Performance is in units of things per second, so bigger is better.
If we are primarily concerned with response time: performance(X) = 1 / execution_time(X).
"F(ast) is n times faster than S(low)" means: n = performance(F) / performance(S) = execution_time(S) / execution_time(F).

Example of Response Time vs. Throughput
Time of Concorde vs. Boeing 747? Concorde is 6.5 hours / 3 hours = 2.2 times faster.
Throughput of Boeing vs. Concorde? Boeing 747 is 286,700 pmph / 178,200 pmph = 1.6 times faster.
So the Boeing is 1.6 times (60%) faster in terms of throughput, while the Concorde is 2.2 times (120%) faster in terms of flying time (response time).
We will focus primarily on execution time for a single job.
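
A quick sketch tying the numbers above back to the definition n = execution_time(S) / execution_time(F); the values are taken from the two slides above:

```c
#include <stdio.h>

int main(void) {
    /* Response time (hours) and throughput (passenger-mph) from the slides */
    double time_747 = 6.5,      time_concorde = 3.0;
    double pmph_747 = 286700.0, pmph_concorde = 178200.0;

    /* "n times faster" = slower execution time / faster execution time */
    double speedup_response   = time_747 / time_concorde;
    /* For throughput, bigger is already better, so divide directly */
    double speedup_throughput = pmph_747 / pmph_concorde;

    printf("Concorde is %.1fx faster in response time\n", speedup_response);
    printf("747 is %.1fx faster in throughput\n", speedup_throughput);
    return 0;
}
```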

Confusing Wording on Performance
We will (try to) stick to "n times faster"; it's less confusing than "m% faster".
Since "faster" means both increased performance and decreased execution time, to reduce confusion we will (and you should) say "improve performance" or "improve execution time".

What is Time?
Straightforward definition of time: the total time to complete a task, including disk accesses, memory accesses, I/O activities, operating system overhead, etc. This is called real time, response time, or elapsed time.
Alternative: just the time the processor (CPU) is working on your program (since multiple processes run at the same time). This is called CPU execution time or CPU time, often divided into system CPU time (spent in the OS) and user CPU time (spent in the user program).
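
To make the distinction concrete, here is a minimal sketch that measures both elapsed (wall-clock) time and CPU time around a busy loop using standard C library calls; the loop bound is arbitrary:

```c
#include <stdio.h>
#include <time.h>

int main(void) {
    time_t  wall_start = time(NULL);   /* elapsed (real) time, in seconds */
    clock_t cpu_start  = clock();      /* CPU time used by this process   */

    /* Some arbitrary work to time */
    volatile double sum = 0.0;
    for (long i = 0; i < 100000000L; i++)
        sum += i * 0.5;

    double cpu_secs  = (double)(clock() - cpu_start) / CLOCKS_PER_SEC;
    double wall_secs = difftime(time(NULL), wall_start);

    printf("elapsed time: %.0f s, CPU time: %.2f s\n", wall_secs, cpu_secs);
    return 0;
}
```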

How to Measure Time?
User time: measured in seconds.
CPU time: computers are constructed using a clock that runs at a constant rate and determines when events take place in the hardware. These discrete time intervals are called clock cycles (or informally, clocks or cycles). The length of the clock period is the clock cycle time (e.g., 2 nanoseconds, or 2 ns); the clock rate (e.g., 500 megahertz, or 500 MHz) is the inverse of the clock period. Use these!

Measuring Time Using Clock Cycles (1/2)
CPU execution time for a program
= Clock Cycles for the program × Clock Cycle Time
or, equivalently,
= Clock Cycles for the program / Clock Rate

Measuring Time Using Clock Cycles (2/2)
One way to define clock cycles:
Clock Cycles for a program = Instructions for the program (the "Instruction Count") × Average Clock Cycles Per Instruction (abbreviated "CPI")
CPI is one way to compare two machines with the same instruction set, since the Instruction Count would be the same.

Performance Calculation (1/2)
CPU execution time for a program = Clock Cycles for the program × Clock Cycle Time
Substituting for clock cycles:
CPU execution time for a program = (Instruction Count × CPI) × Clock Cycle Time = Instruction Count × CPI × Clock Cycle Time

Performance Calculation (2/2)
CPU time = (Instructions / Program) × (Cycles / Instruction) × (Seconds / Cycle)
The units cancel, leaving: CPU time = Seconds / Program
It is the product of all 3 terms: if a term is missing, you can't predict time, the real measure of performance.
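
As a worked sketch of the formula above; the instruction count, CPI, and clock rate below are made-up illustrative values, not measurements from the lecture:

```c
#include <stdio.h>

int main(void) {
    /* Hypothetical program and machine parameters */
    double instruction_count = 2.0e9;   /* instructions executed          */
    double cpi               = 2.2;     /* average clock cycles per instr */
    double clock_rate_hz     = 500e6;   /* 500 MHz -> cycle time of 2 ns  */

    double clock_cycle_time = 1.0 / clock_rate_hz;

    /* CPU time = Instructions x Cycles/Instruction x Seconds/Cycle */
    double cpu_time = instruction_count * cpi * clock_cycle_time;

    printf("CPU time = %.2f seconds\n", cpu_time);
    return 0;
}
```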

How to Calculate the 3 Components?
Clock cycle time: in the specification of the computer (the clock rate in advertisements).
Instruction count: count instructions in the loop of a small program; use a simulator to count instructions; or use a hardware counter in a special register (Pentium II, III, 4).
CPI: calculate it as Execution Time / (Clock Cycle Time × Instruction Count), or use a hardware counter in a special register (PII, III, 4).

Calculating CPI Another Way
First, calculate the CPI for each individual instruction type (add, sub, and, etc.).
Next, calculate the frequency of each individual instruction type.
Finally, multiply these two for each instruction type and add them up to get the final CPI (the weighted sum).

Example (RISC processor)
Op: Freq_i, CPI_i, Product (% of time)
ALU: 50%, CPI 1, product 0.5 (23% of time)
Load: 20%, CPI 5, product 1.0 (45%)
Store: 10%, CPI 3, product 0.3 (14%)
Branch: 20%, CPI 2, product 0.4 (18%)
Instruction mix CPI = 2.2 (the products show where time is spent)
What if branch instructions were twice as fast?
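
A sketch of the weighted-sum calculation from the previous two slides, including the "branches twice as fast" variant posed above; the frequencies and CPIs are the example's values, and the resulting CPI of 2.0 is just the arithmetic:

```c
#include <stdio.h>

int main(void) {
    /* Instruction mix from the example: frequency and CPI per class */
    const char  *op[]   = { "ALU", "Load", "Store", "Branch" };
    const double freq[] = { 0.50,  0.20,   0.10,    0.20 };
    double       cpi[]  = { 1.0,   5.0,    3.0,     2.0 };

    double total = 0.0;
    for (int i = 0; i < 4; i++) {
        double prod = freq[i] * cpi[i];
        total += prod;
        printf("%-6s freq %.0f%%  CPI %.0f  product %.1f\n",
               op[i], freq[i] * 100, cpi[i], prod);
    }
    printf("Overall CPI = %.1f\n", total);          /* 2.2 */

    /* What if branch instructions were twice as fast? Branch CPI 2 -> 1 */
    cpi[3] = 1.0;
    double total_fast_branch = 0.0;
    for (int i = 0; i < 4; i++)
        total_fast_branch += freq[i] * cpi[i];
    printf("With branches twice as fast: CPI = %.1f\n", total_fast_branch);  /* 2.0 */
    return 0;
}
```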

RAID, and in conclusion...
RAID motivation: in the 1980s, there were 2 classes of drives: expensive, big drives for enterprises, and small drives for PCs. The idea: make one big drive out of many small ones! Higher performance with more disk arms per $, plus the option of a small number of extra disks (the "R"). Started at Cal by CS Profs Katz & Patterson.
Latency vs. Throughput.
Performance doesn't depend on any single factor: you need the Instruction Count, Clocks Per Instruction (CPI), and Clock Rate to get valid estimates.
User time: the time the user waits for the program to execute; depends heavily on how the OS switches between tasks.
CPU time: the time spent executing a single program; depends solely on the processor design (datapath, pipelining effectiveness, caches, etc.).
CPU time = (Instructions / Program) × (Cycles / Instruction) × (Seconds / Cycle)