Lecture 13. Storage, Network and Other Peripherals


I/O Systems
A typical organization: the processor (with its cache) sits on a memory-I/O bus together with main memory and several I/O controllers; each controller attaches its devices (disks, graphics, network). The processor and the I/O devices communicate over this bus and through interrupts.

I/O System Evaluation Metrics
- Dependability
- Expandability
- Cost
- Performance

I/O Device Characteristics
- Behavior: input, output, or storage
- Partner: human or machine
- Data rate

Device            Behavior          Partner   Data rate (KB/sec)
Keyboard          Input             Human     0.01
Mouse             Input             Human     0.02
Floppy disk       Storage           Machine   50.00
Laser printer     Output            Human     100.00
Optical disk      Storage           Machine   500.00
Magnetic disk     Storage           Machine   5,000.00
Network (LAN)     Input or output   Machine   20-1,000.00
Graphics display  Output            Human     30,000.00

I/O System Performance
I/O system performance depends on many aspects of the system and is limited by the weakest link in the chain:
- The CPU
- The memory system: internal and external caches, main memory
- The underlying interconnection (buses)
- The I/O controller
- The I/O device
- The speed of the I/O software (operating system)
- The efficiency of the software's use of the I/O devices

Two common performance metrics:
- Throughput: how much data can we move through the system in a given time (data rate)? How many I/O operations can we do per unit of time (I/O rate)?
- Response time (latency)

Organization of a Hard Magnetic Disk
A disk consists of platters; each platter surface is divided into concentric tracks, and each track into sectors.
Purpose:
- Long-term, nonvolatile storage
- Large, inexpensive, slow level in the storage hierarchy

Devices: Magnetic Disks
(Geometry: platters, heads, tracks, sectors, cylinders.)
Reading or writing data is a three-stage process, plus controller overhead:
- Seek time (~20 ms average, about 1M cycles at 50 MHz): move the arm over the track
- Rotational latency: wait for the sector to rotate under the head; average = 0.5 rotation / 3600 RPM = 8.3 ms
- Transfer rate: about a sector per ms (1-10 MB/s)
- Controller time: the overhead the controller imposes in performing an I/O access

Disk latency = seek time + rotational latency + transfer time + controller time

Disk Time Example
Disk parameters:
- 512-byte sectors
- Advertised average seek time: 6 ms
- Transfer rate: 50 MB/sec
- Rotation speed: 10,000 RPM
- Controller overhead: 0.2 ms
- Assume the disk is idle, so there is no waiting time

What is the average time to read or write a sector?
Average seek time + average rotational latency + transfer time + controller overhead
= 6 ms + 0.5 / 10,000 RPM + 0.5 KB / (50 MB/sec) + 0.2 ms
= 6 ms + 3 ms + 0.01 ms + 0.2 ms = 9.2 ms
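The same calculation, expressed as a small Python sketch (the function and parameter names are our own, not from the lecture):

```python
def avg_disk_access_ms(seek_ms, rpm, sector_kb, transfer_mb_s, controller_ms):
    """Average time to read/write one sector, in milliseconds."""
    rotational_ms = 0.5 * 60_000 / rpm                 # half a rotation, on average
    transfer_ms = sector_kb / (transfer_mb_s * 1000) * 1000  # KB over MB/s, in ms
    return seek_ms + rotational_ms + transfer_ms + controller_ms

# Parameters from the slide: 0.5 KB sector, 6 ms seek, 10,000 RPM, 50 MB/s, 0.2 ms overhead
print(avg_disk_access_ms(6, 10_000, 0.5, 50, 0.2))     # ~9.21 ms
```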

Reliability, Availability and Dependability
Dependability: computer system dependability is the quality of delivered service such that reliance can justifiably be placed on this service.
- The service delivered by a system is its observed actual behavior as perceived by other systems interacting with the system's users.
- Each module also has an ideal specified behavior, where a service specification is an agreed description of the expected behavior.
- A system failure occurs when the actual behavior deviates from the specified behavior.
Two system states: service accomplishment and service interruption. Failures move the system from accomplishment to interruption; restorations move it back.

Reliability, Availability and Dependability (cont.)
Reliability: a measure of continuous service accomplishment from a reference initial instant.
- MTTF: Mean Time To Failure

Availability: a measure of service accomplishment with respect to the alternation between the two states of accomplishment and interruption.
- MTTR: Mean Time To Repair
- MTBF: Mean Time Between Failures = MTTF + MTTR

Availability = MTTF / (MTTF + MTTR)

(Timeline: the system is available for MTTF until a failure occurs, unavailable for MTTR until the failure is repaired, and so on.)
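A one-function Python sketch of the availability formula above (the example MTTF/MTTR figures are made up for illustration):

```python
def availability(mttf_hours, mttr_hours):
    """Fraction of time the system is in the service-accomplishment state."""
    return mttf_hours / (mttf_hours + mttr_hours)

# e.g. a disk with MTTF = 1,000,000 hours that takes 24 hours to replace and rebuild
print(availability(1_000_000, 24))   # ~0.999976
```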

How to Improve MTTF
- Fault avoidance: prevent fault occurrence by construction
- Fault tolerance: use redundancy to allow the service to comply with the service specification despite failures
- Fault forecasting: predict the presence and creation of faults

Improving Availability with Redundancy
RAID: Redundant Arrays of Inexpensive Disks
Redundancy offers two advantages:
- Data is not lost: it can be reconstructed onto a new disk
- Continuous operation in the presence of failures
A disk array versus a single large disk: the array offers better performance, and (with redundancy) better reliability and availability.

RAID 0: Striping, Non-Redundant
Data blocks are striped round-robin across the array with no redundancy: blocks 1-5 form the first stripe across disks 0-4, blocks 6-10 the second stripe, and so on (the original figure shows blocks 1-25 spread over five disks).

RAID 1: Disk Mirroring/Shadowing
Each disk is fully duplicated onto its "shadow" (in the figure, blocks 1-5 appear on both disk 0 and disk 1).
- Very high availability can be achieved
- Most expensive solution: 100% capacity overhead

RAID 2: Error Detecting and Correcting Code (unused in practice)
Data is bit-interleaved across the data disks (bit 1 on disk 0, bit 2 on disk 1, and so on), while dedicated ECC disks (disks 6 and 7 in the figure) store an error-correcting code computed over each group of bits (ECC 1-32, ECC 33-64, ...).

RAID 3: Bit-Interleaved Parity
Data is bit-interleaved across the data disks (bit 1 on disk 0, bit 2 on disk 1, bit 3 on disk 2, ...), and a dedicated parity disk stores the parity of each group (Parity 1-32, Parity 33-64, ...).
Parity = sum mod 2

RAID 3: Bit-Interleaved Parity (cont.)
- Parity is computed across a recovery (protection) group to protect against hard disk failures
- 33% capacity cost for parity in this configuration (one parity disk per three data disks)
- The parity bit is the sum mod 2 (XOR) of the corresponding bits in the protection group; in the original figure the parity disk P holds the XOR of the three data disks' bits
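To make "sum mod 2" concrete, here is a small Python sketch (our own illustration, not from the slides) that computes a parity block with XOR and then uses it to reconstruct a lost data block:

```python
from functools import reduce

def parity(blocks):
    """Bytewise XOR (sum mod 2) of all blocks in a protection group."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"\x01\x02", b"\x0f\x00", b"\xf0\xff"]   # contents of three data disks
p = parity(data)                                 # contents of the parity disk

# If disk 1 fails, XOR-ing the surviving disks with the parity recovers its data
recovered = parity([data[0], data[2], p])
assert recovered == data[1]
```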

RAID 4: Block-Interleaved Parity
Data is interleaved in blocks rather than bits: blocks 1-4 are spread across disks 0-3 with Parity 1-4 on the parity disk, blocks 5-8 with Parity 5-8, and so on.
- A small read does not need to access all disks
- Independent accesses can occur in parallel
- Do writes need to access all disks? (See the next slide.)

RAID 4: Block-Interleaved Parity (cont.)
The original figure contrasts two ways of updating parity when a single block D0 is written:
- Recompute parity from scratch: read the other data blocks D1, D2, D3, XOR them with the new D0, and write the new D0 and the new parity P.
- Small-write shortcut: read only the old D0 and the old P, compute new P = old P XOR old D0 XOR new D0, and write the new D0 and the new P.
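A minimal sketch of the small-write shortcut described above (function and variable names are ours):

```python
def updated_parity(old_parity, old_block, new_block):
    """new P = old P XOR old D XOR new D, computed bytewise."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_block, new_block))

old_d0, new_d0 = b"\x10\x20", b"\x11\x21"
old_p = b"\xaa\xbb"
print(updated_parity(old_p, old_d0, new_d0).hex())   # 'abba'
```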

RAID 4 vs. RAID 5
RAID 4 (block-interleaved parity, dedicated parity disk):
  D0   D1   D2   D3   P
  D4   D5   D6   D7   P
  D8   D9   D10  D11  P
  D12  D13  D14  D15  P
  D16  D17  D18  D19  P
  D20  D21  D22  D23  P

RAID 5 (block-interleaved distributed parity):
  D0   D1   D2   D3   P
  D4   D5   D6   P    D7
  D8   D9   P    D10  D11
  D12  P    D13  D14  D15
  P    D16  D17  D18  D19
  D20  D21  D22  D23  P

- Block-interleaved parity (RAID 4): only one write per group can occur at a time, because every write must update the single parity disk
- Block-interleaved distributed parity (RAID 5): multiple writes can occur simultaneously as long as the stripe units and parity blocks involved are not located on the same disks
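The rotating parity placement in the RAID 5 layout above follows a simple rule; this short Python sketch (our own formulation) reproduces the pattern:

```python
def raid5_parity_disk(stripe, num_disks=5):
    """Disk index that holds the parity block of a given stripe (left-rotating)."""
    return (num_disks - 1 - stripe) % num_disks

for stripe in range(6):
    print(stripe, raid5_parity_disk(stripe))
# stripes 0..5 place parity on disks 4, 3, 2, 1, 0, 4 -- matching the layout above
```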

Connection between Processors, Memory, and I/O Devices: Buses
A bus is a shared communication link between the processor, memory, and the I/O devices.
Advantages:
- Versatility: new devices can be added easily, and peripherals can be moved between computer systems that use the same bus standard
- Low cost: a single set of wires is shared in multiple ways

Buses: Disadvantages
- A bus creates a communication bottleneck: its bandwidth can limit the maximum I/O throughput
- The maximum bus speed is largely limited by:
  - The length of the bus
  - The number of devices on the bus
  - The need to support a range of devices with widely varying latencies and widely varying data transfer rates

The General Organization of a Bus
A bus consists of control lines and data lines.
- Control lines: signal requests and acknowledgments, and indicate what type of information is on the data lines
- Data lines: carry information between the source and the destination, including data, addresses, and complex commands

Master versus Slave
A bus transaction includes two parts:
- Issuing the command (and address): the request
- Transferring the data: the action
The master is the one who starts the bus transaction by issuing the command (and address). The slave is the one who responds to the address by:
- Sending data to the master if the master asks for data
- Receiving data from the master if the master wants to send data
Data can flow in either direction between master and slave.

Types of Buses
Processor-memory bus (design specific):
- Short and high speed
- Only needs to match the memory system, to maximize memory-to-processor bandwidth
- Connects directly to the processor
- Optimized for cache block transfers

I/O bus (industry standard):
- Usually lengthy and slower
- Needs to match a wide range of I/O devices
- Connects to the processor-memory bus or backplane bus

Backplane bus (standard or proprietary):
- Backplane: an interconnection structure within the chassis
- Allows processors, memory, and I/O devices to coexist
- Cost advantage: one bus for all components

A Three-Bus System
The processor connects to memory over the processor-memory bus; bus adaptors connect backplane buses to the processor-memory bus, and further adaptors connect I/O buses to the backplane bus.
- A small number of backplane buses tap into the processor-memory bus
- The processor-memory bus is used for processor-memory traffic
- The I/O buses are connected to the backplane bus

Synchronous and Asynchronous Buses
Synchronous bus:
- Includes a clock in the control lines and a fixed protocol for communication relative to the clock
- Advantage: involves very little logic and can run very fast
- Disadvantages: every device on the bus must run at the same clock rate, and to avoid clock skew, such buses cannot be long if they are fast

Asynchronous bus:
- Not clocked
- Can accommodate a wide range of devices
- Can be lengthened without worrying about clock skew
- Requires a handshaking protocol

Asynchronous Handshaking: Read Transaction
1. When memory sees the ReadReq line, it reads the address from the data bus and raises Ack to indicate it has been seen.
2. The I/O device sees the Ack line high and releases the ReadReq and data lines.
3. Memory sees that ReadReq is low and drops the Ack line to acknowledge the ReadReq signal.
4. This step starts when the memory has the data ready. It places the data from the read request on the data lines and raises DataRdy.
5. The I/O device sees DataRdy, reads the data from the bus, and signals that it has the data by raising Ack.
6. The memory sees the Ack signal, drops DataRdy, and releases the data lines.
7. Finally, the I/O device, seeing DataRdy go low, drops the Ack line, which indicates that the transmission is complete.
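The seven steps can be replayed with a small script; the following Python sketch is our own toy model (signal names taken from the steps above), tracing the sequence of signal changes rather than simulating concurrent hardware:

```python
# Toy, sequential trace of the 7-step handshake. Real hardware runs the memory
# and the I/O device concurrently, but the order of signal changes is the same.
signals = {"ReadReq": 1, "DataRdy": 0, "Ack": 0}   # device starts by asserting ReadReq
data_lines = 0x1000                                # ...with the address on the data lines

def step(n, actor, action, **changes):
    signals.update(changes)
    print(f"{n}. {actor:6s} {action:40s} {signals} data={data_lines:#x}")

step(1, "memory", "latches address, raises Ack", Ack=1)
step(2, "device", "releases ReadReq and the data lines", ReadReq=0)
step(3, "memory", "sees ReadReq low, drops Ack", Ack=0)
data_lines = 0xCAFE                                # memory now drives the requested data
step(4, "memory", "puts data on the bus, raises DataRdy", DataRdy=1)
step(5, "device", "reads the data, raises Ack", Ack=1)
step(6, "memory", "drops DataRdy, releases data lines", DataRdy=0)
step(7, "device", "sees DataRdy low, drops Ack", Ack=0)
```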

Arbitration: Obtaining Access to the Bus
One of the most important issues in bus design: how is the bus reserved by a device that wishes to use it?
Chaos is avoided by a master-slave arrangement: only the bus master can control access to the bus; it initiates and controls all bus requests, while a slave responds to read and write requests.
The simplest system: the processor is the only bus master, and all bus requests must be controlled by the processor. Major drawback: the processor is involved in every transaction.

Multiple Potential Bus Masters: the Need for Arbitration
Bus arbitration scheme:
- A bus master wanting to use the bus asserts a bus request
- A bus master cannot use the bus until its request is granted
- A bus master must signal the arbiter after it finishes using the bus

Bus arbitration schemes usually try to balance two factors:
- Bus priority: the highest-priority device should be serviced first
- Fairness: even the lowest-priority device should never be completely locked out of the bus

Two common schemes:
- Daisy chain arbitration (next slide)
- Centralized, parallel arbitration: a single arbiter with request lines from all devices (the slide after next)

The Daisy Chain Bus Arbitration Scheme
The bus arbiter passes a single grant line through the devices, from Device 1 (highest priority) to Device N (lowest priority); the request and release lines are shared.
- Advantage: simple
- Disadvantages: it cannot assure fairness (a low-priority device may be locked out indefinitely), and the daisy-chained grant signal also limits the bus speed

Centralized Parallel Arbitration
Each device has its own request and grant lines connected to a central bus arbiter. This scheme is used in essentially all processor-memory buses and in high-speed I/O buses.
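To illustrate how a centralized arbiter can balance the priority and fairness goals raised two slides earlier, here is a small Python sketch of a rotating-priority ("round-robin") grant policy; the class and its interface are our own invention, not something from the lecture:

```python
class RoundRobinArbiter:
    """Centralized arbiter: one request line per device, rotating priority."""
    def __init__(self, num_devices):
        self.n = num_devices
        self.next_start = 0            # device checked first on the next grant

    def grant(self, requests):
        """requests: iterable of device ids currently asserting bus request."""
        requesting = set(requests)
        for offset in range(self.n):
            dev = (self.next_start + offset) % self.n
            if dev in requesting:
                self.next_start = (dev + 1) % self.n   # fairness: rotate priority
                return dev
        return None                    # no device is requesting the bus

arb = RoundRobinArbiter(4)
print(arb.grant([1, 3]))   # -> 1
print(arb.grant([1, 3]))   # -> 3 (device 1 is not immediately granted again)
```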

The Buses and Networks of the Pentium 4
The original figure shows the hierarchy of buses around a Pentium 4:
- The Pentium 4 processor connects to the memory controller hub (north bridge, 82875P) over the system bus (800 MHz, 6.4 GB/sec)
- Main memory DIMMs sit on two DDR 400 channels (3.2 GB/sec each)
- The north bridge also serves AGP 8X graphics output (2.1 GB/sec) and CSA 1 Gbit Ethernet (0.266 GB/sec)
- The north bridge links to the I/O controller hub (south bridge, 82801EB) at 266 MB/sec
- The south bridge serves Serial ATA disks (150 MB/sec), Parallel ATA CD/DVD and disk (100 MB/sec), a tape drive (20 MB/sec), AC/97 stereo/surround audio (1 MB/sec), USB 2.0 (60 MB/sec), 10/100 Mbit Ethernet, and the PCI bus (132 MB/sec)

Increasing the Bus Bandwidth
- Split-transaction protocol / separate versus multiplexed address and data lines: address and data can be transmitted in one bus cycle if separate address and data lines are available. Cost: (a) more bus lines, (b) increased complexity.
- Data bus width: by increasing the width of the data bus, transfers of multiple words require fewer bus cycles. Example: the SPARCstation 20's memory bus is 128 bits wide. Cost: more bus lines.
- Block transfers: allow the bus to transfer multiple words in back-to-back bus cycles; only one address needs to be sent at the beginning, and the bus is not released until the last word is transferred. Cost: (a) increased complexity, (b) worse response time for other pending requests.

Example
Suppose we have a system with the following characteristics:
- A memory and bus system supporting block accesses of 4 to 16 32-bit words
- A 64-bit synchronous bus clocked at 200 MHz, with each 64-bit transfer taking 1 clock cycle and 1 clock cycle required to send an address to memory
- Two clock cycles needed between each bus operation (assume the bus is idle before an access)
- A memory access time of 200 ns for the first four words; each additional set of four words can be read in 20 ns
- Assume that a bus transfer of the most recently read data and a read of the next four words can be overlapped

Find (1) the sustained bandwidth and the latency, and (2) the effective number of bus transactions per second, for a read of 256 words using 4-word blocks and using 16-word blocks.
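The slide leaves the solution to the reader; under one reasonable reading of the assumptions (5 ns per cycle, two 64-bit transfers per 4-word group, the overlapped read hiding the 2-cycle transfer of the previous group, and 2 idle cycles per transaction), the cycle counts can be tallied as in this Python sketch:

```python
CYCLE_NS = 5.0                       # 200 MHz bus clock -> 5 ns per cycle

def read_256_words(block_words):
    """Latency (ns), bandwidth (MB/s) and transactions/s for reading 256 words."""
    groups = block_words // 4        # 4-word (128-bit) groups per block
    # 1 cycle address + 200 ns (40 cycles) first access
    # + later groups: 20 ns (4 cycles) each, overlapping the 2-cycle transfer
    # + 2 cycles to send the last group + 2 idle cycles between operations
    cycles_per_txn = 1 + 40 + (groups - 1) * max(4, 2) + 2 + 2
    txns = 256 // block_words
    total_ns = txns * cycles_per_txn * CYCLE_NS
    bw_mb_s = (256 * 4) / (total_ns * 1e-9) / 1e6
    return total_ns, bw_mb_s, txns / (total_ns * 1e-9)

for block in (4, 16):
    print(block, read_256_words(block))
# ~ (14400 ns,  71.1 MB/s, 4.44M txns/s) for 4-word blocks
# ~ ( 4560 ns, 224.6 MB/s, 3.51M txns/s) for 16-word blocks
```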

Communicating with I/O Devices
Memory-mapped I/O:
- A portion of the address space is assigned to I/O devices; reads and writes to those addresses cause data to be transferred between the CPU and the device interfaces
- (The original figure shows the CPU's address space 0..n shared between memory and the interfaces of the peripherals)
Dedicated I/O instructions:
- Special instructions that specify both the device number and the command word
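A toy Python sketch of the memory-mapped idea (the UART_TX address and the AddressSpace class are hypothetical, purely illustrative): ordinary loads and stores whose address falls in the device range are routed to the device instead of RAM:

```python
UART_TX = 0xFFFF0000                     # hypothetical device register address

class AddressSpace:
    """Routes loads/stores either to RAM or to a memory-mapped device register."""
    def __init__(self, ram_size):
        self.ram = bytearray(ram_size)

    def store(self, addr, byte):
        if addr == UART_TX:              # address decodes to the device, not RAM
            print(chr(byte), end="")     # the 'device' consumes the written byte
        else:
            self.ram[addr] = byte

    def load(self, addr):
        return self.ram[addr]

mem = AddressSpace(1024)
for b in b"hi\n":
    mem.store(UART_TX, b)                # ordinary stores drive the device
```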

Communicating with the Processor
- Polling
- Interrupts
- DMA (direct memory access)
- Intelligent I/O controllers (I/O processors)

Polling
The CPU repeatedly checks the device's status through the I/O controller (IOC): is the data ready? If not, the CPU busy-waits in a loop; if yes, it reads the data, polls again until the transfer is done, and then stores the data.
Polling overhead:
1. Transferring to the polling routine
2. Accessing the device
3. Restarting the user program
A busy-wait loop is not an efficient way to use the CPU unless the device is very fast, but checks for I/O completion can be dispersed among computationally intensive code.
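A minimal sketch of the busy-wait pattern described above; the device object with ready()/read() methods is a stand-in of our own, not an API from the lecture:

```python
def poll_read(device, nbytes):
    """Busy-wait until the device has data, then read nbytes from it."""
    buf = bytearray()
    while len(buf) < nbytes:
        while not device.ready():        # busy-wait loop: burns CPU cycles
            pass
        buf += device.read()             # consume whatever the device has ready
    return bytes(buf)

class FakeDevice:
    """Stand-in device that becomes 'ready' after a few polls."""
    def __init__(self, data): self.data, self.polls = data, 0
    def ready(self):
        self.polls += 1
        return self.polls >= 3
    def read(self): return self.data

print(poll_read(FakeDevice(b"sector"), 6))   # b'sector'
```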

Interrupt-Driven Data Transfer
(1) The I/O device raises an I/O interrupt while the user program is executing; (2) the processor saves the PC and (3) jumps to the interrupt service routine's address; (4) the interrupt service routine reads the data from the device, stores it to memory, and returns from the interrupt (rti) to the user program.
- An I/O interrupt is asynchronous with respect to instruction execution
- I/O interrupts are prioritized (IPL: interrupt priority level)
- I/O has lower priority than internal exceptions
- High-speed devices are associated with higher priority

Direct Memory Access
The CPU sets up the DMA transfer by telling the DMA controller (DMAC):
1. The identity of the device
2. The operation (read or write)
3. The memory address of the source or destination
4. The number of bytes to transfer
The DMA controller then (1) issues requests to the device and moves the data to or from memory on its own, and (2) interrupts the processor when the transfer completes.
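The four setup parameters map naturally onto a descriptor handed to the DMA controller; the following Python sketch (the descriptor fields and toy engine are our own) mirrors the sequence described above:

```python
from dataclasses import dataclass

@dataclass
class DmaDescriptor:
    device_id: int        # 1. identity of the device
    op: str               # 2. "read" (device -> memory) or "write" (memory -> device)
    mem_addr: int         # 3. source or destination address in memory
    nbytes: int           # 4. number of bytes to transfer

def dma_transfer(desc, memory, device, on_complete):
    """Toy DMA engine: moves the data without CPU involvement, then 'interrupts'."""
    if desc.op == "read":
        memory[desc.mem_addr:desc.mem_addr + desc.nbytes] = device.read(desc.nbytes)
    else:
        device.write(memory[desc.mem_addr:desc.mem_addr + desc.nbytes])
    on_complete(desc)     # models the completion interrupt delivered to the CPU
```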

DMA and the Memory System
DMA raises the question of consistency between the cache and memory:
- If DMA traffic passes through the cache side (I/O bridge between the CPU cache and main memory), I/O always sees the latest data, but it interferes with the CPU
- If DMA goes directly to main memory (I/O bridge below the cache), it does not interfere with the CPU, but it might see stale data
Solutions to the stale-data problem:
- Output: use a write-through cache
- Input: place I/O buffers in non-cacheable memory
- Software: flush the cache around I/O operations
- Hardware: check (snoop) I/O addresses on input

Input/Output Processors
An I/O processor (IOP) offloads I/O from the CPU:
(1) The CPU issues an instruction to the IOP (OP, device, address), indicating the target device and where in memory its commands are.
(2) The IOP looks in memory for its commands.
(3) Each command entry (OP, address, count, other) tells the IOP what to do, where to put the data, how much to transfer, and any special requests; device-to/from-memory transfers on the I/O bus (devices D1 ... Dn) are controlled by the IOP directly.
(4) The IOP interrupts the CPU when it is done.

Responsibilities of the OS in I/O
- The OS guarantees that a user's program accesses only the portions of an I/O device to which the user has rights
- The OS supplies routines to handle low-level device operations
- The OS handles I/O interrupts, much as it handles other exceptions
- The OS provides equitable access to the shared I/O resources and schedules accesses

I/O Benchmarks
Transaction processing:
- Examples: airline reservation systems and bank ATMs
- Many small changes to a large body of shared data
- Measures both response time and throughput; I/O rate = number of disk accesses per second given an upper limit on latency
- Benchmarks: TPC-, TPC-R, TPC-W

File system and web I/O benchmarks:
- Synthetic benchmarks simulate typical file access patterns: 90% of accesses are to files less than 10 KB; 67% reads, 27% writes, 6% read-modify-write; 90% of all file accesses are to data with sequential addresses on the disk
- SPECSFS: measures NFS (Network File System) performance, reporting I/O rate and latency
- SPECWeb: web server benchmark

Impact of I/O on System Performance
Suppose we have a benchmark that executes in 100 seconds of elapsed time, where 90 seconds is CPU time and the rest is I/O time. If CPU time improves by 50% per year for the next five years but I/O time doesn't improve, how much faster will our program run at the end of five years?

1. Elapsed time = CPU time + I/O time, so 100 = 90 + I/O time and I/O time = 10 seconds.

2. Year by year:

After n years   CPU time               I/O time     Elapsed time   % I/O time
0               90 seconds             10 seconds   100 seconds    10%
1               90 / 1.5 = 60 seconds  10 seconds    70 seconds    14%
2               60 / 1.5 = 40 seconds  10 seconds    50 seconds    20%
3               40 / 1.5 = 27 seconds  10 seconds    37 seconds    27%
4               27 / 1.5 = 18 seconds  10 seconds    28 seconds    36%
5               18 / 1.5 = 12 seconds  10 seconds    22 seconds    45%

3. The improvement in CPU performance over five years is 90 / 12 = 7.5, but the improvement in elapsed time is only 100 / 22 = 4.5.
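The table above can be reproduced with a few lines of Python (a direct encoding of the slide's assumptions, rounding CPU time to whole seconds each year as the slide does):

```python
cpu, io = 90, 10
print("year   CPU   I/O  elapsed  %I/O")
for year in range(6):
    elapsed = cpu + io
    print(f"{year:4d} {cpu:5d} {io:5d} {elapsed:8d} {io / elapsed:5.0%}")
    cpu = round(cpu / 1.5)        # CPU time improves by 50% per year

print(90 / 12, 100 / 22)          # CPU speedup ~7.5 vs. overall speedup ~4.5
```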

Designing an I/O System
An I/O system often needs to meet two constraints: a latency constraint and a bandwidth constraint.
Methodology:
- Find the weakest link in the I/O system (processor, memory, bus, I/O controller, or device)
- Configure this component to sustain the required bandwidth
- Determine the requirements for the rest of the system and configure them to support that bandwidth

I/O System Design
Consider the following computer system:
- A CPU that sustains 3 billion instructions per second and averages 100,000 instructions in the operating system per I/O operation
- A memory backplane bus capable of sustaining a transfer rate of 1000 MB/sec
- SCSI Ultra320 controllers with a transfer rate of 320 MB/sec, each accommodating up to 7 disks
- Disk drives with a read/write bandwidth of 75 MB/sec and an average seek plus rotational latency of 6 ms

If the workload consists of 64 KB reads (where the block is sequential on a track) and the user program needs 200,000 instructions per I/O operation, find:
1. The maximum sustainable I/O rate
2. The number of disks and SCSI controllers required (assume that reads can always be done on an idle disk if one exists, i.e., ignore disk conflicts)

Answer
The two fixed components of the system are the memory bus and the CPU. Let's first find the I/O rate that these two components can sustain and determine which of them is the bottleneck.

Each I/O takes 200,000 user instructions and 100,000 OS instructions, so:
Maximum I/O rate of CPU = instruction execution rate / instructions per I/O
                        = (3 x 10^9) / ((200 + 100) x 10^3) = 10,000 I/Os per second

Each I/O transfers 64 KB, and the bus bandwidth is 1000 MB/sec, so:
Maximum I/O rate of bus = bus bandwidth / bytes per I/O
                        = (1000 x 10^6) / (64 x 10^3) = 15,625 I/Os per second

The CPU is the bottleneck.

Answer (cont.)
The CPU is the bottleneck, so we can now configure the rest of the system to perform at the level dictated by the CPU: 10,000 I/Os per second.

How many disks do we need to accommodate 10,000 I/Os per second?
Time per I/O at the disk = seek + rotational time + transfer time
                         = 6 ms + 64 KB / (75 MB/sec) = 6.9 ms
I/Os per second per disk = 1000 ms / 6.9 ms = 146 I/Os per second
Therefore, to saturate the CPU's 10,000 I/Os per second we need 10,000 / 146 = 69 disks.

Can the SCSI bus sustain the I/O transfer rate?
Transfer rate per disk = transfer size / transfer time = 64 KB / 6.9 ms = 9.56 MB/sec
9.56 x 7 = about 70 MB/sec << 320 MB/sec, so a fully populated controller is fine.

How many SCSI controllers do we need? 69 / 7 = about 10 SCSI controllers.
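For completeness, a compact Python sketch of the same sizing arithmetic (constants and names are our own choices):

```python
import math

INSTR_RATE = 3e9                     # instructions per second
INSTR_PER_IO = 200_000 + 100_000     # user + OS instructions per I/O
BUS_BW = 1000e6                      # memory backplane bus, bytes per second
IO_SIZE = 64e3                       # bytes per I/O (64 KB reads)
DISK_BW = 75e6                       # disk bandwidth, bytes per second
SEEK_ROT_MS = 6.0                    # average seek + rotational latency
DISKS_PER_SCSI = 7

cpu_rate = INSTR_RATE / INSTR_PER_IO               # 10,000 I/Os per second
bus_rate = BUS_BW / IO_SIZE                        # 15,625 I/Os per second
io_rate = min(cpu_rate, bus_rate)                  # the CPU is the bottleneck

disk_ms = SEEK_ROT_MS + IO_SIZE / DISK_BW * 1000   # ~6.9 ms per I/O
disks = math.ceil(io_rate / (1000 / disk_ms))      # ~69 disks
controllers = math.ceil(disks / DISKS_PER_SCSI)    # ~10 SCSI controllers
print(io_rate, disks, controllers)
```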

Summary
- I/O performance is limited by the weakest link in the chain between the OS and the device
- Disk I/O benchmarks: I/O rate vs. data rate vs. latency
- Buses: synchronous vs. asynchronous; bus arbitration
- I/O devices notifying the operating system:
  - Polling: can waste a lot of processor time
  - I/O interrupts: similar to exceptions, except that they are asynchronous
- Delegating I/O responsibility from the CPU: DMA