Chapter 6. I/O issues

Similar documents
CS152 Computer Architecture and Engineering Lecture 20: Busses and OS s Responsibilities. Recap: IO Benchmarks and I/O Devices

Lecture 13. Storage, Network and Other Peripherals

Lecture 23: I/O Redundant Arrays of Inexpensive Disks Professor Randy H. Katz Computer Science 252 Spring 1996

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Input/Output Introduction

Introduction. Motivation Performance metrics Processor interface issues Buses

Buses. Disks PCI RDRAM RDRAM LAN. Some slides adapted from lecture by David Culler. Pentium 4 Processor. Memory Controller Hub.

Computer Science 146. Computer Architecture

Introduction to Input and Output

CPS104 Computer Organization and Programming Lecture 18: Input-Output. Outline of Today s Lecture. The Big Picture: Where are We Now?

Computer Architecture. Hebrew University Spring Chapter 8 Input/Output. Big Picture: Where are We Now?

Recap: What is virtual memory? CS152 Computer Architecture and Engineering Lecture 22. Virtual Memory (continued) Buses

EE108B Lecture 17 I/O Buses and Interfacing to CPU. Christos Kozyrakis Stanford University

Chapter 8. A Typical collection of I/O devices. Interrupts. Processor. Cache. Memory I/O bus. I/O controller I/O I/O. Main memory.

ECE468 Computer Organization and Architecture. OS s Responsibilities

Virtual Memory Input/Output. Admin

Storage. Hwansoo Han

Where We Are in This Course Right Now. ECE 152 Introduction to Computer Architecture Input/Output (I/O) Copyright 2012 Daniel J. Sorin Duke University

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

CS152 / Kubiatowicz Lec23.1. Capacity Access Time Cost. CPU Registers 100s Bytes <10s ns. Cache K Bytes ns $ /bit

INPUT/OUTPUT DEVICES Dr. Bill Yi Santa Clara University

Lecture 12: I/O: Metrics, A Little Queuing Theory, and Busses Professor David A. Patterson Computer Science 252 Fall 1996

INPUT/OUTPUT ORGANIZATION

Chapter 6 Storage and Other I/O Topics

Storage Systems. Storage Systems

I/O CANNOT BE IGNORED

Readings. Storage Hierarchy III: I/O System. I/O (Disk) Performance. I/O Device Characteristics. often boring, but still quite important

INPUT/OUTPUT ORGANIZATION

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568

Recap: Making address translation practical: TLB. CS152 Computer Architecture and Engineering Lecture 24. Busses (continued) Queueing Theory Disk IO

Administrivia. CMSC 411 Computer Systems Architecture Lecture 19 Storage Systems, cont. Disks (cont.) Disks - review

Interconnecting Components

Lecture 23: Storage Systems. Topics: disk access, bus design, evaluation metrics, RAID (Sections )

PCI and PCI Express Bus Architecture

Input/Output. Today. Next. Principles of I/O hardware & software I/O software layers Disks. Protection & Security

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568

Reading and References. Input / Output. Why Input and Output? A typical organization. CSE 410, Spring 2004 Computer Systems

UC Santa Barbara. Operating Systems. Christopher Kruegel Department of Computer Science UC Santa Barbara

CSE325 Principles of Operating Systems. Mass-Storage Systems. David P. Duggan. April 19, 2011

Computer Systems Laboratory Sungkyunkwan University

Chapter Seven Morgan Kaufmann Publishers

I/O Systems. EECC551 - Shaaban. Processor. Cache. Memory - I/O Bus. Main Memory. I/O Controller. I/O Controller. I/O Controller. Graphics.

Storage systems. Computer Systems Architecture CMSC 411 Unit 6 Storage Systems. (Hard) Disks. Disk and Tape Technologies. Disks (cont.

Bus System. Bus Lines. Bus Systems. Chapter 8. Common connection between the CPU, the memory, and the peripheral devices.

Advanced Computer Architecture

Unit 3 and Unit 4: Chapter 4 INPUT/OUTPUT ORGANIZATION

Programmed I/O Interrupt-Driven I/O Direct Memory Access (DMA) I/O Processors. 10/12/2017 Input/Output Systems and Peripheral Devices (02-2)

CS152 Computer Architecture and Engineering Lecture 24. I/O and Storage Systems. April 29, 2004 John Kubiatowicz (

1. Introduction 2. Methods for I/O Operations 3. Buses 4. Liquid Crystal Displays 5. Other Types of Displays 6. Graphics Adapters 7.

Lecture 25: Busses. A Typical Computer Organization

INPUT/OUTPUT ORGANIZATION

RAID (Redundant Array of Inexpensive Disks)

ECE 485/585 Microprocessor System Design

IO System. CP-226: Computer Architecture. Lecture 25 (24 April 2013) CADSL

I/O CANNOT BE IGNORED

Computer Architecture Computer Science & Engineering. Chapter 6. Storage and Other I/O Topics BK TP.HCM

Chapter 3. Top Level View of Computer Function and Interconnection. Yonsei University

Lecture 23. Finish-up buses Storage

Definition of RAID Levels

Introduction Electrical Considerations Data Transfer Synchronization Bus Arbitration VME Bus Local Buses PCI Bus PCI Bus Variants Serial Buses

INPUT-OUTPUT ORGANIZATION

CSE 120. Overview. July 27, Day 8 Input/Output. Instructor: Neil Rhodes. Hardware. Hardware. Hardware

CMSC 313 COMPUTER ORGANIZATION & ASSEMBLY LANGUAGE PROGRAMMING LECTURE 09, SPRING 2013

Introduction to Embedded System I/O Architectures

Today: Coda, xfs! Brief overview of other file systems. Distributed File System Requirements!

Introduction I/O 1. I/O devices can be characterized by Behavior: input, output, storage Partner: human or machine Data rate: bytes/sec, transfers/sec

ECE 551 System on Chip Design

CSE 380 Computer Operating Systems

SEMICON Solutions. Bus Structure. Created by: Duong Dang Date: 20 th Oct,2010

6 Direct Memory Access (DMA)

Lecture #9-10: Communication Methods

CS2410: Computer Architecture. Storage systems. Sangyeun Cho. Computer Science Department University of Pittsburgh

SCSI is often the best choice of bus for high-specification systems. It has many advantages over IDE, these include:

Computer Architecture CS 355 Busses & I/O System

Thomas Polzer Institut für Technische Informatik

Storage System COSC UCB

CS152 Computer Architecture and Engineering Lecture 26. Low Power Design. May 3, 2001 John Kubiatowicz (http.cs.berkeley.

STANDARD I/O INTERFACES

Where We Are in This Course Right Now. ECE 152 Introduction to Computer Architecture Input/Output (I/O) Copyright 2011 Daniel J. Sorin Duke University

1. Define Peripherals. Explain I/O Bus and Interface Modules. Peripherals: Input-output device attached to the computer are also called peripherals.

Unit DMA CONTROLLER 8257

Chapter 6. Storage and Other I/O Topics. ICE3003: Computer Architecture Fall 2012 Euiseong Seo

Systems Architecture II

Appendix D: Storage Systems

CSC Operating Systems Spring Lecture - XIX Storage and I/O - II. Tevfik Koşar. Louisiana State University.

RAID Structure. RAID Levels. RAID (cont) RAID (0 + 1) and (1 + 0) Tevfik Koşar. Hierarchical Storage Management (HSM)


Chapter-6. SUBJECT:- Operating System TOPICS:- I/O Management. Created by : - Sanjay Patel

The CoreConnect Bus Architecture

Chapter 6. Storage and Other I/O Topics

ECE 250 / CPS 250 Computer Architecture. Input/Output

Buses. Maurizio Palesi. Maurizio Palesi 1

System Buses Ch 3. Computer Function Interconnection Structures Bus Interconnection PCI Bus. 9/4/2002 Copyright Teemu Kerola 2002

The von Neuman architecture characteristics are: Data and Instruction in same memory, memory contents addressable by location, execution in sequence.

Virtual Memory. Reading. Sections 5.4, 5.5, 5.6, 5.8, 5.10 (2) Lecture notes from MKP and S. Yalamanchili

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568

Introduction to I/O. April 30, Howard Huang 1

Dr e v prasad Dt

Replacement Policy: Which block to replace from the set?

Chapter 6. Storage and Other I/O Topics

Transcription:

Computer Architectures Chapter 6 I/O issues Tien-Fu Chen National Chung Cheng Univ Chap6 - Input / Output Issues I/O organization issue- CPU-memory bus, I/O bus width A/D multiplex Split transaction Synchronous vs asynchronous Interface to CPU Polling device Interrupt-driven I/O -mapped I/O Direct memory access (DMA) I/O processors Interaction with OS state date DMA vs virtual memory Disk caches Disk array - RAID Chap6 -

What is a bus? AIs: shared communication link single set of wires used to connect multiple subsystems Processor Control Input Datapath Output A is also a fundamental tool for composing large, complex systems systematic means of abstraction Chap6-2 Advantages of es Processer I/O Device I/O Device I/O Device Versatility: New devices can be added easily Peripherals can be moved between computer systems that use the same bus standard Low Cost: A single set of wires is shared in multiple ways Chap6-3

Disadvantage of es Processor I/O Device I/O Device I/O Device It creates a communication bottleneck The bandwidth of that bus can limit the maximum I/O throughput The maximum bus speed is largely limited by: The length of the bus The number of devices on the bus The need to support a range of devices with: Widely varying latencies Widely varying data transfer rates Chap6-4 General Organization of a Control Lines Data Lines Control lines: Signal requests and acknowledgments Indicate what type of information is on the data lines Data lines carry information between the source and the destination: Data and Addresses Complex commands Chap6-5

Master versus Slave Master Master issues command Data can go either way Slave A bus transaction includes two parts: Issuing the command (and address) request Transferring the data action Master is the one who starts the bus transaction by: issuing the command (and address) Slave is the one who responds to the address by: Sending data to the master if the master ask for data Receiving data from the master if the master wants to send data Chap6-6 Types of ses Processor- (design specific) Short and high speed Only need to match the memory system Maximize memory-to-processor bandwidth Connects directly to the processor Optimized for cache block transfers I/O (industry standard) Usually is lengthy and slower Need to match a wide range of I/O devices Connects to the processor-memory bus or backplane bus Backplane (standard or proprietary) Backplane: an interconnection structure within the chassis Allow processors, memory, and I/O devices to coexist Chap6-7

Example: Pentium System Organization Processor/ PCI I/O ses Chap6-8 Processor A Two- System Processor Adaptor I/O Adaptor I/O Adaptor I/O I/O buses tap into the processor-memory bus via bus adaptors: Processor-memory bus: mainly for processor-memory traffic I/O buses: provide expansion slots for I/O devices Apple Macintosh-II Nu: Processor, memory, and a few selected I/O devices SCCI : the rest of the I/O devices Chap6-9

Processor A Three- System Processor Backplane Adaptor Adaptor Adaptor I/O I/O A small number of backplane buses tap into the processormemory bus Processor-memory bus is only used for processor-memory traffic I/O buses are connected to the backplane bus Advantage: loading on the processor bus is greatly reduced Chap6 - North/South Bridge architectures:separate busses Processor Processor backside cache Backplane Adaptor Adaptor I/O I/O Separate sets of pins for different functions bus Caches Graphics bus (for fast frame buffer) I/O busses are connected to the backplane bus Advantage: ses can run at different speeds Much less overall loading! Chap6 -

Synchronous and Asynchronous Synchronous : Includes a clock in the control lines A fixed protocol for communication that is relative to the clock Advantage: involves very little logic and can run very fast Disadvantages: Every device on the bus must run at the same clock rate To avoid clock skew, they cannot be long if they are fast Asynchronous : It is not clocked It can accommodate a wide range of devices It can be lengthened without worrying about clock skew It requires a handshaking protocol Chap6-2 ses so far Master Slave Control Lines Address Lines Data Lines Master: has ability to control the bus, initiates transaction Slave: module activated by the transaction Communication Protocol: specification of sequence of events and timing requirements in transferring information Asynchronous Transfers: control lines (req, ack) serve to orchestrate sequencing Synchronous Transfers: sequence relative to common clock Chap6-3

Arbitration: Obtaining Access to the Master Control: Master initiates requests Data can go either way Slave One of the most important issues in bus design: How is the bus reserved by a device that wishes to use it? Chaos is avoided by a master-slave arrangement: Only the bus master can control access to the bus: It initiates and controls all bus requests A slave responds to read and write requests The simplest system: Processor is the only bus master All bus requests must be controlled by the processor Major drawback: the processor is involved in every transaction Chap6-4 Multiple Potential Masters: Need for Arbitration arbitration scheme: A bus master wanting to use the bus asserts the bus request A bus master cannot use the bus until its request is granted A bus master must signal to the arbiter the end of the bus utilization arbitration schemes usually try to balance two factors: priority: the highest priority device should be serviced first Fairness: Even the lowest priority device should never be completely locked out from the bus arbitration schemes can be divided into four broad classes: Daisy chain arbitration Centralized, parallel arbitration Distributed arbitration by self-selection: each device wanting the bus places a code indicating its identity on the bus Distributed arbitration by collision detection: Each device just goes for it Problems found after the fact Chap6-5

TheDaisyChainArbitrationsScheme Device Highest Priority Device 2 Device N Lowest Priority Arbiter Grant Grant Grant Release Request wired-or Advantage: simple Disadvantages: Cannot assure fairness: A low-priority device may be locked out indefinitely The use of the daisy chain grant signal also limits the bus speed Chap6-6 Centralized Parallel Arbitration Device Device 2 Device N Grant Req Arbiter Used in essentially all processor-memory busses and in high-speed I/O busses Chap6-7

Simple Synchronous Protocol BReq BG R/W Address Data Cmd+Addr Data Data2 Even memory busses are more complex than this memory (slave) may take time to respond it may need to control data rate Chap6-8 Typical Synchronous Protocol BReq BG R/W Address Cmd+Addr Wait Data Data Data Data2 Slave indicates when it is prepared for data xfer Actual transfer goes at bus rate Chap6-9

993 Backplane/IO Survey S TurboChannel MicroChannel PCI Originator Sun DEC IBM Intel Clock Rate (MHz) 6-25 25-25 async 33 Addressing Virtual Physical Physical Physical Data Sizes (bits) 8,6,32 8,6,24,32 8,6,24,32,64 8,6,24,32,64 Master Multi Single Multi Multi Arbitration Central Central Central Central 32 bit read (MB/s) 33 25 2 33 Peak (MB/s) 89 84 75 (222) Max Power (W) 6 26 3 25 Chap6-2 Processor Interface Issues Processor interface Interrupts mapped I/O I/O Control Structures Polling Interrupts DMA I/O Controllers I/O Processors Capacity, Access Time, Bandwidth Interconnections ses Chap6-2

I/O Interface Independent I/O CPU memory bus Interface Interface Separate I/O instructions (in,out) Peripheral Peripheral common memory & I/O bus CPU Interface Peripheral Interface Peripheral Lines distinguish between I/O and memory transfers VME bus Multibus-II Nubus 4 Mbytes/sec optimistically MIP processor completely saturates the bus! Chap6-22 Mapped I/O CPU Single & I/O No Separate I/O Instructions ROM Interface Interface RAM CPU Peripheral Peripheral $ I/O L2 $ I/O bus Adaptor Chap6-23

Interrupt Driven Data Transfer CPU IOC () I/O interrupt (2) save PC add sub and or nop user program device User program progress only halted during actual transfer (3) interrupt service addr (4) transfers at ms each: interrupts @ 2 µsec per interrupt interrupt service @ 98 µsec each = CPU seconds read store rti memory -6 Device xfer rate = MBytes/sec => x sec/byte => µsec/byte => bytes = µsec transfers x µsecs = ms = CPU seconds Still far from device transfer rate! /2 in interrupt overhead interrupt service routine Chap6-24 Programmed I/O (Polling) CPU IOC device Is the data ready? yes read data done? yes store data no no busy wait loop not an efficient way to use the CPU unless the device is very fast! but checks for I/O completion can be dispersed among computationally intensive code Chap6-25

Delegating I/O Responsibility from the CPU: IOP CPU IOP D Mem main memory bus D2 I/O bus Dn target device where cmnds are () Issues instruction to IOP CPU IOP (3) Device to/from memory transfers are controlled by the IOP directly (4) IOP interrupts CPU when done (2) memory IOP steals memory cycles OP Device Address IOP looks in memory for commands what to do OP Addr Cnt Other where to put data how much special requests Chap6-26 Delegating I/O Responsibility from the CPU: DMA CPU sends a starting address, direction, and length count to DMAC Then issues "start" Direct Access (DMA): External to the CPU Act as a master on the bus Transfers blocks of data to or from memory without CPU intervention CPU DMAC IOC device DMAC provides handshake signals for Peripheral Controller, and Addresses and handshake signals for Chap6-27

Direct Access CPU sends a starting address, direction, and length count to DMAC Then issues "start" CPU Time to do xfers at msec each: DMA set-up sequence @ 5 µsec interrupt @ 2 µsec interrupt service sequence @ 48 µsec second of CPU time ROM DMAC IOC Mapped I/O RAM device DMAC provides handshake signals for Peripheral Controller, and Addresses and handshake signals for n Peripherals DMAC Chap6-28 Redundant Arrays of (Inexpensive) Disks Files are "striped" across multiple disks Redundancy yields high data availability Availability: service still provided to user, even if some components failed Disks will still fail Contents reconstructed from data array Capacity penalty to store redundant info Bandwidth penalty to update redundant info Techniques: Mirroring/Shadowing redundantly stored in the (high capacity cost) Horizontal Hamming Codes (overkill) Parity & Reed-Solomon Codes Failure Prediction (no capacity overhead!) VaxSimPlus Technique is controversial Chap6-29

RAID : Disk Mirroring/Shadowing recovery group Each disk is fully duplicated onto its "shadow" Very high availability can be achieved Bandwidth sacrifice on write: Logical write = two physical writes Reads may be optimized Most expensive solution: % capacity overhead Targeted for high I/O rate, high availability environments Chap6-3 RAID 3: Parity Disk logical record Striped physical records Parity computed across recovery group to protect against hard disk failures 33% capacity cost for parity in this configuration wider arrays reduce capacity costs, decrease expected availability, increase reconstruction time Arms logically synchronized, spindles rotationally synchronized logically a single high capacity, high transfer rate disk Targeted for high bandwidth applications: Scientific, Image Processing P Chap6-3

Inspiration for RAID 4 RAID 3 relies on parity disk to discover errors on Read But every sector has an error detection field Rely on error detection field to catch errors on read, not on the parity disk Allows independent reads to different disks simultaneously Chap6-32 Redundant Arrays of Inexpensive Disks RAID 4: High I/O Rate Parity Increasing Logical D D D2 D3 P Disk Address Insides of 5 disks D4 D5 D6 D7 P D8 D9 D D P Example: small read D & D5, large write D2-D5 D2 D3 D4 D5 D6 D7 D8 D9 D2 D2 D22 D23 P Disk Columns P P Stripe Chap6-33

Inspiration for RAID 5 RAID 4 works well for small reads Small writes (write to one disk): Option : read other data disks, create new sum and write to Parity Disk Option 2: since P has old sum, compare old data to new data, add thedifferencetop Small writes are limited by Parity Disk: Write to D, D5 both also write to P disk D D D2 D3 P D4 D5 D6 D7 P Chap6-34 Redundant Arrays of Inexpensive Disks RAID 5: High I/O Rate Interleaved Parity Independent writes possible because of interleaved parity Example: write to D, D5 uses disks,, 3, 4 D D D2 D3 P D4 D5 D6 P D7 D8 D9 P D D D2 P D3 D4 D5 P D6 D7 D8 D9 D2 D2 D22 D23 P Disk Columns Increasing Logical Disk Addresses Chap6-35

Problems of Disk Arrays: Small Writes RAID-5: Small Write Algorithm Logical Write = 2 Physical Reads + 2 Physical Writes D' D D D2 D3 P new data old data ( Read) old (2 Read) parity + XOR + XOR (3 Write) (4 Write) D' D D2 D3 P' Chap6-36 Subsystem Organization host host adapter array controller single board disk controller manages interface to host, DMA control, buffering, parity logic physical device control striping software off-loaded from host to array controller no applications modifications no reduction of host performance single board disk controller single board disk controller single board disk controller often piggy-backed in small format devices Chap6-37

System Availability: Orthogonal RAIDs String Controller Array Controller String Controller String Controller String Controller String Controller String Controller Data Recovery Group: unit of data redundancy Redundant Support Components: fans, power supplies, controller, cables End to End Data Integrity: internal parity protected data paths Chap6-38 System-Level Availability host I/O Controller Fully dual redundant host I/O Controller Array Controller Array Controller Goal: No Single Points of Failure Recovery Group with duplicated paths, higher performance can be obtained when there are no failures Chap6-39

Summary: RAID Techniques: Goal was performance, popularity due to reliability of storage Disk Mirroring, Shadowing (RAID ) Each disk is fully duplicated onto its "shadow" Logical write = two physical writes % capacity overhead ParityDataBandwidthArray(RAID3) Parity computed horizontally Logically a single high data bw disk High I/O Rate Parity Array (RAID 5) Interleaved parity blocks Independent reads and writes Logical write = 2 reads + 2 writes Chap6-4 Network Attached Storage (NAS) Decreasing Disk Diameters 4"» "» 8"» 525"» 35"» 25"» 8"» 3"» high bandwidth disk systems based on arrays of disks Network provides well defined physical and logical interfaces: separate CPU and storage system! High Performance Storage Service on a High Speed Network Network File Services OS structures supporting remote file access 3Mb/s»Mb/s»5Mb/s»Mb/s»Gb/s» Gb/s networks capable of sustaining high bandwidth transfers Increasing Network Bandwidth Chap6-4

Summary es are important for building large-scale systems Their speed is critically dependent on factors such as length, number of devices, etc Critically limited by capacitance Tricks: esoteric drive technology such as GTL Important terminology: Master: The device that can initiate new transactions Slaves: Devices that respond to the master Two types of bus timing: Synchronous: bus includes clock Asynchronous: no clock, just REQ/ACK strobing Direct Access (DMA) allows fast, burst transfer into processor s memory: Processor s memory acts like a slave Probably requires some form of cache-coherence so that DMA ed memory can be invalidated from cache Chap6-42 I/O Summary: I/O performance limited by weakest link in chain between OS and device Three Components of Disk Access Time: Seek Time: advertised to be 8 to 2 ms May be lower in real life Rotational Latency: 4 ms at 72 RPM and 83 ms at 36 RPM Transfer Time: 2 to 2 MB per second I/O device notifying the operating system: Polling: it can waste a lot of processor time I/O interrupt: similar to exception except it is asynchronous Delegating I/O responsibility from the CPU: DMA, or even IOP Chap6-43