Operating Systems 2010/2011


Operating Systems 2010/2011. Input/Output Systems, part 2 (ch13, ch12). Shudong Chen

Recap: Discuss the principles of I/O hardware and its complexity. Explore the structure of an operating system's I/O subsystem: the application I/O interface.

Layered view on I/O:
- kernel API: listen/accept/bind/connect/send/receive
- driver level: standardizes the device interface (+ seek); raw device
- layered protocol stack

Agenda: I/O Hardware; Application I/O Interface; Kernel I/O Subsystem; Transforming I/O Requests to Hardware Operations; Streams; Performance (disk scheduling); RAID. TU/e Computer Science, System Architecture and Networking, 04/01/2011

I/O subsystem services: scheduling, buffering, caching, spooling, device reservation, error handling, protection.

I/O scheduling. Scheduling determines a good order in which to execute a set of I/O requests, to improve overall system performance, share device access fairly among processes, and reduce the average waiting time for I/O to complete (example: disk scheduling, see later). The OS maintains a wait queue of requests for each device; when an I/O system call is issued, the request is placed on the queue for that device. The I/O scheduler rearranges the order of the queue to improve overall system efficiency and the average response time of requests; this is the essence of I/O scheduling. Some OSs try for fairness, so that no application receives especially poor service; some OSs give priority service to delay-sensitive requests, e.g., requests from the virtual memory subsystem take priority over application requests.
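The per-device wait queue with priority reordering can be sketched as follows; this is a minimal illustration with hypothetical names, not the scheduler of any particular OS:

```python
import heapq

# Sketch of a per-device I/O wait queue: requests are queued per device,
# and the scheduler dequeues higher-priority (lower number) requests first,
# e.g., virtual-memory paging requests ahead of ordinary application I/O.
class DeviceQueue:
    def __init__(self):
        self._heap = []   # entries: (priority, seq, request)
        self._seq = 0     # FIFO tie-breaker within the same priority

    def submit(self, request, priority=1):
        heapq.heappush(self._heap, (priority, self._seq, request))
        self._seq += 1

    def next_request(self):
        # The scheduler picks the best queued request, not arrival order.
        return heapq.heappop(self._heap)[2]

q = DeviceQueue()
q.submit("app read block 10")
q.submit("pager read block 99", priority=0)  # delay-sensitive: jumps the queue
q.submit("app write block 11")
assert q.next_request() == "pager read block 99"
assert q.next_request() == "app read block 10"
```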

Device-status table. To keep track of many I/O requests at the same time, the OS attaches the wait queues to a device-status table. The table contains an entry for each I/O device; each entry indicates the device's type, address, and state. The type of each request, with its parameters, is stored in the table entry for that device.

Buffering, motivation. A buffer is a memory area that stores data being transferred between two devices or between a device and an application. Beyond general efficiency, buffering serves three purposes. First, to cope with device speed mismatch, e.g., storing a file received from a modem on a hard disk: the modem is slower than the disk, so bytes are accumulated in a modem buffer, and once the buffer is full the data is written to the disk in a single operation. Second, to cope with device transfer size mismatch, as in fragmentation and reassembly of network messages: large messages are fragmented into small packets for sending, and at the receiving side the packets are placed in a reassembly buffer to form an image of the source data. Third, to maintain copy semantics: the write() system call uses a kernel buffer to store the application's data, the disk write is performed from the kernel buffer, and subsequent changes to the application buffer have no effect.
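Copy semantics can be illustrated in a few lines; this is a toy model, not real kernel code:

```python
# Sketch of copy semantics: write() snapshots the application buffer into a
# kernel buffer, so later changes to the application buffer do not affect
# the data that will actually be written to disk.
kernel_buffers = []

def write(app_buffer):
    kernel_buffers.append(bytes(app_buffer))  # copy into a "kernel" buffer

app_buf = bytearray(b"hello")
write(app_buf)
app_buf[0:5] = b"HELLO"               # modified after the call returns...
assert kernel_buffers[0] == b"hello"  # ...but the queued write is unchanged
```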

Buffering alternatives (figure from Stallings, Operating Systems Principles).

Buffer use schemes for producer/consumer. No buffering: either party provides a buffer address; at least that party may not be swapped out, even though the operation may be queued or delayed (this can be relaxed to: the shared buffer must not be swapped out); producer and consumer must be in the same context during the transfer (or must share this buffer's context); no concurrency. One buffer: the process can be swapped out; needs management; buffers become internal kernel data structures. Notice that swapping the process out makes no sense if the pending operation is input from the same I/O device (disk) that swapping goes onto.

Schemes for producer/consumer, continued. Two buffers (buffer swapping): admits concurrency between producer and consumer, e.g., an interrupt handler and the data's usage, with one buffer being copied from kernel to user space or used by the running application while the other fills; or two processors running consumer and producer concurrently, including their kernel activities. Disadvantage: latency, proportional to the buffer size. More buffers: the general consumer/producer scheme, implemented with circular buffers (fixed in size) or buffer queues (extensible); admits bursts.

Caching, spooling, and device reservation. Caching: fast memory holding copies of data, always just a copy, e.g., files shared among applications, or rapidly written and reread files. Caching is key to performance: access to the cached copy is more efficient than access to the original, and if the cache is in main memory, physical I/O can be avoided, as in the delayed-writes strategy (accumulate writes in a buffer to allow large transfers). Spooling: hold output for a device that can serve only one request at a time, e.g., printing. Each application's output is spooled to a separate disk file; after one print job finishes, the spooling system queues the corresponding spool file for output to the printer. Device reservation: provides exclusive access to a device, via system calls for allocation and deallocation of an idle device. Watch out for deadlock: enforce a limit of one open file handle to such a device, or provide functions that let processes coordinate exclusive access among themselves.

Error handling. Devices and I/O transfers can fail in many ways, for transient reasons (an overloaded network) or permanent reasons (a defective disk controller). The OS can recover from transient failures: a failed disk read() is handled by a read() retry, a failed network send() by a resend(). Most I/O requests return an error number or code to applications when they fail (errno in Unix); detailed error information provided by the hardware is hidden from applications. System error logs hold problem reports.

I/O protection. A user process may accidentally or purposefully attempt to disrupt normal operation via illegal I/O instructions. Therefore all I/O instructions are defined to be privileged: users cannot issue I/O instructions directly, but only through the OS, so I/O must be performed via system calls. Memory-mapped and I/O-port memory locations must be protected too. (Figure: use of a system call to perform I/O.)

Kernel data structures. The kernel keeps state information for I/O components, including open-file tables, network connections, and character device state, plus many complex data structures to track buffers, memory allocation, and dirty blocks. Unix encapsulates the differences among these components within a uniform structure using an object-oriented technique: the open-file record contains a dispatch table that holds pointers to the appropriate routines, depending on the type of file. (Figure: UNIX I/O kernel structure.) Windows NT uses a message-passing implementation for I/O: requests are converted into messages, sent to the I/O manager, and then to the device driver. Pros and cons: this simplifies the structure and design of the I/O system and adds flexibility, but also adds overhead.


Basic handshaking between host and controller. Handshaking example (output operation); the following loop is repeated for each byte:
1. The host repeatedly reads the busy bit until that bit becomes clear (the CPU is busy-waiting, or polling).
2. The host sets the write bit in the command register and writes a byte into the data-out register.
3. The host sets the command-ready bit.
4. When the controller notices that the command-ready bit is set, it sets the busy bit.
5. The controller reads the command register and sees the write command. It reads the data-out register to get the byte and does the I/O to the device.
6. The controller clears the command-ready bit, clears the error bit in the status register to indicate that the device I/O succeeded, and clears the busy bit to indicate that it is finished.
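The handshake can be simulated in software; this is a toy single-threaded model with hypothetical names (in real hardware the controller runs concurrently with the host):

```python
# Toy simulation of the polling handshake: the host spins on the busy bit,
# writes one byte to the data-out register, and sets command-ready; the
# controller then performs the device I/O and clears the bits (steps 4-6).
class Controller:
    def __init__(self):
        self.busy = False
        self.command_ready = False
        self.data_out = None
        self.device = bytearray()   # bytes "written to the device"

    def tick(self):
        if self.command_ready:
            self.busy = True
            self.device.append(self.data_out)  # do the device I/O
            self.command_ready = False
            self.busy = False

def host_write(ctrl, data):
    for byte in data:
        while ctrl.busy:            # step 1: poll the busy bit
            pass
        ctrl.data_out = byte        # step 2: byte into data-out register
        ctrl.command_ready = True   # step 3: set command-ready
        ctrl.tick()                 # controller side runs here in this toy model

ctrl = Controller()
host_write(ctrl, b"io")
assert bytes(ctrl.device) == b"io"
```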

Transforming I/O requests to hardware operations. Consider reading a file from disk for a process: determine the device holding the file; translate the file name to a device representation (the application refers to the data by a file name, and within a disk the file system maps from the file name, through the FS descriptors, to the space allocation of the file); physically read the data from disk into a buffer; make the data available to the requesting process; return control to the process. OSs obtain flexibility from the multiple stages of lookup tables in the path between a request and a physical device controller.
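The multiple lookup stages can be sketched with two toy tables; all table contents and names here are hypothetical:

```python
# Sketch of the lookup path: path -> mount table -> device, then
# file name -> FS descriptor -> block allocation on that device.
MOUNTS = {"/home": "disk1", "/": "disk0"}                    # mount table
FS_DESCRIPTORS = {("disk1", "report.txt"): [120, 121, 344]}  # allocated blocks

def resolve(path):
    # Longest-prefix match selects the mount point, hence the device.
    mount = max((m for m in MOUNTS if path.startswith(m)), key=len)
    device = MOUNTS[mount]
    name = path[len(mount):].lstrip("/")
    # The FS descriptor then yields the file's space allocation.
    return device, FS_DESCRIPTORS[(device, name)]

assert resolve("/home/report.txt") == ("disk1", [120, 121, 344])
```

Each stage is a table that can be swapped or extended, which is where the flexibility comes from.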

Life cycle of a blocking I/O request (figure).


STREAMS. A STREAM is a full-duplex communication channel between a user-level process and a device, in Unix System V and beyond. A STREAM consists of a STREAM head that interfaces with the user process, a driver end that interfaces with the device, and zero or more STREAM modules between them. Each module contains a read queue and a write queue; message passing is used to communicate between queues in adjacent modules. STREAMS functionality: ioctl() to push modules onto a stream; write() or putmsg() to write data to a device; read() or getmsg() to read data from the stream head. (Figure: the STREAMS structure.)


Performance. I/O is a major factor in system performance: it demands CPU time to execute device-driver and kernel I/O code, causes context switches due to interrupts, and loads down the memory bus during data copying; network traffic is especially stressful, causing a high context-switch rate. Improving performance: reduce the number of context switches; reduce the amount of data copying; reduce the frequency of interrupts by using large transfers, smart controllers, or polling (if busy waiting can be minimized); use DMA or channels to offload simple data copying from the CPU; balance CPU, memory, bus, and I/O performance for the highest throughput.

Scheduling example: rotating disk. Information is stored on platters by recording it magnetically. A read-write head flies above each platter; the heads are attached to a disk arm that moves all the heads as a unit. Each surface is logically divided into circular tracks, which are subdivided into sectors; the set of tracks at one arm position makes up a cylinder. Most disks rotate 60 to 200 times per second. The numbering of tracks and sectors per surface takes head movement into account; platters have two or more sides.

Disk scheduling. The operating system is responsible for using hardware efficiently; for the disk drives, this means achieving fast access time and high disk bandwidth. Access time (positioning time) has two major components: seek time, the time for the disk arm to move the heads to the cylinder containing the desired sector, and rotational latency, the additional time waiting for the disk to rotate the desired sector to the disk head. To optimize disk access time for a series of requests, minimize seek time. Disk bandwidth (transfer rate) is the total number of bytes transferred, divided by the total time between the first request for service and the completion of the last transfer. We can improve both the access time and the bandwidth by managing the order in which disk I/O requests are serviced.

Disk scheduling (cont.). Several algorithms exist to schedule the servicing of disk I/O requests. We illustrate them with a request queue for I/O to blocks on cylinders 98, 183, 37, 122, 14, 124, 65, 67, with the head starting at cylinder 53.

FCFS (first-come, first-served). Intrinsically fair, but does not provide the fastest service. The illustration shows a total head movement of 640 cylinders.

SSTF (shortest-seek-time-first). Selects the request with the minimum seek time from the current head position. SSTF scheduling is a form of SJF scheduling and may cause starvation of some requests. The illustration shows a total head movement of 236 cylinders.
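The head-movement totals quoted above can be checked in a few lines; a minimal sketch with hypothetical function names, using the slide's request queue:

```python
# Total head movement for FCFS and SSTF on the request queue
# 98, 183, 37, 122, 14, 124, 65, 67 with the head starting at cylinder 53.
REQUESTS = [98, 183, 37, 122, 14, 124, 65, 67]
START = 53

def fcfs(requests, head):
    # Service requests strictly in arrival order.
    total = 0
    for r in requests:
        total += abs(head - r)
        head = r
    return total

def sstf(requests, head):
    # Always service the pending request closest to the current head position.
    pending, total = list(requests), 0
    while pending:
        r = min(pending, key=lambda c: abs(head - c))
        total += abs(head - r)
        head = r
        pending.remove(r)
    return total

assert fcfs(REQUESTS, START) == 640   # matches the FCFS illustration
assert sstf(REQUESTS, START) == 236   # matches the SSTF illustration
```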

SCAN. The disk arm starts at one end of the disk and moves toward the other end, servicing requests until it gets to the other end of the disk, where the head movement is reversed and servicing continues. Sometimes called the elevator algorithm. The illustration shows a total head movement of 208 cylinders.

C-SCAN (circular SCAN). Treats the cylinders as a circular list that wraps around from the last cylinder to the first one. The head moves from one end of the disk to the other, servicing requests as it goes; when it reaches the other end, however, it immediately returns to the beginning of the disk without servicing any requests on the return trip. Provides a more uniform wait time than SCAN.

C-LOOK. A version of C-SCAN in which the arm only goes as far as the last request in each direction, then reverses direction immediately, without first going all the way to the end of the disk.

Selecting a disk-scheduling algorithm. SSTF is common and has a natural appeal: it increases performance over FCFS. SCAN and C-SCAN perform better for systems that place a heavy load on the disk, and they are less likely to cause starvation. Requests for disk service can be influenced by the file-allocation method: reading a contiguously allocated file may result in less head movement than reading a linked or indexed file. The location of directory and index blocks is also important; caching can help. The disk-scheduling algorithm should be written as a separate module of the operating system, allowing it to be replaced with a different algorithm if necessary. Either SSTF or LOOK is a reasonable choice for the default algorithm.

Agenda: I/O Hardware; Application I/O Interface; Kernel I/O Subsystem; Transforming I/O Requests to Hardware Operations; Streams; Performance (disk scheduling); RAIDs (Redundant Arrays of Independent Disks).

Availability and reliability. System reliability is the ability of a system or component to perform its required functions under stated conditions for a specified period of time (IEEE definition), and is characterized by the mean time to failure (MTTF). Formally, the probability of functioning for a period t is reliability(t) = exp(-t/MTTF), a negative exponential distribution with expected value MTTF; alternatively, MTTF is the expected time until the system fails. System availability is the degree to which a system or component is operational and accessible when required for use (IEEE); formally, the probability that the system is functioning, or equivalently, the fraction of time it is functioning.

Relationship. Can a system be highly available while unreliable? Yes: assume it fails every second, while recovering in 0.01 second. Highly reliable while unavailable? Yes: take an MTTF of a year and no recovery. Hence the (mean) time to repair (MTTR) plays a role. Formally, Availability = MTTF / (MTTF + MTTR). Increasing MTTF improves both reliability and availability.
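The two formulas above, applied to the slide's examples (a minimal sketch; the function names are ours):

```python
import math

# Availability = MTTF / (MTTF + MTTR); reliability(t) = exp(-t / MTTF).
def availability(mttf, mttr):
    return mttf / (mttf + mttr)

def reliability(t, mttf):
    return math.exp(-t / mttf)

# Fails every second, recovers in 0.01 s: unreliable but ~99% available.
assert round(availability(1.0, 0.01), 2) == 0.99

# MTTF of a year with no recovery (MTTR -> infinity): reliable but unavailable.
year = 365 * 24 * 3600.0
assert availability(year, float("inf")) == 0.0
assert 0.36 < reliability(year, year) < 0.37   # exp(-1) after one MTTF
```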

How to increase MTTF. Generally, increasing the number of components decreases the reliability of a system, since there are more opportunities for failing: MTTF(N disks) = MTTF(1 disk) / N. Only particular organizations and combinations increase reliability. RAID combines a series of disks to improve average data access times, total throughput, and reliability, through redundancy and replication. It is based on a paper by Patterson et al.: Patterson, Gibson, and Katz, "A case for redundant arrays of inexpensive disks (RAID)", Proc. of the 1988 ACM SIGMOD International Conference on Management of Data, 1988; RAID was positioned as opposed to a SLED, a single large expensive disk. "Inexpensive" has since been superseded by "Independent".

RAID terminology. RAID levels 0..6: the levels indicate particular approaches, not monotonic improvements. Stripes (or strips) are the logical units of data, fragments whose size depends on the RAID level. Striping (striped) refers to the mapping of stripes to disks, defined by the RAID level.

Level 0: increase performance. Level 0 distributes a single logical disk by a round-robin mapping of stripes: concurrent access to consecutive strips gives a high transfer rate, and concurrent handling of independent requests gives short response times. (Pictures taken from Wikipedia.)
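The round-robin mapping can be written down directly; a sketch with a hypothetical function name:

```python
# RAID 0 round-robin mapping: logical stripe i lives on disk i % N at
# offset i // N, so consecutive stripes land on different disks and can
# be accessed concurrently.
def raid0_map(stripe, n_disks):
    return stripe % n_disks, stripe // n_disks   # (disk, stripe-on-disk)

assert [raid0_map(i, 4) for i in range(6)] == [
    (0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1)
]
```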

Level 1: mirror. Level 1 makes a full copy of a disk (or of another RAID-level system): concurrent read requests; double writes (a slight penalty for synchronization); simple recovery.

Level 2: error-correcting codes. Level 2 uses error-correcting codes over small strips, at the bit level; the ECC bits go to separate disks. Properties: all disks are involved in all reading and writing; less space is wasted than in level 1. It is regarded as overkill, since it does not take the failure model into account (a whole disk fails, not just a bit).

Levels 3+4: parity strip. In level 3 a strip is one byte (or bit); in level 4 a strip is one block. Parities are stored on an extra disk, which serves as a backup for an arbitrary failing disk: the values on the missing disk can be recomputed. Properties: concurrent access to stripes, giving a high transfer rate for large transfers; level 4 allows independent transactions as well; the extra disk becomes a write hotspot.

Parity computation. The parity of two bits is P(b0, b1) = b0 xor b1: the parity is one iff the bits are unequal (exclusive or, xor). Assume X4 = X3 xor X2 xor X1 xor X0. When X1 is modified to X1', the new parity is
X4' = X3 xor X2 xor X1' xor X0
    = X3 xor X2 xor X1 xor X0 xor X1 xor X1'
    = X4 xor X1 xor X1'
so the parity disk can be updated from the old parity and the old and new values of X1 alone.
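The parity update rule and the recovery of a failed disk can be checked concretely; a toy sketch with 4-bit strips and hypothetical names:

```python
from functools import reduce
import operator

# Parity is the xor over all data strips; xoring the surviving strips with
# the parity recovers a missing strip.
def parity(strips):
    return reduce(operator.xor, strips)

x = [0b1010, 0b0110, 0b1111, 0b0001]   # X0..X3 (toy 4-bit strips)
p = parity(x)                          # X4, stored on the parity disk

# Incremental update when X1 changes: X4' = X4 xor X1 xor X1'
new_x1 = 0b0011
new_p = p ^ x[1] ^ new_x1
x[1] = new_x1
assert new_p == parity(x)              # same result as recomputing from scratch

# Recover a "failed" disk (say X2) from the others plus the parity
recovered = new_p ^ x[0] ^ x[1] ^ x[3]
assert recovered == x[2]
```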

Levels 5+6: distributed parity. Level 5 simply distributes the RAID 4 parity across the disks, which addresses the hotspot problem. Level 6 adds two redundancy bits computed by different data-check algorithms; published one year later (1989), it allows two drive failures.

RAID (0 + 1) and (1 + 0). In RAID 0 + 1, a set of disks is striped, and then the stripe is mirrored to another, equivalent stripe. In RAID 1 + 0, disks are mirrored in pairs, and then these mirrored pairs are striped. RAID 1 + 0 has theoretical advantages over RAID 0 + 1: a single disk failure does not make an entire stripe inaccessible.

Exercises: Ch12, exercises 2 and 8. Check the website.