VIII. Input/Output
Intended Schedule
No. | Date | Lecture | Hand out | Submission
0 | 20.04. | Introduction to Operating Systems | Course registration |
1 | 27.04. | Systems Programming using C (File Subsystem) | 1. Assignment |
2 | 04.05. | Systems Programming using C (Process Control) | 2. Assignment | 1. Assignment
3 | 11.05. | Process Scheduling | 3. Assignment | 2. Assignment
4 | 18.05. | Process Synchronization | 4. Assignment | 3. Assignment
5 | 25.05. | Inter Process Communication | 5. Assignment | 4. Assignment
  | 01.06. | Whit Monday (Pfingstmontag), public holiday | 6. Assignment | 5. Assignment
6 | 08.06. | Deadlocks | 7. Assignment | 6. Assignment
7 | 15.06. | Memory Management | 8. Assignment | 7. Assignment
8 | 22.06. | Memory Management (cont.) | 9. Assignment | 8. Assignment
9 | 29.06. | Input / Output | 10. Assignment | 9. Assignment
10 | 06.07. | Filesystems | | 10. Assignment
11 | 13.07. | Special subject: XQuery your Filesystem | |
12 | 20.07. | Wrap up session | |
  | 27.07. | First examination date | |
  | 12.10. | Second examination date | |
I/O Hardware
I/O Hardware
[Figure (Glatz 2005): structure of a computer system: control unit, processing unit, main memory, and input/output (I/O) connecting different peripherals, including mass storage devices]
I/O Hardware
- Incredible variety of I/O devices
- Common concepts:
  - Port: connection point for a device
  - Bus: daisy chain or shared direct access
  - Controller (host adapter)
- I/O instructions control devices
- Devices have addresses, used by:
  - direct I/O instructions
  - memory-mapped I/O
A Typical PC Bus Structure
Device I/O Port Locations on PCs (partial)
CPU-Device Interaction
Principal Choices for CPU-Device Interaction
- Programmed I/O (polling): the CPU waits (busy-waits) for the peripheral
- Interrupt-driven I/O: the peripheral signals readiness to the CPU by a (hardware) interrupt
- Direct Memory Access (DMA):
  - steals cycles from the CPU to transfer data from/to main memory without CPU interaction
  - needs a special DMA controller to execute the I/O independently of the CPU
  - the DMA controller signals completion to the CPU via a (hardware) interrupt
CPU-Device Interaction by Polling
Interface between host and device controller:
- Command-ready bit, busy bit, error bit
- Command register
- Data-in and data-out registers
Handshake for a write operation:
- The host (CPU) determines the state of the device by repeatedly reading the busy bit until it is clear.
- The host prepares the data in the data-out register, sets the command register to write, and sets the command-ready bit.
- Once the controller sees the command-ready bit, it sets the busy bit.
- The controller sees command = write, reads the data-out register, and performs the write I/O.
- The controller clears the command-ready, error, and busy bits.
The host thus runs a busy-wait cycle while waiting for I/O from the device.
Example (Glatz): serial interface, sending one character: read the status register; ready? If no, read the status register again; if yes, send the character; end.
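The handshake above can be sketched in C as a minimal, single-threaded simulation. The register layout, the bit values and the names `dev_regs`, `controller_step` and `poll_write_byte` are hypothetical; `controller_step` merely stands in for the controller, which on real hardware runs in parallel with the CPU. The registers are declared `volatile` because real device registers (reached through I/O instructions or memory-mapped addresses) can change independently of the program.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status/command encoding (assumption, not a real device). */
#define STATUS_BUSY          0x01u
#define STATUS_ERROR         0x02u
#define STATUS_COMMAND_READY 0x04u
#define CMD_WRITE            0x01u

/* Hypothetical controller register block. */
struct dev_regs {
    volatile uint8_t status;    /* busy, error and command-ready bits */
    volatile uint8_t command;   /* command register */
    volatile uint8_t data_in;   /* data from device to host (unused here) */
    volatile uint8_t data_out;  /* data from host to device */
};

/* Bytes "written" by the simulated device (writes beyond 64 bytes are dropped). */
static char sent[64];
static size_t nsent;

/* Simulated controller: reacts to command-ready as described on the slide. */
static void controller_step(struct dev_regs *r) {
    if (!(r->status & STATUS_COMMAND_READY))
        return;
    r->status |= STATUS_BUSY;                        /* controller sets busy */
    if (r->command == CMD_WRITE && nsent < sizeof sent)
        sent[nsent++] = (char)r->data_out;           /* reads data-out, performs the write */
    r->status &= (uint8_t)~(STATUS_COMMAND_READY | STATUS_ERROR | STATUS_BUSY);
}

/* Host side: write one byte by programmed I/O; returns 0 on success, -1 on device error. */
static int poll_write_byte(struct dev_regs *r, uint8_t byte) {
    while (r->status & STATUS_BUSY)          /* busy-wait until the controller is idle */
        controller_step(r);
    r->data_out = byte;                      /* place the byte in data-out */
    r->command = CMD_WRITE;                  /* select the write command */
    r->status |= STATUS_COMMAND_READY;       /* signal command-ready */
    while (r->status & STATUS_COMMAND_READY) /* busy-wait for completion */
        controller_step(r);
    return (r->status & STATUS_ERROR) ? -1 : 0;
}
```

Both loops in `poll_write_byte` are busy-wait cycles: the CPU does no useful work while it spins, which is acceptable only when the device usually becomes ready quickly.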
CPU-Device Interaction by Interrupts
- The CPU has an interrupt-request line (a hardware wire)
  - Triggered by I/O devices
  - Checked by the CPU after each instruction
- An interrupt handler receives the interrupts
- Interrupts are maskable, to ignore or delay some interrupts; some are nonmaskable
- An interrupt vector dispatches each interrupt to the correct handler
  - Based on priorities
  - Using the interrupt vector (and chaining when several handlers share one entry)
- CPU instruction cycle (Glatz): initialization, then a loop of FETCH (fetch instruction), EXECUTE (execute instruction), INTERRUPT CHECK
- The interrupt mechanism can also be used for exceptions, virtual memory paging, system calls, ...
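A minimal sketch of vector-based dispatch with masking and chaining, assuming a hypothetical table of 16 vectors with up to 4 chained handlers each. All names are illustrative and not a real kernel API. Each handler reports whether it serviced the interrupt, and the dispatcher walks the chain until one does; priorities and nonmaskable interrupts are not modeled.

```c
#include <stddef.h>

#define NVECTORS  16  /* hypothetical number of vector entries */
#define MAX_CHAIN 4   /* hypothetical maximum number of handlers per entry */

/* A handler returns 1 if it serviced the interrupt, 0 if it was not its device. */
typedef int (*irq_handler)(int vector);

static irq_handler vector_table[NVECTORS][MAX_CHAIN]; /* zero-initialized: all NULL */
static unsigned int irq_mask;                          /* bit v set: vector v is masked */

/* Append a handler to the chain of vector v; returns 0 on success, -1 if full or invalid. */
static int register_handler(int v, irq_handler h) {
    if (v < 0 || v >= NVECTORS) return -1;
    for (int i = 0; i < MAX_CHAIN; i++)
        if (vector_table[v][i] == NULL) { vector_table[v][i] = h; return 0; }
    return -1;
}

/* Called when the (simulated) request line carries vector v.
   Returns 1 if some handler serviced it, 0 if masked or unclaimed, -1 if invalid. */
static int dispatch_interrupt(int v) {
    if (v < 0 || v >= NVECTORS) return -1;
    if (irq_mask & (1u << v)) return 0;  /* masked: ignored here (hardware would typically keep it pending) */
    for (int i = 0; i < MAX_CHAIN && vector_table[v][i] != NULL; i++)
        if (vector_table[v][i](v)) return 1;  /* chaining: try each handler in turn */
    return 0;
}

/* Two example handlers sharing one vector. */
static int disk_calls, uart_calls;
static int disk_isr(int v) { (void)v; disk_calls++; return 0; } /* not its device: pass on */
static int uart_isr(int v) { (void)v; uart_calls++; return 1; } /* services the interrupt */
```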
Interrupt-Driven I/O Cycle
Intel Pentium Processor Event-Vector Table
Direct Memory Access
- Used to avoid programmed I/O (PIO, byte-wise data transfer between memory and devices) for large data movements
- Requires a DMA controller
- Bypasses the CPU to transfer data directly between the I/O device and memory
- While the DMA controller accesses memory, the CPU cannot access memory over the bus (cycle stealing)
- Signals completion by a CPU interrupt
- DMA may operate on real (physical) or on virtual memory addresses
Six Step Process to Perform DMA Transfer
DMA: More Details
[Figure: cycle stealing in single vs. block transfer; timelines of DMA requests, DMA cycles and processor cycles over time]
[Figure (Glatz 2005): DMA: direct vs. indirect data transfer; panels (A) programming the DMA controller, (B) direct transfer, (C) indirect transfer, each showing main memory, CPU, DMA controller and peripheral device]
Application I/O Interface
Application I/O Interface
- I/O system calls encapsulate device behaviors in generic classes
- The device-driver layer hides differences among I/O controllers from the kernel
- Devices vary in many dimensions:
  - character-stream or block
  - sequential or random-access
  - sharable or dedicated
  - speed of operation
  - read-write, read-only, or write-only
Driver hierarchy (Glatz), from top to bottom: application process; device management (I/O manager); class driver (logical driver); device/port driver (physical driver); peripheral controller (interface HW); device. The driver layers belong to the operating system (software); the controller and the device are hardware.
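Generic device classes behind a uniform driver interface can be sketched with a table of function pointers per driver, similar in spirit to UNIX device switch tables. The names `dev_ops`, `dev_read`, `dev_write` and the two toy drivers are hypothetical. The generic entry points never inspect device specifics, so supporting a new device means adding a driver, not changing the callers.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

struct device;

/* Hypothetical uniform driver interface. */
struct dev_ops {
    ssize_t (*read)(struct device *d, void *buf, size_t n);
    ssize_t (*write)(struct device *d, const void *buf, size_t n);
};

struct device {
    const char *name;
    const struct dev_ops *ops;
    void *state;                 /* driver-private data */
};

/* Generic kernel-side entry points: they only dispatch through the ops table. */
static ssize_t dev_read(struct device *d, void *buf, size_t n)        { return d->ops->read(d, buf, n); }
static ssize_t dev_write(struct device *d, const void *buf, size_t n) { return d->ops->write(d, buf, n); }

/* Driver 1: a "null" device that discards writes and reports end-of-file on reads. */
static ssize_t null_read(struct device *d, void *buf, size_t n) { (void)d; (void)buf; (void)n; return 0; }
static ssize_t null_write(struct device *d, const void *buf, size_t n) { (void)d; (void)buf; return (ssize_t)n; }
static const struct dev_ops null_ops = { .read = null_read, .write = null_write };

/* Driver 2: a tiny memory-backed FIFO device. */
struct membuf { char data[32]; size_t len; };

static ssize_t mem_read(struct device *d, void *buf, size_t n) {
    struct membuf *m = d->state;
    if (n > m->len) n = m->len;
    memcpy(buf, m->data, n);
    memmove(m->data, m->data + n, m->len - n);  /* consume what was read */
    m->len -= n;
    return (ssize_t)n;
}

static ssize_t mem_write(struct device *d, const void *buf, size_t n) {
    struct membuf *m = d->state;
    if (n > sizeof m->data - m->len) n = sizeof m->data - m->len;  /* accept only what fits */
    memcpy(m->data + m->len, buf, n);
    m->len += n;
    return (ssize_t)n;
}
static const struct dev_ops mem_ops = { .read = mem_read, .write = mem_write };
```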
A Kernel I/O Structure
Characteristics of I/O Devices
Block and Character Devices
- Block devices include disk drives
  - Commands include read, write, seek
  - Raw I/O or file-system access
  - Memory-mapped file access possible
- Character devices include keyboards, mice, serial ports
  - Commands include get, put
  - Libraries layered on top allow line editing
Network Devices
- Different enough from block and character devices to have their own interface
- UNIX and Windows NT/9x/2000 include a socket interface
  - Separates the network protocol from the network operation
  - Includes select() functionality
- Approaches vary widely (pipes, FIFOs, streams, queues, mailboxes)
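As a small illustration of the socket interface and select(), the following sketch (assuming a POSIX system) creates a connected pair of local sockets with socketpair() and uses select() with a timeout to check whether one end is readable, before and after data has been written to the other end. The helper names `wait_readable` and `socket_select_demo` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Returns 1 if fd becomes readable within timeout_ms, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms) {
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    struct timeval tv = { .tv_sec = timeout_ms / 1000, .tv_usec = (timeout_ms % 1000) * 1000 };
    int n = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (n < 0) return -1;
    return n > 0 && FD_ISSET(fd, &readfds);
}

/* results[0]: readability before anything was sent, results[1]: after sending one byte. */
static int socket_select_demo(int results[2]) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) return -1;
    results[0] = wait_readable(sv[1], 10);        /* nothing sent yet: times out */
    if (write(sv[0], "x", 1) != 1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    results[1] = wait_readable(sv[1], 1000);      /* one byte pending: readable */
    close(sv[0]);
    close(sv[1]);
    return 0;
}
```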
Clocks and Timers
- Provide current time, elapsed time, timers
- A programmable interval timer is used for timings and periodic interrupts
- ioctl (on UNIX) covers odd aspects of I/O such as clocks and timers
Blocking and Nonblocking I/O
- Blocking: the process is suspended until the I/O has completed
  - Easy to use and understand
  - Insufficient for some needs
- Nonblocking: the I/O call returns as much as is available
  - Examples: user interface, data copy (buffered I/O)
  - Can be implemented via multi-threading
  - Returns quickly with a count of the bytes read or written
- Asynchronous: the process runs while the I/O executes
  - Difficult to use
  - The I/O subsystem signals the process when the I/O has completed
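Nonblocking I/O can be demonstrated on a POSIX pipe: after O_NONBLOCK is set with fcntl(), read() on an empty pipe returns -1 immediately with errno set to EAGAIN (or EWOULDBLOCK) instead of suspending the process, and once data is available it returns only as many bytes as are present. A minimal sketch; the helper names `set_nonblocking` and `nonblocking_demo` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put fd into nonblocking mode; returns 0 on success, -1 on error. */
static int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Reads once from an empty pipe and once after writing 5 bytes. */
static int nonblocking_demo(ssize_t *first, int *first_errno, ssize_t *second) {
    int p[2];
    char buf[16];
    int rc = -1;
    if (pipe(p) < 0) return -1;
    if (set_nonblocking(p[0]) == 0) {
        errno = 0;
        *first = read(p[0], buf, sizeof buf);     /* empty pipe: returns -1 at once */
        *first_errno = errno;                     /* EAGAIN or EWOULDBLOCK */
        if (write(p[1], "hello", 5) == 5) {
            *second = read(p[0], buf, sizeof buf); /* only 5 of 16 requested bytes available */
            rc = 0;
        }
    }
    close(p[0]);
    close(p[1]);
    return rc;
}
```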
Two I/O Methods: Synchronous and Asynchronous
Kernel I/O Subsystem
- Scheduling: which I/O request to serve next?
  - Some I/O request ordering via a per-device queue
  - Some OSs try fairness
- Buffering: store data in memory while transferring between devices
  - To cope with a device speed mismatch
  - To cope with a device transfer size mismatch
  - To maintain "copy semantics" (the data written is the version at the time of the system call, even if the application modifies its buffer afterwards)
- Caching: fast memory holding a copy of data
  - Always just a copy
  - Key to performance
- Spooling: hold output for a device
  - If the device can serve only one request at a time, e.g., printing
- Device reservation: provides exclusive access to a device
  - System calls for allocation and deallocation
  - Watch out for deadlock!
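Buffering to cope with a speed mismatch is commonly done with a circular (ring) buffer between a producer and a consumer. The sketch below is single-threaded and assumes a hypothetical capacity of 8 bytes; a kernel buffer shared between an interrupt handler and a reading process would additionally need synchronization.

```c
#include <stdbool.h>
#include <stddef.h>

#define RB_SIZE 8  /* hypothetical buffer capacity in bytes */

struct ringbuf {
    unsigned char data[RB_SIZE];
    size_t head;   /* next slot to write (producer side, e.g. the device) */
    size_t tail;   /* next slot to read (consumer side, e.g. the process) */
    size_t count;  /* number of bytes currently buffered */
};

/* Store one byte; returns false if the buffer is full (fast producer must wait). */
static bool rb_put(struct ringbuf *rb, unsigned char c) {
    if (rb->count == RB_SIZE) return false;
    rb->data[rb->head] = c;
    rb->head = (rb->head + 1) % RB_SIZE;  /* wrap around at the end */
    rb->count++;
    return true;
}

/* Remove one byte in FIFO order; returns false if the buffer is empty. */
static bool rb_get(struct ringbuf *rb, unsigned char *c) {
    if (rb->count == 0) return false;
    *c = rb->data[rb->tail];
    rb->tail = (rb->tail + 1) % RB_SIZE;
    rb->count--;
    return true;
}
```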
Transforming I/O Requests to Hardware Operations
Consider reading a file from disk for a process:
1. Determine the device holding the file
2. Translate the name to the device representation
3. Physically read the data from disk into a buffer
4. Make the data available to the requesting process
5. Return control to the process
[Figure (Glatz): application processes 1 to n (software) use the device management (I/O manager), which accesses drivers A, B, ..., X through the driver interface; each driver operates its controller and device (A, B, ..., X) in hardware]
Performance
I/O is a major factor in system performance:
- It demands CPU time to execute the device driver and kernel I/O code
- Context switches due to interrupts
- Data copying
- Network traffic is especially stressful
Improving performance:
- Reduce the number of context switches
- Reduce data copying in memory between application and device
- Reduce the frequency of interrupts by using large transfers, smart controllers, polling
- Use DMA to increase concurrency (offload CPU)
- Move processing into device controllers (offload CPU & bus)
- Balance CPU, memory, bus, and I/O performance for the highest throughput; an overload in one of these will leave the others idle!
Mass Storage
Mass Storage Overview
Secondary storage:
- Magnetic disks provide the bulk of secondary storage in modern computers
  - Disks can be removable
- Drives are attached to the computer via an I/O bus (e.g., EIDE, ATA, SATA, USB, FC, SCSI, ...)
  - The host controller (in the computer) talks to the disk controller (in the device) via this bus
Tertiary storage:
- Low cost is the defining characteristic
- Typically uses removable media
- Considered off-line storage; robotic machinery can turn it into near-line storage
- Examples: magnetic tapes, floppy disks, CDs, DVDs, ...
Moving-Head Disk Mechanism
Disk access time = queueing delay + seek time + rotational delay + transfer time
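As a worked example of this formula, consider a hypothetical disk (all numbers are assumptions for illustration, not data from the lecture): 7200 rpm, an average seek time of 9 ms, a 4 KiB transfer at a sustained rate of 100 MB/s, and an empty request queue. On average, the desired sector is half a revolution away from the head:

```latex
\begin{aligned}
t_{\text{rev}} &= \frac{60\,\text{s}}{7200} \approx 8.33\,\text{ms},
\qquad \bar{t}_{\text{rot}} = \tfrac{1}{2}\,t_{\text{rev}} \approx 4.17\,\text{ms} \\
t_{\text{transfer}} &= \frac{4096\,\text{B}}{100 \cdot 10^{6}\,\text{B/s}} \approx 0.04\,\text{ms} \\
t_{\text{access}} &= t_{\text{queue}} + t_{\text{seek}} + \bar{t}_{\text{rot}} + t_{\text{transfer}}
  \approx 0 + 9 + 4.17 + 0.04 \approx 13.2\,\text{ms}
\end{aligned}
```

For small requests, the mechanical components (seek and rotation) dominate the transfer time, which is why disk scheduling concentrates on reducing seek distance.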
Disk Structure
- Disk drives are addressed as large one-dimensional arrays of logical blocks, where the logical block is the smallest unit of transfer.
- The one-dimensional array of logical blocks is mapped onto the sectors of the disk sequentially.
  - Sector 0 is the first sector of the first track on the outermost cylinder.
  - The mapping proceeds in order through that track, then through the rest of the tracks in that cylinder, and then through the rest of the cylinders from outermost to innermost.
[Figure (Glatz): numbering of logical blocks on a two-sided disk (sides 0 and 1) with 8 sectors per track and cylinders 0 to 2; blocks 0 to 7 lie on cylinder 0, side 0, blocks 8 to 15 on cylinder 0, side 1, and block 16 starts cylinder 1. Numbering: cylinders from 0 (zero-based), sides from 0 (zero-based), sectors from 1 (one-based).]
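The sequential mapping described above amounts to mixed-radix arithmetic. A minimal sketch, assuming a fixed geometry with the same number of sectors on every track (as in the figure: 2 sides, 8 sectors per track, sectors numbered from 1). Real drives with zoned recording hide their physical geometry behind logical block addressing, so this is illustrative only; `struct chs`, `lba_to_chs` and `chs_to_lba` are hypothetical names.

```c
#include <stdint.h>

struct chs { uint32_t cylinder, head, sector; };

/* Logical block number -> (cylinder, head/side, sector). */
static struct chs lba_to_chs(uint32_t lba, uint32_t heads, uint32_t sectors_per_track) {
    struct chs r;
    r.cylinder = lba / (heads * sectors_per_track);
    r.head     = (lba / sectors_per_track) % heads;
    r.sector   = lba % sectors_per_track + 1;  /* sectors are numbered from 1 */
    return r;
}

/* Inverse mapping: (cylinder, head/side, sector) -> logical block number. */
static uint32_t chs_to_lba(struct chs a, uint32_t heads, uint32_t sectors_per_track) {
    return (a.cylinder * heads + a.head) * sectors_per_track + (a.sector - 1);
}
```

With the figure's geometry, block 8 maps to cylinder 0, side 1, sector 1, and block 40 to cylinder 2, side 1, sector 1.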
Disk Attachment
- Host-attached storage is accessed through I/O ports talking to I/O buses
- SCSI itself is a bus, with up to 16 devices on one cable; the SCSI initiator requests an operation and SCSI targets perform the tasks
  - Each target can have up to 8 logical units (disks attached to the device controller)
- FC (Fibre Channel) is a high-speed serial architecture
  - It can be a switched fabric with a 24-bit address space, the basis of storage area networks (SANs) in which many hosts attach to many storage units
  - It can be an arbitrated loop (FC-AL) of 126 devices
Network-Attached Storage
- Network-attached storage (NAS) is storage made available over a network rather than over a local connection (such as a bus)
- NFS and CIFS are common protocols
- Implemented via remote procedure calls (RPCs) between host and storage
- The newer iSCSI protocol uses an IP network to carry the SCSI protocol
Storage Area Network
- Common in large storage environments (and becoming more common)
- Multiple hosts attached to multiple storage arrays: flexible
Disk Management
- Low-level formatting, or physical formatting: dividing a disk into sectors that the disk controller can read and write
- To use a disk to hold files, the operating system still needs to record its own data structures on the disk:
  - Partition the disk into one or more groups of cylinders
  - Logical formatting, or "making a file system"
- The boot block initializes the system
  - A small bootstrap program is stored in ROM
  - It loads the bootstrap loader program from the boot block
- Methods such as sector sparing are used to handle bad blocks
Disk Scheduling
Disk Scheduling
- The operating system is responsible for using the hardware efficiently; for disk drives, this means a fast access time and a high disk bandwidth.
- Access time has two major components:
  - Seek time is the time for the disk arm to move the heads to the cylinder containing the desired sector.
  - Rotational latency is the additional time spent waiting for the disk to rotate the desired sector under the disk head.
- Goal: minimize seek time
- Seek time is roughly proportional to seek distance
- Disk bandwidth is the total number of bytes transferred, divided by the total time between the first request for service and the completion of the last transfer.
Disk Scheduling (Cont.)
- Several algorithms exist to schedule the servicing of disk I/O requests.
- We illustrate them with a request queue on a disk with cylinders 0-199:
  98, 183, 37, 122, 14, 124, 65, 67
  Head pointer: 53
FCFS (First Come, First Served)
- Illustration shows a total head movement of 640 cylinders.
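The head movements quoted for the example queue (98, 183, 37, 122, 14, 124, 65, 67 with the head at 53) can be recomputed in a few lines. A sketch for FCFS; the function name `fcfs_total` is hypothetical.

```c
#include <stdlib.h>

/* Total head movement (in cylinders) when requests are served in arrival order. */
static int fcfs_total(int head, const int *req, int n) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += abs(req[i] - head);  /* seek distance to the next request */
        head = req[i];
    }
    return total;
}
```

For the example queue, the sum is 45 + 85 + 146 + 85 + 108 + 110 + 59 + 2 = 640 cylinders.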
SSTF (Shortest Seek Time First)
- Selects the request with the minimum seek time from the current head position.
- SSTF scheduling is a form of SJF scheduling; it may cause starvation of some requests.
- Illustration shows a total head movement of 236 cylinders.
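A sketch of SSTF on the same example queue (the function name and the bound `MAXREQ` are hypothetical; ties are broken in favor of the earlier request in the queue, an arbitrary choice). Starting at cylinder 53, it serves 65, 67, 37, 14, 98, 122, 124, 183.

```c
#include <stdbool.h>
#include <stdlib.h>

#define MAXREQ 64  /* hypothetical upper bound on the queue length */

/* Total head movement for SSTF; returns -1 if the queue is too long. */
static int sstf_total(int head, const int *req, int n) {
    bool done[MAXREQ] = { false };
    int total = 0;
    if (n > MAXREQ) return -1;
    for (int served = 0; served < n; served++) {
        int best = -1;
        for (int i = 0; i < n; i++)  /* pick the pending request closest to the head */
            if (!done[i] && (best < 0 || abs(req[i] - head) < abs(req[best] - head)))
                best = i;
        total += abs(req[best] - head);
        head = req[best];
        done[best] = true;
    }
    return total;
}
```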
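SCAN (a.k.a. Elevator Algorithm)
- The disk arm starts at one end of the disk and moves toward the other end, servicing requests until it gets to the other end of the disk, where the head movement is reversed and servicing continues.
- With the head at 53 first moving toward cylinder 0, the arm travels 53 cylinders down to cylinder 0 and then 183 cylinders up to the last request (183): a total head movement of 236 cylinders. The value of 208 cylinders given with the original illustration corresponds to reversing at the last request (cylinder 14), i.e., to LOOK: (53 - 14) + (183 - 14) = 208.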
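A sketch that computes the head movement when the arm first moves toward cylinder 0, both for SCAN (reverse at cylinder 0) and for LOOK (reverse at the lowest pending request). It assumes the arm stops after the last request on the way up; `scan_down_total` is a hypothetical name. For the example queue, SCAN gives 53 + 183 = 236 cylinders and LOOK gives 39 + 169 = 208 cylinders.

```c
/* Head movement when the arm first moves toward cylinder 0.
   go_to_end != 0: SCAN, the arm travels all the way to cylinder 0 before reversing.
   go_to_end == 0: LOOK, the arm reverses at the lowest pending request. */
static int scan_down_total(int head, const int *req, int n, int go_to_end) {
    int lowest = head, highest = head;
    for (int i = 0; i < n; i++) {
        if (req[i] < lowest)  lowest = req[i];
        if (req[i] > highest) highest = req[i];
    }
    int turn = go_to_end ? 0 : lowest;  /* cylinder where the arm reverses */
    int total = head - turn;            /* downward sweep */
    if (highest > head)
        total += highest - turn;        /* upward sweep to the last request */
    return total;
}
```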
C-SCAN (Circular SCAN)
- Provides a more uniform wait time than SCAN.
- The head moves from one end of the disk to the other, servicing requests as it goes. When it reaches the other end, however, it immediately returns to the beginning of the disk, without servicing any requests on the return trip.
- Treats the cylinders as a circular list that wraps around from the last cylinder to the first one.
C-LOOK
- A somewhat smarter version of C-SCAN
- The arm only goes as far as the last request in each direction, then reverses direction immediately, without first going all the way to the end of the disk; as in C-SCAN, requests are serviced in one direction only.
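A sketch of C-SCAN and C-LOOK for the example queue, assuming the arm moves toward higher cylinder numbers, a disk with cylinders 0 to 199, and that the return seek is counted as head movement (textbooks differ on this point). The names `circular_total` and `MAXREQ` are hypothetical. Both variants serve the requests in the order 65, 67, 98, 122, 124, 183, 14, 37; C-SCAN travels 130 + 16 + 199 + 37 = 382 cylinders, C-LOOK 130 + 169 + 23 = 322 cylinders.

```c
#include <stdlib.h>

#define MAXREQ 64  /* hypothetical upper bound on the queue length */

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* look == 0: C-SCAN, sweep up to max_cyl, then jump back to cylinder 0.
   look != 0: C-LOOK, sweep up to the highest request, then jump back to the lowest one.
   Stores the service order in order[] (room for n entries) and returns the total
   head movement including the return seek, or -1 if n is out of range. */
static int circular_total(int head, const int *req, int n, int max_cyl, int look, int *order) {
    int sorted[MAXREQ];
    if (n < 0 || n > MAXREQ) return -1;
    for (int i = 0; i < n; i++) sorted[i] = req[i];
    qsort(sorted, (size_t)n, sizeof sorted[0], cmp_int);

    int first_up = 0;                        /* first request at or above the head */
    while (first_up < n && sorted[first_up] < head) first_up++;

    int k = 0, total = 0, pos = head;
    for (int i = first_up; i < n; i++) {     /* servicing sweep toward higher cylinders */
        total += sorted[i] - pos;
        pos = sorted[i];
        order[k++] = pos;
    }
    if (first_up > 0) {                      /* requests below the start position remain */
        if (look) {
            total += pos - sorted[0];        /* C-LOOK: jump back to the lowest request */
            pos = sorted[0];
        } else {
            total += (max_cyl - pos) + max_cyl; /* C-SCAN: to the last cylinder, then back to 0 */
            pos = 0;
        }
        for (int i = 0; i < first_up; i++) { /* continue in the same direction */
            total += sorted[i] - pos;
            pos = sorted[i];
            order[k++] = pos;
        }
    }
    return total;
}
```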
Selecting a Disk-Scheduling Algorithm
- SSTF is common and has a natural appeal.
- SCAN and C-SCAN perform better for systems that place a heavy load on the disk.
- Performance depends on the number and types of requests.
- Requests for disk service can be influenced by the file-allocation method.
- The disk-scheduling algorithm should be written as a separate module of the operating system, allowing it to be replaced with a different algorithm if necessary.
- Either SSTF or LOOK is a reasonable choice for the default algorithm.
RAID Systems (Disk Arrays)
RAID Systems: Basic Ideas
- The idea came up in the 1980s
  - Winchester disk drives: smaller, cheaper, lower power and space/volume consumption, no air conditioning, ... compared with state-of-the-art mainframe disk drives
- But: many disk drives largely reduce the MTTF (mean time to failure) of the system, by a factor of 1/n for n drives (assuming independent failures); e.g., 100 disks with an MTTF of 100,000 hours each give a mean time to the first failure of about 1,000 hours (roughly 42 days)
  - Redundancy is needed to improve/regain reliability
- Many disks can also provide much better performance
- They can also avoid a restriction of file systems (files must fit onto a single disk) through logical volumes spanning disks
- Striping: use a group of disks as a single storage unit
- Mirroring/shadowing or parity/ECCs improve reliability by storing redundant data
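Parity-based redundancy can be illustrated with XOR: the parity block is the byte-wise XOR of the data blocks, so any single lost block equals the XOR of the parity block and the surviving blocks. A minimal sketch, assuming a hypothetical array of 3 data disks plus 1 parity disk with 4-byte blocks; real RAID levels differ in where parity is stored and in block sizes, and the function names are hypothetical.

```c
#include <string.h>

#define NDATA 3  /* hypothetical: 3 data disks + 1 parity disk */
#define BLOCK 4  /* hypothetical block size in bytes */

/* parity = data[0] XOR data[1] XOR ... XOR data[NDATA-1], byte by byte */
static void compute_parity(unsigned char data[NDATA][BLOCK], unsigned char parity[BLOCK]) {
    memset(parity, 0, BLOCK);
    for (int d = 0; d < NDATA; d++)
        for (int b = 0; b < BLOCK; b++)
            parity[b] ^= data[d][b];
}

/* Rebuild the block of the failed disk `lost` from the parity and the surviving blocks. */
static void rebuild(unsigned char data[NDATA][BLOCK], const unsigned char parity[BLOCK], int lost) {
    memcpy(data[lost], parity, BLOCK);
    for (int d = 0; d < NDATA; d++)
        if (d != lost)
            for (int b = 0; b < BLOCK; b++)
                data[lost][b] ^= data[d][b];
}
```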
RAID Levels
- There is a large variety of exact definitions of the levels, plus additional levels.
- There are various choices as to how and where to implement RAID functionality (OS, drive, controller, SAN interconnect; HW/SW; ...).
- Additional functionality can be integrated, such as replication and snapshots.
- Volume management software offers a lot of additional functionality.
- Hot spare disks can be reserved as immediate replacements.
More Aspects
Hierarchical Storage Management (HSM)
- A hierarchical storage system extends the storage hierarchy beyond primary memory and secondary storage to incorporate tertiary storage, usually implemented as a jukebox of tapes or removable disks.
- It usually incorporates tertiary storage by extending the file system.
  - Data is automatically migrated up and down the storage hierarchy.
  - Small and frequently used files remain on disk.
  - Large, old, inactive files are archived to the jukebox.
- HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data.
Cost
- Main memory is much more expensive than disk storage.
- The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive.
- The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years.
- Tertiary storage gives a cost saving only when the number of cartridges is considerably larger than the number of drives.
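The last point can be made precise with a simple cost model (a sketch; the symbols are introduced here for illustration and no prices are implied): let C_d be the price of a drive, C_c the price of a cartridge, S the capacity of one cartridge, and n the number of cartridges per drive. The cost per megabyte is then

```latex
c(n) = \frac{C_d + n\,C_c}{n\,S} = \frac{C_d}{n\,S} + \frac{C_c}{S},
\qquad
c(1) = \frac{C_d + C_c}{S},
\qquad
\lim_{n \to \infty} c(n) = \frac{C_c}{S}
```

With one cartridge per drive, the whole drive price is charged to a single cartridge, so tape is merely competitive with disk. The drive's share C_d/(nS) becomes negligible only when n is much larger than C_d/C_c, i.e., when there are considerably more cartridges than drives.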
Price per Megabyte of DRAM (1981 to 2004)
Price per Megabyte of Magnetic Hard Disk (1981 to 2004)
Price per Megabyte of a Tape Drive (1984 to 2000)
IX. File Systems