Asynchronous/Multithreaded I/O on Commodity Systems with Multiple Disks: a performance study

Nick Cook
Department of Computing Science, University of Newcastle

October 2000


Abstract

The purpose of this dissertation is to test the hypothesis that an asynchronous/multithreaded I/O service can be used to exploit the potential for parallelism of commodity systems with access to multiple disks. Three working systems are contributed: a simulation of asynchronous I/O; a POSIX-compliant asynchronous I/O library; and an extensible C++ class library for comparative tests of I/O services. The results of a simulation study and tests on real systems are used to demonstrate that, under specified circumstances, a significant increase in average application throughput is achieved through the use of asynchronous I/O. The types of workload most likely to benefit from this performance gain are characterised. However, for multithreaded implementations of asynchronous I/O, the improvements are shown to be achieved at the expense of average response times. These costs, sometimes as much as an order of magnitude increase in response time, are quantified.

Declaration

I declare that this dissertation represents my own work except where otherwise stated.

Acknowledgements

Thanks are due to my supervisors, Paul Watson and Jim Smith, for their technical input, guidance and support. Isi Mitrani offered advice on the simulation and modelling aspects of the project. I should also acknowledge Ulrich Drepper, author of much of GNU glibc and of the asynchronous I/O implementation in particular. It would have been difficult to proceed with significant parts of this project without free access to the GNU source code. Lesley, Joseph and Eve are owed a great deal for their support and forbearance during two years of absenteeism on my part. It is no compensation but this dissertation is dedicated to them.


Contents

1 Introduction
    1.1 Target system and project scope
    1.2 I/O service definitions
    1.3 Approach
    1.4 Structure of the dissertation

2 Background
    2.1 Related work
        2.1.1 File-systems and file-system modelling
        2.1.2 Disk characteristics and disk drive modelling
        2.1.3 Specific impact of related work
    2.2 Initial investigation
        2.2.1 The test applications
        2.2.2 Results of the initial investigation
    2.3 Programming languages and other tools used

3 aiosim: simulation of asynchronous (and synchronous) I/O services
    Why simulation?
    Simulation methodology
    Overview of simulation model
        Disk drive model
        File-system cache model
        Simulation configuration
    Description of simulation components
        FileSys, the file-system class library: buffer class; disk class; fsblock, cacheline and fscache classes; filedes class; request, reord_req and fill_req classes; requestgen class
        aiosim classes and application: arrival generation; request setup service; disk service; results service
        The simulation application
    Performance measures and analysis of results
    Model validation

4 libaiomt: a multithreaded implementation of POSIX.4 asynchronous I/O
    Design and implementation of libaiomt
        Interface of POSIX-compliant asynchronous I/O
        Implementation of libaiomt
        Key internal functions
        Thread management
        Data structures
        Request allocation
        Access to shared resources and synchronisation points
    Comparison with GNU glibc librt implementation

5 iot: a C++ class library for comparative tests of I/O services
    Composition of test applications
    Test configuration
    Description of main iot library components: iot::iotest class; iot::testfile class; iot::request class; iot::requestgen class
    Portability

6 Analysis of selected results
    Test and simulation configuration: system characteristics; common workload specification
    Sample variance and confidence in results: variance of tests on real systems; confidence in simulation results
    Average application throughput: tests on real systems; simulation results
    Average response time and system power estimates
    Priority queueing
    Request reordering optimisations

7 Conclusion and further work
    Lessons learned and further work

A aiosim code listing and configuration
    A.1 simulation code listing
        A.1.1 aiosim.sim
        A.1.2 filesys.sim
        A.1.3 simutil.sim
    A.2 example simulation configuration files
        A.2.1 simulation configuration file
        A.2.2 disk configuration file
    A.3 simulation parameter tables

B libaiomt code listing
    B.1 aiomt.h
    B.2 aio_cancel.c
    B.3 aio_error.c
    B.4 aio_fsync.c
    B.5 aio_read.c
    B.6 aio_return.c
    B.7 aio_suspend.c
    B.8 aio_write.c
    B.9 lio_listio.c
    B.10 aio_misc.h
    B.11 aio_misc.c

C iot code listing and configuration
    C.1 iot test application and library code listing
        C.1.1 test applications
        C.1.2 common definitions and constants
        C.1.3 iot::iotest class
        C.1.4 iot::ioterror class
        C.1.5 iot::reqlist class
        C.1.6 iot::request class
        C.1.7 iot::aiorequest class
        C.1.8 iot::siorequest class
        C.1.9 iot::requestgen class
        C.1.10 iot::runtime class
        C.1.11 iot::testfile class
        C.1.12 iot::tfconfig class
    C.2 Example I/O test configuration file


List of Figures

1.1 commodity system with multiple disks
2.1 random read 2 100MB and 2 400MB files on 2 disks (50, KB requests)
2.2 threads created (1000s) in librt async. I/O library for tests in Figure 2.1
2.3 repeat of tests in Figure 2.1 with modified async. I/O library
2.4 repeat of tests in Figure 2.1 with modified async. I/O library
model of asynchronous (and synchronous) I/O service
system with arrivals and departures
convergence of L and TW as simulation batches increase
decreasing confidence intervals as simulation batches increase
decline in throughput with service rate at results service
progress of requests through the libaiomt library
comparison of request allocation time
the impact of contention in libaiomt (20,000 random read req., uniprocessor PC)
the impact of contention in libaiomt (20,000 random read req., multiprocessor PC)
progress of requests through the GNU librt library
throughput of sequential writes of 2 files on 2 disks
seek-time-distance curve used for model of 500MB disk
throughput and request queue size (random read 8MB and 16MB files on 2 disks)
throughput and request size (random read 16MB files on 2 disks)
throughput and request size (random read 32MB and 64MB files on 2 disks)
throughput and file size (random read 2 files on 2 disks on multiprocessor PC)
comparison of asynchronous I/O simulation and real system
comparison of synchronous I/O simulation and real system
simulation of the impact on throughput of additional disks (2-8 disks)
simulated throughput for a scientific computing workload
simulated throughput for a transaction processing workload
simulated throughput for a database-like workload
simulated throughput for a database-like workload with additional disks
simulated response times for a database-like workload with additional disks
simulated requests present for a database-like workload with additional disks
simulated system power for a database-like workload with additional disks
simulated system power for a scientific computing workload
simulated system power for a transaction processing workload
simulated response times for randomly assigned request priorities
6.20 simulated throughput with request reordering and request injection at the library-level

List of Tables

2.1 application throughput for sequential read of 2 files on 2 disks
performance estimate for 32 batches of 0.01s simulation time (1.23s cpu-time)
performance estimates for 8 batches of 0.1s simulation time (11.49s cpu-time)
access to shared resources in libaiomt
user-parameterisation of iot test applications
valid method entry and exit states for iot::iotest
valid method entry and exit states for iot::request
system calls used in iot test library
average performance characteristics of disk drive model
throughput of random reads of 22 files on 2 disks
average response time estimates for experiment in Table
detailed performance estimates for simulated application workloads
A.1 global simulation parameters
A.2 per run simulation parameters (1)
A.3 per run simulation parameters (2): request generation
A.4 per run simulation parameters (3): arrival process
A.5 per run simulation parameters (4): services
A.6 per disk simulation parameters


Chapter 1

Introduction

Recent years have seen dramatic increases in processor speeds (doubling year on year). Memory densities have doubled every two years, with access times to memory decreasing by 30% to 80% per year [15][9]. Similarly, disk capacity has increased significantly as new technologies enable smaller, higher density and cheaper disks [14]. Advances have been made in I/O bus transfer rates, disk seek times and rotational speeds. However, access to disk, involving as it does the movement of mechanical parts, remains a potentially significant bottleneck in modern computer systems. Indeed, the rapid advances in other areas, and the continuing decline in price/performance ratio, serve to highlight this bottleneck. The advent of new applications that take advantage of gains in other areas also increases the demand on the I/O subsystem.

In recognition of the disk I/O bottleneck, operating system designers and disk manufacturers deploy a variety of mechanisms to minimise the impact of disk access. For example, caching is used either to amortise the costs of access to physical media by anticipating future requests, or to defer, and where possible avoid, such access. The strategy adopted is to optimise for the common cases of sequential access in general and of short-lived writes in particular (where any given write request is overwritten in the near future by a subsequent write request). However, there are applications that generate I/O workloads that defeat such optimisation: workloads that, for example, exhibit worst-case characteristics such as random access to files and/or a requirement for guaranteed persistent writes (where every write must be committed to disk even if subsequently overwritten). For some applications that would present workloads of this type to a conventional file-system, such as database systems, it is common to develop customised storage subsystems. These subsystems by-pass the file-system provided by the operating system and perform I/O directly to disk. It is then possible to optimise access for the special case by controlling layout on disk according to predicted access patterns and/or application-specific semantics.

The purpose of this dissertation is to test the hypothesis that multithreading can be used to exploit the potential for parallelism of commodity systems with access to multiple disks. In particular, it addresses the question of whether applications that produce worst-case workloads, from the file-system and disk caching viewpoints, can be supported efficiently by using an asynchronous/multithreaded I/O service. Any demonstrable improvement in performance would be of interest when it is either impractical or too costly to customise disk access for specific applications. If the gains are very significant, then it may be possible to avoid development of customised disk access for a large number of applications.

Section 1.1 describes the target system and project scope. Section 1.2 defines the types of I/O service studied. Section 1.3 gives an overview of the approach adopted. The structure of the remainder

of the dissertation is outlined in Section 1.4.

1.1 Target system and project scope

The target system is presented in Figure 1.1. Applications run on a host system that provides the abstraction of a file-system for access to data on disk.

Figure 1.1: commodity system with multiple disks

At the application level, requests are made to read or write data from/to files that logically reside in the file-system. Access to disk is mediated through the file-system, which determines whether a request can be serviced from/to its own cache or whether access to disk is required. Requests for disk service are submitted through a device interface and may, in turn, be serviced in whole or in part from/to the disk's cache. Any data that cannot be transferred from/to the file-system or the disk cache incurs a physical disk media transfer. A disk controller manages access to the disk cache and initiates transfers from/to disk media. Communication between the host system and disk is via an I/O bus. The bus is shared by multiple disks. In general terms, an application request to read or write data will incur a combination of one or more of the following transfers:

- transfer to/from file-system cache
- transfer across the I/O bus to/from disk cache
- transfer to/from disk media from/to disk cache under control of the disk controller

It is possible for the different types of transfer to overlap. For example, having initiated a transfer from its cache across the bus, the disk controller may fetch additional data from physical media while the transfer progresses. Further discussion of the technical details of the above transfers is deferred to later chapters. However, it can be seen that only the third type of transfer does not involve any access to the shared resources of the file-system cache or the I/O bus. In the context of the target system, it is during data transfer to/from physical disk media that the greatest opportunity for parallelism arises.

It is assumed that the target system supports kernel-level multithreading (as distinct from user-level threads). Threads provide separate execution contexts, within a process, that share some process state such as file descriptors and memory address space [6]. Kernel-level multithreading implies that

the separate threads of control map to kernel entities that may be scheduled separately. A kernel-level thread blocked on a system call (such as a file read()) will allow the scheduling of other threads in the client process. User-level threads within a process are all mapped to a single kernel entity and a blocking system call will block the whole process.

The project is, in part, motivated by the development of the Polar parallel object database server [25]. One of the aims of the Polar project is to determine whether commodity systems, composed of a number of high-performance PCs, can compete with current commercial parallel database systems that use custom-designed components. The platform envisaged for the Polar project is the interconnection of systems such as the target system depicted in Figure 1.1 via a high-speed network. It is a requirement of the project that efficient access to multiple disks be available. The scope of the project has been generalised to investigate the possible performance gains of using an asynchronous I/O service to access multiple disks and to determine the likely I/O workloads that would gain from such a service.

1.2 I/O service definitions

This dissertation presents a comparative study of the performance of two types of I/O service: traditional synchronous I/O and asynchronous I/O. A third category, multithreaded I/O, is essentially a variant of asynchronous I/O. All three categories are defined below (a minimal sketch of asynchronous I/O usage follows the list).

- Synchronous I/O is POSIX standard blocking I/O. Requests are made for data to/from open files (referenced by a system-maintained file descriptor) using read() and write() system calls. The calling application blocks pending completion of a request. Synchronous I/O should not be confused with synchronised I/O, which refers to the guarantee that a write request has been committed to disk. Rather, synchronous I/O indicates that the client application must wait for completion of its requested I/O or, more accurately, must wait until the file-system has completed processing of a request. As implied by the distinction, the return of a write() system call does not guarantee that data has been written to disk. A successful write() operation means that data has been written to file-system cache and scheduled for writing to disk.

- Asynchronous I/O allows the calling application to initiate an I/O request and immediately regain control without waiting for completion of the request. In the context of this project, asynchronous I/O refers to an implementation of the POSIX 1003.1b-1993 standard (also known as POSIX.4) [8]. POSIX.4 defines an interface to asynchronous I/O and associated requirements on the implementation of the service. As with synchronous I/O, operations are performed on file descriptors, opened by the standard open() system call. However, an asynchronous I/O request (for example, aio_read() or aio_write()) does not block. The request is queued within the system and control is immediately returned to the calling application. When the I/O completes, the application is notified via some user-specified mechanism. For example, a signal may be queued on completion or the application may periodically check the status of its outstanding requests. Thus, I/O is potentially performed in parallel with other operations. The POSIX.4 standard does not specify implementation details and an implementation may be threads-based.

- Multithreaded I/O is the use of synchronous I/O in separate threads within an application. For example, I/O requests to different disks may be handled by different threads using blocking read() and write() calls. Given the assumptions of our target system, there is the potential for parallel I/O because a thread that is blocked pending I/O completion will not block the whole process.
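The sketch below illustrates the asynchronous I/O interface defined by POSIX.4: it submits a single read, overlaps it with other work, then polls for completion. The file name is a placeholder and error handling is minimal; this is an illustration of the standard interface, not code from the systems developed for this dissertation.

    /* aio_sketch.c: submit an asynchronous read, then poll for completion.
       Compile with: cc aio_sketch.c -lrt */
    #include <aio.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        static char buf[4096];
        struct aiocb cb;

        int fd = open("datafile", O_RDONLY);   /* placeholder file name */
        if (fd < 0) { perror("open"); return 1; }

        memset(&cb, 0, sizeof(cb));
        cb.aio_fildes = fd;
        cb.aio_buf    = buf;
        cb.aio_nbytes = sizeof(buf);
        cb.aio_offset = 0;        /* an explicit offset is always supplied */

        if (aio_read(&cb) != 0) { perror("aio_read"); return 1; }

        /* control has returned: computation can be overlapped with the I/O */

        while (aio_error(&cb) == EINPROGRESS)
            ;                     /* busy poll; aio_suspend() or a queued
                                     signal would be used in practice */

        ssize_t n = aio_return(&cb);   /* collect the completion status */
        printf("read %ld bytes\n", (long)n);
        close(fd);
        return 0;
    }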

It is therefore possible for another thread to be scheduled to service a request for its disk.

Both multithreaded and asynchronous I/O incur some synchronisation overhead. They may contend for shared resources and, at some point, an application must handle the result of the I/O requested. They also incur the overhead of managing additional resources, such as request queues and threads. Offset against these overheads is the potential to overlap computation and I/O and, where I/O involves multiple devices, to overlap access to those devices. In the general sense of operations being performed asynchronously, multithreaded I/O is a variant of asynchronous I/O. One distinction between the two derives from the assumption that asynchronous I/O is an implementation of the POSIX.4 standard. The use of multithreaded I/O implies that the application developer is responsible for all thread management issues, whereas asynchronous I/O presents an implementation-independent interface that relieves the developer of such responsibilities. Another significant difference between multithreaded I/O and asynchronous I/O is that the use of threads is just one approach to the implementation of the latter. The other main approach to the implementation of asynchronous I/O is to modify the file-system to provide direct support for asynchronous operation. Some of the performance implications of this approach compared to a multithreaded implementation are addressed in Chapters 6 and 7.

1.3 Approach

The approach adopted was to develop three simple test applications to conduct an initial investigation into the relative performance of synchronous, asynchronous and multithreaded I/O. This investigation is described in Chapter 2. The results of the investigation suggested that, under certain circumstances, there would be a performance gain from the use of asynchronous I/O. However, further work would be required to confidently characterise the types of application workload most likely to benefit and to explore the trade-offs between different aspects of performance. In addition, the investigation highlighted shortcomings in the implementation of asynchronous I/O used. Given the above, it was decided to conduct a more detailed investigation as follows:

1. develop a simulation of asynchronous I/O capable of assessing performance under a wider variety of workloads than would be possible to test on the real systems available;

2. implement an optimised version of asynchronous I/O that addresses the shortcomings highlighted during the initial investigation;

3. produce a more flexible test application suite for comparison of asynchronous and synchronous I/O on real systems.

1.4 Structure of the dissertation

The remainder of the dissertation is structured as follows:

Chapter 2 provides background technical information to the project. It is divided into three sections. The first provides an overview of the main components of the I/O subsystem under consideration at the file-system and the disk level. Related work on both file-system and disk drive modelling and performance analysis is also introduced. Section 2.2 describes the initial investigation undertaken

and the results obtained. Section 2.3 briefly discusses the choice of programming languages and other tools used for the project.

Chapters 3, 4 and 5 describe the design, implementation and use of: 1. the simulation program (aiosim); 2. the asynchronous I/O library (libaiomt); and 3. the I/O test library (iot) and associated application.

Chapter 6 provides an analysis of selected results of the I/O tests and the simulation study. Chapter 7 concludes the dissertation and suggests further work.

There are three appendices to the dissertation:

- Appendix A provides the aiosim source code listing, example configuration files and detailed parameter tables.

- Appendix B provides the source code listing of libaiomt.

- Appendix C provides source code for the iot test library and applications, and an example test configuration file.


Chapter 2

Background

In this chapter: Section 2.1 introduces technical background to the project; Section 2.2 describes an initial investigation, including results obtained and lessons learned from the investigation; and Section 2.3 discusses the choice of programming languages and other tools used for the project.

2.1 Related work

When considering how best to exploit the potential for I/O parallelism offered by a system that accesses multiple disks, it is important to understand how both file-systems and disk drives operate. Of particular importance is the impact on performance of both file-system and disk drive caching under different I/O workloads. There follows an overview of file-system and disk drive technologies, including an introduction to related work on file-system and disk drive modelling and performance analysis.

First, the processing of application requests for file I/O should be explained. An application may make requests to read or write arbitrary amounts of data within a file. The file-system divides these requests into file-system block-sized (and block-aligned) requests and services them either from its own cache or by initiating a request for service from disk. In either case, an application request of s bytes will be translated by the file-system into a request for n blocks of data (where s <= n * b, for file-system block size b). As discussed below, for read requests, s is almost always less than n * b because the file-system will often arrange for the pre-fetching of subsequent blocks in anticipation of future requests [21]. A read or write request of arbitrary size, s bytes, from/to arbitrary offset, o, within a file is, then, translated into access to a sequence of one or more file-system blocks b_i, ..., b_(i+n-1), where b_i is the block within which offset o resides. The application request may reside wholly within block b_i. (A small sketch of this translation appears at the end of this section.) In the following, access to a file is considered sequential if the previous access to the file was to b_i (the block within which the current request starts) or to b_(i-1) (the preceding block). This ensures that a series of requests that all reside within a single file-system block are considered sequential.

It should be emphasised that a file-system block is the logical unit used by a file-system to organise data. There is no guarantee that data (logically) held in contiguous file-system blocks will be contiguous on disk. File-systems attempt to organise data on disk so that the majority, if not all, of a file's blocks are close together and that placement on disk corresponds to logical placement within a file. The smaller the file size, the greater the likelihood that the correspondence is maintained. The larger the file, and the greater the proportion of disk that is in use, the more likely it is that its data will spread across larger and less contiguous areas of the disk.
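The following sketch makes the block translation concrete. The block size, offset and request size are illustrative values, not taken from the dissertation; the notation follows the paragraph above (request of s bytes at offset o, block size b).

    /* blocks.c: translate a byte-range request into file-system block numbers */
    #include <stdio.h>

    int main(void)
    {
        const unsigned long b = 4096;            /* assumed block size */
        unsigned long o = 10000, s = 6000;       /* offset and request size */

        unsigned long first = o / b;             /* block i: contains offset o */
        unsigned long last  = (o + s - 1) / b;   /* block i + n - 1 */
        unsigned long n     = last - first + 1;

        /* s <= n * b always holds; pre-fetching may fetch further blocks */
        printf("request covers blocks %lu..%lu (%lu blocks, %lu bytes)\n",
               first, last, n, n * b);
        return 0;
    }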

2.1.1 File-systems and file-system modelling

It is widely accepted that access patterns at the file-system level are often sequential and that writes very often overwrite recently written data [18][4]. File-systems therefore deploy two types of caching to optimise for these common cases:

- A Least Recently Used (LRU) cache stores recently accessed blocks in anticipation that the same blocks will be accessed again. As the name suggests, the least recently used blocks are expelled first. So blocks are organised according to access times and frequently accessed blocks will remain in the cache.

- A write-back cache is used to buffer writes in memory for later transfer to disk. In this way, many writes do not survive for transfer to disk because they are overwritten by a subsequent request before the transfer takes place.

A further common optimisation is read pre-fetching, where blocks that logically follow those of the current request are pre-fetched into the file-system cache. The benefits of read pre-fetching [21] include:

- the cost associated with performing a disk I/O operation is amortised over a larger amount of data;

- assuming sequential file access translates reasonably well to sequential layout on disk, pre-fetching will better utilise the disk's own readahead cache (see below);

- presenting a larger list of requests to the disk controller provides greater opportunity for request re-ordering at the disk level (exploiting the actual layout of data on disk).

Given a request, read pre-fetching adjusts to the workload as follows (a small sketch of the policy follows):

1. the cache is checked for the blocks that the request resides in and for some number of pre-fetch blocks. If necessary, a disk request is initiated for the request blocks and/or for the pre-fetch blocks. For application requests that are sufficiently small, both the request blocks and the pre-fetch blocks may already reside in cache. As long as the workload appears sequential, pre-fetching is triggered and the amount of data pre-fetched doubles up to some limit.

2. if, according to the LRU cache policy, a pre-fetched block is evicted from the cache before it is used, then the file-system assumes it is pre-fetching too aggressively and halves the number of blocks to be pre-fetched at the next request. In this way, the file-system will reduce pre-fetching to no more than the next block should the presented workload not be sequential.
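The sketch below captures the doubling/halving policy just described. The window limits and function names are illustrative assumptions, not drawn from any particular file-system implementation.

    /* prefetch.c: adaptive read pre-fetch window (illustrative policy only) */
    #include <stdio.h>

    enum { WINDOW_MIN = 1, WINDOW_MAX = 32 };   /* window limits, in blocks */

    static unsigned window = WINDOW_MIN;

    /* workload looks sequential: double the pre-fetch window up to a limit */
    static void on_sequential_access(void)
    {
        if (window * 2 <= WINDOW_MAX)
            window *= 2;
    }

    /* a pre-fetched block was evicted before use: halve the window */
    static void on_unused_eviction(void)
    {
        if (window > WINDOW_MIN)
            window /= 2;
    }

    int main(void)
    {
        for (int i = 0; i < 4; i++)
            on_sequential_access();
        printf("window after sequential run: %u blocks\n", window);
        on_unused_eviction();
        printf("window after early eviction: %u blocks\n", window);
        return 0;
    }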

Examples of workloads that should benefit from the file-system optimisations described above are:

- sequential streams of requests (whether all reads, all writes or a mixture of the two). For reads from, or writes to, data that has been recently accessed or that logically follows recently accessed data, the likelihood of cache hits is improved and, therefore, application throughput improves.

- access that frequently overwrites recently written data, which will often avoid disk access altogether.

- streams of requests to small files, whether sequential or not. Such requests will often be mostly served from file-system cache. For small file sizes, most if not all of the file may have been fetched to cache before (or even if) pre-fetching is reduced to a minimal level. Given an initial threshold for pre-fetching of k blocks, and assuming the first, possibly random, request is not for data near the end of a file, any file close to k blocks in size will be mostly in cache after the first request. If subsequent requests turn out not to be sequential, they will still mostly be served from the cache.

Examples of workloads not likely to benefit are:

- writes that require a guarantee of commitment to disk regardless of any subsequent activity.

- long streams of requests large enough to overflow caches before any re-use of cached data. Access to large, continuous media files is likely to have these characteristics [4]. A continuous media file, such as a video file, is likely to be read in a long sequential stream of large requests. Even if the same video is replayed, the data from the start of the stream will have been evicted from file-system cache before being requested again.

- streams of random requests over larger files. Random requests to larger files are unlikely to gain much benefit from pre-fetching or caching, since requests ranging across a large address space will result in fewer cache hits. Streams of small, random requests to large files may suffer more, since they involve access to small areas of a relatively large address space.

A recent study [17] suggests that the characteristics of workloads presented to file-systems may be changing and that the common case is becoming less common. It is reported that, in comparison to earlier studies, traces of file-system traffic show that the size of files being accessed is increasing and that a larger proportion of access to files is random. Further, access to large files (greater than 2MB) exhibited a much greater tendency to be random. Mail and WWW browser applications, in particular, were identified as producing random workloads. This is presumably because user navigation through large mailboxes or WWW documents will not necessarily be sequential. There are other known application workloads that will defeat the file-system optimisations described. For example, amongst the workloads used to validate their disk model [20], Bell Labs describe a "database-like workload" that had "very little spatial locality". It is an aim of this project to determine whether asynchronous I/O can be used to deliver a performance gain to applications that generate workloads for which file-system optimisations are ill-suited.

2.1.2 Disk characteristics and disk drive modelling

Disk drives contain a mechanism and a controller [19]. The mechanism comprises the recording components (rotating disks and the heads to access them) and positional components (disk arm assembly etc.). The controller includes a microprocessor, a cache and an interface to the I/O bus. The controller mediates transfers to/from the host system to which a disk is attached. A request for data from disk always involves communication across the bus with the controller, and an associated command overhead. Depending on the state of the disk cache and the mapping of the requested logical blocks to physical locations on disk (performed by the disk controller), the request will incur a transfer of data from disk buffer or from disk media, or from both.
From the point of view of this project, transfers to/from disk media, with the associated costs of physical seek and rotational delay, are of particular interest. It is during these transfers that the main opportunity for parallel access to disks arises.

22 Disks typically employ the following types of caching: $ a speed matching buffer ensures that data to/from disk is transferred when the host interface is ready. When servicing read requests, the buffer is partially filled (up to a fence value) before a bus data transfer is initiated. Writes are buffered to overlap with head positioning by the drive mechanism. $ a readahead cache actively retrieves and caches data that the controller expects the host to request in the near future. This is commonly implemented as continuing to read from where the last read left off. The readahead cache allows reads to be satisfied in the time it takes the controller to detect a cache hit and then transfer data at bus rate (as opposed to the much slower media transfer rate). A single readahead cache can only support a single sequential stream of requests. The cache is, therefore, often segmented to support interleaved sequential streams. $ a write cache provides immediate reporting of writes as soon as they are in the cache. From the host viewpoint, writes are serviced in the time taken to transfer data to the disk cache. The host experiences slower media rates for data that it explicitly requests is written through to disk. The write cache reduces the volume of writes to disk because overwrites can be made to in-cache data before it goes to disk. Command re-ordering is also supported where writes are schedule for near-optimality. In combination with readahead caching, writes and reads of adjacent blocks can proceed at bus transfer rates. Caching is also used to support command queueing at the controller. The controller is able to impose an ordering on incoming requests to minimise seek times and disk head movements typically ordering requests by Shortest Positioning Time First. The disk device driver will also normally hold a request queue that is ordered to improve response times Specific impact of related work The recent work at Bell Labs on analysing and modelling the performance of both multiple disks on a SCSI bus and of file-system pre-fetching [3][20][21], and that of Ruemmler and Wilkes [19] that preceded it, forms the basis of the modelling work described in Chapter 3. The work on multiple disks on a SCSI bus analyses and models similar workloads to the random read experiments discussed in Section 2.2 below, except that I/O requests are direct to disk (bypassing the file-system). For larger request sizes, they report convoy behaviour in disk I/O (termed rounds ) where, under heavy workloads, each disk services one request before any disk services its next request. This behaviour results in sub-optimal performance. They developed and validated an analytical model that accurately predicts the performance impairment and identifies the terms that characterise it. They suggest an optimisation that deploys an asynchronous read request to trigger disk readahead and thereby achieve greater overlap of bus transfers with disk seeks. This led to consideration of a possible optimisation of multithreaded, asynchronous I/O where incoming requests to a file are ordered by file offset and additional requests are inserted into a random request stream to make it appear more sequential when presented to the file-system. This optimisation is explored in Chapter 6. 
2.1.3 Specific impact of related work

The recent work at Bell Labs on analysing and modelling the performance of both multiple disks on a SCSI bus and of file-system pre-fetching [3][20][21], and that of Ruemmler and Wilkes [19] that preceded it, forms the basis of the modelling work described in Chapter 3. The work on multiple disks on a SCSI bus analyses and models similar workloads to the random read experiments discussed in Section 2.2 below, except that I/O requests are direct to disk (bypassing the file-system). For larger request sizes, they report convoy behaviour in disk I/O (termed "rounds") where, under heavy workloads, each disk services one request before any disk services its next request. This behaviour results in sub-optimal performance. They developed and validated an analytical model that accurately predicts the performance impairment and identifies the terms that characterise it. They suggest an optimisation that deploys an asynchronous read request to trigger disk readahead and thereby achieve greater overlap of bus transfers with disk seeks. This led to consideration of a possible optimisation of multithreaded, asynchronous I/O where incoming requests to a file are ordered by file offset and additional requests are inserted into a random request stream to make it appear more sequential when presented to the file-system. This optimisation is explored in Chapter 6.

The work cited above, that of Peter Bosch on mixed media file-systems [4], and the UC Berkeley work on file-system traffic [17] all provided useful input for the parameterisation of the simulation model and for the characterisation of workloads for the experiments presented in Chapter 6.

In addition to the work cited, there is a considerable body of existing work on both file-system and disk drive performance modelling and evaluation. The papers by Ruemmler and Wilkes [19][18]

provide useful overviews of much of this work. Bosch's PhD thesis provides a more recent and detailed survey, with an emphasis on file-system support for mixed media systems. However, very little detailed work was found on the performance analysis of asynchronous I/O. There is work on specific implementations of asynchronous I/O [5]. One paper was found on the comparison of a file-system-level implementation and a multithreaded implementation [26]. This work indicated that, as is to be expected, file-system support for asynchronous I/O is more efficient than a library-level, multithreaded implementation. No comparison of either implementation with synchronous I/O was provided. The only such study found was an earlier paper by the same author that compared a file-system implementation of asynchronous I/O with synchronous I/O in the specific context of an On-Line Transaction Processing application [27]. This work indicated a performance gain from the use of asynchronous I/O.

In conclusion, no detailed work on the comparative performance of synchronous I/O and a multithreaded implementation of asynchronous I/O has been found. Specifically, no study has been found that addresses the possibility of using asynchronous I/O to parallelise access to multiple disks or that identifies the performance trade-offs between application throughput and average request response times that such use entails. Apart from the systems developed, it is the contribution of this dissertation to provide this performance analysis.

2.2 Initial investigation

This section presents an initial investigation conducted to determine whether there was any likelihood of achieving an improvement in throughput by using asynchronous or multithreaded I/O to access multiple disks.

2.2.1 The test applications

Three test applications were written [1]:

1. siotest: used standard blocking read() and write() calls to service requests.

2. mtiotest: a custom-built POSIX threads application that used a priori knowledge of file-to-disk mappings to assign threads to service requests to a file on a given disk. I/O operations were performed using standard (blocking) read() and write() calls within a thread. (A sketch of this per-disk thread structure appears at the end of this subsection.)

3. aiotest: used the GNU glibc [7] library implementation of asynchronous I/O (librt) to service requests via calls to aio_read() and aio_write().

[1] These applications preceded development of both the simulation and the iot test library described in Chapters 3 and 5.

For all applications, an initialisation phase opened the files and set up any data structures required: for example, the request queues for aiotest and mtiotest, and the area to be written from for write tests. The applications could be configured to produce a single run of one of the following types of request stream: sequential reads; sequential writes; reads from random offsets within a file; or writes to random offsets within a file. Files could optionally be opened to request that writes be written through (or "synched") to disk.

The siotest application blocked pending completion of each requested I/O operation. A request queue was not used and each request was dealt with as it was generated. For streams of random

requests, an lseek() operation was performed to move the file position pointer to the requested offset. The system-maintained file position pointer was relied on for sequential requests.

The mtiotest and aiotest applications used bounded queues to control the number of outstanding requests. The mtiotest application started separate threads for each of the disks identified at configuration time and a request thread generated requests to be queued for the relevant disk thread. The mapping between open files and disks was specified at configuration. Each disk thread would wait for completion of each of its I/O operations. The application would only block when all disk queues were full and all disk threads were blocked pending I/O completion. A results queue was used by the disk threads to pass results back to the initial thread for handling. Ordering of requests between disks was non-deterministic and dependent on the scheduling of threads. Requests for each disk, and for each file on a disk, were serviced in FIFO order. As with siotest, lseek() was only called for requests to random offsets within a file.

The aiotest application submitted requests using the aio_read() or aio_write() asynchronous I/O calls. Control returned to the application immediately after submission of a request. The application was then free to handle completion of earlier requests (notified by a signal) and to submit further requests (up to the configured queue size). Application blocking would occur when the request queue was full, that is, when the maximum number of requests had been submitted and none had yet completed. Asynchronous I/O provides no guarantee on the ordering of the service of requests and may reorder requests submitted. A request priority attribute may be used to lower the priority of an I/O request with respect to the calling application or in order to implement some priority scheme between requests. This priority scheduling scheme is similar to UNIX nice() priority scheduling and it is not possible to increase a request's priority above that of its calling application. Request priorities were not used in the aiotest application. Asynchronous I/O requests must provide a file offset at which to perform the requested I/O (except when a file is opened for appending using the O_APPEND flag). The file position pointer is not relied on and an implied lseek() to the requested offset is always performed. Please see Chapter 4 for further details of the asynchronous I/O interface.

For each test, a minimum sample of four runs was used and the average time taken to complete a fixed number of requests recorded. This figure was used to estimate the application throughput in MB/s. The applications were I/O-intensive: I/O requests were generated repeatedly, with only the minimal computation necessary to check data read or written and to generate the next request performed between requests. All tests were conducted on Pentium II 233MHz PC systems running the Linux operating system and accessing two SCSI disks.
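The sketch below illustrates the thread-per-disk structure used by mtiotest: one worker thread per disk consumes that disk's request list using blocking calls, so a thread blocked on I/O for one disk leaves threads for other disks schedulable. All names, sizes and file paths are illustrative placeholders, and the original application's bounded queues and results queue are elided for brevity.

    /* mtio_sketch.c: one worker thread per disk servicing blocking reads.
       Compile with: cc mtio_sketch.c -lpthread */
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define NDISKS 2
    #define NREQS  4

    struct req  { off_t off; size_t len; };     /* one read request */
    struct disk {                               /* per-disk work list */
        int fd;
        struct req reqs[NREQS];
        int next;                               /* guarded by lock */
        pthread_mutex_t lock;
    };

    static void *disk_worker(void *arg)
    {
        struct disk *d = arg;
        char buf[512];
        for (;;) {
            pthread_mutex_lock(&d->lock);
            if (d->next == NREQS) {             /* all requests serviced */
                pthread_mutex_unlock(&d->lock);
                return NULL;
            }
            struct req r = d->reqs[d->next++];
            pthread_mutex_unlock(&d->lock);

            /* blocking I/O: only this thread blocks; threads serving
               other disks remain schedulable */
            lseek(d->fd, r.off, SEEK_SET);      /* random-offset stream */
            if (read(d->fd, buf, r.len) < 0)
                perror("read");
        }
    }

    int main(void)
    {
        const char *paths[NDISKS] = { "/disk0/file", "/disk1/file" };
        struct disk disks[NDISKS];
        pthread_t tids[NDISKS];

        for (int i = 0; i < NDISKS; i++) {
            disks[i].fd = open(paths[i], O_RDONLY);
            if (disks[i].fd < 0) { perror("open"); return 1; }
            disks[i].next = 0;
            pthread_mutex_init(&disks[i].lock, NULL);
            for (int j = 0; j < NREQS; j++)
                disks[i].reqs[j] = (struct req){ .off = (off_t)j * 4096,
                                                 .len = 512 };
            pthread_create(&tids[i], NULL, disk_worker, &disks[i]);
        }
        for (int i = 0; i < NDISKS; i++)
            pthread_join(tids[i], NULL);
        return 0;
    }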

2.2.2 Results of the initial investigation

For a workload of small (0.5KB) sequential reads of 2 2.5MB files on separate disks, the results shown in Table 2.1 were recorded.

    test                          siotest   mtiotest   aiotest
    ave. application throughput   20 MB/s   5 MB/s     <1 MB/s

Table 2.1: application throughput for sequential read of 2 files on 2 disks.

As a result of pre-fetching, from the application viewpoint, small requests in a sequential stream are served mostly from the file-system cache. In effect, the read() call does not block and it is not therefore possible to overlap requests to different disks (the application reads data direct from memory). For such workloads there is no benefit to multiple threads contending for access to file-system cache. Similar relative performance was found for a workload of sequential writes. Write-back caching means that writes to disk media are performed asynchronously and the application throughput experienced is that of writes to memory.

It was found that, for sequential read access, performance appeared to converge as request size increased. For example, changing the request size from 0.5KB to 8KB resulted in sequential reads from 2.5MB files being served at approximately 5MB/s for all three applications, except for the aiotest application at low queue sizes. When the request size is increased, and request time increases, the benefits of multithreading become apparent. Larger request sizes lead to more I/O blocking. Also, the overhead of multithreading is amortised over longer service times for a request.

A series of tests was carried out with streams of random reads to large files and with varying queue sizes for multithreaded and asynchronous I/O.

Figure 2.1: random read 2 100MB and 2 400MB files on 2 disks (50, KB requests); average application throughput (MB/s) against request queue size.

As can be seen from Figure 2.1, mtiotest consistently out-performs siotest (in terms of throughput). For the larger request queue sizes, aiotest matches mtiotest. The same pattern of relative performance is apparent for both 100MB and 400MB files. A series of small requests at random offsets within a large file will tend to be for data that is distributed widely across both file and disk (particularly when the file represents a significant percentage of disk capacity: 20% for a 100MB file and 80% for the 400MB files in this case). Seek distances between requests are likely to increase. File-system pre-fetching and disk drive readahead

will be less effective. As illustrated, performance is likely to degrade as file size increases.

Another set of tests was carried out with files opened to write requests through to disk (using the O_FSYNC open flag) and therefore avoid write-back caching. These tests revealed a similar relative gain in throughput for both the aiotest and mtiotest applications over the siotest application.

A significant feature of the initial results was the poor performance of the asynchronous I/O library at low request queue sizes. An examination of the source code of the library revealed that it was a multithreaded implementation that used a separate thread to service requests to each file accessed. These threads remained active as long as requests were queued in the library for the given file, but no longer. File thread management was implemented as follows.

Thread creation:

    if (there is no active file thread for a request)
        create a new thread to service the request
    else
        queue the request for the file thread

Thread actions:

    while (there are requests for this file)
        take the first request for this file from the request queue
        service the request
        if (notification by thread)
            create new thread to notify result
        else
            notify result
    exit

Given that librt is a threads-based implementation of asynchronous I/O, it was assumed that the performance of aiotest would be similar to that of mtiotest. A simple modification of the library to count threads created demonstrated that the significant degradation at low queue sizes resulted directly from the thread management algorithm adopted. As shown in Figure 2.2, the library creates excessive numbers of threads when the file threads are not kept active (even though there may be requests for the relevant file in the future). This thread creation problem would have been even worse had result notification by thread been specified. In this case, an additional thread would have been created to notify each result.

The performance of the librt implementation is dependent on the number of threads that are created compared to the number, and frequency, of requests. For I/O-intensive workloads, unless the time to create a thread is insignificant compared to the time to service a request, file threads must be kept active if their creation time is not to dominate performance. A file thread is kept active if the time taken to service a request is greater than the time taken for a subsequent request to be queued for the thread on the request list (to ensure that there is an outstanding request for the thread to service). Sequential streams of small requests, which are mostly memory accesses, do not satisfy this criterion. Nor do low request queue sizes, which throttle the library. In the worst case, at low queue sizes, a thread is created after every other request. The application tends to become stable with a queue size of 20-24, when fewer than 2,000 threads were created over a test run.

To address the thread creation problem, the library was modified to make threads wait for 1 second before exiting (a sketch of this timed wait follows). If requests arrived before the timeout, or were found to be present after the timeout, then a thread would not exit. As shown in Figures 2.3 and 2.4, this brought an immediate improvement in performance.
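The sketch below shows one way to implement the 1-second idle wait described above, using a condition variable with a timeout. It is a simplified illustration under assumed names (the real library guards a per-file request list and more state); it is not the actual librt or libaiomt code.

    /* idle_wait.c: keep a worker thread alive for up to 1 second of idleness.
       Compile with: cc idle_wait.c -lpthread */
    #include <errno.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <time.h>

    static pthread_mutex_t q_lock     = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  q_nonempty = PTHREAD_COND_INITIALIZER;
    static int queued_requests = 0;   /* stand-in for the file's request list */

    /* returns 1 if the thread should service another request, 0 if it may exit */
    static int wait_for_work(void)
    {
        struct timespec deadline;
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += 1;                     /* wait at most 1 second */

        pthread_mutex_lock(&q_lock);
        while (queued_requests == 0) {
            if (pthread_cond_timedwait(&q_nonempty, &q_lock,
                                       &deadline) == ETIMEDOUT)
                break;                            /* timed out: final re-check */
        }
        int keep_running = (queued_requests > 0); /* request arrived in time? */
        pthread_mutex_unlock(&q_lock);
        return keep_running;
    }

    int main(void)
    {
        /* with no queued requests, the wait times out after about 1 second */
        printf("keep running: %d\n", wait_for_work());
        return 0;
    }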


More information

Reducing Disk Latency through Replication

Reducing Disk Latency through Replication Gordon B. Bell Morris Marden Abstract Today s disks are inexpensive and have a large amount of capacity. As a result, most disks have a significant amount of excess capacity. At the same time, the performance

More information

Lecture 2: Memory Systems

Lecture 2: Memory Systems Lecture 2: Memory Systems Basic components Memory hierarchy Cache memory Virtual Memory Zebo Peng, IDA, LiTH Many Different Technologies Zebo Peng, IDA, LiTH 2 Internal and External Memories CPU Date transfer

More information

Virtual Memory. Chapter 8

Virtual Memory. Chapter 8 Virtual Memory 1 Chapter 8 Characteristics of Paging and Segmentation Memory references are dynamically translated into physical addresses at run time E.g., process may be swapped in and out of main memory

More information

MEMORY. Objectives. L10 Memory

MEMORY. Objectives. L10 Memory MEMORY Reading: Chapter 6, except cache implementation details (6.4.1-6.4.6) and segmentation (6.5.5) https://en.wikipedia.org/wiki/probability 2 Objectives Understand the concepts and terminology of hierarchical

More information

Input Output (IO) Management

Input Output (IO) Management Input Output (IO) Management Prof. P.C.P. Bhatt P.C.P Bhatt OS/M5/V1/2004 1 Introduction Humans interact with machines by providing information through IO devices. Manyon-line services are availed through

More information

Chapter 2: Memory Hierarchy Design Part 2

Chapter 2: Memory Hierarchy Design Part 2 Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental

More information

Virtual Memory. Reading. Sections 5.4, 5.5, 5.6, 5.8, 5.10 (2) Lecture notes from MKP and S. Yalamanchili

Virtual Memory. Reading. Sections 5.4, 5.5, 5.6, 5.8, 5.10 (2) Lecture notes from MKP and S. Yalamanchili Virtual Memory Lecture notes from MKP and S. Yalamanchili Sections 5.4, 5.5, 5.6, 5.8, 5.10 Reading (2) 1 The Memory Hierarchy ALU registers Cache Memory Memory Memory Managed by the compiler Memory Managed

More information

ECE519 Advanced Operating Systems

ECE519 Advanced Operating Systems IT 540 Operating Systems ECE519 Advanced Operating Systems Prof. Dr. Hasan Hüseyin BALIK (8 th Week) (Advanced) Operating Systems 8. Virtual Memory 8. Outline Hardware and Control Structures Operating

More information

Memory management. Last modified: Adaptation of Silberschatz, Galvin, Gagne slides for the textbook Applied Operating Systems Concepts

Memory management. Last modified: Adaptation of Silberschatz, Galvin, Gagne slides for the textbook Applied Operating Systems Concepts Memory management Last modified: 26.04.2016 1 Contents Background Logical and physical address spaces; address binding Overlaying, swapping Contiguous Memory Allocation Segmentation Paging Structure of

More information

CHAPTER 8 - MEMORY MANAGEMENT STRATEGIES

CHAPTER 8 - MEMORY MANAGEMENT STRATEGIES CHAPTER 8 - MEMORY MANAGEMENT STRATEGIES OBJECTIVES Detailed description of various ways of organizing memory hardware Various memory-management techniques, including paging and segmentation To provide

More information

Data Storage and Query Answering. Data Storage and Disk Structure (2)

Data Storage and Query Answering. Data Storage and Disk Structure (2) Data Storage and Query Answering Data Storage and Disk Structure (2) Review: The Memory Hierarchy Swapping, Main-memory DBMS s Tertiary Storage: Tape, Network Backup 3,200 MB/s (DDR-SDRAM @200MHz) 6,400

More information

Kernel Korner AEM: A Scalable and Native Event Mechanism for Linux

Kernel Korner AEM: A Scalable and Native Event Mechanism for Linux Kernel Korner AEM: A Scalable and Native Event Mechanism for Linux Give your application the ability to register callbacks with the kernel. by Frédéric Rossi In a previous article [ An Event Mechanism

More information

Device-Functionality Progression

Device-Functionality Progression Chapter 12: I/O Systems I/O Hardware I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations Incredible variety of I/O devices Common concepts Port

More information

Chapter 12: I/O Systems. I/O Hardware

Chapter 12: I/O Systems. I/O Hardware Chapter 12: I/O Systems I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations I/O Hardware Incredible variety of I/O devices Common concepts Port

More information

Memory. Objectives. Introduction. 6.2 Types of Memory

Memory. Objectives. Introduction. 6.2 Types of Memory Memory Objectives Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured. Master the concepts

More information

Design Issues 1 / 36. Local versus Global Allocation. Choosing

Design Issues 1 / 36. Local versus Global Allocation. Choosing Design Issues 1 / 36 Local versus Global Allocation When process A has a page fault, where does the new page frame come from? More precisely, is one of A s pages reclaimed, or can a page frame be taken

More information

FFS: The Fast File System -and- The Magical World of SSDs

FFS: The Fast File System -and- The Magical World of SSDs FFS: The Fast File System -and- The Magical World of SSDs The Original, Not-Fast Unix Filesystem Disk Superblock Inodes Data Directory Name i-number Inode Metadata Direct ptr......... Indirect ptr 2-indirect

More information

An Approach to Task Attribute Assignment for Uniprocessor Systems

An Approach to Task Attribute Assignment for Uniprocessor Systems An Approach to ttribute Assignment for Uniprocessor Systems I. Bate and A. Burns Real-Time Systems Research Group Department of Computer Science University of York York, United Kingdom e-mail: fijb,burnsg@cs.york.ac.uk

More information

Questions answered in this lecture: CS 537 Lecture 19 Threads and Cooperation. What s in a process? Organizing a Process

Questions answered in this lecture: CS 537 Lecture 19 Threads and Cooperation. What s in a process? Organizing a Process Questions answered in this lecture: CS 537 Lecture 19 Threads and Cooperation Why are threads useful? How does one use POSIX pthreads? Michael Swift 1 2 What s in a process? Organizing a Process A process

More information

Chapter 8: Memory-Management Strategies

Chapter 8: Memory-Management Strategies Chapter 8: Memory-Management Strategies Chapter 8: Memory Management Strategies Background Swapping Contiguous Memory Allocation Segmentation Paging Structure of the Page Table Example: The Intel 32 and

More information

Course Outline. Processes CPU Scheduling Synchronization & Deadlock Memory Management File Systems & I/O Distributed Systems

Course Outline. Processes CPU Scheduling Synchronization & Deadlock Memory Management File Systems & I/O Distributed Systems Course Outline Processes CPU Scheduling Synchronization & Deadlock Memory Management File Systems & I/O Distributed Systems 1 Today: Memory Management Terminology Uniprogramming Multiprogramming Contiguous

More information

An AIO Implementation and its Behaviour

An AIO Implementation and its Behaviour An AIO Implementation and its Behaviour Benjamin C. R. LaHaise Red Hat, Inc. bcrl@redhat.com Abstract Many existing userland network daemons suffer from a performance curve that severely degrades under

More information

Chapter 7: Main Memory. Operating System Concepts Essentials 8 th Edition

Chapter 7: Main Memory. Operating System Concepts Essentials 8 th Edition Chapter 7: Main Memory Operating System Concepts Essentials 8 th Edition Silberschatz, Galvin and Gagne 2011 Chapter 7: Memory Management Background Swapping Contiguous Memory Allocation Paging Structure

More information

The Journey of an I/O request through the Block Layer

The Journey of an I/O request through the Block Layer The Journey of an I/O request through the Block Layer Suresh Jayaraman Linux Kernel Engineer SUSE Labs sjayaraman@suse.com Introduction Motivation Scope Common cases More emphasis on the Block layer Why

More information

Lecture 2 Process Management

Lecture 2 Process Management Lecture 2 Process Management Process Concept An operating system executes a variety of programs: Batch system jobs Time-shared systems user programs or tasks The terms job and process may be interchangeable

More information

CS356: Discussion #9 Memory Hierarchy and Caches. Marco Paolieri Illustrations from CS:APP3e textbook

CS356: Discussion #9 Memory Hierarchy and Caches. Marco Paolieri Illustrations from CS:APP3e textbook CS356: Discussion #9 Memory Hierarchy and Caches Marco Paolieri (paolieri@usc.edu) Illustrations from CS:APP3e textbook The Memory Hierarchy So far... We modeled the memory system as an abstract array

More information

Block Device Driver. Pradipta De

Block Device Driver. Pradipta De Block Device Driver Pradipta De pradipta.de@sunykorea.ac.kr Today s Topic Block Devices Structure of devices Kernel components I/O Scheduling USB Device Driver Basics CSE506: Block Devices & IO Scheduling

More information

CS 318 Principles of Operating Systems

CS 318 Principles of Operating Systems CS 318 Principles of Operating Systems Fall 2018 Lecture 16: Advanced File Systems Ryan Huang Slides adapted from Andrea Arpaci-Dusseau s lecture 11/6/18 CS 318 Lecture 16 Advanced File Systems 2 11/6/18

More information

Chapter 8 Virtual Memory

Chapter 8 Virtual Memory Chapter 8 Virtual Memory Contents Hardware and control structures Operating system software Unix and Solaris memory management Linux memory management Windows 2000 memory management Characteristics of

More information

GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATION SEMESTER: III

GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATION SEMESTER: III GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATION SEMESTER: III Subject Name: Operating System (OS) Subject Code: 630004 Unit-1: Computer System Overview, Operating System Overview, Processes

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address space at any time Temporal locality Items accessed recently are likely to

More information

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective. Part I: Operating system overview: Memory Management

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective. Part I: Operating system overview: Memory Management ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part I: Operating system overview: Memory Management 1 Hardware background The role of primary memory Program

More information

Introduction Disks RAID Tertiary storage. Mass Storage. CMSC 420, York College. November 21, 2006

Introduction Disks RAID Tertiary storage. Mass Storage. CMSC 420, York College. November 21, 2006 November 21, 2006 The memory hierarchy Red = Level Access time Capacity Features Registers nanoseconds 100s of bytes fixed Cache nanoseconds 1-2 MB fixed RAM nanoseconds MBs to GBs expandable Disk milliseconds

More information

CS 318 Principles of Operating Systems

CS 318 Principles of Operating Systems CS 318 Principles of Operating Systems Fall 2017 Lecture 16: File Systems Examples Ryan Huang File Systems Examples BSD Fast File System (FFS) - What were the problems with the original Unix FS? - How

More information

Cache Management for Shared Sequential Data Access

Cache Management for Shared Sequential Data Access in: Proc. ACM SIGMETRICS Conf., June 1992 Cache Management for Shared Sequential Data Access Erhard Rahm University of Kaiserslautern Dept. of Computer Science 6750 Kaiserslautern, Germany Donald Ferguson

More information

Introduction to OpenMP. Lecture 10: Caches

Introduction to OpenMP. Lecture 10: Caches Introduction to OpenMP Lecture 10: Caches Overview Why caches are needed How caches work Cache design and performance. The memory speed gap Moore s Law: processors speed doubles every 18 months. True for

More information

Eastern Mediterranean University School of Computing and Technology CACHE MEMORY. Computer memory is organized into a hierarchy.

Eastern Mediterranean University School of Computing and Technology CACHE MEMORY. Computer memory is organized into a hierarchy. Eastern Mediterranean University School of Computing and Technology ITEC255 Computer Organization & Architecture CACHE MEMORY Introduction Computer memory is organized into a hierarchy. At the highest

More information

Final Exam Preparation Questions

Final Exam Preparation Questions EECS 678 Spring 2013 Final Exam Preparation Questions 1 Chapter 6 1. What is a critical section? What are the three conditions to be ensured by any solution to the critical section problem? 2. The following

More information

On the Relationship of Server Disk Workloads and Client File Requests

On the Relationship of Server Disk Workloads and Client File Requests On the Relationship of Server Workloads and Client File Requests John R. Heath Department of Computer Science University of Southern Maine Portland, Maine 43 Stephen A.R. Houser University Computing Technologies

More information

SMD149 - Operating Systems

SMD149 - Operating Systems SMD149 - Operating Systems Roland Parviainen November 3, 2005 1 / 45 Outline Overview 2 / 45 Process (tasks) are necessary for concurrency Instance of a program in execution Next invocation of the program

More information

Role of OS in virtual memory management

Role of OS in virtual memory management Role of OS in virtual memory management Role of OS memory management Design of memory-management portion of OS depends on 3 fundamental areas of choice Whether to use virtual memory or not Whether to use

More information

Chapter 13: I/O Systems

Chapter 13: I/O Systems Chapter 13: I/O Systems Chapter 13: I/O Systems I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations Streams Performance 13.2 Silberschatz, Galvin

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address

More information

Two hours. Question ONE is COMPULSORY UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE. Date: Friday 25th January 2013 Time: 14:00-16:00

Two hours. Question ONE is COMPULSORY UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE. Date: Friday 25th January 2013 Time: 14:00-16:00 Two hours Question ONE is COMPULSORY UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE Operating Systems Date: Friday 25th January 2013 Time: 14:00-16:00 Please answer Question ONE and any TWO other

More information

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5

More information

L7: Performance. Frans Kaashoek Spring 2013

L7: Performance. Frans Kaashoek Spring 2013 L7: Performance Frans Kaashoek kaashoek@mit.edu 6.033 Spring 2013 Overview Technology fixes some performance problems Ride the technology curves if you can Some performance requirements require thinking

More information

Chapter 3: Important Concepts (3/29/2015)

Chapter 3: Important Concepts (3/29/2015) CISC 3595 Operating System Spring, 2015 Chapter 3: Important Concepts (3/29/2015) 1 Memory from programmer s perspective: you already know these: Code (functions) and data are loaded into memory when the

More information

Operating Systems Design Exam 2 Review: Spring 2011

Operating Systems Design Exam 2 Review: Spring 2011 Operating Systems Design Exam 2 Review: Spring 2011 Paul Krzyzanowski pxk@cs.rutgers.edu 1 Question 1 CPU utilization tends to be lower when: a. There are more processes in memory. b. There are fewer processes

More information

The Impact of Parallel Loop Scheduling Strategies on Prefetching in a Shared-Memory Multiprocessor

The Impact of Parallel Loop Scheduling Strategies on Prefetching in a Shared-Memory Multiprocessor IEEE Transactions on Parallel and Distributed Systems, Vol. 5, No. 6, June 1994, pp. 573-584.. The Impact of Parallel Loop Scheduling Strategies on Prefetching in a Shared-Memory Multiprocessor David J.

More information

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2014 Lecture 14

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2014 Lecture 14 CS24: INTRODUCTION TO COMPUTING SYSTEMS Spring 2014 Lecture 14 LAST TIME! Examined several memory technologies: SRAM volatile memory cells built from transistors! Fast to use, larger memory cells (6+ transistors

More information

CS 416: Opera-ng Systems Design March 23, 2012

CS 416: Opera-ng Systems Design March 23, 2012 Question 1 Operating Systems Design Exam 2 Review: Spring 2011 Paul Krzyzanowski pxk@cs.rutgers.edu CPU utilization tends to be lower when: a. There are more processes in memory. b. There are fewer processes

More information

Memory Management! How the hardware and OS give application pgms:" The illusion of a large contiguous address space" Protection against each other"

Memory Management! How the hardware and OS give application pgms: The illusion of a large contiguous address space Protection against each other Memory Management! Goals of this Lecture! Help you learn about:" The memory hierarchy" Spatial and temporal locality of reference" Caching, at multiple levels" Virtual memory" and thereby " How the hardware

More information

Chapter 8: Main Memory. Operating System Concepts 9 th Edition

Chapter 8: Main Memory. Operating System Concepts 9 th Edition Chapter 8: Main Memory Silberschatz, Galvin and Gagne 2013 Chapter 8: Memory Management Background Swapping Contiguous Memory Allocation Segmentation Paging Structure of the Page Table Example: The Intel

More information

CHAPTER 6 Memory. CMPS375 Class Notes Page 1/ 16 by Kuo-pao Yang

CHAPTER 6 Memory. CMPS375 Class Notes Page 1/ 16 by Kuo-pao Yang CHAPTER 6 Memory 6.1 Memory 233 6.2 Types of Memory 233 6.3 The Memory Hierarchy 235 6.3.1 Locality of Reference 237 6.4 Cache Memory 237 6.4.1 Cache Mapping Schemes 239 6.4.2 Replacement Policies 247

More information

OSEK/VDX. Communication. Version January 29, 2003

OSEK/VDX. Communication. Version January 29, 2003 Open Systems and the Corresponding Interfaces for Automotive Electronics OSEK/VDX Communication Version 3.0.1 January 29, 2003 This document is an official release and replaces all previously distributed

More information

August 1994 / Features / Cache Advantage. Cache design and implementation can make or break the performance of your high-powered computer system.

August 1994 / Features / Cache Advantage. Cache design and implementation can make or break the performance of your high-powered computer system. Cache Advantage August 1994 / Features / Cache Advantage Cache design and implementation can make or break the performance of your high-powered computer system. David F. Bacon Modern CPUs have one overriding

More information

MEMORY MANAGEMENT/1 CS 409, FALL 2013

MEMORY MANAGEMENT/1 CS 409, FALL 2013 MEMORY MANAGEMENT Requirements: Relocation (to different memory areas) Protection (run time, usually implemented together with relocation) Sharing (and also protection) Logical organization Physical organization

More information

Chapter 8: Memory- Management Strategies. Operating System Concepts 9 th Edition

Chapter 8: Memory- Management Strategies. Operating System Concepts 9 th Edition Chapter 8: Memory- Management Strategies Operating System Concepts 9 th Edition Silberschatz, Galvin and Gagne 2013 Chapter 8: Memory Management Strategies Background Swapping Contiguous Memory Allocation

More information

COMP 3361: Operating Systems 1 Final Exam Winter 2009

COMP 3361: Operating Systems 1 Final Exam Winter 2009 COMP 3361: Operating Systems 1 Final Exam Winter 2009 Name: Instructions This is an open book exam. The exam is worth 100 points, and each question indicates how many points it is worth. Read the exam

More information

Memory Management. Goals of this Lecture. Motivation for Memory Hierarchy

Memory Management. Goals of this Lecture. Motivation for Memory Hierarchy Memory Management Goals of this Lecture Help you learn about: The memory hierarchy Spatial and temporal locality of reference Caching, at multiple levels Virtual memory and thereby How the hardware and

More information

Memory Technology. Chapter 5. Principle of Locality. Chapter 5 Large and Fast: Exploiting Memory Hierarchy 1

Memory Technology. Chapter 5. Principle of Locality. Chapter 5 Large and Fast: Exploiting Memory Hierarchy 1 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface Chapter 5 Large and Fast: Exploiting Memory Hierarchy 5 th Edition Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic

More information