Avoiding the Cache Coherence Problem in a Parallel/Distributed File System

Toni Cortes, Sergi Girona, Jesus Labarta
Departament d'Arquitectura de Computadors, Universitat Politecnica de Catalunya - Barcelona
{toni, sergi, jesus}@ac.upc.es

Abstract

In this paper we present PAFS, a new parallel/distributed file system. Within the whole file system, special interest is placed on the caching and prefetching mechanisms. We present a cooperative cache that avoids the coherence problem while remaining highly scalable and achieving very good performance. We also present an aggressive prefetching algorithm that allows full utilization of the big caches offered by the cooperative cache mechanism.

Keywords: Input/Output, Parallel/Distributed File System, Cooperative Cache, Aggressive Prefetching, PAFS, xfs.

1 Introduction

In recent years, a great deal of work has been devoted to parallel I/O and parallel/distributed file systems. This work has produced many different file systems along with as many different caching policies. A common theme in most of these file systems is the idea of cooperation between nodes in order to achieve better system performance. A good example is the idea of cooperative caches. In a cooperative cache, all nodes work together in order to build a global cache. This kind of cooperation increases the cache size and the hit ratio, thus improving the file system performance.

This cooperation between nodes raises a very important problem: keeping the shared information coherent. A file system with a cooperative cache usually has to implement complicated and expensive mechanisms in order to avoid incoherences in the cached data. In this paper we present a simple and efficient solution to this problem.

PAFS is a parallel/distributed file system designed to work on a parallel machine or a network of workstations. Each node in the network runs a micro-kernel operating system and all services are handled by user-level servers. Besides, each node may have none, one or even several disks connected to it. As part of this file system, we have also implemented a cooperative cache algorithm that avoids the coherence problem. The solution presented not only removes the coherence problem, allowing much simpler code, but also increases the file system performance. This increase in performance has been measured through simulation, comparing our proposal against the algorithms proposed in xfs (the file system designed as part of the NOW project [1, 2, 8]).

As cooperative caches offer huge caches, we believe that they should be used to do some kind of aggressive prefetching. Along this line, we also present Full-File-On-Open, a prefetching algorithm that takes advantage of these huge cache sizes.

This report has been supported by the Spanish Ministry of Education (CICYT) under the TIC and TIC contracts.

All the results presented in this paper are obtained through simulation so that a wide range of environments and architecture configurations may be studied. These simulations have been done using the Sprite workload [3].

This paper is structured into 11 sections. Section 2 gives an overview of the related work. Section 3 describes the environment where PAFS is expected to run, and it is followed by a terminology section. Section 5 describes PAFS and the cooperative cache implemented in the file system. Section 6 describes xfs and its caching policy (N-Chance Forwarding); a comparison between both file systems is also presented in this section. In Section 7, Full-File-On-Open, an aggressive prefetching policy, is described. Section 8 gives details of the simulator and the traces used to obtain the results presented later. Sections 9 and 10 give detailed performance results of the file system with both the cooperative cache and the prefetching algorithm. Finally, Section 11 gives the most significant conclusions that can be extracted from this work.

2 Related Work

In recent years many parallel/distributed file systems have been developed, and most of them have placed special emphasis on cache strategies and prefetching algorithms (ParFiSys [4], Galley [16], Scotch [11], Zebra [12], PIOUS [24], Vesta [5] and sfs [22] among others). Besides all the above-mentioned projects, we would like to place special attention on two other research projects, as they have interesting similarities to the work presented here.

The first one is the xfs file system [1, 2, 8]. xfs is a server-less file system with a cooperative cache (N-Chance Forwarding) designed to work on a network of workstations. This system is the reference point used in our paper to compare the performance of our file system. The basic differences between PAFS and xfs are the replacement algorithm used and the way the cache coherence problem is handled. Also, their detailed study of N-Chance Forwarding only considers read operations, while both read and write operations are studied in our work. More details about xfs are given in a later section of this paper (§6).

The second one was developed by Leff et al., who have done a great deal of theoretical work on the use of remote memory for caching activities [19, 20]. In their latest project they also propose a mechanism which avoids the cache coherence problem. The main difference is that our servers do not need to communicate among themselves in order to know the placement of a given block. Another difference is that a distinct replacement algorithm is proposed in this paper. Furthermore, only a cooperative replacement algorithm was presented by Leff et al., while we present a complete file system. Finally, in our work all measurements are done using a real workload.

Besides the above-mentioned differences with each particular related project, there is a general difference from all of them when compared with our research. All projects presented so far are aimed at networks of workstations running a unix-like operating system. Our work assumes that each node in the network runs a micro-kernel operating system and that the file-system services are provided by a server, or servers, running on top of the micro-kernel. As the file system is implemented by a server, all requests have to be sent to this server, or servers, and cannot be handled through a simple system call. Besides, fewer actions can be completely handled locally in the requesting node.
These differences may somewhat modify the key issues of the design.

There has also been interesting research in cooperative memory usage [9, 23, 21]. This kind of cooperation has also been studied in the database field by Franklin et al. [10].

In the prefetching field we should mention the work done by D. Kotz on prefetching in MIMD multiprocessors [14, 15]. We would also like to cite the transparent informed prefetching developed by Gibson et al. [11, 28]. In their work a very aggressive prefetching policy, similar to Full-File-On-Open, is presented. The difference between our work and the research mentioned above is that we not only present a prefetching algorithm but also study its interaction with the cooperative cache mechanism. A centralized version of this work has also been developed by this research group [6, 26].

3 Target Environment

The file system presented in this paper is targeted at a parallel machine or a network of workstations (NOW). This parallel machine, or NOW, may have a very large number of nodes connected through a very fast interconnection network. Each node may have none, one or even several disks connected to it. From now on, we will refer to both architectures (NOW and parallel machine) as a parallel machine.

Each node runs a micro-kernel operating system instead of a full unix-like one. All functions not offered by the kernel itself are implemented by user-level servers. This is also the case for the file-system operations. This operating-system architecture has been chosen as it allows better cooperation between nodes. Besides, it may offer a single system image to the applications running on top of the parallel machine, simplifying its use. We believe that this single image is the way to go. This work started as a file-system prototype for the PAROS operating-system micro-kernel [17]. This target platform defined the environment we work with.

In order to be able to implement our parallel/distributed file system, the underlying micro-kernel should offer the following abstractions and functionality. First, we should be able to have multi-threaded applications. This allows us to implement a file server with several threads which can work on different requests in parallel. Second, ports are needed to allow communication between applications. User requests and completion notifications are sent using this mechanism. No data transfer is done using ports, as a faster mechanism can be used. Finally, a memory-copy operation is needed. This mechanism is used to transfer data between the cache and the user. Our assumption is that any processor can set up a data transfer between any other two processors. The processor that invokes the copy is charged with all the overhead. When we refer to a memory copy, the copy request and the copy itself are both included. Similar remote-memory-access mechanisms are supported in a variety of distributed-memory systems [30, 7].

4 Terminology

In this section we describe some concepts and terminology that may help the reader to understand some of the ideas presented later in this paper.

As in all caches, requesting a block means ending up either with a cache hit or a cache miss. The difference, in a cooperative cache environment, is that a cache hit may be either local or remote. If the requested block is found in the same node where the requesting client is running, we have a local hit. On the other hand, if the requested block is found in the cache of a different node, we have a remote hit. We also use the term global hit to refer to both kinds of hits.

It is also important to differentiate between the possible situations that may be found on a cache miss. The first one appears when the block that has to be replaced has been modified but has not been written to disk yet (it is dirty). We call this situation a miss on dirty. On the other hand, if the block to be replaced does not need to be written to disk (it is clean), we are looking at a miss on clean.

5 File System and Cache Design

In this section we present PAFS, a parallel/distributed file system with a cooperative cache that avoids the coherence problem. This description is divided into two main issues: the file system architecture and the cooperative cache design.
5.1 File System Architecture

When designing a file system there are two main issues that have to be taken into account: how the data is distributed among the disks, and how data and meta-data are managed by the servers before they get to the clients. In this work we only examine the second point, as the ideas we propose are valid no matter how data is stored on the disks.

Figure 1: PAFS architecture.

Scalability is one of the most important issues in distributed/parallel file systems. In order to achieve the desirable scalability, two kinds of servers are implemented in PAFS: cache-servers and disk-servers (Figure 1). Cache-servers are in charge of serving the clients' requests. They manage the cache and meta-data information. If the data needed by a cache-server is not in memory, and has to be fetched from disk, this information is requested from a disk-server. Disk-servers are processes responsible for physically reading and writing the blocks requested by the cache-servers. The system may have as many cache-servers as needed and they may run on any node, even if the node does not have a disk. Besides, there should be one disk-server running on each node with one or more disks.

In order to implement a highly scalable system, the inter-server communication has to be as low as possible. For this reason, we propose a load distribution that does not need any communication between cache-servers. Each cache-server is responsible for a set of files. It keeps all the information needed to find a block in the cache or on disk without the help of any other server. Clients know which server is in charge of which files by computing a hash function on the file name (or file-id).

Disk-servers only serve cache-server requests. A disk-server reads blocks from a disk and places them in a given buffer on a given node using a memory-copy primitive. It can also write the contents of a buffer from any node to the disk. When a disk-server receives a write operation it copies the block to a local buffer, answers the cache-server and then proceeds to the physical write. Disk-servers do not know how the data is distributed among the disks; they only know how to find a given block on their local disk (or disks).

In our file system architecture there are no dedicated nodes. Cache-servers and disk-servers may share a node with any number of clients. Both servers try to consume as few resources as possible, allowing efficient node sharing.

The current version places the blocks of a file among the disks using a round-robin algorithm: each block is placed on a different disk from its predecessor and its successor. Although this distribution has been chosen, many others could also be used.

5.2 Cache

The most important part of this file-system design, and the main topic presented in this paper, is the cache. We have designed a cooperative cache that has the advantages of cooperation and avoids the problems derived from the coherence mechanisms. A cooperative cache is a mechanism where all nodes in the parallel machine cooperate in order to obtain an improved global cache. In fact, in this research project we have suppressed the concept of "local cache" in favor of a single, big global one. Each node gives a part of its local memory to the global cache, which is managed by the cache-servers as will be explained later. A given node is not allowed to modify the contents of the cache blocks placed in its memory as they belong to the whole system.

This global cache is divided among all the cache-servers in partitions (Figure 2). Each cache-server is responsible for caching the data of its files in its partition of the cache. All the blocks that make up a cache partition are scattered among all nodes. One server can neither access nor modify any block which belongs to another server's cache partition.
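To illustrate how a client locates the cache-server responsible for a file, the sketch below hashes the file name into a server index; the use of Python, MD5, a simple modulo mapping and the names used here are our own illustrative assumptions, not the actual PAFS code.

```python
import hashlib

NUM_CACHE_SERVERS = 8  # the configuration used in the simulations; any number is possible

def cache_server_for(file_name: str) -> int:
    """Return the index of the cache-server responsible for this file.

    Every client evaluates the same deterministic hash, so no
    inter-server communication is needed to locate a file's blocks.
    """
    digest = hashlib.md5(file_name.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_CACHE_SERVERS

# Example: every client computes the same server index for the same file name.
print(cache_server_for("/home/user/data.dat"))
```

Because the mapping is a pure function of the file name, any client can address the right server directly, which is what keeps the cache-servers free of inter-server traffic.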

Figure 2: Cache-block distribution in partitions.

A cache-server works completely isolated from the rest of the cache-servers, as there is no overlap in their responsibilities.

The replacement algorithm used by the cache-servers is based on the well-known LRU (Least Recently Used). This means that when a client requests a block, this block replaces the least recently used one in the cache partition managed by this server. We should notice that this block may be on any node and not necessarily on the client's. Our cooperative cache algorithm does not care about increasing the number of local hits. This idea differs from most cooperative algorithms implemented so far, as they try to maximize the number of local hits. In this paper we show that this may not be the wisest thing to do if remote hits can be served efficiently.

The idea of not encouraging local hits may be good for whole-system performance, but it may degrade applications with a small file working set. These applications would probably have a very high local-hit ratio if other cooperative algorithms were used. In order to be fair to these applications we have modified the LRU algorithm slightly. This new version follows two steps before deciding which block is to be replaced. First, it checks for a block placed in the same node as the client within the queue-tip. We define the queue-tip as a percentage of the least recently used blocks of the LRU-queue. If such a block is found, it is replaced. Otherwise, the least recently used block is substituted. A study of the influence this factor has on the overall system performance is detailed in the performance section (§9.6). From now on, we will call this replacement algorithm Pseudo-Global-LRU (PG-LRU).

One of the most expensive and complicated issues in a parallel/distributed file system is handling cache coherence. Coherence problems appear when replication is allowed and several copies of the same information coexist in the system. This replication is mainly done to increase the local-hit ratio of the global cache. As we consider that a high local-hit ratio is not the key to obtaining a high-performing cache, we avoid any kind of replication. If a block is already found in the global cache, it is sent to the user but no replica is made on the client's node. If there is no replication, no cache coherence problems can appear and we can get rid of all the coherence mechanisms. This greatly simplifies the cache and file-system design. In the performance section (§9) we will show that this simplification not only does not hurt the system performance but may even improve it.

The last important issue is the way blocks are distributed among the servers. So far, we have a fixed distribution: at boot time all cache blocks are evenly distributed among all servers. This may seem very simplistic, but no problems have been detected due to this distribution. If any performance degradation appears, a dynamic redistribution of blocks should be studied. Until then, we propose the simplest algorithm: fixed partitions.
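The sketch below captures the two-step Pseudo-Global-LRU decision just described; the data structures, method names and the 10% queue-tip default are illustrative assumptions rather than the PAFS implementation.

```python
from collections import OrderedDict

class PGLRUPartition:
    """Pseudo-Global-LRU over one cache-server's partition.

    Each entry maps a block id to the node holding its buffer, kept in
    LRU order (least recently used first). Eviction first scans the
    queue-tip for a buffer on the requesting client's node; otherwise
    the globally least recently used block is replaced.
    """

    def __init__(self, capacity, tip_fraction=0.10):
        self.capacity = capacity
        self.tip_fraction = tip_fraction
        self.blocks = OrderedDict()        # block_id -> node

    def touch(self, block_id):
        self.blocks.move_to_end(block_id)  # mark as most recently used

    def choose_victim(self, client_node):
        tip_len = max(1, int(self.capacity * self.tip_fraction))
        # Step 1: prefer a block already placed on the client's node within the tip.
        for block_id, node in list(self.blocks.items())[:tip_len]:
            if node == client_node:
                return block_id
        # Step 2: fall back to the globally least recently used block.
        return next(iter(self.blocks))

    def insert(self, block_id, node, client_node):
        if len(self.blocks) >= self.capacity:
            del self.blocks[self.choose_victim(client_node)]
        self.blocks[block_id] = node

# Tiny example: with capacity 3, inserting a fourth block for a client on node 1
# replaces "f-0", which sits in the tip and is already placed on node 1.
part = PGLRUPartition(capacity=3)
part.insert("f-0", node=1, client_node=1)
part.insert("f-1", node=2, client_node=2)
part.insert("f-2", node=1, client_node=2)
part.insert("f-3", node=2, client_node=1)
```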

Finally, each cache-server implements a delayed-write policy. Every 30 seconds, all modified blocks in the cache are sent to disk through the disk-servers. This interval between cache flushes has been taken from the one used in Unix.

5.3 Fault Tolerance

The above caching mechanism is not fault tolerant, as a node failure means that all modified blocks in the failed node's memory are lost without being saved to the appropriate disk. As this may not be acceptable in some environments (especially NOWs), we propose a couple of mechanisms to achieve the desired fault tolerance.

The simplest idea that achieves the objective consists of implementing a write-through policy. Every time a block is modified, it is sent to the disk-server, which writes it to disk. This can be done as long as the disks are not too busy. In order to study the impact this policy has on the overall system performance, some figures are presented in the performance section (§9.5).

Another possibility, which has not been implemented yet, consists of emulating a RAID [27]. We propose that a set of blocks is used to keep the parity of a line of blocks. Every time a block is modified, the parity block is also modified. If a node with dirty blocks fails, all dirty blocks in its cache can be rebuilt and sent to disk. As the write-through policy has worked well enough, we have not implemented this policy, but it may be a useful one in some environments.

6 Algorithms Comparison

In order to evaluate the performance obtained by PAFS we compare it with xfs. This file system was chosen because it implements one of the latest cooperative cache algorithms found in the literature (N-Chance Forwarding). We will first explain the file system and its cache briefly, and a conceptual comparison between both file systems will follow.

6.1 xfs

This file system and its cooperative cache were developed as part of the Berkeley NOW project [1, 2, 8]. In this section we explain the basic ideas of the xfs file system and its cooperative cache (N-Chance Forwarding). Although it is a very important issue in their work, we will not describe how the data is striped across the disks, as it is not relevant when comparing xfs to our proposal. Nor will we explain the servers needed for such a disk organization.

6.1.1 File System Architecture

There are two main abstractions in the subset of xfs we are interested in: the OS kernel and the manager. The kernel is a regular unix-like operating system. It handles all file-system requests from the clients running on top of it. This kernel has been slightly modified so that it can also serve operations requested from other nodes. Managers are servers which keep track of the file-system data and meta-data. They are also responsible for the consistency of the cache blocks.

A file-system operation starts when the client requests a block from its OS kernel. If the requested block is in the kernel's buffer cache, it is handed to the user. So far, no differences from a regular unix operation appear. The differences appear when the kernel does not have the block cached. As this block may be cached in a remote node, the kernel contacts a manager to request the block. The manager checks whether a copy is already cached in any node. If such a copy exists, the request is forwarded to the remote node holding the block. The kernel of this node will send a copy of the requested block to the kernel which asked for it in the first place. If there are no copies cached, the manager reads the block from disk and sends it to the requesting kernel.
A diagram of this process is presented in Figure 3.

Each manager is responsible for a set of files. The kernel knows which manager is responsible for which files by indexing the Manager-Map with the file-id. The Manager-Map is a table, replicated at all sites, that tells which server is responsible for which files. In order to increase the efficiency of the system, xfs tries to assign files used by a client to a manager colocated on that machine.

Figure 3: Steps followed to fulfill a client request in xfs.

This assignment is done using a policy called First Writer: when a client creates a file, xfs chooses a manager colocated on the same machine [1].

In order to port this version to our micro-kernel operating-system architecture, only one change was needed. As the micro-kernel does not handle file-system operations, we have placed a file-system server on each node. This server behaves like the unix kernel does. The only difference is that accessing this server means sending a message to a port instead of performing a system call. As this server is always colocated on the same machine as the client, not much overhead should be added. More detailed information on the influence this modification has can be found in the performance section (§9.4).

6.1.2 N-Chance Forwarding

N-Chance Forwarding is the cooperative cache algorithm implemented in xfs. It divides each node's cache into two parts. The first one is used to cache local data and the second one holds data cached on behalf of remote nodes. The size of these two parts is not fixed but dynamically adjusted depending on the node's I/O activity.

N-Chance Forwarding allows each node to cache the blocks its applications request. The difference with isolated self-caching algorithms is that it attempts to avoid discarding unreplicated blocks (singlets) from client memory. When a client discards a block, the server checks whether that block is the last copy in the whole cache. If the block is a singlet, rather than discarding it, the client forwards the data to a random peer. The peer that receives the data adds the block to its LRU list as if it had been recently referenced. A singlet can only be forwarded N times without being referenced; after N forwardings without a reference, the block is discarded. If a client has a remote hit, the block is replicated from the remote cache to the local one of the requesting client. The parameter N indicates how many times a singlet is allowed to be forwarded without being referenced before finally being discarded. The value we have chosen for the comparison is N=2, as it was described as the best choice in the N-Chance Forwarding paper [8].

6.1.3 Coherence Mechanism

xfs uses a token-based cache consistency scheme similar to Sprite [25] and AFS [13], except that xfs manages consistency on a per-block rather than on a per-file basis. Before a kernel modifies a block, it must acquire write ownership of that block. The client sends a message to the block's manager. The manager then invalidates any other cached copies of the block, updates its cache consistency information to indicate the new owner, and replies to the client, giving permission to write. Once the kernel owns a block, it may write the block repeatedly without having to ask the manager for ownership each time. The client maintains write ownership until some other client reads or writes the data, at which point the manager revokes ownership, forcing the client to stop writing the block, to flush any changes to stable storage, and to forward the data to the new client.
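To make the N-Chance decision concrete, the following sketch models what happens when a node evicts a block (discard non-singlets, forward singlets to a random peer at most N times); it is a simplified model written under our own assumptions, not xfs code.

```python
import random
from dataclasses import dataclass, field

N = 2  # recirculation limit used in the comparison

@dataclass
class Block:
    block_id: str
    recirculations: int = 0

@dataclass
class Peer:
    name: str
    lru: list = field(default_factory=list)  # most recently used at the end

def on_local_eviction(block, copies_in_system, peers):
    """N-Chance decision for a block pushed out of a local cache.

    copies_in_system is the number of cached copies of this block in
    the whole cluster; a count of 1 marks an unreplicated singlet.
    """
    if copies_in_system > 1 or block.recirculations >= N:
        return "discard"
    block.recirculations += 1
    peer = random.choice(peers)        # forward the singlet to a random peer
    peer.lru.append(block)             # the peer treats it as recently referenced
    return f"forwarded to {peer.name}"

# Example: a singlet evicted for the first time is forwarded rather than discarded.
peers = [Peer("node-3"), Peer("node-7")]
print(on_local_eviction(Block("f-42"), copies_in_system=1, peers=peers))
```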

6.2 Comparison

In this paper we compare two file systems and especially their caching policies: N-Chance Forwarding (implemented in xfs) and PG-LRU (implemented in PAFS). In this section we describe the main differences between both systems. We also list the main issues that have to be taken into account when comparing both algorithms.

The most general difference between both file systems is the platform they are designed to run on. PAFS is designed to work on a parallel machine (or NOW) where each node runs a micro-kernel operating system. All services, including the file system, have to be implemented by user-level servers instead of by the kernel itself. On the other hand, xfs works on a unix-like operating system. The file-system code is implemented in the kernel and some operations can be handled without sending or receiving any messages. A second difference is the way the coherence problem is tackled. While PAFS avoids it by simply avoiding replication, xfs implements a token-based, per-block cache consistency scheme. Regarding the caching algorithms, there is also a very important conceptual difference. PG-LRU does not try to increase the number of local hits; it tries to speed up remote hits in order to make the local-hit ratio less significant. N-Chance Forwarding places special emphasis on achieving a high local-hit ratio in order to increase its performance.

Besides these general differences, there are a few issues that should be studied in order to compare both file systems. First, the impact the coherence algorithm has on the system performance should be studied. The total time spent serving global hits should also be compared: if local hits are very fast but remote hits take very long, the total time spent serving block hits may be too high. We should also study the overhead of maintaining a high local-hit ratio, which may be too high compared to the gains obtained. All these issues, and some less important ones, have to be taken into account when comparing both algorithms in the performance section.

7 Aggressive Prefetching

A very important effect produced by any cooperative caching algorithm is the huge caches that can be obtained. These global caches are so big that most of the blocks kept are several hours old. For example, in our simulation (50 nodes with 16MByte "local caches"), it took 13 hours to fill the global cache under the Sprite workload [3].

It is well known that prefetching is not always a good idea, as it may end up delaying the application if many mispredictions are made [15, 29]. Nevertheless, if the cache is big enough, these mispredictions should not affect the overall cache performance, as prefetched blocks replace very old data. Extending this idea, the bigger the cache is, the more aggressive the prefetching policies can be.

In order to take advantage of the huge caches offered by the cooperative mechanism, we present an aggressive prefetching algorithm named Full-File-On-Open. This algorithm starts to prefetch the whole file as soon as the file is opened. This has several basic advantages over other common algorithms. As it starts prefetching before any data has been accessed, the first block may already be in the cache when requested. This increases the file system performance on small files made up of one or two blocks.
If files are opened at the beginning of the program execution but the data is not accessed until some time later, many blocks may already have been cached before they are really needed. This algorithm also takes advantage of the time between requests: if this time is very long, many blocks can be prefetched, as there is no limit on the number of blocks that can be brought to the cache. This increases the probability of finding a block in the cache when it is needed.

As each cache-server is responsible for a set of files, it is also in charge of prefetching the blocks of those files. Each server only prefetches one block at a time, but as there are several servers, a good degree of parallelism is obtained.
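A minimal sketch of the Full-File-On-Open idea follows: on open, every block of the file is queued for prefetching at the cache-server responsible for that file. The queue, the server interface and the block-count computation are illustrative assumptions, not the PAFS code.

```python
from collections import deque

BLOCK_SIZE = 8 * 1024  # 8 KByte blocks, as in the simulations

class CacheServer:
    """Keeps a queue of blocks waiting to be prefetched, served one at a time."""
    def __init__(self):
        self.prefetch_queue = deque()

    def queue_prefetch(self, file_name, block_no):
        self.prefetch_queue.append((file_name, block_no))

def full_file_on_open(file_name, file_size, servers, server_for):
    """Queue every block of a file as soon as the file is opened.

    There is no limit on the number of blocks queued; long gaps between
    requests simply give the responsible server more time to fill the cache.
    """
    num_blocks = (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE
    server = servers[server_for(file_name)]   # one cache-server owns the whole file
    for block_no in range(num_blocks):
        server.queue_prefetch(file_name, block_no)

# Example: opening a 20 KByte file queues its three 8 KByte blocks.
servers = [CacheServer() for _ in range(8)]
server_for = lambda name: hash(name) % len(servers)
full_file_on_open("/tmp/small.dat", 20 * 1024, servers, server_for)
print(len(servers[server_for("/tmp/small.dat")].prefetch_queue))  # -> 3
```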

At first sight, this algorithm may seem to degrade the file system performance if many huge files are open, as they may fill the whole cache. This behavior is studied later in the performance section (§10).

8 Simulator and Trace Files

8.1 Simulator

The file-system and cache simulator used in this project is part of DIMEMAS [18]. It reproduces the behavior of a distributed-memory parallel machine. This software not only simulates the machine and disk accesses but also different short-term process-scheduling policies. The simulator is trace-driven; traces contain the CPU, communication and I/O demand sequences of every process instead of the absolute time of each event.

The communication model implemented in the simulator is important for understanding the results presented later. All communications are divided into two parts: a startup (or latency) and a data transfer. The startup is constant for each type of communication (port or memory copy) and it is assumed to require CPU activity. The startup is different depending on whether the communication stays within a node or crosses the interconnection network. The data transfer time is proportional to the size of the data sent and inversely proportional to the interconnection-network bandwidth. In our model, all communications are synchronous, although asynchronous communication can be achieved by creating new threads.

8.2 Simulation Parameters

In all runs, we have simulated a 50-node parallel machine where each node assigned 16MBytes of its local memory to the global cache. This cache was divided into 8KByte blocks, the same size as the disk blocks. The whole system had 8 disks where the data was distributed in a round-robin fashion.

The disks we have used for our simulations are modeled using two parameters: latency and bandwidth. The latency is the time needed to seek and locate a block. We have used a 10.5-millisecond read latency and a 12.5-millisecond write latency. The bandwidth is the number of bytes that can be transferred per unit of time. We have used a 10MBytes/s bandwidth. These values have been derived as an average of several real disks.

Although the environment used in this work is somewhat different from the one used in the NOW project [1, 2, 8], we use similar network parameters. This should help the reader compare both environments. A study of the influence the network parameters have on the system performance is also presented in the performance section (§9.2). Unless otherwise specified, nodes are connected through a 155 Mbits/s interconnection network and local copies are done at 320 Mbits/s. We assumed a 100-microsecond remote-port startup and a 50-microsecond local-port startup. Memory copies have a 25-microsecond startup if they are within a node and a 50-microsecond one if the copy is between different nodes.

PAFS has been simulated with 8 cache-servers and 8 disk-servers. On the other hand, xfs has been simulated with 50 servers, as we want to have a server on each node in order to minimize the impact of implementing the file system as a server and not in the kernel. All servers share their nodes with other applications. Regarding the particular parameters of each algorithm, we have used N=2 for xfs and a queue-tip size of 10% of the LRU-list of each cache-server.

8.3 Sprite Workload

In order to get the results presented in this paper, we have used some parts of the Sprite workload, described in detail by Baker et al. [3]. The Sprite user community included about 30 full-time and 40 part-time users of the system.
These traces list the activity of 48 client machines and some servers over a two-day period measured in the Sprite operating system. (DIMEMAS is a performance prediction simulator developed by CEPBA-UPC and is available as a PALLAS GmbH product.)
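As a rough illustration of the communication model and the parameters above, the snippet below estimates raw copy times using the simulator's linear startup-plus-transfer model; how the terms are composed here is our own simplification for illustration, not the exact DIMEMAS accounting.

```python
# Parameters from Section 8.2 (bandwidths in bits/s, startups in seconds).
REMOTE_NET_BW       = 155e6   # interconnection network
LOCAL_COPY_BW       = 320e6   # memory copy within a node
REMOTE_COPY_STARTUP = 50e-6   # memory copy between nodes
LOCAL_COPY_STARTUP  = 25e-6   # memory copy within a node

def copy_time(nbytes, remote):
    """Startup plus size/bandwidth, the linear model used by the simulator."""
    if remote:
        return REMOTE_COPY_STARTUP + (nbytes * 8) / REMOTE_NET_BW
    return LOCAL_COPY_STARTUP + (nbytes * 8) / LOCAL_COPY_BW

# Shipping the 4660 bytes of an average request from a remote cache costs about
# 290 microseconds, while shipping a full 8 KByte block costs about 470; this is
# why PAFS only copies the bytes the user asked for (see Section 9.1).
print(round(copy_time(4660, remote=True) * 1e6))       # -> 291
print(round(copy_time(8 * 1024, remote=True) * 1e6))   # -> 473
```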

Table 1: Number of operations (read, write, open, close, seek, unlink) and accessed blocks, both for the whole simulation and for the warm-cache period where the simulation results were taken. All values in this table are in thousands of operations or thousands of 8KByte blocks.

Table 2: Average read and write operation times (in microseconds) for PAFS and xfs. PAFS gains 12.6% on reads and 42.4% on writes.

Although the trace is two days long, all measurements presented in this paper are taken from the 15th hour to the 48th hour. This is done because we used the first fifteen hours to warm the cache. In Table 1, we can see the number of requested operations and accessed blocks during the simulation period. This table is meant to help the reader understand the real load placed on the file system.

9 Cache Performance

In this section we present the performance measurements obtained by PAFS and especially by PG-LRU (the cooperative caching algorithm presented in this paper). As there are many parameters that can influence the cache performance, we have decided to study them one by one. Each of the following subsections focuses on one of these parameters. All measurements presented in this section are compared to the ones obtained by xfs and N-Chance Forwarding, which have also been simulated.

9.1 Performance Comparison

In this subsection we compare the average time spent performing a read and a write operation by both file systems. In Table 2 we can see the average times mentioned above. We observe that both read and write operations are faster when PAFS is used than when xfs is used: the average read time is 12.6% lower and the average write time is 42.4% lower. In order to explain this gain in performance, we will first focus on the read operations; a discussion of write operations will follow.

In Figure 4 we can observe the total time spent performing read and write operations. These times have been normalized to the largest one (reads on xfs) in order to make the graph easier to understand. In this graph we can see the time spent on misses, remote hits and local hits. The time spent on global hits can easily be obtained by adding the time spent on both local and remote hits. In this figure we also observe that the time spent on global hits by PAFS is significantly less than the one spent by xfs. As the global-hit ratio is practically the same (85.5%), this difference is the reason behind some of the gain obtained by our file system. We should focus on the remote hits if we want to understand this difference. A remote hit takes around 10 times longer in xfs than in PAFS. This means that our file system could have up to 10 times more remote hits than xfs and still spend less time in global hits. As PAFS has far fewer than ten remote hits for each of theirs, the total time is significantly less.

Figure 4: Time spent by the whole file system serving read and write operations, broken down into misses, remote hits and local hits. All times are normalized to the time spent by xfs serving read operations.

Their remote hits are so expensive because they have to copy the data twice: once from a remote memory to the local memory, and once from the local cache to the user. They also have to forward many blocks, contact the server, which in turn contacts the block owner to revoke block ownership, and so on. On the other hand, PAFS only copies the data from a remote cache to the user, and that is all. Besides, when xfs copies the block from the remote cache, the whole block is copied through the interconnection network, while our version only copies the bytes requested by the user. The average size of the requests made by the user is 4660 bytes, roughly half the size of a cache block. Another important aspect is that misses on clean take longer in xfs than in PAFS. In their file system, bringing a block into the cache may require forwarding another block to a different node. This extra work also decreases the overall read performance.

Let us now explain the gain obtained in write operations. The main reason for the performance difference is the overhead produced by the coherence algorithm. Most writes have to ask for the block ownership, and quite a few of them also have to invalidate the copies kept in other nodes. All these operations are expensive and increase the average write time in xfs. Besides, block forwarding and the extra copies needed to bring the block to the local cache before modification have a significant impact on the write performance. As write operations are very fast, any overhead has a significant impact on the average operation time.

9.2 Network Bandwidth Influence

A very important part of this work is to study the influence the network bandwidth has on the results presented in the above subsection. In order to perform this study, we have run several simulations varying the interconnection-network bandwidth. In Figure 5 we can examine this variation. The X axis shows the ratio between the local memory-copy bandwidth (L_BW) and the interconnection-network one (R_BW). The interval used starts at 1, where both bandwidths are equal, and ends where the remote bandwidth is 10 times slower than the local one. This interval should include most parallel-machine and NOW configurations.

We observe that the bandwidth ratio affects read operations in a similar way in both algorithms. This means that the time gained by xfs due to its higher local-hit ratio is lost because of its remote hits. As all communications needed to serve a remote hit go through the interconnection network, the xfs remote hits are highly penalized.

Write operations behave in a different manner. This difference resides in the way misses are treated in both algorithms. A miss in PAFS nearly always means a remote copy, as the block will probably replace a block in a remote node. Consequently, a slowdown in the interconnection network means a slowdown in the write operation. On the other hand, a miss under xfs is always handled locally. It replaces a local block, and no remote copies have to be done unless a forwarding is needed.

Figure 5: Network bandwidth influence on the average read and write operation times.

Figure 6: "Local-Cache" size influence on the average read and write operation times.

As the network slows down, the extra remote copies performed by PAFS are no longer fully outweighed by the extra work xfs spends on its coherence algorithm, and the difference between the average write times of both file systems decreases. Summing up, we can see that the results presented in this paper are valid even with slow interconnection networks. In any case, the faster the interconnection network is, the better the cooperative cache behaves.

9.3 "Local-Cache" Size Influence

Another important aspect that should be studied is the influence the "local-cache" size has on the file system performance. In order to fulfill this study we have simulated "local-cache" sizes from 1MByte up to 16MBytes, which is the default configuration presented in this work (Figure 6). We can see, as expected, that decreasing the cache size increases the average read time. As was shown in Figure 4, 80% of the total time spent on read operations was used to satisfy 15% of the blocks (the misses). This means that the file system performance is driven by the miss ratio. As can be seen in Table 3, smaller caches have a much lower global-hit ratio. This increase in misses also increases the average read time, and it can end up doubling it in the worst case.

In Table 3 we can also observe that xfs obtains a slightly better global-hit ratio than PAFS. This is due to the fixed-partition policy: sometimes the blocks assigned to a cache-server are not enough to hold its working set, and thus it has a lower hit ratio. Anyway, this did not affect the overall results much.

Table 3: "Local-cache" size influence on the read global-hit ratio.
        1MB   2MB   4MB   8MB   16MB
PAFS    67%   73%   78%   81%   85%
xfs     68%   74%   78%   81%   85%

Figure 7: Local-port startup influence on the average read and write operation times.

Write operations are quite insensitive to the cache size. In order to explain this somewhat surprising result we should first examine the work needed to perform a write hit and a write miss. If the missed block is a new one (from a growing file), or the write operation overwrites the old block completely, the operation does not need to access the disk. This means that the time to serve a miss is very similar to the time needed to complete a hit. This fact explains the little impact that reducing the global-hit ratio has on write operations.

In Figure 6 we can also observe that the "local-cache" size affects both algorithms (PAFS and xfs) in a similar way. The reason behind this behavior on read operations is the very similar global-hit ratio. The similarity of behavior on write operations is due to the small impact the "local-cache" size has on this kind of operation.

9.4 Local-Port Startup Influence

As was mentioned when describing the xfs file system, the simulated version has one difference from the original version. As we work on a micro-kernel architecture, the file system cannot reside in the kernel and has to be implemented by a server. This means that requesting data from the file system requires a message to a server instead of a system call. In order to study the impact this modification has on the system performance we have run some simulations modifying the local-port startup (Figure 7). The simulated range goes from an instantaneous startup (minimizing the influence of sending a message to the server) to the default startup used in this paper (50 microseconds). This study shows that varying the local-port startup only affects the xfs algorithm, as PAFS does not base its performance on local communications. We can also observe that although some improvement is obtained in xfs with small startups, this improvement is not very significant.

9.5 Write-Through Overhead

None of the results presented so far include any kind of fault-tolerance mechanism; only a 30-second syncer was simulated. As this may be unacceptable in some environments, we proposed a write-through policy. In Table 4 we can see the average read and write operation times with and without write-through. The simulation runs show that, under this workload, a write-through policy does not produce a significant overhead. This small overhead is due to the moderate disk activity and a good caching policy.

Table 4: Average read and write operation times (in microseconds) for PAFS with delayed-write, PAFS with write-through, and xfs.

We understand that if the workload placed more stress on the disks, a higher overhead could appear due to the write-through policy.

9.6 Queue-Tip Size Influence

The replacement algorithm presented in this paper tries to increase the local-hit ratio without increasing the complexity and overhead of the algorithm, in order not to penalize applications with a small file working set. To achieve this objective we have defined a section of the LRU-queue (the queue-tip) where we first try to find a block placed in the node which requested it (Section 5.2). In this section we study the impact the size of this queue-tip has on the overall performance. We have studied several queue-tip sizes; the simulated range goes from 0% up to 25% of the LRU-queue. Throughout this range we have seen that there is no real gain after the first 5%. This means that there is no point in examining more than the least recently used 5% of the queue in order to achieve a higher local-hit ratio.

10 Prefetching Performance

As cooperative caches offer such big caches, they should be used for aggressive prefetching. In this paper, one such algorithm (Full-File-On-Open) has been presented. In this section we study the impact this algorithm has on the system performance. The study presented in this section is only done on PAFS, but the results should be extendible to any file system with a cooperative caching algorithm.

10.1 Algorithm Performance

The first step in studying the performance gain obtained by Full-File-On-Open is to compare it with other algorithms. In Figure 8 we present the average read and write operation times when no prefetching is active and with two different prefetching algorithms: One-Block-Ahead and Full-File-On-Open. One-Block-Ahead is the typical algorithm implemented in many file systems: every time a block is read or written, the next sequential block is queued to be prefetched. This algorithm tries to take advantage of the usual sequential file access. At the same time, it is a very conservative algorithm, as only one block is prefetched each time, minimizing the impact of prefetch misses; a minimal sketch of this policy is given below.

Examining Figure 8 we observe that both prefetching algorithms improve the average read time, but no significant gain is obtained in the average write time. Let us study both operations separately and explain the influence the prefetching algorithms have on each of them. Read operations take advantage of any prefetching algorithm, as the global-hit ratio is increased by both of them. We should also notice that prefetch misses do not have any significant influence, because the global cache is very big and most of the replaced blocks are several hours old. We can also see that Full-File-On-Open gives much better results than One-Block-Ahead. This happens for several reasons. First, if files are only one or two blocks long, most of the time they are completely prefetched before the application starts reading them. Second, if the time between the open operation and the first access to the file is long, some parts of the file have been prefetched before they are needed. And last, if there is a long interval between two consecutive operations on a file, the prefetching mechanism may bring many blocks to the cache.
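For contrast with Full-File-On-Open, here is the promised sketch of the One-Block-Ahead policy: after every read or write, only the next sequential block is queued for prefetching. The function signature and queue representation are illustrative assumptions.

```python
def one_block_ahead(file_name, block_no, blocks_in_file, prefetch_queue):
    """Queue the next sequential block, if any, after each access.

    At most one block is prefetched per request, so the cost of a
    misprediction is small, but so is the potential gain.
    """
    next_block = block_no + 1
    if next_block < blocks_in_file:
        prefetch_queue.append((file_name, next_block))

# Example: reading block 0 of a three-block file queues block 1 only.
queue = []
one_block_ahead("/tmp/small.dat", 0, 3, queue)
print(queue)  # [('/tmp/small.dat', 1)]
```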

Figure 8: Performance comparison of PAFS with the One-Block-Ahead (OBA) and Full-File-On-Open (FFOO) prefetching algorithms, and with no prefetching (NP), for read operations (4662 bytes on average) and write operations (6912 bytes on average).

Figure 9: File size influence on the Full-File-On-Open (FFOO) prefetching algorithm, shown as average read and write times for different "local-cache" sizes.

Write operations are not improved, and their performance may even degrade when prefetching algorithms are used. The little influence the global-hit ratio has on write operations is the reason why no better average write-operation times are obtained; the reason behind this lack of influence was explained in Subsection 9.3. The possible loss in performance appears when a block that would not need to be read from disk is written while the prefetching operation takes place. This write will have to wait until the block is in the cache, but the block will not be used as it will be completely overwritten.

10.2 File Size and Cache Size Influence

The aggressive prefetching algorithm presented in this paper might degrade the file system performance if the files are too big. In order to study this influence without modifying the trace files, the size of the global cache has been reduced. In Figure 9, the results of this experiment are shown. We observe that even with small "local caches" the presence of the aggressive prefetching algorithm decreases the average read operation time. This means that no real interference is found when big files are prefetched. Another important observation is that if aggressive prefetching is done, smaller "local caches" are enough to obtain similar performance. For example, in our simulations, 8MByte "local caches" with Full-File-On-Open behave nearly the same as 16MByte ones, also with Full-File-On-Open.

11 Conclusions

In this paper we have presented PAFS, a file system with a cooperative cache (PG-LRU) that needs no coherence mechanisms. We have done so with no loss in performance compared to other current cooperative caching algorithms. We have also seen that the high local-hit ratio that has historically been pursued may not be as important as was believed: if remote hits are implemented efficiently, a high local-hit ratio becomes a secondary issue.

Another important result is that, in cooperative caches, a high global-hit ratio does not benefit write operations and may even degrade them. This can be extended to the prefetching area, where no prefetching should be done if a block is to be overwritten. We can also conclude that cooperative mechanisms offer huge global caches that should be used by aggressive prefetching algorithms in order to improve the overall file system performance. A prefetching algorithm (Full-File-On-Open) that falls into such an aggressive category has also been presented.

Finally, we have shown that the results presented in this paper are also valid with fast, and not so fast, interconnection networks. Configurations where the network bandwidth is 10 times lower than the local memory bandwidth can still run the file system presented in this paper efficiently.

Acknowledgments

We owe special thanks to Michael D. Dahlin for answering all our questions about the way N-Chance Forwarding works. We are grateful to the people at Berkeley who gathered the Sprite traces that helped us feed our simulator and get the results we present in this paper. We would also like to thank E. Markatos and Pedro de Miguel, whose comments improved the contents of this paper. Finally, we thank Maite Ortega for her help in the implementation of the first prototype.

References

[1] T.E. Anderson, D.E. Culler, D.A. Patterson et al., "A Case for NOW (Networks of Workstations)," IEEE Micro, February 1995.
[2] T.E. Anderson, M.D. Dahlin, J.M. Neefe, D.A. Patterson et al., "Serverless Network File Systems," 15th Symposium on Operating Systems Principles, December 1995.
[3] M.G. Baker, J.H. Hartman, M.D. Kupfer et al., "Measurements of a Distributed File System," Proc. of the 13th Symposium on Operating Systems Principles, 1991.
[4] J. Carretero, F. Perez, P. de Miguel et al., "ParFiSys: A Parallel File System for MPP," ACM SIGOPS, Vol. 30, No. 2, April 1996.
[5] P.F. Corbett, S.J. Baylor and D.G. Feitelson, "Overview of the Vesta Parallel File System," ACM SIGARCH, Vol. 21, No. 5, 1993.
[6] T. Cortes, S. Girona and J. Labarta, "PACA: A Cooperative File System Cache for Parallel Machines," Euro-Par'96, Lyon, August 1996.
[7] D.E. Culler, A. Dusseau, S. Copen et al., "Parallel Programming in Split-C," Proceedings of Supercomputing'93.
[8] M.D. Dahlin, R.Y. Wang, T.E. Anderson and D.A. Patterson, "Cooperative Caching: Using Remote Client Memory to Improve File System Performance," Operating Systems Design and Implementation, Monterey, November 1994.
[9] M.J. Feeley, W.E. Morgan, F.H. Pighin et al., "Implementing Global Memory Management in a Workstation Cluster," 15th Symposium on Operating Systems Principles, December 1995.


More information

Chapter 11: File System Implementation. Objectives

Chapter 11: File System Implementation. Objectives Chapter 11: File System Implementation Objectives To describe the details of implementing local file systems and directory structures To describe the implementation of remote file systems To discuss block

More information

Process size is independent of the main memory present in the system.

Process size is independent of the main memory present in the system. Hardware control structure Two characteristics are key to paging and segmentation: 1. All memory references are logical addresses within a process which are dynamically converted into physical at run time.

More information

Implementation and Evaluation of Prefetching in the Intel Paragon Parallel File System

Implementation and Evaluation of Prefetching in the Intel Paragon Parallel File System Implementation and Evaluation of Prefetching in the Intel Paragon Parallel File System Meenakshi Arunachalam Alok Choudhary Brad Rullman y ECE and CIS Link Hall Syracuse University Syracuse, NY 344 E-mail:

More information

6. Results. This section describes the performance that was achieved using the RAMA file system.

6. Results. This section describes the performance that was achieved using the RAMA file system. 6. Results This section describes the performance that was achieved using the RAMA file system. The resulting numbers represent actual file data bytes transferred to/from server disks per second, excluding

More information

CS162 Operating Systems and Systems Programming Lecture 11 Page Allocation and Replacement"

CS162 Operating Systems and Systems Programming Lecture 11 Page Allocation and Replacement CS162 Operating Systems and Systems Programming Lecture 11 Page Allocation and Replacement" October 3, 2012 Ion Stoica http://inst.eecs.berkeley.edu/~cs162 Lecture 9 Followup: Inverted Page Table" With

More information

Distributed Scheduling for the Sombrero Single Address Space Distributed Operating System

Distributed Scheduling for the Sombrero Single Address Space Distributed Operating System Distributed Scheduling for the Sombrero Single Address Space Distributed Operating System Donald S. Miller Department of Computer Science and Engineering Arizona State University Tempe, AZ, USA Alan C.

More information

Relative Reduced Hops

Relative Reduced Hops GreedyDual-Size: A Cost-Aware WWW Proxy Caching Algorithm Pei Cao Sandy Irani y 1 Introduction As the World Wide Web has grown in popularity in recent years, the percentage of network trac due to HTTP

More information

I, J A[I][J] / /4 8000/ I, J A(J, I) Chapter 5 Solutions S-3.

I, J A[I][J] / /4 8000/ I, J A(J, I) Chapter 5 Solutions S-3. 5 Solutions Chapter 5 Solutions S-3 5.1 5.1.1 4 5.1.2 I, J 5.1.3 A[I][J] 5.1.4 3596 8 800/4 2 8 8/4 8000/4 5.1.5 I, J 5.1.6 A(J, I) 5.2 5.2.1 Word Address Binary Address Tag Index Hit/Miss 5.2.2 3 0000

More information

Kevin Skadron. 18 April Abstract. higher rate of failure requires eective fault-tolerance. Asynchronous consistent checkpointing oers a

Kevin Skadron. 18 April Abstract. higher rate of failure requires eective fault-tolerance. Asynchronous consistent checkpointing oers a Asynchronous Checkpointing for PVM Requires Message-Logging Kevin Skadron 18 April 1994 Abstract Distributed computing using networked workstations oers cost-ecient parallel computing, but the higher rate

More information

Virtual Memory. Chapter 8

Virtual Memory. Chapter 8 Virtual Memory 1 Chapter 8 Characteristics of Paging and Segmentation Memory references are dynamically translated into physical addresses at run time E.g., process may be swapped in and out of main memory

More information

CS 31: Intro to Systems Virtual Memory. Kevin Webb Swarthmore College November 15, 2018

CS 31: Intro to Systems Virtual Memory. Kevin Webb Swarthmore College November 15, 2018 CS 31: Intro to Systems Virtual Memory Kevin Webb Swarthmore College November 15, 2018 Reading Quiz Memory Abstraction goal: make every process think it has the same memory layout. MUCH simpler for compiler

More information

VIRTUAL MEMORY READING: CHAPTER 9

VIRTUAL MEMORY READING: CHAPTER 9 VIRTUAL MEMORY READING: CHAPTER 9 9 MEMORY HIERARCHY Core! Processor! Core! Caching! Main! Memory! (DRAM)!! Caching!! Secondary Storage (SSD)!!!! Secondary Storage (Disk)! L cache exclusive to a single

More information

is developed which describe the mean values of various system parameters. These equations have circular dependencies and must be solved iteratively. T

is developed which describe the mean values of various system parameters. These equations have circular dependencies and must be solved iteratively. T A Mean Value Analysis Multiprocessor Model Incorporating Superscalar Processors and Latency Tolerating Techniques 1 David H. Albonesi Israel Koren Department of Electrical and Computer Engineering University

More information

Chapter 2: Memory Hierarchy Design Part 2

Chapter 2: Memory Hierarchy Design Part 2 Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental

More information

TR-CS The rsync algorithm. Andrew Tridgell and Paul Mackerras. June 1996

TR-CS The rsync algorithm. Andrew Tridgell and Paul Mackerras. June 1996 TR-CS-96-05 The rsync algorithm Andrew Tridgell and Paul Mackerras June 1996 Joint Computer Science Technical Report Series Department of Computer Science Faculty of Engineering and Information Technology

More information

Availability of Coding Based Replication Schemes. Gagan Agrawal. University of Maryland. College Park, MD 20742

Availability of Coding Based Replication Schemes. Gagan Agrawal. University of Maryland. College Park, MD 20742 Availability of Coding Based Replication Schemes Gagan Agrawal Department of Computer Science University of Maryland College Park, MD 20742 Abstract Data is often replicated in distributed systems to improve

More information

Lecture notes for CS Chapter 2, part 1 10/23/18

Lecture notes for CS Chapter 2, part 1 10/23/18 Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental

More information

Chapter 8 Virtual Memory

Chapter 8 Virtual Memory Operating Systems: Internals and Design Principles Chapter 8 Virtual Memory Seventh Edition William Stallings Modified by Rana Forsati for CSE 410 Outline Principle of locality Paging - Effect of page

More information

On Object Orientation as a Paradigm for General Purpose. Distributed Operating Systems

On Object Orientation as a Paradigm for General Purpose. Distributed Operating Systems On Object Orientation as a Paradigm for General Purpose Distributed Operating Systems Vinny Cahill, Sean Baker, Brendan Tangney, Chris Horn and Neville Harris Distributed Systems Group, Dept. of Computer

More information

RAID SEMINAR REPORT /09/2004 Asha.P.M NO: 612 S7 ECE

RAID SEMINAR REPORT /09/2004 Asha.P.M NO: 612 S7 ECE RAID SEMINAR REPORT 2004 Submitted on: Submitted by: 24/09/2004 Asha.P.M NO: 612 S7 ECE CONTENTS 1. Introduction 1 2. The array and RAID controller concept 2 2.1. Mirroring 3 2.2. Parity 5 2.3. Error correcting

More information

The Memory System. Components of the Memory System. Problems with the Memory System. A Solution

The Memory System. Components of the Memory System. Problems with the Memory System. A Solution Datorarkitektur Fö 2-1 Datorarkitektur Fö 2-2 Components of the Memory System The Memory System 1. Components of the Memory System Main : fast, random access, expensive, located close (but not inside)

More information

RECONFIGURATION OF HIERARCHICAL TUPLE-SPACES: EXPERIMENTS WITH LINDA-POLYLITH. Computer Science Department and Institute. University of Maryland

RECONFIGURATION OF HIERARCHICAL TUPLE-SPACES: EXPERIMENTS WITH LINDA-POLYLITH. Computer Science Department and Institute. University of Maryland RECONFIGURATION OF HIERARCHICAL TUPLE-SPACES: EXPERIMENTS WITH LINDA-POLYLITH Gilberto Matos James Purtilo Computer Science Department and Institute for Advanced Computer Studies University of Maryland

More information

a process may be swapped in and out of main memory such that it occupies different regions

a process may be swapped in and out of main memory such that it occupies different regions Virtual Memory Characteristics of Paging and Segmentation A process may be broken up into pieces (pages or segments) that do not need to be located contiguously in main memory Memory references are dynamically

More information

Optimizing Parallel Access to the BaBar Database System Using CORBA Servers

Optimizing Parallel Access to the BaBar Database System Using CORBA Servers SLAC-PUB-9176 September 2001 Optimizing Parallel Access to the BaBar Database System Using CORBA Servers Jacek Becla 1, Igor Gaponenko 2 1 Stanford Linear Accelerator Center Stanford University, Stanford,

More information

Chapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY

Chapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY 1 Memories: Review SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: value is stored

More information

A Cache Hierarchy in a Computer System

A Cache Hierarchy in a Computer System A Cache Hierarchy in a Computer System Ideally one would desire an indefinitely large memory capacity such that any particular... word would be immediately available... We are... forced to recognize the

More information

ECE 341. Lecture # 18

ECE 341. Lecture # 18 ECE 341 Lecture # 18 Instructor: Zeshan Chishti zeshan@ece.pdx.edu December 1, 2014 Portland State University Lecture Topics The Memory System Cache Memories Performance Considerations Hit Ratios and Miss

More information

1. Memory technology & Hierarchy

1. Memory technology & Hierarchy 1 Memory technology & Hierarchy Caching and Virtual Memory Parallel System Architectures Andy D Pimentel Caches and their design cf Henessy & Patterson, Chap 5 Caching - summary Caches are small fast memories

More information

Lecture 2: Memory Systems

Lecture 2: Memory Systems Lecture 2: Memory Systems Basic components Memory hierarchy Cache memory Virtual Memory Zebo Peng, IDA, LiTH Many Different Technologies Zebo Peng, IDA, LiTH 2 Internal and External Memories CPU Date transfer

More information

Memory Design. Cache Memory. Processor operates much faster than the main memory can.

Memory Design. Cache Memory. Processor operates much faster than the main memory can. Memory Design Cache Memory Processor operates much faster than the main memory can. To ameliorate the sitution, a high speed memory called a cache memory placed between the processor and main memory. Barry

More information

CS252 S05. Main memory management. Memory hardware. The scale of things. Memory hardware (cont.) Bottleneck

CS252 S05. Main memory management. Memory hardware. The scale of things. Memory hardware (cont.) Bottleneck Main memory management CMSC 411 Computer Systems Architecture Lecture 16 Memory Hierarchy 3 (Main Memory & Memory) Questions: How big should main memory be? How to handle reads and writes? How to find

More information

MEMORY MANAGEMENT/1 CS 409, FALL 2013

MEMORY MANAGEMENT/1 CS 409, FALL 2013 MEMORY MANAGEMENT Requirements: Relocation (to different memory areas) Protection (run time, usually implemented together with relocation) Sharing (and also protection) Logical organization Physical organization

More information

Server 1 Server 2 CPU. mem I/O. allocate rec n read elem. n*47.0. n*20.0. select. n*1.0. write elem. n*26.5 send. n*

Server 1 Server 2 CPU. mem I/O. allocate rec n read elem. n*47.0. n*20.0. select. n*1.0. write elem. n*26.5 send. n* Information Needs in Performance Analysis of Telecommunication Software a Case Study Vesa Hirvisalo Esko Nuutila Helsinki University of Technology Laboratory of Information Processing Science Otakaari

More information

Application Programmer. Vienna Fortran Out-of-Core Program

Application Programmer. Vienna Fortran Out-of-Core Program Mass Storage Support for a Parallelizing Compilation System b a Peter Brezany a, Thomas A. Mueck b, Erich Schikuta c Institute for Software Technology and Parallel Systems, University of Vienna, Liechtensteinstrasse

More information

Review question: Protection and Security *

Review question: Protection and Security * OpenStax-CNX module: m28010 1 Review question: Protection and Security * Duong Anh Duc This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 3.0 Review question

More information

Memory Management. Dr. Yingwu Zhu

Memory Management. Dr. Yingwu Zhu Memory Management Dr. Yingwu Zhu Big picture Main memory is a resource A process/thread is being executing, the instructions & data must be in memory Assumption: Main memory is infinite Allocation of memory

More information

Real-Time Scalability of Nested Spin Locks. Hiroaki Takada and Ken Sakamura. Faculty of Science, University of Tokyo

Real-Time Scalability of Nested Spin Locks. Hiroaki Takada and Ken Sakamura. Faculty of Science, University of Tokyo Real-Time Scalability of Nested Spin Locks Hiroaki Takada and Ken Sakamura Department of Information Science, Faculty of Science, University of Tokyo 7-3-1, Hongo, Bunkyo-ku, Tokyo 113, Japan Abstract

More information

MULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming

MULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance

More information

OS and Hardware Tuning

OS and Hardware Tuning OS and Hardware Tuning Tuning Considerations OS Threads Thread Switching Priorities Virtual Memory DB buffer size File System Disk layout and access Hardware Storage subsystem Configuring the disk array

More information

MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming

MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance

More information

Recall from Tuesday. Our solution to fragmentation is to split up a process s address space into smaller chunks. Physical Memory OS.

Recall from Tuesday. Our solution to fragmentation is to split up a process s address space into smaller chunks. Physical Memory OS. Paging 11/10/16 Recall from Tuesday Our solution to fragmentation is to split up a process s address space into smaller chunks. Physical Memory OS Process 3 Process 3 OS: Place Process 3 Process 1 Process

More information

Chapter 8. Virtual Memory

Chapter 8. Virtual Memory Operating System Chapter 8. Virtual Memory Lynn Choi School of Electrical Engineering Motivated by Memory Hierarchy Principles of Locality Speed vs. size vs. cost tradeoff Locality principle Spatial Locality:

More information

Algorithms Implementing Distributed Shared Memory. Michael Stumm and Songnian Zhou. University of Toronto. Toronto, Canada M5S 1A4

Algorithms Implementing Distributed Shared Memory. Michael Stumm and Songnian Zhou. University of Toronto. Toronto, Canada M5S 1A4 Algorithms Implementing Distributed Shared Memory Michael Stumm and Songnian Zhou University of Toronto Toronto, Canada M5S 1A4 Email: stumm@csri.toronto.edu Abstract A critical issue in the design of

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address

More information

Swapping. Operating Systems I. Swapping. Motivation. Paging Implementation. Demand Paging. Active processes use more physical memory than system has

Swapping. Operating Systems I. Swapping. Motivation. Paging Implementation. Demand Paging. Active processes use more physical memory than system has Swapping Active processes use more physical memory than system has Operating Systems I Address Binding can be fixed or relocatable at runtime Swap out P P Virtual Memory OS Backing Store (Swap Space) Main

More information

CSE380 - Operating Systems. Communicating with Devices

CSE380 - Operating Systems. Communicating with Devices CSE380 - Operating Systems Notes for Lecture 15-11/4/04 Matt Blaze (some examples by Insup Lee) Communicating with Devices Modern architectures support convenient communication with devices memory mapped

More information

Page Replacement. (and other virtual memory policies) Kevin Webb Swarthmore College March 27, 2018

Page Replacement. (and other virtual memory policies) Kevin Webb Swarthmore College March 27, 2018 Page Replacement (and other virtual memory policies) Kevin Webb Swarthmore College March 27, 2018 Today s Goals Making virtual memory virtual : incorporating disk backing. Explore page replacement policies

More information

OS and HW Tuning Considerations!

OS and HW Tuning Considerations! Administração e Optimização de Bases de Dados 2012/2013 Hardware and OS Tuning Bruno Martins DEI@Técnico e DMIR@INESC-ID OS and HW Tuning Considerations OS " Threads Thread Switching Priorities " Virtual

More information

Assignment 5. Georgia Koloniari

Assignment 5. Georgia Koloniari Assignment 5 Georgia Koloniari 2. "Peer-to-Peer Computing" 1. What is the definition of a p2p system given by the authors in sec 1? Compare it with at least one of the definitions surveyed in the last

More information

On the scalability of tracing mechanisms 1

On the scalability of tracing mechanisms 1 On the scalability of tracing mechanisms 1 Felix Freitag, Jordi Caubet, Jesus Labarta Departament d Arquitectura de Computadors (DAC) European Center for Parallelism of Barcelona (CEPBA) Universitat Politècnica

More information

Cluster quality 15. Running time 0.7. Distance between estimated and true means Running time [s]

Cluster quality 15. Running time 0.7. Distance between estimated and true means Running time [s] Fast, single-pass K-means algorithms Fredrik Farnstrom Computer Science and Engineering Lund Institute of Technology, Sweden arnstrom@ucsd.edu James Lewis Computer Science and Engineering University of

More information

Chapter 8: Virtual Memory. Operating System Concepts Essentials 2 nd Edition

Chapter 8: Virtual Memory. Operating System Concepts Essentials 2 nd Edition Chapter 8: Virtual Memory Silberschatz, Galvin and Gagne 2013 Chapter 8: Virtual Memory Background Demand Paging Copy-on-Write Page Replacement Allocation of Frames Thrashing Memory-Mapped Files Allocating

More information

Multiprocessors & Thread Level Parallelism

Multiprocessors & Thread Level Parallelism Multiprocessors & Thread Level Parallelism COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Introduction

More information

ECE519 Advanced Operating Systems

ECE519 Advanced Operating Systems IT 540 Operating Systems ECE519 Advanced Operating Systems Prof. Dr. Hasan Hüseyin BALIK (8 th Week) (Advanced) Operating Systems 8. Virtual Memory 8. Outline Hardware and Control Structures Operating

More information

Virtual Memory COMPSCI 386

Virtual Memory COMPSCI 386 Virtual Memory COMPSCI 386 Motivation An instruction to be executed must be in physical memory, but there may not be enough space for all ready processes. Typically the entire program is not needed. Exception

More information

Background. 20: Distributed File Systems. DFS Structure. Naming and Transparency. Naming Structures. Naming Schemes Three Main Approaches

Background. 20: Distributed File Systems. DFS Structure. Naming and Transparency. Naming Structures. Naming Schemes Three Main Approaches Background 20: Distributed File Systems Last Modified: 12/4/2002 9:26:20 PM Distributed file system (DFS) a distributed implementation of the classical time-sharing model of a file system, where multiple

More information

Input/Output Management

Input/Output Management Chapter 11 Input/Output Management This could be the messiest aspect of an operating system. There are just too much stuff involved, it is difficult to develop a uniform and consistent theory to cover

More information

CPU issues address (and data for write) Memory returns data (or acknowledgment for write)

CPU issues address (and data for write) Memory returns data (or acknowledgment for write) The Main Memory Unit CPU and memory unit interface Address Data Control CPU Memory CPU issues address (and data for write) Memory returns data (or acknowledgment for write) Memories: Design Objectives

More information

Plot SIZE. How will execution time grow with SIZE? Actual Data. int array[size]; int A = 0;

Plot SIZE. How will execution time grow with SIZE? Actual Data. int array[size]; int A = 0; How will execution time grow with SIZE? int array[size]; int A = ; for (int i = ; i < ; i++) { for (int j = ; j < SIZE ; j++) { A += array[j]; } TIME } Plot SIZE Actual Data 45 4 5 5 Series 5 5 4 6 8 Memory

More information

Distributed Systems. Lec 10: Distributed File Systems GFS. Slide acks: Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung

Distributed Systems. Lec 10: Distributed File Systems GFS. Slide acks: Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Distributed Systems Lec 10: Distributed File Systems GFS Slide acks: Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung 1 Distributed File Systems NFS AFS GFS Some themes in these classes: Workload-oriented

More information

! What is virtual memory and when is it useful? ! What is demand paging? ! What pages should be. ! What is the working set model?

! What is virtual memory and when is it useful? ! What is demand paging? ! What pages should be. ! What is the working set model? Virtual Memory Questions? CSCI [4 6] 730 Operating Systems Virtual Memory! What is virtual memory and when is it useful?! What is demand paging?! What pages should be» resident in memory, and» which should

More information

MEMORY: SWAPPING. Shivaram Venkataraman CS 537, Spring 2019

MEMORY: SWAPPING. Shivaram Venkataraman CS 537, Spring 2019 MEMORY: SWAPPING Shivaram Venkataraman CS 537, Spring 2019 ADMINISTRIVIA - Project 2b is out. Due Feb 27 th, 11:59 - Project 1b grades are out Lessons from p2a? 1. Start early! 2. Sketch out a design?

More information

Lecture 21: Reliable, High Performance Storage. CSC 469H1F Fall 2006 Angela Demke Brown

Lecture 21: Reliable, High Performance Storage. CSC 469H1F Fall 2006 Angela Demke Brown Lecture 21: Reliable, High Performance Storage CSC 469H1F Fall 2006 Angela Demke Brown 1 Review We ve looked at fault tolerance via server replication Continue operating with up to f failures Recovery

More information

Chapter 5. Large and Fast: Exploiting Memory Hierarchy

Chapter 5. Large and Fast: Exploiting Memory Hierarchy Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address space at any time Temporal locality Items accessed recently are likely to

More information

CS 333 Introduction to Operating Systems. Class 14 Page Replacement. Jonathan Walpole Computer Science Portland State University

CS 333 Introduction to Operating Systems. Class 14 Page Replacement. Jonathan Walpole Computer Science Portland State University CS 333 Introduction to Operating Systems Class 14 Page Replacement Jonathan Walpole Computer Science Portland State University Page replacement Assume a normal page table (e.g., BLITZ) User-program is

More information

Cache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012)

Cache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012) Cache Coherence CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Shared memory multi-processor Processors read and write to shared variables - More precisely: processors issues

More information

Design of A Memory Latency Tolerant. *Faculty of Eng.,Tokai Univ **Graduate School of Eng.,Tokai Univ. *

Design of A Memory Latency Tolerant. *Faculty of Eng.,Tokai Univ **Graduate School of Eng.,Tokai Univ. * Design of A Memory Latency Tolerant Processor() Naohiko SHIMIZU* Kazuyuki MIYASAKA** Hiroaki HARAMIISHI** *Faculty of Eng.,Tokai Univ **Graduate School of Eng.,Tokai Univ. 1117 Kitakaname Hiratuka-shi

More information

416 Distributed Systems. Distributed File Systems 2 Jan 20, 2016

416 Distributed Systems. Distributed File Systems 2 Jan 20, 2016 416 Distributed Systems Distributed File Systems 2 Jan 20, 2016 1 Outline Why Distributed File Systems? Basic mechanisms for building DFSs Using NFS and AFS as examples NFS: network file system AFS: andrew

More information

CS161 Design and Architecture of Computer Systems. Cache $$$$$

CS161 Design and Architecture of Computer Systems. Cache $$$$$ CS161 Design and Architecture of Computer Systems Cache $$$$$ Memory Systems! How can we supply the CPU with enough data to keep it busy?! We will focus on memory issues,! which are frequently bottlenecks

More information

A Comparison of Two Distributed Systems: Amoeba & Sprite. By: Fred Douglis, John K. Ousterhout, M. Frans Kaashock, Andrew Tanenbaum Dec.

A Comparison of Two Distributed Systems: Amoeba & Sprite. By: Fred Douglis, John K. Ousterhout, M. Frans Kaashock, Andrew Tanenbaum Dec. A Comparison of Two Distributed Systems: Amoeba & Sprite By: Fred Douglis, John K. Ousterhout, M. Frans Kaashock, Andrew Tanenbaum Dec. 1991 Introduction shift from time-sharing to multiple processors

More information

The Impact of Write Back on Cache Performance

The Impact of Write Back on Cache Performance The Impact of Write Back on Cache Performance Daniel Kroening and Silvia M. Mueller Computer Science Department Universitaet des Saarlandes, 66123 Saarbruecken, Germany email: kroening@handshake.de, smueller@cs.uni-sb.de,

More information

Notes based on prof. Morris's lecture on scheduling (6.824, fall'02).

Notes based on prof. Morris's lecture on scheduling (6.824, fall'02). Scheduling Required reading: Eliminating receive livelock Notes based on prof. Morris's lecture on scheduling (6.824, fall'02). Overview What is scheduling? The OS policies and mechanisms to allocates

More information

Parallel Pipeline STAP System

Parallel Pipeline STAP System I/O Implementation and Evaluation of Parallel Pipelined STAP on High Performance Computers Wei-keng Liao, Alok Choudhary, Donald Weiner, and Pramod Varshney EECS Department, Syracuse University, Syracuse,

More information

Dynamic Multi-Path Communication for Video Trac. Hao-hua Chu, Klara Nahrstedt. Department of Computer Science. University of Illinois

Dynamic Multi-Path Communication for Video Trac. Hao-hua Chu, Klara Nahrstedt. Department of Computer Science. University of Illinois Dynamic Multi-Path Communication for Video Trac Hao-hua Chu, Klara Nahrstedt Department of Computer Science University of Illinois h-chu3@cs.uiuc.edu, klara@cs.uiuc.edu Abstract Video-on-Demand applications

More information

Page 1. Multilevel Memories (Improving performance using a little cash )

Page 1. Multilevel Memories (Improving performance using a little cash ) Page 1 Multilevel Memories (Improving performance using a little cash ) 1 Page 2 CPU-Memory Bottleneck CPU Memory Performance of high-speed computers is usually limited by memory bandwidth & latency Latency

More information

The Google File System

The Google File System October 13, 2010 Based on: S. Ghemawat, H. Gobioff, and S.-T. Leung: The Google file system, in Proceedings ACM SOSP 2003, Lake George, NY, USA, October 2003. 1 Assumptions Interface Architecture Single

More information

Today. Adding Memory Does adding memory always reduce the number of page faults? FIFO: Adding Memory with LRU. Last Class: Demand Paged Virtual Memory

Today. Adding Memory Does adding memory always reduce the number of page faults? FIFO: Adding Memory with LRU. Last Class: Demand Paged Virtual Memory Last Class: Demand Paged Virtual Memory Benefits of demand paging: Virtual address space can be larger than physical address space. Processes can run without being fully loaded into memory. Processes start

More information

Consistent Logical Checkpointing. Nitin H. Vaidya. Texas A&M University. Phone: Fax:

Consistent Logical Checkpointing. Nitin H. Vaidya. Texas A&M University. Phone: Fax: Consistent Logical Checkpointing Nitin H. Vaidya Department of Computer Science Texas A&M University College Station, TX 77843-3112 hone: 409-845-0512 Fax: 409-847-8578 E-mail: vaidya@cs.tamu.edu Technical

More information