Conservative Garbage Collection on Distributed Shared Memory Systems


Weimin Yu and Alan Cox
Department of Computer Science, Rice University, Houston, TX

This research was supported in part by the National Science Foundation under an NYI Award and by the Texas Advanced Technology Program and Tech-Sym Inc.

Abstract

In this paper we present the design and implementation of a conservative garbage collection algorithm for distributed shared memory (DSM) applications that use weakly typed languages like C or C++, and we evaluate its performance. In the absence of language support to identify references, our algorithm constructs a conservative approximation of the set of cross-node references based on local information only. It is also designed to tolerate memory inconsistency on DSM systems that use relaxed consistency protocols. These techniques enable every node to perform garbage collections without communicating with the others, effectively avoiding the high cost of cross-node communication in networks of workstations. We measured the performance of our garbage collector against explicit programmer management using three application programs. In two out of the three programs the performance of the GC version is within 15% of the explicit version. The results show that the garbage collector has two effects on application programs: on one hand, it tends to reduce memory locality, increasing the communication cost; on the other hand, it may eliminate synchronization and memory accesses that would be incurred if memory were managed by the programmer, reducing the communication cost.

1. Introduction

Over the last decade, both distributed garbage collection and distributed shared memory have become increasingly active areas of research [21, 17]. Despite the activity in these areas individually, their intersection has received relatively little attention. Furthermore, none of the published work that we are aware of [18, 14, 11] has measured the performance of an implementation on any application programs. Neither has it addressed the design of a garbage collector for weakly typed languages such as C and C++. In this paper, we present the design and implementation of a conservative garbage collection algorithm for distributed shared memory systems, and we evaluate its performance on a collection of application programs.

Distributed shared memory (DSM) and garbage collection (GC) are motivated by the same desire: to simplify the programmer's task by handling some of the low-level program details automatically in the run-time system. A DSM system handles the communication of data between machines, eliminating the need for the programmer to write message-passing code. Roughly speaking, a DSM system enables processes on different machines to share virtual memory, even though no physical memory is shared by the machines [15]. It is widely accepted that it is easier to program with shared memory than with message passing: instead of sending and receiving messages explicitly, programs can use ordinary loads and stores to access shared data. This enables programmers to concentrate on algorithmic issues rather than on managing partitioned data sets and communicating values. A GC system handles the memory management, eliminating the need for the programmer to write code to track the status of allocated memory, for example, reference counting to determine whether memory can be freed.
Conservative GC is a technique that does not require any support from the language implementation, enabling the use of GC with programs written in weakly typed languages like C or C++. Several conservative garbage collection algorithms have been implemented in the past few years [9, 4, 3, 8]. Zorn [22] compared the Boehm-Weiser algorithm [8] with a few explicit management algorithms used to implement malloc() and concluded that conservative garbage collection is a viable alternative to explicit memory management for many programs.

In contrast to shared-memory multiprocessors, interprocessor communication is quite expensive on general-purpose networks of workstations. It is therefore essential to minimize the amount of data movement and especially the number of messages used to implement garbage collection. In contrast to garbage collection algorithms designed for shared-memory multiprocessors, our algorithm avoids one-at-a-time references to non-local (uncached) data that could generate a message exchange per access. Instead, it aggregates these references and piggybacks them onto messages used by the DSM system. To further minimize communication, our algorithm allows the collection of most garbage without global synchronization. This entails knowing when references are communicated to other nodes. With a weakly typed language like C or C++, it isn't obvious when references are communicated. Therefore, our algorithm constructs a conservative approximation of the set of cross-node references. In addition, our algorithm is designed to cope with the fact that in high-performance DSM systems updates to shared data are not visible simultaneously at every node. Instead of requiring global synchronization to bring the nodes up to date, our algorithm is designed to tolerate memory inconsistency.

Our garbage collector has been implemented on the TreadMarks DSM system [13]. TreadMarks is a high-performance DSM system that runs on standard workstations connected by general-purpose networks. It uses the lazy release consistency algorithm [12] and a multiple-writer protocol [10] to minimize the number of messages and the amount of data communicated, resulting in good performance on a large class of applications [16]. Using our garbage collector, two out of the three application programs used in this study performed within 15% of explicit memory management by the programmer.

This paper is organized as follows. Section 2 describes our conservative garbage collection algorithm, and Section 3 describes its implementation. Section 4 presents a performance evaluation based on a small set of application programs. Section 5 examines related work. Finally, Section 6 offers our concluding remarks.

2. Design

Most modern garbage collectors work by starting from a root set of memory objects and following references from these objects to other objects recursively, until all objects reachable from the roots have been found. Inaccessible objects are garbage and can be reclaimed. There are two classes of collectors: copying collectors, which copy accessible objects to another part of the address space and reclaim the entire old region; and mark-and-sweep collectors, which mark all accessible objects, then scan the heap and reclaim unmarked objects. To tell references from data, most garbage collectors depend on some language support; at a minimum, tags are maintained for each object's type. Conservative garbage collection is a technique that does not require such cooperation and can work with weakly typed languages. It identifies a superset of the true references by treating every word of a memory object as if it contains a reference.

In DSM systems, an application's object graph can be large and widely distributed among the nodes. Therefore, it is very expensive to collect all objects at the same time. In our algorithm each node can independently collect its own objects.
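As a rough illustration of conservative scanning (not the authors' actual code, which is based on the Boehm-Weiser collector [8]), the following C sketch treats every word of a candidate region as a potential reference. The heap bounds and the helper names find_header() and mark_object() are hypothetical.

#include <stdint.h>

typedef struct object_header object_header;

extern uintptr_t heap_lo, heap_hi;                 /* bounds of the shared heap */
extern object_header *find_header(uintptr_t addr); /* NULL if addr is not inside a valid object */
extern void mark_object(object_header *h);         /* mark and queue for tracing */

/* Treat every word in [start, end) as if it might contain a reference. */
void scan_conservatively(const uintptr_t *start, const uintptr_t *end)
{
    for (const uintptr_t *p = start; p < end; p++) {
        uintptr_t word = *p;
        if (word < heap_lo || word >= heap_hi)
            continue;                 /* cannot point into the heap */
        object_header *h = find_header(word);
        if (h != NULL)
            mark_object(h);           /* anything that looks like a reference
                                         conservatively keeps its object alive */
    }
}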
To keep the amount of communication small, no extra messages are sent: all GC data exchanged between the nodes is piggybacked on messages required by the execution of the application program. To allow a node to collect independently without sending extra messages, we must solve three problems. First, an object owned by one node may be referenced by another node. The collecting node must identify these objects so that they will not be collected while a remote reference persists. In many strongly typed languages this is not a problem because every assignment of references can be detected and examined. In C, however, such detection is impossible. Second, we need to collect the remotely referenced objects once they are no longer used by other nodes. Third, because of race conditions or the delay in updates to shared data reaching every node, a collecting node may miss references it should see. We must solve this problem without synchronization and communication. In the rest of this section, we present our solutions to these problems.

2.1. Remote reference detection

Neither our target languages nor the DSM abstraction can alert us every time a reference created by one node is passed to another node. However, any remote reference must have been communicated in some message. Therefore, if the DSM system makes the contents of the messages available to the garbage collector, the garbage collector will know all objects that may potentially be referenced remotely and avoid reclaiming them. Some of the references in the messages may be passed because of false sharing and never actually be used by the receiving node; others may not be references at all, just bit patterns that look like references. To be safe, we must assume that everything in a message is a reference and is used by the receiving node.
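The same conservative word scan can be applied to outgoing DSM messages. The sketch below uses hypothetical helper names; how local and imported references are actually recorded, with weights, is detailed in Section 3.2.

#include <stdint.h>
#include <stddef.h>

extern int  is_shared_heap_address(uintptr_t word); /* inside some shared pool? */
extern int  is_local_object(uintptr_t word);        /* owned by this node? */
extern void export_table_note(uintptr_t word);      /* record a possible export */
extern void import_table_split(uintptr_t word);     /* an imported ref is forwarded */

/* Scan the user data of an outgoing message for possible references. */
void scan_outgoing_message(const uintptr_t *msg, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        uintptr_t word = msg[i];
        if (!is_shared_heap_address(word))
            continue;                   /* ordinary data, not a reference */
        if (is_local_object(word))
            export_table_note(word);    /* object may now be referenced remotely */
        else
            import_table_split(word);   /* forwarding an imported reference
                                           to a third node (see Section 2.2) */
    }
}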

2.2. Reclaiming exported references

If a node finds that a remote reference is no longer used locally, it is easy to notify the reference's owner: this information, which we call a nack, can be piggybacked on a message to the owner node. The problem is for the owner to determine whether all other nodes have dropped their references. Assume node N1 exports obj to nodes N2 and N3. When N2 no longer uses obj, it sends a nack to N1. Then N3 passes obj to N2 and removes its own reference to obj. Even though N1 has received nacks from both nodes, the owner, N1, must recognize that there is still a valid reference to obj.

A technique called weighted reference counting [19, 20, 5] solves this problem. It works as follows. A reference is assigned a predetermined weight when it is first exported by its owner. Whenever a reference is duplicated across a node boundary, the weight of the reference is equally divided between the local reference and the new remote reference, so the sum of the weights remains constant. When a reference is no longer used and is sent back to its owner, its weight is also returned. When the sum of the returned weights equals the original weight assigned at the reference's creation, the owner node is sure that no one needs the reference.

The weight of a reference may reach one due to repeated export. When such a reference is being exported, its owner should be asked to increase its weight. We call this situation a weight underflow. However, to make our implementation easier, we assign zero to the weights of both copies. This way the reference can only be reclaimed by the infrequent global phase of our collection algorithm, but we avoid the communication cost.
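A minimal sketch of the sending side of this scheme follows, using the two-field table entries described in Section 3.1. The default weight, the table functions, and the power-of-two weight choice are assumptions made for illustration.

typedef struct {
    void    *obj;     /* the object reference */
    unsigned weight;  /* its current weight */
} entry;

extern entry *export_table_lookup(void *obj);
extern void   export_table_insert(void *obj, unsigned weight);

#define DEFAULT_WEIGHT 64u   /* a power of two, so repeated halving stays exact */

/* Owner side: a reference to a local object leaves this node in a message. */
void export_reference(void *obj)
{
    entry *e = export_table_lookup(obj);
    if (e == NULL)
        export_table_insert(obj, DEFAULT_WEIGHT);
    else
        e->weight += DEFAULT_WEIGHT;   /* re-export adds another weight share */
}

/* Non-owner side: an imported reference is forwarded to a third node.
 * Returns the weight to carry in the outgoing message. */
unsigned forward_reference(entry *e)
{
    if (e->weight == 1) {
        e->weight = 0;   /* weight underflow: both copies get weight zero, so the
                            object is reclaimable only by the global collection */
        return 0;
    }
    e->weight /= 2;      /* split the weight equally between the two copies */
    return e->weight;
}

The invariant is that, outside of underflow, the weights held by all copies of a reference always sum to the total weight the owner has recorded, so the owner can reclaim its bookkeeping exactly when that amount has been returned.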
2.3. Synchronization and consistency

In DSM systems, a race condition may occur if a collecting node traces an object that another node is updating. If the update doesn't reach the collecting node before it starts garbage collection, the local copy of the object may be inconsistent. As a result, the collector may miss references it should see. Figure 1 gives an example. Object O1 is cached on both N1 and N2 and is in an inconsistent state. The only reference to O2 is assigned to O1 by N2, but N1 has not seen this reference. If N2 removes the reference to O1 from its local roots before the reference to O2 is sent to N1, O2 may be mistakenly collected by N2.

This problem can be solved without communication. There are two situations to consider. First, O1 is owned by N2. Since N1 also has a reference to O1, N2 knows that O1 is remotely referenced. Since remotely referenced objects are treated like local roots, O2 will not be collected in this case. Second, O1 is not owned by N2. For N2 to be able to access O1, it must have imported O1 from its owner. If N2 remembers all imported objects and traces them in its collections, O2 will not be reclaimed either. Therefore a collecting node is not required to update the objects it is tracing.

Care must be taken when this is combined with the reclamation of exported references discussed in Section 2.2. Take the scenario in Figure 1: N1 will not see the reference to O2 until the application program requires it to update its copy of O1. If N2 deletes its reference to O1, it will remove O1 from its imported object list in the next collection, and O2 will be reclaimed during the collection after that. Our solution is to remember objects like O1 in a depart table. If an imported object is no longer used, and the changes made to it have not been seen by its owner node, the object is put in the depart table. Objects in the depart table are treated like local roots. When the changes made to an object in the depart table are retrieved by its owner node, the object is removed from the table and a nack is sent to its owner.

To summarize, our algorithm allows each node to independently perform local garbage collections, and no messages are needed beyond those required by the execution of the application program.

2.4. Limitations

The garbage collection algorithm discussed above, which we will call the local collection algorithm, cannot reclaim circular structures, nor can it reclaim the objects lost in the case of weight underflow. To make up for this limitation, we implemented a global collection algorithm in which every node in the system suspends its computation and takes part in the garbage collection. Global collection is only used as a last resort, when a node cannot collect enough memory via local collection and the export table grows over a predefined threshold.

3. Implementation

Our implementation is based on a sequential conservative garbage collector by Boehm and Weiser [8]. The shared memory heap provided by TreadMarks is divided into several pools. Each node is responsible for one pool. Memory requests are satisfied from the local pool, and an object is owned by its allocator. There is also a global (free) pool from which processes can allocate more memory when necessary.

Figure 1. Tracing inconsistent objects.
Figure 2. Cycles of references.

In the text that follows, we refer to shared objects allocated from a node's pool as local objects of that node, and to other shared objects as remote objects of that node. Whether an object is local or remote is decided by its ownership. A remote object may be cached in the local memory of a node and thus accessed without ongoing communication cost.

3.1. Data structures

Each node has an object header table, an import table, an export table, a depart table, and one nack buffer for every other node. An object header is maintained for every object allocated by a node. Potential references are checked against this structure to see if they are valid. Information about an object, such as its size, can also be found in this structure. The import table is the set of remote objects that are referenced locally. The export table is the set of local objects that are referenced by other nodes. The depart table holds the imported objects that are no longer referenced locally but whose local changes have not been propagated to their owner nodes. All three tables are implemented as hash tables, in which each entry has two fields: the object reference and its weight. The nack buffers hold the imported references that will be sent back to their owners.

3.2. The local collection algorithm

Message handling. Every message that contains user data is scanned before it is sent. If a reference to a local object is found, and it is not already in the export table, that reference is inserted into the export table with a default weight; otherwise its weight is incremented by the default amount. If an imported reference is found, its weight in the import table is halved and the two copies are both assigned the new weight. After the message is scanned, the references and their weights are appended to the message's end, and the message is sent out. When a node receives a message, it checks the appended sequence of reference/weight pairs. The references are inserted into the import table with their weights, or have their weights incremented if they are already there.

After a local collection, a node may find that some imported references are no longer used locally. At this time these references are removed from the import table. A removed reference, with its weight, is put into the depart table if the changes made to the referenced object by the local node have not yet been retrieved by its owner node; otherwise the reference is put into the nack buffer for its owner node. The references in the depart table are also put into the nack buffers when changes to the referenced objects have been sent to their owner nodes. When a message is sent to a remote node, the contents of the corresponding nack buffer are appended to the end of the message. The receiver checks the nack against its export table. For each exported reference that also appears in the nack, its weight is decremented by the amount shown in the nack. An exported reference is removed from the export table if its weight reaches zero.
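The receive side of this message handling might look like the following sketch; the pair layout and the helper names are assumptions, and the send side corresponds to the scans shown in Sections 2.1 and 2.2.

#include <stddef.h>

typedef struct {
    void    *obj;     /* the reference */
    unsigned weight;  /* the weight carried with it */
} ref_weight;

extern void     import_table_add(void *obj, unsigned weight);      /* insert or add weight */
extern unsigned export_table_subtract(void *obj, unsigned weight); /* returns new weight */
extern void     export_table_remove(void *obj);

/* Apply the reference/weight pairs appended to an incoming message. */
void receive_appended_refs(const ref_weight *pairs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        import_table_add(pairs[i].obj, pairs[i].weight);
}

/* Apply a piggybacked nack against the export table. */
void receive_nack(const ref_weight *nacks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned left = export_table_subtract(nacks[i].obj, nacks[i].weight);
        if (left == 0)
            export_table_remove(nacks[i].obj); /* all weight returned: no
                                                  remote references remain */
    }
}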
The collection. A collection works as follows:

1. Objects reachable from the local roots are recursively marked and traced. Local roots include registers, stack cells, and global variables.

2. Objects reachable from any references in the export table or the depart table are recursively marked and traced.

3. After the first two steps are done, we start from the import table and look for imported references that are not marked. These references are not used locally and need not be marked, but the local references they contain must be recursively marked and traced. The reason for this was explained in Section 2.3.

4. The collector sweeps through the local pool and reclaims all local objects that are not marked.

5. The imported references that are not marked are either put in the depart table or in a nack buffer. They are handled as described under Message handling.
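These five steps condense to the sketch below. The table iterators and helpers are hypothetical names, and the save-the-next-entry detail in step 5 is an assumption about how removal during iteration might be handled.

#include <stddef.h>

typedef struct { void *obj; unsigned weight; } entry;  /* as in Section 3.1 */

extern void  *export_table, *import_table, *depart_table;   /* hash tables */
extern entry *table_first(void *table);
extern entry *table_next(entry *e);
extern void   mark_from_local_roots(void);     /* registers, stack, globals */
extern void   mark_and_trace(void *obj);       /* mark obj and trace its contents */
extern void   trace_contents_only(void *obj);  /* trace contents, leave obj unmarked */
extern int    is_marked(void *obj);
extern void   sweep_local_pool(void);
extern int    changes_not_yet_retrieved(void *obj);  /* owner hasn't seen our changes? */
extern int    owner_of(void *obj);
extern void   depart_table_insert(void *obj, unsigned weight);
extern void   nack_buffer_append(int node, void *obj, unsigned weight);
extern void   import_table_remove(entry *e);

void local_collection(void)
{
    /* Step 1: mark everything reachable from the local roots. */
    mark_from_local_roots();

    /* Step 2: exported and departed objects may be referenced remotely,
     * so they are traced like roots. */
    for (entry *e = table_first(export_table); e; e = table_next(e))
        mark_and_trace(e->obj);
    for (entry *e = table_first(depart_table); e; e = table_next(e))
        mark_and_trace(e->obj);

    /* Step 3: unmarked imported objects are not used locally, but the local
     * references inside them must still be traced (Section 2.3). */
    for (entry *e = table_first(import_table); e; e = table_next(e))
        if (!is_marked(e->obj))
            trace_contents_only(e->obj);

    /* Step 4: reclaim unmarked local objects. */
    sweep_local_pool();

    /* Step 5: retire imported references that stayed unmarked. */
    for (entry *e = table_first(import_table); e != NULL; ) {
        entry *next = table_next(e);           /* saved before possible removal */
        if (!is_marked(e->obj)) {
            if (changes_not_yet_retrieved(e->obj))
                depart_table_insert(e->obj, e->weight);
            else
                nack_buffer_append(owner_of(e->obj), e->obj, e->weight);
            import_table_remove(e);
        }
        e = next;
    }
}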

3.3. The global collection algorithm

A global collection is invoked only as a last resort, when a node cannot collect enough memory by local collections and the export table grows over a threshold. Every other node in the system is interrupted to participate. A global collection consists of several marking phases followed by one sweeping phase. At the beginning, each node starts marking from its local roots, including registers, stack, and global variables. The import and export tables are not included in the local roots for global collection. A node only traces local references; remote references are not traced, but they are recorded. At the end of a marking phase, the nodes synchronize and exchange the remote references they have recorded. In the next phase they start tracing again from the references they just received. This continues until no unmarked remote references are found on any node. Then each node sweeps through its own pools and reclaims the garbage.

At the end of a global collection, the import and export tables are reconstituted, the nack buffers are cleared, and the depart table is not affected. To reconstitute the import and export tables, a node must remember all remote references it has sent and all local references it has received. A remote reference found during a marking phase is put in the import table with the default weight. Each local reference received from another node adds to that object's entry in the export table. An object's final weight is the default weight multiplied by the number of nodes that hold the reference.
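The phase structure might look like the following sketch. The exchange primitive, list type, and termination test are hypothetical; in particular, how the nodes agree that no unmarked remote references remain anywhere is shown here as a single collective operation.

#include <stddef.h>

typedef struct { void **refs; size_t n; } ref_list;

extern void     reset_marks(void);
extern void     mark_from_local_roots(void);        /* tables are NOT roots here */
extern ref_list trace_local_recording_remote(void); /* trace local refs only;
                                                       record remote refs found */
extern ref_list exchange_refs(ref_list out);        /* barrier + deliver refs
                                                       to their owner nodes */
extern int      none_found_anywhere(ref_list in);   /* collective termination test */
extern void     mark_and_trace(void *obj);
extern void     sweep_own_pools(void);
extern void     rebuild_tables_and_clear_nacks(void); /* depart table untouched */

void global_collection(void)
{
    reset_marks();
    mark_from_local_roots();

    for (;;) {
        /* Trace local references; remote references are only recorded. */
        ref_list found = trace_local_recording_remote();

        /* Synchronize with the other nodes and swap the recorded references. */
        ref_list incoming = exchange_refs(found);
        if (none_found_anywhere(incoming))
            break;                        /* no unmarked remote refs anywhere */

        /* The next marking phase starts from the references just received. */
        for (size_t i = 0; i < incoming.n; i++)
            mark_and_trace(incoming.refs[i]);
    }

    sweep_own_pools();
    rebuild_tables_and_clear_nacks();
}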
4. Evaluation

We have measured the performance of our garbage collector with three applications: Othello, MIP, and Pcfrac. For each application, a version using the garbage collector (the GC version) was compared to a version using explicit memory management (the Exp version). All measurements were taken on a cluster of eight SPARCstation-20 Model 61 workstations connected by a 10Mbit/second Ethernet. This section starts with a brief description of the test programs, then presents the results, and concludes with a summary and a discussion of potential improvements.

4.1. The applications

Othello is a parallel program that performs game tree search to play the game Othello. At the beginning, the master process takes the root task, creates a number of derived tasks, and puts them in a shared queue. Then each process repeatedly takes a task from the queue and performs a search on the subtree rooted at that task. The computed score of one subtree can be used as the cutoff value in subsequent computations. This program allocates a lot of objects in shared memory, but in the end most of the objects are accessed by only one process.

MIP solves the Mixed Integer Programming problem [6], a form of linear programming in which many of the variables are restricted to integer values. It uses branch and bound to find the optimal solution to the problem. Nodes in the search space are kept in a doubly linked queue. Each process takes a node from this queue, performs some computation, perhaps generating new nodes, and puts these new nodes back into the queue. For each node, the computation involves relaxing the integer restrictions on the variables and solving the corresponding linear program to determine whether a better solution than the current best is possible below that node. This is repeated until the solution is found. This application allocates relatively few objects, but most of them are shared.

Pcfrac is a naive parallelization of a large-number factoring program called cfrac [23]. The main data structures in Pcfrac include a task array and a result array. It works as follows:

1. The master process generates some tasks and puts them in the task array while the other processes wait.

2. Each process takes an equal share of the tasks and performs the computation. Interesting results are put in the result array.

3. After everyone is done, one of the slave processes collects the results in the result array and does more computation, then goes back to step 1; the other processes go directly back to step 1.

This procedure is repeated until the problem is solved. The Exp version of Pcfrac uses a complicated reference counting scheme for memory management.

4.2. Running time

The normalized running times of the applications are shown in Table 1, with the running time of the sequential Exp version taken as 1.00. Table 2 presents the average time each node spends on garbage collection.

Table 1. Normalized running time.

Table 2. GC time per node (in sec.) and its ratio to running time.

The GC versions of Othello and MIP performed well. In Othello the small difference between the GC version and the Exp version can be explained by the garbage collection cost. MIP does not allocate as much memory as the other programs, so there is little difference between its two versions. This shows that our collector does little harm when it is lightly used. In Pcfrac, the difference between the two versions is significant, and it cannot be explained by the garbage collection cost alone. The problem here is the poor spatial locality caused by the garbage collector.

For each task generated by the master, and each result computed by the slaves, quite a few objects were allocated as temporary variables that became immediately useless. In the Exp version, these objects are immediately reclaimed and reused, so the tasks and results are packed together tightly. In the GC version these temporary objects are not reused until after the next collection, so the tasks and results are mingled with garbage. When a node retrieves its tasks or collects the results, it accesses more pages than the Exp version does. For example, with 8 nodes, it takes 93 seconds to collect the results in the GC version, and 22 megabytes of data are transferred for this purpose. In the Exp version, the numbers are 8.7 seconds and 4.4 megabytes, respectively.

The time spent on garbage collection in all of the applications is small, ranging from almost zero in MIP to about 10% in Othello at 8 nodes. The GC cost follows the same pattern in all three applications: there is a jump in the cost when the program goes parallel, then the average time spent by each node holds steady. The collection cost is higher in the parallel executions because many objects are checked twice: the objects listed in the export table, or those contained in imported objects, are often reachable from the local roots as well. In the sequential execution, the export table is empty and there are no imported objects. This helps to explain the increases in Pcfrac and Othello when the number of nodes increased from one to two.

The handling of imported objects in the current implementation is not very efficient either. In some cases we are not sure about the exact size of an imported object, so the whole page it is in gets scanned. This makes up the major collection cost in MIP. The problem can be solved by using a more sophisticated protocol to keep track of the sizes of imported objects.

Another cost comes from the message handling by the garbage collector. Our collector checks and modifies all the messages that contain data. We do not present this cost here because the actual cost measured in these applications was negligible: it was almost zero in Othello and less than one second in the other two.

4.3. Communication costs

Table 3 shows the number of messages sent between the nodes during the execution of each program. Garbage collection can affect the amount of communication in different ways depending on the memory usage pattern. It may increase the amount of communication because it causes poor spatial locality. On the other hand, it may decrease the communication cost by eliminating shared accesses associated with free space management, for example, updates to a reference count.

In Othello, the message count in the GC version is greater than the count in the Exp version. This is because most objects are privately held and there is little false sharing. The Exp version does not incur much cost from accesses to shared objects or from free list management, so there is not much that the GC version can save to offset the locality cost. MIP is just the opposite: poor locality is not a big problem due to the small number of objects, but there is a lot of sharing among the nodes, so any reduction in the number of writes to shared objects is beneficial. The result is a large reduction in the number of messages. In Pcfrac both of these factors exist.

Table 3. Message count.

In Pcfrac the turning point is at 4 nodes. Below that, the reduction in the number of writes outweighs the effect of poor locality of reference, and we see a reduction in the number of messages in the GC version; beyond that, we see the opposite. The reason is that the more nodes share a page, the more messages are needed to maintain consistency on that page. With a large number of nodes, more pages are used and more messages are needed for each page. At some point this cost can no longer be offset by the savings.

In our algorithm, when a reference is exported in a message, its value and weight are appended to the end of the message. When it is no longer needed on a node, an acknowledgement is appended to some message sent to its owner. This increases the amount of data transmitted. Table 4 shows the amount of appended references against the total amount of data sent in the GC versions. The size of the appended data is small compared with the total data moved in the system, ranging from 5% in Pcfrac to around 20% in MIP.

4.4. Memory usage

Table 5 presents the amount of memory used by each application. The numbers were obtained by adding up the memory allocated by every node. The table shows that for all three applications, our conservative garbage collector requires more memory than programmer management. The increase in memory demand ranges from 80% in Othello to 300% in MIP.

In Othello, the GC version used 80% more space than the Exp version regardless of the number of nodes. This was because we limited the frequency of garbage collections to improve CPU performance, so the heap was often expanded even though there was garbage that could be reclaimed. When we doubled the garbage collection frequency, the space requirement of the GC version dropped to the same as that of the Exp version, while the garbage collection time increased by 100% to 150% over the numbers presented in Table 2. The effect on the overall running time, however, is small, because garbage collections account for only a small percentage of the running time.

In Pcfrac, the GC version required three times as much memory as the Exp version. This ratio did not drop when we increased the collection frequency. The main reason is that the program does not overwrite obsolete references fast enough. For example, the tasks generated by the master node are not overwritten until the beginning of the next round of computation, although each task is useless once the computation on it is finished.

In MIP, the GC version required 50% more memory than the Exp version when running sequentially. That ratio increased to more than 300% with eight nodes. The main reason the ratio increased in parallel executions is that there were circular references spanning several nodes, caused by the doubly linked task queue in MIP. Figure 2 illustrates a scenario in which a cycle is formed. When two adjacent elements t1 and t2 in the queue are owned by different nodes, each of the elements holds the address of the other. This puts them into their owners' export tables, and local collection will not be able to reclaim them. For example, when N1 starts a local collection, t1 will be found and traced since it is in N1's export table. The reference to t2 will be found, so N1 will keep t2 in its import table, and N2 will not be able to remove t2 from its export table.
The nodes must cooperate to reclaim such cycles.

4.5. Summary

The garbage collector can have two kinds of effect on the performance of the application programs. The poor spatial locality increases the number of messages, negatively affecting performance; on the other hand, the elimination of memory accesses for free space management can decrease the number of messages. The net effect depends on the memory usage pattern of the application program. The garbage collector may also increase the space requirements of the applications, for three reasons: to improve CPU performance, the collector may expand the heap rather than collect the garbage; obsolete references are not overwritten fast enough by the program; and circular structures may be formed, which cannot be reclaimed by local collections.

Procs   Othello                  MIP                      Pcfrac
        Append  Total   Ratio    Append  Total   Ratio    Append  Total   Ratio
2       3.8K    52K     7.3%     1.6M    7.5M    21.3%    1.3M    26.2M   5.0%
4       15K     134K    11.2%    4.7M    22.2M   21.2%    2.1M    54.6M   3.8%
6       24K     191K    12.6%    6.8M    35.3M   19.3%    2.9M    60.4M   4.8%
8       37K     334K    11.1%    9.9M    49.3M   20.1%    3.5M    71.8M   4.9%

Table 4. Appended data (bytes).

Table 5. Memory usage (bytes).

4.6. Future work

The problem of poor spatial locality is inherent in the use of mark-and-sweep garbage collectors. To solve this problem, the garbage collector must be able to move objects. However, to move objects around, the garbage collector must be able to distinguish references from data. This means that the free conversion between reference and non-reference types, which is allowed in languages like C, must be forbidden. We will explore whether simple and reasonable restrictions exist that can provide enough information to the garbage collector while not excessively restricting the freedom of the programmer.

5. Related work

Concurrent garbage collection for shared-memory multiprocessors [2, 7] and distributed systems [1] has been an active area of research. We are, however, aware of only three attempts to design garbage collectors for DSM systems. None of them reports on the cost of garbage collection.

Le Sergent and Berthomieu [18] described the extension to a DSM system of a copying collector originally designed for a multiprocessor. Their design entails collecting the entire address space across all nodes at the same time. The garbage collector also locks pages while scanning. It cannot be used with weakly typed languages like C.

Kordale's GC design [14] for DSM is based on the mark-and-sweep technique. The design is very complex and relies on a large amount of auxiliary information.

Ferreira and Shapiro [11] discussed a copying garbage collector for weakly consistent DSM systems. They were the first to point out that garbage collectors can be designed to tolerate memory inconsistency. Their algorithm allows the nodes to collect independently, but extra messages may be needed during the creation of cross-node references and for reclaiming objects with multiple copies. It does not work with weakly typed languages either.

6. Conclusion

In this paper we presented the design and implementation of a conservative garbage collection algorithm for DSM systems and evaluated its performance. Our algorithm allows each node to perform garbage collection without communicating with the other nodes. It is robust against race conditions (due to concurrent accesses by many nodes to the same object) and memory inconsistency (due to relaxed consistency protocols). The two sources of overhead are that each node must check every message that contains data, so that an approximation of the set of cross-node references can be built, and that GC data is appended to some messages. Our measurements show that neither of these overheads significantly affects application performance.

The most detrimental effect of the garbage collector is that it tends to reduce spatial locality. This effect is not always an overwhelming problem. For example, the performance of Othello and MIP using GC is within 15% of explicit programmer management.
Programs most susceptible to this effect are those like Pcfrac, which use many shared objects that are created after allocations of short-lived intermediate variables. Poor spatial locality is inherent in any mark-and-sweep collector. To handle programs like Pcfrac more efficiently, we must look to copying collectors to improve spatial locality. This in turn requires us to restrict the ways the programmer can manipulate references.

We want to develop reasonable restrictions that allow the programmer maximum freedom while enabling the garbage collector to move data. Our garbage collection algorithm was implemented on the TreadMarks DSM system, but it is not limited to TreadMarks. As long as the DSM system makes the contents of the messages available to the garbage collector, our algorithm will work.

References

[1] A. Abdullahi, E. Miranda, and G. Ringwood. Collection schemes for distributed garbage. In International Workshop on Memory Management, September 1992.
[2] A. Appel, J. Ellis, and K. Li. Real-time concurrent collection on stock multiprocessors. In Proceedings of the SIGPLAN '88 Conference on Programming Language Design and Implementation, pages 11-20, June 1988.
[3] J. Bartlett. Compacting garbage collection with ambiguous roots. Technical Report 88/2, DEC Western Research Lab, 1988.
[4] J. Bartlett. Mostly-copying garbage collection picks up generations and C++. Technical Report TN-12, DEC Western Research Lab, 1989.
[5] D. I. Bevan. Distributed garbage collection using reference counting. In Parallel Architectures and Languages Europe, Eindhoven, The Netherlands, June 1987. Springer-Verlag Lecture Notes in Computer Science 259.
[6] R. Bixby, W. Cook, A. Cox, and E. Lee. Parallel mixed integer programming. Submitted for publication.
[7] H. Boehm, A. Demers, and S. Shenker. Mostly parallel garbage collection. In Proceedings of the SIGPLAN '91 Conference on Programming Language Design and Implementation, June 1991.
[8] H. Boehm and M. Weiser. Garbage collection in an uncooperative environment. Software: Practice and Experience, 18(9):807-820, September 1988.
[9] M. Caplinger. A memory allocator with garbage collection for C. In Proceedings of the 1988 Winter USENIX Conference, February 1988.
[10] J. Carter, J. Bennett, and W. Zwaenepoel. Implementation and performance of Munin. In Proceedings of the 13th ACM Symposium on Operating Systems Principles, October 1991.
[11] P. Ferreira and M. Shapiro. Garbage collection and DSM consistency. In Proceedings of the First Symposium on Operating Systems Design and Implementation, 1994.
[12] P. Keleher, A. L. Cox, and W. Zwaenepoel. Lazy release consistency for software distributed shared memory. In Proceedings of the 19th Annual International Symposium on Computer Architecture, pages 13-21, May 1992.
[13] P. Keleher, S. Dwarkadas, A. Cox, and W. Zwaenepoel. TreadMarks: Distributed shared memory on standard workstations and operating systems. In Proceedings of the 1994 Winter USENIX Conference, January 1994.
[14] R. Kordale, M. Ahamad, and J. Shilling. Distributed/concurrent garbage collection in distributed shared memory systems. In Proceedings of the International Workshop on Object Orientation in Operating Systems, December 1993.
[15] K. Li and P. Hudak. Memory coherence in shared virtual memory systems. ACM Transactions on Computer Systems, 7(4):321-359, November 1989.
[16] H. Lu, S. Dwarkadas, A. Cox, and W. Zwaenepoel. Message passing vs. distributed shared memory on networks of workstations. To appear in Supercomputing '95.
[17] B. Nitzberg and V. Lo. Distributed shared memory: A survey of issues and algorithms. IEEE Computer, 24(8):52-60, August 1991.
[18] T. Le Sergent and B. Berthomieu. Incremental multi-threaded garbage collection on virtually shared memory architectures. In International Workshop on Memory Management, September 1992.
[19] R. Thomas. A dataflow computer with improved asymptotic performance. Technical Report TR-265, MIT Laboratory for Computer Science, 1981.
[20] P. Watson and I. Watson. An efficient garbage collection scheme for parallel computer architectures. In PARLE '87: Parallel Architectures and Languages Europe, number 259 in Lecture Notes in Computer Science, Eindhoven, The Netherlands, June 1987. Springer-Verlag.
[21] P. R. Wilson. Uniprocessor garbage collection techniques. In International Workshop on Memory Management, September 1992.
[22] B. Zorn. The measured cost of conservative garbage collection. Software: Practice and Experience, 23(7):733-756, July 1993.
[23] B. Zorn and D. Grunwald. Empirical measurements of six allocation-intensive C programs. SIGPLAN Notices, 27(12):71-80, December 1992.


More information

Memory management has always involved tradeoffs between numerous optimization possibilities: Schemes to manage problem fall into roughly two camps

Memory management has always involved tradeoffs between numerous optimization possibilities: Schemes to manage problem fall into roughly two camps Garbage Collection Garbage collection makes memory management easier for programmers by automatically reclaiming unused memory. The garbage collector in the CLR makes tradeoffs to assure reasonable performance

More information

Heckaton. SQL Server's Memory Optimized OLTP Engine

Heckaton. SQL Server's Memory Optimized OLTP Engine Heckaton SQL Server's Memory Optimized OLTP Engine Agenda Introduction to Hekaton Design Consideration High Level Architecture Storage and Indexing Query Processing Transaction Management Transaction Durability

More information

Compiler Construction D7011E

Compiler Construction D7011E Compiler Construction D7011E Lecture 14: Memory Management Viktor Leijon Slides largely by Johan Nordlander with material generously provided by Mark P. Jones. 1 First: Run-time Systems 2 The Final Component:

More information

CSCI-1200 Data Structures Spring 2017 Lecture 27 Garbage Collection & Smart Pointers

CSCI-1200 Data Structures Spring 2017 Lecture 27 Garbage Collection & Smart Pointers CSCI-1200 Data Structures Spring 2017 Lecture 27 Garbage Collection & Smart Pointers Announcements Please fill out your course evaluations! Those of you interested in becoming an undergraduate mentor for

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #5 Memory Management; Intro MIPS 2007-7-2 Scott Beamer, Instructor iphone Draws Crowds www.sfgate.com CS61C L5 Memory Management; Intro

More information

Mark-Sweep and Mark-Compact GC

Mark-Sweep and Mark-Compact GC Mark-Sweep and Mark-Compact GC Richard Jones Anthony Hoskins Eliot Moss Presented by Pavel Brodsky 04/11/14 Our topics today Two basic garbage collection paradigms: Mark-Sweep GC Mark-Compact GC Definitions

More information

Garbage Collection. Hwansoo Han

Garbage Collection. Hwansoo Han Garbage Collection Hwansoo Han Heap Memory Garbage collection Automatically reclaim the space that the running program can never access again Performed by the runtime system Two parts of a garbage collector

More information

Copying Garbage Collection in the Presence of Ambiguous References

Copying Garbage Collection in the Presence of Ambiguous References Copying Garbage Collection in the Presence of Ambiguous References Andrew W. Appel and David R. Hanson Department of Computer Science, Princeton University, Princeton, New Jersey 08544 Research Report

More information

Exploiting the Behavior of Generational Garbage Collector

Exploiting the Behavior of Generational Garbage Collector Exploiting the Behavior of Generational Garbage Collector I. Introduction Zhe Xu, Jia Zhao Garbage collection is a form of automatic memory management. The garbage collector, attempts to reclaim garbage,

More information

File System Interface and Implementation

File System Interface and Implementation Unit 8 Structure 8.1 Introduction Objectives 8.2 Concept of a File Attributes of a File Operations on Files Types of Files Structure of File 8.3 File Access Methods Sequential Access Direct Access Indexed

More information

Preview. Memory Management

Preview. Memory Management Preview Memory Management With Mono-Process With Multi-Processes Multi-process with Fixed Partitions Modeling Multiprogramming Swapping Memory Management with Bitmaps Memory Management with Free-List Virtual

More information

Concurrent Preliminaries

Concurrent Preliminaries Concurrent Preliminaries Sagi Katorza Tel Aviv University 09/12/2014 1 Outline Hardware infrastructure Hardware primitives Mutual exclusion Work sharing and termination detection Concurrent data structures

More information

HOT-Compilation: Garbage Collection

HOT-Compilation: Garbage Collection HOT-Compilation: Garbage Collection TA: Akiva Leffert aleffert@andrew.cmu.edu Out: Saturday, December 9th In: Tuesday, December 9th (Before midnight) Introduction It s time to take a step back and congratulate

More information

A Migrating-Home Protocol for Implementing Scope Consistency Model on a Cluster of Workstations

A Migrating-Home Protocol for Implementing Scope Consistency Model on a Cluster of Workstations A Migrating-Home Protocol for Implementing Scope Consistency Model on a Cluster of Workstations Benny Wang-Leung Cheung, Cho-Li Wang and Kai Hwang Department of Computer Science and Information Systems

More information

Incremental Multi-threaded Garbage Collection on Virtually Shared Memory Architectures

Incremental Multi-threaded Garbage Collection on Virtually Shared Memory Architectures Incremental Multi-threaded Garbage Collection on Virtually Shared Memory Architectures Thierry Le Sergent, Bernard Berthomieu Laboratoire d Automatique et d Analyse des Systèmes du CNRS 7, Avenue du Colonel

More information

FILE SYSTEMS. CS124 Operating Systems Winter , Lecture 23

FILE SYSTEMS. CS124 Operating Systems Winter , Lecture 23 FILE SYSTEMS CS124 Operating Systems Winter 2015-2016, Lecture 23 2 Persistent Storage All programs require some form of persistent storage that lasts beyond the lifetime of an individual process Most

More information

16 Sharing Main Memory Segmentation and Paging

16 Sharing Main Memory Segmentation and Paging Operating Systems 64 16 Sharing Main Memory Segmentation and Paging Readings for this topic: Anderson/Dahlin Chapter 8 9; Siberschatz/Galvin Chapter 8 9 Simple uniprogramming with a single segment per

More information

AST: scalable synchronization Supervisors guide 2002

AST: scalable synchronization Supervisors guide 2002 AST: scalable synchronization Supervisors guide 00 tim.harris@cl.cam.ac.uk These are some notes about the topics that I intended the questions to draw on. Do let me know if you find the questions unclear

More information

Adaptive Prefetching Technique for Shared Virtual Memory

Adaptive Prefetching Technique for Shared Virtual Memory Adaptive Prefetching Technique for Shared Virtual Memory Sang-Kwon Lee Hee-Chul Yun Joonwon Lee Seungryoul Maeng Computer Architecture Laboratory Korea Advanced Institute of Science and Technology 373-1

More information

Chapter 11: File System Implementation. Objectives

Chapter 11: File System Implementation. Objectives Chapter 11: File System Implementation Objectives To describe the details of implementing local file systems and directory structures To describe the implementation of remote file systems To discuss block

More information

The Operating System. Chapter 6

The Operating System. Chapter 6 The Operating System Machine Level Chapter 6 1 Contemporary Multilevel Machines A six-level l computer. The support method for each level is indicated below it.2 Operating System Machine a) Operating System

More information

Motivation for Dynamic Memory. Dynamic Memory Allocation. Stack Organization. Stack Discussion. Questions answered in this lecture:

Motivation for Dynamic Memory. Dynamic Memory Allocation. Stack Organization. Stack Discussion. Questions answered in this lecture: CS 537 Introduction to Operating Systems UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department Dynamic Memory Allocation Questions answered in this lecture: When is a stack appropriate? When is

More information

A new Mono GC. Paolo Molaro October 25, 2006

A new Mono GC. Paolo Molaro October 25, 2006 A new Mono GC Paolo Molaro lupus@novell.com October 25, 2006 Current GC: why Boehm Ported to the major architectures and systems Featurefull Very easy to integrate Handles managed pointers in unmanaged

More information

Chapter 8 :: Composite Types

Chapter 8 :: Composite Types Chapter 8 :: Composite Types Programming Language Pragmatics, Fourth Edition Michael L. Scott Copyright 2016 Elsevier 1 Chapter08_Composite_Types_4e - Tue November 21, 2017 Records (Structures) and Variants

More information

Optimizing Closures in O(0) time

Optimizing Closures in O(0) time Optimizing Closures in O(0 time Andrew W. Keep Cisco Systems, Inc. Indiana Univeristy akeep@cisco.com Alex Hearn Indiana University adhearn@cs.indiana.edu R. Kent Dybvig Cisco Systems, Inc. Indiana University

More information

a process may be swapped in and out of main memory such that it occupies different regions

a process may be swapped in and out of main memory such that it occupies different regions Virtual Memory Characteristics of Paging and Segmentation A process may be broken up into pieces (pages or segments) that do not need to be located contiguously in main memory Memory references are dynamically

More information

Design Issues. Subroutines and Control Abstraction. Subroutines and Control Abstraction. CSC 4101: Programming Languages 1. Textbook, Chapter 8

Design Issues. Subroutines and Control Abstraction. Subroutines and Control Abstraction. CSC 4101: Programming Languages 1. Textbook, Chapter 8 Subroutines and Control Abstraction Textbook, Chapter 8 1 Subroutines and Control Abstraction Mechanisms for process abstraction Single entry (except FORTRAN, PL/I) Caller is suspended Control returns

More information

Scientific Applications. Chao Sun

Scientific Applications. Chao Sun Large Scale Multiprocessors And Scientific Applications Zhou Li Chao Sun Contents Introduction Interprocessor Communication: The Critical Performance Issue Characteristics of Scientific Applications Synchronization:

More information

Reducing Disk Latency through Replication

Reducing Disk Latency through Replication Gordon B. Bell Morris Marden Abstract Today s disks are inexpensive and have a large amount of capacity. As a result, most disks have a significant amount of excess capacity. At the same time, the performance

More information

CS Operating Systems

CS Operating Systems CS 4500 - Operating Systems Module 9: Memory Management - Part 1 Stanley Wileman Department of Computer Science University of Nebraska at Omaha Omaha, NE 68182-0500, USA June 9, 2017 In This Module...

More information

CS Operating Systems

CS Operating Systems CS 4500 - Operating Systems Module 9: Memory Management - Part 1 Stanley Wileman Department of Computer Science University of Nebraska at Omaha Omaha, NE 68182-0500, USA June 9, 2017 In This Module...

More information

Java Performance Tuning

Java Performance Tuning 443 North Clark St, Suite 350 Chicago, IL 60654 Phone: (312) 229-1727 Java Performance Tuning This white paper presents the basics of Java Performance Tuning and its preferred values for large deployments

More information

Coping with Conflicts in an Optimistically Replicated File System

Coping with Conflicts in an Optimistically Replicated File System Coping with Conflicts in an Optimistically Replicated File System Puneet Kumar School of Computer Science Carnegie Mellon University 1. Introduction Coda is a scalable distributed Unix file system that

More information

Parallel storage allocator

Parallel storage allocator CSE 539 02/7/205 Parallel storage allocator Lecture 9 Scribe: Jing Li Outline of this lecture:. Criteria and definitions 2. Serial storage allocators 3. Parallel storage allocators Criteria and definitions

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 7 More Memory Management CS 61C L07 More Memory Management (1) 2004-09-15 Lecturer PSOE Dan Garcia www.cs.berkeley.edu/~ddgarcia Star Wars

More information

A STUDY IN THE INTEGRATION OF COMPUTER ALGEBRA SYSTEMS: MEMORY MANAGEMENT IN A MAPLE ALDOR ENVIRONMENT

A STUDY IN THE INTEGRATION OF COMPUTER ALGEBRA SYSTEMS: MEMORY MANAGEMENT IN A MAPLE ALDOR ENVIRONMENT A STUDY IN THE INTEGRATION OF COMPUTER ALGEBRA SYSTEMS: MEMORY MANAGEMENT IN A MAPLE ALDOR ENVIRONMENT STEPHEN M. WATT ONTARIO RESEARCH CENTER FOR COMPUTER ALGEBRA UNIVERSITY OF WESTERN ONTARIO LONDON

More information

Cache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012)

Cache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012) Cache Coherence CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Shared memory multi-processor Processors read and write to shared variables - More precisely: processors issues

More information

CSCI 4717 Computer Architecture

CSCI 4717 Computer Architecture CSCI 4717/5717 Computer Architecture Topic: Symmetric Multiprocessors & Clusters Reading: Stallings, Sections 18.1 through 18.4 Classifications of Parallel Processing M. Flynn classified types of parallel

More information

Dynamic Memory Allocation. Gerson Robboy Portland State University. class20.ppt

Dynamic Memory Allocation. Gerson Robboy Portland State University. class20.ppt Dynamic Memory Allocation Gerson Robboy Portland State University class20.ppt Harsh Reality Memory is not unbounded It must be allocated and managed Many applications are memory dominated Especially those

More information