DMMU: Dynamic Memory Management Unit


J. Morris Chang, Witawas Srisa-an and Chia-Tien Dan Lo, Department of Computer Science, Illinois Institute of Technology, Chicago, IL, USA Abstract Dynamic memory management has been a performance bottleneck in many operating systems, including multi-threaded and real-time operating systems. Moreover, recent advances in software engineering, such as graphical user interfaces and object-oriented programming, have caused applications to become more memory intensive. The growing popularity of these operating systems and applications increases the importance of high-performance dynamic memory management. Currently, heap management is done through software routines (e.g., malloc and free in C/C++) and handled on a per-process basis. This paper presents a hardware implementation of a DMMU that is shared by all processes, in the same way that the TLBs (Translation Look-aside Buffers) are used in the current MMU. The DMMU employs a novel approach to implement efficient memory management: allocation and deallocation are performed in the hardware domain. The detailed design and evaluation of the proposed DMMU are presented in this paper. Simulation results show that the hit ratios for 1-Kbit and 8-Kbit buffers range from 78-99% and 95-99%, respectively. At the same time, the memory overhead for 1 Kbits is 10% and for 8 Kbits is 0%. Index Terms Architectural support for operating systems, dynamic memory management, bit-map allocation/deallocation, object-oriented systems, VLSI implementation. 1. Introduction Dynamic memory management has been a performance bottleneck in many operating systems, including multi-threaded and real-time operating systems. In a multi-threaded system, threads of the same process share and access a common heap space concurrently.
While this alleviates the context-switching overhead, allocation and deallocation within that heap space can be very intensive if several threads exist in the same process. In a real-time operating system, meeting deadlines is the most important goal. However, functions such as malloc and free yield non-deterministic turnaround times, which can cause real-time systems to fail.

Moreover, recent advances in software engineering, such as graphical user interfaces and object-oriented programming, have caused applications to become more memory intensive. The growing popularity of these operating systems and applications increases the importance of high-performance dynamic memory management. This document presents the design of a Dynamic Memory Management Unit (DMMU) to facilitate memory management. The memory-management unit (MMU), a hardware device, can be found in almost every modern CPU. The MMU has been used to map virtual addresses to physical addresses for more than two decades. This paper presents an extension to the traditional MMU to include hardware support for heap management. The rationale behind this approach is twofold: 1. The functions for dynamic memory allocation and deallocation are among the most time-consuming functions in many applications today. The object-oriented nature of C++ applications results in much more intensive memory usage than in C applications. In particular, a much larger proportion of total memory usage is taken by allocations on the heap compared to a procedural language, where there is greater use of the stack and static data. Moreover, [18] finds that in six C/C++ programs, 23%-38% of runtime is spent on dynamic memory allocation and deallocation. Our studies using GNU's gprof to profile the invocation of the malloc and free functions in several programs also agree with Zorn's findings. Moreover, JAVA applications are even more dynamic-memory intensive. For example, to allocate an array of class objects, we need to dynamically allocate each array element. Additionally, operating systems such as multi-threaded or real-time systems also generate higher memory intensity. These observations illustrate the need for a high-performance dynamic memory allocator and deallocator. 2. As VLSI technology advances, it becomes more and more attractive to map basic software algorithms into hardware.
Recent examples are the 57 new instructions for multimedia extension (MMX) in Intel's Pentium and the 21 new instructions for 3D graphics and multimedia in AMD's K6-2. By 2001, it is expected that we will see a 100-million-transistor CPU, which is ten times more complex than the Pentium II [19]. Currently, heap management is done through software routines (e.g., malloc and free in C/C++) and handled on a per-process basis. This paper presents a hardware implementation of a DMMU that is shared by all processes, in the same way that the TLBs (Translation Look-aside Buffers) are used in the current MMU. The DMMU employs a novel approach to implement efficient memory management. The binary buddy system, known for

speed and simplicity in memory allocation, has been in use for nearly three decades. A software realization incurs the overhead of internal fragmentation and of memory traffic due to splitting and coalescing memory blocks. This paper employs a simple hardware design for buddy-system allocation with a size-encoding scheme that takes full advantage of the speed of a pure combinational-logic implementation [1]. This allows dynamic allocation and deallocation to be done in constant time. Moreover, locality will also be improved, since allocation and deallocation are done by a dedicated hardware unit. This modified buddy system can effectively eliminate internal fragmentation. Simulation results show that the memory utilization can be 25% to 33% better than the conventional buddy system and is very close (within 10%) to the first-fit approach [1]. The hardware complexity of the proposed scheme is O(n), where n is the size of the bit-vector. Recently, much attention has been paid to research in the field of Processor In Memory (PIM), such as David Patterson's IRAM project [14]. This research is intended to close the performance gap between processor and memory. An intelligent memory system should include high-performance strategies to manage memory dynamically. The DMMU will fit into PIM paradigms very nicely. Moreover, the proposed design can also benefit any application that requires high-performance resource allocation. Such applications include IIT's JAVA chip [5], real-time systems, and embedded systems. The remainder of this paper is organized as follows. Section 2 elaborates on previous work. Section 3 provides the design of the DMMU and the architectural support for the DMMU. Section 4 provides information on operating system support. Section 5 presents the performance evaluation of the DMMU. The last section presents the conclusions of this paper. 2. Previous work Dynamic memory management has been an important topic in computer systems for over three decades. High-performance algorithms for dynamic memory management are of considerable interest; a very extensive survey (78 pages long) was recently presented by Paul Wilson [17]. The primary goals of these algorithms are to speed up dynamic memory allocation and deallocation as well as to minimize internal and external fragmentation. Each of the popular algorithms has its drawbacks. The first-fit and next-fit algorithms achieve good storage utilization but incur a time penalty associated with scanning a free list [3]. The buddy system,

which allocates memory in blocks whose lengths are powers of 2, is known for its speed and simplicity, but suffers from internal fragmentation [17]. Modern operating systems employ these strategies with several optimizations in their implementations of dynamic memory management. For example, a Cartesian-tree, better-fit algorithm is used in the Sun Operating System; a fast buddy algorithm can be found in the BSD Unix implementation of malloc; a hybrid first-fit, buddy algorithm is used in the GNU publicly available malloc/free implementation [15]. In modern programming languages, dynamic memory management is part of the runtime system that manages the heap memory. It is typically implemented through function calls (e.g., malloc() and free() in the C language). The malloc API specified by ANSI is a general-purpose search for an arbitrarily sized free area of memory. Compilers generally implement the C++ operator new directly in terms of malloc. Applications written in C and (especially) C++ spend much of their execution time in allocating and deallocating memory. The increasing popularity of C++ also magnifies the importance of efficient heap management. The inefficiency (in terms of time) of malloc and free is due to the search through a list of free blocks. The current approach to dynamic memory management requires that each process manage its own heap space, which exists in the virtual memory domain, on a per-process basis. Each process calls its own malloc and free functions, and the allocation information is usually kept on linked lists. Once the process heap space is exhausted, an operating-system call such as sbrk() or brk() is made to request additional memory [16]. The major drawback of this approach is in the allocation-information data structure. With linked lists, as in sequential fit or segregated fit, the search is done sequentially, and the number of memory blocks on the available lists can greatly affect the turnaround time.
As the memory becomes more scattered, the search time also grows linearly longer. It is worth noting that heap management algorithms have been around since long before the popularity of object-oriented programming. In Pascal, memory is allocated and deallocated by the new and dispose operators. In C, allocation and deallocation are done through malloc and free. Even in C++, the new and delete operators are mapped to the malloc and free functions from C. This means that while the programming trend is moving toward the object-oriented approach, dynamic memory management functions still operate based on functions designed for the procedural approach, which yields much less memory intensity than the object-oriented approach. Apparently, new

approaches to managing dynamic memory are needed in order to cope with the increasing memory intensity of the heap region. If storage is allocated in contiguous sequences of fixed-sized blocks, allocation information may be held in a separate bit-map [15]. A bit-map approach provides a great opportunity to implement a hardware version of the buddy system [7]. It is worth noting that the proposed hardware approach is rather simple and yet avoids the difficult issues, splitting and coalescing, that exist in its software counterpart. In this section, the technical rationale and approach are presented from a performance perspective. Improvement in speed. When a block of a given size is to be allocated, the buddy system locates a block that is at least as large as the allocation request and whose size is a power of two. The block is split in half as many times as possible, until it can be split no longer yet still satisfy the memory request. When a block is split, its two halves are known as buddies. At the time a block is freed, if the block's buddy is also free, the buddy system coalesces the two buddies immediately, making available a larger piece of memory that can be used later for allocation. The operations of splitting and coalescing memory dominate the cost of the buddy system in its software realizations. This observation has led to research in reducing the frequency of coalescing [10]. A hardware-maintained bit-map approach, however, eliminates the need for splitting and coalescing operations. Splitting becomes unnecessary because allocation is done using a hardware-maintained binary tree that finds free blocks using combinational logic. The bit-vector forms the base of the binary tree. Deallocation is indicated by resetting bits in the bit-vector, eliminating the need for explicit coalescing.
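The buddy mechanics described above can be sketched in a few lines of C. This is only an illustrative software model, not the paper's hardware, and the function names are ours: rounding a request up to a power of two shows where internal fragmentation comes from, and a buddy's address is a single-bit flip, which is why coalescing checks are cheap.

```c
#include <stddef.h>

/* Round a request (in blocks) up to the next power of two, as the
 * plain binary buddy system does; the gap between n and the rounded
 * size is the internal fragmentation the modified scheme recovers. */
static size_t buddy_round_up(size_t n) {
    size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

/* A block's buddy differs from it only in the bit equal to the block
 * size, so the buddy address is one XOR of the block's address. */
static size_t buddy_of(size_t addr, size_t size) {
    return addr ^ size;
}
```

For example, buddy_round_up(5) returns 8, so three of the eight allocated blocks would be wasted; the modified buddy system instead marks exactly five bits.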
In the software approach, a free list for blocks of each size is maintained; thus, after blocks are coalesced, the pointers to two free lists need to be updated. The hardware approach, however, requires no coalescing at all. The freed bits are, in effect, combined immediately with adjacent bits [2]. Because the proposed hardware can be realized in pure combinational logic, the time needed for memory management is greatly reduced. Improvement in memory utilization. Binary buddy systems always allocate memory in sizes of powers of 2, so they may leave much space unused at the end of an allocated block. This is known as internal fragmentation. Bit-vector allocation raises the possibility of using the otherwise unused portion of the block later, without increasing the allocation/deallocation time. For example, if the buddy system allocates 8 blocks for a 5-block request, the hardware

can mark exactly 5 of the 8 bits (one bit per block). This allows the unused 3 blocks to be used in the future. We will call this the modified buddy system. Again, such bit manipulation is quite easy in a hardware realization. Recalling the days when multiprogramming operating systems were first introduced, the virtual memory space was managed by software algorithms. Later on, the hardware-implemented memory management unit (MMU) became the first choice in memory management for most of the CPUs available today. This document presents a novel attempt to create a dynamic memory management unit (DMMU) that can collaborate with operating systems in managing heap space. As the trend moves toward object-oriented programming, multi-threaded operating systems, and real-time operating systems, the demand for high-performance dynamic memory management becomes more intense. It is time to look for alternatives to current software approaches that can respond to the needs of today's computing. The DMMU is a strong candidate to accomplish such a goal because it provides improvement in speed, improvement in memory utilization, and constant turnaround time. 3. Design of Dynamic Memory Management Unit (DMMU) This section is separated into two sub-sections. The first sub-section provides an overview of the DMMU, while the second sub-section describes the architectural support for the DMMU. 3.1 Top level design The main purpose of the Dynamic Memory Management Unit (DMMU) is to take responsibility for managing heap space for all processes in the hardware domain. The proposed DMMU utilizes the modified buddy system combined with a bit-map approach to store allocation and size information. Usually, each process has a heap associated with it. In the proposed scheme, each heap requires two bit-maps, one for allocation status (the A bit-map) and one for object size (the S bit-map).
It is necessary to keep these two bit-maps together at all times, since searching and modifying both bit-maps is required for each allocation and deallocation. Communication between the CPU and the DMMU is done through an extended instruction set designed specifically for the DMMU. This instruction set allows the CPU to make an allocation or deallocation request and pass the requested size or starting-address information to the DMMU. Figure 1 demonstrates the top-level diagram of the DMMU.

Figure 1. The top-level description of a DMMU (the CPU passes the Allocation/Deallocation signal, Object size, and Object pointer to the DMMU; the DMMU accesses the A and S bit-maps of different processes and invokes sbrk/brk on the O.S. kernel). Figure 1 illustrates the basic functionality of the DMMU. First of all, the DMMU provides service to the CPU by maintaining the memory allocation status inside the heap region of the running process. Thus, the DMMU must have access to the A bit-map and S bit-map of the running process. Like the TLB, the DMMU is shared among all processes. The parameters that the CPU can pass to the DMMU are: the Allocation/Deallocation signal, the Object_size (for an allocation request), and the Object_pointer. The operation of the DMMU is very similar to the function calls (i.e., malloc() and free()) in the C language. Thus, the Object_pointer is either returned from the DMMU during allocation or passed to the DMMU during deallocation. If an allocation should fail, the DMMU makes a request to the operating system for additional memory using the system call sbrk() or brk(). Since the algorithms used in the DMMU are implemented through pure combinational logic, the time to perform a memory request or a free instruction is constant. On the other hand, the time for a software approach to perform a malloc() or free() function is non-deterministic. By and large, practical programs spend more than 20% of their execution time dealing with dynamic memory management [18]. This execution time can be largely eliminated by using the DMMU. The DMMU can be implemented as a co-processor or integrated into the CPU. 3.2 Architectural support for DMMU This section summarizes the process of memory allocation and deallocation in the DMMU. Since the bit-map of a given process may be too large to be handled in the hardware domain, the bit-vector, a small segment of the bit-map, is used in the proposed system. This idea is very similar to the use of the TLB (Translation Look-aside Buffer) in virtual memory.
Due to the close tie between the S bit-map and the A bit-map, the term bit-vector used in this section represents one A bit-vector (of the A bit-map) together with one S bit-vector (of the S bit-map). Figure 2 presents the operation of the proposed DMMU.

When a memory allocation request is received (step 1), the requested size is compared against the largest_available_size of each bit-vector in parallel. This operation is similar to the tag comparison in a fully associative cache. However, it is not an equality comparison: there is a hit in the DMMU as long as one of the largest_available_size values is greater than or equal to the requested size. If there is a hit, the corresponding bit-vector is read out (step 2) and sent to the CBT (Complete Binary Tree). The CBT [1] is a hardware unit that performs allocation/deallocation on a bit-vector. For the purpose of illustration, we assume that one bit-vector represents one page of the heap. After the CBT identifies the free chunk of memory in the chosen page, the CBT updates the bit-vector (step 3) and the largest_available_size field (step 3*). The object pointer (in terms of a page-offset address) of the newly created object is generated by the CBT (step 4). This page offset is combined with the page number (from step 2*) to form the resultant address. Figure 2. The allocation and deallocation processes of the DMMU. Allocation steps: (1) memory request; (2) bit-vector read out; (2*) page number read out; (3) bit-vector update; (3*) update largest_available_size; (4) offset address of newly created object. Deallocation steps: (A) select bit-vector; (A*) starting page-offset address sent to the CBT; (B) bit-vector read out; (C) bit-vector update; (C*) update largest_available_size. For deallocation, when the DMMU receives a deallocation request, the page number of the object pointer (i.e., a virtual address) is used to select a bit-vector (step A). This process is similar to the tag comparison in a cache operation. At the same time, the page offset is sent to the CBT as the starting address to be freed (step A*).
The corresponding bit-vector is read out (step B) and sent to the CBT. The CBT will free the designated number of blocks

(based on the information from the S bit-vector) starting at the address provided in step A*, and then update the bit-vector (step C) and the largest_available_size field (step C*). The page number, the bit-vectors, and the largest_available_size are placed in a buffer called the Allocation Look-aside Buffer (ALB). Since the DMMU is shared among all processes, the content of the ALB must be swapped during context switching. This issue also exists in the TLB. To solve this problem, we can add a process-id field to the ALB. This allows bit-vectors of different processes to coexist in the ALB. We expect the performance of the ALB to be very similar to that of the much-studied TLB. However, further research into the ALB organization, hit ratio, and miss penalty is required. In order for the CPU to communicate with the DMMU, an extended instruction set is needed to take advantage of these new features. These instructions (i.e., MALC, MFRE, etc.) allow the CPU to communicate with the DMMU. For example, when the CPU executes the MALC instruction, the requested memory size is sent to the DMMU with the allocation signal. In return, the DMMU sends the starting address back to the CPU. In case of failure, the DMMU returns a null pointer to the CPU. Similarly, when the CPU executes the MFRE instruction, it simply issues the starting address of the memory to be freed and sends the deallocation signal to the DMMU. 4. Operating system support for DMMU The two main functions of an operating system are resource management and providing an interface between the bare machine and users. When the user creates a job, the operating system must provide an environment for the job to execute until it finishes. From the user's point of view, how the operating system implements calls such as malloc or free is irrelevant. By adding a new resource, the DMMU, some parts of the operating system need to be modified.
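The visible behavior of MALC and MFRE can be modeled in software. The sketch below is a hypothetical C model only (size in, pointer or NULL out, with exactly the requested A bits marked, in the spirit of the modified buddy system): the names dmmu_malc/dmmu_mfre and the S-bit encoding (one bit marking the last block of an object) are our assumptions, not the paper's specification, and the linear scan stands in for what the CBT computes combinationally.

```c
#include <stddef.h>

#define HEAP_BLOCKS 8
#define BLOCK_SIZE  16

static unsigned char heap[HEAP_BLOCKS * BLOCK_SIZE];
static unsigned char a_map[HEAP_BLOCKS]; /* A bit-map: 1 = block allocated */
static unsigned char s_map[HEAP_BLOCKS]; /* S bit-map: 1 = last block of an object */

/* Model of MALC: find `nblocks` contiguous free blocks, mark exactly
 * that many A bits, and set the S bit on the object's last block. */
static void *dmmu_malc(size_t nblocks) {
    size_t run = 0;
    for (size_t i = 0; i < HEAP_BLOCKS; i++) {
        run = a_map[i] ? 0 : run + 1;
        if (run == nblocks) {
            size_t start = i - nblocks + 1;
            for (size_t j = start; j <= i; j++) a_map[j] = 1;
            s_map[i] = 1;
            return heap + start * BLOCK_SIZE;
        }
    }
    return NULL; /* failure: the real DMMU would fall back on sbrk()/brk() */
}

/* Model of MFRE: clear A bits up to and including the S boundary. */
static void dmmu_mfre(void *p) {
    size_t b = (size_t)((unsigned char *)p - heap) / BLOCK_SIZE;
    while (!s_map[b]) a_map[b++] = 0;
    a_map[b] = 0;
    s_map[b] = 0;
}
```

Freed bits are simply reset, so, as in the text, no explicit coalescing step is needed: adjacent free bits are immediately usable by the next search.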
4.1 Compiler Like the multimedia extension (MMX) instructions or three-dimensional (3D) instruction sets, a modified compiler is required to generate object code that can take advantage of these new instructions. Once the compiler recognizes malloc() or free() in a program, it will generate code using the DMM instruction set instead of linking in the malloc() or free() code from the current software library. Some of the DMMU instructions are user-level, such as MALC and MFRE, whereas most of them are supervisor-level, such as the ALB management instructions. The supervisor-level instructions can only be used by the operating system. Since the DMMU is a resource shared among all processes, it is necessary to prevent user processes from modifying the content of the DMMU. The only job that the

modified compiler does is to generate code incorporating the DMMU instruction set. Hence, the executable code can fully utilize the benefits of the DMMU. After compiling a program, the structure of an executable object code is: code segment, data segment, heap segment, and stack segment. Figure 3 shows a typical memory map for a process in a computer system. The process control block (PCB) stores all the information needed for the process to be executed, such as the process status, context-switch data, and all the bit-vectors needed for the program. The program code is stored in the code segment. Global variables are stored in the data segment. Dynamic variables requested through the malloc or realloc functions are stored in the heap segment. The stack segment is used for auto, or local, variables. Once the executable code is loaded into memory, the size of every segment is determined. Some operating systems may have only one segment for both the heap segment and the stack segment; however, the combined size is still fixed after being loaded into memory. On the other hand, the maximum number of bit-vectors needed to represent a heap segment can be obtained at compile time. Consequently, the modified compiler has to reserve memory space for storing these bit-vectors, which can be in the PCB. Figure 3. A typical memory map for a process in a computer system (from top to bottom: STACK with auto variables and the current top of stack, HEAP as the dynamic memory area, DATA with global variables, CODE with the program, and the PCB with process control information and the DMMU bit-vectors). From Figure 3, it is worth noting that memory is allocated from the heap region, which is above the data segment and can grow to accommodate memory allocation requests. The address of the first available location beyond the data segment is called the break address. A process can manipulate its break address by invoking two system calls, brk() and sbrk(). Library functions such as malloc(), free(), and realloc() are constructed using either the brk() or sbrk() system call.
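The break-address mechanism can be illustrated with a toy sbrk(). A static arena stands in for the real system call so the sketch stays self-contained; toy_sbrk is hypothetical, but like the real sbrk() it returns the previous break and advances it by the requested increment.

```c
#include <stddef.h>

/* Toy model of the break address: allocators grow the heap by moving
 * the break.  A static arena replaces the real kernel call here. */
#define ARENA_SIZE 256
static unsigned char arena[ARENA_SIZE];
static size_t brk_off = 0; /* current break, as an offset into the arena */

static void *toy_sbrk(size_t incr) {
    if (brk_off + incr > ARENA_SIZE) return NULL; /* out of memory */
    void *old = arena + brk_off; /* like sbrk(), return the old break */
    brk_off += incr;
    return old;
}
```

A malloc-style allocator would carve objects out of the region returned here and call toy_sbrk again only when its current chunk is exhausted, which is exactly the pattern the DMMU follows when an allocation misses in the heap.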
By introducing the DMMU, some signals need to be generated in order to invoke brk() or sbrk() when necessary. 4.2 Context-switch In a multi-process operating system, several processes can be executed concurrently. A CPU scheduler decides the states of processes, which are ready, running, or idle. When a process is scheduled to be in the running state, it

can get control of the CPU, and the currently running process will be swapped to the ready or idle state. Swapping these two processes requires saving the CPU status of the current process and loading the saved CPU status of the new process. This is the so-called context switch. Generally speaking, there are two memory-management mechanisms in modern operating systems: segmentation and paging. Segmentation uses variable-length chunks of memory, whereas paging uses many small fixed-size chunks of memory for a program. In this proposal, only operating systems with paging will be discussed. Context switching has become an overhead because of the time it takes to switch between two processes. One solution to alleviate this cost is to introduce threads, which reduce the context-switching frequency. However, the bit-vectors in the DMMU also need to be replaced during a context switch. This overhead cannot be avoided, but it can be reduced by introducing an ALB consisting of multiple sets of registers. If the architecture of the ALB allows the bit-vectors of multiple processes to be kept at one time, then during the context switch, the only change is simply modifying the content of a register that points to the next scheduled process's information in the buffer. On the other hand, if the ALB includes a process-id (PID), then the PID can be compared with the current process's PID to identify which bit-vectors are valid for the current process. There is another case, in which the DMMU fails to satisfy a memory request after examining all the bit-vectors in the ALB. This can happen if there is no space in the heap or if other bit-vectors need to be brought into the ALB. The next section discusses this issue in greater detail. 4.3 DMMU service routines In order to execute a program, the huge virtual memory space must be mapped into a small, limited physical memory space. This is the reason why some memory contents must be swapped between memory and disk.
The job of the DMMU is to keep track of free memory blocks within the process's heap segment. Bit-vectors are used to represent the memory allocation status corresponding to the heap space. Inside the ALB, only a limited number of bit-vectors can be held. For a smaller heap space, it is possible to have all the bit-vectors for a process residing in the ALB. However, for a larger heap space, it may not be practical to hold all the bit-vectors inside the ALB. In this case, a replacement policy must be selected, and a swapping mechanism similar to that of a cache must be utilized. Figure 4 demonstrates this strategy.

Figure 4. DMMU structure for a 24-page heap space (the heap consists of pages P0 to P23; the Allocation Look-aside Buffer holds 5 entries, the bit-vectors for P0 through P4). In Figure 4, the DMMU can only manage 5 bit-vectors at one time. If we need to allocate memory in P5, an approach similar to a cache must be used to replace the bit-vectors inside the ALB. Our recent study of the allocation patterns [4] of C++ programs shows good spatial locality. This is because most objects have a short life span in object-oriented systems [7]. This suggests that a pseudo least-recently-used (LRU) or first-in-first-out (FIFO) replacement policy may work well with the proposed ALB. Moreover, a prefetch-on-miss strategy may be used to optimize ALB performance. The management of the ALB, including the implementation of the replacement policy, will be done by a set of supervisor-level ALB management instructions. These instructions are very similar to the ones used to manage the TLB in the PowerPC [12]. If an allocation fails, or simply misses in the ALB, the DMMU service routines will make sbrk()/brk() calls to the kernel or apply the ALB management instructions to manipulate the ALB. 4.4 Multi-process system versus multi-thread system In a multi-process system, the proposed ALB is to the bit-maps what the TLB is to the page table. During a context switch, there is overhead associated with maintaining the TLB. Similarly, the ALB also adds cost to the context switch. However, this cost may be reduced through two proposed strategies: 1) adding a process-id field to the ALB, or 2) using multiple sets of bit-vectors, each set pointed to by a process-id. The first approach has been used in TLBs. The second approach opens the possibility of loading the bit-vectors for the next scheduled process while the system is in a non-CPU cycle. The optimal set size will be investigated through simulation.
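The first strategy, a process-id field with a simple replacement policy, can be modeled in a few lines. This is a minimal sketch assuming a 4-entry buffer and FIFO replacement; the structure and function names are illustrative, and the bit-vector payload itself is omitted.

```c
#define ALB_ENTRIES 4

struct alb_entry {
    int valid;
    int pid;  /* process-id field: lets vectors of different processes coexist */
    int page; /* which heap page this entry's bit-vector covers */
};

static struct alb_entry alb[ALB_ENTRIES];
static int fifo_next = 0; /* FIFO replacement pointer */

/* Look up (pid, page); on a miss, replace the oldest entry FIFO-style.
 * Returns 1 on a hit, 0 on a miss.  A real ALB entry would also carry
 * the A/S bit-vectors and the largest_available_size field. */
static int alb_access(int pid, int page) {
    for (int i = 0; i < ALB_ENTRIES; i++)
        if (alb[i].valid && alb[i].pid == pid && alb[i].page == page)
            return 1;
    alb[fifo_next].valid = 1;
    alb[fifo_next].pid = pid;
    alb[fifo_next].page = page;
    fifo_next = (fifo_next + 1) % ALB_ENTRIES;
    return 0;
}
```

Because the tag is the (pid, page) pair, a context switch need not flush the buffer: entries from the previous process simply stop matching and age out under FIFO.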
On the other hand, the emergence of the multi-thread system may drastically reduce the cost of context switching. While it is true that threads, in many respects, operate in the same manner as processes, they do not have their own heap space. In a multi-threaded system, threads within a process share a common heap space; therefore, the content of the ALB should

only be recorded and updated during process switching. That is, there is no context-switch cost for switching from one thread to another within a process. Consequently, the DMMU can operate very well in a multi-threaded system. 5. Simulation Results This section presents detailed simulation results for the proposed DMMU. The simulator accepts memory allocation and deallocation traces as inputs and provides the hit ratio as the result. The memory allocation/deallocation traces were obtained by instrumenting the malloc and free functions of the source programs. In the following subsections, the characteristics of the programs we traced and the performance evaluation of the DMMU are summarized. 5.1 Application overview: The four programs traced in our experiment are publicly available software applications. These programs include gcc (C++ compiler), Xsnow, Xearth, and Xspim (a MIPS R2000/R3000 simulator). The first application is a GNU product. Xearth (written by Kirk Lauritz Johnson) and Xsnow (written by Rick Jansen) are freeware. Xspim was developed by James Larus and is used widely in academia [13]. They represent several different kinds of applications. Gcc is a CPU-intensive application with no screen interaction. Xsnow and Xearth are X-window applications without screen interaction. Xspim is also an X-window application and has screen interaction through pull-down menus and text input. Table 1 summarizes the trace characteristics of these four programs. Gcc was used to compile Xspim. Xspim was used to run an assembly program of the recursive Ackermann function. The numbers of malloc and free invocations range from 2500 to 6400 and from 1100 to 7300, respectively. The average object size of the memory allocation requests for each program ranges from 94 bytes to 1581 bytes. This shows that our experiment covers a good variety of allocation patterns.

Table 1. Characteristics of program traces

Name     Malloc calls   Free calls   Avg. malloc size (bytes)   Total memory requested (bytes)
GCC                                                             ,987,707
XSPIM                                                           ,238
XEARTH                                                          ,567
XSNOW                                                           ,203,586

Figure 5 depicts the histogram of object sizes for each application. For Xspim and Xearth, small objects are invoked very frequently. In Xsnow, object sizes are distributed quite evenly; however, they are on the order of 2^n. It is worth noting that gcc invoked more than 900 objects of size 4K bytes (i.e., one page's worth of memory). This is because gcc maintains its own free list for certain objects. The burdensome overhead of malloc and free is a well-known issue among experienced programmers. The most common way to lower the penalty is to make less frequent calls to malloc. Thus, programmers tend to request a large chunk of memory once, then keep track of their own free list. Another large chunk of memory may be requested if the current one runs out of space. This scheme is used in gcc, and the chunk size is 4K bytes. Figure 5. Histograms of object size (frequency versus size in bytes) for GCC, XSPIM, XEARTH, and XSNOW. As we mentioned earlier, large objects are handled separately. This is a common practice in many other schemes [9]. In this research, objects with sizes greater than 8K bytes are defined as large objects. In the four programs we traced, there are only 43 large objects. 5.2 Investigating block size Before we can evaluate the system performance, the first parameter that needs to be studied is the block size. Again, in the bit-map, one bit stands for one block's worth of memory. The block size affects the bit-map size for a given heap memory size. A larger block size yields a smaller bit-map. A smaller bit-map size means a lower cost for

15 the bit-map. However, the larger block size may lead to higher internal fragmentation during the allocation. The higher internal fragmentation can contribute to a higher watermark (i.e. the highest memory address allocated). Apparently, the higher watermark is considered as the memory overhead in the proposed scheme. Next Table summarizes the memory overhead (through watermark) with block size ranging from 8 bytes to 64 bytes. The smallest block size, 4 bytes for one block, is used as the benchmark. Table 2 Memory overhead as compared to 4 bytes/block Bytes/Block GCC(%) XSPIM(%) XEARTH(%) XSNOW(%) From the table above, 16 bytes/block is the most logical block size. When compare block size of 16 and 8, the overhead in block size of 16 is minimal (5.88%). However, the overall size of bit-map would be reduced by 50% compared to block size of 8. Thus, we will use 16 bytes/block throughout the subsequence simulations. 5.3 Investigating replacement policy Similar to cache, the replacement policy can be a determining factor in the performance of the ALB. The three most common replacement policies, FIFO, Random, and LRU are investigated in the simulation. The two basic buffer configurations used in the simulation are 4 entries x 512 bits and 4 entries x 1Kbits. Figure 6 demonstrates the performance of the buffers with different replacement policies. Figure 6. Performance of the buffers with different replacement policies Comparison between Replacement Policies (4 entries x 512 bits) Comparison between Replacement Policies (4 entries x 1 Kbits) Hit Ratio Hit Ratio GCC XSPIM XEARTH XSNOW Applications 0.9 GCC XSPIM XEARTH XSNOW Applications FIFO RANDOM LRU FIFO RANDOM LRU 15
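The simulated ALB lookup with FIFO replacement can be modeled in software roughly as follows (a behavioral sketch with hypothetical structure names, not the hardware design):

```c
/* Sketch: a fully associative buffer of bit-vector tags with FIFO
   replacement, tracking hits for the hit-ratio statistic. */
#define ENTRIES 4

typedef struct {
    int tag[ENTRIES];   /* which bit-vector each entry holds (-1 = empty) */
    int next;           /* FIFO pointer: next entry to be replaced */
    long hits, refs;
} Alb;

void alb_init(Alb *a) {
    for (int i = 0; i < ENTRIES; i++) a->tag[i] = -1;
    a->next = 0;
    a->hits = 0;
    a->refs = 0;
}

/* Reference bit-vector `tag`; returns 1 on a hit, 0 on a miss
   (the missing vector is filled into the oldest entry). */
int alb_access(Alb *a, int tag) {
    a->refs++;
    for (int i = 0; i < ENTRIES; i++)
        if (a->tag[i] == tag) { a->hits++; return 1; }
    a->tag[a->next] = tag;              /* FIFO: evict the oldest vector */
    a->next = (a->next + 1) % ENTRIES;
    return 0;
}

double alb_hit_ratio(const Alb *a) {
    return a->refs ? (double)a->hits / a->refs : 0.0;
}
```

Swapping the eviction line for a random or least-recently-used choice yields the other two policies compared in Figure 6.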

The different policies do not differ much in performance. A closer look reveals that in most instances FIFO performs slightly better than the others. The reason is that object life spans vary (young objects tend to die young while old objects continue to live [20]), and FIFO guarantees that the bit-vector containing the oldest objects is always the one replaced. Thus FIFO is used as the replacement policy in the remaining simulations.

5.4 Performance Evaluation

We investigate the performance of the ALB through two approaches. First, we fix the Bit-Vector Length (BVL) and increase the number of entries (which also increases the buffer size). In doing so, we can find a saturation point where the hit ratio of all or most of the programs begins to stabilize. The result is illustrated in Figure 7.

Figure 7. Buffer size versus hit ratio, with the BVL fixed at 512 bits (case a) and at 1 Kbits (case b).

In three of the four programs, the saturation point for case a is at 16 entries, which translates to a buffer size of 8 Kbits. For case b, the saturation point is at 8 entries, which also translates to an 8-Kbit buffer. In the second approach, we set the buffer size to the value found by the first approach (8 Kbits) and investigate the effect of the buffer configuration (number of entries x BVL) on the hit ratio. With the buffer size fixed at 8 Kbits, the possible configurations are 2x4K, 4x2K, 8x1K and 16x0.5K. The result is shown in Figure 8.
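Choosing among these configurations involves both the hit ratio and the cost of a miss, since a longer bit-vector means more data to load on each miss. A rough cost model makes this concrete (the cycle counts and bus width below are hypothetical assumptions, not figures from the paper):

```c
/* Sketch: estimated average access cost of an ALB configuration.
   Hit time and transfer timing are hypothetical parameters. */
double avg_access_cycles(double hit_ratio, int bvl_bits,
                         int bus_bits, double cycles_per_transfer) {
    double hit_time = 1.0;  /* assume a hit costs one cycle */
    /* a miss must move one whole bit-vector across the bus */
    double miss_penalty = (double)bvl_bits / bus_bits * cycles_per_transfer;
    return hit_time + (1.0 - hit_ratio) * miss_penalty;
}
```

For example, a 2x4K configuration with a 64-bit bus pays 64 transfers per miss, so its higher hit ratio must offset a much larger per-miss cost than 16x0.5K pays.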

Figure 8. Hit ratios of the four 8-Kbit buffer configurations (2x4K, 4x2K, 8x1K and 16x0.5K) for GCC, XSPIM, XEARTH and XSNOW.

As the figure shows, the hit ratio decreases as the BVL decreases. This phenomenon is similar to caches (i.e. a larger cache line size may lead to a higher hit ratio). It is worth noting that the hit ratio varies more for gcc, which has a larger average object size; relatively large objects suffer a higher miss probability when the BVL is small. The configuration that allows the longest BVL with the fewest entries (2x4K) has the best hit ratio. On the other hand, the configuration with more entries and a shorter BVL (16x0.5K) has a smaller miss penalty (i.e. less data needs to be moved into the buffer on each miss). As in cache design, the trade-off between a lower miss penalty and a higher hit ratio must be made by the system architect.

5.5 Investigating memory overhead with different configurations

It is worth noting that changes in the BVL affect not only the hit ratio but also the watermark. This is because the proposed scheme does not allocate objects across bit-vector boundaries, which can lead to higher external fragmentation and therefore a higher watermark. Table 3 illustrates the effect of the BVL on memory overhead. Fortunately, the BVL does not change the memory overhead significantly.

Table 3. BVL versus memory overhead (benchmark: a single-entry buffer whose size equals the watermark)

Programs   Watermark   2x4K overhead(%)   4x2K overhead(%)   8x1K overhead(%)   16x0.5K overhead(%)
GCC        108,        -                  -                  -                  -
XSNOW      26,687      -                  -                  -                  -
XSPIM      12,473      -                  -                  -                  -
XEARTH     9,215       -                  -                  -                  -
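The boundary rule behind this fragmentation can be illustrated with a toy bit-map allocator (parameters shrunk for readability; this is a first-fit software sketch, not the paper's hardware implementation):

```c
/* Sketch: first-fit allocation on a bit-map where one bit represents one
   16-byte block. The free-run counter resets at each bit-vector boundary,
   so an object never spans two bit-vectors; this is the source of the
   extra external fragmentation (and higher watermark) discussed above. */
#define BLOCK_BYTES 16
#define BVL 32   /* bits per bit-vector, small for illustration */
#define NVEC 4   /* number of bit-vectors covering the heap */

static unsigned char bitmap[NVEC][BVL]; /* one byte per bit, for clarity */
static int watermark;                   /* highest allocated block index + 1 */

/* Allocate nbytes; returns the starting block index, or -1 if no fit. */
int bm_alloc(int nbytes) {
    int blocks = (nbytes + BLOCK_BYTES - 1) / BLOCK_BYTES;
    for (int v = 0; v < NVEC; v++) {
        int run = 0;                      /* run restarts at vector boundary */
        for (int b = 0; b < BVL; b++) {
            run = bitmap[v][b] ? 0 : run + 1;
            if (run == blocks) {
                int start = b - blocks + 1;
                for (int k = start; k <= b; k++) bitmap[v][k] = 1;
                if (v * BVL + b + 1 > watermark) watermark = v * BVL + b + 1;
                return v * BVL + start;
            }
        }
    }
    return -1;
}

void bm_free(int block, int nbytes) {
    int blocks = (nbytes + BLOCK_BYTES - 1) / BLOCK_BYTES;
    for (int k = 0; k < blocks; k++)
        bitmap[(block + k) / BVL][(block + k) % BVL] = 0;
}
```

If a request cannot fit in the blocks remaining before a vector boundary, it is placed in the next vector, pushing the watermark higher even though free blocks remain behind it.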

Another factor that may change the memory overhead is the buffer size. A different buffer size may yield different allocation addresses and therefore a different watermark. Table 4 shows the simulation results for two buffer sizes: 2x0.5K and 32x0.5K.

Table 4. Buffer size versus memory overhead (benchmark: a single-entry buffer whose size equals the watermark)

Programs   Benchmark watermark   2x0.5K watermark   2x0.5K overhead(%)   32x0.5K watermark   32x0.5K overhead(%)
GCC        108,                  -                  -                    -                   -
XSNOW      26,687                26,                -                    26,687              0
XSPIM      12,473                13,                -                    12,473              0
XEARTH     9,215                 9,                 -                    9,215               0

With the larger buffer size, the overhead is less than 0.5%. Even with the smallest buffer size (2x0.5K), the memory overhead is less than 10%. The benchmark used in the comparison is a buffer with a single entry whose size equals the watermark; this excludes any external fragmentation caused by the BVL.

5.6 Summary

We have presented a performance study of four programs with distinct allocation patterns. As expected, a larger buffer leads to a higher hit ratio and lower memory overhead; this observation is confirmed by the simulations with the larger buffer sizes (Table 4 and Figure 2). Even with the smallest buffer, 2x0.5K (only 128 bytes), the hit ratios range from 78% to 99% depending on the application, and the memory overhead is still within 10%. It is worth noting that gcc yields the worst performance because of the programmer's reluctance to call malloc frequently: without its own free list, which leads to many 4-Kbyte requests, gcc would perform as well as the other applications. With the proposed hardware malloc and free functions, programmers should not need to maintain their own free lists; in that situation, the ALB would perform very well even with the smallest buffer size.
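The private free-list scheme attributed to gcc can be sketched as follows (a simplified, hypothetical illustration of requesting 4-Kbyte chunks and carving objects from them, not gcc's actual allocator code):

```c
/* Sketch: a chunk allocator that calls malloc once per 4-Kbyte chunk
   rather than once per object. Objects are never individually freed,
   which matches the many 4-Kbyte requests seen in the gcc trace. */
#include <stdlib.h>

#define CHUNK_SIZE 4096

static char *chunk;
static size_t chunk_used = CHUNK_SIZE;  /* force a chunk on first request */

void *chunk_alloc(size_t n) {
    n = (n + 7) & ~(size_t)7;               /* 8-byte alignment */
    if (n > CHUNK_SIZE) return malloc(n);   /* large objects bypass chunks */
    if (chunk_used + n > CHUNK_SIZE) {      /* current chunk exhausted */
        chunk = malloc(CHUNK_SIZE);         /* the only malloc call */
        chunk_used = 0;
    }
    void *p = chunk + chunk_used;
    chunk_used += n;
    return p;
}
```

Under this scheme the allocator's trace consists almost entirely of page-sized malloc calls, which is exactly the pattern that lowers gcc's ALB hit ratio.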

6. Conclusion

Recent advances in software engineering, such as graphical user interfaces and object-oriented programming, have caused applications to become more memory intensive. To facilitate such data-intensive computing, this paper has presented a novel design for a dynamic memory management unit. Time-consuming operations such as allocation and deallocation, traditionally performed in software, can now be processed by hardware components in the DMMU. By caching a small part of the bit-map of the current process, the DMMU can operate efficiently in multiprocessor environments. The detailed design and simulation of the proposed DMMU have been presented in this paper. To evaluate the performance of the proposed scheme, four commonly used applications were traced; the average object sizes of these traces range from 94 to 1581 bytes. Even with the smallest buffer, 2x0.5 Kbits (only 128 bytes), the hit ratios range from 78% to 99% while the memory overhead is within 10%. With the 8x1-Kbit buffer, the hit ratios range from 95% to 99%; at the same time, the memory overhead is down to 0%.

7. References

[1] M. Chang, C. D. Lo, and W. Srisa-an, "A Hardware Implementation for Malloc and Free Functions," submitted to IEEE International Conference on Computer Design, Austin, Texas, USA, October.
[2] M. Chang and E. F. Gehringer, "A High-Performance Memory Allocator for Object-Oriented Systems," IEEE Transactions on Computers, March.
[3] C. H. Daugherty and J. M. Chang, "Common List Method: A Simple, Efficient Allocator Implementation," Proceedings of the Sixth Annual High-Performance Computing Symposium, Boston, Massachusetts, Apr. 5-9.
[4] J. M. Chang, W. H. Lee and Y. Hasan, "Measuring Dynamic Memory Invocations in Object-Oriented Programs," Proceedings of the 18th IEEE International Performance Conference on Computers and Communications, Phoenix, Arizona, Feb.
[5] A. Kim and J. M. Chang, "Designing a Java Microprocessor Core using FPGA Technology," Proceedings of the 7th IEEE International ASIC Conference, Rochester, New York, Sep.
[6] E. F. Gehringer and J. M. Chang, "Hardware-Assisted Memory Management," Proc. OOPSLA '93 Workshop on Memory Management, Sep.
[7] J. M. Chang and E. F. Gehringer, "Evaluation of an Object-Caching Coprocessor Design for Object-Oriented Systems," Proceedings of the IEEE International Conference on Computer Design, Oct. 3-6, 1993.
[8] J. M. Chang and E. F. Gehringer, "Object Caching for Performance in Object-Oriented Systems," Proceedings of the IEEE International Conference on Computer Design, Oct. 1991.
[9] R. Jones and R. Lins, Garbage Collection: Algorithms for Automatic Dynamic Memory Management, John Wiley and Sons, 1996, pp. 20-28, 87-95, 296.
[10] A. Kaufman, "Tailored-list and recombination-delaying buddy systems," ACM Transactions on Programming Languages and Systems, Vol. 6, No. 1, Jan. 1984.

[11] Microsoft Visual C++ Programmer's Reference Manual, Microsoft Press.
[12] Motorola MPC750 Technical Summary.
[13] D. Patterson and J. Hennessy, Computer Organization and Design, 2nd edition, Morgan Kaufmann, 1998.
[14] D. Patterson, T. Anderson, N. Cardwell, R. Fromm, K. Keeton, C. Kozyrakis, R. Thomas, and K. Yelick, "A Case for Intelligent RAM: IRAM," IEEE Micro, April.
[15] A. Silberschatz and P. B. Galvin, Operating System Concepts, 5th edition, Prentice-Hall, 1998, p. 397.
[16] A. S. Tanenbaum and A. S. Woodhull, Operating Systems: Design and Implementation, 2nd edition, Prentice-Hall, 1997.
[17] P. Wilson, M. Johnstone, M. Neely and D. Boles, "Dynamic Storage Allocation: A Survey and Critical Review," Proc. Int'l Workshop on Memory Management, Scotland, UK, Sept.
[18] B. Zorn, "CustoMalloc: Efficient Synthesized Memory Allocators," Technical Report CU-CS, Computer Science Department, University of Colorado, July.
[19] A. Yu, Creating the Digital Future: The Secrets of Consistent Innovation at Intel, The Free Press, Aug.
[20] D. M. Ungar and F. Jackson, "An adaptive tenuring policy for generation scavengers," ACM Transactions on Programming Languages and Systems, 14(1):1-17.


More information

Process. One or more threads of execution Resources required for execution

Process. One or more threads of execution Resources required for execution Memory Management 1 Learning Outcomes Appreciate the need for memory management in operating systems, understand the limits of fixed memory allocation schemes. Understand fragmentation in dynamic memory

More information

Chapter 8: Memory Management. Operating System Concepts with Java 8 th Edition

Chapter 8: Memory Management. Operating System Concepts with Java 8 th Edition Chapter 8: Memory Management 8.1 Silberschatz, Galvin and Gagne 2009 Background Program must be brought (from disk) into memory and placed within a process for it to be run Main memory and registers are

More information

Logical versus Physical Address Space

Logical versus Physical Address Space CHAPTER 8: MEMORY MANAGEMENT Background Logical versus Physical Address Space Swapping Contiguous Allocation Paging Segmentation Segmentation with Paging Operating System Concepts, Addison-Wesley 1994

More information

CHAPTER 6 Memory. CMPS375 Class Notes (Chap06) Page 1 / 20 Dr. Kuo-pao Yang

CHAPTER 6 Memory. CMPS375 Class Notes (Chap06) Page 1 / 20 Dr. Kuo-pao Yang CHAPTER 6 Memory 6.1 Memory 341 6.2 Types of Memory 341 6.3 The Memory Hierarchy 343 6.3.1 Locality of Reference 346 6.4 Cache Memory 347 6.4.1 Cache Mapping Schemes 349 6.4.2 Replacement Policies 365

More information

Engine Support System. asyrani.com

Engine Support System. asyrani.com Engine Support System asyrani.com A game engine is a complex piece of software consisting of many interacting subsystems. When the engine first starts up, each subsystem must be configured and initialized

More information

Memory: Overview. CS439: Principles of Computer Systems February 26, 2018

Memory: Overview. CS439: Principles of Computer Systems February 26, 2018 Memory: Overview CS439: Principles of Computer Systems February 26, 2018 Where We Are In the Course Just finished: Processes & Threads CPU Scheduling Synchronization Next: Memory Management Virtual Memory

More information

In multiprogramming systems, processes share a common store. Processes need space for:

In multiprogramming systems, processes share a common store. Processes need space for: Memory Management In multiprogramming systems, processes share a common store. Processes need space for: code (instructions) static data (compiler initialized variables, strings, etc.) global data (global

More information

A Comprehensive Complexity Analysis of User-level Memory Allocator Algorithms

A Comprehensive Complexity Analysis of User-level Memory Allocator Algorithms 2012 Brazilian Symposium on Computing System Engineering A Comprehensive Complexity Analysis of User-level Memory Allocator Algorithms Taís Borges Ferreira, Márcia Aparecida Fernandes, Rivalino Matias

More information

CS370 Operating Systems

CS370 Operating Systems CS370 Operating Systems Colorado State University Yashwant K Malaiya Fall 2017 Lecture 20 Main Memory Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 Pages Pages and frames Page

More information

Virtual Memory. Chapter 8

Virtual Memory. Chapter 8 Chapter 8 Virtual Memory What are common with paging and segmentation are that all memory addresses within a process are logical ones that can be dynamically translated into physical addresses at run time.

More information

COMPUTER SCIENCE 4500 OPERATING SYSTEMS

COMPUTER SCIENCE 4500 OPERATING SYSTEMS Last update: 3/28/2017 COMPUTER SCIENCE 4500 OPERATING SYSTEMS 2017 Stanley Wileman Module 9: Memory Management Part 1 In This Module 2! Memory management functions! Types of memory and typical uses! Simple

More information

Operating Systems. 09. Memory Management Part 1. Paul Krzyzanowski. Rutgers University. Spring 2015

Operating Systems. 09. Memory Management Part 1. Paul Krzyzanowski. Rutgers University. Spring 2015 Operating Systems 09. Memory Management Part 1 Paul Krzyzanowski Rutgers University Spring 2015 March 9, 2015 2014-2015 Paul Krzyzanowski 1 CPU Access to Memory The CPU reads instructions and reads/write

More information

Memory Management Topics. CS 537 Lecture 11 Memory. Virtualizing Resources

Memory Management Topics. CS 537 Lecture 11 Memory. Virtualizing Resources Memory Management Topics CS 537 Lecture Memory Michael Swift Goals of memory management convenient abstraction for programming isolation between processes allocate scarce memory resources between competing

More information

Chapter 8 Virtual Memory

Chapter 8 Virtual Memory Operating Systems: Internals and Design Principles Chapter 8 Virtual Memory Seventh Edition William Stallings Modified by Rana Forsati for CSE 410 Outline Principle of locality Paging - Effect of page

More information

Physical memory vs. Logical memory Process address space Addresses assignment to processes Operating system tasks Hardware support CONCEPTS 3.

Physical memory vs. Logical memory Process address space Addresses assignment to processes Operating system tasks Hardware support CONCEPTS 3. T3-Memory Index Memory management concepts Basic Services Program loading in memory Dynamic memory HW support To memory assignment To address translation Services to optimize physical memory usage COW

More information

Memory management: outline

Memory management: outline Memory management: outline Concepts Swapping Paging o Multi-level paging o TLB & inverted page tables 1 Memory size/requirements are growing 1951: the UNIVAC computer: 1000 72-bit words! 1971: the Cray

More information

UNIT III MEMORY MANAGEMENT

UNIT III MEMORY MANAGEMENT UNIT III MEMORY MANAGEMENT TOPICS TO BE COVERED 3.1 Memory management 3.2 Contiguous allocation i Partitioned memory allocation ii Fixed & variable partitioning iii Swapping iv Relocation v Protection

More information

OPERATING SYSTEMS. After A.S.Tanenbaum, Modern Operating Systems 3rd edition Uses content with permission from Assoc. Prof. Florin Fortis, PhD

OPERATING SYSTEMS. After A.S.Tanenbaum, Modern Operating Systems 3rd edition Uses content with permission from Assoc. Prof. Florin Fortis, PhD OPERATING SYSTEMS #8 After A.S.Tanenbaum, Modern Operating Systems 3rd edition Uses content with permission from Assoc. Prof. Florin Fortis, PhD MEMORY MANAGEMENT MEMORY MANAGEMENT The memory is one of

More information

Memory management: outline

Memory management: outline Memory management: outline Concepts Swapping Paging o Multi-level paging o TLB & inverted page tables 1 Memory size/requirements are growing 1951: the UNIVAC computer: 1000 72-bit words! 1971: the Cray

More information

UNIT 1 MEMORY MANAGEMENT

UNIT 1 MEMORY MANAGEMENT UNIT 1 MEMORY MANAGEMENT Structure Page Nos. 1.0 Introduction 5 1.1 Objectives 6 1.2 Overlays and Swapping 6 1.3 Logical and Physical Address Space 8 1.4 Single Process Monitor 9 1.5 Contiguous Allocation

More information

Memory management. Requirements. Relocation: program loading. Terms. Relocation. Protection. Sharing. Logical organization. Physical organization

Memory management. Requirements. Relocation: program loading. Terms. Relocation. Protection. Sharing. Logical organization. Physical organization Requirements Relocation Memory management ability to change process image position Protection ability to avoid unwanted memory accesses Sharing ability to share memory portions among processes Logical

More information

CPE300: Digital System Architecture and Design

CPE300: Digital System Architecture and Design CPE300: Digital System Architecture and Design Fall 2011 MW 17:30-18:45 CBC C316 Virtual Memory 11282011 http://www.egr.unlv.edu/~b1morris/cpe300/ 2 Outline Review Cache Virtual Memory Projects 3 Memory

More information

8.1 Background. Part Four - Memory Management. Chapter 8: Memory-Management Management Strategies. Chapter 8: Memory Management

8.1 Background. Part Four - Memory Management. Chapter 8: Memory-Management Management Strategies. Chapter 8: Memory Management Part Four - Memory Management 8.1 Background Chapter 8: Memory-Management Management Strategies Program must be brought into memory and placed within a process for it to be run Input queue collection of

More information

Main Memory. CISC3595, Spring 2015 X. Zhang Fordham University

Main Memory. CISC3595, Spring 2015 X. Zhang Fordham University Main Memory CISC3595, Spring 2015 X. Zhang Fordham University 1 Memory Management! Background!! Contiguous Memory Allocation!! Paging!! Structure of the Page Table!! Segmentation!! Example: The Intel Pentium

More information

Main Memory. Electrical and Computer Engineering Stephen Kim ECE/IUPUI RTOS & APPS 1

Main Memory. Electrical and Computer Engineering Stephen Kim ECE/IUPUI RTOS & APPS 1 Main Memory Electrical and Computer Engineering Stephen Kim (dskim@iupui.edu) ECE/IUPUI RTOS & APPS 1 Main Memory Background Swapping Contiguous allocation Paging Segmentation Segmentation with paging

More information

Goals of Memory Management

Goals of Memory Management Memory Management Goals of Memory Management Allocate available memory efficiently to multiple processes Main functions Allocate memory to processes when needed Keep track of what memory is used and what

More information

Memory Allocation with Lazy Fits

Memory Allocation with Lazy Fits Memory Allocation with Lazy Fits Yoo C. Chung Soo-Mook Moon School of Electrical Engineering Seoul National University Kwanak PO Box 34, Seoul 151-742, Korea {chungyc,smoon}@altair.snu.ac.kr ABSTRACT Dynamic

More information

Module 8: Memory Management

Module 8: Memory Management Module 8: Memory Management Background Logical versus Physical Address Space Swapping Contiguous Allocation Paging Segmentation Segmentation with Paging 8.1 Background Program must be brought into memory

More information

Chapter 9: Memory Management. Background

Chapter 9: Memory Management. Background 1 Chapter 9: Memory Management Background Swapping Contiguous Allocation Paging Segmentation Segmentation with Paging 9.1 Background Program must be brought into memory and placed within a process for

More information

Operating System Concepts

Operating System Concepts Chapter 9: Virtual-Memory Management 9.1 Silberschatz, Galvin and Gagne 2005 Chapter 9: Virtual Memory Background Demand Paging Copy-on-Write Page Replacement Allocation of Frames Thrashing Memory-Mapped

More information

Module 9: Memory Management. Background. Binding of Instructions and Data to Memory

Module 9: Memory Management. Background. Binding of Instructions and Data to Memory Module 9: Memory Management Background Logical versus Physical Address Space Swapping Contiguous Allocation Paging Segmentation Segmentation with Paging 9.1 Background Program must be brought into memory

More information

Chapter 8: Memory Management

Chapter 8: Memory Management Chapter 8: Memory Management Chapter 8: Memory Management Background Swapping Contiguous Allocation Paging Segmentation Segmentation with Paging 8.2 Background Program must be brought into memory and placed

More information

I.-C. Lin, Assistant Professor. Textbook: Operating System Principles 7ed CHAPTER 8: MEMORY

I.-C. Lin, Assistant Professor. Textbook: Operating System Principles 7ed CHAPTER 8: MEMORY I.-C. Lin, Assistant Professor. Textbook: Operating System Principles 7ed CHAPTER 8: MEMORY MANAGEMENT Chapter 8: Memory Management Background Swapping Contiguous Memory Allocation Paging Structure of

More information