An Evaluation of Using Deduplication in Swappers


Weiyan Wang, Chen Zeng
Computer Sciences Department
University of Wisconsin, Madison

Abstract

Data deduplication is a specialized data compression technique for eliminating coarse-grained redundant data, typically to improve storage utilization. In this paper, we explore another use of deduplication: inside the swapper. Before swapping out any page from memory to the swap area on disk, deduplication checks whether a page with the same contents has already been written to the swap area. If so, we can avoid one I/O. We implement this idea in the Linux kernel. Our experimental results indicate that using deduplication can reduce the overhead of the swapper by orders of magnitude when many duplicate pages are present, compared with not using deduplication. However, we also observe that deduplication incurs an overhead of its own when few duplicate pages are present.

1 Introduction

Data deduplication is a specialized data compression technique for eliminating coarse-grained redundant data, typically to improve storage utilization. In the deduplication process, duplicate data is deleted, leaving only one copy of the data to be stored, along with references to that unique copy. Deduplication reduces the required storage capacity, since only the unique data is stored, and can save up to 90% of the storage space [13]. However, most previous work has focused only on increasing the compression ratio and the throughput of the disk. One missing piece is: is the deduplication technique useful in memory as well as on disk? To make this concrete, suppose we need to run a program whose working set is larger than physical memory. Can we utilize deduplication to compress pages in memory, and thus make the working set fit? One example of an application that requires a lot of memory is a sparse matrix, a matrix in which the majority of the elements are zero. If we can compress such a matrix using deduplication, it may fit into memory, and many I/Os can be avoided when we operate on it.

In this paper, we explore one way to utilize deduplication in memory: we attack the problem of integrating deduplication with the swapper. The swapper is a process that periodically writes pages out to the swap area and reads other pages back in. The swapper also manages swap areas on disk and keeps track of mappings between per-process virtual addresses and disk block addresses. Our idea, in general, is to speed up swapping by first checking whether a particular page's contents are already in the swap region; if so, the swap-out can be avoided, speeding up the program. When the program needs to swap out a page that already has a duplicate in the swap area, there is no need to write that page out, so the program barely feels the overhead of the swap-out. Another way to interpret our approach is that it transparently provides additional physical memory to the program: with deduplication, the working set of the program appears to fit in memory. Our contribution is twofold.
First, we implement the idea of integrating deduplication and the swapper in the Linux 2.6 kernel. Second, we empirically evaluate the performance of the deduplication swapper. We test our swapper with a program that sequentially scans an array. Our experimental results indicate that using deduplication in the swapper reduces the access time by orders of magnitude when 100% of the pages in the array are duplicates. Of course, using deduplication in the swapper also incurs an overhead: we find that when there are no duplicate pages, our swapper is 7.4% slower than the original swapper. We have also run a benchmark program, gmake, which compiles the Linux kernel, and find that the performance of our swapper is comparable to the original Linux swapper.

The rest of this paper is organized as follows. Section 2 presents our design for integrating deduplication with the swapper. Section 3 discusses the details of our implementation and several optimizations we made to improve the efficiency of the swapper. Section 4 empirically evaluates the performance of using deduplication in the swapper, and Section 5 discusses related work. We conclude in Section 6.

2 Design

A high-level architecture of our design consists of the following components. When a page is going to be swapped in or out, a checksum engine computes a cryptographic hash digest of its content, as described in Section 2.2. A dedup cache keeps track of the swap entries of pages on disk, as described in Section 2.3. A swap cache temporarily keeps swapped-out pages in memory to avoid a disk consistency problem, as explained in Section 2.4. Our system uses these components to provide deduplication in the swapper. The following section explains the deduplication procedure during swap-out.

2.1 Procedure

Figure 1: Deduplication procedure when swapping out a page
Figure 2: Deduplication procedure when swapping in a page

Figure 1 shows the deduplication procedure for swap-out. When the system decides to swap out a page, the checksum engine is called to calculate a checksum of the page's content and return it to the swapper. The swapper then uses the checksum as a key to search the dedup cache, checking whether a page with the same content has already been swapped out to disk. If so, the dedup cache returns the swap entry to the swapper and increases the counter of that swap entry to indicate that one more page is deduplicated. The mapping between the page and the swap entry is then added to the swap cache. After that, the swapper replaces the page table entries of all processes mapped to the page with its swap entry; this step is called unmapping. When unmapping is finished, we remove the mapping of the page from the swap cache. Because the page's content is already on disk, we do not need to write the page to disk again, and we avoid one disk access. However, if the lookup in the dedup cache fails, the swapper must search the swap area to find a free swap entry for the page. Once the page's swap entry is determined, as in the case above, the swapper updates the swap cache and unmaps the page from all related processes. Then a disk write is issued to write the page to the swap area. When the pageout succeeds, the page's swap entry is added to the dedup cache for further deduplication, the counter of the swap entry is initialized to 1, and finally the mapping of the page in the swap cache is removed.

Figure 2 demonstrates the update of the dedup cache during swap-in. When the swapper wants to swap in a page with a given swap entry, it first searches the swap cache to check whether that page is already in memory. If not, the swapper must issue a disk read to load the page from disk into memory. It then adds the mapping between the swap entry and the page to the swap cache to avoid swapping it in from disk again. When the page is loaded, the checksum engine computes its checksum, and the dedup cache is indexed with that checksum. If the swap entry found in the dedup cache is the same as the one being swapped in, we decrease its counter. If the counter drops to zero, all duplicate pages have been swapped in, and we can remove the entry from the dedup cache. The following subsections explain the details of each component.
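To make the swap-out path concrete, the following sketch restates its decision in C. It is a minimal illustration, not our kernel patch: every type and helper name here (compute_checksum, dedup_cache_lookup, and so on) is a hypothetical stand-in for the corresponding kernel operation described above.

    /* Minimal, self-contained sketch of the deduplicated swap-out decision.
     * All types and helpers are illustrative stand-ins, not kernel APIs. */
    #include <stdint.h>
    #include <stddef.h>

    typedef uint64_t swp_entry_t;                 /* a slot in the swap area */
    struct page;                                  /* opaque page frame */

    struct dedup_entry {
        swp_entry_t base;                         /* swap slot holding the contents */
        int count;                                /* duplicates currently swapped out */
    };

    /* Hypothetical helpers for the operations described above. */
    uint32_t compute_checksum(struct page *p);    /* truncated SHA-1 of contents */
    struct dedup_entry *dedup_cache_lookup(uint32_t key);
    void dedup_cache_insert(uint32_t key, swp_entry_t e);
    swp_entry_t allocate_swap_entry(void);
    void swap_cache_add(swp_entry_t e, struct page *p);
    void swap_cache_remove(swp_entry_t e, struct page *p);
    void unmap_page(struct page *p, swp_entry_t e);
    void write_page_to_swap(struct page *p, swp_entry_t e);

    /* Returns the number of disk writes issued: 0 when deduplicated. */
    int dedup_swap_out(struct page *page)
    {
        uint32_t key = compute_checksum(page);
        struct dedup_entry *de = dedup_cache_lookup(key);

        if (de) {
            de->count++;                          /* one more duplicate on this slot */
            swap_cache_add(de->base, page);       /* keep page findable during unmap */
            unmap_page(page, de->base);           /* PTEs now point at the swap slot */
            swap_cache_remove(de->base, page);
            return 0;                             /* contents already on disk: no I/O */
        }

        swp_entry_t entry = allocate_swap_entry();
        swap_cache_add(entry, page);
        unmap_page(page, entry);
        write_page_to_swap(page, entry);          /* the one write we could not avoid */
        dedup_cache_insert(key, entry);           /* counter starts at 1 */
        swap_cache_remove(entry, page);
        return 1;
    }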

2.2 Checksum Engine

In the checksum engine, we compute the SHA-1 hash of a page's content. According to [8], a 160-bit SHA-1 hash has a collision probability of one in 2^80. This probability is small enough that we can assume two pages are identical if they have the same checksum. Therefore, unlike [12], we do not keep the whole page associated with the checksum in the dedup cache to verify the page's content. Another reason for this trade-off is that for each swapped-out page we would need one page frame to keep a copy of it in the dedup cache, which would increase memory pressure. Because of the implementation of the dedup cache, which we address below, the dedup cache can only index a 32-bit key. Therefore, instead of using all bits of the SHA-1 hash, we use its first 32 bits as the checksum for each page. As a result, the collision probability grows to one in 2^16. We do observe that two different pages may be mapped to the same checksum, and we leave the problem of improving the consistency of the swapper to future work.

2.3 Dedup Cache

Figure 3: Structure of the dedup entry

For each swap entry, we create a new structure called a dedup entry to keep track of its status in the dedup cache. Figure 3 shows its contents. The variable base stores the swap entry itself. The variable count is the counter of the swap entry, maintaining the number of duplicate pages for this entry; count is increased (decreased) by 1 when a page is swapped out (swapped in). When count drops to zero, all duplicate pages have been swapped in, and we can remove the dedup entry from the dedup cache and delete it. The variable ref is similar to a page's reference count: it maintains the number of processes currently using the dedup entry, and we can only delete the dedup entry when ref is equal to one. ref is necessary in the following scenario: swap-out process A is using the dedup entry to update page table entries while, at the same time, swap-in process B finds that count has dropped to zero and deletes the dedup entry; an error occurs when process A accesses it later. With ref, while process A is using the dedup entry its ref is incremented by 1, so process B cannot delete the dedup entry, preserving it until process A finishes. The spin lock lock supports atomic updates of ref and count.

The dedup cache stores the mappings from page checksums to dedup entries. For each swapped-out page, the dedup cache is used to check whether any page with the same content has been swapped out to disk. If so, the dedup cache returns the dedup entry for that page as if the page itself had been swapped out, and one disk write is avoided. We use a radix tree to implement the dedup cache for two reasons. First, the time complexity of lookup and update in a radix tree is O(|key|), where |key| is the length of the key, which is as efficient as a hash table. Second, the radix tree is a well-implemented data structure in the Linux kernel, so we can use it directly instead of writing a bug-free hash table ourselves. The only disadvantage of the built-in radix tree is that it can only handle a 32-bit key, which forces us to use the first 32 bits of the SHA-1 hash as the checksum and may hurt the consistency of the system.
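A sketch of the dedup entry and the truncated key, written with user-space stand-ins (a pthread mutex in place of the kernel spinlock); the field names follow the description above, but the exact layout in our patch may differ.

    /* Sketch of the dedup entry described above (illustrative, not our exact patch). */
    #include <stdint.h>
    #include <pthread.h>

    struct dedup_entry {
        uint64_t        base;   /* the swap entry whose slot holds the contents */
        int             count;  /* number of duplicate pages swapped out to base */
        int             ref;    /* processes currently using this entry */
        pthread_mutex_t lock;   /* stands in for the spinlock guarding ref/count */
    };

    /* The radix tree can only index a 32-bit key, so we keep the first
     * 32 bits of the 160-bit SHA-1 digest as the per-page checksum. */
    static inline uint32_t page_key(const unsigned char sha1[20])
    {
        return ((uint32_t)sha1[0] << 24) | ((uint32_t)sha1[1] << 16) |
               ((uint32_t)sha1[2] << 8)  |  (uint32_t)sha1[3];
    }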
2.4 Swap Cache

When swapping out pages to the swap area, Linux avoids writing pages if it does not have to. There are times when a page is both in a swap area and in physical memory. This happens when a page that was swapped out of memory is brought back in when a process accesses it again. So long as the page in memory is not written to, the copy in the swap area remains valid, and Linux uses the swap cache to track such pages. The swap cache is a list of page table entries, one per physical page in the system; each is a page table entry for a swapped-out page and describes which swap file the page is held in, together with its location in the swap area. If a swap cache entry is non-zero, it represents a page held in a swap file that has not been modified. If the page is subsequently modified (written to), its entry is removed from the swap cache. When Linux needs to swap a physical page out to the swap area, it consults the swap cache, and if there is a valid entry for that page, it does not need to write the page out, because the page in memory has not been modified since it was last read from the swap area. The entries in the swap cache are page table entries for swapped-out pages; they are marked as invalid but contain the information that allows Linux to find the right slot within the swap area. The swap cache is implemented as a radix tree, where the key is the index of the swap entry in the swap area.

However, using deduplication in the swapper poses a possible inconsistency problem in the swap cache. Consider the following steps:

1. A process T1 needs to swap out page P1, and installs the mapping E -> P1 into the swap cache.
2. Another process T2 needs to swap out another page P2, whose content is identical to P1. We do not need to swap out P2, because the content of P2 is already in swap entry E, but we cannot install the mapping E -> P2.
3. Page P1 is swapped out to the swap area, and the mapping E -> P1 is deleted.
4. When T2 needs to access P2 again, T2 cannot find the swap entry of P2, which causes an inconsistency.

To remedy this problem, we change the structure of the swap cache: instead of storing the mapping between a swap entry and a single page, we store the mapping between a swap entry and a list of pages whose contents are identical. The rationale is that deduplication maps multiple pages with the same content onto one slot; different pages may have different physical addresses but share the same slot in the swap area. After this modification, the inconsistency no longer arises:

1. A process T1 needs to swap out page P1, and installs the mapping E -> P1 into the swap cache.
2. Another process T2 needs to swap out another page P2, whose content is identical to P1. We add P2 to the list, and the mapping is now E -> <P1, P2>.
3. Page P1 is swapped out to the swap area and removed from the list; the mapping becomes E -> <P2>.
4. When T2 needs to access P2 again, T2 can find the swap entry of P2.

We use the kernel's list implementation to change the structure of the swap cache; a sketch of the modified mapping follows.
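A minimal sketch of that modified mapping, assuming kernel-style intrusive lists; the type names (page_link, swap_cache_val) are ours, not the kernel's.

    /* Sketch: a swap entry now maps to a cyclic list of identical pages
     * rather than a single page. Illustrative stand-ins for list_head. */
    #include <stddef.h>

    struct page;                              /* opaque page frame */

    struct list_head { struct list_head *next, *prev; };

    struct page_link {
        struct list_head link;                /* chained into the entry's list */
        struct page *page;                    /* one in-memory duplicate */
    };

    struct swap_cache_val {
        struct list_head pages;               /* header; radix tree slot points here */
    };

    static void list_init(struct list_head *h) { h->next = h->prev = h; }

    static void list_add(struct list_head *n, struct list_head *h)
    {
        n->next = h->next; n->prev = h;
        h->next->prev = n; h->next = n;
    }

    /* Installing E -> P2 when E -> <P1> already exists is now just a list
     * insertion; the radix tree entry for E is untouched. */
    static void swap_cache_add_page(struct swap_cache_val *v, struct page_link *pl)
    {
        list_add(&pl->link, &v->pages);
    }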

3 Implementation

In this section we discuss several implementation issues and optimizations of our design.

3.1 Using the Built-in Swap Counter

During the implementation, we found that the swap counter built into the kernel is a better choice than the count variable in the dedup entry for reflecting the usage of a swap entry. In the kernel, every swap entry has its own swap counter that maintains the number of references to it in memory. Like the count variable, the swap counter is updated when the corresponding page is swapped out or in. Besides swapping, the swap counter is also updated when the swap entry is inserted into the swap cache, or when a page table entry containing the swap entry is copied from one process to another; the latter happens when the fork system call creates a child process. The count variable cannot be updated in that situation because only a page table entry is involved: without a page, we cannot locate the dedup entry in the dedup cache to update count. As a result, when count drops to zero, the corresponding swap entry may still be referenced by some processes, yet because it has been removed from the dedup cache, it can no longer be used for further deduplication to avoid more pageouts. In contrast, when the swap counter drops to one, the corresponding swap entry is referenced only by the swap cache, and we can safely remove it from the dedup cache.

To use the swap counter in deduplication, we make one small change: besides the original updates, the swap counter is also incremented whenever the swap entry is inserted into the dedup cache. This ensures the swap entry is not released by the swapper while it sits in the dedup cache. Therefore, when a page is swapped in and the swap counter drops to two, we know the swap entry is referenced only by the swap cache and the dedup cache, and we can safely remove it from both, as the sketch at the end of this subsection shows.

However, we still keep ref in the dedup entry. Consider the scenario discussed in Section 2.3 again: if process A is still using the dedup entry while process B notices that all pages for the swap entry have been swapped in, and B removes the entry from both the dedup cache and the swap cache, then the swap entry will be released and may be reused by another swapper process. An inconsistency occurs because process A still believes the swap entry holds the same content as the page being swapped out.

3.2 Changes to the Swap Cache

Because the swap cache is implemented as a radix tree, we optimize the structure of the page list by giving it a permanent header, which reduces the overhead of maintaining the structure in the radix tree. Concretely, suppose a swap entry E is mapped to a list <P1, P2>, where P1 is the first element. When we need to delete the mapping E -> P1, we have to update the mapping to E -> <P2>. Unfortunately, the radix tree implementation does not allow us to perform that update in place: it takes two operations, deleting the mapping E -> <P1, P2> and inserting the mapping E -> <P2>, which incurs two additional calls. Our optimization, which always keeps a header at the front of the list, eliminates that overhead: the entry in the swap cache becomes E -> <head, P1, P2>, and after deleting P1 the mapping is E -> <head, P2>, where E still points to the head. The structure stays consistent, and we avoid deleting the entry and reinserting a new one.

Another problem we found is that the list cannot grow without bound, because the cost of searching the list increases significantly. Moreover, when the list grows too long, the swapper simply stops; unfortunately, we have not found a good explanation for this phenomenon. We address the problem by limiting the number of pages in a list: if that number exceeds a threshold, we explicitly shrink the list.
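The resulting removal rule can be stated compactly; the sketch below uses hypothetical accessors for the kernel's per-slot counter and the two caches.

    #include <stdint.h>

    /* Hypothetical accessors standing in for kernel internals. */
    int  swap_count(uint64_t entry);        /* per-slot reference count */
    void dedup_cache_remove(uint64_t entry);
    void swap_cache_delete(uint64_t entry);

    /* After a swap-in has decremented the counter: a count of exactly two
     * means only the swap cache and the dedup cache still hold references,
     * so no process uses the slot and both caches can drop it. */
    void maybe_release_slot(uint64_t entry)
    {
        if (swap_count(entry) == 2) {
            dedup_cache_remove(entry);
            swap_cache_delete(entry);
        }
    }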

5 ise head,p, where thee still points to the head. Hence, the structure is consistent, and we can avoid to delete the entry and reinsert a new entry. Another problem we find out is that the list can not increase infinitely. That is because the cost of search in the list increases significantly. Moreover, we find out that when the number of pages in that list is too long, the swapper just stops. Unfortunately, we can not find a good explanation for that phenomena. We solve that problem by limiting the number of pages in a list, and if that number exceeds a threshold, then we explicitly shrink that list.. Policies of Swap Cache Page List In the swap cache, we maintain a page list to store all pages refer to it for each swap entry. There are three operations related to the page list: insert a page during swapout, remove a page when swapout is finished, and return a page during swapin. Considering the page list is organized as a cyclic double linked list, we use the following policies to optimize those operations: To remove a page, search the list in the same direction with insertion. For example, if we add the page as the next page of the head each time, to remove a page, we will go through the list with next page pointer each time. This policy is based on one observation: most of time, the page added to the list is a duplicate page. It is likely to be removed from the list very soon after the insertion since it does not need to be swapped out to the disk. Therefore, start searching from the most recently added page is more likely to find the page to be removed. Return the oldest page in the list to the caller during swapin. The swapin process lookups the swap cache to find a usable page in the memory, it doesn t matter which page to be returned. However, when a page is returned to the caller, it will be kept in the page table of some process until the page is no long used, it can t be released after the swapout. If we return the newest page to the caller, since the list is growing, it s highly possible that we return different pages when lookup function is called at different time. As a result, these duplicate pages would be kept in the memory for a relatively long time and can not be released, which reduces the effect of swapper. In contrast, returning the oldest page enables other pages to be released after swapout as we expect. It s also memory efficient because a page is shared by processes as many as possible..4 Checksum Inside Dedup Entry Because using -bit checksum may have potential consistency problem, we try to solve this by including the whole SHA- checksum in the dedup entry. In this way, to retrieve the dedup entry for a page, we first use the first -bit of its SHA- hash to index the radix tree. If a nonnull dedup entry is returned, we then compare the whole checksum inside the dedup entry with the page s SHA- hash to do the further verification. Only when two SHA- hash are the same we return the dedup entry to the swapper. However, our experiments show that this small change hurts the performance of swapper especially when all pages are different. It s mainly because each dedup entry now costs much more space than before, which will increase memory pressure a lot if all dedup entries are kept in the dedup cache for a relatively long time. What s more, we need more time to compare the whole checksum. For this reason, we must make a trade-off between consistency at some point for the performance. For example, only the second -bit of SHA- hash is stored in the dedup entry. 
3.4 Checksum Inside the Dedup Entry

Because a 32-bit checksum has a potential consistency problem, we tried to solve it by including the whole SHA-1 checksum in the dedup entry. To retrieve the dedup entry for a page, we first use the first 32 bits of its SHA-1 hash to index the radix tree. If a non-null dedup entry is returned, we compare the whole checksum inside the dedup entry with the page's SHA-1 hash as further verification, and only when the two hashes are identical do we return the dedup entry to the swapper. However, our experiments show that this small change hurts the performance of the swapper, especially when all pages are different. The main reason is that each dedup entry now costs much more space than before, which increases memory pressure considerably when all dedup entries are kept in the dedup cache for a relatively long time; moreover, we need more time to compare the whole checksum. For this reason, we must trade some consistency for performance at some point, for example by storing only the second 32 bits of the SHA-1 hash in the dedup entry.

4 Experimental Evaluation

We implemented our algorithm in the Linux 2.6 kernel of Fedora, running on VMware Workstation 7. We set the memory of the virtual machine to 128 MB. We run several workloads to compare the pros and cons of our approach.

4.1 Sequentially Scanning an Array

The first workload sequentially accesses the elements of an array, each of which is the size of a page, 4 KB. We sequentially scan the array ten times and report the average access time. We vary the size of the array and the percentage of duplicate pages in the array to check the impact of those two parameters on the swappers. A user-space sketch of this workload follows.
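This sketch is an approximation of the workload: the array size and duplicate ratio are example parameters, not the exact values of our runs, and the real experiments rely on the kernel swapper rather than any explicit I/O.

    /* User-space sketch of the sequential-scan workload: an array of 4 KB
     * "pages" with a configurable fraction of duplicates, scanned ten times. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    #define PAGE_SIZE 4096

    int main(void)
    {
        size_t npages = 64 * 1024;            /* 256 MB; pick a size above RAM */
        double dup_ratio = 1.0;               /* fraction of duplicate pages */
        char *array = malloc(npages * PAGE_SIZE);
        if (!array)
            return 1;

        for (size_t i = 0; i < npages; i++) {
            int dup = rand() / (RAND_MAX + 1.0) < dup_ratio;
            memset(array + i * PAGE_SIZE, dup ? 0xAB : 0xCD, PAGE_SIZE);
            if (!dup)   /* stamp the page with its index to make it unique */
                memcpy(array + i * PAGE_SIZE, &i, sizeof i);
        }

        struct timespec t0, t1;
        volatile char sink = 0;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int pass = 0; pass < 10; pass++)         /* ten sequential scans */
            for (size_t i = 0; i < npages; i++)
                sink ^= array[i * PAGE_SIZE];         /* touch one byte per page */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        (void)sink;

        printf("average scan time: %.3f s\n",
               ((t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9) / 10.0);
        free(array);
        return 0;
    }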

4.1.1 Under Memory Contention

First, we set all of the pages in the array to the same contents. The experimental result is shown in Figure 4.

Figure 4: Access time when all of the pages are duplicates
Figure 5: Access time when 20% of the pages are duplicates
Figure 6: Access time when 40% of the pages are duplicates
Figure 7: Access time when none of the pages are duplicates

We observe that using deduplication in the swapper reduces the overhead of swapping by orders of magnitude. The access time increases significantly in the original Linux swapper once the array grows large enough that the operating system thrashes: the total size of the array no longer fits in memory, so memory contention is quite high. As a result, when all of the pages in memory are in use and the program needs to access another page, the original Linux swapper has to swap out one page to the swap area on disk. Our swapper, however, detects that the contents of those pages are identical, so we save that I/O. Using deduplication in the swapper therefore reduces the sequential access time by orders of magnitude when the array is large and many duplicate pages are present. We also present the results when 20% and 40% of the pages in the array are duplicates, in Figure 5 and Figure 6, respectively. We still observe that deduplication reduces the overhead of accessing the array by orders of magnitude when the array does not fit into memory.

However, is deduplication always beneficial? In our first experiment, because all of the pages are duplicates, deduplication in the swapper significantly reduces the access time of the sequential scan. So we also push to the other limit, where there are no duplicate pages in the array; Figure 7 shows the result. We find that deduplication then incurs a significant overhead: at the largest array size, the access time of scanning with deduplication is 7.4% longer than without deduplication. Using deduplication therefore comes with a cost, which we analyze below.

Figure 8: Access time with no memory contention, all duplicate pages
Figure 9: Access time when 60% of the pages are duplicates
Figure 10: Access time when 80% of the pages are duplicates

4.1.2 No Memory Contention

We are also interested in the overhead incurred by deduplication on its own, so we also present results when there is no memory contention. Figure 8 shows the result when all of the pages are duplicates. We clearly observe that the overhead of deduplication is negligible in this case: the average access time is comparable to the original Linux kernel. The reason is that the working set of the array fits into memory, so the swapper is never invoked to relieve memory pressure and no I/O occurs. As a result, the behavior of the deduplicating swapper is similar to the original swapper. We observe similar results when we set the percentage of duplicate pages to 60% and 80%, shown in Figure 9 and Figure 10, respectively.

4.2 Overheads of Using Deduplication

In Figure 7, we observed that when there are no duplicate pages, the access time with deduplication is actually worse than with the original swapper. We further investigated the overhead of each function we changed; Table 1 lists them.

Table 1: Overhead of modified functions

    Function               Overhead
    checksum               6. us
    can_be_dedup           .7 us
    add_to_dedup_cache     .5 us
    decrease_dentry_ref    98.87 us

There are two major overheads. The first is calculating the checksum of a page, which takes 6. us on average; we must compute the checksum of each page whenever it is swapped out or in. Therefore, if there are no duplicate pages, our swapper swaps pages in and out just like the original swapper, and additionally calculates a checksum for every swap. We also observe that the function decrease_dentry_ref takes 98.87 us: in addition to calculating the checksum, that function uses locks to maintain the consistency of the dentry ref field of a dedup entry. Callers must wait on those locks to update the field, which becomes a bottleneck; a sketch of this pattern follows. These two functions are the reason our swapper is slower than the original swapper when there are no duplicate pages in the array.
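The pattern behind that bottleneck is easy to see in miniature; in the sketch below a pthread mutex stands in for the kernel spinlock, and the structure is reduced to the two guarded fields.

    /* Sketch of the serialization that makes decrease_dentry_ref expensive:
     * every caller takes the per-entry lock to update ref (and count), so
     * concurrent swappers queue up behind one another. */
    #include <pthread.h>

    struct dedup_entry {
        int count, ref;
        pthread_mutex_t lock;
    };

    void decrease_dentry_ref(struct dedup_entry *de)
    {
        pthread_mutex_lock(&de->lock);   /* all callers serialize here */
        de->ref--;
        pthread_mutex_unlock(&de->lock);
    }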

Table 2: Average compile time of the gmake benchmark

    Using deduplication    No deduplication
    .8 s                   .689 s

4.3 Real-World Applications

To further investigate the pros and cons of using deduplication in swappers, we run a gmake benchmark to test the efficiency of our approach. We compile the Linux kernel ten times with the command

    make -j >/dev/null

and report the average compile time. Table 2 shows the result: the average compile time with deduplication is comparable to that without, so deduplication does not pose a significant overhead in the gmake benchmark. One reason is that memory usage is not intense in this benchmark, so the result resembles Figure 8, where deduplication incurs no significant overhead in the absence of memory contention.

5 Related Work

Much work on deduplication has focused on basic methods to save storage space. Early deduplication storage systems use file-level hashing to detect duplicate files and reclaim their storage space [1, 3, 4]; since such systems also use file hashes to address files, some call them content-addressed storage (CAS). Because their deduplication is at file level, such systems achieve only limited global compression. Modern systems detect duplication at a finer granularity than files. Removing duplication at the level of content-based data segments has been applied to network protocols and applications [10, 9, 7, 5] and has reduced network traffic for distributed file systems [6, 11]. However, all of those efforts aim to save storage space or bandwidth by detecting duplication; our work is orthogonal to theirs in that we use deduplication in memory instead of on disk. To our knowledge, our work is the first effort to integrate the deduplication technique into the swapper. Although [12] also detects duplicates, its focus is reducing memory usage when two virtual machines hold the same page. Our work differs in two ways. First, we do not consider page sharing among different virtual machines; instead, we focus on increasing the efficiency of the swapper. Second, we trade consistency for performance in our implementation, which uses only the checksum to identify two pages with the same contents, while their approach still performs content-based comparison.

6 Conclusions

In this paper, we propose to use deduplication techniques in swappers, and we implement the idea in the Linux 2.6 kernel. Our experimental results indicate that deduplication significantly improves the efficiency of the swapper when many pages are duplicates and the operating system is under heavy memory contention. We also show that deduplication provides comparable performance when few pages are duplicates. Although deduplication can incur a significant overhead when there are no duplicate pages at all, we believe that problem can be alleviated by a better implementation. To our knowledge, this paper is the first effort to integrate deduplication with the swapper. Beyond improving the efficiency of the swapper, one fundamental benefit of deduplication is reducing the size of a program's working set: pages with the same contents are treated as one page by our swapper.
Therefore, another way to understand deduplication in the swapper is that the swapper transparently enlarges memory: the working set of a program, even one larger than physical memory, can fit when deduplication is used, so the program executes more efficiently with our swapper than with the original one.

Of course, there are many future directions for improving our implementation. Currently, we calculate the checksum of a page whenever that page is swapped in or out, paying a performance penalty of 7 us on every swap. However, we could store the checksum with the page to avoid redundantly recomputing the checksum of a page whose contents have not changed; a sketch of this idea follows.
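One plausible shape for this optimization, with hypothetical names: keep the digest beside the page and invalidate it on writes, so only dirty pages pay the hashing cost.

    /* Sketch of the proposed optimization: cache the checksum with the page
     * and recompute it only when the contents have changed. */
    #include <stdint.h>

    struct page_meta {
        uint32_t checksum;        /* last computed (truncated) SHA-1 */
        int      valid;           /* cleared whenever the page is written */
    };

    uint32_t hash_page(const struct page_meta *m);   /* hypothetical: hashes the page */

    uint32_t cached_checksum(struct page_meta *m)
    {
        if (!m->valid) {                /* contents changed since the last hash */
            m->checksum = hash_page(m);
            m->valid = 1;
        }
        return m->checksum;             /* clean pages reuse the stored digest */
    }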

As described in our experiments, another bottleneck is updating the dentry ref field; one possible remedy is to employ the idea of sloppy counters [2] to reduce the overhead of concurrent access. We have also made a trade-off in our implementation: we trade consistency for performance. Our implementation uses only checksums to compare two pages, so it is possible for two pages that are not identical to share a checksum. The problem is exaggerated by our implementation, which uses only the first 32 bits of the 160-bit checksum value, so it could be beneficial to implement a dedup cache keyed on the full 160 bits to reduce the probability of collision. How to improve consistency while still improving the performance of deduplication in swappers is an interesting topic for future research.

References

[1] A. Adya, W. J. Bolosky, M. Castro, G. Cermak, R. Chaiken, J. R. Douceur, J. Howell, J. R. Lorch, M. Theimer, and R. P. Wattenhofer. Farsite: Federated, available, and reliable storage for an incompletely trusted environment. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI), 2002.

[2] S. Boyd-Wickizer, A. T. Clements, Y. Mao, A. Pesterev, M. F. Kaashoek, R. Morris, and N. Zeldovich. An analysis of Linux scalability to many cores. In OSDI, 2010.

[3] N. Jain, M. Dahlin, and R. Tewari. TAPER: Tiered approach for eliminating redundancy in replica synchronization. In USENIX Conference on File and Storage Technologies (FAST), 2005.

[4] P. Kulkarni, F. Douglis, J. LaVoie, and J. M. Tracey. Redundancy elimination within large collections of files. In Proceedings of the USENIX Annual Technical Conference, 2004.

[5] J. C. Mogul, Y. M. Chan, and T. Kelly. Design, implementation, and evaluation of duplicate transfer detection in HTTP. In Proceedings of the 1st Symposium on Networked Systems Design and Implementation (NSDI), 2004.

[6] A. Muthitacharoen, B. Chen, and D. Mazières. A low-bandwidth network file system. SIGOPS Oper. Syst. Rev., 2001.

[7] S. C. Rhea, K. Liang, and E. Brewer. Value-based web caching. In Proceedings of the 12th International Conference on World Wide Web (WWW), 2003.

[8] V. Rijmen and E. Oswald. Update on SHA-1. In Lecture Notes in Computer Science. Springer, 2005.

[9] C. P. Sapuntzakis, R. Chandra, B. Pfaff, J. Chow, M. S. Lam, and M. Rosenblum. Optimizing the migration of virtual computers. SIGOPS Oper. Syst. Rev., 2002.

[10] N. T. Spring and D. Wetherall. A protocol-independent technique for eliminating redundant network traffic. In Proceedings of ACM SIGCOMM, pages 87-95, 2000.

[11] N. Tolia, M. Kozuch, M. Satyanarayanan, B. Karp, T. Bressoud, and A. Perrig. Opportunistic use of content addressable storage for distributed file systems. In Proceedings of the USENIX Annual Technical Conference, 2003.

[12] C. A. Waldspurger. Memory resource management in VMware ESX Server. SIGOPS Oper. Syst. Rev., 2002.

[13] B. Zhu, K. Li, and H. Patterson. Avoiding the disk bottleneck in the Data Domain deduplication file system. In Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST), 2008.


More information

CHAPTER 5 PROPAGATION DELAY

CHAPTER 5 PROPAGATION DELAY 98 CHAPTER 5 PROPAGATION DELAY Underwater wireless sensor networks deployed of sensor nodes with sensing, forwarding and processing abilities that operate in underwater. In this environment brought challenges,

More information

ChunkStash: Speeding Up Storage Deduplication using Flash Memory

ChunkStash: Speeding Up Storage Deduplication using Flash Memory ChunkStash: Speeding Up Storage Deduplication using Flash Memory Biplob Debnath +, Sudipta Sengupta *, Jin Li * * Microsoft Research, Redmond (USA) + Univ. of Minnesota, Twin Cities (USA) Deduplication

More information

CIS Operating Systems Memory Management Address Translation for Paging. Professor Qiang Zeng Spring 2018

CIS Operating Systems Memory Management Address Translation for Paging. Professor Qiang Zeng Spring 2018 CIS 3207 - Operating Systems Memory Management Address Translation for Paging Professor Qiang Zeng Spring 2018 Previous class What is logical address? Who use it? Describes a location in the logical memory

More information

Domain Level Page Sharing in Xen Virtual Machine Systems

Domain Level Page Sharing in Xen Virtual Machine Systems Domain Level Page Sharing in Xen Virtual Machine Systems Myeongjae Jeon, Euiseong Seo, Junghyun Kim, and Joonwon Lee CS Division, Korea Advanced Institute of Science and Technology {mjjeon,ses,joon}@calabkaistackr

More information

LOAD BALANCING AND DEDUPLICATION

LOAD BALANCING AND DEDUPLICATION LOAD BALANCING AND DEDUPLICATION Mr.Chinmay Chikode Mr.Mehadi Badri Mr.Mohit Sarai Ms.Kshitija Ubhe ABSTRACT Load Balancing is a method of distributing workload across multiple computing resources such

More information

ARC: An Approach to Flexible and Robust RAID Systems

ARC: An Approach to Flexible and Robust RAID Systems ARC: An Approach to Flexible and Robust RAID Systems Ba-Quy Vuong and Yiying Zhang Computer Sciences Department, University of Wisconsin-Madison Abstract RAID systems increase data storage reliability

More information

Parallel Processing for Data Deduplication

Parallel Processing for Data Deduplication Parallel Processing for Data Deduplication Peter Sobe, Denny Pazak, Martin Stiehr Faculty of Computer Science and Mathematics Dresden University of Applied Sciences Dresden, Germany Corresponding Author

More information

Operating System Performance and Large Servers 1

Operating System Performance and Large Servers 1 Operating System Performance and Large Servers 1 Hyuck Yoo and Keng-Tai Ko Sun Microsystems, Inc. Mountain View, CA 94043 Abstract Servers are an essential part of today's computing environments. High

More information

Data Modeling and Databases Ch 14: Data Replication. Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich

Data Modeling and Databases Ch 14: Data Replication. Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich Data Modeling and Databases Ch 14: Data Replication Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich Database Replication What is database replication The advantages of

More information

CS307: Operating Systems

CS307: Operating Systems CS307: Operating Systems Chentao Wu 吴晨涛 Associate Professor Dept. of Computer Science and Engineering Shanghai Jiao Tong University SEIEE Building 3-513 wuct@cs.sjtu.edu.cn Download Lectures ftp://public.sjtu.edu.cn

More information

CS370 Operating Systems

CS370 Operating Systems CS370 Operating Systems Colorado State University Yashwant K Malaiya Fall 2017 Lecture 21 Main Memory Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 FAQ Why not increase page size

More information

CS370: Operating Systems [Spring 2017] Dept. Of Computer Science, Colorado State University

CS370: Operating Systems [Spring 2017] Dept. Of Computer Science, Colorado State University Frequently asked questions from the previous class survey CS 370: OPERATING SYSTEMS [MEMORY MANAGEMENT] Matrices in Banker s algorithm Max, need, allocated Shrideep Pallickara Computer Science Colorado

More information

Memory Management. Disclaimer: some slides are adopted from book authors slides with permission 1

Memory Management. Disclaimer: some slides are adopted from book authors slides with permission 1 Memory Management Disclaimer: some slides are adopted from book authors slides with permission 1 Recap Paged MMU: Two main Issues Translation speed can be slow TLB Table size is big Multi-level page table

More information

An Integrated Synchronization and Consistency Protocol for the Implementation of a High-Level Parallel Programming Language

An Integrated Synchronization and Consistency Protocol for the Implementation of a High-Level Parallel Programming Language An Integrated Synchronization and Consistency Protocol for the Implementation of a High-Level Parallel Programming Language Martin C. Rinard (martin@cs.ucsb.edu) Department of Computer Science University

More information

Multi-level Byte Index Chunking Mechanism for File Synchronization

Multi-level Byte Index Chunking Mechanism for File Synchronization , pp.339-350 http://dx.doi.org/10.14257/ijseia.2014.8.3.31 Multi-level Byte Index Chunking Mechanism for File Synchronization Ider Lkhagvasuren, Jung Min So, Jeong Gun Lee, Jin Kim and Young Woong Ko *

More information

Optimizing the use of the Hard Disk in MapReduce Frameworks for Multi-core Architectures*

Optimizing the use of the Hard Disk in MapReduce Frameworks for Multi-core Architectures* Optimizing the use of the Hard Disk in MapReduce Frameworks for Multi-core Architectures* Tharso Ferreira 1, Antonio Espinosa 1, Juan Carlos Moure 2 and Porfidio Hernández 2 Computer Architecture and Operating

More information

dedupv1: Improving Deduplication Throughput using Solid State Drives (SSD)

dedupv1: Improving Deduplication Throughput using Solid State Drives (SSD) University Paderborn Paderborn Center for Parallel Computing Technical Report dedupv1: Improving Deduplication Throughput using Solid State Drives (SSD) Dirk Meister Paderborn Center for Parallel Computing

More information

NUMA replicated pagecache for Linux

NUMA replicated pagecache for Linux NUMA replicated pagecache for Linux Nick Piggin SuSE Labs January 27, 2008 0-0 Talk outline I will cover the following areas: Give some NUMA background information Introduce some of Linux s NUMA optimisations

More information

Process size is independent of the main memory present in the system.

Process size is independent of the main memory present in the system. Hardware control structure Two characteristics are key to paging and segmentation: 1. All memory references are logical addresses within a process which are dynamically converted into physical at run time.

More information

Basic Memory Management. Basic Memory Management. Address Binding. Running a user program. Operating Systems 10/14/2018 CSC 256/456 1

Basic Memory Management. Basic Memory Management. Address Binding. Running a user program. Operating Systems 10/14/2018 CSC 256/456 1 Basic Memory Management Program must be brought into memory and placed within a process for it to be run Basic Memory Management CS 256/456 Dept. of Computer Science, University of Rochester Mono-programming

More information

Lecture 6 Consistency and Replication

Lecture 6 Consistency and Replication Lecture 6 Consistency and Replication Prof. Wilson Rivera University of Puerto Rico at Mayaguez Electrical and Computer Engineering Department Outline Data-centric consistency Client-centric consistency

More information

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 15: Caching: Demand Paged Virtual Memory

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 15: Caching: Demand Paged Virtual Memory CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring 2003 Lecture 15: Caching: Demand Paged Virtual Memory 15.0 Main Points: Concept of paging to disk Replacement policies

More information

A Connectionless Approach to Intra- and Inter-Domain Traffic Engineering

A Connectionless Approach to Intra- and Inter-Domain Traffic Engineering A Connectionless Approach to Intra- and Inter-Domain Traffic Engineering Hema T. Kaur, Shivkumar Kalyanaraman ECSE Department, Rensselaer Polytechnic Institute, Troy, NY-12180 {hema,shivkuma}@networks.ecse.rpi.edu

More information

Virtual Memory COMPSCI 386

Virtual Memory COMPSCI 386 Virtual Memory COMPSCI 386 Motivation An instruction to be executed must be in physical memory, but there may not be enough space for all ready processes. Typically the entire program is not needed. Exception

More information

LINUX OPERATING SYSTEM Submitted in partial fulfillment of the requirement for the award of degree of Bachelor of Technology in Computer Science

LINUX OPERATING SYSTEM Submitted in partial fulfillment of the requirement for the award of degree of Bachelor of Technology in Computer Science A Seminar report On LINUX OPERATING SYSTEM Submitted in partial fulfillment of the requirement for the award of degree of Bachelor of Technology in Computer Science SUBMITTED TO: www.studymafia.org SUBMITTED

More information

Copyright 2013 Thomas W. Doeppner. IX 1

Copyright 2013 Thomas W. Doeppner. IX 1 Copyright 2013 Thomas W. Doeppner. IX 1 If we have only one thread, then, no matter how many processors we have, we can do only one thing at a time. Thus multiple threads allow us to multiplex the handling

More information

Distributed File Systems II

Distributed File Systems II Distributed File Systems II To do q Very-large scale: Google FS, Hadoop FS, BigTable q Next time: Naming things GFS A radically new environment NFS, etc. Independence Small Scale Variety of workloads Cooperation

More information

Assignment 5. Georgia Koloniari

Assignment 5. Georgia Koloniari Assignment 5 Georgia Koloniari 2. "Peer-to-Peer Computing" 1. What is the definition of a p2p system given by the authors in sec 1? Compare it with at least one of the definitions surveyed in the last

More information

A DEDUPLICATION-INSPIRED FAST DELTA COMPRESSION APPROACH W EN XIA, HONG JIANG, DA N FENG, LEI T I A N, M I N FU, YUKUN Z HOU

A DEDUPLICATION-INSPIRED FAST DELTA COMPRESSION APPROACH W EN XIA, HONG JIANG, DA N FENG, LEI T I A N, M I N FU, YUKUN Z HOU A DEDUPLICATION-INSPIRED FAST DELTA COMPRESSION APPROACH W EN XIA, HONG JIANG, DA N FENG, LEI T I A N, M I N FU, YUKUN Z HOU PRESENTED BY ROMAN SHOR Overview Technics of data reduction in storage systems:

More information